Estimation Methods and SEM Fit Indices

Examining the Effect of Estimation Methods on SEM Fit Indices
Published: June 2, 2020

Structural equation modeling (SEM) reports its goodness of fit through a small set of indices that have, by convention, hardened into thresholds. The Hu and Bentler (1999) cutoffs — CFI ≥ 0.95, TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08 — are now the de facto standard against which SEM models are judged in psychology and education journals. The implicit assumption is that these cutoffs apply uniformly: a CFI of 0.94 is borderline regardless of how the model was estimated, and a model that fails the threshold under one estimator should also fail it under another. Shi and Maydeu-Olivares (2020), in Educational and Psychological Measurement, demonstrate that this assumption is wrong, with consequences for how SEM results should be interpreted and reported.

Why the estimator matters at all

SEM fits a model-implied covariance matrix to the observed sample covariance matrix by minimizing a discrepancy function between the two. The standard estimators differ in which discrepancy they minimize. Maximum likelihood (ML) minimizes a function derived from the multivariate normal log-likelihood, which is appropriate when data are continuous and approximately normally distributed. Unweighted least squares (ULS) minimizes the sum of squared differences between observed and model-implied covariances, treating every covariance equally. Diagonally weighted least squares (DWLS), implemented as WLSMV in Mplus and as the default robust estimator for categorical SEM in lavaan, weights each squared difference by the inverse of its estimated asymptotic variance, so that less precisely estimated elements count for less. It is the usual choice for ordered-categorical or non-normal data.
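The three discrepancy functions can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy and the textbook form of each function. The names vech, f_ml, f_uls, and f_dwls are illustrative and not taken from any SEM package, and asym_var stands in for the estimated asymptotic variances that real software derives from the data. With categorical indicators, DWLS is applied to polychoric correlations and thresholds rather than to raw covariances, which this sketch does not attempt.

```python
import numpy as np

def vech(m):
    """Stack the non-redundant (lower-triangular) elements of a symmetric matrix."""
    rows, cols = np.tril_indices(m.shape[0])
    return m[rows, cols]

def f_ml(S, Sigma):
    """Normal-theory ML discrepancy: log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return float(logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p)

def f_uls(S, Sigma):
    """ULS discrepancy: unweighted sum of squared residuals."""
    r = vech(S) - vech(Sigma)
    return float(r @ r)

def f_dwls(S, Sigma, asym_var):
    """DWLS discrepancy: each squared residual divided by its asymptotic variance."""
    r = vech(S) - vech(Sigma)
    return float(r @ (r / asym_var))

# Hypothetical 2x2 example: observed S and a model-implied Sigma that
# understates the covariance (0.3 instead of 0.5).
S = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

print(f_ml(S, Sigma), f_uls(S, Sigma), f_dwls(S, Sigma, np.array([0.02, 0.01, 0.02])))
```

The same pair of matrices yields three different discrepancy values, which is the starting point for the estimator dependence discussed in the rest of the article.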

The fit indices are computed from the same fitted model, but the values they return depend on the discrepancy function the estimator minimized. RMSEA is built around the chi-square statistic associated with the fitted model; CFI compares this chi-square to a baseline-model chi-square; SRMR is computed directly from the standardized residual covariances. When the estimator changes, the chi-square changes, the residual covariances may change, and the indices follow.
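The chi-square dependence is easy to see once the formulas are written out. The sketch below uses the common textbook definitions of RMSEA, CFI, and TLI; software packages differ in details such as N versus N - 1 in the RMSEA denominator, so treat it as an illustration and not as any package's exact implementation. The function names and all numeric inputs are hypothetical.

```python
import math

def rmsea(chisq, df, n):
    """RMSEA from the model chi-square, its degrees of freedom, and sample size n."""
    return math.sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

def cfi(chisq, df, chisq_base, df_base):
    """CFI: 1 minus the ratio of model to baseline noncentrality."""
    d_model = max(chisq - df, 0.0)
    d_base = max(chisq_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def tli(chisq, df, chisq_base, df_base):
    """TLI: compares chi-square/df ratios of the model and the baseline."""
    ratio_base = chisq_base / df_base
    return (ratio_base - chisq / df) / (ratio_base - 1.0)

# Hypothetical inputs: same df, n, and baseline; only the model chi-square differs.
for chisq in (85.0, 110.0):
    print(chisq,
          round(rmsea(chisq, 40, 501), 4),
          round(cfi(chisq, 40, 1045.0, 45), 4),
          round(tli(chisq, 40, 1045.0, 45), 4))
```

Holding everything else fixed, a larger model chi-square raises RMSEA and lowers CFI and TLI. In practice the baseline chi-square also changes with the estimator, so the net effect on CFI and TLI can go in either direction.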

This is not a subtle effect. Beauducel and Herzberg (2006), looking at ordered-categorical SEM, showed that ML estimation applied to ordinal data produces systematically inflated chi-square values relative to the WLSMV alternative, leading to RMSEA values that look like poor fit when the underlying model is correctly specified. Their finding was a major step in establishing WLSMV as the recommended estimator for categorical SEM, but it also flagged the more general issue that fit-index values are not estimator-invariant.

What Shi and Maydeu-Olivares (2020) found

Shi and Maydeu-Olivares ran simulation studies that crossed estimator (ML, ULS, DWLS) with several types of model misspecification (incorrect dimensionality, omitted cross-loadings, ignored residual correlations) at various sample sizes. They tracked how RMSEA, CFI, and SRMR responded to each misspecification under each estimator.

The headline result: RMSEA and CFI behaved differently across estimators. The same true model and the same true misspecification produced different RMSEA and CFI values depending on whether ML, ULS, or DWLS was the estimator. The difference was large enough to change the verdict under the conventional Hu-Bentler thresholds: a model that passed CFI ≥ 0.95 under one estimator could fail under another, with the underlying data and the underlying misspecification unchanged. The implication is that “the model fits” or “the model does not fit” depends on the estimator, not just on the model and the data.

SRMR, by contrast, was substantially less sensitive to estimator choice. Because SRMR is computed from the standardized residual covariances directly, without going through the chi-square machinery that the other indices depend on, it is more nearly estimator-invariant. Shi and Maydeu-Olivares argue that SRMR therefore deserves more interpretive weight than the conventions allow, and that RMSEA and CFI should be interpreted with the estimator explicitly in mind.
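For comparison, SRMR needs only the observed and model-implied covariance matrices. The sketch below follows one common definition (residuals standardized by the observed standard deviations, averaged over the non-redundant elements); packages offer several variants, so the function srmr here is an illustrative assumption and not a specific library's formula.

```python
import numpy as np

def srmr(S, Sigma):
    """Root mean square of the standardized residual covariances."""
    sd = np.sqrt(np.diag(S))                    # observed standard deviations
    std_resid = (S - Sigma) / np.outer(sd, sd)  # residuals on a correlation metric
    rows, cols = np.tril_indices(S.shape[0])    # non-redundant elements only
    return float(np.sqrt(np.mean(std_resid[rows, cols] ** 2)))

# Hypothetical 2x2 example: the model understates the covariance by 0.2.
S = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
print(srmr(S, Sigma))
```

No chi-square, degrees of freedom, or sample size enters the calculation. The estimator affects SRMR only through the fitted Sigma.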

The misspecification analysis sharpened the picture further. RMSEA and CFI were sensitive to misspecification type as well as estimator: omitted cross-loadings produced different fit-index responses than ignored residual correlations, and the patterns of response interacted with the estimator. The cleanest practical reading is that the Hu-Bentler thresholds are derived under specific conditions (ML estimation, continuous indicators, particular misspecification patterns) and do not transfer reliably to other estimator-data-misspecification combinations.

What this means for SEM practice

The actionable recommendations distill to:

  • Report the estimator explicitly. “We fit the model in lavaan with ML estimation” is the minimum disclosure; “We used WLSMV with robust standard errors and a robust scaled chi-square” is more useful. The reader needs to know which estimator was used to interpret the fit indices.
  • Treat the Hu-Bentler thresholds as estimator-specific guidance, not universal cutoffs. The original Hu and Bentler (1999) simulation used ML estimation on continuous data with specific misspecification patterns. Using their thresholds for DWLS or WLSMV on categorical data extrapolates beyond the original derivation.
  • Weight SRMR more heavily. Shi and Maydeu-Olivares (2020) and several preceding studies have shown SRMR to be more robust to estimator choice than RMSEA or CFI. SRMR-based assessment is more comparable across studies that used different estimators.
  • For ordered-categorical data, prefer WLSMV (DWLS) over ML. Beauducel and Herzberg (2006) established that ML on categorical data inflates the chi-square; the inflation distorts every chi-square-derived fit index. WLSMV is the standard alternative.
  • Run a sensitivity analysis when feasible. Refitting the model under an alternative estimator and reporting how the fit indices change provides direct evidence about the estimator-dependence of the conclusion. If the conclusion is robust across estimators, the model fit is more credible.
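A sensitivity check of this kind can be summarized with a small helper that flags indices whose pass/fail verdict changes across estimators. The sketch below assumes the fit indices have already been obtained from SEM software under each estimator; the helper names are hypothetical, and the numbers in the example are invented placeholders, not results from any study or dataset.

```python
# Hu and Bentler (1999) cutoffs as quoted above: (direction, threshold).
CUTOFFS = {
    "cfi": (">=", 0.95),
    "tli": (">=", 0.95),
    "rmsea": ("<=", 0.06),
    "srmr": ("<=", 0.08),
}

def passes(index, value):
    """True if the value meets the conventional cutoff for that index."""
    direction, threshold = CUTOFFS[index]
    return value >= threshold if direction == ">=" else value <= threshold

def verdict_changes(fits):
    """Return the indices whose pass/fail verdict differs across estimators."""
    changed = []
    for index in CUTOFFS:
        verdicts = {passes(index, f[index]) for f in fits.values() if index in f}
        if len(verdicts) > 1:
            changed.append(index)
    return changed

# Invented placeholder values for illustration only.
fits = {
    "ML":   {"cfi": 0.940, "rmsea": 0.065, "srmr": 0.050},
    "DWLS": {"cfi": 0.970, "rmsea": 0.055, "srmr": 0.052},
}
print(verdict_changes(fits))
```

Any index returned by verdict_changes is one where the conclusion depends on the estimator and should be reported as such.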

The deeper lesson is that fit-index reporting is one of the points where automated SEM software produces a single answer per fit and the user is encouraged to read it as definitive. The fit-index value is conditional on the estimator, the data structure, and the misspecification pattern; treating it as an unconditional verdict on model adequacy ignores all three conditioning structures.

Where this connects to broader latent-variable methodology

The estimator-fit-index interaction is one of several places where SEM practice has to be more careful than the standard reporting templates assume. The choice of fit-index cutoffs (the conventional Hu and Bentler thresholds versus alternatives), the treatment of factor-retention decisions in EFA that feed into the SEM specification, the handling of rotation indeterminacy in multidimensional models, and the treatment of item-distribution constraints in reliability all interact with the estimator choice. Each of these is documented in the methodological literature; each is often ignored or mishandled in routine empirical practice.

The unifying recommendation is sensitivity analysis. SEM results that survive variation across reasonable methodological choices are more credible than results that exist only under one specific configuration of estimator, prior, fit-index threshold, and rotation criterion. The cost of the sensitivity analysis is mostly computational; the cost of failing to do it is publishing results that may not replicate when another team chooses a different default.

For high-stakes applications (clinical assessment validation, educational measurement, policy-relevant social science), the sensitivity standard should be the operating norm rather than a methodological extra. The Shi and Maydeu-Olivares findings make a specific case for sensitivity to the estimator; they fit into a broader pattern of methodological decisions whose effects are non-trivial and whose proper handling is to disclose, vary, and report.

Frequently Asked Questions

Why is RMSEA sensitive to the estimator?

RMSEA is derived from the chi-square statistic of the fitted model. Different estimators (ML, ULS, DWLS) minimize different discrepancy functions and produce different chi-square values for the same model and data. The chi-square dependence carries through to RMSEA. CFI is similarly affected because it normalizes the model chi-square against a baseline-model chi-square.

Why is SRMR more robust?

SRMR is computed directly from the standardized residual covariances — the difference between observed and model-implied covariances after fitting. It does not go through the chi-square machinery and therefore does not inherit chi-square’s estimator dependence. It still depends on the estimator (different estimators produce different fitted parameters), but the dependence is much weaker.

Should I always use WLSMV for categorical data?

For ordinal data with five or fewer response categories, WLSMV (DWLS estimation with a mean- and variance-adjusted test statistic, available under that name in both Mplus and lavaan) is the methodological default. It models the ordinal responses as coarsened versions of underlying continuous variables and avoids the chi-square inflation that ML produces on categorical data. For continuous data with approximately normal distributions, ML is appropriate and slightly more efficient.

Are the Hu-Bentler cutoffs still useful?

As guidance for ML estimation on continuous data with the misspecification patterns Hu and Bentler studied, yes. As universal thresholds applicable across estimators and data types, no. They are conventionally cited as universal but the original derivation does not support that usage. Treat them as defaults that may not apply to your specific configuration.

How should I report SEM fit?

Report the estimator explicitly, all major fit indices (chi-square, RMSEA with confidence interval, CFI, TLI, SRMR), the sample size, and any robustness corrections (robust standard errors, scaled chi-square). When the estimator is non-default for the data type, justify the choice. When fit indices disagree across estimators in a sensitivity analysis, report the disagreement rather than picking the most favorable answer.

References

  • Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling, 13(2), 186–203. https://doi.org/10.1207/s15328007sem1302_2
  • Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21(2), 230–258. https://doi.org/10.1177/0049124192021002005
  • Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
  • Shi, D., & Maydeu-Olivares, A. (2020). The effect of estimation methods on SEM fit indices. Educational and Psychological Measurement, 80(3), 421–445. https://doi.org/10.1177/0013164419885164

Cite This Article

Jouve, X. (2020, June 2). Estimation Methods and SEM Fit Indices. PsychoLogic. https://www.psychologic.online/sem-estimation-methods-fit-indices/
