What is the significance?

This research emphasizes the importance of understanding the methodological choices underlying SEM. The variability of RMSEA and CFI across estimation methods underscores the need for researchers to apply these fit indices thoughtfully and to consider the implications of their chosen estimator. The robustness of SRMR, meanwhile, offers a more stable basis for evaluating fit and a practical anchor when interpreting SEM results.

What are the future directions?

Future studies could expand on these findings by examining additional estimation methods and applying them to more complex models. There is also potential to explore how these indices perform across different sample sizes or under varying levels of data quality. This work could guide the development of improved fit indices that are less sensitive to estimation techniques.

The findings by Shi and Maydeu-Olivares contribute to a deeper understanding of SEM fit indices and their dependence on estimation methods. By shedding light on these dynamics, their research supports more informed decision-making in SEM analysis and highlights the importance of methodological transparency in research.

Shi, D., & Maydeu-Olivares, A. (2020). The effect of estimation methods on SEM fit indices. Educational and Psychological Measurement, 80(3), 421–445. https://doi.org/10.1177/0013164419885164

Estimation Methods and SEM Fit Indices

Published: June 2, 2020 · Last reviewed: May 7, 2026


In structural equation modeling (SEM), goodness of fit is reported through a small set of indices whose conventional thresholds have hardened into rules. The Hu and Bentler (1999) cutoffs (CFI ≥ 0.95, TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08) are now the de facto standard against which SEM models are judged in psychology and education journals. The implicit assumption is that these cutoffs apply uniformly: a CFI of 0.94 is borderline regardless of how the model was estimated, and a model that fails the threshold under one estimator should also fail it under another. Shi and Maydeu-Olivares (2020), in Educational and Psychological Measurement, demonstrate that this assumption is wrong, with consequences for how SEM results should be interpreted and reported.

Why the estimator matters at all

SEM fits a model-implied covariance matrix to the observed sample covariance matrix by choosing the parameter values that minimize a discrepancy function between the two. The standard estimators differ in which discrepancy they minimize. Maximum likelihood (ML) minimizes a function derived from the multivariate normal log-likelihood, which is appropriate when data are continuous and approximately normally distributed. Unweighted least squares (ULS) minimizes the sum of squared differences between observed and model-implied covariances, treating every element equally. Diagonally weighted least squares (DWLS) divides each squared difference by the asymptotic variance of the corresponding sample statistic, so that precisely estimated statistics count more; with ordered-categorical data those statistics are polychoric correlations and thresholds rather than covariances. DWLS supplies the point estimates behind WLSMV, the robust estimator (robust standard errors plus a mean- and variance-adjusted chi-square) that Mplus and lavaan use by default for categorical indicators, which makes it appropriate for ordered-categorical or non-normal data.
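To make the differences concrete, here is a minimal Python sketch of the three discrepancy functions, evaluated at one fixed model-implied matrix rather than minimized over model parameters as real software does. The matrices and asymptotic variances are hypothetical, and the helper names (`det_and_inverse`, `unique_residuals`, `f_ml`, `f_uls`, `f_dwls`) are illustrative, not any package's API. Scaling conventions also differ between programs (ULS is often written with a factor of 1/2), and for ordinal data the DWLS statistics would be polychoric correlations and thresholds rather than covariances.

```python
import math


def det_and_inverse(m):
    """Determinant and inverse of a small square matrix (Gauss-Jordan, partial pivoting)."""
    n = len(m)
    # Augment m with the identity matrix: [m | I]
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        p = a[col][col]
        det *= p
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [rv - factor * cv for rv, cv in zip(a[r], a[col])]
    return det, [row[n:] for row in a]


def unique_residuals(s, sigma):
    """Residuals s_ij - sigma_ij over the lower triangle (diagonal included), row by row."""
    p = len(s)
    return [s[i][j] - sigma[i][j] for i in range(p) for j in range(i + 1)]


def f_ml(s, sigma):
    """ML discrepancy: log|Sigma| + tr(S Sigma^-1) - log|S| - p."""
    p = len(s)
    det_sigma, sigma_inv = det_and_inverse(sigma)
    det_s, _ = det_and_inverse(s)
    trace = sum(s[i][k] * sigma_inv[k][i] for i in range(p) for k in range(p))
    return math.log(det_sigma) + trace - math.log(det_s) - p


def f_uls(s, sigma):
    """ULS discrepancy: unweighted sum of squared residuals over the unique elements."""
    return sum(r * r for r in unique_residuals(s, sigma))


def f_dwls(s, sigma, avar):
    """DWLS discrepancy: each squared residual divided by the asymptotic variance of
    the corresponding sample statistic (avar follows the unique_residuals order)."""
    return sum(r * r / v for r, v in zip(unique_residuals(s, sigma), avar))


if __name__ == "__main__":
    S = [[1.0, 0.5], [0.5, 1.0]]      # hypothetical sample covariance matrix
    Sigma = [[1.0, 0.4], [0.4, 1.0]]  # hypothetical model-implied matrix
    avar = [0.02, 0.01, 0.02]         # hypothetical asymptotic variances
    print(f_ml(S, Sigma), f_uls(S, Sigma), f_dwls(S, Sigma, avar))
```

With these inputs the three functions put the same 0.1 off-diagonal residual on very different scales (roughly 0.018 for ML, 0.01 for ULS, and 1.0 for DWLS, because the DWLS weight of 1/0.01 inflates that residual). When the model is misspecified, different minimands lead to different fitted solutions and different chi-square statistics.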

The fit indices are computed from the same fitted model, but the values they return depend on the discrepancy function the estimator minimized. RMSEA is built around the chi-square statistic associated with the fitted model; CFI compares this chi-square to a baseline-model chi-square; SRMR is computed directly from the standardized residual covariances. When the estimator changes, the chi-square changes, the residual covariances may change, and the indices follow.
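The chi-square dependence can be seen directly in the usual point-estimate formulas. The sketch below assumes the N − 1 convention for RMSEA (some programs divide by N) and computes CFI from the model and baseline noncentralities, each floored at zero. The chi-square, degrees of freedom, and sample-size values are invented for illustration, and holding the baseline chi-square fixed is a simplification, since in practice it also changes with the estimator.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1))).

    Assumption: the N - 1 convention; some programs divide by N instead.
    """
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """CFI: 1 - (model noncentrality) / (baseline noncentrality), both floored at 0."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0.0 else 1.0 - d_model / d_base


if __name__ == "__main__":
    # Hypothetical: same model, data, and baseline; two different model chi-squares,
    # as two estimators might produce.
    for chi2 in (120.0, 180.0):
        print(chi2, round(rmsea(chi2, 50, 500), 3), round(cfi(chi2, 50, 2000.0, 66), 3))
```

Run as a script, this prints `120.0 0.053 0.964` and `180.0 0.072 0.933`: only the first line clears CFI ≥ 0.95 and RMSEA ≤ 0.06, although nothing but the model chi-square changed.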

This is not a subtle effect. Beauducel and Herzberg (2006), looking at ordered-categorical SEM, showed that ML estimation applied to ordinal data produces systematically inflated chi-square values relative to the WLSMV alternative, leading to RMSEA values that look like poor fit when the underlying model is correctly specified. Their finding was a major step in establishing WLSMV as the recommended estimator for categorical SEM, but it also flagged the more general issue that fit-index values are not estimator-invariant.

What Shi and Maydeu-Olivares (2020) found

Shi and Maydeu-Olivares ran simulation studies that crossed estimator (ML, ULS, DWLS) with several types of model misspecification (incorrect dimensionality, omitted cross-loadings, ignored residual correlations) at various sample sizes. They tracked how RMSEA, CFI, and SRMR responded to each misspecification under each estimator.

The headline result: RMSEA and CFI behaved differently across estimators. The same true model and the same true misspecification produced different RMSEA and CFI values depending on whether ML, ULS, or DWLS was the estimator. The difference was large enough to change the verdict under the conventional Hu-Bentler thresholds: a model that passed CFI ≥ 0.95 under one estimator could fail under another, with the underlying data and the underlying misspecification unchanged. The implication is that “the model fits” or “the model does not fit” depends on the estimator, not just on the model and the data.

SRMR, by contrast, was substantially less sensitive to estimator choice. Because SRMR is computed from the standardized residual covariances directly, without going through the chi-square machinery that the other indices depend on, it is more nearly estimator-invariant. Shi and Maydeu-Olivares argue that SRMR therefore deserves more interpretive weight than the conventions allow, and that RMSEA and CFI should be interpreted with the estimator explicitly in mind.
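SRMR can be computed from the sample and model-implied matrices alone. This sketch uses one common definition: each residual is divided by the product of the observed standard deviations, and the squared standardized residuals are averaged over the p(p + 1)/2 unique elements. Some programs compare observed and model-implied correlations instead, so values can differ slightly across software. The matrices are hypothetical. No chi-square, degrees of freedom, or baseline model enters the calculation; the estimator matters only through the model-implied matrix it produces.

```python
import math


def srmr(s, sigma):
    """SRMR over the p(p+1)/2 unique elements, standardizing each residual by the
    observed standard deviations (one common convention; software varies)."""
    p = len(s)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            r = (s[i][j] - sigma[i][j]) / math.sqrt(s[i][i] * s[j][j])
            total += r * r
    return math.sqrt(total / (p * (p + 1) / 2))


if __name__ == "__main__":
    S = [[1.0, 0.5], [0.5, 1.0]]      # hypothetical sample covariance matrix
    Sigma = [[1.0, 0.4], [0.4, 1.0]]  # hypothetical model-implied matrix
    print(round(srmr(S, Sigma), 4))
```

For these matrices the single nonzero residual of 0.1 gives SRMR = sqrt(0.01 / 3), about 0.0577.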

The misspecification analysis sharpened the picture further. RMSEA and CFI were sensitive to misspecification type as well as estimator: omitted cross-loadings produced different fit-index responses than ignored residual correlations, and the patterns of response interacted with the estimator. The cleanest practical reading is that the Hu-Bentler thresholds are derived under specific conditions (ML estimation, continuous indicators, particular misspecification patterns) and do not transfer reliably to other estimator-data-misspecification combinations.

What this means for SEM practice

The actionable recommendations reduce to five:

Report the estimator explicitly. “We fit the model in lavaan with ML estimation” is the minimum disclosure; “We used WLSMV with robust standard errors and a robust scaled chi-square” is more useful. The reader needs to know which estimator was used to interpret the fit indices.
Treat the Hu-Bentler thresholds as estimator-specific guidance, not universal cutoffs. The original Hu and Bentler (1999) simulation used ML estimation on continuous data with specific misspecification patterns. Using their thresholds for DWLS or WLSMV on categorical data extrapolates beyond the original derivation.
Weight SRMR more heavily. Shi and Maydeu-Olivares (2020) and several preceding studies have shown SRMR to be more robust to estimator choice than RMSEA or CFI. SRMR-based assessment is more comparable across studies that used different estimators.
For ordered-categorical data, prefer WLSMV (DWLS) over ML. Beauducel and Herzberg (2006) established that ML on categorical data inflates the chi-square; the inflation distorts every chi-square-derived fit index. WLSMV is the standard alternative.
Run a sensitivity analysis when feasible. Refitting the model under an alternative estimator and reporting how the fit indices change provides direct evidence about the estimator-dependence of the conclusion. If the conclusion is robust across estimators, the model fit is more credible.
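A sensitivity analysis of this kind reduces to a small comparison table. The sketch below assumes the fit indices have already been obtained under each estimator (for example, by refitting the model in lavaan with a different `estimator` argument and collecting `fitMeasures()`). The function names and the numbers in `EXAMPLE_FITS` are hypothetical, chosen only to show a verdict flip; they are not results from Shi and Maydeu-Olivares (2020). Each index is checked against the Hu and Bentler (1999) cutoffs, and indices whose verdict differs across estimators are flagged.

```python
# Conventional Hu and Bentler (1999) cutoffs, as listed in the article.
CUTOFFS = {"CFI": (">=", 0.95), "TLI": (">=", 0.95), "RMSEA": ("<=", 0.06), "SRMR": ("<=", 0.08)}

# Hypothetical fit indices for one model refitted under three estimators.
EXAMPLE_FITS = {
    "estimator_A": {"CFI": 0.951, "RMSEA": 0.055, "SRMR": 0.045},
    "estimator_B": {"CFI": 0.962, "RMSEA": 0.048, "SRMR": 0.044},
    "estimator_C": {"CFI": 0.938, "RMSEA": 0.071, "SRMR": 0.046},
}


def passes(index, value):
    """True if value meets the conventional cutoff for this index."""
    direction, cutoff = CUTOFFS[index]
    return value >= cutoff if direction == ">=" else value <= cutoff


def sensitivity_table(fits):
    """fits maps estimator name -> {index name: value}.

    Returns one row per index: (index, min value, max value, verdict agrees across estimators).
    """
    indices = sorted({name for values in fits.values() for name in values})
    rows = []
    for index in indices:
        values = [v[index] for v in fits.values() if index in v]
        verdicts = {passes(index, x) for x in values}
        rows.append((index, min(values), max(values), len(verdicts) == 1))
    return rows


if __name__ == "__main__":
    for row in sensitivity_table(EXAMPLE_FITS):
        print(row)
```

With the example values, CFI and RMSEA change verdict across estimators while SRMR passes under all three; that disagreement is what the sensitivity analysis should report rather than resolve by choosing the most favorable run.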

The deeper lesson concerns how fit indices are read. SEM software returns a single value for each index, and users are encouraged to treat that value as definitive. But the value is conditional on the estimator, the data structure, and the misspecification pattern; reading it as an unconditional verdict on model adequacy ignores all three.

Where this connects to broader latent-variable methodology

The estimator-fit-index interaction is one of several places where SEM practice has to be more careful than the standard reporting templates assume. The choice of fit-index cutoffs (the conventional Hu and Bentler values versus alternative thresholds), the factor-retention decisions in EFA that feed into the SEM specification, the handling of rotation indeterminacy in multidimensional models, and the treatment of item-distribution constraints in reliability estimation all interact with the estimator choice. Each of these is documented in the methodological literature, and each is often ignored or mishandled in routine empirical practice.

The unifying recommendation is sensitivity analysis. SEM results that survive variation across reasonable methodological choices are more credible than results that exist only under one specific configuration of estimator, prior, fit-index threshold, and rotation criterion. The cost of the sensitivity analysis is mostly computational; the cost of failing to do it is publishing results that may not replicate when another team chooses a different default.

For high-stakes applications (clinical assessment validation, educational measurement, policy-relevant social science), sensitivity analysis should be the operating norm rather than a methodological extra. The Shi and Maydeu-Olivares findings make a specific case for sensitivity to the estimator; they fit into a broader pattern of methodological decisions whose effects are non-trivial and whose proper handling is to disclose, vary, and report.

Frequently Asked Questions

Why is RMSEA sensitive to the estimator?

RMSEA is derived from the chi-square statistic of the fitted model. Different estimators (ML, ULS, DWLS) minimize different discrepancy functions and produce different chi-square values for the same model and data. The chi-square dependence carries through to RMSEA. CFI is similarly affected because it normalizes the model chi-square against a baseline-model chi-square.

Why is SRMR more robust?

SRMR is computed directly from the standardized residual covariances — the difference between observed and model-implied covariances after fitting. It does not go through the chi-square machinery and therefore does not inherit chi-square’s estimator dependence. It still depends on the estimator (different estimators produce different fitted parameters), but the dependence is much weaker.

Should I always use WLSMV for categorical data?

For ordinal data with five or fewer response categories, WLSMV (DWLS point estimates with robust corrections, requested as WLSMV in both Mplus and lavaan) is the methodological default. It treats each ordinal item as a discretized version of an underlying continuous response variable, working from polychoric correlations, and so avoids the chi-square inflation that ML produces on categorical data. For continuous data with approximately normal distributions, ML is appropriate and slightly more efficient.

Are the Hu-Bentler cutoffs still useful?

As guidance for ML estimation on continuous data with the misspecification patterns Hu and Bentler studied, yes. As universal thresholds applicable across estimators and data types, no. They are conventionally cited as universal but the original derivation does not support that usage. Treat them as defaults that may not apply to your specific configuration.

How should I report SEM fit?

Report the estimator explicitly, all major fit indices (chi-square, RMSEA with confidence interval, CFI, TLI, SRMR), the sample size, and any robustness corrections (robust standard errors, scaled chi-square). When the estimator is non-default for the data type, justify the choice. When fit indices disagree across estimators in a sensitivity analysis, report the disagreement rather than picking the most favorable answer.
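The reporting checklist in this answer can be enforced with a simple record type. This is a minimal sketch: the `FitReport` class, its field names, and the example values are hypothetical, and the RMSEA interval is assumed to be computed elsewhere (a 90% interval is the common convention). Making `estimator` a required field means a summary cannot be produced without disclosing it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FitReport:
    """Fields recommended for reporting SEM fit (hypothetical structure)."""
    estimator: str                       # e.g. "ML" or "WLSMV"
    n: int                               # sample size
    chi2: float
    df: int
    rmsea: float
    rmsea_ci: Tuple[float, float]        # e.g. 90% confidence interval
    cfi: float
    tli: float
    srmr: float
    corrections: Optional[str] = None    # e.g. "robust SEs, scaled chi-square"
    justification: Optional[str] = None  # why a non-default estimator was used

    def summary(self) -> str:
        """One-line summary in a fixed order, starting with the estimator."""
        parts = [
            f"Estimator: {self.estimator}",
            f"N = {self.n}",
            f"chi2({self.df}) = {self.chi2:.2f}",
            f"RMSEA = {self.rmsea:.3f} [{self.rmsea_ci[0]:.3f}, {self.rmsea_ci[1]:.3f}]",
            f"CFI = {self.cfi:.3f}",
            f"TLI = {self.tli:.3f}",
            f"SRMR = {self.srmr:.3f}",
        ]
        if self.corrections:
            parts.append(f"Corrections: {self.corrections}")
        if self.justification:
            parts.append(f"Estimator choice: {self.justification}")
        return "; ".join(parts)


if __name__ == "__main__":
    report = FitReport("WLSMV", 412, 85.31, 48, 0.044, (0.028, 0.058), 0.971, 0.960, 0.039,
                       corrections="robust SEs, mean- and variance-adjusted chi-square")
    print(report.summary())
```

If a sensitivity analysis was run, one `FitReport` per estimator can be printed side by side, which keeps any disagreement visible.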

References

Beauducel, A., & Herzberg, P. Y. (2006). On the performance of maximum likelihood versus means and variance adjusted weighted least squares estimation in CFA. Structural Equation Modeling, 13(2), 186–203. https://doi.org/10.1207/s15328007sem1302_2
Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods & Research, 21(2), 230–258. https://doi.org/10.1177/0049124192021002005
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Shi, D., & Maydeu-Olivares, A. (2020). The effect of estimation methods on SEM fit indices. Educational and Psychological Measurement, 80(3), 421–445. https://doi.org/10.1177/0013164419885164

Xavier Jouve, Ph.D., Psychometrician

Xavier Jouve, Ph.D., is a psychometrician and quantitative psychologist specializing in cognitive ability measurement, item response theory, and test development. He is Head of Research at Cogn-IQ, where he has designed and validated seven cognitive assessment instruments — including the JCTI (inductive reasoning), JCCES (crystallized intelligence), IAW (vocabulary), JCFS (figurative sequences), JCWS (verbal reasoning), GIE (general knowledge), and WN (logical inference) — collectively normed on over 13,000 examinees. His work applies 2PL IRT modeling, computerized adaptive testing, and advanced composite scoring methods (including the modified Tellegen & Briggs Formula 4 with cubic correction) to produce research-grade cognitive measures available online. ORCID: 0009-0006-1283-045X
