What is significance?

This study gives researchers a more nuanced understanding of the statistical tools available for exploratory factor analysis (EFA). By identifying the conditions under which fit index difference values outperform parallel analysis, Finch’s findings help refine methodological choices in social science research. Better factor retention decisions lead to more accurate interpretations of data and, in turn, to more valid findings.

What are future directions?

Further research could expand on Finch’s work by exploring how fit index difference values perform across more diverse datasets and varying levels of factor complexity. Additionally, developing guidelines for when to prioritize this approach over parallel analysis could improve its practical application in research settings.

Finch’s study offers valuable contributions to the ongoing discussion about factor retention in exploratory factor analysis. By demonstrating the strengths of fit index difference values under specific conditions, the research supports more informed decision-making in statistical analyses. This work underscores the importance of tailoring methodological choices to the unique characteristics of each dataset.

Finch, W. H. (2020). Using Fit Statistic Differences to Determine the Optimal Number of Factors to Retain in an Exploratory Factor Analysis. Educational and Psychological Measurement, 80(2), 217-241. https://doi.org/10.1177/0013164419865769

Factor Retention in Exploratory Factor Analysis

Published: August 6, 2020 · Last reviewed: May 7, 2026


Choosing the number of factors to retain in an exploratory factor analysis is the methodological decision that most determines what the analysis reports. Retain too few and the model collapses real distinctions between constructs into a single muddled dimension; retain too many and the model dignifies measurement noise as substantive structure. The decision is unavoidable, the methods for making it are several, and decades of comparative research have produced no single approach that dominates across all conditions. Finch (2020), in Educational and Psychological Measurement, evaluates fit-index difference values against the field-standard parallel analysis under a wide range of simulation conditions and reports a result that sharpens which method to reach for in which case.

The factor-retention problem

EFA represents observed item correlations as the product of latent factor loadings plus residual variance. The number of factors is a free parameter; the analyst supplies it, and the analysis returns the loadings conditional on that choice. The job of a factor-retention method is to estimate a defensible number of factors from the data themselves rather than from the analyst’s prior expectations. The methods fall into three rough families: rules applied directly to the eigenvalues of the correlation matrix (Kaiser’s “eigenvalues > 1” rule, Cattell’s scree test), methods that test the observed structure against a statistical reference (Horn’s parallel analysis, which simulates random data, and Velicer’s minimum average partial test, which works from partial correlations), and methods based on goodness-of-fit comparison across models with different numbers of factors (the fit-index family that Finch focuses on).
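As a minimal sketch of the decomposition described above, the snippet below builds the correlation matrix implied by a two-factor loading matrix, assuming standardized items and uncorrelated factors. The loading values and variable names are illustrative assumptions and are not taken from Finch (2020).

```python
import numpy as np

# Hypothetical loadings: 6 items, 2 uncorrelated factors (illustrative values).
loadings = np.array([
    [0.7, 0.0],
    [0.7, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.0, 0.6],
    [0.0, 0.6],
])

# Common variance reproduced by the factors.
common = loadings @ loadings.T

# Residual (unique) variance: what is left on the diagonal so that
# each standardized item has total variance 1.
uniqueness = 1.0 - np.diag(common)

# Model-implied correlation matrix: loadings product plus residual variance.
implied_corr = common + np.diag(uniqueness)

print(np.round(implied_corr, 2))
```

Items that load on the same factor correlate at the product of their loadings, and items on different factors are uncorrelated under these assumptions. Changing the number of columns in `loadings` is the "number of factors" decision that the retention methods try to inform.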

The Kaiser rule has been thoroughly discredited by simulation work; it consistently overestimates the number of factors and is now treated as a baseline rather than a recommendation. Cattell’s scree test is visual and depends on subjective judgment, which makes it hard to teach and impossible to automate. Parallel analysis (Horn, 1965; Glorfeld, 1995) and Velicer’s (1976) minimum average partial (MAP) test are the field-standard methods; parallel analysis is simulation-based, whereas MAP works from partial correlations and needs no simulation. The fit-index family is newer in this application and has not been comprehensively benchmarked until recently.
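The over-extraction tendency of the Kaiser rule can be illustrated with a small sketch that applies it to pure noise. The sample size, item count, and seed are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure noise: 200 respondents, 12 items, no factor structure at all.
noise = rng.standard_normal((200, 12))

# Eigenvalues of the sample correlation matrix.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))

# Kaiser's rule retains every eigenvalue above 1.
kaiser_count = int(np.sum(eigenvalues > 1.0))

print(kaiser_count)
```

Because the eigenvalues of a correlation matrix sum to the number of items, sampling error alone pushes some of them above 1, so the rule reports "factors" even when the data contain none.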

What parallel analysis does and where it succeeds

Parallel analysis compares the observed eigenvalues of the data correlation matrix to the eigenvalues that would be obtained from random data with the same number of items and respondents but no factor structure. The number of factors retained is the number of observed eigenvalues that exceed the corresponding random-data eigenvalues. The intuition is that a factor worth retaining must explain more variance than would be expected from chance.

Horn’s (1965) original formulation used the mean of simulated eigenvalues; Glorfeld (1995) showed that the 95th percentile is more conservative and reduces the over-extraction tendency that the mean version inherits. Modern implementations in the R psych package and in similar tools default to the Glorfeld 95th-percentile variant. Parallel analysis works well under continuous indicators, normally distributed data, and moderate-to-high factor loadings. It struggles when item distributions are categorical or skewed, when factor loadings are small, or when the underlying structure has highly correlated factors that the random-data baseline doesn’t capture.
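The comparison described above can be sketched in a few lines of NumPy. This is an illustrative implementation under simple assumptions (normal random data, eigenvalues of the full correlation matrix), not the code of any particular package; the function name `parallel_analysis` and its parameters are hypothetical.

```python
import numpy as np


def parallel_analysis(data, n_sims=200, percentile=95.0, seed=0):
    """Return the number of factors suggested by parallel analysis.

    percentile=95 gives a Glorfeld-style threshold; pass percentile=None
    to use the mean of the simulated eigenvalues, as in Horn's version.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape

    # Observed eigenvalues, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

    # Eigenvalues from random normal data of the same shape.
    simulated = np.empty((n_sims, p))
    for s in range(n_sims):
        random_data = rng.standard_normal((n, p))
        random_corr = np.corrcoef(random_data, rowvar=False)
        simulated[s] = np.linalg.eigvalsh(random_corr)[::-1]

    if percentile is None:
        threshold = simulated.mean(axis=0)
    else:
        threshold = np.percentile(simulated, percentile, axis=0)

    # Count leading eigenvalues that beat the threshold; stop at the first miss.
    n_factors = 0
    for obs, ref in zip(observed, threshold):
        if obs <= ref:
            break
        n_factors += 1
    return n_factors


# Illustrative data: 500 respondents, 12 items, 2 factors, loadings of 0.7.
rng = np.random.default_rng(1)
loadings = np.zeros((12, 2))
loadings[:6, 0] = 0.7
loadings[6:, 1] = 0.7
factors = rng.standard_normal((500, 2))
noise = rng.standard_normal((500, 12)) * np.sqrt(1 - 0.7 ** 2)
items = factors @ loadings.T + noise

n_retained = parallel_analysis(items)
print(n_retained)
```

The simulated example sits squarely in the favorable case (continuous, normal, strong loadings). Lowering the loadings or discretizing `items` into a few ordinal categories is one way to explore the conditions under which the method is said to struggle.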

Velicer’s (1976) MAP test takes a different approach: it computes the average squared partial correlation between items after sequentially extracting components, and retains as many components as minimize this quantity. MAP is less prone to over-extraction than parallel analysis but can under-extract in the presence of weak factors. Both methods are widely cited as defaults in introductory texts, with parallel analysis usually preferred for its simulation-based interpretability.
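A rough sketch of the MAP procedure follows, assuming the original squared-partial-correlation criterion and principal components extracted from the correlation matrix. The function name `velicer_map` and the simulated data are illustrative assumptions.

```python
import numpy as np


def velicer_map(corr):
    """Return the number of components that minimizes the average
    squared partial correlation among the items."""
    p = corr.shape[0]
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    # Principal-component loadings, largest component first.
    scale = np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    loadings = eigenvectors[:, order] * scale

    off_diagonal = ~np.eye(p, dtype=bool)
    avg_sq = []
    for m in range(p - 1):  # m = 0 means no components removed
        # Covariance left over after partialling out the first m components.
        partial_cov = corr - loadings[:, :m] @ loadings[:, :m].T
        d = 1.0 / np.sqrt(np.diag(partial_cov))
        partial_corr = partial_cov * np.outer(d, d)
        avg_sq.append(np.mean(partial_corr[off_diagonal] ** 2))
    return int(np.argmin(avg_sq))


# Illustrative data: 500 respondents, 12 items, 2 factors, loadings of 0.7.
rng = np.random.default_rng(1)
loadings_true = np.zeros((12, 2))
loadings_true[:6, 0] = 0.7
loadings_true[6:, 1] = 0.7
scores = rng.standard_normal((500, 2))
unique = rng.standard_normal((500, 12)) * np.sqrt(1 - 0.7 ** 2)
items = scores @ loadings_true.T + unique

n_components = velicer_map(np.corrcoef(items, rowvar=False))
print(n_components)
```

The loop stops at `p - 2` components because removing more leaves too little residual variance for the partial correlations to be meaningful.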

The fit-index difference approach

Confirmatory factor analysis (CFA) and structural equation modeling (SEM) routinely use fit indices — the comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR) — to evaluate whether a specified factor model fits the data adequately. Hu and Bentler (1999) supplied the cutoff conventions that the field has been arguing about ever since: CFI ≥ 0.95, TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08 for adequate fit.

The fit-index difference approach to EFA factor retention treats the EFA as a series of CFA-like models with increasing numbers of factors and computes the change in each fit index as factors are added. The number of factors retained is the smallest number where adding another factor produces a negligible improvement in fit. The intuition is similar to the scree test but applied to fit indices rather than to eigenvalues, and it is automatable rather than dependent on visual judgment.

Finch (2020) operationalized this by setting threshold differences in CFI, TLI, RMSEA, and SRMR (for instance, the larger model is preferred over the smaller-factor alternative if CFI improves by more than 0.01) and tested the approach against parallel analysis across a factorial simulation that crossed sample size, factor loadings, number of factors, and item type (continuous vs. categorical).
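The decision rule itself is simple enough to sketch. The function below is a hypothetical helper, not Finch’s code: it assumes the fit indices have already been estimated for each candidate number of factors in SEM software, and the CFI and RMSEA values shown are made up for illustration. Only the 0.01 CFI threshold comes from the example above; thresholds for other indices would need to be set by the analyst.

```python
def retain_by_fit_difference(fit_by_k, threshold=0.01, higher_is_better=True):
    """Pick the smallest number of factors k such that moving to k + 1
    improves the fit index by no more than `threshold`.

    fit_by_k maps number of factors -> fit index value (e.g. CFI).
    Set higher_is_better=False for indices such as RMSEA or SRMR.
    """
    ks = sorted(fit_by_k)
    for k, k_next in zip(ks, ks[1:]):
        change = fit_by_k[k_next] - fit_by_k[k]
        improvement = change if higher_is_better else -change
        if improvement <= threshold:
            return k
    # Every added factor improved fit by more than the threshold.
    return ks[-1]


# Made-up CFI values for 1- to 4-factor solutions (illustration only).
cfi = {1: 0.81, 2: 0.93, 3: 0.985, 4: 0.988}
chosen = retain_by_fit_difference(cfi, threshold=0.01)
print(chosen)

# Made-up RMSEA values, where lower is better.
rmsea = {1: 0.12, 2: 0.05, 3: 0.048}
chosen_rmsea = retain_by_fit_difference(rmsea, threshold=0.01, higher_is_better=False)
print(chosen_rmsea)
```

In this sketch the two indices disagree (three factors by CFI, two by RMSEA), which is the kind of divergence an analyst would need to report rather than resolve silently.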

What Finch (2020) found

Two practical conclusions emerged. First, fit-index difference values outperformed parallel analysis for categorical indicators across most conditions. Categorical data violate the continuity assumptions that parallel analysis implicitly relies on, and the random-data baseline is less informative when item responses are restricted to a handful of ordinal categories. Fit-index methods, when computed with appropriate estimation for categorical data (WLSMV or robust maximum likelihood, MLR), handle this case more gracefully.

Second, fit-index difference values outperformed parallel analysis when factor loadings were low, even with normally distributed indicators. Weak factors are the case where parallel analysis’s random-data baseline is closest to the real eigenvalues, so the discrimination between signal and noise becomes unreliable. Fit-index methods, by contrast, accumulate small improvements in fit across multiple indicators and can detect weak factors that parallel analysis misses.

For the bread-and-butter case — continuous indicators with moderate-to-high factor loadings — parallel analysis remained competitive with fit-index methods, and either approach was defensible. The advantage of fit-index methods was specific to the harder cases (categorical or weak loadings), not a general superiority that warrants displacing parallel analysis as the field’s default.

Practical implications for analysts

The methodological lesson is that no single factor-retention method is universally optimal, and analysts should choose based on the characteristics of their data:

Continuous indicators with moderate-to-high loadings: parallel analysis (Glorfeld 95th-percentile variant) is reliable and well-established.
Categorical or ordinal indicators: fit-index difference values with appropriate estimation (WLSMV) are more reliable than parallel analysis, which assumes continuity.
Weak factor loadings (under ~0.4): fit-index methods detect factors that parallel analysis misses; the cost is a higher false-positive rate that needs to be weighed against the substantive interpretability of the additional factor.
All cases: report the result of multiple methods, not a single one. When parallel analysis, MAP, and fit-index methods agree, the factor-retention decision is robust; when they disagree, the disagreement is the finding, and the analyst should report the alternatives and choose on substantive grounds.
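The multiple-method recommendation in the last item can be made mechanical with a small reporting helper. This is a sketch only; the function name, the method labels, and the factor counts are hypothetical.

```python
from collections import Counter


def summarize_retention(suggestions):
    """Summarize factor counts suggested by several retention methods.

    suggestions maps a method label -> suggested number of factors.
    """
    counts = Counter(suggestions.values())
    return {
        "agree": len(counts) == 1,
        "candidates": sorted(counts),
        "by_method": dict(suggestions),
    }


# Hypothetical results from three methods run on the same data.
report = summarize_retention(
    {"parallel_analysis": 2, "map": 2, "fit_index_difference": 3}
)
print(report)
```

When `agree` is false, the `candidates` list is what would go into the write-up alongside the substantive argument for the solution finally chosen.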

The reproducibility-friendly practice is to specify the factor-retention method in advance, apply it mechanically, and report any post-hoc deviations explicitly. Choosing the method after seeing the result — running parallel analysis, getting two factors, running fit-index methods, getting three, reporting whichever supports the preferred narrative — is the analytic flexibility that gives EFA its bad reputation in confirmatory disciplines. The remedy is preregistration of the method, not abandonment of EFA.

Where this fits in the broader factor-analytic methodology

Factor retention is the first of several decisions that shape an EFA result. The choice of rotation criterion (oblimin, geomin, varimax) determines how the loadings are presented; choice of estimator (MLR, WLSMV, ULS) determines how robust the fit indices are to assumption violations; treatment of missing data, sample size, and item distributions all interact with the factor-retention decision in non-obvious ways. The Finch 2020 contribution sits at one corner of this multi-dimensional methodological space, and the lesson generalizes: the right factor-retention method is the one that matches the data’s properties, not the one that the analyst learned in graduate school.

The unifying theme across modern factor-analytic methodology is that automated software produces a single answer per dataset, and the user is encouraged to trust it. The right discipline is to run the analysis under multiple defensible specifications, examine where the results agree and where they diverge, and report the divergences as part of the substantive story. Factor retention is one of the easier methodological choices to subject to this multiple-method discipline; the cost of running parallel analysis, MAP, and fit-index methods on the same data is minutes, and the additional information about robustness is non-trivial.

Frequently Asked Questions

Why has Kaiser’s “eigenvalues greater than 1” rule been discredited?

It consistently overestimates the number of factors, sometimes by large margins. The rule was developed for a specific application (principal components on a particular type of correlation matrix) and does not generalize to factor analysis on real data. Modern simulation work has shown it is wrong substantially more often than it is right. It is now used only as a baseline against which better methods are compared.

Should I always run parallel analysis?

For continuous indicators with normally distributed items and moderate-to-high factor loadings, yes — it is a reliable default. For categorical or ordinal indicators, or for cases with weak loadings, parallel analysis can over- or under-extract, and fit-index difference values (Finch, 2020) are a defensible alternative.

What’s the difference between Horn’s parallel analysis and Glorfeld’s variant?

Horn (1965) compared observed eigenvalues to the mean of eigenvalues from simulated random data. Glorfeld (1995) replaced the mean with the 95th percentile, which is more conservative and reduces over-extraction. Modern implementations default to the Glorfeld variant; if the software calls it “parallel analysis”, check which percentile it uses.

How do I choose between EFA and CFA?

EFA is appropriate when the factor structure is unknown or contested; CFA is appropriate when a specific structure is hypothesized in advance. The decision is about the analyst’s prior knowledge, not about the data themselves. A structure found by EFA should not be confirmed on the same data it was derived from; the standard workflow is to run EFA on one sample and CFA on a different one.

What if different methods give different numbers of factors?

Report the disagreement and reason about it substantively. If parallel analysis suggests two factors and fit-index methods suggest three, the third factor is likely a weak one that parallel analysis is missing or that fit-index methods are over-detecting; the substantive interpretability of the third factor is the deciding evidence. Preregistering the method and reporting the alternatives is the reproducibility-friendly practice.

References

Finch, W. H. (2020). Using fit statistic differences to determine the optimal number of factors to retain in an exploratory factor analysis. Educational and Psychological Measurement, 80(2), 217–241. https://doi.org/10.1177/0013164419865769
Glorfeld, L. W. (1995). An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55(3), 377–393. https://doi.org/10.1177/0013164495055003002
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321–327. https://doi.org/10.1007/BF02293557

Xavier Jouve, Ph.D., Psychometrician

Xavier Jouve, Ph.D., is a psychometrician and quantitative psychologist specializing in cognitive ability measurement, item response theory, and test development. He is Head of Research at Cogn-IQ, where he has designed and validated seven cognitive assessment instruments — including the JCTI (inductive reasoning), JCCES (crystallized intelligence), IAW (vocabulary), JCFS (figurative sequences), JCWS (verbal reasoning), GIE (general knowledge), and WN (logical inference) — collectively normed on over 13,000 examinees. His work applies 2PL IRT modeling, computerized adaptive testing, and advanced composite scoring methods (including the modified Tellegen & Briggs Formula 4 with cubic correction) to produce research-grade cognitive measures available online. ORCID: 0009-0006-1283-045X

