Choosing the number of factors to retain in an exploratory factor analysis (EFA) is the methodological decision that most determines what the analysis reports. Retain too few and the model collapses real distinctions between constructs into a single muddled dimension; retain too many and the model dignifies measurement noise as substantive structure. The decision is unavoidable, the methods for making it are several, and decades of comparative research have produced no single approach that dominates across all conditions. Finch (2020), writing in Educational and Psychological Measurement, evaluates fit-index difference values against the field-standard parallel analysis under a wide range of simulation conditions, and the results clarify which method to reach for in which case.
The factor-retention problem
EFA models the observed item correlations as a function of latent factor loadings (and, for oblique solutions, the correlations among the factors) plus unique variance specific to each item. The number of factors is a free parameter; the analyst supplies it, and the analysis returns the loadings conditional on that choice. The job of a factor-retention method is to estimate a defensible number of factors from the data themselves rather than from the analyst’s prior expectations. The methods fall into three rough families: heuristics applied directly to the eigenvalues of the correlation matrix (Kaiser’s “eigenvalues > 1” rule, Cattell’s scree test), statistical criteria computed from the same matrix (Horn’s parallel analysis, which compares the eigenvalues with a simulated random-data baseline, and Velicer’s minimum average partial test, which tracks the partial correlations left after components are removed), and methods based on goodness-of-fit comparison across models with different numbers of factors (the fit-index family that Finch focuses on).
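In matrix notation, the common factor model is conventionally written as below. The symbols are standard textbook choices introduced here for illustration, not notation taken from Finch (2020): R is the p × p item correlation matrix, Λ the p × m loading matrix, Φ the m × m factor correlation matrix (the identity when factors are orthogonal), and Ψ the diagonal matrix of unique variances.

```latex
\mathbf{R} \;=\; \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi},
\qquad
\boldsymbol{\Lambda} \in \mathbb{R}^{p \times m},\quad
\boldsymbol{\Phi} \in \mathbb{R}^{m \times m},\quad
\boldsymbol{\Psi} = \operatorname{diag}(\psi_1, \dots, \psi_p)
```

The factor-retention question is the choice of m, the number of columns of Λ.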
The Kaiser rule has been thoroughly discredited by simulation work; it consistently overestimates the number of factors and is now treated as a baseline rather than a recommendation. Cattell’s scree test is visual and depends on subjective judgment, which makes it hard to teach and hard to automate reliably. Parallel analysis (Horn, 1965; Glorfeld, 1995) and Velicer’s (1976) minimum average partial (MAP) test are the field-standard alternatives; parallel analysis is simulation-based, whereas MAP works from partial correlations and needs no simulation. The fit-index family is newer in this application and had not been comprehensively benchmarked until recently.
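To make the over-extraction tendency concrete, here is a minimal sketch that applies the Kaiser rule to pure noise. The function name `kaiser_count` and the simulated data are illustrative assumptions, not part of any cited study. Because the eigenvalues of a correlation matrix sum to the number of items, sampling error alone pushes some of them above 1 even when no factors exist.

```python
import numpy as np


def kaiser_count(data: np.ndarray) -> int:
    """Count eigenvalues of the item correlation matrix that exceed 1."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)
    return int(np.sum(eigenvalues > 1.0))


# Hypothetical data: 200 respondents, 12 mutually unrelated items (no factors).
rng = np.random.default_rng(0)
noise = rng.standard_normal((200, 12))

n_kaiser = kaiser_count(noise)
print("Factors 'found' in pure noise:", n_kaiser)
```

Any count above zero here is a false positive, which is the behavior the simulation literature objects to.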
What parallel analysis does and where it succeeds
Parallel analysis compares the observed eigenvalues of the data correlation matrix to the eigenvalues that would be obtained from random data with the same number of items and respondents but no factor structure. The number of factors retained is the number of observed eigenvalues that exceed the corresponding random-data eigenvalues. The intuition is that a factor worth retaining must explain more variance than would be expected from chance.
Horn’s (1965) original formulation used the mean of simulated eigenvalues; Glorfeld (1995) showed that the 95th percentile is more conservative and reduces the over-extraction tendency that the mean version inherits. Modern implementations in the R psych package and in similar tools default to the Glorfeld 95th-percentile variant. Parallel analysis works well under continuous indicators, normally distributed data, and moderate-to-high factor loadings. It struggles when item distributions are categorical or skewed, when factor loadings are small, or when the underlying structure has highly correlated factors that the random-data baseline doesn’t capture.
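The procedure can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation used by the R psych package or by Finch (2020): it works on the eigenvalues of the unreduced correlation matrix, draws the random data from a standard normal distribution, and stops counting at the first observed eigenvalue that fails to exceed its reference value. The function name, parameters, and test data are hypothetical.

```python
from typing import Optional

import numpy as np


def parallel_analysis(data: np.ndarray, n_sims: int = 200,
                      percentile: Optional[float] = 95.0, seed: int = 0) -> int:
    """Count leading observed eigenvalues that exceed the random-data reference.

    percentile=95.0 gives a Glorfeld-style threshold; percentile=None uses
    the mean of the simulated eigenvalues, as in Horn's formulation.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    # Eigenvalues of correlation matrices from structure-free random data.
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        noise = rng.standard_normal((n, p))
        simulated[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    if percentile is None:
        reference = simulated.mean(axis=0)
    else:
        reference = np.percentile(simulated, percentile, axis=0)
    # Retain factors until an observed eigenvalue no longer beats its reference.
    n_factors = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break
        n_factors += 1
    return n_factors


# Hypothetical test data: two uncorrelated factors, six items each, loadings 0.7.
rng = np.random.default_rng(1)
n_obs = 500
loadings = np.zeros((12, 2))
loadings[:6, 0] = 0.7
loadings[6:, 1] = 0.7
factors = rng.standard_normal((n_obs, 2))
unique = rng.standard_normal((n_obs, 12)) * np.sqrt(1 - 0.7 ** 2)
items = factors @ loadings.T + unique

n_glorfeld = parallel_analysis(items)               # 95th-percentile reference
n_horn = parallel_analysis(items, percentile=None)  # mean reference
print(n_glorfeld, n_horn)
```

With strong loadings and continuous, normal items, both variants should agree; the two reference lines diverge mainly when a factor is weak enough that its eigenvalue sits close to the random-data baseline.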
Velicer’s (1976) MAP test takes a different approach: it computes the average squared partial correlation between items after sequentially extracting components, and retains as many components as minimize this quantity. MAP is less prone to over-extraction than parallel analysis but can under-extract in the presence of weak factors. Both methods are widely cited as defaults in introductory texts, with parallel analysis usually preferred for its simulation-based interpretability.
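A minimal sketch of the MAP computation follows, assuming the original squared-partial-correlation criterion and principal components of the correlation matrix. The function name `velicer_map` and the simulated data are hypothetical, and production implementations may differ in details such as tie handling.

```python
import numpy as np


def velicer_map(data: np.ndarray) -> int:
    """Return the number of components minimizing the average squared partial correlation."""
    corr = np.corrcoef(data, rowvar=False)
    p = corr.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Principal component loadings: eigenvectors scaled by sqrt(eigenvalue).
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    off_diag = ~np.eye(p, dtype=bool)

    # Step 0: average squared correlation before removing any component.
    avg_sq = [np.mean(corr[off_diag] ** 2)]
    for m in range(1, p - 1):
        a = loadings[:, :m]
        partial_cov = corr - a @ a.T              # covariance left after m components
        d = np.sqrt(np.diag(partial_cov))
        partial_corr = partial_cov / np.outer(d, d)
        avg_sq.append(np.mean(partial_corr[off_diag] ** 2))
    return int(np.argmin(avg_sq))


# Hypothetical test data: two uncorrelated factors, six items each, loadings 0.7.
rng = np.random.default_rng(1)
n_obs = 500
true_loadings = np.zeros((12, 2))
true_loadings[:6, 0] = 0.7
true_loadings[6:, 1] = 0.7
factors = rng.standard_normal((n_obs, 2))
unique = rng.standard_normal((n_obs, 12)) * np.sqrt(1 - 0.7 ** 2)
items = factors @ true_loadings.T + unique

n_map = velicer_map(items)
print(n_map)
```

The criterion falls while removed components absorb shared variance and rises once components start absorbing item-specific variance, which is why the minimum marks the stopping point.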
The fit-index difference approach
Confirmatory factor analysis (CFA) and structural equation modeling (SEM) routinely use fit indices — the comparative fit index (CFI), Tucker-Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR) — to evaluate whether a specified factor model fits the data adequately. Hu and Bentler (1999) supplied the cutoff conventions that the field has been arguing about ever since: CFI ≥ 0.95, TLI ≥ 0.95, RMSEA ≤ 0.06, SRMR ≤ 0.08 for adequate fit.
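As a small illustration, the cutoffs quoted above can be encoded as a lookup and applied to a set of fit indices. The function name and the example values are hypothetical; only the four thresholds come from the text.

```python
from typing import Dict

# Cutoff conventions as quoted in the text (Hu & Bentler, 1999).
# "min" means the index should be at least the value, "max" at most.
HU_BENTLER_CUTOFFS = {
    "cfi": ("min", 0.95),
    "tli": ("min", 0.95),
    "rmsea": ("max", 0.06),
    "srmr": ("max", 0.08),
}


def check_cutoffs(fit: Dict[str, float]) -> Dict[str, bool]:
    """Report, per index, whether the value meets its conventional cutoff."""
    result = {}
    for name, value in fit.items():
        direction, cutoff = HU_BENTLER_CUTOFFS[name]
        result[name] = value >= cutoff if direction == "min" else value <= cutoff
    return result


# Hypothetical fit indices for a single model.
example_fit = {"cfi": 0.96, "tli": 0.94, "rmsea": 0.05, "srmr": 0.07}
flags = check_cutoffs(example_fit)
print(flags)
```

A mixed result like this one, where one index misses its cutoff and the others pass, is common in practice and is part of why the conventions remain contested.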
The fit-index difference approach to EFA factor retention treats the EFA as a series of CFA-like models with increasing numbers of factors and computes the change in each fit index as factors are added. The number of factors retained is the smallest number where adding another factor produces a negligible improvement in fit. The intuition is similar to the scree test but applied to fit indices rather than to eigenvalues, and it is automatable rather than dependent on visual judgment.
Finch (2020) operationalized this by setting threshold differences in CFI, TLI, RMSEA, and SRMR — for instance, the model is preferred over the smaller-factor alternative if CFI improves by more than 0.01 — and tested the approach against parallel analysis across a factorial simulation that crossed sample size, factor loadings, number of factors, and item type (continuous vs categorical).
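The decision rule itself is simple once the fit indices are in hand. The sketch below assumes the models with 1, 2, 3, ... factors have already been fitted elsewhere and takes only their fit-index values; the function name, the `higher_is_better` flag, and the CFI values are hypothetical. The 0.01 threshold for CFI is the example given above.

```python
from typing import Sequence


def retain_by_fit_difference(index_by_factors: Sequence[float],
                             threshold: float = 0.01,
                             higher_is_better: bool = True) -> int:
    """Return the smallest number of factors after which adding one more
    improves the fit index by no more than `threshold`.

    index_by_factors[k] is the fit index of the model with k + 1 factors.
    """
    for k in range(len(index_by_factors) - 1):
        change = index_by_factors[k + 1] - index_by_factors[k]
        improvement = change if higher_is_better else -change
        if improvement <= threshold:
            return k + 1
    # Every added factor improved fit by more than the threshold.
    return len(index_by_factors)


# Hypothetical CFI values for models with 1 to 5 factors.
cfi_values = [0.780, 0.910, 0.962, 0.966, 0.968]
n_retained = retain_by_fit_difference(cfi_values, threshold=0.01)
print(n_retained)
```

For indices where smaller values indicate better fit (RMSEA, SRMR), pass `higher_is_better=False`. The text gives a threshold only for CFI, so thresholds for the other indices are left to the caller rather than guessed here.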
What Finch (2020) found
Two practical conclusions emerged. First, fit-index difference values outperformed parallel analysis for categorical indicators across most conditions. Categorical data violate the continuity assumptions that parallel analysis implicitly relies on, and the random-data baseline is less informative when item responses are bounded by ordinal scales with limited values. Fit-index methods, when computed with appropriate estimation for categorical data (WLSMV or robust MLR), handle this case more gracefully.
Second, fit-index difference values outperformed parallel analysis when factor loadings were low, even with normally distributed indicators. Weak factors are the case where parallel analysis’s random-data baseline is closest to the real eigenvalues, so the discrimination between signal and noise becomes unreliable. Fit-index methods, by contrast, accumulate small improvements in fit across multiple indicators and can detect weak factors that parallel analysis misses.
For the bread-and-butter case — continuous indicators with moderate-to-high factor loadings — parallel analysis remained competitive with fit-index methods, and either approach was defensible. The advantage of fit-index methods was specific to the harder cases (categorical or weak loadings), not a general superiority that warrants displacing parallel analysis as the field’s default.
Practical implications for analysts
The methodological lesson is that no single factor-retention method is universally optimal, and analysts should choose based on the characteristics of their data:
- Continuous indicators with moderate-to-high loadings: parallel analysis (Glorfeld 95th-percentile variant) is reliable and well-established.
- Categorical or ordinal indicators: fit-index difference values with appropriate estimation (WLSMV) are more reliable than parallel analysis, which assumes continuity.
- Weak factor loadings (under ~0.4): fit-index methods detect factors that parallel analysis misses; the cost is a higher false-positive rate that needs to be weighed against the substantive interpretability of the additional factor.
- All cases: report the result of multiple methods, not a single one. When parallel analysis, MAP, and fit-index methods agree, the factor-retention decision is robust; when they disagree, the disagreement is the finding, and the analyst should report the alternatives and choose on substantive grounds.
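The last recommendation can be made mechanical with a small reporting helper. This is only a sketch: the function name and the example estimates are hypothetical, and the helper deliberately does not pick a winner when methods disagree, since the text leaves that choice to substantive judgment.

```python
from typing import Dict, List, Union


def summarize_retention(estimates: Dict[str, int]) -> Dict[str, Union[bool, List[int]]]:
    """Summarize whether several retention methods agree on the number of factors."""
    candidates = sorted(set(estimates.values()))
    return {
        "agree": len(candidates) == 1,
        "candidates": candidates,  # every distinct number of factors suggested
    }


# Hypothetical results from three methods run on the same data.
estimates = {"parallel_analysis": 2, "map": 2, "cfi_difference": 3}
summary = summarize_retention(estimates)
print(summary)
```

When `agree` is false, the list of candidates is what belongs in the write-up alongside the solution ultimately chosen.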
The reproducibility-friendly practice is to specify the factor-retention method in advance, apply it mechanically, and report any post-hoc deviations explicitly. Choosing the method after seeing the result — running parallel analysis, getting two factors, running fit-index methods, getting three, reporting whichever supports the preferred narrative — is the analytic flexibility that gives EFA its bad reputation in confirmatory disciplines. The remedy is preregistration of the method, not abandonment of EFA.
Where this fits in the broader factor-analytic methodology
Factor retention is the first of several decisions that shape an EFA result. The choice of rotation criterion (oblimin, geomin, varimax) determines how the loadings are presented; choice of estimator (MLR, WLSMV, ULS) determines how robust the fit indices are to assumption violations; treatment of missing data, sample size, and item distributions all interact with the factor-retention decision in non-obvious ways. The Finch 2020 contribution sits at one corner of this multi-dimensional methodological space, and the lesson generalizes: the right factor-retention method is the one that matches the data’s properties, not the one that the analyst learned in graduate school.
The unifying theme across modern factor-analytic methodology is that automated software produces a single answer per dataset, and the user is encouraged to trust it. The right discipline is to run the analysis under multiple defensible specifications, examine where the results agree and where they diverge, and report the divergences as part of the substantive story. Factor retention is one of the easier methodological choices to subject to this multiple-method discipline; the cost of running parallel analysis, MAP, and fit-index methods on the same data is minutes, and the additional information about robustness is non-trivial.
Frequently Asked Questions
Why has Kaiser’s “eigenvalues greater than 1” rule been discredited?
It consistently overestimates the number of factors, sometimes by large margins. The rule was justified for a specific setting (principal components of a population correlation matrix with ones on the diagonal) and does not generalize to factor analysis on sample data. Modern simulation work has shown it is wrong substantially more often than it is right. It is now used only as a baseline against which better methods are compared.
Should I always run parallel analysis?
For continuous indicators with normally distributed items and moderate-to-high factor loadings, yes — it is a reliable default. For categorical or ordinal indicators, or for cases with weak loadings, parallel analysis can over- or under-extract, and fit-index difference values (Finch, 2020) are a defensible alternative.
What’s the difference between Horn’s parallel analysis and Glorfeld’s variant?
Horn (1965) compared observed eigenvalues to the mean of eigenvalues from simulated random data. Glorfeld (1995) replaced the mean with the 95th percentile, which is more conservative and reduces over-extraction. Modern implementations default to the Glorfeld variant; if the software calls it “parallel analysis”, check which percentile it uses.
How do I choose between EFA and CFA?
EFA is appropriate when the factor structure is unknown or contested; CFA is appropriate when a specific structure is hypothesized in advance. The decision is about the analyst’s prior knowledge, not about the data themselves. EFA findings should not be used to confirm a structure on the same data they were derived from; the standard workflow is to run EFA on one sample and CFA on a different one.
What if different methods give different numbers of factors?
Report the disagreement and reason about it substantively. If parallel analysis suggests two factors and fit-index methods suggest three, the third factor is likely a weak one that either parallel analysis is missing or fit-index methods are over-detecting; the substantive interpretability of the third factor is the deciding evidence. Preregistering the method and reporting the alternatives is the reproducibility-friendly practice.
References
- Finch, W. H. (2020). Using fit statistic differences to determine the optimal number of factors to retain in an exploratory factor analysis. Educational and Psychological Measurement, 80(2), 217–241. https://doi.org/10.1177/0013164419865769
- Glorfeld, L. W. (1995). An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55(3), 377–393. https://doi.org/10.1177/0013164495055003002
- Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. https://doi.org/10.1007/BF02289447
- Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
- Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321–327. https://doi.org/10.1007/BF02293557
Jouve, X. (2020, August 6). Factor Retention in Exploratory Factor Analysis. PsychoLogic. https://www.psychologic.online/factor-retention-efa/

