What are future directions?

Further research is needed to refine rotation algorithms and reduce the occurrence of local solutions in multidimensional models. Exploring alternative techniques for improving structural fit indices and testing the algorithms in diverse psychometric applications would enhance the robustness and generalizability of these methods.

Nguyen and Waller’s analysis of rotation local solutions offers a significant contribution to multidimensional IRT research. By identifying the conditions under which rotation methods succeed or fail, the study provides practical guidance for researchers and practitioners aiming to improve measurement precision and model accuracy.

Nguyen, H. V., & Waller, N. G. (2024). Rotation Local Solutions in Multidimensional Item Response Theory Models. Educational and Psychological Measurement, 84(6), 1045–1075. https://doi.org/10.1177/00131644231223722

Rotation Local Solutions in Multidimensional IRT

Published: November 10, 2024 · Last reviewed: May 7, 2026


Multidimensional item response theory (MIRT) extends unidimensional models such as the 2PL and 3PL to tests whose items load on more than one latent trait. Once a model has more than one factor, the factor solution is not unique: any rotation of the factor axes fits the data equally well, so the analyst has to choose a rotation that yields an interpretable structure. The standard tools for this, analytic criteria such as oblimin and geomin, can fail in a way that is easy to miss. They sometimes converge to a local solution: a rotation that is a local optimum of the criterion but lies far from the configuration that generated the data. Introductions to item response theory usually treat rotation as a mechanical back-end step, but in the multidimensional case the starting configuration handed to the rotation algorithm can change the substantive interpretation of every loading.

A recent Monte Carlo study by Nguyen and Waller (2024) quantifies how often this happens, what drives it, and what to do about it. Their answer (perform the rotation from many random starting configurations, not one) is unfashionable but well supported, and it changes how MIRT analyses should be reported.

Why rotation is non-unique in multidimensional models

A multidimensional factor pattern is identified only up to an arbitrary rotation. The model fit is identical whether the loadings are written in one coordinate system or in a rotated one; only the labelled meaning of “factor 1” and “factor 2” changes. Analytic rotation criteria pick the coordinate system that comes closest to simple structure: loadings concentrated on a few factors per item, with near-zero cross-loadings elsewhere. Browne (2001) gives the canonical overview: orthogonal criteria like varimax assume uncorrelated factors; oblique criteria like oblimin and geomin allow correlated factors and are the realistic default for psychological constructs.
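The indeterminacy can be checked numerically. The sketch below is a minimal illustration, assuming an orthogonal rotation of a hypothetical 4-item, 2-factor loading matrix; the loading values and the helper names `matmul`, `rotate`, and `common_part` are invented for the example. The rotated loadings differ from the originals, yet the product ΛΛᵀ, which carries what the loadings imply about associations among items, is unchanged.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rotate(loadings, theta):
    """Post-multiply a two-factor loading matrix by an orthogonal rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return matmul(loadings, [[c, -s], [s, c]])

def common_part(loadings):
    """Lambda * Lambda^T: what the loadings imply about item associations."""
    return matmul(loadings, [list(col) for col in zip(*loadings)])

# Hypothetical loadings, chosen only for illustration.
loadings = [[0.8, 0.0], [0.7, 0.1], [0.1, 0.6], [0.0, 0.9]]
rotated = rotate(loadings, math.radians(30))

max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(common_part(loadings), common_part(rotated))
    for a, b in zip(row_a, row_b)
)
print(rotated[0])  # the first item's loadings change under rotation
print(max_diff)    # numerically zero: the implied structure does not
```

Oblique rotations behave the same way once the factor correlation matrix is transformed along with the loadings; the orthogonal case is used here only because it keeps the example short.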

The criterion is a continuous function of the loading matrix. A numerical optimizer iteratively rotates the loadings to find the configuration where the criterion is minimized. The wrinkle: that function can have multiple minima. The optimizer converges to whichever minimum lies in the basin of its starting position. If the function is well-behaved and has one global minimum, every starting position gets you there. If it has several, different starts give different answers, and the optimizer has no way to know which answer corresponds to the structure that actually generated the data.
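A toy example makes the basin argument concrete. The function below is not a rotation criterion; it is a hypothetical one-dimensional curve with two minima, chosen so that plain gradient descent lands in a different minimum depending on where it starts. The names `criterion`, `gradient`, and `descend` and the step settings are assumptions for this sketch.

```python
def criterion(x):
    """Toy one-dimensional 'criterion' with two minima (not a rotation criterion)."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def gradient(x):
    """Derivative of the toy criterion."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def descend(start, rate=0.01, steps=2000):
    """Plain gradient descent: follow the slope downhill from `start`."""
    x = start
    for _ in range(steps):
        x -= rate * gradient(x)
    return x

left = descend(-2.0)   # ends in the deeper basin, at negative x
right = descend(2.0)   # ends in the shallower basin, at positive x
print(left, criterion(left))
print(right, criterion(right))
```

Both runs stop at a point where the gradient is zero, so each looks converged from the inside. Only a comparison across starts reveals that the second run sits at a higher criterion value.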

Geomin’s vulnerability to local minima

Geomin rotation has been a standard choice for multidimensional models, in part because Asparouhov and Muthén’s (2009) exploratory structural equation modeling framework made it the default in Mplus. Geomin handles cross-loadings more flexibly than the older Crawford-Ferguson family: it tolerates items that load substantively on more than one factor, which is realistic for ability tests where, say, a math word-problem subtest taps both quantitative reasoning and verbal comprehension.

The flexibility comes at a cost. Hattori, Zhang, and Preacher (2017), in Multiple Local Solutions and Geomin Rotation, demonstrated that the geomin criterion’s complexity penalty can produce multiple local minima even in clean simulated data. They showed that with realistic numbers of factors and items, geomin returned different solutions from different starting configurations a substantial fraction of the time, and that the lowest-criterion solution was not always the one closest to the data-generating model. Their recommendation — try multiple starts and inspect the resulting solutions — was a methodological warning shot that the field largely treated as an edge case.
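For reference, the geomin criterion sums, over items, the geometric mean of the squared loadings in each row, with a small constant ε added to each squared loading. A minimal sketch of that function follows. The value `eps=0.01` and both loading matrices are assumptions chosen for illustration, and software defaults for ε differ.

```python
import math

def geomin(loadings, eps=0.01):
    """Geomin complexity: sum over items of the geometric mean of (loading^2 + eps)."""
    m = len(loadings[0])  # number of factors
    return sum(math.prod(l * l + eps for l in row) ** (1.0 / m) for row in loadings)

# Hypothetical loadings: clean simple structure versus the same items
# rotated 45 degrees so that every item loads on both factors.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
c = math.cos(math.pi / 4)
mixed = [[c * (a + b), c * (b - a)] for a, b in simple]

print(geomin(simple))  # low: every row contains a near-zero loading
print(geomin(mixed))   # higher: no row contains a near-zero loading
```

Because each row contributes a product, a single near-zero loading is enough to make that row's contribution small. That is what lets geomin tolerate cross-loadings, and it is also why the surface over rotations can have several separate dips.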

What Nguyen and Waller (2024) found

Nguyen and Waller’s simulation is the most thorough audit of the problem to date. They generated 19,200 datasets across 96 model conditions in a multidimensional 2PL framework, and for each dataset they ran both oblimin and oblique geomin rotation from 200 random starting configurations, for more than 7.6 million rotations in total. The design varied slope size (λ = 0.4 vs 0.8), indicators per factor (5 vs 15), cross-loading probability (0 vs 0.05), factor correlation (0 vs 0.6), model approximation error (present vs absent), and sample size (500 vs 2,000).
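The rotation count follows directly from the figures quoted above. The arithmetic below assumes that every dataset was rotated under both criteria from all 200 starts and that datasets were spread evenly over conditions; the variable names are illustrative.

```python
datasets = 19_200
conditions = 96
criteria = 2    # oblimin and oblique geomin
starts = 200    # random starting configurations per criterion

datasets_per_condition = datasets // conditions
total_rotations = datasets * criteria * starts

print(datasets_per_condition)  # 200
print(total_rotations)         # 7680000
```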

The headline finding: both criteria converge to local solutions under realistic conditions, and geomin does so more often than oblimin. Local-solution rates were highest with small samples (N = 500), low salient loadings (λ = 0.4), few indicators per factor (5), cross-loadings present, and high inter-factor correlations (φ = 0.6). The combination of those conditions, a poorly defined model fit on a small sample, is exactly where a practitioner most needs the rotation to be reliable, and it is exactly where rotation is least reliable.

Three secondary findings deserve attention:

Model approximation error makes the problem worse. When the data were generated under a model that does not perfectly match the fitted structure (a real-data situation, not a simulation artefact), local-solution rates rose in every combination of conditions. Real psychometric data always carry some approximation error.
Different local solutions produce different trait estimates. The same response pattern, rotated differently, yields different latent-trait scores and different conditional standard errors. This is not a small-decimal disagreement — it is a substantive change in what the test claims to measure for a given respondent.
Numerical fit indices can mislead. Quantitative measures of structural simplicity sometimes ranked the local solution above the solution closest to the data-generating model. A practitioner relying on the criterion value alone has no way to detect this.

Practical implications

The straightforward implication is that single-start rotation is unsafe in MIRT. Nguyen and Waller’s recommendation is to run the rotation from many random starting configurations — they used 200 — and inspect the distribution of solutions. If they cluster tightly around one configuration, that’s evidence the global minimum is well-defined. If they fall into multiple clusters, the analyst has to choose a solution by reasoning about substantive interpretability, not by trusting the criterion to discriminate.
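Deciding whether two rotated solutions are "the same configuration" needs a similarity measure that ignores column order and sign. One common choice, used here as an assumption and not taken from Nguyen and Waller's procedure, is Tucker's congruence coefficient computed after matching columns. The sketch below uses hypothetical loading matrices and helper names (`congruence`, `best_match`).

```python
import math
from itertools import permutations

def congruence(x, y):
    """Tucker's congruence coefficient between two loading columns."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

def best_match(A, B):
    """Mean absolute congruence between A and B under the best column matching."""
    cols_a = [list(col) for col in zip(*A)]
    cols_b = [list(col) for col in zip(*B)]
    best = 0.0
    for perm in permutations(range(len(cols_b))):
        score = sum(abs(congruence(cols_a[j], cols_b[p]))
                    for j, p in enumerate(perm)) / len(cols_a)
        best = max(best, score)
    return best

# Hypothetical solutions: B is A with columns swapped and one sign flipped
# (the same configuration); C is A rotated by 45 degrees (a different one).
A = [[0.8, 0.1], [0.7, 0.0], [0.1, 0.6], [0.0, 0.9]]
B = [[-0.1, 0.8], [0.0, 0.7], [-0.6, 0.1], [-0.9, 0.0]]
c = math.cos(math.pi / 4)
C = [[c * (a + b), c * (b - a)] for a, b in A]

print(best_match(A, B))  # essentially 1: same solution, relabelled
print(best_match(A, C))  # clearly below 1: a different configuration
```

Trying every column permutation is only practical for a handful of factors, which covers most applied MIRT models.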

This sits awkwardly with how rotation is usually reported. Most MIRT papers report a single rotated loading matrix without disclosing the starting configuration, the number of starts attempted, or the criterion value at convergence. These omissions are rarely flagged in peer review. The Nguyen-Waller findings imply that, for borderline-defined models, the loading matrix in a published paper may be one of several plausible local solutions, and that another start could produce a noticeably different interpretation of the same data.

The practical workflow:

Run the rotation from a large number of random starts (Nguyen and Waller’s 200 is a defensible default; software like the R package EFAutilities implements multi-start rotation natively).
Tabulate the criterion values at convergence and the resulting loading matrices. If a small number of distinct solutions appear, examine each.
Use the criterion value as one signal among several, not as the deciding vote. A solution with a slightly higher criterion but more interpretable simple structure is often the right choice.
Report the procedure in the methods section: how many starts, how distinct solutions were ranked, and which was chosen.
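The first two steps of the workflow can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure from the paper or the API of any package: it uses an orthogonal two-factor rotation (so a single angle describes the rotation), a basic derivative-free local search in place of a gradient projection algorithm, a geomin constant of 0.01, and a hypothetical loading matrix. All function names are invented for the example.

```python
import math
import random

def geomin(loadings, eps=0.01):
    """Geomin complexity: sum over items of the geometric mean of (loading^2 + eps)."""
    m = len(loadings[0])
    return sum(math.prod(l * l + eps for l in row) ** (1.0 / m) for row in loadings)

def rotate(loadings, theta):
    """Orthogonal rotation of a two-factor loading matrix by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def local_search(f, theta, step=0.05, tol=1e-8):
    """Derivative-free descent: move downhill, halve the step when stuck."""
    value = f(theta)
    while step > tol:
        for candidate in (theta - step, theta + step):
            v = f(candidate)
            if v < value:
                theta, value = candidate, v
                break
        else:
            step /= 2.0  # neither neighbour improved: refine the step
    return theta, value

def multi_start(loadings, n_starts=200, seed=1):
    """Rotate from many random starting angles; return (theta, criterion) pairs."""
    rng = random.Random(seed)
    f = lambda t: geomin(rotate(loadings, t))
    return [local_search(f, rng.uniform(0.0, math.pi / 2)) for _ in range(n_starts)]

def tabulate(results, tol=1e-6):
    """Group converged runs whose criterion values agree within tol."""
    groups = []  # each entry: [criterion value, example theta, number of starts]
    for theta, value in sorted(results, key=lambda r: r[1]):
        if groups and abs(value - groups[-1][0]) < tol:
            groups[-1][2] += 1
        else:
            groups.append([value, theta, 1])
    return groups

# Hypothetical unrotated loadings with some cross-loadings (illustration only).
unrotated = [[0.6, 0.4], [0.7, 0.3], [0.5, 0.5], [0.6, -0.4],
             [0.5, -0.5], [0.7, -0.2], [0.4, 0.1], [0.3, -0.6]]

solutions = tabulate(multi_start(unrotated))
for value, theta, count in solutions:
    print(f"criterion={value:.4f}  angle={math.degrees(theta):7.2f}  starts={count}")
```

Each printed row is one distinct converged criterion value, with the number of starts that reached it. Grouping by criterion value alone treats two different configurations with identical values as one, so in real use the loading matrices within each group should also be compared before a solution is chosen and reported.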

For practitioners using MIRT in operational testing — particularly with smaller calibration samples typical of certification programs or research instruments — the cost of single-start rotation is a real risk of misinterpreting the latent structure. The fix is computational, cheap, and well-specified.

Where this fits in the broader rotation literature

The Nguyen-Waller results fit a pattern that runs through factor-analytic methodology: criteria that look like they should converge on a unique solution often do not, and the failure mode is silent. The same general issue surfaces in factor-retention decisions, where different fit indices disagree on the number of factors to extract, and in identification of multidimensional IRT models, where rotational invariance is the deeper structural reason rotation is non-unique to begin with.

The unifying lesson is that automated psychometric software produces a single answer per dataset, which encourages the user to treat that answer as canonical. Multidimensional rotation, factor retention, and model identification all benefit from running the analysis multiple times under varied conditions and treating disagreement as informative. Rotation local solutions are one specific instance of a methodological habit that the literature is slowly internalizing.

Frequently Asked Questions

What is a local solution in factor rotation?

A configuration of rotated loadings that gives a lower criterion value than every nearby configuration but is not the global minimum. Different starting configurations can lead the optimizer to different local solutions, each one the optimum of the criterion within its own basin.

Why does geomin produce more local solutions than oblimin?

Geomin’s complexity penalty handles cross-loadings flexibly, which is useful but creates a more rugged criterion surface. Oblimin’s geometry is simpler, so it has fewer distinct minima but is less suited to genuine cross-loading structures. Nguyen and Waller (2024) showed geomin’s local-solution rate exceeds oblimin’s across most realistic conditions in MIRT.

How many starting configurations should I use?

Nguyen and Waller used 200 in their simulation and recommend a large number for applied work. The R package EFAutilities defaults to 10 and can be raised; Mplus sets the number of random rotation starts through its RSTARTS option. For models with many factors or noisy data, more starts are better; the computational cost is small relative to the cost of a misinterpreted solution.

Can I tell which local solution is correct?

Not from the criterion alone. Numerical fit indices sometimes rank a local solution above the data-generating solution. The practical workflow is to inspect each distinct solution for interpretability against substantive theory, examine which loadings are stable across solutions, and report the procedure transparently.

Does this affect unidimensional IRT?

No. Rotation is only meaningful when there is more than one factor. Unidimensional 2PL or 3PL models have a single latent dimension, so once its origin, unit, and direction are fixed there is no rotational ambiguity. The local-solution problem arises specifically in multidimensional models with two or more latent traits.

References

Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16(3), 397–438. https://doi.org/10.1080/10705510903008204
Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36(1), 111–150. https://doi.org/10.1207/S15327906MBR3601_05
Hattori, M., Zhang, G., & Preacher, K. J. (2017). Multiple local solutions and geomin rotation. Multivariate Behavioral Research, 52(6), 720–731. https://doi.org/10.1080/00273171.2017.1361312
Nguyen, H. V., & Waller, N. G. (2024). Rotation local solutions in multidimensional item response theory models. Educational and Psychological Measurement, 84(6), 1045–1075. https://doi.org/10.1177/00131644231223722
Reckase, M. D. (2009). Multidimensional item response theory. Springer.

Xavier Jouve, Ph.D., Psychometrician

Xavier Jouve, Ph.D., is a psychometrician and quantitative psychologist specializing in cognitive ability measurement, item response theory, and test development. He is Head of Research at Cogn-IQ, where he has designed and validated seven cognitive assessment instruments — including the JCTI (inductive reasoning), JCCES (crystallized intelligence), IAW (vocabulary), JCFS (figurative sequences), JCWS (verbal reasoning), GIE (general knowledge), and WN (logical inference) — collectively normed on over 13,000 examinees. His work applies 2PL IRT modeling, computerized adaptive testing, and advanced composite scoring methods (including the modified Tellegen & Briggs Formula 4 with cubic correction) to produce research-grade cognitive measures available online. ORCID: 0009-0006-1283-045X


