
Rotation Local Solutions in Multidimensional IRT

Rotation Local Solutions in Multidimensional Item Response Models
Published: November 10, 2024

Multidimensional item response theory (MIRT) extends one-dimensional models like the 2PL or 3PL to test items that load on more than one latent trait. Once a model has more than one factor, the factor solution is not unique: any rotation of the factor axes produces an equivalent fit, so an analyst has to choose a rotation that yields an interpretable structure. The standard tools for this, analytic criteria such as oblimin and geomin, can fail in a way that is easy to miss. They sometimes converge to a local solution: a rotation at which the criterion is optimal only relative to nearby rotations, and which can lie far from the configuration that generated the data. Introductions to item response theory usually treat rotation as a mechanical back-end step, but in the multidimensional case the choice of starting configuration can change the substantive interpretation of every loading.

The recent comprehensive Monte Carlo study by Nguyen and Waller (2024) quantifies how often this happens, what drives it, and what to do about it. Their answer — perform the rotation from many random starting configurations, not one — is unfashionable but well-supported, and it changes how MIRT analyses should be reported.

Why rotation is non-unique in multidimensional models

A multidimensional factor pattern is identified only up to an arbitrary rotation. The model fit is identical whether the loadings are expressed in one coordinate system or in any rotated version of it; only the meaning attached to the labels “factor 1” and “factor 2” changes. Analytic rotation criteria pick the coordinate system that best satisfies some definition of simple structure: loadings concentrated on a few factors per item, with near-zero cross-loadings elsewhere. Browne (2001) gives the canonical overview: orthogonal criteria like varimax assume uncorrelated factors; oblique criteria like oblimin and geomin allow correlated factors and are the realistic default for psychological constructs.
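The rotational indeterminacy can be checked numerically for a single item. The sketch below assumes a compensatory two-trait M2PL with uncorrelated standard-normal traits (whose joint distribution is itself unchanged by an orthogonal rotation) and uses hypothetical item and respondent values. Rotating the slope vector and the trait vector by the same orthogonal rotation changes the slopes but leaves the response probability untouched.

```python
import math


def rotate(vec, angle):
    """Rotate a 2-vector counter-clockwise by `angle` radians (an orthogonal rotation)."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


def m2pl_prob(slopes, intercept, theta):
    """Compensatory M2PL: P(X = 1 | theta) = 1 / (1 + exp(-(a . theta + d)))."""
    z = sum(a_k * t_k for a_k, t_k in zip(slopes, theta)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical item and respondent (illustrative values only).
a = (1.2, 0.3)        # slopes on trait 1 and trait 2
d = -0.5              # intercept
theta = (0.8, -0.4)   # respondent's latent-trait vector

# Change the coordinate system: apply the same orthogonal rotation to the
# slope vector and to the trait vector.
angle = math.radians(35)
a_rot = rotate(a, angle)
theta_rot = rotate(theta, angle)

p_before = m2pl_prob(a, d, theta)
p_after = m2pl_prob(a_rot, d, theta_rot)

print(f"slopes before rotation: ({a[0]:.3f}, {a[1]:.3f})")
print(f"slopes after rotation:  ({a_rot[0]:.3f}, {a_rot[1]:.3f})")
print(f"P(correct) before: {p_before:.6f}  after: {p_after:.6f}")
```

The item moves from a near-simple-structure pattern to one with substantial slopes on both traits, yet the predicted probability, and therefore the likelihood, is the same. Oblique rotations generalize this with a non-orthogonal transformation and a compensating change in the factor correlations.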

The criterion is a continuous function of the loading matrix. A numerical optimizer iteratively rotates the loadings to find the configuration where the criterion is minimized. The wrinkle: that function can have multiple minima. The optimizer converges to whichever minimum lies in the basin of its starting position. If the function is well-behaved and has one global minimum, every starting position gets you there. If it has several, different starts give different answers, and the optimizer has no way to know which answer corresponds to the structure that actually generated the data.
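A toy illustration of that basin behaviour, using a made-up one-parameter function as a stand-in for a rotation criterion (it is not geomin or oblimin) and plain gradient descent as the optimizer. The function has a deep minimum and a shallow one; where the optimizer ends up depends only on where it starts.

```python
def criterion(x):
    """Toy one-parameter 'criterion' with two minima of different depth.

    A stand-in for a rotation criterion, not an actual rotation criterion.
    """
    return x**4 - 3 * x**2 + x


def derivative(x):
    """Exact derivative of the toy criterion."""
    return 4 * x**3 - 6 * x + 1


def descend(x, step=0.01, tol=1e-10, max_iter=100_000):
    """Plain fixed-step gradient descent from the starting point x."""
    for _ in range(max_iter):
        g = derivative(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


for start in (-2.0, 0.0, 0.5, 2.0):
    x_min = descend(start)
    print(f"start {start:+.1f} -> x = {x_min:+.4f}, criterion = {criterion(x_min):.4f}")
```

Starts at -2.0 and 0.0 end at the deeper minimum near x ≈ -1.30; starts at 0.5 and 2.0 stop at the shallower one near x ≈ 1.13. Each run reports a converged answer, and nothing in a single run signals that a lower minimum exists elsewhere.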

Geomin’s vulnerability to local minima

Geomin rotation has been a standard choice for multidimensional models, in part because Asparouhov and Muthén’s (2009) exploratory structural equation modeling framework made it the default in Mplus. Geomin handles cross-loadings more flexibly than the older Crawford-Ferguson family: it tolerates items that load substantively on more than one factor, which is realistic for ability tests where, say, a math word-problem subtest taps both quantitative reasoning and verbal comprehension.
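For reference, the geomin criterion in the form given by Browne (2001), for a p × m rotated loading matrix Λ with elements λ_ij and a small positive constant ε (software defaults differ; the GPArotation R package, for example, uses 0.01):

```latex
f_{\text{geomin}}(\Lambda) = \sum_{i=1}^{p} \left[ \prod_{j=1}^{m} \left( \lambda_{ij}^{2} + \varepsilon \right) \right]^{1/m}
```

Each item contributes the geometric mean of its squared loadings (offset by ε), so an item's term becomes small whenever any one of its loadings approaches zero, regardless of how large its other loadings are. That is why geomin tolerates cross-loadings, and it also means several different rotations can each make most items' terms small, one intuition for why the criterion surface can have more than one minimum.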

The flexibility comes at a cost. Hattori, Zhang, and Preacher (2017), in Multiple Local Solutions and Geomin Rotation, demonstrated that the geomin criterion’s complexity penalty can produce multiple local minima even in clean simulated data. They showed that with realistic numbers of factors and items, geomin returned different solutions from different starting configurations a substantial fraction of the time, and that the lowest-criterion solution was not always the one closest to the data-generating model. Their recommendation — try multiple starts and inspect the resulting solutions — was a methodological warning shot that the field largely treated as an edge case.

What Nguyen and Waller (2024) found

Nguyen and Waller’s simulation is the most thorough audit of the problem to date. They generated 19,200 datasets across 96 model conditions in a multidimensional 2PL framework, and rotated each dataset with both oblimin and oblique geomin from 200 random starting configurations, for a total of 7.68 million rotations (19,200 × 200 × 2). The design varied slope size (λ = 0.4 vs 0.8), indicators per factor (5 vs 15), cross-loading probability (0 vs 0.05), factor correlation (0 vs 0.6), model approximation error (present vs absent), and sample size (500 vs 2,000).

The headline finding: both criteria converge to local solutions under realistic conditions, and geomin does so more often than oblimin. Local-solution rates were highest with small samples (N = 500), low salient loadings (λ = 0.4), few indicators per factor (5), present cross-loadings, and high inter-factor correlations (φ = 0.6). That combination of conditions, a weakly defined model fitted to a small sample, is exactly where a practitioner most needs the rotation to be reliable, and exactly where it is least reliable.

Three secondary findings deserve attention:

  • Model approximation error makes the problem worse. When the data are generated under a model that does not perfectly match the fitted structure (a real-data situation, not a simulation artefact), local-solution rates increased across every condition combination. Real psychometric data have approximation error by default.
  • Different local solutions produce different trait estimates. The same response pattern, scored under two differently rotated solutions, yields different latent-trait scores and different conditional standard errors. The discrepancy is not confined to the third decimal place: it changes what the test claims to measure for a given respondent.
  • Numerical fit indices can mislead. Quantitative measures of structural simplicity sometimes ranked the local solution above the solution closest to the data-generating model. A practitioner relying on the criterion value alone has no way to detect this.

Practical implications

The straightforward implication is that single-start rotation is unsafe in MIRT. Nguyen and Waller’s recommendation is to run the rotation from many random starting configurations — they used 200 — and inspect the distribution of solutions. If they cluster tightly around one configuration, that’s evidence the global minimum is well-defined. If they fall into multiple clusters, the analyst has to choose a solution by reasoning about substantive interpretability, not by trusting the criterion to discriminate.

This sits awkwardly with how rotation is usually reported. Most MIRT papers report a single rotated loading matrix without disclosing the starting configuration, the number of starts attempted, or the criterion value at convergence. None of those omissions are usually flagged in peer review. The Nguyen-Waller findings imply that, for borderline-defined models, the loading matrix in a published paper may be one of several plausible local solutions — and that another start could produce a noticeably different interpretation of the same data.

The practical workflow:

  • Run the rotation from a large number of random starts (Nguyen and Waller’s 200 is a defensible default; software like the R package EFAutilities implements multi-start rotation natively).
  • Tabulate the criterion values at convergence and the resulting loading matrices. If a small number of distinct solutions appear, examine each.
  • Use the criterion value as one signal among several, not as the deciding vote. A solution with a slightly higher criterion value but a more interpretable simple structure is often the right choice.
  • Report the procedure in the methods section: how many starts, how distinct solutions were ranked, and which was chosen.
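A minimal sketch of this workflow for the simplest case that can be written without a rotation library: two factors, orthogonal geomin, and a hypothetical unrotated loading matrix. With two orthogonal factors every rotation is a single angle, so random starting configurations become random starting angles, and rotations 90 degrees apart differ only by reordering or reflecting the factors. The loading values, ε = 0.01, the backtracking descent, and the clustering tolerance are all assumptions for illustration; this is not the EFAutilities routine or the procedure Nguyen and Waller used for oblique rotations with more factors.

```python
import math
import random

EPS = 0.01  # geomin epsilon (assumed value; software defaults differ)

# Hypothetical unrotated p x 2 loading matrix (illustrative values only).
LOADINGS = [
    (0.62, 0.38),
    (0.58, 0.41),
    (0.66, 0.35),
    (0.55, -0.40),
    (0.60, -0.36),
    (0.48, 0.05),
]


def rotate_loadings(loadings, angle):
    """Orthogonal rotation L* = L T, with T = [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(angle), math.sin(angle)
    return [(l1 * c + l2 * s, -l1 * s + l2 * c) for l1, l2 in loadings]


def geomin(loadings, eps=EPS):
    """Sum over items of the geometric mean of (loading**2 + eps)."""
    m = len(loadings[0])
    total = 0.0
    for row in loadings:
        product = 1.0
        for value in row:
            product *= value * value + eps
        total += product ** (1.0 / m)
    return total


def criterion_at(angle, loadings):
    return geomin(rotate_loadings(loadings, angle))


def local_descent(angle, loadings, step=0.1, h=1e-6, min_step=1e-10, max_iter=2_000):
    """Backtracking gradient descent on the rotation angle from one start.

    Returns the converged angle reduced modulo 90 degrees (rotations 90 degrees
    apart only reorder/reflect the two factors) and the criterion value there.
    """
    value = criterion_at(angle, loadings)
    for _ in range(max_iter):
        if step < min_step:
            break
        grad = (criterion_at(angle + h, loadings) - criterion_at(angle - h, loadings)) / (2 * h)
        candidate = angle - step * grad
        candidate_value = criterion_at(candidate, loadings)
        if candidate_value < value:
            angle, value = candidate, candidate_value
        else:
            step /= 2  # overshoot or no progress: shrink the step
    return angle % (math.pi / 2), value


def cluster_solutions(results, tol=1e-3):
    """Group converged angles that agree (modulo 90 degrees) within tol radians."""
    period = math.pi / 2
    clusters = []  # each entry: [angle, criterion value, number of starts]
    for angle, value in results:
        for cluster in clusters:
            gap = abs(angle - cluster[0]) % period
            if min(gap, period - gap) < tol:
                cluster[2] += 1
                break
        else:
            clusters.append([angle, value, 1])
    return sorted(clusters, key=lambda c: c[1])  # lowest criterion first


rng = random.Random(2024)  # fixed seed for a reproducible run
n_starts = 200
results = [local_descent(rng.uniform(0.0, math.pi / 2), LOADINGS) for _ in range(n_starts)]
solutions = cluster_solutions(results)

for rank, (angle, value, count) in enumerate(solutions, start=1):
    print(f"solution {rank}: angle {math.degrees(angle):6.2f} deg, "
          f"geomin {value:.5f}, reached from {count}/{n_starts} starts")

best_angle = solutions[0][0]
for row in rotate_loadings(LOADINGS, best_angle):
    print(f"{row[0]:6.3f} {row[1]:6.3f}")
```

How many distinct solutions the table shows depends on the loading matrix; a clean structure may produce a single cluster. The point of the sketch is the bookkeeping: every start is kept, converged solutions are grouped, and each group's criterion value and frequency are reported instead of only the first answer the optimizer returns.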

For practitioners using MIRT in operational testing — particularly with smaller calibration samples typical of certification programs or research instruments — the cost of single-start rotation is a real risk of misinterpreting the latent structure. The fix is computational, cheap, and well-specified.

Where this fits in the broader rotation literature

The Nguyen-Waller results fit a pattern that runs through factor-analytic methodology: criteria that look like they should converge on a unique solution often do not, and the failure mode is silent. The same general issue surfaces in factor-retention decisions, where different fit indices disagree on the number of factors to extract, and in identification of multidimensional IRT models, where rotational invariance is the deeper structural reason rotation is non-unique to begin with.

The unifying lesson is that automated psychometric software produces a single answer per dataset, which encourages the user to treat that answer as canonical. Multidimensional rotation, factor retention, and model identification all benefit from running the analysis multiple times under varied conditions and treating disagreement as informative. Rotation local solutions are one specific instance of a methodological habit that the literature is slowly internalizing.

Frequently Asked Questions

What is a local solution in factor rotation?

A configuration of rotated loadings at which the rotation criterion is lower than at any nearby rotation, but which is not the global minimum. Different starting configurations can lead the optimizer to different local solutions, each of which is the criterion’s optimum within its own basin of attraction.

Why does geomin produce more local solutions than oblimin?

Geomin’s complexity penalty handles cross-loadings flexibly, which is useful but creates a more rugged criterion surface. Oblimin’s geometry is simpler, so it has fewer distinct minima but is less suited to genuine cross-loading structures. Nguyen and Waller (2024) showed geomin’s local-solution rate exceeds oblimin’s across most realistic conditions in MIRT.
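For reference, the oblimin family can be written, up to a constant factor, as a sum over pairs of factors j and k for a p-item loading matrix Λ, with γ = 0 giving quartimin:

```latex
f_{\text{oblimin}}(\Lambda) = \sum_{j<k} \left[ \sum_{i=1}^{p} \lambda_{ij}^{2} \lambda_{ik}^{2} - \frac{\gamma}{p} \left( \sum_{i=1}^{p} \lambda_{ij}^{2} \right) \left( \sum_{i=1}^{p} \lambda_{ik}^{2} \right) \right]
```

Oblimin is a polynomial in the squared loadings that penalizes each item's products of squared loadings across pairs of factors. Geomin instead multiplies an item's squared loadings (each offset by a small constant) across all factors and takes the m-th root, so a large cross-loading costs little as long as the item has a near-zero loading somewhere else.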

How many starting configurations should I use?

Nguyen and Waller used 200 in their simulation and recommend a large number for applied work. The R package EFAutilities defaults to 10 and can be raised; Mplus offers STARTS= options. For models with many factors or noisy data, more starts are better; the computational cost is small relative to the cost of a misinterpreted solution.
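One way to reason about the number of starts, under a simplifying assumption: random starts are independent, and a given solution's basin attracts a fixed share p of them. The chance that n starts all miss that basin is then (1 - p)^n. The share p is unknown in practice; the 5% used below is hypothetical, and the function names are illustrative.

```python
import math


def miss_probability(basin_share, n_starts):
    """Probability that none of n independent random starts lands in a basin
    that attracts a fraction `basin_share` of starting configurations."""
    return (1.0 - basin_share) ** n_starts


def starts_needed(basin_share, max_miss=0.01):
    """Smallest number of starts that keeps the miss probability <= max_miss."""
    return math.ceil(math.log(max_miss) / math.log(1.0 - basin_share))


for n in (10, 50, 200):
    print(f"{n:>3} starts: P(miss a 5% basin) = {miss_probability(0.05, n):.4g}")
print("starts for <= 1% miss risk:", starts_needed(0.05))
```

Under these assumptions, 10 starts miss a basin that attracts 5% of starts about 60% of the time, 90 starts bring that risk under 1%, and 200 starts push it well below 0.1%. Rarer basins need proportionally more starts.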

Can I tell which local solution is correct?

Not from the criterion alone. Numerical fit indices sometimes rank a local solution above the data-generating solution. The practical workflow is to inspect each distinct solution for interpretability against substantive theory, examine which loadings are stable across solutions, and report the procedure transparently.
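Comparing solutions requires matching their factors first, because two runs can return the same structure with factors in a different order or with reversed signs. A minimal sketch, assuming a small number of factors (it tries every column permutation) and using Tucker's congruence coefficient as the similarity measure; the loading matrices are hypothetical, and this is one common approach rather than the procedure used by Nguyen and Waller.

```python
import itertools
import math


def congruence(x, y):
    """Tucker's congruence coefficient between two loading columns."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


def columns(matrix):
    """Transpose a list of rows into a list of columns."""
    return [list(col) for col in zip(*matrix)]


def align(reference, other):
    """Match the columns of `other` to `reference`, allowing reordering and
    sign flips; return (permutation, signs, per-factor congruences)."""
    ref_cols, oth_cols = columns(reference), columns(other)
    m = len(ref_cols)
    best = None
    for perm in itertools.permutations(range(m)):
        coefs = [congruence(ref_cols[j], oth_cols[perm[j]]) for j in range(m)]
        score = sum(abs(c) for c in coefs)
        if best is None or score > best[0]:
            signs = [1 if c >= 0 else -1 for c in coefs]  # flip negatively matched factors
            best = (score, perm, signs, [abs(c) for c in coefs])
    return best[1], best[2], best[3]


# Hypothetical solutions: solution_b is solution_a with the factors swapped
# and one of them reflected.
solution_a = [(0.70, 0.10), (0.60, 0.00), (0.10, 0.80), (0.00, 0.70)]
solution_b = [(0.10, -0.70), (0.00, -0.60), (0.80, -0.10), (0.70, 0.00)]

perm, signs, coefs = align(solution_a, solution_b)
print(perm, signs, [round(c, 3) for c in coefs])
```

Because solution_b only relabels and reflects the factors, alignment recovers perfect congruence. In real output, factors whose congruence stays well below 1 after alignment are the ones that differ between local solutions, and item-level stability can be checked by comparing the aligned loadings row by row.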

Does this affect unidimensional IRT?

No. Rotation is only meaningful when there is more than one factor. Unidimensional 2PL or 3PL models are identified only up to the location, scale, and sign of the latent trait, which are fixed by convention, and have no rotational ambiguity. The local-solution problem arises specifically in multidimensional models with two or more latent traits, whether or not those traits are correlated.
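The unidimensional indeterminacy can be checked directly. Writing the 2PL in slope-difficulty form (notation assumed here, not taken from the article), any linear change of the latent scale, θ* = sθ + t with s ≠ 0, leaves every response probability unchanged once the item parameters are adjusted to match:

```latex
\begin{gather*}
P(X = 1 \mid \theta) = \frac{1}{1 + \exp[-a(\theta - b)]} \\
\theta^{*} = s\theta + t, \quad a^{*} = \frac{a}{s}, \quad b^{*} = sb + t
\quad\Longrightarrow\quad
a^{*}(\theta^{*} - b^{*}) = \frac{a}{s}\,(s\theta + t - sb - t) = a(\theta - b)
\end{gather*}
```

A negative s is the sign flip. These constants are the whole indeterminacy and are fixed by convention (typically θ ~ N(0, 1) with positive slopes); with a single dimension there is no axis to rotate.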

References

  • Asparouhov, T., & Muthén, B. (2009). Exploratory structural equation modeling. Structural Equation Modeling, 16(3), 397–438. https://doi.org/10.1080/10705510903008204
  • Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36(1), 111–150. https://doi.org/10.1207/S15327906MBR3601_05
  • Hattori, M., Zhang, G., & Preacher, K. J. (2017). Multiple local solutions and geomin rotation. Multivariate Behavioral Research, 52(6), 720–731. https://doi.org/10.1080/00273171.2017.1361312
  • Nguyen, H. V., & Waller, N. G. (2024). Rotation local solutions in multidimensional item response theory models. Educational and Psychological Measurement, 84(6), 1045–1075. https://doi.org/10.1177/00131644231223722
  • Reckase, M. D. (2009). Multidimensional item response theory. Springer.

Related Research

Statistical Methods and Data Analysis

Item Response Theory: How Modern Tests Work

Every time you take a standardized test — an IQ assessment, a college entrance exam, a professional certification — the questions have been calibrated using…

Nov 18, 2025
Statistical Methods and Data Analysis

Group-Theoretic Symmetries in Item Response Theory

Item response theory (IRT) parameters are not unique. Different parameterizations of the same model fit the data identically, and the choice between them is settled…

Oct 11, 2024
Cognitive Neuroscience and Brain Function

Eye Movement Models of Decision Making

Eye-tracking has become a quantitative instrument for decision research because where someone looks—and for how long—is structured by the same cognitive process that produces the…

Jun 14, 2023
Statistical Methods and Data Analysis

Bridging Psychology and Psychometrics

In 2024, Psychometrika ran an unusual exchange. Three senior psychometricians — Klaas Sijtsma, Jules Ellis, and Denny Borsboom — published a focus article arguing that…

Dec 19, 2024
Statistical Methods and Data Analysis

Differential Item Functioning and Response Process

A test item that scores differently for two groups of equally able examinees is called a differential item functioning (DIF) item, and identifying these items…

Dec 16, 2024

People Also Ask

What are group-theoretic symmetries in item response theory (IRT)?

Item Response Theory (IRT) is a widely adopted framework in psychological and educational assessments, used to model the relationship between latent traits and observed responses. This recent work introduces an innovative approach that incorporates group-theoretic symmetry constraints, offering a refined methodology for estimating IRT parameters with greater precision and efficiency.

What is the Simulated IRT Dataset Generator v1.00 at Cogn-IQ.org?

The Dataset Generator available at Cogn-IQ.org is a powerful resource designed for researchers and practitioners working with Item Response Theory (IRT). This tool simulates datasets tailored for psychometric analysis, enabling users to explore a range of testing scenarios with customizable item and subject characteristics. It supports the widely used 2-Parameter Logistic (2PL) model, providing flexibility and precision for diverse applications.

What is "Peering into Decision Making: Exploration of Modeling Eye Movements" about?

The study by Wedel, Pieters, and van der Lans (2023) reviews advancements in modeling eye movements to understand decision-making processes. Eye tracking offers valuable insights into perceptual and cognitive mechanisms, making it a powerful tool for studying how individuals evaluate and make decisions.

What background does the study build on?

The study builds on prior item response theory (IRT) research, specifically focusing on multidimensional models and factor rotation techniques. IRT serves as a foundational framework for analyzing latent traits, and introducing multidimensional models adds complexity to the estimation process. The research extends the standard M2PL model to account for correlated major and uncorrelated minor factors, representing model error. Examining rotation algorithms, the study addresses challenges in achieving accurate trait estimation.

What are the key insights?

Influence of Design Variables: Factors such as slope parameter sizes, number of indicators per factor, and probabilities of cross-loadings significantly impact local solution rates for the oblimin and geomin rotation methods. Performance of Rotation Methods: The geomin rotation algorithm demonstrated higher local solution rates across multiple models, although both methods showed

Why does this research matter?

This research underscores the importance of understanding rotation local solutions in the context of multidimensional IRT models. The findings provide valuable insights for psychometricians working on improving the accuracy of latent trait estimation. Additionally, the study highlights the need for caution when using numerical measures of structural fit, as these indices may not always align with the true data-generating model.

📋 Cite This Article

Jouve, X. (2024, November 10). Rotation Local Solutions in Multidimensional IRT. PsychoLogic. https://www.psychologic.online/multidimensional-irt-rotation-solutions/
