Most item response models are cumulative: the probability that a respondent endorses a higher response category increases monotonically with the latent trait. A 2PL or 3PL ability model, a graded-response model for ordinal items, a partial-credit model — they all share this structural assumption, and the assumption matches the way ability tests work. Higher ability means higher probability of getting harder items right; the relationship is monotonic in the trait.
The cumulative assumption breaks down for attitude items. A respondent who is moderate on a political issue may agree with both moderate-left and moderate-right statements; an extreme respondent may disagree with both. The endorsement probability peaks at the item’s location and falls off in both directions, producing a non-monotonic, unimodal response function. The Generalized Graded Unfolding Model (GGUM) is the polytomous IRT model that handles this case: it represents items as having locations on the latent continuum, and respondents endorse items most strongly when their own location matches the item’s location.
Roberts and Thompson (2011), in Applied Psychological Measurement, evaluate the marginal maximum a posteriori (MMAP) procedure for estimating GGUM item parameters. They show that under realistic conditions it recovers item parameters more accurately than marginal maximum likelihood (MML) and about as accurately as full Bayesian Markov chain Monte Carlo (MCMC), at a fraction of MCMC’s computational cost. The result has practical consequences for attitude-measurement research and for any application where item responses may not be cumulative in the latent trait.
Why unfolding models are different
The unfolding-model lineage starts with Coombs (1964) and the J-scale theory of preference: respondents have ideal points on a latent continuum and prefer stimuli closer to their ideal point. The probability of preferring stimulus A over stimulus B depends on how close each stimulus is to the respondent’s ideal. Andrich and Luo (1993) translated this idea into an item response framework: the hyperbolic cosine model makes the probability of endorsement a decreasing function of the distance between respondent and item locations (through the hyperbolic cosine of that distance), producing the characteristic non-monotonic response function. The hyperbolic cosine model handled dichotomous items but did not extend cleanly to polytomous responses.
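To make the non-monotonic shape concrete, here is a minimal Python sketch of the dichotomous hyperbolic cosine model in the form usually attributed to Andrich and Luo (1993), P(agree) = exp(γ) / (exp(γ) + 2 cosh(θ - δ)), where θ is the respondent location, δ the item location, and γ a unit parameter governing the width of the agreement region. The function name and the parameter values are illustrative assumptions, not taken from the paper.

```python
import math


def hcm_prob(theta: float, delta: float, gamma: float) -> float:
    """P(agree) under the dichotomous hyperbolic cosine model.

    theta: respondent location; delta: item location;
    gamma: unit parameter (larger values widen the region of likely agreement).
    """
    return math.exp(gamma) / (math.exp(gamma) + 2.0 * math.cosh(theta - delta))


# Agreement peaks where theta equals delta and falls off symmetrically on both sides.
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  P(agree)={hcm_prob(theta, delta=0.0, gamma=1.0):.3f}")
```

Because cosh is an even function, the curve depends only on |θ - δ|: a respondent one unit below the item and one a unit above it get the same endorsement probability, which is exactly the unfolding property a monotonic 2PL curve cannot express.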
The GGUM, introduced by Roberts, Donoghue, and Laughlin (2000), is the polytomous extension. Items have a location parameter (where on the trait the item is centered), a discrimination parameter (how peaked the response function is around that location), and a set of category threshold parameters (how the item’s response options are spaced). The probability that a respondent endorses a particular response category depends on the respondent’s location and on all three kinds of item parameters. The model can mimic cumulative behavior (an item located far beyond most respondents has an effectively monotonic response function over the range where respondents lie) but also accommodates non-cumulative structures that cumulative models misrepresent.
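For reference, the GGUM category probability from Roberts, Donoghue, and Laughlin (2000) can be written as below. Here θ_j is respondent j’s location; δ_i, α_i, and τ_ik are item i’s location, discrimination, and thresholds (with τ_i0 = 0); C is the number of observable response categories minus one; and M = 2C + 1 is the number of subjective response categories minus one.

```latex
P(Z_i = z \mid \theta_j) =
\frac{\exp\!\left\{\alpha_i\!\left[z(\theta_j-\delta_i) - \sum_{k=0}^{z}\tau_{ik}\right]\right\}
    + \exp\!\left\{\alpha_i\!\left[(M-z)(\theta_j-\delta_i) - \sum_{k=0}^{z}\tau_{ik}\right]\right\}}
     {\sum_{w=0}^{C}\left(\exp\!\left\{\alpha_i\!\left[w(\theta_j-\delta_i) - \sum_{k=0}^{w}\tau_{ik}\right]\right\}
    + \exp\!\left\{\alpha_i\!\left[(M-w)(\theta_j-\delta_i) - \sum_{k=0}^{w}\tau_{ik}\right]\right\}\right)}
```

The two exponential terms in each numerator correspond to the two subjective routes to the same observed answer: responding z because the respondent sits below the item, or responding z because the respondent sits above it.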
The structural complexity is real. In the unconstrained GGUM, an item with C + 1 response categories has C + 2 parameters (one location, one discrimination, and C thresholds). A four-category item therefore has five parameters, versus two for a dichotomous 2PL item or four for a graded-response item with the same four categories. Estimation is correspondingly harder, and the choice of estimation procedure matters more than it does in simpler models.
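The GGUM category probabilities can be sketched in NumPy as follows. This is written for illustration, not taken from GGUM2004 or any package; the function name `ggum_probs` and the example parameter values are hypothetical. With C + 1 observable categories the item takes thresholds τ_1 … τ_C, and τ_0 = 0 is fixed inside the function. The computation works on the log scale (via `np.logaddexp`) so that large values of α(θ - δ) do not overflow.

```python
import numpy as np


def ggum_probs(theta, alpha, delta, tau):
    """Category probabilities for one GGUM item.

    theta: respondent locations, shape (n,)
    alpha: discrimination; delta: item location
    tau:   thresholds tau_1..tau_C for C + 1 observable categories (tau_0 = 0 is added here)
    Returns an (n, C + 1) array whose rows sum to 1.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    tau = np.concatenate(([0.0], np.asarray(tau, dtype=float)))
    C = len(tau) - 1
    M = 2 * C + 1                              # subjective categories minus one
    z = np.arange(C + 1)
    cum_tau = np.cumsum(tau)                   # sum_{k=0}^{z} tau_k for each z
    d = theta[:, None] - delta                 # (n, 1) distances to the item
    # log of the two numerator terms: respondent below the item (z) and above it (M - z)
    log_num = np.logaddexp(alpha * (z * d - cum_tau), alpha * ((M - z) * d - cum_tau))
    log_den = np.logaddexp.reduce(log_num, axis=1, keepdims=True)
    return np.exp(log_num - log_den)


# A four-category item (C = 3) located at delta = 0.5; values are illustrative.
theta = np.linspace(-3.0, 3.0, 7)
p = ggum_probs(theta, alpha=1.2, delta=0.5, tau=[-1.4, -0.9, -0.4])
expected = p @ np.arange(4)                    # expected observed score per respondent
for t, e in zip(theta, expected):
    print(f"theta={t:+.1f}  E[Z]={e:.2f}")
```

For these values the expected score is highest for respondents at the item location and falls off symmetrically in both directions, whereas a graded-response curve for a comparable item would rise monotonically.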
The estimation problem
Marginal maximum likelihood (MML), as developed by Bock and Aitkin (1981) for cumulative IRT models, is the default estimator for most polytomous IRT models. It integrates over the latent trait distribution to produce marginal item-parameter estimates and works well when the response function is monotonic and the parameter space is smooth. For the GGUM, MML estimation has known instabilities: with limited response categories or extreme item locations, the likelihood surface has flat regions where the optimizer stalls, and the resulting estimates have large standard errors.
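The objective MML maximizes can be sketched as follows. For each respondent, the probability of the whole response pattern is averaged over a standard normal trait distribution with Gauss-Hermite quadrature, and the logs of these marginal probabilities are summed. This is an illustrative sketch under stated assumptions (a N(0, 1) trait distribution, 41 quadrature points, hypothetical function names and item values). It only evaluates the objective and does not implement Bock and Aitkin’s EM algorithm for maximizing it.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def ggum_logprobs(theta, alpha, delta, tau):
    """Log GGUM category probabilities, shape (len(theta), C + 1); tau omits tau_0 = 0."""
    tau = np.concatenate(([0.0], np.asarray(tau, dtype=float)))
    C = len(tau) - 1
    z = np.arange(C + 1)
    cum_tau = np.cumsum(tau)
    d = np.atleast_1d(np.asarray(theta, dtype=float))[:, None] - delta
    log_num = np.logaddexp(alpha * (z * d - cum_tau), alpha * ((2 * C + 1 - z) * d - cum_tau))
    return log_num - np.logaddexp.reduce(log_num, axis=1, keepdims=True)


def marginal_loglik(responses, items, n_quad=41):
    """GGUM marginal log-likelihood with theta ~ N(0, 1) integrated out.

    responses: int array (n_persons, n_items) of observed categories
    items:     list of (alpha, delta, tau) tuples, one per item
    """
    nodes, weights = hermegauss(n_quad)        # Gauss-Hermite rule for weight exp(-x^2 / 2)
    log_w = np.log(weights / weights.sum())    # normalised so the weights form a N(0, 1) rule
    log_like = np.zeros((responses.shape[0], n_quad))
    for i, (alpha, delta, tau) in enumerate(items):
        lp = ggum_logprobs(nodes, alpha, delta, tau)    # (n_quad, C + 1)
        log_like += lp[:, responses[:, i]].T            # log P(observed category | node)
    # log sum_q w_q P(pattern | node_q), computed stably for each respondent
    per_person = np.logaddexp.reduce(log_like + log_w, axis=1)
    return per_person.sum()


# Illustrative data: three four-category items, simulated responses.
rng = np.random.default_rng(1)
items = [(1.0, -1.0, [-1.2, -0.8, -0.4]),
         (1.3, 0.0, [-1.0, -0.7, -0.3]),
         (0.9, 1.2, [-1.5, -1.0, -0.5])]
theta = rng.standard_normal(500)
responses = np.column_stack([
    [rng.choice(4, p=np.exp(row)) for row in ggum_logprobs(theta, *item)]
    for item in items
])
print(marginal_loglik(responses, items))
```

Nothing in this objective constrains the item parameters beyond the data, which is why flat regions of the surface translate directly into unstable estimates when categories are few or items sit at the extremes.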
The MCMC alternative — sampling from the posterior with a prior on the parameters, in the Patz and Junker (1999) tradition — handles the awkward likelihood by integrating over uncertainty rather than optimizing through it. The cost is computational: GGUM MCMC requires careful tuning, long chains, and explicit convergence checking, all of which raise the barrier to adoption in applied research.
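To show what MCMC involves, here is a deliberately reduced sketch: a random-walk Metropolis sampler for the location δ of a single two-category GGUM item. It assumes the respondent locations θ, the discrimination, and the threshold are known, and that δ has a N(0, 2²) prior. Real GGUM MCMC samples all item and person parameters jointly and needs the tuning and convergence checks described above. The step size, chain length, burn-in, parameter values, and function names here are hypothetical choices.

```python
import numpy as np


def ggum_logprobs(theta, alpha, delta, tau):
    """Log GGUM category probabilities, shape (len(theta), C + 1); tau omits tau_0 = 0."""
    tau = np.concatenate(([0.0], np.asarray(tau, dtype=float)))
    C = len(tau) - 1
    z = np.arange(C + 1)
    cum_tau = np.cumsum(tau)
    d = np.atleast_1d(np.asarray(theta, dtype=float))[:, None] - delta
    log_num = np.logaddexp(alpha * (z * d - cum_tau), alpha * ((2 * C + 1 - z) * d - cum_tau))
    return log_num - np.logaddexp.reduce(log_num, axis=1, keepdims=True)


def log_post_delta(delta, theta, z, alpha, tau, prior_sd=2.0):
    """Log posterior of one item location, up to a constant (theta, alpha, tau treated as known)."""
    lp = ggum_logprobs(theta, alpha, delta, tau)
    return lp[np.arange(len(z)), z].sum() - 0.5 * (delta / prior_sd) ** 2


def metropolis_delta(theta, z, alpha, tau, n_iter=4000, step=0.15, seed=0):
    """Random-walk Metropolis draws of delta, plus the acceptance rate."""
    rng = np.random.default_rng(seed)
    delta = 0.0
    current = log_post_delta(delta, theta, z, alpha, tau)
    draws = np.empty(n_iter)
    accepted = 0
    for t in range(n_iter):
        proposal = delta + step * rng.standard_normal()      # symmetric proposal
        candidate = log_post_delta(proposal, theta, z, alpha, tau)
        if np.log(rng.uniform()) < candidate - current:      # Metropolis acceptance rule
            delta, current = proposal, candidate
            accepted += 1
        draws[t] = delta
    return draws, accepted / n_iter


# Simulated two-category item located at delta = 0.8 (illustrative values).
rng = np.random.default_rng(42)
theta = rng.standard_normal(400)
alpha, true_delta, tau = 1.5, 0.8, [-1.0]
p_agree = np.exp(ggum_logprobs(theta, alpha, true_delta, tau))[:, 1]
z = (rng.uniform(size=theta.size) < p_agree).astype(int)
draws, acceptance = metropolis_delta(theta, z, alpha, tau)
kept = draws[1000:]                                          # discard burn-in
print(f"posterior mean {kept.mean():.2f}, sd {kept.std():.2f}, acceptance {acceptance:.2f}")
```

Even in this one-parameter toy the analyst has to pick a proposal scale, a chain length, and a burn-in, and should inspect the trace; with hundreds of parameters those choices become the tuning burden that slows adoption.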
The marginal maximum a posteriori (MMAP) approach that Roberts and Thompson (2011) evaluate sits between the two. Like MML, it integrates over the latent trait to produce marginal item-parameter estimates; like a Bayesian analysis, it places a prior on the item parameters. Unlike MCMC, however, it does not sample from the posterior: it finds the mode of the marginal posterior of the item parameters. The result is an estimator that handles the GGUM’s awkward likelihood through prior regularization without requiring full Bayesian sampling. Computationally, MMAP is closer to MML than to MCMC; statistically, it inherits some of MCMC’s stability properties.
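The MMAP idea can be sketched by adding a log-prior to the marginal log-likelihood and handing the sum to a general-purpose optimizer. This is an illustration under explicit assumptions, not the GGUM2004 implementation or Roberts and Thompson’s algorithm. The priors (log α ~ N(0, 0.5²), δ ~ N(0, 2²), each τ ~ N(-1, 1)), the N(0, 1) trait distribution, the 31-point quadrature, the use of SciPy’s L-BFGS-B with numerical gradients, the simulated item values, and all function names are choices made for this example. Because GGUM probabilities are unchanged when every θ and δ flips sign, the starting locations are spread in a fixed order to pick one of the two mirror-image solutions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.optimize import minimize


def ggum_logprobs(theta, alpha, delta, tau):
    """Log GGUM category probabilities, shape (len(theta), C + 1); tau omits tau_0 = 0."""
    tau = np.concatenate(([0.0], np.asarray(tau, dtype=float)))
    C = len(tau) - 1
    z = np.arange(C + 1)
    cum_tau = np.cumsum(tau)
    d = np.atleast_1d(np.asarray(theta, dtype=float))[:, None] - delta
    log_num = np.logaddexp(alpha * (z * d - cum_tau), alpha * ((2 * C + 1 - z) * d - cum_tau))
    return log_num - np.logaddexp.reduce(log_num, axis=1, keepdims=True)


NODES, WEIGHTS = hermegauss(31)
LOG_W = np.log(WEIGHTS / WEIGHTS.sum())        # quadrature rule for theta ~ N(0, 1)


def unpack(x, n_items, C):
    """Flat vector -> [(alpha, delta, tau), ...]; alpha is stored as log(alpha)."""
    return [(np.exp(r[0]), r[1], r[2:]) for r in x.reshape(n_items, 2 + C)]


def neg_log_posterior(x, responses, C):
    """-(marginal log-likelihood + log prior) for all item parameters jointly."""
    n_items = responses.shape[1]
    log_like = np.zeros((responses.shape[0], NODES.size))
    for i, (alpha, delta, tau) in enumerate(unpack(x, n_items, C)):
        log_like += ggum_logprobs(NODES, alpha, delta, tau)[:, responses[:, i]].T
    marginal = np.logaddexp.reduce(log_like + LOG_W, axis=1).sum()   # the MML part
    par = x.reshape(n_items, 2 + C)
    log_prior = (
        -0.5 * np.sum((par[:, 0] / 0.5) ** 2)     # log(alpha) ~ N(0, 0.5^2)
        - 0.5 * np.sum((par[:, 1] / 2.0) ** 2)    # delta ~ N(0, 2^2)
        - 0.5 * np.sum((par[:, 2:] + 1.0) ** 2)   # tau_k ~ N(-1, 1)
    )
    return -(marginal + log_prior)


# Simulated data: five four-category items (C = 3), 800 respondents.
rng = np.random.default_rng(7)
C, n_items = 3, 5
true_items = [(1.2, d, [-1.4, -0.9, -0.4]) for d in (-1.5, -0.75, 0.0, 0.75, 1.5)]
theta = rng.standard_normal(800)
responses = np.column_stack([
    [rng.choice(C + 1, p=np.exp(row)) for row in ggum_logprobs(theta, *item)]
    for item in true_items
])

x0 = np.tile([0.0, 0.0, -1.0, -1.0, -1.0], n_items)
x0[1::2 + C] = np.linspace(-1.0, 1.0, n_items)   # ordered starts select one reflection
res = minimize(neg_log_posterior, x0, args=(responses, C), method="L-BFGS-B")
for alpha, delta, tau in unpack(res.x, n_items, C):
    print(f"alpha={alpha:.2f}  delta={delta:+.2f}  tau={np.round(tau, 2)}")
```

Deleting the `log_prior` term turns this back into a plain MML objective, which isolates the one ingredient MMAP adds. Standard errors in this setting would typically come from the curvature of the objective at the mode rather than from posterior draws.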
What Roberts and Thompson (2011) found
The simulation study crossed the number of response categories (from 2 to 6), the number of items, sample size, and the distribution of item locations on the latent trait. For each cell, the authors fit the GGUM with MMAP, MML, and MCMC and recorded how well each method recovered the generating item parameters.
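Parameter recovery in simulation studies of this kind is commonly summarized by bias and root-mean-square error (RMSE) between the estimates and the generating values. A minimal sketch follows; the function name is hypothetical and the toy numbers only exercise the code (they are not results from Roberts and Thompson, 2011).

```python
import numpy as np


def recovery_summary(true_values, estimates):
    """Per-parameter bias and RMSE across simulation replications.

    true_values: generating values, shape (n_params,)
    estimates:   estimates, shape (n_reps, n_params), one row per replication
    """
    errors = np.asarray(estimates, dtype=float) - np.asarray(true_values, dtype=float)
    bias = errors.mean(axis=0)
    rmse = np.sqrt((errors ** 2).mean(axis=0))
    return bias, rmse


# Toy example: three item locations estimated in two replications.
true_delta = [-1.5, 0.0, 1.5]
estimates = [[-1.4, 0.1, 1.7],
             [-1.6, -0.1, 1.3]]
bias, rmse = recovery_summary(true_delta, estimates)
print("bias:", np.round(bias, 3), "rmse:", np.round(rmse, 3))
```

Computing summaries like these per estimator and per design cell is the kind of comparison that statements about one method recovering parameters "more accurately" than another rest on.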
The headline result: MMAP recovered item parameters more accurately than MML across most conditions, with the advantage largest when response categories were few (2 or 3) or when item locations were extreme. These are the conditions where MML’s likelihood-surface problems bite hardest, and where the prior regularization in MMAP supplies stability that the data alone cannot. MMAP estimates also had consistently smaller standard errors than MML, which is the operationally relevant property for any program using GGUM in practice.
Compared to MCMC, MMAP performed comparably in accuracy but at substantially lower computational cost. A GGUM calibration that took hours by MCMC could be run in minutes by MMAP, with no meaningful loss of estimation quality in the conditions tested. The trade-off MCMC offers (full posterior uncertainty quantification at the cost of computational expense) is real but is not always needed; for parameter recovery and basic standard-error reporting, MMAP delivered most of MCMC’s benefit at close to the speed of MML.
The authors framed the result as a recommendation: MMAP is the appropriate default estimator for GGUM in applied research, MML is acceptable for benign conditions but should not be used when categories are few or item locations are extreme, and MCMC remains the appropriate choice when full posterior uncertainty is needed (e.g., for downstream Bayesian decision-theoretic applications).
Practical workflow for GGUM users
For attitude-measurement research using GGUM:
- Use MMAP estimation by default. Software implementations (GGUM2004 and successors) support it as a built-in option, and the operational cost relative to MML is small.
- Examine item-location estimates and category thresholds for plausibility. GGUM’s flexibility makes it easy to fit pathological solutions where item locations or thresholds drift to extreme values; sanity-checking against substantive theory is part of the responsible workflow.
- Treat the discrimination parameter with care. The peakedness of the response function around the item location affects the model’s sensitivity to mid-range respondents; very high discriminations produce sharply peaked response functions that may overfit and produce poor cross-validation.
- For high-stakes applications, run MCMC as a robustness check on the MMAP solution. If the MCMC posterior modes match the MMAP estimates and the posterior intervals are well-behaved, the MMAP solution is credible. If the MCMC posterior is multimodal or has wide tails, the MMAP point estimate is hiding important uncertainty.
- Report the estimation method and the prior specification. Like any Bayesian or quasi-Bayesian estimator, MMAP results depend on the prior; a hidden prior is a methodological liability.
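The plausibility and robustness checks in this list can be partly automated. The sketch below flags items whose estimated location, discrimination, or thresholds fall outside user-chosen bounds, and adds a crude MMAP-versus-MCMC agreement check (does the MMAP estimate fall inside the central interval of the MCMC draws?). The cut-offs, class, function names, and item estimates are hypothetical. The bounds should come from substantive theory and the prior in use, and a flag is a prompt for inspection, not a verdict. The interval check also cannot detect a multimodal posterior; that still requires looking at the draws or running several chains.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ItemEstimate:
    name: str
    alpha: float                                        # discrimination
    delta: float                                        # location
    tau: list[float] = field(default_factory=list)      # thresholds tau_1..tau_C


def flag_items(items, max_abs_delta=4.0, alpha_bounds=(0.25, 4.0), max_abs_tau=5.0):
    """Return (item name, reason) pairs for estimates that deserve a second look."""
    flags = []
    for it in items:
        if abs(it.delta) > max_abs_delta:
            flags.append((it.name, f"extreme location {it.delta:+.2f}"))
        if not alpha_bounds[0] <= it.alpha <= alpha_bounds[1]:
            flags.append((it.name, f"discrimination {it.alpha:.2f} outside {alpha_bounds}"))
        if any(abs(t) > max_abs_tau for t in it.tau):
            flags.append((it.name, f"threshold beyond +/-{max_abs_tau}"))
    return flags


def mmap_inside_mcmc_interval(mmap_estimate, draws, level=0.95):
    """Crude agreement check: is the MMAP value inside the central MCMC interval?"""
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return bool(lo <= mmap_estimate <= hi)


items = [
    ItemEstimate("item01", alpha=1.1, delta=-0.6, tau=[-1.3, -0.8, -0.4]),
    ItemEstimate("item02", alpha=6.3, delta=0.2, tau=[-1.0, -0.6, -0.2]),
    ItemEstimate("item03", alpha=0.9, delta=5.2, tau=[-7.5, -1.2, -0.5]),
]
for name, reason in flag_items(items):
    print(name, reason)
```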
Where this fits in the broader IRT-estimation literature
Item parameter estimation methodology is a quietly active subfield of psychometrics, and the GGUM is one of several models where the estimation problem is harder than the cumulative-IRT default. The same pattern recurs across cumulative IRT estimation, hierarchical Bayesian models, and unfolding models like the GGUM: maximum-likelihood methods are tractable but unstable in difficult conditions, full Bayesian methods are stable but computationally expensive, and modal posterior methods (MMAP, MAP, ridge-regularized ML) often deliver most of the stability benefit at close to the computational cost of maximum likelihood.
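The reason ridge-regularized ML belongs in the same family as MAP is easy to check numerically. A squared (ridge) penalty β²/(2σ²) equals, up to an additive constant, the negative log of a N(0, σ²) prior, so penalized ML and MAP estimation share the same optimum. The sketch uses logistic regression on simulated data rather than an IRT model, purely to keep it short; the data, σ, and function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
X = rng.standard_normal((60, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) + rng.logistic(size=60) > 0).astype(float)
sigma = 1.0                                     # prior sd, equivalently the ridge scale


def neg_loglik(beta):
    """Logistic-regression negative log-likelihood."""
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)


def ridge_objective(beta):
    """Penalised ML: NLL plus a squared (ridge) penalty."""
    return neg_loglik(beta) + np.sum(beta ** 2) / (2 * sigma ** 2)


def neg_log_posterior(beta):
    """MAP: NLL minus the log density of independent N(0, sigma^2) priors."""
    return neg_loglik(beta) - np.sum(norm.logpdf(beta, loc=0.0, scale=sigma))


b_ridge = minimize(ridge_objective, np.zeros(3), method="BFGS").x
b_map = minimize(neg_log_posterior, np.zeros(3), method="BFGS").x
print(np.round(b_ridge, 4), np.round(b_map, 4))
```

The two objectives differ only by a constant, so the estimates agree to optimizer tolerance. The prior acts as a penalty that keeps the optimizer away from flat or divergent regions of the likelihood, which is the stabilizing role it plays in MMAP.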
The Roberts-Thompson contribution is one specific instance of this generalization: in the GGUM context, MMAP is the practical sweet spot. Whether the same logic applies to other polytomous unfolding models (e.g., extensions to multidimensional unfolding) and to more recent nonparametric variants is an open question, but the structural argument — that prior regularization is a cheap way to stabilize complex likelihoods — generalizes broadly.
Frequently Asked Questions
What is the difference between cumulative and unfolding IRT models?
Cumulative models assume the probability of endorsing a higher response category increases monotonically with the latent trait. Unfolding models like the GGUM assume the probability peaks at the item’s location and decreases in both directions. Cumulative models suit ability tests; unfolding models suit attitude items where extreme respondents on either side may disagree with moderate statements.
When should I use the GGUM instead of a cumulative polytomous model?
When the items measure attitudes, preferences, or judgments where the response function is non-monotonic. A graded-response model fit to attitude data with non-monotonic items will produce systematically biased item parameters and ability estimates; the GGUM handles the non-monotonicity correctly. Use the GGUM when the item content is non-cumulative; use cumulative models when it is cumulative.
What is MMAP estimation?
Marginal maximum a posteriori estimation places a prior on the item parameters, integrates the latent trait out of the likelihood to obtain the marginal posterior of the item parameters, and takes the mode of that posterior as the estimate. It combines MML’s marginal-integration step with the prior regularization that makes Bayesian estimation stable, without requiring full Bayesian sampling.
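In symbols, as a sketch of the general form rather than the exact specification in Roberts and Thompson (2011): let ξ collect all item parameters (α_i, δ_i, τ_ik), let z_ij be respondent j’s observed category on item i, let g(θ) be the assumed latent trait density, and let p(ξ) be the prior. The MMAP estimate is then

```latex
\hat{\boldsymbol{\xi}}_{\mathrm{MMAP}}
  = \arg\max_{\boldsymbol{\xi}}
    \left[ \sum_{j=1}^{N} \log \int_{-\infty}^{\infty}
      \prod_{i=1}^{I} P(Z_{i} = z_{ij} \mid \theta, \boldsymbol{\xi}_i)\, g(\theta)\, d\theta
      \; + \; \log p(\boldsymbol{\xi}) \right]
```

Dropping the log p(ξ) term gives the MML estimate; sampling from the posterior instead of locating its mode is the MCMC alternative.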
How does MMAP compare to MML and MCMC?
MMAP outperforms MML in accuracy and standard-error stability, particularly when response categories are few or item locations are extreme. MMAP performs comparably to MCMC in accuracy at substantially lower computational cost. MCMC retains an advantage when full posterior uncertainty quantification is needed; MMAP supplies point estimates and standard errors but not full posterior intervals.
What software supports GGUM estimation?
The GGUM2004 software developed by Roberts and colleagues is the canonical implementation; later versions and the R package GGUM support MMAP and MCMC estimation. Modern probabilistic programming frameworks (Stan, PyMC) can fit GGUM via custom model code if more flexible Bayesian inference is needed.
References
- Andrich, D., & Luo, G. (1993). A hyperbolic cosine latent trait model for unfolding dichotomous single-stimulus responses. Applied Psychological Measurement, 17(3), 253–276. https://doi.org/10.1177/014662169301700307
- Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801
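- Patz, R. J., & Junker, B. W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24(4), 342–366. https://doi.org/10.3102/10769986024004342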
- Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24(1), 3–32. https://doi.org/10.1177/01466216000241001
- Roberts, J. S., & Thompson, V. M. (2011). Marginal maximum a posteriori item parameter estimation for the generalized graded unfolding model. Applied Psychological Measurement, 35(4), 259–279. https://doi.org/10.1177/0146621610392565
Jouve, X. (2011, June 5). Item Parameter Estimation for GGUM. PsychoLogic. https://www.psychologic.online/ggum-item-parameter-estimation/

