What is significance?

This study highlights the practical advantages of using MMAP for GGUM parameter estimation. The combination of greater accuracy, lower variability, and efficiency makes it a valuable tool for researchers and practitioners in psychological measurement. Additionally, the findings underscore the importance of choosing estimation methods that are tailored to the specific characteristics of the data being analyzed.

What are future directions?

Future research could expand on this work by evaluating the MMAP procedure in real-world datasets across different contexts. Investigating its performance with larger and more diverse populations would help assess its generalizability. Additionally, exploring extensions of MMAP to other item response models may further demonstrate its versatility and applicability.

Roberts and Thompson’s (2011) study provides compelling evidence for the advantages of the MMAP procedure in GGUM parameter estimation. Their findings emphasize the importance of balancing accuracy, variability, and computational demands when selecting estimation methods. This work represents a meaningful contribution to advancing practices in psychological measurement.

Roberts, J. S., & Thompson, V. M. (2011). Marginal Maximum A Posteriori Item Parameter Estimation for the Generalized Graded Unfolding Model. Applied Psychological Measurement, 35(4), 259-279. https://doi.org/10.1177/0146621610392565

Item Parameter Estimation for GGUM

Published: June 5, 2011 · Last reviewed: May 7, 2026


Most item response models are cumulative: the probability that a respondent endorses a higher response category increases monotonically with the latent trait. A 2PL or 3PL ability model, a graded-response model for ordinal items, a partial-credit model — they all share this structural assumption, and the assumption matches the way ability tests work. Higher ability means higher probability of getting harder items right; the relationship is monotonic in the trait.

The cumulative assumption breaks down for attitude items. A respondent who is moderate on a political issue may agree with both moderate-left and moderate-right statements; an extreme respondent may disagree with both. The endorsement probability peaks at the item’s location and falls off in both directions, producing a non-monotonic, unimodal response function. The Generalized Graded Unfolding Model (GGUM) is the polytomous IRT model that handles this case: it represents items as having locations on the latent continuum, and respondents endorse items most strongly when their own location matches the item’s location.

Roberts and Thompson (2011), in Applied Psychological Measurement, evaluate the marginal maximum a posteriori (MMAP) procedure for estimating GGUM item parameters and show that, under realistic conditions, it recovers item parameters more accurately than marginal maximum likelihood (MML) and about as accurately as full Bayesian Markov chain Monte Carlo (MCMC) at much lower computational cost. The result has practical consequences for attitude-measurement research and for any application where item responses may not be cumulative in the latent trait.

Why unfolding models are different

The unfolding-model lineage starts with Coombs (1964) and the J-scale theory of preference: respondents have ideal points on a latent continuum and prefer stimuli closer to their ideal point. The probability of preferring stimulus A over stimulus B depends on how close each stimulus is to the respondent’s ideal. Andrich and Luo (1993) translated this idea into an item response framework: the hyperbolic cosine model represents the probability of an endorsement as a symmetric function of the distance between respondent and item locations, producing the characteristic non-monotonic response function. The hyperbolic cosine model handled dichotomous items but did not extend cleanly to polytomous responses.

The GGUM, introduced by Roberts, Donoghue, and Laughlin (2000), is the polytomous extension. Items have a location parameter (where on the trait the item is centered), a discrimination parameter (how peaked the response function is around that location), and a set of category threshold parameters (how the item’s response options are spaced). The probability that a respondent endorses a particular response category depends on both the respondent’s location and the configuration of all three kinds of item parameter. The model can mimic cumulative behavior over the observed trait range when an item sits far toward one end of the continuum, and it accommodates non-cumulative structures that cumulative models misrepresent.
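For reference, the category probability described above is usually written as follows. The notation is an assumption here, following the common presentation of Roberts, Donoghue, and Laughlin (2000): respondent location \theta_j, item location \delta_i, discrimination \alpha_i, thresholds \tau_{ik} with \tau_{i0} = 0, observable categories z = 0, ..., C, and M = 2C + 1.

```latex
P(Z_i = z \mid \theta_j) =
\frac{\exp\!\Big\{\alpha_i\Big[z(\theta_j-\delta_i) - \sum_{k=0}^{z}\tau_{ik}\Big]\Big\}
    + \exp\!\Big\{\alpha_i\Big[(M-z)(\theta_j-\delta_i) - \sum_{k=0}^{z}\tau_{ik}\Big]\Big\}}
     {\sum_{w=0}^{C}\Big(\exp\!\Big\{\alpha_i\Big[w(\theta_j-\delta_i) - \sum_{k=0}^{w}\tau_{ik}\Big]\Big\}
    + \exp\!\Big\{\alpha_i\Big[(M-w)(\theta_j-\delta_i) - \sum_{k=0}^{w}\tau_{ik}\Big]\Big\}\Big)}
```

The two exponential terms in the numerator correspond to a respondent located below or above the item; their sum is what makes the response function symmetric around \delta_i.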

The structural complexity is real. A four-category GGUM item has five parameters (one location, one discrimination, three thresholds) versus two for a 2PL item or four for a graded-response item with the same four response categories (one discrimination, three thresholds). Estimation is correspondingly harder, and the choice of estimation procedure matters more than it does in simpler models.
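A minimal Python sketch of the GGUM category probabilities may make the parameter structure concrete. The function name `ggum_probs` and the example parameter values are illustrative assumptions, not taken from the source; the formula follows the standard GGUM parameterization with C thresholds for C + 1 response categories.

```python
import math

def ggum_probs(theta, alpha, delta, taus):
    """Return GGUM probabilities for categories 0..C (hypothetical helper).

    theta: respondent location; alpha: discrimination; delta: item location;
    taus: list of C thresholds (tau_0 = 0 is implied).
    """
    C = len(taus)
    M = 2 * C + 1
    d = theta - delta
    log_num = []
    cum_tau = 0.0
    for z in range(C + 1):
        if z > 0:
            cum_tau += taus[z - 1]
        a = alpha * (z * d - cum_tau)          # "from below" term
        b = alpha * ((M - z) * d - cum_tau)    # "from above" term
        m = max(a, b)
        # log(exp(a) + exp(b)) computed stably
        log_num.append(m + math.log(math.exp(a - m) + math.exp(b - m)))
    top = max(log_num)
    weights = [math.exp(v - top) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative four-category item: 1 location + 1 discrimination + 3 thresholds.
example_taus = [-1.5, -1.0, -0.5]
at_item = ggum_probs(0.0, 1.0, 0.0, example_taus)   # respondent at the item
far_away = ggum_probs(2.0, 1.0, 0.0, example_taus)  # respondent 2 units away
```

With these illustrative values, the strongest-agreement category is most likely when the respondent sits at the item location, and the strongest-disagreement category takes over as the respondent moves away in either direction.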

The estimation problem

Marginal maximum likelihood (MML), as developed by Bock and Aitkin (1981) for cumulative IRT models, is the default estimator for most polytomous IRT models. It integrates over the latent trait distribution to produce marginal item-parameter estimates and works well when the response function is monotonic and the parameter space is smooth. For the GGUM, MML estimation has known instabilities: with limited response categories or extreme item locations, the likelihood surface has flat regions where the optimizer stalls, and the resulting estimates have large standard errors.
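In symbols, the marginal log-likelihood that MML maximizes can be sketched as below. The notation is an assumption for illustration: \xi collects all item parameters, g(\theta) is the assumed latent trait density, and the integral is approximated on Q quadrature nodes \theta_q with weights A(\theta_q).

```latex
\ln L(\xi) = \sum_{j=1}^{N} \ln \int \prod_{i=1}^{I} P(x_{ij} \mid \theta, \xi_i)\, g(\theta)\, d\theta
\;\approx\; \sum_{j=1}^{N} \ln \sum_{q=1}^{Q} \prod_{i=1}^{I} P(x_{ij} \mid \theta_q, \xi_i)\, A(\theta_q)
```

When this surface is nearly flat in some direction of \xi, many parameter values fit the data almost equally well, which is the source of the instability described above.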

The MCMC alternative — sampling from the posterior with a prior on the parameters, in the Patz and Junker (1999) tradition — handles the awkward likelihood by integrating over uncertainty rather than optimizing through it. The cost is computational: GGUM MCMC requires careful tuning, long chains, and explicit convergence checking, all of which raise the barrier to adoption in applied research.

The marginal maximum a posteriori (MMAP) approach Roberts and Thompson (2011) developed sits between the two. Like MML, it integrates over the latent trait to produce marginal item-parameter estimates; like MCMC, it places a prior on the item parameters and finds the joint mode of the posterior. The result is an estimator that handles the GGUM’s awkward likelihood through prior regularization without requiring full Bayesian sampling. Computationally, MMAP is closer to MML than to MCMC; statistically, it inherits some of MCMC’s stability properties.
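The idea can be sketched in Python as a marginal log-likelihood plus a log-prior over the item parameters. This is an illustration under stated assumptions, not the authors' implementation: the function names, the equally spaced quadrature grid, and especially the normal priors and their hyperparameters are placeholders, since the source does not specify the priors Roberts and Thompson (2011) used.

```python
import math

def ggum_probs(theta, alpha, delta, taus):
    """GGUM category probabilities (compact form, no overflow protection)."""
    C = len(taus)
    M = 2 * C + 1
    d = theta - delta
    terms, cum_tau = [], 0.0
    for z in range(C + 1):
        cum_tau += taus[z - 1] if z > 0 else 0.0
        terms.append(math.exp(alpha * (z * d - cum_tau))
                     + math.exp(alpha * ((M - z) * d - cum_tau)))
    total = sum(terms)
    return [t / total for t in terms]

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def quadrature(n_points=41, lo=-4.0, hi=4.0):
    """Equally spaced nodes with normalized standard-normal weights."""
    nodes = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    raw = [math.exp(-0.5 * t * t) for t in nodes]
    s = sum(raw)
    return nodes, [r / s for r in raw]

def marginal_loglik(responses, items, nodes, weights):
    """Sum over respondents of log of the trait-integrated pattern likelihood."""
    total = 0.0
    for pattern in responses:
        lik = 0.0
        for theta, w in zip(nodes, weights):
            p = w
            for x, (alpha, delta, taus) in zip(pattern, items):
                p *= ggum_probs(theta, alpha, delta, taus)[x]
            lik += p
        total += math.log(lik)
    return total

def log_prior(items, delta_sd=2.0, tau_mean=-1.0, tau_sd=2.0, log_alpha_sd=0.5):
    """Placeholder priors (assumed): normal on delta, tau, and log(alpha)."""
    lp = 0.0
    for alpha, delta, taus in items:
        lp += normal_logpdf(delta, 0.0, delta_sd)
        lp += normal_logpdf(math.log(alpha), 0.0, log_alpha_sd)
        lp += sum(normal_logpdf(t, tau_mean, tau_sd) for t in taus)
    return lp

def mmap_objective(responses, items, nodes, weights):
    """Quantity an MMAP estimator would maximize over the item parameters."""
    return marginal_loglik(responses, items, nodes, weights) + log_prior(items)

# Toy data: two binary items (one threshold each), four respondents.
items = [(1.0, 0.0, [-1.0]), (1.2, 0.5, [-0.8])]
responses = [(1, 1), (0, 1), (0, 0), (1, 0)]
nodes, weights = quadrature()
objective = mmap_objective(responses, items, nodes, weights)
```

Dropping `log_prior` from the objective gives the MML criterion; the prior term is what pulls estimates away from extreme values when the likelihood alone is nearly flat. A real calibration would pass this objective to an optimizer, typically through an EM-style algorithm.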

What Roberts and Thompson (2011) found

The simulation study crossed the number of response categories (from 2 to 6), the number of items, sample size, and the distribution of item locations on the latent trait. For each cell, the authors fit GGUM via MMAP, MML, and MCMC and recorded the recovery quality of the item parameters against ground truth.
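Recovery quality in a simulation of this kind is typically summarized per parameter across replications. The sketch below shows three common summaries (bias, root mean squared error, and the empirical standard deviation of the estimates); the source does not say which metrics the authors reported, so these function names and the toy numbers are assumptions for illustration.

```python
import math

def bias(estimates, truth):
    """Mean signed error of the estimates across replications."""
    return sum(e - truth for e in estimates) / len(estimates)

def rmse(estimates, truth):
    """Root mean squared error across replications."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))

def empirical_sd(estimates):
    """Sample standard deviation of the estimates (sampling variability)."""
    mean = sum(estimates) / len(estimates)
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1))

# Toy example: three replications estimating an item location whose true value is 1.0.
location_estimates = [1.0, 1.2, 0.8]
summary = {
    "bias": bias(location_estimates, 1.0),
    "rmse": rmse(location_estimates, 1.0),
    "sd": empirical_sd(location_estimates),
}
```

Comparing these summaries across estimators within each simulation cell is what allows a statement such as "more accurate and less variable" to be made condition by condition.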

The headline result: MMAP recovered item parameters more accurately than MML across most conditions, with the advantage largest when response categories were few (2 or 3) or when item locations were extreme. These are the conditions where MML’s likelihood-surface problems bite hardest, and where the prior regularization in MMAP supplies stability that the data alone cannot. MMAP estimates also had consistently smaller standard errors than MML, which is the operationally relevant property for any program using GGUM in practice.

Compared to MCMC, MMAP performed comparably in accuracy but at substantially lower computational cost. A GGUM calibration that took hours by MCMC could be run in minutes by MMAP, with no meaningful loss of estimation quality in the conditions tested. The trade-off MCMC offers (full posterior uncertainty quantification in exchange for computational expense) is real but not always needed; for parameter recovery and basic standard-error reporting, MMAP delivered most of MCMC’s benefit at close to MML’s speed.

The authors framed the result as a recommendation: MMAP is the appropriate default estimator for GGUM in applied research, MML is acceptable for benign conditions but should not be used when categories are few or item locations are extreme, and MCMC remains the appropriate choice when full posterior uncertainty is needed (e.g., for downstream Bayesian decision-theoretic applications).

Practical workflow for GGUM users

For attitude-measurement research using GGUM:

Use MMAP estimation by default. Software implementations (GGUM2004 and successors) support it as a built-in option, and the operational cost relative to MML is small.
Examine item-location estimates and category thresholds for plausibility. GGUM’s flexibility makes it easy to fit pathological solutions where item locations or thresholds drift to extreme values; sanity-checking against substantive theory is part of the responsible workflow.
Treat the discrimination parameter with care. The peakedness of the response function around the item location affects the model’s sensitivity to mid-range respondents; very high discriminations produce sharply peaked response functions that may overfit and produce poor cross-validation.
For high-stakes applications, run MCMC as a robustness check on the MMAP solution. If the MCMC posterior modes match the MMAP estimates and the posterior intervals are well-behaved, the MMAP solution is credible. If the MCMC posterior is multimodal or has wide tails, the MMAP point estimate is hiding important uncertainty.
Report the estimation method and the prior specification. Like any Bayesian or quasi-Bayesian estimator, MMAP results depend on the prior; a hidden prior is a methodological liability.
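The plausibility checks in this workflow can be partly automated. The sketch below flags items whose estimates look extreme; the function name `flag_items` and every cutoff in it are hypothetical and should be replaced with values justified by the scale and substantive theory at hand.

```python
def flag_items(item_estimates, max_abs_delta=4.0, max_alpha=3.0, max_abs_tau=4.0):
    """Return {item_name: [reasons]} for items with suspicious estimates.

    item_estimates maps an item name to (alpha, delta, taus).
    All cutoffs are illustrative assumptions, not published guidelines.
    """
    flags = {}
    for name, (alpha, delta, taus) in item_estimates.items():
        reasons = []
        if abs(delta) > max_abs_delta:
            reasons.append("extreme location")
        if alpha > max_alpha:
            reasons.append("very high discrimination")
        if any(abs(t) > max_abs_tau for t in taus):
            reasons.append("extreme threshold")
        if reasons:
            flags[name] = reasons
    return flags

# Toy estimates for three items (made-up values).
estimates = {
    "item1": (1.1, 0.3, [-1.4, -0.9, -0.4]),
    "item2": (4.2, -0.5, [-1.0, -0.6, -0.2]),
    "item3": (0.9, 5.7, [-6.1, -1.0, -0.5]),
}
flagged = flag_items(estimates)
```

A flag is a prompt to inspect the item against substantive theory, not an automatic reason to drop it.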

Where this fits in the broader IRT-estimation literature

Item parameter estimation methodology is a quietly active subfield of psychometrics, and the GGUM is one of several models where the estimation problem is harder than the cumulative-IRT default. The pattern that recurs across cumulative IRT estimation, hierarchical Bayesian models, and unfolding models like the GGUM is the same: maximum-likelihood methods are tractable but unstable in difficult conditions, full Bayesian methods are stable but computationally expensive, and modal posterior methods (MMAP, MAP, ridge-regularized ML) often deliver most of the stability of full Bayesian methods while keeping most of the computational economy of maximum likelihood.

The Roberts-Thompson contribution is one specific instance of this generalization: in the GGUM context, MMAP is the practical sweet spot. Whether the same logic applies to other polytomous unfolding models (e.g., extensions to multidimensional unfolding) and to more recent nonparametric variants is an open question, but the structural argument — that prior regularization is a cheap way to stabilize complex likelihoods — generalizes broadly.

Frequently Asked Questions

What is the difference between cumulative and unfolding IRT models?

Cumulative models assume the probability of endorsing a higher response category increases monotonically with the latent trait. Unfolding models like the GGUM assume the probability peaks at the item’s location and decreases in both directions. Cumulative models suit ability tests; unfolding models suit attitude items where extreme respondents on either side may disagree with moderate statements.

When should I use the GGUM instead of a cumulative polytomous model?

When the items measure attitudes, preferences, or judgments where the response function is non-monotonic. A graded-response model fit to attitude data with non-monotonic items will produce systematically biased item parameters and trait estimates; the GGUM handles the non-monotonicity correctly. Use the GGUM when the item content is non-cumulative; use cumulative models when it is cumulative.

What is MMAP estimation?

Marginal maximum a posteriori estimation places a prior on the item parameters, integrates over the latent trait distribution to obtain marginal item-parameter posteriors, and finds the mode of the joint posterior. It combines MML’s marginal-integration step with the prior regularization that makes Bayesian estimation stable, without requiring full Bayesian sampling.

How does MMAP compare to MML and MCMC?

MMAP outperforms MML in accuracy and standard-error stability, particularly when response categories are few or item locations are extreme. MMAP performs comparably to MCMC in accuracy at substantially lower computational cost. MCMC retains an advantage when full posterior uncertainty quantification is needed; MMAP supplies point estimates and standard errors but not full posterior intervals.

What software supports GGUM estimation?

The GGUM2004 software developed by Roberts and colleagues is the canonical implementation; later versions and the R package GGUM support MMAP and MCMC estimation. Modern probabilistic programming frameworks (Stan, PyMC) can fit GGUM via custom model code if more flexible Bayesian inference is needed.

References

Andrich, D., & Luo, G. (1993). A hyperbolic cosine latent trait model for unfolding dichotomous single-stimulus responses. Applied Psychological Measurement, 17(3), 253–276. https://doi.org/10.1177/014662169301700307
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459. https://doi.org/10.1007/BF02293801
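Patz, R. J., & Junker, B. W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24(4), 342–366. https://doi.org/10.3102/10769986024004342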
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24(1), 3–32. https://doi.org/10.1177/01466216000241001
Roberts, J. S., & Thompson, V. M. (2011). Marginal maximum a posteriori item parameter estimation for the generalized graded unfolding model. Applied Psychological Measurement, 35(4), 259–279. https://doi.org/10.1177/0146621610392565

Xavier Jouve, Ph.D., Psychometrician

Xavier Jouve, Ph.D., is a psychometrician and quantitative psychologist specializing in cognitive ability measurement, item response theory, and test development. He is Head of Research at Cogn-IQ, where he has designed and validated seven cognitive assessment instruments — including the JCTI (inductive reasoning), JCCES (crystallized intelligence), IAW (vocabulary), JCFS (figurative sequences), JCWS (verbal reasoning), GIE (general knowledge), and WN (logical inference) — collectively normed on over 13,000 examinees. His work applies 2PL IRT modeling, computerized adaptive testing, and advanced composite scoring methods (including the modified Tellegen & Briggs Formula 4 with cubic correction) to produce research-grade cognitive measures available online. ORCID: 0009-0006-1283-045X
