Computerized Adaptive Testing: Exploring Enhanced Techniques

Published: May 16, 2023
The Anselmi, Robusto, and Cristante (2023) paper in Applied Psychological Measurement proposes a methodological refinement for a setting that sits awkwardly between two well-developed bodies of psychometric theory: the battery of unidimensional tests, in which each test measures a single ability but the abilities are correlated across tests. The standard CAT machinery treats each test in such a battery as if its ability were independent of the others, even though in practice the abilities co-vary substantially. The proposed procedure exploits these correlations by feeding the running estimates of all the battery’s abilities into the prior used at each step of every test. The result, demonstrated in two simulation studies, is more accurate ability estimates in fixed-length CATs and shorter tests in variable-length CATs, with gains that scale directly with the strength of the inter-ability correlations.

The CAT design space and where unidimensional batteries fit

Computerized adaptive testing originated in the 1970s and matured through the work synthesized by Wainer et al. (2000). It exploits item response theory to select each successive item based on the examinee’s running ability estimate, terminating when measurement precision reaches a target threshold or when a fixed item count is exhausted. The standard formulation is fully unidimensional: one ability θ, one item bank, one termination criterion. Item selection is governed by maximizing Fisher information at the current θ estimate (van der Linden & Pashley, 2009), and ability estimation is typically by maximum likelihood, weighted likelihood, or expected a posteriori (EAP) procedures with a noninformative or weakly informative prior.
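
To make the loop concrete, here is a minimal sketch of one step of a standard unidimensional CAT under the 2PL model: EAP estimation by grid quadrature, then greedy maximum-information item selection. The item parameters, the N(0,1) prior, and the grid are illustrative assumptions, not details from the paper.

```python
# One step of a unidimensional CAT under the 2PL model (illustrative sketch).
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Item information at ability theta under the 2PL."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def eap_estimate(responses, items, grid=np.linspace(-4, 4, 161)):
    """EAP ability estimate and posterior SE under a N(0, 1) prior."""
    posterior = np.exp(-0.5 * grid**2)          # unnormalized prior density
    for x, (a, b) in zip(responses, items):
        p = p_correct(grid, a, b)
        posterior *= p**x * (1.0 - p)**(1 - x)  # Bayesian update per response
    posterior /= posterior.sum()
    theta_hat = np.sum(grid * posterior)
    se = np.sqrt(np.sum((grid - theta_hat)**2 * posterior))
    return theta_hat, se

def select_next_item(theta_hat, bank, administered):
    """Greedy selection: unadministered item with maximum Fisher information."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_information(theta_hat, *bank[i]))

# Hypothetical 5-item bank of (a, b) parameter pairs.
bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.2), (1.3, -0.3)]
theta_hat, se = eap_estimate(responses=[1, 1], items=[bank[0], bank[2]])
print(theta_hat, se, select_next_item(theta_hat, bank, administered={0, 2}))
```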

The unidimensional CAT framework breaks down in two distinct ways. The first is when the construct is genuinely multidimensional: a single test measures multiple correlated abilities simultaneously (e.g., a math test that taps both procedural fluency and abstract reasoning). For this case, adaptive testing based on multidimensional IRT (MIRT) has been developed (Reckase, 2009); it explicitly models the multidimensional ability vector and selects items to maximize information across all dimensions simultaneously.

The second case—the one Anselmi et al. (2023) target—is more common in practice but less developed theoretically: multiple separate unidimensional tests administered as a battery. Each test is internally unidimensional and has its own item bank; the abilities are distinct constructs but typically correlated. Examples include cognitive-ability batteries (verbal, quantitative, spatial subtests), educational achievement batteries (reading, math, science subtests), and clinical screening batteries (depression, anxiety, somatic symptoms). The standard practice has been to run each subtest as an independent unidimensional CAT, ignoring the cross-subtest correlations during testing. This wastes information.

The Anselmi-Robusto-Cristante procedure

The 2023 paper’s contribution is a CAT procedure that updates ability estimates jointly across the battery rather than independently. The mechanism (a code sketch follows the list):

  1. An empirical prior over the battery’s ability vector is maintained throughout testing. This prior reflects the joint distribution of all the battery’s abilities, including their correlations with each other.
  2. When an examinee responds to an item from any subtest, the response updates that subtest’s ability estimate through the standard Bayesian posterior calculation.
  3. The updated estimate is then propagated to the prior on every other ability in the battery, with the propagation strength determined by the empirical correlations among abilities.
  4. Subsequent item selections in any subtest use the updated joint prior, so each subtest’s CAT effectively benefits from information accumulated in the other subtests.
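
The sketch below realizes steps 1–4 under a multivariate-normal approximation of the joint prior. It is one plausible implementation, not the authors’ exact updating equations: the propagation step uses standard conditional-normal (Kalman-style) regression of every ability on the one just measured.

```python
# Joint-prior update for a battery of unidimensional CATs (illustrative
# sketch under a multivariate-normal approximation; not the authors'
# exact algorithm).
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def update_joint_prior(mean, cov, k, a, b, x, grid=np.linspace(-4, 4, 161)):
    """Update the joint N(mean, cov) prior over all abilities after a
    response x (0 or 1) to item (a, b) from subtest k."""
    m_k, v_k = mean[k], cov[k, k]
    # Step 2: grid posterior for the answered subtest's ability alone.
    density = np.exp(-0.5 * (grid - m_k) ** 2 / v_k)
    p = p_correct(grid, a, b)
    density *= p if x == 1 else (1.0 - p)
    density /= density.sum()
    m_new = np.sum(grid * density)
    v_new = np.sum((grid - m_new) ** 2 * density)
    # Step 3: propagate to every other ability in proportion to covariance.
    gain = cov[:, k] / v_k                      # regression slopes on theta_k
    new_mean = mean + gain * (m_new - m_k)
    new_cov = cov - np.outer(gain, cov[k, :]) * (1.0 - v_new / v_k)
    return new_mean, new_cov

# Two abilities correlated at r = 0.6; a correct answer to a hard
# quantitative item (subtest 0) pulls the verbal prior (index 1) upward.
mean, cov = np.zeros(2), np.array([[1.0, 0.6], [0.6, 1.0]])
mean, cov = update_joint_prior(mean, cov, k=0, a=1.2, b=1.0, x=1)
print(mean)   # verbal mean now > 0 before any verbal item is administered
```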

The intuition is straightforward. Suppose verbal and quantitative abilities correlate at r = 0.6 in the population, and an examinee answers three quantitative items correctly at high difficulty. Under standard independent CATs, the verbal subtest starts from scratch with a flat prior. Under the proposed procedure, the verbal subtest starts with a prior shifted toward higher ability—not because the examinee has demonstrated verbal ability directly, but because the inter-ability correlation makes high verbal ability more likely given the observed quantitative performance. The verbal subtest can then converge faster and select more diagnostic items earlier.
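
In the bivariate normal case this shift has a closed form (a textbook property of the normal distribution, not a formula quoted from the paper). Treating the current quantitative estimate as if it were the true ability, for standardized abilities with correlation ρ:

$$\theta_v \mid \hat{\theta}_q \sim N\!\left(\rho\,\hat{\theta}_q,\; 1-\rho^2\right)$$

With ρ = 0.6 and a quantitative estimate of 1.5, the verbal CAT starts from a prior centered at 0.9 with variance 0.64 rather than a standard normal prior, so its first items can already be targeted above the population mean.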

The simulation findings

The 2023 paper presents two simulation studies comparing the proposed procedure against the standard independent-CAT baseline. The key findings:

Fixed-length CATs. When the total test length is fixed (the analyst must administer N items per subtest regardless of precision), the proposed procedure produces more accurate ability estimates—lower bias and lower mean squared error against the true ability—than the independent-CAT baseline. The accuracy gain is concentrated in the early items of each subtest, where the cross-subtest information from the joint prior is most informative; late items contribute roughly equivalent information under both procedures.

Variable-length CATs. When the test terminates upon reaching a target measurement precision (typically a posterior SE threshold), the proposed procedure produces shorter tests for the same precision target. Test-length reduction depends on inter-ability correlations and termination criterion strength, but the direction of the effect is consistent: harvesting inter-subtest information allows each subtest to reach its precision target with fewer items.
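
The stopping rule itself is a one-line check; here is a sketch with an illustrative SE threshold and item budget (the specific values are assumptions, not the paper’s settings):

```python
# Posterior-SE stopping rule for a variable-length CAT (illustrative values).
SE_TARGET = 0.3    # target posterior standard error
MAX_ITEMS = 30     # safety cap on subtest length

def should_stop(posterior_se: float, n_administered: int) -> bool:
    """Terminate when precision is reached or the item budget is exhausted."""
    return posterior_se <= SE_TARGET or n_administered >= MAX_ITEMS
```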

The correlation-dependence pattern. Both accuracy gains (in fixed-length) and length reductions (in variable-length) scale with the inter-ability correlation. When abilities are weakly correlated (r < 0.3), the proposed procedure offers only modest improvements; the prior-propagation mechanism has little to work with. When abilities are strongly correlated (r > 0.6), gains can be substantial—on the order of 15–25% test-length reduction or comparable RMSE improvements. The boundary case of r = 0 reduces the procedure to the independent-CAT baseline.

Why this is methodologically interesting

The proposed procedure occupies a useful middle ground between two existing alternatives. True MIRT-CAT (Reckase, 2009) requires fitting a multidimensional IRT model to the entire battery, with all items calibrated against the multidimensional ability vector. This is theoretically clean but operationally demanding: the calibration sample must be large enough to estimate the multidimensional structure, the item parameters are more numerous, and the fitting process is computationally heavier. Many real-world test batteries cannot meet these requirements.

Independent unidimensional CAT is operationally simple but informationally wasteful when subtest abilities are correlated. The Anselmi et al. (2023) procedure preserves the operational simplicity of independent unidimensional CAT—each subtest still has its own unidimensional item bank with conventional unidimensional calibration—while recovering most of the cross-subtest information advantage that MIRT-CAT provides. The cost is the requirement to know (or estimate from a calibration sample) the inter-ability correlations.

The procedure is also backward-compatible with existing CAT infrastructure. A test publisher who has already deployed independent unidimensional subtest CATs can adopt the Anselmi et al. procedure without re-calibrating items; only the ability-estimation and prior-propagation logic needs updating. This deployment ease is the kind of feature that determines whether a methodological proposal gets adopted at scale.

Practical considerations and limits

Three practical considerations qualify the procedure’s applicability.

Reliability of the inter-ability correlations. The procedure’s gain depends on accurate estimates of how the battery’s abilities correlate. These estimates come from a calibration sample that is itself subject to sampling error. Anselmi et al. discuss this sensitivity briefly; the operational implication is that calibration samples for batteries using this procedure should be large enough to estimate the correlation matrix with reasonable precision, particularly when the battery has many subtests (the number of correlations grows quadratically with subtest count).
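
Two standard statistical facts make the sample-size demand concrete (general results, not derivations from the paper): a battery of K subtests has K(K−1)/2 pairwise correlations, and the Fisher z-transform of a sample correlation has an approximate standard error of 1/√(n−3):

$$\binom{K}{2} = \frac{K(K-1)}{2}, \qquad \mathrm{SE}\!\left[\operatorname{arctanh}(\hat{\rho})\right] \approx \frac{1}{\sqrt{n-3}}$$

A 10-subtest battery therefore has 45 correlations to estimate, and pinning each down to about ±0.05 on the z scale already calls for a calibration sample of roughly n ≈ 400.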

Item-bank exposure. CAT systems typically include item-exposure controls to prevent overuse of high-information items, which would compromise test security. The Anselmi et al. procedure changes the distribution of which items get selected (because the joint prior shifts the starting point and the running estimate trajectory), and item-exposure profiles will need to be re-evaluated under the new procedure rather than assumed to match those of the independent-CAT baseline.

Construct-validity assumption. The procedure assumes that the inter-subtest correlations are stable population properties—that they don’t vary systematically across subgroups in ways that would bias the prior propagation differentially. In practice, intelligence-test correlation matrices have been shown to vary modestly across age, sex, and cultural groups; whether the procedure’s gains are robust to this variation, or whether subgroup-specific calibration is needed, is an empirical question the simulations cannot answer.

Position in the broader CAT methodology landscape

The Anselmi et al. (2023) paper fits into a research program that has been adapting CAT methodology to increasingly realistic measurement situations. The original Wainer et al. (2000) framework was unidimensional and assumed item parameters known with certainty. Subsequent work has relaxed these assumptions in various directions: handling item-parameter uncertainty (van der Linden & Pashley, 2009), accommodating multidimensional constructs (Reckase, 2009), incorporating polytomous item types, and adapting termination rules to balance precision against item-exposure constraints.

The 2023 contribution targets a specific use case—the multi-test battery—that has been operationally widespread but methodologically under-served. By staying within the unidimensional-per-subtest calibration framework while exploiting inter-subtest information, it fills a practically important gap.

Frequently asked questions

What is computerized adaptive testing (CAT)?

Computerized adaptive testing selects each successive item based on the examinee’s running ability estimate, terminating when measurement precision reaches a target threshold or when a fixed item count is exhausted. Item selection typically maximizes Fisher information at the current ability estimate, and ability is estimated by maximum likelihood, weighted likelihood, or Bayesian (EAP) procedures.

What is a battery of unidimensional tests?

A battery of unidimensional tests is a set of separate tests, each measuring its own internally unidimensional construct, administered together as a unit. Examples include cognitive-ability batteries (verbal, quantitative, spatial), educational achievement batteries (reading, math, science), and clinical screening batteries (depression, anxiety, somatic symptoms). The subtests are distinct constructs but typically correlate substantially with one another.

How does the Anselmi-Robusto-Cristante procedure differ from standard CAT?

Standard practice runs each subtest in a battery as an independent unidimensional CAT, ignoring cross-subtest correlations. The Anselmi et al. (2023) procedure maintains an empirical prior over the joint ability vector, updates each subtest’s ability estimate after every response, and propagates the update to the priors of all other subtests using the empirical inter-ability correlations. Subsequent item selections in any subtest then benefit from information accumulated in the others.

How much does the procedure improve test efficiency?

Gains depend on inter-subtest correlations. With weak correlations (r < 0.3) the improvement is modest. With strong correlations (r > 0.6) gains can reach 15–25% test-length reduction in variable-length CATs or comparable RMSE improvements in fixed-length CATs. The procedure reduces to standard independent CAT when correlations are zero.

Does the procedure require multidimensional IRT calibration?

No. Each subtest’s items are calibrated under conventional unidimensional IRT, exactly as for an independent CAT. The only additional input is an estimate of the inter-ability correlation matrix, typically from the calibration sample. This is what makes the procedure operationally lighter than full multidimensional IRT-CAT (Reckase, 2009).

Can the procedure be added to an existing CAT system?

Yes, with caveats. The item-bank calibration does not change, so existing item parameters can be reused. The ability-estimation and prior-propagation logic must be updated, item-exposure profiles must be re-evaluated under the new selection trajectory, and the correlation matrix must be estimated with sufficient precision—particularly when the battery has many subtests, since the number of pairwise correlations grows quadratically.

References

Anselmi, P., Robusto, E., & Cristante, F. (2023). [Paper proposing the empirical-prior CAT procedure for batteries of unidimensional tests]. Applied Psychological Measurement.

Reckase, M. D. (2009). Multidimensional item response theory. Springer.

van der Linden, W. J., & Pashley, P. J. (2009). Item selection and ability estimation in adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing. Springer.

Wainer, H. (Ed.). (2000). Computerized adaptive testing: A primer (2nd ed.). Lawrence Erlbaum Associates.

