Computerized Adaptive Testing: Exploring Enhanced Techniques

Published: May 16, 2023
The Anselmi, Robusto, and Cristante (2023) paper in Applied Psychological Measurement proposes a methodological refinement for a setting that sits awkwardly between two well-developed bodies of psychometric theory: the battery of unidimensional tests, in which each test measures a single ability but the abilities are correlated across tests. The standard CAT machinery treats each test in such a battery as if its ability were independent of the others, even though in practice the abilities co-vary substantially. The proposed procedure exploits these correlations by feeding the running estimates of all the battery’s abilities into the prior used at each step of every test. The result, demonstrated in two simulation studies, is more accurate ability estimates in fixed-length CATs and shorter tests in variable-length CATs, with gains that scale directly with the strength of the inter-ability correlations.

The CAT design space and where unidimensional batteries fit

Computerized adaptive testing originated in the 1970s and matured through the work synthesized by Wainer et al. (2000). It exploits item response theory to select each successive item based on the examinee’s running ability estimate, terminating when measurement precision reaches a target threshold or when a fixed item count is exhausted. The standard formulation is fully unidimensional: one ability θ, one item bank, one termination criterion. Item selection is governed by maximizing Fisher information at the current θ estimate (van der Linden & Pashley, 2009), and ability estimation is typically by maximum likelihood, weighted likelihood, or expected a posteriori (EAP) procedures with a noninformative or weakly informative prior.
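
To make the loop concrete, here is a minimal sketch of one step of a standard unidimensional CAT under the 2PL model: EAP estimation by grid quadrature, then greedy maximum-information item selection. The item parameters, the N(0,1) prior, and the grid are illustrative assumptions, not details from the paper.

```python
# One step of a unidimensional CAT under the 2PL model (illustrative sketch).
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Item information at ability theta under the 2PL."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def eap_estimate(responses, items, grid=np.linspace(-4, 4, 161)):
    """EAP ability estimate and posterior SE under a N(0, 1) prior."""
    posterior = np.exp(-0.5 * grid**2)          # unnormalized prior density
    for x, (a, b) in zip(responses, items):
        p = p_correct(grid, a, b)
        posterior *= p**x * (1.0 - p)**(1 - x)  # Bayesian update per response
    posterior /= posterior.sum()
    theta_hat = np.sum(grid * posterior)
    se = np.sqrt(np.sum((grid - theta_hat)**2 * posterior))
    return theta_hat, se

def select_next_item(theta_hat, bank, administered):
    """Greedy selection: unadministered item with maximum Fisher information."""
    candidates = [i for i in range(len(bank)) if i not in administered]
    return max(candidates, key=lambda i: fisher_information(theta_hat, *bank[i]))

# Hypothetical 5-item bank of (a, b) parameter pairs.
bank = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.2), (1.3, -0.3)]
theta_hat, se = eap_estimate(responses=[1, 1], items=[bank[0], bank[2]])
print(theta_hat, se, select_next_item(theta_hat, bank, administered={0, 2}))
```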

The unidimensional CAT framework breaks down in two distinct ways. The first is when the construct is genuinely multidimensional: a single test measures multiple correlated abilities simultaneously (e.g., a math test that taps both procedural fluency and abstract reasoning). For this case, adaptive testing based on multidimensional IRT (MIRT) has been developed (Reckase, 2009); it explicitly models the multidimensional ability vector and selects items to maximize information across all dimensions simultaneously.

The second case—the one Anselmi et al. (2023) target—is more common in practice but less developed theoretically: multiple separate unidimensional tests administered as a battery. Each test is internally unidimensional and has its own item bank; the abilities are distinct constructs but typically correlated. Examples include cognitive-ability batteries (verbal, quantitative, spatial subtests), educational achievement batteries (reading, math, science subtests), and clinical screening batteries (depression, anxiety, somatic symptoms). The standard practice has been to run each subtest as an independent unidimensional CAT, ignoring the cross-subtest correlations during testing. This wastes information.

The Anselmi-Robusto-Cristante procedure

The 2023 paper’s contribution is a CAT procedure that updates ability estimates jointly across the battery rather than independently. The mechanism (a code sketch follows the list):

  1. An empirical prior over the battery’s ability vector is maintained throughout testing. This prior reflects the joint distribution of all the battery’s abilities, including their correlations with each other.
  2. When an examinee responds to an item from any subtest, the response updates that subtest’s ability estimate through the standard Bayesian posterior calculation.
  3. The updated estimate is then propagated to the prior on every other ability in the battery, with the propagation strength determined by the empirical correlations among abilities.
  4. Subsequent item selections in any subtest use the updated joint prior, so each subtest’s CAT effectively benefits from information accumulated in the other subtests.
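
The sketch below realizes steps 1–4 under a multivariate-normal approximation of the joint prior. It is one plausible implementation, not the authors’ exact updating equations: the propagation step uses standard conditional-normal (Kalman-style) regression of every ability on the one just measured.

```python
# Joint-prior update for a battery of unidimensional CATs (illustrative
# sketch under a multivariate-normal approximation; not the authors'
# exact algorithm).
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def update_joint_prior(mean, cov, k, a, b, x, grid=np.linspace(-4, 4, 161)):
    """Update the joint N(mean, cov) prior over all abilities after a
    response x (0 or 1) to item (a, b) from subtest k."""
    m_k, v_k = mean[k], cov[k, k]
    # Step 2: grid posterior for the answered subtest's ability alone.
    density = np.exp(-0.5 * (grid - m_k) ** 2 / v_k)
    p = p_correct(grid, a, b)
    density *= p if x == 1 else (1.0 - p)
    density /= density.sum()
    m_new = np.sum(grid * density)
    v_new = np.sum((grid - m_new) ** 2 * density)
    # Step 3: propagate to every other ability in proportion to covariance.
    gain = cov[:, k] / v_k                      # regression slopes on theta_k
    new_mean = mean + gain * (m_new - m_k)
    new_cov = cov - np.outer(gain, cov[k, :]) * (1.0 - v_new / v_k)
    return new_mean, new_cov

# Two abilities correlated at r = 0.6; a correct answer to a hard
# quantitative item (subtest 0) pulls the verbal prior (index 1) upward.
mean, cov = np.zeros(2), np.array([[1.0, 0.6], [0.6, 1.0]])
mean, cov = update_joint_prior(mean, cov, k=0, a=1.2, b=1.0, x=1)
print(mean)   # verbal mean now > 0 before any verbal item is administered
```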

The intuition is straightforward. Suppose verbal and quantitative abilities correlate at r = 0.6 in the population, and an examinee answers three quantitative items correctly at high difficulty. Under standard independent CATs, the verbal subtest starts from scratch with a flat prior. Under the proposed procedure, the verbal subtest starts with a prior shifted toward higher ability—not because the examinee has demonstrated verbal ability directly, but because the inter-ability correlation makes high verbal ability more likely given the observed quantitative performance. The verbal subtest can then converge faster and select more diagnostic items earlier.
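
In the bivariate normal case this shift has a closed form (a textbook property of the normal distribution, not a formula quoted from the paper). Treating the current quantitative estimate as if it were the true ability, for standardized abilities with correlation ρ:

$$\theta_v \mid \hat{\theta}_q \sim N\!\left(\rho\,\hat{\theta}_q,\; 1-\rho^2\right)$$

With ρ = 0.6 and a quantitative estimate of 1.5, the verbal CAT starts from a prior centered at 0.9 with variance 0.64 rather than a standard normal prior, so its first items can already be targeted above the population mean.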

The simulation findings

The 2023 paper presents two simulation studies comparing the proposed procedure against the standard independent-CAT baseline. The key findings:

Fixed-length CATs. When the total test length is fixed (the analyst must administer N items per subtest regardless of precision), the proposed procedure produces more accurate ability estimates—lower bias and lower mean squared error against the true ability—than the independent-CAT baseline. The accuracy gain is concentrated in the early items of each subtest, where the cross-subtest information from the joint prior is most informative; late items contribute roughly equivalent information under both procedures.

Variable-length CATs. When the test terminates upon reaching a target measurement precision (typically a posterior SE threshold), the proposed procedure produces shorter tests for the same precision target. Test-length reduction depends on inter-ability correlations and termination criterion strength, but the direction of the effect is consistent: harvesting inter-subtest information allows each subtest to reach its precision target with fewer items.
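
The stopping rule itself is a one-line check; here is a sketch with an illustrative SE threshold and item budget (the specific values are assumptions, not the paper’s settings):

```python
# Posterior-SE stopping rule for a variable-length CAT (illustrative values).
SE_TARGET = 0.3    # target posterior standard error
MAX_ITEMS = 30     # safety cap on subtest length

def should_stop(posterior_se: float, n_administered: int) -> bool:
    """Terminate when precision is reached or the item budget is exhausted."""
    return posterior_se <= SE_TARGET or n_administered >= MAX_ITEMS
```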

The correlation-dependence pattern. Both accuracy gains (in fixed-length) and length reductions (in variable-length) scale with the inter-ability correlation. When abilities are weakly correlated (r < 0.3), the proposed procedure offers only modest improvements; the prior-propagation mechanism has little to work with. When abilities are strongly correlated (r > 0.6), gains can be substantial—on the order of 15–25% test-length reduction or comparable RMSE improvements. The boundary case of r = 0 reduces the procedure to the independent-CAT baseline.

Why this is methodologically interesting

The proposed procedure occupies a useful middle ground between two existing alternatives. True MIRT-CAT (Reckase, 2009) requires fitting a multidimensional IRT model to the entire battery, with all items calibrated against the multidimensional ability vector. This is theoretically clean but operationally demanding: the calibration sample must be large enough to estimate the multidimensional structure, the item parameters are more numerous, and the fitting process is computationally heavier. Many real-world test batteries cannot meet these requirements.

Independent unidimensional CAT is operationally simple but informationally wasteful when subtest abilities are correlated. The Anselmi et al. (2023) procedure preserves the operational simplicity of independent unidimensional CAT—each subtest still has its own unidimensional item bank with conventional unidimensional calibration—while recovering most of the cross-subtest information advantage that MIRT-CAT provides. The cost is the requirement to know (or estimate from a calibration sample) the inter-ability correlations.

The procedure is also backward-compatible with existing CAT infrastructure. A test publisher who has already deployed independent unidimensional subtest CATs can adopt the Anselmi et al. procedure without re-calibrating items; only the ability-estimation and prior-propagation logic needs updating. This deployment ease is the kind of feature that determines whether a methodological proposal gets adopted at scale.

Practical considerations and limits

Three practical considerations qualify the procedure’s applicability.

Reliability of the inter-ability correlations. The procedure’s gain depends on accurate estimates of how the battery’s abilities correlate. These estimates come from a calibration sample that is itself subject to sampling error. Anselmi et al. discuss this sensitivity briefly; the operational implication is that calibration samples for batteries using this procedure should be large enough to estimate the correlation matrix with reasonable precision, particularly when the battery has many subtests (the number of correlations grows quadratically with subtest count).
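
Two standard statistical facts make the sample-size demand concrete (general results, not derivations from the paper): a battery of K subtests has K(K−1)/2 pairwise correlations, and the Fisher z-transform of a sample correlation has an approximate standard error of 1/√(n−3):

$$\binom{K}{2} = \frac{K(K-1)}{2}, \qquad \mathrm{SE}\!\left[\operatorname{arctanh}(\hat{\rho})\right] \approx \frac{1}{\sqrt{n-3}}$$

A 10-subtest battery therefore has 45 correlations to estimate, and pinning each down to about ±0.05 on the z scale already calls for a calibration sample of roughly n ≈ 400.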

Item-bank exposure. CAT systems typically include item-exposure controls to prevent overuse of high-information items, which would compromise test security. The Anselmi et al. procedure changes the distribution of which items get selected (because the joint prior shifts the starting point and the running estimate trajectory), and item-exposure profiles will need to be re-evaluated under the new procedure rather than assumed to match those of the independent-CAT baseline.

Construct-validity assumption. The procedure assumes that the inter-subtest correlations are stable population properties—that they don’t vary systematically across subgroups in ways that would bias the prior propagation differentially. In practice, intelligence-test correlation matrices have been shown to vary modestly across age, sex, and cultural groups; whether the procedure’s gains are robust to this variation, or whether subgroup-specific calibration is needed, is an empirical question the simulations cannot answer.

Position in the broader CAT methodology landscape

The Anselmi et al. (2023) paper fits into a research program that has been adapting CAT methodology to increasingly realistic measurement situations. The original Wainer et al. (2000) framework was unidimensional and assumed item parameters known with certainty. Subsequent work has relaxed these assumptions in various directions: handling item-parameter uncertainty (van der Linden & Pashley, 2009), accommodating multidimensional constructs (Reckase, 2009), incorporating polytomous item types, and adapting termination rules to balance precision against item-exposure constraints.

The 2023 contribution targets a specific use case—the multi-test battery—that has been operationally widespread but methodologically under-served. By staying within the unidimensional-per-subtest calibration framework while exploiting inter-subtest information, it fills a practically important gap.

Frequently asked questions

What is computerized adaptive testing (CAT)?

Computerized adaptive testing selects each successive item based on the examinee’s running ability estimate, terminating when measurement precision reaches a target threshold or when a fixed item count is exhausted. Item selection typically maximizes Fisher information at the current ability estimate, and ability is estimated by maximum likelihood, weighted likelihood, or Bayesian (EAP) procedures.

What is a battery of unidimensional tests?

A battery of unidimensional tests is a set of separate tests, each measuring its own internally unidimensional construct, administered together as a unit. Examples include cognitive-ability batteries (verbal, quantitative, spatial), educational achievement batteries (reading, math, science), and clinical screening batteries (depression, anxiety, somatic symptoms). The subtests are distinct constructs but typically correlate substantially with one another.

How does the Anselmi-Robusto-Cristante procedure differ from standard CAT?

Standard practice runs each subtest in a battery as an independent unidimensional CAT, ignoring cross-subtest correlations. The Anselmi et al. (2023) procedure maintains an empirical prior over the joint ability vector, updates each subtest’s ability estimate after every response, and propagates the update to the priors of all other subtests using the empirical inter-ability correlations. Subsequent item selections in any subtest then benefit from information accumulated in the others.

How much does the procedure improve test efficiency?

Gains depend on inter-subtest correlations. With weak correlations (r < 0.3) the improvement is modest. With strong correlations (r > 0.6) gains can reach 15–25% test-length reduction in variable-length CATs or comparable RMSE improvements in fixed-length CATs. The procedure reduces to standard independent CAT when correlations are zero.

Does the procedure require multidimensional IRT calibration?

No. Each subtest’s items are calibrated under conventional unidimensional IRT, exactly as for an independent CAT. The only additional input is an estimate of the inter-ability correlation matrix, typically from the calibration sample. This is what makes the procedure operationally lighter than full multidimensional IRT-CAT (Reckase, 2009).

Can the procedure be added to an existing CAT system?

Yes, with caveats. The item-bank calibration does not change, so existing item parameters can be reused. The ability-estimation and prior-propagation logic must be updated, item-exposure profiles must be re-evaluated under the new selection trajectory, and the correlation matrix must be estimated with sufficient precision—particularly when the battery has many subtests, since the number of pairwise correlations grows quadratically.

References

Anselmi, P., Robusto, E., & Cristante, F. (2023). [Paper proposing the empirical-prior CAT procedure for batteries of unidimensional tests]. Applied Psychological Measurement.

Reckase, M. D. (2009). Multidimensional item response theory. Springer.

van der Linden, W. J., & Pashley, P. J. (2009). Item selection and ability estimation in adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Elements of adaptive testing. Springer.

Wainer, H. (Ed.). (2000). Computerized adaptive testing: A primer (2nd ed.). Lawrence Erlbaum Associates.

