Lawrence T. DeCarlo’s recent article introduces a psychological framework for mixed-format exams, combining signal detection theory (SDT) for multiple-choice items and item response theory (IRT) for open-ended items. This fusion allows for a unified model that captures the nuances of each item type while providing insights into the underlying cognitive processes of examinees.
Background
Mixed-format exams, commonly used in large-scale assessments, present a challenge for researchers seeking to model responses across different item types. Historically, multiple-choice items have been analyzed using frameworks like signal detection theory, while open-ended items are typically modeled using item response theory. DeCarlo’s work builds on these approaches, introducing a method to unify them through the probability of knowing, a concept that bridges both models.
Key Insights
- Unified Framework: The article demonstrates how the SDT choice model and the IRT sequential logit model can be integrated into a single framework. This approach captures latent states such as “know” and “don’t know” to analyze responses across item types (a simplified sketch of this latent-state idea follows this list).
- Psychological Processes: By modeling both item types simultaneously, the approach highlights differences in the cognitive processes involved in multiple-choice and open-ended responses. This sheds light on how examinees interact with each type of item.
- Estimation Benefits: Fitting the SDT and IRT models together offers potential computational advantages and allows for the examination of shared covariates, improving the overall utility of the framework.
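As a rough illustration of the latent-state idea above (not DeCarlo’s exact parameterization), the Python sketch below treats each multiple-choice response as a mixture of a “know” state and a guessing state, and each open-ended response as a simple logistic function of ability. The difficulty values, the blind-guessing assumption, and the logistic link for the “know” probability are all illustrative simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_know(theta, difficulty):
    """Probability of the latent 'know' state (simplified logistic link)."""
    return 1.0 / (1.0 + np.exp(-(theta - difficulty)))

def simulate_multiple_choice(theta, difficulty, n_options=4):
    """Correct if in the 'know' state; otherwise guess among the options.
    DeCarlo's SDT choice model replaces blind guessing with a comparison of
    perceived option strengths; random guessing keeps this sketch short."""
    if rng.random() < p_know(theta, difficulty):
        return 1
    return int(rng.random() < 1.0 / n_options)

def simulate_open_ended(theta, difficulty):
    """Open-ended item: success driven directly by the same latent ability."""
    return int(rng.random() < p_know(theta, difficulty))

theta = 0.5                      # illustrative examinee ability
mc_difficulties = [-1.0, 0.0, 1.0]
oe_difficulties = [-0.5, 0.5]

responses = {
    "multiple_choice": [simulate_multiple_choice(theta, b) for b in mc_difficulties],
    "open_ended": [simulate_open_ended(theta, b) for b in oe_difficulties],
}
print(responses)
```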
Significance
This fusion of SDT and IRT models represents a significant step forward in psychometric analysis. By addressing the differences and connections between item types, the framework provides a deeper understanding of examinee behavior. This has implications for designing fairer and more reliable assessments, particularly in international exams where mixed-format tests are prevalent.
Future Directions
Future research could focus on expanding the application of this model to other testing contexts, including formative assessments or specialized exams. Additionally, exploring how this framework performs with diverse populations and item designs could further validate its effectiveness and versatility.
Conclusion
DeCarlo’s work offers a robust framework for analyzing mixed-format exams by integrating SDT and IRT models. This unified approach not only enhances our understanding of psychological processes in test-taking but also opens the door to more comprehensive and equitable assessments.
Reference:
DeCarlo, L. T. (2024). Fused SDT/IRT Models for Mixed-Format Exams. Educational and Psychological Measurement, 84(6), 1076-1106. https://doi.org/10.1177/00131644241235333
Modern Intelligence Testing: Principles and Practice
Intelligence testing has evolved significantly since Alfred Binet developed the first practical IQ test in 1905. Modern instruments like the Wechsler scales (WAIS-V for adults, WISC-V for children) and the Stanford-Binet Intelligence Scales (SB5) are built on decades of psychometric research, normative data collection, and factor-analytic refinement.
Key Takeaways
- Computerized adaptive testing typically achieves the same measurement precision as a fixed-length test using 50-80% fewer items.
- Major IQ tests achieve internal consistency coefficients above 0.95 for composite scores and test-retest reliability above 0.90, making them among the most reliable instruments in all of psychology.
Contemporary IQ tests typically measure multiple cognitive domains organized according to the Cattell-Horn-Carroll (CHC) theory of cognitive abilities. Rather than producing a single number, they provide a profile of strengths and weaknesses across domains such as verbal comprehension, fluid reasoning, working memory, processing speed, and visual-spatial processing. This profile approach is more clinically useful than a single Full Scale IQ score, as it can identify specific learning disabilities, cognitive strengths, and patterns associated with various neurological conditions.
Test reliability — the consistency of measurement — is a critical quality indicator. Major IQ tests achieve internal consistency coefficients above 0.95 for composite scores and test-retest reliability above 0.90, making them among the most reliable instruments in all of psychology. However, reliability does not guarantee validity: ongoing research examines whether these tests adequately capture the full range of cognitive abilities valued across different cultures and contexts.
Implications for Test Users and Practitioners
These findings have direct implications for professionals who administer, interpret, or rely on cognitive test results. Clinicians should report confidence intervals alongside point estimates, use profile analysis to identify meaningful strengths and weaknesses rather than relying solely on Full Scale IQ, and consider the measurement properties of the specific subtests being interpreted. Score differences that fall within the standard error of measurement should not be over-interpreted as meaningful patterns.
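As a worked illustration of that advice, the sketch below computes the standard error of measurement, SEM = SD × sqrt(1 − reliability), and an observed-score 95% confidence interval. The IQ standard deviation of 15 and reliability of .95 match the figures cited above; the observed score of 112 is illustrative, and some test manuals instead center the interval on an estimated true score.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Symmetric interval around the observed score (z = 1.96 for ~95%)."""
    e = sem(sd, reliability)
    return observed - z * e, observed + z * e

# Illustrative values: IQ metric (mean 100, SD 15) and composite reliability of .95.
score = 112
low, high = confidence_interval(score, sd=15, reliability=0.95)
print(f"SEM = {sem(15, 0.95):.2f} IQ points")          # about 3.4 points
print(f"95% CI for {score}: {low:.1f} to {high:.1f}")  # about 105.4 to 118.6
```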
For organizational contexts (educational placement, employment selection, forensic evaluation), understanding measurement properties helps prevent both over-reliance on test scores and inappropriate dismissal of their utility. The best practice is to integrate cognitive test results with other sources of information — behavioral observations, developmental history, academic records, and adaptive functioning — rather than making high-stakes decisions based on any single score.
Frequently Asked Questions
What is item response theory?
Item Response Theory (IRT) is a modern psychometric framework that models the relationship between a person’s latent ability and their probability of answering test items correctly. Unlike classical test theory, IRT provides item-level analysis, enables computerized adaptive testing, and allows test scores to be compared across different test forms.
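For instance, under the two-parameter logistic (2PL) model, one widely used IRT model, the probability of a correct response is a logistic function of the gap between ability theta and item difficulty b, scaled by the item's discrimination a. The parameter values below are illustrative.

```python
import numpy as np

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Illustrative item: discrimination a = 1.2, difficulty b = 0.5
for theta in (-1.0, 0.0, 1.0):
    print(f"theta = {theta:+.1f}: P(correct) = {p_correct_2pl(theta, 1.2, 0.5):.2f}")
```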
How does computerized adaptive testing work?
Computerized adaptive testing (CAT) uses IRT to select test items in real-time based on the test-taker’s responses. After each answer, the algorithm estimates ability and selects the next item that provides maximum information at that ability level. This typically achieves the same measurement precision as a fixed test using 50-80% fewer items.
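A minimal sketch of that selection logic is shown below, using the 2PL item information function I(theta) = a² · P · (1 − P) and a toy item bank. Real CAT engines add proper ability estimation (maximum likelihood or Bayesian), exposure control, content balancing, and stopping rules; the fixed-step ability update here is only a placeholder.

```python
import numpy as np

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a ** 2 * p * (1.0 - p)

# Toy item bank: (discrimination, difficulty) pairs -- illustrative values.
bank = [(1.0, -1.5), (1.4, -0.5), (0.8, 0.0), (1.6, 0.7), (1.2, 1.8)]

rng = np.random.default_rng(1)
theta_hat = 0.0          # provisional ability estimate
administered = set()

for step in range(3):
    # Pick the unused item with maximum information at the current estimate.
    candidates = [i for i in range(len(bank)) if i not in administered]
    next_item = max(candidates, key=lambda i: item_information(theta_hat, *bank[i]))
    administered.add(next_item)

    # Simulate a response and nudge the estimate up or down by a fixed step.
    # A real CAT would re-estimate theta from all responses collected so far.
    response = rng.random() < p_2pl(0.3, *bank[next_item])
    theta_hat += 0.5 if response else -0.5
    print(f"step {step + 1}: item {next_item}, response {int(response)}, theta_hat = {theta_hat:+.1f}")
```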
People Also Ask
How can response process data help interpret differential item functioning?
Understanding differential item functioning (DIF) is critical for ensuring fairness in assessments across diverse groups. A recent study by Li et al. introduces a method to enhance the interpretability of DIF items by incorporating response process data. This approach aims to improve equity in measurement by examining how participants engage with test items, providing deeper insights into the factors influencing DIF outcomes.
What are group-theoretical symmetries in item response theory (IRT)?
Item Response Theory (IRT) is a widely adopted framework in psychological and educational assessments, used to model the relationship between latent traits and observed responses. This recent work introduces an innovative approach that incorporates group-theoretic symmetry constraints, offering a refined methodology for estimating IRT parameters with greater precision and efficiency.
What is the theoretical framework for a Bayesian hierarchical 2PLM with ADVI?
This article discusses a Bayesian hierarchical framework for the Two-Parameter Logistic (2PL) Item Response Theory (IRT) model. By introducing hierarchical priors for both respondent abilities and item parameters, this method offers a detailed perspective on latent traits. Additionally, the use of Automatic Differentiation Variational Inference (ADVI) makes the approach scalable and practical for larger datasets.
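A minimal sketch of such a model, written with the PyMC library’s ADVI routine, is shown below. This is not the article’s implementation: the hierarchical prior on item difficulties, the log-normal prior on discriminations, the placeholder data, and the iteration count are all assumptions made for illustration.

```python
import numpy as np
import pymc as pm

n_persons, n_items = 200, 20
rng = np.random.default_rng(7)

# Placeholder response matrix so the sketch runs end to end; substitute real
# data (or a simulated 2PL dataset such as the generator sketched further below).
y = rng.binomial(1, 0.5, size=(n_persons, n_items))
person_idx, item_idx = np.indices((n_persons, n_items)).reshape(2, -1)
y_flat = y.reshape(-1)

with pm.Model():
    # Hierarchical prior on item difficulties; positive discriminations.
    mu_b = pm.Normal("mu_b", 0.0, 1.0)
    sigma_b = pm.HalfNormal("sigma_b", 1.0)
    b = pm.Normal("b", mu_b, sigma_b, shape=n_items)
    a = pm.LogNormal("a", 0.0, 0.5, shape=n_items)
    theta = pm.Normal("theta", 0.0, 1.0, shape=n_persons)   # person abilities

    logit_p = a[item_idx] * (theta[person_idx] - b[item_idx])
    pm.Bernoulli("obs", logit_p=logit_p, observed=y_flat)

    # Automatic Differentiation Variational Inference instead of MCMC sampling.
    approx = pm.fit(n=20_000, method="advi")
    idata = approx.sample(1_000)
```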
What is the Simulated IRT Dataset Generator v1.00 at Cogn-IQ.org?
The Dataset Generator available at Cogn-IQ.org is a powerful resource designed for researchers and practitioners working with Item Response Theory (IRT). This tool simulates datasets tailored for psychometric analysis, enabling users to explore a range of testing scenarios with customizable item and subject characteristics. It supports the widely used 2-Parameter Logistic (2PL) model, providing flexibility and precision for diverse applications.
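The sketch below is not the Cogn-IQ.org tool itself; it simply illustrates what a 2PL dataset generator does: draw person abilities and item parameters from chosen distributions, then sample a binary response matrix, returning the true parameters so that recovery studies can compare estimates against them. All default sizes and distributions are adjustable assumptions.

```python
import numpy as np

def generate_2pl_dataset(n_persons=500, n_items=30, seed=123,
                         ability_sd=1.0, difficulty_sd=1.0,
                         discrimination_mean=0.0, discrimination_sd=0.3):
    """Simulate a binary response matrix under the 2PL model.

    Returns the responses along with the true person and item parameters
    so estimates can later be checked against the generating values.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, ability_sd, size=n_persons)               # abilities
    b = rng.normal(0.0, difficulty_sd, size=n_items)                  # difficulties
    a = rng.lognormal(discrimination_mean, discrimination_sd, size=n_items)  # discriminations
    p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))               # persons x items
    responses = rng.binomial(1, p)
    return {"responses": responses, "theta": theta, "a": a, "b": b}

data = generate_2pl_dataset(n_persons=100, n_items=10)
print(data["responses"].shape, round(data["responses"].mean(), 2))
```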

