Psychological Measurement and Testing

Assessing Verbal Intelligence with the IAW Test

Published: April 7, 2023

Most vocabulary tests on standard intelligence batteries — the WAIS, the Stanford-Binet, the RIAS — present examinees with a target word and ask them to define it or pick the best definition from a multiple-choice list. The format is efficient, easy to score, and has decades of psychometric research behind it. It is also, in important ways, an unnatural cognitive task. Real verbal ability rarely involves selecting one of four definitions; it involves retrieving the right word for the meaning you have in mind. The I Am a Word (IAW) test, developed by Xavier Jouve and most recently revised in 2023, takes the inverse approach: present the meaning, the structural constraints, and the context, and ask the examinee to produce the word. The format change is small in description but has real psychometric and conceptual consequences.

What the IAW test actually does

The IAW test consists of 100 open-ended verbal items administered without a time limit. Each item presents structural and semantic clues — typically a definition or sentence-frame plus indications of word length, part of speech, or letter constraints — and the examinee types the target word. Several design choices distinguish it from conventional vocabulary measures:

  • Production rather than recognition. The examinee must retrieve the word from semantic memory, not select among presented alternatives. This eliminates the four-options-in-front-of-you cueing artifact that affects multiple-choice vocabulary scores.
  • No time pressure. The untimed format separates the construct of “verbal knowledge” from “verbal speed” — two related but distinguishable abilities that timed multiple-choice formats blend.
  • Multiple correct answers per item. Where natural language admits several semantically appropriate words for a single meaning slot, the scoring accepts any of the validated synonyms, reflecting how lexical access actually operates.
  • Automated scoring. Despite being open-ended, scoring is automated against a curated answer set, retaining the reliability advantages of objective scoring while accommodating linguistic flexibility.
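
The combination of the last two points can be sketched in a few lines. The following is a minimal illustration of scoring an open-ended response against a validated answer set, assuming simple string normalization; the item, the clue, and the synonym set are invented for illustration and are not drawn from the actual IAW item bank or its scoring pipeline.

```python
# Sketch of automated scoring for an open-ended vocabulary item,
# assuming each item carries a curated set of accepted synonyms.
# Item content and answer set below are invented for illustration.

def normalize(response: str) -> str:
    """Case-fold and strip whitespace so 'Ephemeral ' matches 'ephemeral'."""
    return response.strip().casefold()

def score_item(response: str, accepted: set[str]) -> int:
    """Return 1 if the normalized response is in the validated answer set."""
    return 1 if normalize(response) in accepted else 0

# Hypothetical item: "lasting a very short time (adjective)"
accepted = {"ephemeral", "evanescent", "transient"}

print(score_item("Ephemeral ", accepted))  # 1
print(score_item("temporary", accepted))   # 0
```

A production scorer would also need to handle spelling tolerance and morphological variants, but the core design choice is the same: objective set membership rather than human judgment.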

The validation work, reported in Jouve (2023) using a sample of 1,083 examinees, found a Cronbach’s alpha of .95 for internal consistency and a correlation of .83 with the Wechsler Adult Intelligence Scale–Third Edition (WAIS-III) Verbal Comprehension Index, as well as a strong correlation with the Reynolds Intellectual Assessment Scales (RIAS) Verbal Intelligence Index. These numbers locate the IAW in the same psychometric neighborhood as established verbal-ability subtests of major commercial batteries.
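
For readers who want to see what is behind the .95 figure, Cronbach’s alpha is computed directly from a persons-by-items score matrix. The sketch below uses the standard formula on a tiny invented 4-person, 3-item matrix; the IAW’s reported alpha comes from its full 100-item, 1,083-person sample.

```python
# Cronbach's alpha from a persons-by-items score matrix, using the
# standard formula: alpha = k/(k-1) * (1 - sum(item variances) / total variance).
# The 4x3 toy matrix below is invented for illustration.
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: list of per-person item-score lists (0/1 here)."""
    k = len(scores[0])                        # number of items
    items = list(zip(*scores))                # transpose to per-item columns
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)

toy = [[1, 1, 1],
       [1, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
print(round(cronbach_alpha(toy), 2))  # 0.75
```

Adding items that hang together with the rest pushes alpha upward, which is one reason a 100-item test can reach .95 while a 3-item toy sits at .75.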

The format question: why open-ended matters

The choice between multiple-choice and open-ended verbal testing is older than commercial intelligence assessment. Heim and Watts’s 1967 experiment in the British Journal of Educational Psychology directly compared the two formats on the same vocabulary content with the same examinees and reported that the formats produce non-trivially different score patterns. Multiple-choice items show systematic cueing effects: examinees can sometimes identify the correct definition by elimination of distractors, by partial recognition of the target word, or by drawing on test-wiseness rather than lexical knowledge. Open-ended formats remove these supports and arguably tap a more authentic representation of vocabulary depth.

The trade-off is in scoring: open-ended responses are harder to score reliably, particularly when the response space is large. Historically, this is why high-stakes commercial vocabulary tests have favored multiple-choice. The IAW’s automated scoring against a validated synonym set is the modern resolution to this trade-off — open-ended response acceptance with multiple-choice-grade scoring reliability.

What “verbal intelligence” actually means

Vocabulary measures are among the most reliable and most g-loaded of any cognitive task. Stanovich’s 1993 chapter in Advances in Child Development and Behavior argues that vocabulary functions as a particularly informative cognitive measure because it is, in effect, a distillation of accumulated language exposure. A child or adult’s working vocabulary reflects a long history of reading, conversation, and instruction, with each individual word a small data point in that history.

That gives vocabulary an unusual property among cognitive tests: it is highly stable, highly reliable, and substantially heritable, while also being directly responsive to environmental enrichment. The IAW, like other vocabulary measures, inherits this property. A high IAW score does not testify to fluid problem-solving in the moment but to a long arc of language exposure and retention.

In the Cattell-Horn-Carroll (CHC) framework, vocabulary tests load primarily on comprehension-knowledge (Gc), the broad ability that encompasses verbal-conceptual knowledge accumulated over time. Production-format vocabulary tests like the IAW are sometimes hypothesized to also tap long-term retrieval (Glr) components more strongly than recognition-format tests, because retrieval rather than recognition is the operative process. The IAW’s untimed format separates Glr efficiency from Glr access, which traditional speeded vocabulary tests blend.

How the IAW compares to standard battery vocabulary subtests

The validation correlation of .83 with the WAIS-III VCI is informative for what it implies about construct overlap. Two tests that correlate at .83 are measuring substantially the same underlying construct — they are not independent measures with weak overlap. Practical implications:

  • The IAW is unlikely to identify cognitive strengths or weaknesses that the WAIS VCI misses. Both tests are measuring essentially the same verbal-knowledge factor.
  • The IAW can serve as a stand-alone verbal-ability indicator. Where a full WAIS administration is impractical and only verbal ability is needed, the IAW’s psychometrics support its use as a faster, simpler alternative.
  • Score discrepancies between the IAW and a battery VCI deserve scrutiny. If a child or adult scores substantially higher on the IAW than on the WAIS VCI, the gap may reflect the format difference (production-friendly vs. recognition-friendly) and points toward strengths in retrieval or comfort with open-ended response formats.

The Reynolds Intellectual Assessment Scales (RIAS), discussed by Brueggemann, Reynolds, and Kamphaus (2006) in Gifted Education International, is one of several contemporary intelligence batteries that produce a Verbal Intelligence Index distinct from a nonverbal index. The RIAS VIX was the second criterion measure in the IAW validation, and the strong correlation with the RIAS VIX further supports the IAW’s positioning as a verbal-intelligence measure rather than a domain-specific vocabulary task.

Practical applications

Several use cases follow from the IAW’s psychometric properties:

  • Research where verbal ability is a covariate. Studies needing to control for or characterize verbal ability without administering a full IQ battery can use the IAW as a screening or matching variable.
  • Self-testing and educational settings. The untimed, open-ended format reduces the test-anxiety component that compresses scores in multiple-choice timed administration. Examinees self-pacing through open-ended items often perform at a level closer to their underlying ability.
  • Cross-population assessment. Open-ended production removes the reliance on familiarity with the multiple-choice testing format that disadvantages some examinees. The IAW is correspondingly more accessible to individuals with limited test-taking experience.
  • Tracking verbal development. Because vocabulary grows with continued language exposure, the IAW can be used to track changes in verbal ability over time, with the caveat that practice effects on the same items should be considered for short-interval retesting.

What the IAW does not do

Several boundaries on interpretation:

  • It is a verbal-knowledge measure, not a general intelligence measure. A high IAW score does not directly imply high fluid reasoning, working memory, or processing speed. Verbal abilities are heavily g-loaded but not equivalent to g.
  • It is calibrated for English speakers. Like other vocabulary tests, it is not directly translatable to other languages without renorming. Cross-language validation work is a separate undertaking.
  • It depends on reading and writing exposure. Individuals with limited literacy may underperform on the IAW relative to their underlying cognitive ability — the same caveat applies to all written vocabulary tests.
  • The validation evidence is from the test author. The .95 alpha and .83 correlation come from Jouve (2023), the test’s developer. Independent replication by other research groups is the standard expectation for a test in widespread clinical use, and as of writing the IAW has not yet accumulated that independent literature.

Open questions

Several questions remain for future work:

  • Test-retest reliability. Internal consistency (alpha) measures the homogeneity of items at a single administration; test-retest reliability captures stability across occasions. The two are related but distinct, and the test-retest properties of the IAW are not yet established at the same level of detail as internal consistency.
  • Cross-population norms. Performance norms for specific clinical populations (older adults, individuals with learning disabilities, second-language English speakers) would expand the test’s applicability.
  • Comparison with newer intelligence batteries. The validation used the WAIS-III, an older edition. Comparison with current batteries (WAIS-IV, WAIS-V) would update the criterion evidence.
  • Differential item functioning. Whether items perform comparably across demographic subgroups is the standard fairness check, and item-level DIF analysis on a test of this size is a natural next step.
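
The last point can be made concrete. The workhorse DIF statistic is the Mantel-Haenszel common odds ratio: examinees are stratified by total score, each stratum contributes a 2×2 table of group by correct/incorrect, and an odds ratio near 1 means the item behaves the same in both groups at equal ability. The sketch below is a minimal version with invented counts, not an analysis of actual IAW data.

```python
# Mantel-Haenszel common odds ratio for one item, the standard DIF statistic.
# Examinees are stratified by total score; each stratum contributes a 2x2
# table of (group x correct/incorrect). The counts below are invented.

def mantel_haenszel_or(strata):
    """strata: list of (ref_correct, ref_wrong, focal_correct, focal_wrong)."""
    num = den = 0.0
    for a, b, c, d in strata:        # a = ref correct, b = ref wrong,
        n = a + b + c + d            # c = focal correct, d = focal wrong
        num += a * d / n
        den += b * c / n
    return num / den

strata = [(30, 10, 20, 20),   # low-score stratum
          (40, 5, 30, 15)]    # high-score stratum
print(round(mantel_haenszel_or(strata), 1))  # 3.4 -> item favors the reference group
```

In practice the ratio is tested for significance and flagged against conventional effect-size thresholds, but the logic per item is exactly this stratified comparison.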

Frequently Asked Questions

What does the IAW test actually measure?

Verbal ability — specifically, the breadth and depth of an individual’s vocabulary and the ability to retrieve appropriate words from semantic memory given contextual constraints. It maps onto the comprehension-knowledge (Gc) factor in the Cattell-Horn-Carroll framework.

How is the IAW different from the vocabulary subtest in a regular IQ test?

Standard IQ vocabulary subtests usually present a word and ask for its definition. The IAW inverts this: it presents a definition or context and asks the examinee to produce the word. The IAW is also untimed and accepts multiple correct answers per item where the language admits synonyms.

What’s the difference between a Cronbach’s alpha of .95 and a correlation of .83?

Alpha measures internal consistency — how well the items on a single test administration hang together as measures of the same construct. The .83 correlation with the WAIS-III VCI is concurrent validity — how strongly the IAW score relates to a separate, established measure of the same construct. They are different kinds of evidence and a strong test usually has both.

Is open-ended testing harder than multiple-choice?

Generally, yes — at the same level of item difficulty, open-ended formats produce lower raw scores because cueing effects and elimination strategies are unavailable. But this is part of the design intention: removing cueing produces a less inflated estimate of vocabulary knowledge.

Can the IAW be used to estimate IQ?

The IAW provides a verbal-ability score that strongly correlates with verbal IQ measures from major batteries. It is best used as a verbal-intelligence indicator rather than a stand-alone full-scale IQ estimate, since fluid reasoning, working memory, and other cognitive components are not assessed.

How long does the IAW take?

Because it is untimed, administration time varies widely by examinee. Most adults complete the 100 items in under an hour; some take longer, which is part of the design — pacing is set by the examinee, not by the clock.

What populations is the IAW validated for?

The 2023 validation sample of 1,083 examinees is the primary evidence base. Generalization to specific clinical populations — older adults, learning-disabled examinees, non-native English speakers — is a separate empirical question that further validation work would need to address.



Cite This Article

Jouve, X. (2023, April 7). Assessing Verbal Intelligence with the IAW Test. PsychoLogic. https://www.psychologic.online/2023/04/07/iaw-verbal-intelligence-test/
