The I Am a Word (IAW) test is a 100-item open-ended vocabulary measure (plus 50 experimental items) developed by Xavier Jouve in 2011 and revised through 2025. Each item presents a meaning and a structural constraint; the examinee produces the target word rather than selecting among alternatives. The test outputs a Vocabulary Proficiency Index (VPI) on the standard IQ metric (M = 100, SD = 15), with reliability ξ = .943 in a 1,929-examinee sample. In two cross-battery factor analyses, one with WAIS Verbal Comprehension subtests and one with SAT components, the IAW VPI loads at .80–.90 on a dominant general factor, with explained common variance (ECV) ≥ .80 and ωh ≈ .99 (Cogn-IQ, 2025). The technical manual positions the instrument as an established Gc-anchored measure with an unusual item format and continuing psychometric refinement.
The “I am a word” item template
The IAW’s signature design feature is its item template, which always follows the same structure: “I am a word, [semantic clue], and I [structural constraint]. Who am I?” The examinee types a word that satisfies both elements. A representative example:
“I am a word, nearly meaning ‘angry’, and I begin with ‘F’. Who am I?”
Accepted: furious, fuming, fierce, fiery
The structural constraints are typically letter-based (initial letter, final letter), and most items accept multiple semantically appropriate answers — the average item has approximately 4.3 accepted responses, with roughly 430 unique accepted words across the 100 scored items. This admits the natural synonym variability of real lexical access while retaining objective scoring against a curated answer key.
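As a concrete illustration of scoring against a curated multi-answer key, here is a minimal sketch in Python; the `ANSWER_KEY` contents, item identifier, and normalization rules are hypothetical, not the IAW's actual key or scoring engine.

```python
# Illustrative only: a hypothetical answer key and normalizer, not the IAW's
# actual key or scoring engine.
ANSWER_KEY = {
    "item_014": {"furious", "fuming", "fierce", "fiery"},  # the example item above
}

def normalize(response: str) -> str:
    """Lowercase, trim, and strip surrounding punctuation before matching."""
    return response.strip().lower().strip(".,!?'\" ")

def score_item(item_id: str, response: str) -> int:
    """Return 1 if the normalized response is in the item's accepted set, else 0."""
    return int(normalize(response) in ANSWER_KEY.get(item_id, set()))

print(score_item("item_014", " Furious! "))  # 1
print(score_item("item_014", "angry"))       # 0: right meaning, wrong constraint
```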
The format is open-ended (production rather than recognition), untimed, English-only, and automatically scored. Typical administration takes 45–60 minutes.
Why production-format vocabulary measurement matters
Most vocabulary measures in standard intelligence batteries, such as Wechsler Vocabulary and Stanford-Binet Vocabulary, present the target word and ask either for a definition (open-ended) or for a selection among definitional alternatives (multiple-choice). The IAW inverts this structure: it presents the meaning and a structural constraint and asks the examinee to retrieve the word.
The format change is small in description but has measurement consequences. Production tasks require active retrieval from semantic memory and eliminate the cueing that a visible set of answer options provides in multiple-choice formats. Heim and Watts (1967) compared the two formats directly on identical content and reported substantially different score patterns, with multiple-choice formats showing systematic cueing effects. The IAW's automated scoring against a validated synonym set offers a modern resolution to the historical trade-off: open-ended response acceptance with multiple-choice-grade scoring reliability.
Stanovich (1993) framed vocabulary as a particularly informative cognitive measure because it functions as a distillation of accumulated language exposure. A working vocabulary reflects a long history of reading, conversation, and instruction, with each individual word a small data point in that history. This gives vocabulary an unusual property among cognitive tests: it is highly stable, highly reliable, substantially heritable, and also responsive to environmental enrichment. The IAW inherits this property as a Gc measure, with the production-format choice adding a retrieval-from-context component on top of the lexical-knowledge core.
Reliability via JRRE
Internal consistency for the IAW is computed using Jouve’s Randomized Reliability Estimation (JRRE; Jouve, 2025), which estimates reliability through repeated randomized split-half resampling: items are randomly partitioned into halves, the half-scores are correlated, the correlation is Spearman-Brown adjusted, and the procedure is averaged across many permutations. The output is a reliability distribution with confidence intervals rather than a single point estimate, robust to any particular split or item ordering. Jouve (2025) documents the conditions under which classical coefficients (Cronbach’s α, KR-20, McDonald’s ωt, Guttman’s λ) underestimate or distort reliability across IRT models, and where JRRE provides more accurate estimates.
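A minimal sketch of that resampling loop, assuming dichotomously scored items in a NumPy array; this illustrates the general procedure, not Jouve's published implementation, and the summary statistics it returns simply mirror those reported below.

```python
import numpy as np

def randomized_split_half(scores, n_splits=1881, seed=0):
    """Repeated randomized split-half reliability (a sketch of the JRRE idea).
    scores: (n_examinees, n_items) array of item scores, e.g. 0/1."""
    rng = np.random.default_rng(seed)
    n_items = scores.shape[1]
    estimates = []
    for _ in range(n_splits):
        perm = rng.permutation(n_items)
        half_a = scores[:, perm[: n_items // 2]].sum(axis=1)
        half_b = scores[:, perm[n_items // 2:]].sum(axis=1)
        r = np.corrcoef(half_a, half_b)[0, 1]
        estimates.append(2 * r / (1 + r))  # Spearman-Brown step-up to full length
    estimates = np.array(estimates)
    return {
        "mean": estimates.mean(),
        "median": np.median(estimates),
        "ci95": np.percentile(estimates, [2.5, 97.5]),
        "iqr": np.percentile(estimates, [25, 75]),
    }
```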
For the IAW VPI (N = 1,929, 100 items, 1,881 randomized splits):
- Mean reliability ξ = .9433 (median = .9438)
- 95% CI [.9339, .9494]
- IQR [.9413, .9458]
- Skew = −1.029 (a longer left tail, with most splits clustered near the upper bound, as expected at high reliability)
- Outlier splits: 3.0%
The negative skew is intrinsic to bounded distributions approaching 1.0 and to the concavity of the Spearman-Brown transform; the manual emphasizes median and percentile bands rather than mean-only point estimates as the appropriate summary at this reliability range.
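The concavity is easy to verify numerically: equal steps in the half-test correlation produce successively smaller steps in the stepped-up reliability, so upward deviations are compressed more than downward ones. The half-test correlations below are illustrative values, not the observed splits.

```python
# Spearman-Brown step-up for a half-test correlation r: rho = 2r / (1 + r).
# The increments shrink as r rises, which compresses the right side of the
# reliability distribution and contributes to the left skew noted above.
for r in (0.86, 0.88, 0.90, 0.92):
    print(f"r = {r:.2f} -> rho = {2 * r / (1 + r):.4f}")
```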
Convergent validity across multiple criterion measures
The IAW Technical Manual reports VPI correlations with three criterion-measure families: clinical IQ batteries (WAIS, RIAS), academic-aptitude tests (SAT, AFQT), and (within Cogn-IQ) related custom instruments. Coefficients are corrected for range restriction using Thorndike Case 2 (Thorndike, 1949) where applicable.
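For readers who want to reproduce such a correction, a minimal sketch of the standard Case 2 formula for direct range restriction follows; the numbers in the usage line are illustrative, not values taken from the manual.

```python
import math

def thorndike_case2(r_restricted, sd_unrestricted, sd_restricted):
    """Correct a correlation for direct range restriction (Thorndike, 1949, Case 2).
    r_restricted    : correlation observed in the range-restricted sample
    sd_unrestricted : predictor SD in the reference population
    sd_restricted   : predictor SD in the restricted sample"""
    k = sd_unrestricted / sd_restricted
    return (r_restricted * k) / math.sqrt(1 - r_restricted**2 + (r_restricted * k) ** 2)

# Illustrative: an observed r of .70 in a sample with SD 10, where the reference
# population SD is 15, corrects to roughly .83.
print(round(thorndike_case2(0.70, 15, 10), 2))
```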
WAIS family (clinical IQ):
- WAIS Verbal Comprehension Index: r = .83 (N = 112)
- WAIS Vocabulary subtest: r = .83 (N = 112) in Study 1; r = .87 (N = 39) in Study 2 — the highest single-subtest correlation across the validation panel
- WAIS Similarities: r = .61 (N = 112)
- WAIS Information: r = .78 (N = 112)
RIAS family (clinical IQ):
- RIAS Verbal Intelligence Index (VIX): r = .83 (N = 113)
- RIAS Guess What: r = .75 (N = 113)
- RIAS Verbal Reasoning: r = .82 (N = 113)
Academic aptitude:
- SAT 2005–2016 Composite: r = .78 (N = 41); Reading .68, Writing .66, Math .58
- SAT 2016+ Composite: r = .74 (N = 20); Verbal .76, Math .65
- AFQT Percentile: r = .58 (N = 31)
The pattern is internally consistent. The strongest correlations are with verbal and vocabulary measures of established batteries (WAIS Vocabulary .83–.87; RIAS Guess What .75; RIAS VIX .83), consistent with the IAW sampling the same lexical-knowledge construct. The systematically lower correlations with quantitative measures (SAT Math .58–.65; AFQT .58 across mixed verbal-quantitative content) show the expected differential-validity pattern: the IAW does not measure quantitative ability. The .61 correlation with WAIS Similarities (a verbal-abstraction subtest) is consistent with Similarities tapping a verbal-reasoning strand that the IAW samples less directly than vocabulary itself.
Factor structure: dominant Gc backbone
The technical manual reports two cross-battery exploratory factor analyses examining how the IAW VPI loads alongside subtests from external batteries.
IAW with WAIS Verbal Comprehension (N = 111)
The factor analysis included the IAW VPI, WAIS Vocabulary, WAIS Similarities, and WAIS Information. KMO = .820 (very good), Bartlett's χ² = 244.12 (p < .001), and parallel analysis suggested a single factor. A descriptive two-factor solution accounted for 85.5% of the variance (F1 = 60.3%, F2 = 25.2%). Bifactor analysis yielded ECV (general/common) = .863 and ωh = .999, indicating an essentially unidimensional structure dominated by g/Gc.
g-loadings:
- WAIS Vocabulary: .908 (h² = .824)
- IAW VPI: .900 (h² = .810)
- WAIS Information: .855 (h² = .731)
- WAIS Similarities: .759 (h² = .577)
The IAW VPI’s g-loading of .900 is essentially equivalent to WAIS Vocabulary’s .908 — the two measures function as interchangeable indicators of the same Gc factor in this sample, with the IAW providing the open-ended production format and WAIS Vocabulary providing the examiner-administered definitional format.
IAW with SAT Components (N = 41)
The factor analysis included the IAW VPI, SAT Reading, SAT Writing, and SAT Math. KMO = .725 (adequate) and Bartlett's χ² = 39.60 (p < .001). Bifactor analysis yielded ECV (general/common) = .796 and ωh = .998: strong general-factor dominance with a separable Gq strand.
g-loadings:
- SAT Reading: .858
- IAW VPI: .802 (cross-loading complexity 1.89, bridging Gc and literacy)
- SAT Writing: .793
- SAT Math: .647
The CHC interpretation: a strong g/Gc-literacy strand (IAW with SAT Reading/Writing) and a separable Gq influence (SAT Math). The IAW functions as a Gc-centric measure with expected proximity to literacy outcomes.
Across both analyses, ECV ≥ .80 and ωh ≈ .99 indicate that the IAW consistently loads on a dominant general/verbal dimension with very high g saturation. Secondary strands reflect expected content splits (verbal abstraction in the WAIS analysis; math vs. literacy in the SAT analysis) without altering the IAW’s primary Gc alignment.
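For reference, ECV and ωh can be computed directly from a bifactor loading matrix; the sketch below uses generic placeholder loadings and a single group factor, not the manual's actual solution or software.

```python
import numpy as np

def bifactor_indices(g_loadings, specific_loadings):
    """Explained common variance (ECV) and omega-hierarchical from a bifactor
    solution.  g_loadings: length-k vector of general-factor loadings;
    specific_loadings: (k, m) matrix of group-factor loadings (zeros where an
    indicator does not load on a group factor)."""
    g = np.asarray(g_loadings, dtype=float)
    s = np.asarray(specific_loadings, dtype=float)
    # ECV: share of common variance carried by the general factor.
    ecv = np.sum(g**2) / (np.sum(g**2) + np.sum(s**2))
    # omega_h: squared sum of general loadings over total modeled score variance.
    uniqueness = 1.0 - g**2 - np.sum(s**2, axis=1)
    total_var = np.sum(g)**2 + np.sum(np.sum(s, axis=0)**2) + np.sum(uniqueness)
    omega_h = np.sum(g)**2 / total_var
    return ecv, omega_h

# Placeholder loadings for four indicators and one group factor (illustrative).
print(bifactor_indices([0.90, 0.90, 0.85, 0.76],
                       [[0.10], [0.05], [0.20], [0.35]]))
```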
What the IAW does and does not measure
The IAW is a vocabulary measure, not a general-intelligence battery. Several boundaries on interpretation follow from the construct definition and the validity profile:
- It measures verbal-crystallized ability (Gc), specifically lexical knowledge (VL). A high IAW score indicates strong vocabulary depth and breadth; it does not directly imply high fluid reasoning, working memory, or processing speed. Vocabulary is heavily g-loaded but not equivalent to g; the IAW's more moderate correlations with quantitative and mixed-content aptitude measures (.58–.65 with SAT Math; .58 with the AFQT) make this differential clear.
- It is calibrated for English speakers. Like other vocabulary tests, it is not directly translatable to other languages without renorming. Cross-language validation work would be a substantial separate undertaking.
- It depends on reading and writing exposure. Production-format vocabulary tests require literacy in the test language. Individuals with limited literacy may underperform relative to their underlying cognitive ability — a caveat that applies to all written vocabulary tests.
- It is positioned for research and self-assessment use. The instrument is appropriate as a screening or matching variable, for research where verbal ability is a covariate, for self-administered cognitive estimation, and for educational contexts where vocabulary depth is the construct of interest. High-stakes clinical IQ assessment requires a full intelligence battery.
Practical implications
For researchers and clinicians considering the IAW:
- Use the IAW when production-format vocabulary measurement is the goal. The open-ended retrieval format and multiple-correct-answer scoring capture lexical breadth and semantic flexibility in ways that recognition-format tests cannot.
- Pair with measures of other broad CHC abilities when a comprehensive cognitive profile is needed. The IAW samples Gc; complementary instruments are needed for Gf (fluid reasoning), Gv (visuospatial), Gs (processing speed), and Gwm (working memory).
- Interpret IAW VPI scores within the established WAIS/RIAS verbal-intelligence neighborhood. Convergent correlations of .83 with both WAIS VCI and RIAS VIX support direct interpretation on the IQ metric.
- Account for linguistic background. Cormier et al. (2022) showed examinee linguistic characteristics affect cognitive test performance more strongly than test characteristics. The IAW’s English-only design adds a language-specific component requiring norm reference within the target population for cross-cultural use.
- Read the technical manual for full norm tables, scoring procedures, and item-level details before high-stakes applications. The Cogn-IQ (2025) manual is the authoritative source.
Open research directions
The IAW has accumulated a substantial validation panel since its 2011 introduction, including criterion correlations with both editions of the SAT, multiple Wechsler comparisons, the RIAS, and the AFQT. Useful extensions of the existing evidence base include independent replication of the convergent-validity correlations and factor-analytic results in samples outside the Cogn-IQ ecosystem, differential item functioning analyses across demographic subgroups to clarify the bounds of fair use, and test-retest reliability studies across longer intervals to complement the strong internal-consistency evidence.
The takeaway
The IAW is a 100-item open-ended vocabulary test using a distinctive “I am a word, [meaning], and I [structural constraint]” template, producing a Vocabulary Proficiency Index (VPI) on the standard IQ metric. JRRE reliability of ξ = .943 across N = 1,929 examinees, convergent correlations of r = .83 with both the WAIS VCI and the RIAS VIX, and bifactor general-factor dominance (ECV ≥ .80, ωh ≈ .99) across two cross-battery analyses support a strong Gc interpretation. The production format with multiple-correct-answer scoring captures lexical breadth and semantic flexibility that recognition-format vocabulary tests cannot, and the technical manual appropriately positions the instrument for research, self-assessment, and educational applications where verbal-crystallized intelligence is the construct of interest.
References
- Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge University Press.
- Cogn-IQ. (2025). IAW Technical Manual. Cogn-IQ. https://www.cogn-iq.org/methods/iaw-manual/
- Cormier, D. C., Bulut, O., McGrew, K. S., & Kennedy, K. (2022). Linguistic influences on cognitive test performance: Examinee characteristics are more important than test characteristics. Journal of Intelligence, 10(1), 8. https://doi.org/10.3390/jintelligence10010008
- Heim, A. W., & Watts, K. P. (1967). An experiment on multiple-choice versus open-ended answering in a vocabulary test. British Journal of Educational Psychology, 37(3), 339–346. https://doi.org/10.1111/j.2044-8279.1967.tb01950.x
- Jouve, X. (2023). I Am A Word Test: An open-ended and untimed approach to verbal ability assessment. Cogn-IQ Research Papers. https://pubscience.org/ps-1mSQS-530828-wbh6
- Jouve, X. (2025). When alpha fails: Jouve’s Randomized Reliability Estimation (ξ) versus classical reliability coefficients in Rasch, 2PL, 3PL, 4PL, GRM, GPCM, and NRM models. Cogn-IQ Research Papers. https://pubscience.org/ps-1mYdi-014f7f-ormI
- Stanovich, K. E. (1993). Does reading make you smarter? Literacy and the development of verbal intelligence. Advances in Child Development and Behavior, 24, 133–180. https://doi.org/10.1016/S0065-2407(08)60302-X
- Thorndike, R. L. (1949). Personnel selection: Test and measurement techniques. Wiley.