The I Am a Word (IAW) test is a 100-item open-ended vocabulary measure (plus 50 experimental items) developed by Xavier Jouve in 2011 and revised through 2025. Each item presents a meaning and a structural constraint; the examinee produces the target word rather than selecting among alternatives. The test outputs a Vocabulary Proficiency Index (VPI) on the standard IQ metric (M = 100, SD = 15), with reliability ξ = .943 in a 1,929-examinee sample. In two cross-battery factor analyses, one with WAIS Verbal Comprehension subtests and one with SAT components, the IAW VPI loads at .80–.90 on a dominant general factor, with explained common variance (ECV) ≥ .80 and ωh ≈ .99 (Cogn-IQ, 2025). The technical manual positions the instrument as an established Gc-anchored measure with an unusual item format and continuing psychometric refinement.
The “I am a word” item template
The IAW’s signature design feature is its item template, which always follows the same structure: “I am a word, [semantic clue], and I [structural constraint]. Who am I?” The examinee types a word that satisfies both elements. A representative example:
“I am a word, nearly meaning ‘angry’, and I begin with ‘F’. Who am I?”
Accepted: furious, fuming, fierce, fiery
The structural constraints are typically letter-based (initial letter, final letter), and most items accept multiple semantically appropriate answers — the average item has approximately 4.3 accepted responses, with roughly 430 unique accepted words across the 100 scored items. This admits the natural synonym variability of real lexical access while retaining objective scoring against a curated answer key.
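As a concrete illustration of scoring against a curated multi-answer key, here is a minimal sketch in Python; the `ANSWER_KEY` contents, item identifier, and normalization rules are hypothetical, not the IAW's actual key or scoring engine.

```python
# Illustrative only: a hypothetical answer key and normalizer, not the IAW's
# actual key or scoring engine.
ANSWER_KEY = {
    "item_014": {"furious", "fuming", "fierce", "fiery"},  # the example item above
}

def normalize(response: str) -> str:
    """Lowercase, trim, and strip surrounding punctuation before matching."""
    return response.strip().lower().strip(".,!?'\" ")

def score_item(item_id: str, response: str) -> int:
    """Return 1 if the normalized response is in the item's accepted set, else 0."""
    return int(normalize(response) in ANSWER_KEY.get(item_id, set()))

print(score_item("item_014", " Furious! "))  # 1
print(score_item("item_014", "angry"))       # 0: right meaning, wrong constraint
```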
The format is open-ended (production rather than recognition), untimed, English-only, and automatically scored. Typical administration takes 45–60 minutes.
Why production-format vocabulary measurement matters
Most vocabulary measures in standard intelligence batteries, such as Wechsler Vocabulary and Stanford-Binet Vocabulary, present the target word and ask either for a definition (open-ended) or for a selection among definitional alternatives (multiple-choice). The IAW inverts this structure: it presents the meaning and a structural constraint and asks the examinee to retrieve the word.
The format change is small in description but has measurement consequences. Production tasks require active retrieval from semantic memory and eliminate the cueing that a visible set of answer options provides in multiple-choice formats. Heim and Watts (1967) compared the two formats directly on identical content and reported substantially different score patterns, with multiple-choice formats showing systematic cueing effects. The IAW's automated scoring against a validated synonym set offers a modern resolution to the historical trade-off: open-ended response acceptance with multiple-choice-grade scoring reliability.
Stanovich (1993) framed vocabulary as a particularly informative cognitive measure because it functions as a distillation of accumulated language exposure. A working vocabulary reflects a long history of reading, conversation, and instruction, with each individual word a small data point in that history. This gives vocabulary an unusual property among cognitive tests: it is highly stable, highly reliable, substantially heritable, and also responsive to environmental enrichment. The IAW inherits this property as a Gc measure, with the production-format choice adding a retrieval-from-context component on top of the lexical-knowledge core.
Reliability via JRRE
Internal consistency for the IAW is computed using Jouve’s Randomized Reliability Estimation (JRRE; Jouve, 2025), which estimates reliability through repeated randomized split-half resampling: items are randomly partitioned into halves, the half-scores are correlated, the correlation is Spearman-Brown adjusted, and the procedure is averaged across many permutations. The output is a reliability distribution with confidence intervals rather than a single point estimate, robust to any particular split or item ordering. Jouve (2025) documents the conditions under which classical coefficients (Cronbach’s α, KR-20, McDonald’s ωt, Guttman’s λ) underestimate or distort reliability across IRT models, and where JRRE provides more accurate estimates.
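A minimal sketch of that resampling loop, assuming dichotomously scored items in a NumPy array; this illustrates the general procedure, not Jouve's published implementation, and the summary statistics it returns simply mirror those reported below.

```python
import numpy as np

def randomized_split_half(scores, n_splits=1881, seed=0):
    """Repeated randomized split-half reliability (a sketch of the JRRE idea).
    scores: (n_examinees, n_items) array of item scores, e.g. 0/1."""
    rng = np.random.default_rng(seed)
    n_items = scores.shape[1]
    estimates = []
    for _ in range(n_splits):
        perm = rng.permutation(n_items)
        half_a = scores[:, perm[: n_items // 2]].sum(axis=1)
        half_b = scores[:, perm[n_items // 2:]].sum(axis=1)
        r = np.corrcoef(half_a, half_b)[0, 1]
        estimates.append(2 * r / (1 + r))  # Spearman-Brown step-up to full length
    estimates = np.array(estimates)
    return {
        "mean": estimates.mean(),
        "median": np.median(estimates),
        "ci95": np.percentile(estimates, [2.5, 97.5]),
        "iqr": np.percentile(estimates, [25, 75]),
    }
```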
For the IAW VPI (N = 1,929, 100 items, 1,881 randomized splits):
- Mean reliability ξ = .9433 (median = .9438)
- 95% CI [.9339, .9494]
- IQR [.9413, .9458]
- Skew = −1.029 (a longer left tail, with most splits clustered near the upper bound, as expected at high reliability)
- Outlier splits: 3.0%
The negative skew is intrinsic to bounded distributions approaching 1.0 and to the concavity of the Spearman-Brown transform; the manual emphasizes median and percentile bands rather than mean-only point estimates as the appropriate summary at this reliability range.
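The concavity is easy to verify numerically: equal steps in the half-test correlation produce successively smaller steps in the stepped-up reliability, so upward deviations are compressed more than downward ones. The half-test correlations below are illustrative values, not the observed splits.

```python
# Spearman-Brown step-up for a half-test correlation r: rho = 2r / (1 + r).
# The increments shrink as r rises, which compresses the right side of the
# reliability distribution and contributes to the left skew noted above.
for r in (0.86, 0.88, 0.90, 0.92):
    print(f"r = {r:.2f} -> rho = {2 * r / (1 + r):.4f}")
```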
Convergent validity across multiple criterion measures
The IAW Technical Manual reports VPI correlations with three criterion-measure families: clinical IQ batteries (WAIS, RIAS), academic-aptitude tests (SAT, AFQT), and (within Cogn-IQ) related custom instruments. Coefficients are corrected for range restriction using Thorndike Case 2 (Thorndike, 1949) where applicable.
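For readers who want to reproduce such a correction, a minimal sketch of the standard Case 2 formula for direct range restriction follows; the numbers in the usage line are illustrative, not values taken from the manual.

```python
import math

def thorndike_case2(r_restricted, sd_unrestricted, sd_restricted):
    """Correct a correlation for direct range restriction (Thorndike, 1949, Case 2).
    r_restricted    : correlation observed in the range-restricted sample
    sd_unrestricted : predictor SD in the reference population
    sd_restricted   : predictor SD in the restricted sample"""
    k = sd_unrestricted / sd_restricted
    return (r_restricted * k) / math.sqrt(1 - r_restricted**2 + (r_restricted * k) ** 2)

# Illustrative: an observed r of .70 in a sample with SD 10, where the reference
# population SD is 15, corrects to roughly .83.
print(round(thorndike_case2(0.70, 15, 10), 2))
```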
WAIS family (clinical IQ):
- WAIS Verbal Comprehension Index: r = .83 (N = 112)
- WAIS Vocabulary subtest: r = .83 (N = 112) in Study 1; r = .87 (N = 39) in Study 2 — the highest single-subtest correlation across the validation panel
- WAIS Similarities: r = .61 (N = 112)
- WAIS Information: r = .78 (N = 112)
RIAS family (clinical IQ):
- RIAS Verbal Intelligence Index (VIX): r = .83 (N = 113)
- RIAS Guess What: r = .75 (N = 113)
- RIAS Verbal Reasoning: r = .82 (N = 113)
Academic aptitude:
- SAT 2005–2016 Composite: r = .78 (N = 41); Reading .68, Writing .66, Math .58
- SAT 2016+ Composite: r = .74 (N = 20); Verbal .76, Math .65
- AFQT Percentile: r = .58 (N = 31)
The pattern is internally consistent. The strongest correlations are with verbal and vocabulary measures of established batteries (WAIS Vocabulary .83–.87; RIAS Guess What .75; RIAS VIX .83), consistent with the IAW sampling the same lexical-knowledge construct. The systematically lower correlations with quantitative measures (SAT Math .58–.65; AFQT .58 across mixed verbal-quantitative content) show the expected differential-validity pattern: the IAW does not measure quantitative ability. The .61 correlation with WAIS Similarities (a verbal-abstraction subtest) is consistent with Similarities tapping a verbal-reasoning strand that the IAW samples less directly than vocabulary itself.
Factor structure: dominant Gc backbone
The technical manual reports two cross-battery exploratory factor analyses examining how the IAW VPI loads alongside subtests from external batteries.
IAW with WAIS Verbal Comprehension (N = 111)
The factor analysis included the IAW VPI, WAIS Vocabulary, WAIS Similarities, and WAIS Information. KMO = .820 (very good), Bartlett's χ² = 244.12 (p < .001), and parallel analysis suggested a single factor. A descriptive two-factor solution accounted for 85.5% of the variance (F1 = 60.3%, F2 = 25.2%). Bifactor analysis yielded ECV (general/common) = .863 and ωh = .999, indicating an essentially unidimensional structure dominated by g/Gc.
g-loadings:
- WAIS Vocabulary: .908 (h² = .824)
- IAW VPI: .900 (h² = .810)
- WAIS Information: .855 (h² = .731)
- WAIS Similarities: .759 (h² = .577)
The IAW VPI’s g-loading of .900 is essentially equivalent to WAIS Vocabulary’s .908 — the two measures function as interchangeable indicators of the same Gc factor in this sample, with the IAW providing the open-ended production format and WAIS Vocabulary providing the examiner-administered definitional format.
IAW with SAT Components (N = 41)
The factor analysis included the IAW VPI, SAT Reading, SAT Writing, and SAT Math. KMO = .725 (adequate) and Bartlett's χ² = 39.60 (p < .001). Bifactor analysis yielded ECV (general/common) = .796 and ωh = .998: strong general-factor dominance with a separable Gq strand.
g-loadings:
- SAT Reading: .858
- IAW VPI: .802 (cross-loading complexity 1.89, bridging Gc and literacy)
- SAT Writing: .793
- SAT Math: .647
The CHC interpretation: a strong g/Gc-literacy strand (IAW with SAT Reading/Writing) and a separable Gq influence (SAT Math). The IAW functions as a Gc-centric measure with expected proximity to literacy outcomes.
Across both analyses, ECV ≥ .80 and ωh ≈ .99 indicate that the IAW consistently loads on a dominant general/verbal dimension with very high g saturation. Secondary strands reflect expected content splits (verbal abstraction in the WAIS analysis; math vs. literacy in the SAT analysis) without altering the IAW’s primary Gc alignment.
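For reference, ECV and ωh can be computed directly from a bifactor loading matrix; the sketch below uses generic placeholder loadings and a single group factor, not the manual's actual solution or software.

```python
import numpy as np

def bifactor_indices(g_loadings, specific_loadings):
    """Explained common variance (ECV) and omega-hierarchical from a bifactor
    solution.  g_loadings: length-k vector of general-factor loadings;
    specific_loadings: (k, m) matrix of group-factor loadings (zeros where an
    indicator does not load on a group factor)."""
    g = np.asarray(g_loadings, dtype=float)
    s = np.asarray(specific_loadings, dtype=float)
    # ECV: share of common variance carried by the general factor.
    ecv = np.sum(g**2) / (np.sum(g**2) + np.sum(s**2))
    # omega_h: squared sum of general loadings over total modeled score variance.
    uniqueness = 1.0 - g**2 - np.sum(s**2, axis=1)
    total_var = np.sum(g)**2 + np.sum(np.sum(s, axis=0)**2) + np.sum(uniqueness)
    omega_h = np.sum(g)**2 / total_var
    return ecv, omega_h

# Placeholder loadings for four indicators and one group factor (illustrative).
print(bifactor_indices([0.90, 0.90, 0.85, 0.76],
                       [[0.10], [0.05], [0.20], [0.35]]))
```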
What the IAW does and does not measure
The IAW is a vocabulary measure, not a general-intelligence battery. Several boundaries on interpretation follow from the construct definition and the validity profile:
- It measures verbal-crystallized ability (Gc), specifically lexical knowledge (VL). A high IAW score indicates strong vocabulary depth and breadth; it does not directly imply high fluid reasoning, working memory, or processing speed. Vocabulary is heavily g-loaded but not equivalent to g; the IAW's more moderate correlations with quantitative and mixed-content aptitude measures (.58–.65 with SAT Math; .58 with the AFQT) make this differential clear.
- It is calibrated for English speakers. Like other vocabulary tests, it is not directly translatable to other languages without renorming. Cross-language validation work would be a substantial separate undertaking.
- It depends on reading and writing exposure. Production-format vocabulary tests require literacy in the test language. Individuals with limited literacy may underperform relative to their underlying cognitive ability — a caveat that applies to all written vocabulary tests.
- It is positioned for research and self-assessment use. The instrument is appropriate as a screening or matching variable, for research where verbal ability is a covariate, for self-administered cognitive estimation, and for educational contexts where vocabulary depth is the construct of interest. High-stakes clinical IQ assessment requires a full intelligence battery.
Practical implications
For researchers and clinicians considering the IAW:
- Use the IAW when production-format vocabulary measurement is the goal. The open-ended retrieval format and multiple-correct-answer scoring capture lexical breadth and semantic flexibility in ways that recognition-format tests cannot.
- Pair with measures of other broad CHC abilities when a comprehensive cognitive profile is needed. The IAW samples Gc; complementary instruments are needed for Gf (fluid reasoning), Gv (visuospatial), Gs (processing speed), and Gwm (working memory).
- Interpret IAW VPI scores within the established WAIS/RIAS verbal-intelligence neighborhood. Convergent correlations of .83 with both WAIS VCI and RIAS VIX support direct interpretation on the IQ metric.
- Account for linguistic background. Cormier et al. (2022) showed examinee linguistic characteristics affect cognitive test performance more strongly than test characteristics. The IAW’s English-only design adds a language-specific component requiring norm reference within the target population for cross-cultural use.
- Read the technical manual for full norm tables, scoring procedures, and item-level details before high-stakes applications. The Cogn-IQ (2025) manual is the authoritative source.
Open research directions
The IAW has accumulated a substantial validation panel since its 2011 introduction, including criterion correlations with both editions of the SAT, multiple Wechsler comparisons, the RIAS, and the AFQT. Useful extensions of the existing evidence base include independent replication of the convergent-validity correlations and factor-analytic results in samples outside the Cogn-IQ ecosystem, differential item functioning analyses across demographic subgroups to clarify the bounds of fair use, and test-retest reliability studies across longer intervals to complement the strong internal-consistency evidence.
The takeaway
The IAW is a 100-item open-ended vocabulary test using a distinctive “I am a word, [meaning], and I [structural constraint]” template, producing a Vocabulary Proficiency Index (VPI) on the standard IQ metric. JRRE reliability of ξ = .943 across N = 1,929 examinees, convergent correlations of r = .83 with both the WAIS VCI and the RIAS VIX, and bifactor general-factor dominance (ECV ≥ .80, ωh ≈ .99) across two cross-battery analyses support a strong Gc interpretation. The production format with multiple-correct-answer scoring captures lexical breadth and semantic flexibility that recognition-format vocabulary tests cannot, and the technical manual appropriately positions the instrument for research, self-assessment, and educational applications where verbal-crystallized intelligence is the construct of interest.
References
- Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge University Press.
- Cogn-IQ. (2025). IAW Technical Manual. Cogn-IQ. https://www.cogn-iq.org/methods/iaw-manual/
- Cormier, D. C., Bulut, O., McGrew, K. S., & Kennedy, K. (2022). Linguistic influences on cognitive test performance: Examinee characteristics are more important than test characteristics. Journal of Intelligence, 10(1), 8. https://doi.org/10.3390/jintelligence10010008
- Heim, A. W., & Watts, K. P. (1967). An experiment on multiple-choice versus open-ended answering in a vocabulary test. British Journal of Educational Psychology, 37(3), 339–346. https://doi.org/10.1111/j.2044-8279.1967.tb01950.x
- Jouve, X. (2023). I Am A Word Test: An open-ended and untimed approach to verbal ability assessment. Cogn-IQ Research Papers. https://pubscience.org/ps-1mSQS-530828-wbh6
- Jouve, X. (2025). When alpha fails: Jouve’s Randomized Reliability Estimation (ξ) versus classical reliability coefficients in Rasch, 2PL, 3PL, 4PL, GRM, GPCM, and NRM models. Cogn-IQ Research Papers. https://pubscience.org/ps-1mYdi-014f7f-ormI
- Stanovich, K. E. (1993). Does reading make you smarter? Literacy and the development of verbal intelligence. Advances in Child Development and Behavior, 24, 133–180. https://doi.org/10.1016/S0065-2407(08)60302-X
- Thorndike, R. L. (1949). Personnel selection: Test and measurement techniques. Wiley.