
Stanford-Binet vs WAIS in Intellectual Disability

Stanford-Binet & WAIS IQ Differences and Their Implications for Adults with Intellectual Disability
Published: April 2, 2010

Two of the most widely used intelligence tests for adults, the Stanford-Binet Intelligence Scales (SB5) and the Wechsler Adult Intelligence Scale (WAIS), can produce notably different scores when administered to the same person. The discrepancy is not random. Silverman, Miezejeski, Ryan, Zigman, Krinsky-McHale, and Urv (2010) reported that among 74 adults with diagnosed intellectual disability, WAIS Full-Scale IQ was higher than Stanford-Binet Composite IQ in every one of the 74 individuals, by an average of 16.7 points. The gap is large enough to change diagnoses, eligibility for services, and, in capital cases, life-or-death sentencing decisions.

The Silverman 2010 finding

The study compared the scores of 74 adults with previously diagnosed intellectual disability on both the Stanford-Binet (then in its fifth edition, SB5; Roid, 2003) and the Wechsler Adult Intelligence Scale (the third or fourth edition; WAIS-IV: Wechsler, 2008). Scores were compared person by person rather than only at the group mean. The result was uniform: every participant scored higher on the WAIS than on the SB5, and mean WAIS Full-Scale IQ exceeded mean SB Composite IQ by 16.7 points. The discrepancy did not appear to be driven by floor effects (the SB5's lower minimum possible score) and persisted across the range of intellectual disability severity in the sample.

The directionality and consistency are what make the finding clinically important. If the two tests differed only by random measurement error, some participants would score higher on one test and some on the other. A mean gap of 16.7 points, with all 74 cases falling in the same direction, reflects something structural about how the two tests behave in this population, not noise.
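To put a number on the "not noise" argument: if the two tests differed only by independent, symmetric error, each person's WAIS-minus-SB difference would be equally likely to be positive or negative, and the count of positive differences would follow a Binomial(n, 0.5) distribution. The sketch below is a standard two-sided exact sign test under those assumptions (no ties, independent examinees). It is an illustration of the reasoning, not the analysis Silverman et al. reported.

```python
from math import comb


def sign_test_p(n_positive: int, n: int) -> float:
    """Two-sided exact sign test p-value.

    Assumes each of the n nonzero differences is independently
    positive with probability 0.5 under the null hypothesis.
    """
    k = max(n_positive, n - n_positive)  # size of the more extreme tail
    tail = sum(comb(n, i) for i in range(k, n + 1))
    return min(1.0, 2 * tail / 2**n)


# All 74 adults scored higher on the WAIS than on the Stanford-Binet.
p = sign_test_p(74, 74)
print(f"p = {p:.2e}")  # p = 1.06e-22
```

With all 74 differences in one direction, the probability under pure chance is 2^-73, about 1 in 10^22.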

Why the gap exists

Three structural factors plausibly contribute. First, the tests were normed on different reference samples at different times, and IQ score distributions drift across cohorts: the Flynn effect describes secular gains of roughly 2 to 3 IQ points per decade across much of the 20th century (Trahan, Stuebing, Hiscock, & Fletcher, 2014). When two tests are normed years apart, the one with older norms tends to yield higher scores, because examinees are compared with a reference group that performed less well than the current population. This is why test publishers re-norm at intervals, and it is also why direct comparisons between editions of different test families are hard to interpret.
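A back-of-the-envelope check shows why norm age alone struggles to explain a 16.7-point gap. This sketch assumes constant linear drift at 2 or 3 points per decade; the 10-year norming interval is a hypothetical value chosen for illustration, not taken from either test manual.

```python
def flynn_inflation(years_between_norms: float, points_per_decade: float) -> float:
    """Expected score advantage of the older-normed test, assuming linear drift."""
    return points_per_decade * years_between_norms / 10


def years_to_explain(gap: float, points_per_decade: float) -> float:
    """Norming interval needed for linear drift alone to produce `gap` points."""
    return 10 * gap / points_per_decade


OBSERVED_GAP = 16.7  # mean WAIS minus Stanford-Binet difference (Silverman et al., 2010)
HYPOTHETICAL_NORM_GAP = 10  # years; illustrative only

for rate in (2.0, 3.0):
    expected = flynn_inflation(HYPOTHETICAL_NORM_GAP, rate)
    needed = years_to_explain(OBSERVED_GAP, rate)
    print(f"{rate} pts/decade: {expected:.1f} pts from a "
          f"{HYPOTHETICAL_NORM_GAP}-year norm gap; "
          f"{needed:.0f} years needed to reach {OBSERVED_GAP}")
```

At these rates a decade between norming samples predicts only 2 to 3 points, and drift alone would need norms roughly 55 to 85 years apart to produce the observed mean gap.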

Second, the tests sample different cognitive content. The SB5 organises subtests around five CHC factors (Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing, Working Memory), each assessed in both verbal and nonverbal forms. The WAIS-IV organises subtests into four indices (Verbal Comprehension, Perceptual Reasoning, Working Memory, Processing Speed). The content overlap is substantial but not complete, and the two tests weight the underlying broad abilities differently. In a low-IQ sample, ceiling and floor effects on individual subtests interact with the composite scoring rules, so the resulting Full-Scale or Composite scores can land at meaningfully different points on the IQ scale even when the underlying ability is the same.

Third, the scoring methodology differs. The SB5’s Composite IQ is computed differently from the WAIS-IV’s Full-Scale IQ, and the relationship between subtest performance and the final score is not identical across instruments. For an examinee at the low end of the distribution, small differences in subtest weighting can compound into multi-point differences at the composite level.
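One way this compounding can happen: a composite is rescaled against the spread of a sum of correlated subtests, not their average. For k standardised subtests with mean intercorrelation r̄, the sum has variance k + k(k−1)r̄, so whenever the subtests are less than perfectly correlated, a profile of uniformly low subtests produces a composite further from 100 than the subtests themselves, and by an amount that depends on r̄. The sketch below uses this linear approximation with made-up values; actual SB5 and WAIS-IV composites are read from published norm tables, not computed with this formula.

```python
import math


def composite_iq(subtest_z: list[float], mean_r: float) -> float:
    """Approximate composite IQ (mean 100, SD 15) from standardised subtests.

    The sum of k standardised subtests with mean intercorrelation mean_r
    has variance k + k(k-1)*mean_r; dividing by its SD re-standardises it.
    """
    k = len(subtest_z)
    total = sum(subtest_z)
    sd_of_sum = math.sqrt(k + k * (k - 1) * mean_r)
    return 100 + 15 * total / sd_of_sum


# Hypothetical examinee: every subtest 1.8 SD below the mean (73 on the IQ metric).
profile = [-1.8] * 10
for r in (0.4, 0.6):  # hypothetical mean intercorrelations
    print(f"mean r = {r}: composite IQ = {composite_iq(profile, r):.1f}")
```

With these illustrative numbers the composite lands near 60 or 66 depending only on the assumed intercorrelation, even though every subtest sits at 73.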

Diagnostic implications

The clinical diagnosis of intellectual disability under both DSM-5-TR and the American Association on Intellectual and Developmental Disabilities’ 12th-edition framework (Schalock, Luckasson, & Tassé, 2021) requires three elements: significant limitations in intellectual functioning (typically two standard deviations below the population mean, with appropriate confidence intervals), significant limitations in adaptive functioning, and onset during the developmental period. The IQ criterion is conventionally interpreted as a Full-Scale or Composite score around 70 or below, with the standard error of measurement creating a working range of roughly 65–75 around the cutoff.
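The 65–75 working range follows from the standard error of measurement, SEM = SD × √(1 − r_xx), where r_xx is the score's reliability. The sketch below computes a 95% interval around an observed score and checks whether it includes values at or below 70. The reliability of .97 is an assumed, illustrative value rather than a figure from either manual, and the interval is centred on the observed score for simplicity (some practitioners centre it on the estimated true score instead).

```python
import math


def sem(reliability: float, sd: float = 15.0) -> float:
    """Standard error of measurement for a score with the given reliability."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(score: float, reliability: float,
                        z: float = 1.96) -> tuple[float, float]:
    """Interval centred on the observed score (95% by default)."""
    half_width = z * sem(reliability)
    return score - half_width, score + half_width


CUTOFF = 70
for observed in (67, 72, 84):
    low, high = confidence_interval(observed, reliability=0.97)  # assumed reliability
    status = "includes scores at or below 70" if low <= CUTOFF else "lies entirely above 70"
    print(f"IQ {observed}: 95% CI {low:.1f}-{high:.1f} ({status})")
```

With an SEM of about 2.6, the 95% interval around a score of 70 runs from roughly 65 to 75, which is the working range described above.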

The Silverman finding directly affects this diagnostic process. A patient who scores 67 on the SB5 (clearly within the ID range) might score 84 on the WAIS-IV (clearly outside the ID range) on the same day. Whether the patient meets the IQ criterion for ID diagnosis can therefore depend on which test the clinician chose — an unsatisfying state of affairs for a diagnosis that determines eligibility for services, educational supports, disability benefits, and legal protections.

The 16.7-point average gap is also large relative to the standard error of measurement on either test (typically 3–5 IQ points). Measurement error alone cannot plausibly account for a gap this size: a person can fall clearly within the diagnostic range on one instrument and clearly outside it on the other, and the difference reflects the test, not their ability.
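Comparing two scores is more precisely framed with the standard error of the difference, SE_diff = √(SEM_A² + SEM_B²), which assumes the two tests' measurement errors are independent. The sketch below applies the 3–5 point SEM range quoted above; the critical difference is the smallest gap that measurement error alone would exceed less than 5% of the time.

```python
import math


def critical_difference(sem_a: float, sem_b: float, z: float = 1.96) -> float:
    """Smallest two-sided significant difference between two observed scores.

    Assumes independent, normally distributed measurement errors.
    """
    return z * math.sqrt(sem_a**2 + sem_b**2)


MEAN_GAP = 16.7  # Silverman et al. (2010)
for s in (3.0, 5.0):
    crit = critical_difference(s, s)
    print(f"SEM {s} on both tests: critical difference {crit:.1f}; "
          f"16.7-point gap exceeds it: {MEAN_GAP > crit}")
```

Even at an SEM of 5 on both tests, the critical difference is about 13.9 points, so a 16.7-point gap is larger than measurement error would usually produce.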

The largest practical stake of the SB-WAIS discrepancy is in capital-case forensic evaluation. The U.S. Supreme Court's 2002 decision in Atkins v. Virginia (536 U.S. 304) ruled that executing individuals with intellectual disability is unconstitutional under the Eighth Amendment, making accurate ID diagnosis directly relevant to whether a defendant is eligible for the death penalty. The Court did not specify which IQ test must be used or how scores should be interpreted, leaving this to state-level practice and individual clinical judgement.

In a forensic context, the choice of test, and the interpretation of scores against the diagnostic threshold, can determine the outcome. A defendant whose true ability sits in the 65–75 range around the cutoff may meet the ID criterion on the SB5 and not on the WAIS, because in this population the WAIS systematically yields higher scores than the SB. The Silverman 2010 paper has been cited in subsequent legal and forensic-psychology literature as part of the case for cautious interpretation of single-test scores in capital cases, with explicit attention to which instrument was administered and what its known biases are.

Practical guidance

Three practical takeaways follow from the Silverman finding. First, in any clinical or forensic context where the diagnostic decision turns on whether a Full-Scale IQ score is above or below 70, the test choice should be deliberate and documented — the gap between the SB5 and the WAIS is large enough that the choice carries real information. Second, when feasible, administering both tests provides a more defensible picture than relying on either alone, particularly in low-IQ samples where the discrepancy is most pronounced. Third, a single Full-Scale or Composite score should always be reported with its confidence interval, and the interpretation should explicitly acknowledge the instrument used and its known properties relative to alternatives.

For typical-population assessment, where the discrepancy is smaller and the diagnostic stakes lower, either test produces broadly defensible scores. The Silverman gap is largest in the very samples where the consequence of getting it wrong is largest — ID populations and forensic cases — and that is precisely where the test-choice decision deserves the most attention.

What is still open

The Silverman 2010 sample (n = 74) is informative but not definitive. Participants had already been diagnosed with ID at study entry, which limits how far the results generalise to the diagnostic decision itself, that is, to people being assessed near the cutoff for the first time, where the discrepancy matters most clinically. Subsequent work has examined SB-WAIS comparisons in different age groups, different ID aetiologies, and different test editions; the broad pattern of WAIS scores running higher than SB scores in low-IQ samples replicates, though the precise size of the gap varies. Which test better tracks “real” ability in this population may not be resolvable from test data alone: both are validated against external criteria, both have published psychometric properties, and in practice the choice is driven by clinician training, available resources, and the specific decision the score is being used to inform.

Frequently asked questions

How big is the difference between Stanford-Binet and WAIS scores?

Silverman et al. (2010) reported a mean difference of 16.7 IQ points in a sample of 74 adults with intellectual disability, with WAIS scores higher than Stanford-Binet scores in every one of the 74 individuals. In typical-population samples, the discrepancy is smaller; the gap is largest in low-IQ populations where diagnostic decisions are most consequential.

Which test is more accurate for diagnosing intellectual disability?

Neither is straightforwardly “more accurate.” Both the SB5 and WAIS-IV/V are validated psychometric instruments with published reliability and validity evidence. They produce different scores for structural reasons (different norm samples, different content sampling, different scoring methodology). The diagnostic question of which IQ score better corresponds to “true” ability in any given individual cannot be resolved by the tests alone — clinical judgement, adaptive-functioning assessment, and developmental history all bear on the diagnosis.

Why does this matter for the death penalty?

The U.S. Supreme Court’s 2002 decision in Atkins v. Virginia ruled that executing individuals with intellectual disability is unconstitutional under the Eighth Amendment. ID diagnosis is therefore directly relevant to capital-case eligibility. A defendant whose ability sits in the borderline range may meet the IQ criterion for ID on one test but not the other, and the choice of test (and its interpretation) can determine the outcome. The Silverman finding is part of the basis for cautious, multi-test interpretation in forensic contexts.

Should clinicians use both tests in low-IQ assessment?

When feasible, yes. Administering both provides a more defensible picture than relying on either alone, particularly when the diagnostic decision hinges on whether a single composite score is above or below 70. When only one test is administered, the choice should be deliberate, the result should be reported with its confidence interval, and the interpretation should acknowledge what is known about the instrument’s properties relative to alternatives.

Is the gap due to the Flynn effect?

Partly, but probably not entirely. The Flynn effect (Trahan et al., 2014) means that older norms produce inflated scores relative to current population performance, so when two tests were normed years apart, the older test would be expected to over-score relative to the newer one. The WAIS-III (1997) predates the SB5 (2003), so where the WAIS-III was used, a Flynn-effect contribution to higher WAIS scores is plausible; the WAIS-IV (2008) postdates the SB5, so norm age would, if anything, push WAIS-IV scores lower rather than higher. At roughly 2 to 3 points per decade, drift alone would need norms more than 50 years apart to produce a 16.7-point gap, so structural differences in content sampling and scoring methodology very likely contribute as well.

References

  • Atkins v. Virginia, 536 U.S. 304 (2002).
  • Roid, G. H. (2003). Stanford-Binet Intelligence Scales, Fifth Edition (SB5). Riverside Publishing.
  • Schalock, R. L., Luckasson, R., & Tassé, M. J. (2021). An overview of intellectual disability: Definition, diagnosis, classification, and systems of supports (12th ed.). American Journal on Intellectual and Developmental Disabilities, 126(6), 439–442. https://doi.org/10.1352/1944-7558-126.6.439
  • Silverman, W., Miezejeski, C., Ryan, R., Zigman, W., Krinsky-McHale, S., & Urv, T. (2010). Stanford-Binet and WAIS IQ differences and their implications for adults with intellectual disability. Intelligence, 38(2), 242–248. https://doi.org/10.1016/j.intell.2009.12.005
  • Trahan, L. H., Stuebing, K. K., Hiscock, M. K., & Fletcher, J. M. (2014). The Flynn effect: A meta-analysis. Psychological Bulletin, 140(5), 1332–1360. https://doi.org/10.1037/a0037173
  • Wechsler, D. (2008). Wechsler Adult Intelligence Scale—Fourth Edition. Pearson.


Cite This Article

Jouve, X. (2010, April 2). Stanford-Binet vs WAIS in Intellectual Disability. PsychoLogic. https://www.psychologic.online/stanford-binet-wais-intellectual-disability/
