What is the significance?

This research underscores the importance of selecting appropriate assessment tools based on the context. Discrepancies between the WAIS and Stanford-Binet highlight the need for cautious interpretation of IQ scores, particularly when they influence critical decisions such as service eligibility, legal proceedings, or monitoring age-related cognitive changes. The findings call for a better understanding of how these tools align with real-world outcomes and diagnostic frameworks.

What are the future directions?

Further research should investigate the underlying causes of these score differences and evaluate whether they extend to other populations. Studies could also focus on developing guidelines for choosing between these tools in specific contexts. Additionally, improving the alignment of IQ tests with contemporary diagnostic criteria could enhance their effectiveness and equity.

The work by Silverman et al. highlights significant disparities between two prominent IQ assessment tools. By addressing these differences, researchers and practitioners can better tailor evaluations to individual needs, ensuring more accurate diagnoses and fairer outcomes. This research reminds us of the complexities involved in measuring intelligence and the importance of continuously refining assessment methods.

Silverman, W., Miezejeski, C., Ryan, R., Zigman, W., Krinsky-McHale, S., & Urv, T. (2010). Stanford-Binet and WAIS IQ differences and their implications for adults with intellectual disability. Intelligence, 38(2), 242–248. https://doi.org/10.1016/j.intell.2009.12.005

Stanford-Binet vs WAIS in Intellectual Disability

Published: April 2, 2010 · Last reviewed: May 6, 2026


Two of the most widely used adult intelligence tests, the Stanford-Binet Intelligence Scales (SB5) and the Wechsler Adult Intelligence Scale (WAIS), can produce notably different scores when administered to the same person on the same day. The discrepancy is not random. Silverman, Miezejeski, Ryan, Zigman, Krinsky-McHale, and Urv (2010) reported that in 74 adults with diagnosed intellectual disability, WAIS Full-Scale IQ scores were on average 16.7 points higher than Stanford-Binet Composite IQ scores, and the WAIS score was the higher of the two in every one of the 74 individuals. The gap is large enough to change diagnoses, eligibility for services, and, in capital cases, life-or-death sentencing decisions.

The Silverman 2010 finding

The study tested 74 adults with previously diagnosed intellectual disability on both the Stanford-Binet (then in its 5th edition, SB5; Roid, 2003) and the Wechsler Adult Intelligence Scale (then in its 3rd or 4th edition; Wechsler, 2008). The scores were compared individually rather than only at the group mean. The result was uniform: every single participant scored higher on the WAIS than on the SB5, with mean WAIS-FSIQ exceeding mean SB-Composite by 16.7 IQ points. The discrepancy was not driven by floor effects (the SB5’s lower minimum possible score) and persisted across the range of intellectual disability severity in the sample.

The directionality and consistency are what make the finding clinically important. Random measurement noise would leave some participants scoring higher on one test and some on the other; a mean gap of 16.7 points running in the same direction in all 74 cases reflects something structural about how the two tests differ in this population, not noise.
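The noise argument can be made concrete with a sign test. This is an illustrative sketch, not an analysis reported by Silverman et al., and it rests on an idealised noise-only model (an assumption): no true difference between instruments, no ties, and independent participants, so each person is equally likely to score higher on either test. The helper name `sign_test_upper_tail` is hypothetical.

```python
from math import comb


def sign_test_upper_tail(n_positive: int, n: int) -> float:
    """One-sided sign-test p-value: P(X >= n_positive) for X ~ Binomial(n, 0.5).

    Assumes (hypothetically) that under pure measurement noise each paired
    difference is equally likely to be positive or negative, with no ties.
    """
    favourable = sum(comb(n, k) for k in range(n_positive, n + 1))
    return favourable / 2 ** n


# Silverman et al. (2010): WAIS higher than Stanford-Binet in 74 of 74 adults.
p_one_sided = sign_test_upper_tail(74, 74)
p_two_sided = 2 * p_one_sided  # all 74 differences pointing either way

print(f"one-sided p = {p_one_sided:.2e}")
print(f"two-sided p = {p_two_sided:.2e}")
```

The one-sided probability is 2^-74, about 5 × 10^-23, which is why a uniform direction across 74 cases is so hard to attribute to measurement error.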

Why the gap exists

Three structural factors plausibly contribute. First, the tests were normed on different reference samples at different times, and IQ score distributions drift across cohorts — the Flynn effect documents secular increases of roughly 3 IQ points per decade across most of the 20th century (Trahan, Stuebing, Hiscock, & Fletcher, 2014). When two tests are normed years apart, the older norms produce inflated scores relative to current population performance. This is why test publishers re-norm at intervals; it is also why direct comparisons between editions of different test families are interpretively complicated.
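A back-of-envelope sketch shows the scale of this contribution. It assumes a constant, linear drift of about 3 points per decade (the round figure above) and, for a WAIS-III versus SB5 comparison, uses the publication years 1997 and 2003 as stand-ins for norming dates. Both are assumptions: norming samples are collected before publication, so the true gap between norms may differ. The function `expected_flynn_gap` is a hypothetical helper.

```python
def expected_flynn_gap(norm_year_older: float, norm_year_newer: float,
                       points_per_decade: float = 3.0) -> float:
    """Expected score advantage of the older-normed test over the newer one.

    Assumes a constant, linear Flynn effect. Each test over-scores by
    rate * (testing year - norming year), so the testing year cancels out
    and only the gap between the two norming dates matters.
    """
    if norm_year_newer < norm_year_older:
        raise ValueError("norm_year_newer must not precede norm_year_older")
    return points_per_decade / 10.0 * (norm_year_newer - norm_year_older)


observed_gap = 16.7  # mean WAIS - SB difference, Silverman et al. (2010)
# Hypothetical norming years: publication years used as a stand-in.
flynn_part = expected_flynn_gap(1997, 2003)
print(f"Flynn-effect share: {flynn_part:.1f} of {observed_gap} points "
      f"({flynn_part / observed_gap:.0%})")
```

Under these assumptions the Flynn effect accounts for roughly 1.8 of the 16.7 points (about 11%). For a WAIS-IV comparison the SB5 would be the older-normed test, so the same arithmetic predicts a small gap in the opposite direction.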

Second, the tests sample different cognitive content. The SB5 organises subtests around five CHC factors (Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual-Spatial Processing, Working Memory) administered in both verbal and nonverbal forms. The WAIS-IV organises subtests around four indices (Verbal Comprehension, Perceptual Reasoning, Working Memory, Processing Speed). The content overlap is substantial but not complete, and the two tests put different weights on the underlying broad abilities. In a low-IQ sample, where floor effects on individual subtests interact with the composite scoring rules, the resulting Full-Scale or Composite scores can land at meaningfully different points on the IQ scale even when the underlying ability is the same.

Third, the scoring methodology differs. The SB5’s Composite IQ is computed differently from the WAIS-IV’s Full-Scale IQ, and the relationship between subtest performance and the final score is not identical across instruments. For an examinee at the low end of the distribution, small differences in subtest weighting can compound into multi-point differences at the composite level.
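One general mechanism behind this can be shown with a textbook linear-composite formula. This is a sketch of that mechanism only, not how either test actually derives its Composite or Full-Scale IQ (both use published norm tables). It assumes equally weighted subtests, a single average intercorrelation, and the IQ metric (mean 100, SD 15); the helper `linear_composite_iq` and the subtest profile are hypothetical. When every subtest sits at the same low z-score, the composite lands further from the mean than the subtests do, and how much further depends on the number of subtests and how strongly they correlate.

```python
from math import sqrt


def linear_composite_iq(subtest_z: list[float], mean_r: float) -> float:
    """Rescale an equally weighted sum of subtest z-scores to the IQ metric.

    The SD of a sum of k standardized scores with average intercorrelation r
    is sqrt(k + k*(k-1)*r); dividing by it turns the sum back into a z-score.
    """
    k = len(subtest_z)
    sd_of_sum = sqrt(k + k * (k - 1) * mean_r)
    composite_z = sum(subtest_z) / sd_of_sum
    return 100.0 + 15.0 * composite_z


# Hypothetical profile: four subtests, each two SDs below the mean (IQ-equivalent 70).
profile = [-2.0, -2.0, -2.0, -2.0]
for r in (0.5, 0.7):
    print(f"mean r = {r}: composite IQ = {linear_composite_iq(profile, r):.1f}")
```

With every subtest at an IQ-equivalent of 70, the composite comes out near 62 when the average intercorrelation is .5 and near 66 when it is .7: a four-point spread produced purely by the scoring model, before any difference in content.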

Diagnostic implications

The clinical diagnosis of intellectual disability under both DSM-5-TR and the American Association on Intellectual and Developmental Disabilities’ 12th-edition framework (Schalock, Luckasson, & Tassé, 2021) requires three elements: significant limitations in intellectual functioning (typically two standard deviations below the population mean, with appropriate confidence intervals), significant limitations in adaptive functioning, and onset during the developmental period. The IQ criterion is conventionally interpreted as a Full-Scale or Composite score around 70 or below, with the standard error of measurement creating a working range of roughly 65–75 around the cutoff.
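The cutoff and its working range follow from simple arithmetic on the IQ metric. A minimal sketch, assuming a 95% interval (z = 1.96) and an illustrative SEM of 2.5 points, chosen because it yields the roughly ±5-point margin described above. It treats the IQ criterion as met when the interval reaches down to 70, a simplification of clinical practice rather than a rule taken from DSM-5-TR or the AAIDD manual; the helper names are hypothetical.

```python
MEAN, SD = 100.0, 15.0
CUTOFF = MEAN - 2 * SD  # two standard deviations below the mean: 70


def confidence_interval(score: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Symmetric confidence interval around an observed score."""
    margin = z * sem
    return score - margin, score + margin


def meets_iq_criterion(score: float, sem: float = 2.5, z: float = 1.96) -> bool:
    """Simplified, hypothetical check: does the interval reach the cutoff?"""
    lower, _ = confidence_interval(score, sem, z)
    return lower <= CUTOFF


for score in (67, 84):  # the SB5 / WAIS-IV example from the next paragraph
    low, high = confidence_interval(score, 2.5)
    print(f"{score}: 95% CI {low:.1f}-{high:.1f}, criterion met: {meets_iq_criterion(score)}")
```

With these assumptions any observed score up to about 75 has an interval reaching 70, which is where the 65–75 working range comes from; a 67 meets the criterion and an 84 does not.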

The Silverman finding directly affects this diagnostic process. A patient who scores 67 on the SB5 (clearly within the ID range) might score 84 on the WAIS-IV (clearly outside the ID range) on the same day. Whether the patient meets the IQ criterion for ID diagnosis can therefore depend on which test the clinician chose — an unsatisfying state of affairs for a diagnosis that determines eligibility for services, educational supports, disability benefits, and legal protections.

The 16.7-point average gap is also large compared to the standard error of measurement (SEM) of either test's global score: with reported Full-Scale and Composite reliabilities around .98, the SEM is roughly 2–3 IQ points (larger for individual index scores). Confidence intervals around a single test's score do not bridge the gap; a person can be reliably diagnosable on one instrument and reliably non-diagnosable on the other, and the difference reflects the test, not their ability.
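The claim can be checked with the classical test theory formula SEM = SD × √(1 − r), where r is the score's reliability. The sketch below tries two illustrative reliabilities (.98 and .95, assumptions rather than values quoted from either manual) and asks whether 95% intervals around two scores 16.7 points apart overlap; the helper names are hypothetical.

```python
from math import sqrt


def sem_from_reliability(reliability: float, sd: float = 15.0) -> float:
    """Classical test theory: standard error of measurement."""
    return sd * sqrt(1.0 - reliability)


def intervals_overlap(score_a: float, score_b: float, sem: float, z: float = 1.96) -> bool:
    """True if the z-level confidence intervals around two scores overlap."""
    margin = z * sem
    return abs(score_a - score_b) <= 2 * margin


sb_score = 67.0
wais_score = sb_score + 16.7  # shifted by the mean gap reported by Silverman et al.
for r in (0.98, 0.95):
    sem = sem_from_reliability(r)
    print(f"r = {r}: SEM = {sem:.2f}, intervals overlap: "
          f"{intervals_overlap(sb_score, wais_score, sem)}")
```

Two 95% intervals overlap only when the scores differ by less than 2 × 1.96 × SEM, so a 16.7-point gap would need an SEM above about 4.3 points (16.7 / 3.92) before the intervals even touch.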

Forensic and legal implications

The largest practical stake of the SB-WAIS discrepancy is in capital-case forensic evaluation. The U.S. Supreme Court’s 2002 decision in Atkins v. Virginia (536 U.S. 304) ruled that executing individuals with intellectual disability is unconstitutional under the Eighth Amendment, making accurate ID diagnosis directly relevant to whether a defendant is eligible for the death penalty. The court did not specify which IQ test must be used or how scores should be interpreted, leaving this to state-level practice and individual clinical judgement.

In a forensic context, the choice of test (and the interpretation of scores against the diagnostic threshold) can determine the outcome. A defendant whose true ability sits in the 65–75 range around the cutoff may meet the ID criterion on the SB5 but not on the WAIS, because in this population the WAIS systematically yields higher scores than the SB. The Silverman 2010 paper has been cited in subsequent legal and forensic-psychology literature as part of the case for cautious interpretation of single-test scores in capital cases, with explicit attention to which instrument was administered and what its known biases are.

Practical guidance

Three practical takeaways follow from the Silverman finding. First, in any clinical or forensic context where the diagnostic decision turns on whether a Full-Scale IQ score is above or below 70, the test choice should be deliberate and documented — the gap between the SB5 and the WAIS is large enough that the choice carries real information. Second, when feasible, administering both tests provides a more defensible picture than relying on either alone, particularly in low-IQ samples where the discrepancy is most pronounced. Third, a single Full-Scale or Composite score should always be reported with its confidence interval, and the interpretation should explicitly acknowledge the instrument used and its known properties relative to alternatives.

For typical-population assessment, where the discrepancy is smaller and the diagnostic stakes lower, either test produces broadly defensible scores. The Silverman gap is largest in the very samples where the consequence of getting it wrong is largest — ID populations and forensic cases — and that is precisely where the test-choice decision deserves the most attention.

What is still open

The Silverman 2010 sample (n = 74) is informative but not definitive. The participants had already been diagnosed with ID at study entry, which limits how far the result generalises to the diagnostic decision itself, that is, to people near the threshold who are being assessed for the first time, where the discrepancy matters most clinically. Subsequent work has examined SB-WAIS comparisons in different age groups, different specific ID aetiologies, and different test editions; the broad pattern of WAIS scores running higher than SB scores in low-IQ samples replicates, though the precise size of the gap varies. Which test better tracks “real” ability in this population cannot be settled by the tests alone: both are validated against external criteria, both have published psychometric properties, and the choice is in practice driven by clinician training, available resources, and the specific decision the score is being used to inform.

Frequently asked questions

How big is the difference between Stanford-Binet and WAIS scores?

Silverman et al. (2010) reported a mean difference of 16.7 IQ points in a sample of 74 adults with intellectual disability, with WAIS scores higher than Stanford-Binet scores in every one of the 74 individuals. In typical-population samples, the discrepancy is smaller; the gap is largest in low-IQ populations where diagnostic decisions are most consequential.

Which test is more accurate for diagnosing intellectual disability?

Neither is straightforwardly “more accurate.” Both the SB5 and WAIS-IV/V are validated psychometric instruments with published reliability and validity evidence. They produce different scores for structural reasons (different norm samples, different content sampling, different scoring methodology). The diagnostic question of which IQ score better corresponds to “true” ability in any given individual cannot be resolved by the tests alone — clinical judgement, adaptive-functioning assessment, and developmental history all bear on the diagnosis.

Why does this matter for the death penalty?

The U.S. Supreme Court’s 2002 decision in Atkins v. Virginia ruled that executing individuals with intellectual disability is unconstitutional under the Eighth Amendment. ID diagnosis is therefore directly relevant to capital-case eligibility. A defendant whose ability sits in the borderline range may meet the IQ criterion for ID on one test but not the other, and the choice of test (and its interpretation) can determine the outcome. The Silverman finding is part of the basis for cautious, multi-test interpretation in forensic contexts.

Should clinicians use both tests in low-IQ assessment?

When feasible, yes. Administering both provides a more defensible picture than relying on either alone, particularly when the diagnostic decision hinges on whether a single composite score is above or below 70. When only one test is administered, the choice should be deliberate, the result should be reported with its confidence interval, and the interpretation should acknowledge what is known about the instrument’s properties relative to alternatives.

Is the gap due to the Flynn effect?

Partly, but not entirely. The Flynn effect (Trahan et al., 2014) means that older norms produce inflated scores relative to current population performance, so when two tests were normed years apart, the older-normed test would be expected to over-score relative to the newer one. That direction fits a Flynn-effect contribution only where the WAIS edition administered was normed before the SB5: the WAIS-III (published 1997) predates the SB5 (2003), whereas the WAIS-IV (2008) postdates it. Even in the favourable case, a gap of a few years implies only a point or two at roughly 3 points per decade, so the 16.7-point gap appears too large to be explained by the Flynn effect alone, and structural differences in content sampling and scoring methodology also contribute.

References

Atkins v. Virginia, 536 U.S. 304 (2002).
Roid, G. H. (2003). Stanford-Binet Intelligence Scales, Fifth Edition (SB5). Riverside Publishing.
Schalock, R. L., Luckasson, R., & Tassé, M. J. (2021). An overview of intellectual disability: Definition, diagnosis, classification, and systems of supports (12th ed.). American Journal on Intellectual and Developmental Disabilities, 126(6), 439–442. https://doi.org/10.1352/1944-7558-126.6.439
Silverman, W., Miezejeski, C., Ryan, R., Zigman, W., Krinsky-McHale, S., & Urv, T. (2010). Stanford-Binet and WAIS IQ differences and their implications for adults with intellectual disability. Intelligence, 38(2), 242–248. https://doi.org/10.1016/j.intell.2009.12.005
Trahan, L. H., Stuebing, K. K., Hiscock, M. K., & Fletcher, J. M. (2014). The Flynn effect: A meta-analysis. Psychological Bulletin, 140(5), 1332–1360. https://doi.org/10.1037/a0037173
Wechsler, D. (2008). Wechsler Adult Intelligence Scale—Fourth Edition. Pearson.

Xavier Jouve, Ph.D., Psychometrician

Xavier Jouve, Ph.D., is a psychometrician and quantitative psychologist specializing in cognitive ability measurement, item response theory, and test development. He is Head of Research at Cogn-IQ, where he has designed and validated seven cognitive assessment instruments — including the JCTI (inductive reasoning), JCCES (crystallized intelligence), IAW (vocabulary), JCFS (figurative sequences), JCWS (verbal reasoning), GIE (general knowledge), and WN (logical inference) — collectively normed on over 13,000 examinees. His work applies 2PL IRT modeling, computerized adaptive testing, and advanced composite scoring methods (including the modified Tellegen & Briggs Formula 4 with cubic correction) to produce research-grade cognitive measures available online. ORCID: 0009-0006-1283-045X


Related Research

IQ Scores and Ranges

What Is Mensa? Membership and Testing

Mensa. The name conjures images of genius-level intellects gathering to solve the world's hardest puzzles. In reality, the world's largest and oldest high-IQ society is…

Mar 25, 2026

Cognitive Abilities and Intelligence

Sex Differences in Cognitive Abilities

Few topics in psychology generate more heat and less light than sex differences in cognitive abilities. Claims range from "men and women are cognitively identical"…

Dec 3, 2025

Psychological Measurement and Testing

WAIS-IV vs. WAIS-V: What Changed

Pearson released the Wechsler Adult Intelligence Scale, Fifth Edition (WAIS-5) in late 2024 — the first major revision since the WAIS-IV appeared in 2008. For…

Aug 7, 2025

Psychological Measurement and Testing

History of the WAIS: Wechsler-Bellevue to WAIS-V

The Wechsler Adult Intelligence Scale is the most widely administered individual IQ test in clinical practice, and has been for most of the 70 years…

Oct 27, 2023

Psychological Measurement and Testing

JCFS: Assessing Nonverbal Intelligence

Most cognitive ability tests in widespread use are either verbal-heavy (vocabulary, comprehension, knowledge) or rely on a fixed sequence of items that takes everyone a…

Apr 12, 2023

