How to Interpret IQ Test Results: A Psychometrician’s Guide

Published: March 2, 2026

You’ve received an IQ test report — perhaps for yourself, your child, or a client. It’s filled with numbers, percentiles, confidence intervals, and subtest scores. What does it all mean? This guide walks you through interpreting a cognitive ability report the way a psychometrician would, helping you understand not just what the scores say, but what they don’t.

Key Takeaway: IQ scores are not fixed labels but probabilistic estimates with meaningful margins of error. A single Full Scale IQ number can obscure important patterns in cognitive strengths and weaknesses. Understanding composite scores, confidence intervals, and score discrepancies provides far richer — and more accurate — insight than a single number ever could.

What is the Full Scale IQ (FSIQ) score?

The Full Scale IQ is the headline number — the single score that summarizes overall cognitive ability. On modern tests like the WAIS-V and WISC-V, it’s set to a mean of 100 and a standard deviation of 15.

This means:

FSIQ Range	Classification	Percentile Range	Approximate Frequency
130+	Very Superior / Extremely High	98th+	~2.2% of population
120–129	Superior / Very High	91st–97th	~6.7%
110–119	High Average	75th–90th	~16.1%
90–109	Average	25th–74th	~50%
80–89	Low Average	9th–24th	~16.1%
70–79	Borderline	2nd–8th	~6.7%
Below 70	Extremely Low	Below 2nd	~2.2%
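
The percentile column follows from the normal model behind the scale (mean 100, SD 15). As a rough sketch, assuming scores are exactly normally distributed (real norm tables only approximate this), the cutoffs can be reproduced with Python's standard library. The function name `iq_to_percentile` is illustrative and not part of any test publisher's tooling.

```python
from statistics import NormalDist

# Assumption: IQ scores follow a normal distribution with mean 100 and SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)

def iq_to_percentile(score: float) -> float:
    """Return the percentage of the population expected to score below `score`."""
    return 100 * IQ_SCALE.cdf(score)

# Print the percentile at each classification boundary from the table.
for cutoff in (70, 80, 90, 110, 120, 130):
    print(f"IQ {cutoff}: {iq_to_percentile(cutoff):.1f}th percentile")
```

Under this model a score of 130 lands near the 98th percentile and a score of 70 near the 2nd, matching the table's outer bands.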

The FSIQ is useful as a general summary, but treating it as the only important number is one of the most common mistakes in IQ interpretation. It condenses several distinct abilities into one composite, and like any average, a composite can be misleading.

Why are composite (index) scores more informative than FSIQ?

Modern intelligence tests don’t measure a single ability. The WAIS-V, for example, breaks cognitive ability into five primary index scores:

Verbal Comprehension Index (VCI): Vocabulary, general knowledge, verbal reasoning — broadly, crystallized intelligence
Fluid Reasoning Index (FRI): Novel problem-solving, pattern recognition, abstract reasoning — broadly, fluid intelligence
Visual Spatial Index (VSI): Spatial visualization, mental rotation, visual-motor integration
Working Memory Index (WMI): Holding and manipulating information in mind, mental arithmetic
Processing Speed Index (PSI): Speed and accuracy of visual scanning, symbol coding, simple decision-making

A person with an FSIQ of 105 might have a VCI of 120 and a PSI of 85 — very different from someone with a flat profile of 105 across all indices. The first person has a genuine verbal strength and a processing speed weakness; the second has uniformly average abilities. Their FSIQ number is identical, but their cognitive profiles — and the practical implications — are quite different.

Understanding the distinction between fluid and crystallized intelligence is particularly important here, as these abilities develop differently, respond to different interventions, and predict different outcomes.

What are confidence intervals and why do they matter?

Every IQ score is an estimate, not an exact measurement. This is one of the most misunderstood aspects of psychological testing. If you took the same test twice (with no practice effects), you wouldn’t get exactly the same score — you’d get a range of scores centered around your “true” ability level.

The confidence interval quantifies this uncertainty. A 95% confidence interval for an FSIQ of 112 might be 107–117, meaning we’re 95% confident the person’s true FSIQ falls somewhere in that range.

Practical implications:

An FSIQ of 128 with a 95% CI of 123–133 might or might not cross the “gifted” threshold of 130 — the single number makes this look certain, but the confidence interval reveals genuine uncertainty
Two people scoring 95 and 102 may not actually differ in ability — their confidence intervals likely overlap substantially
Narrower confidence intervals indicate more reliable measurement (more subtests, longer testing, optimal conditions)

The Standard Error of Measurement (SEM) for the WAIS-V FSIQ is approximately 2.6 points. This means even under ideal conditions, a single test administration has a ±5 point range at the 95% confidence level. For individual index scores (which are based on fewer subtests), the SEM is typically larger — around 3.5–5 points.
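
The arithmetic behind these figures can be sketched in a few lines. This is a minimal illustration under classical test theory, where SEM = SD × sqrt(1 − reliability) and the interval is the observed score ± z × SEM. The reliability of .97 is an assumed value chosen because it yields an SEM near 2.6, and the function names are hypothetical. Published manuals may build their intervals differently (for example, around an estimated true score), so the manual's tables take precedence.

```python
import math

def sem_from_reliability(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Symmetric interval around the observed score (z = 1.96 for 95%)."""
    margin = z * sem
    return score - margin, score + margin

# Assumed reliability of .97 (illustrative, not quoted from a manual).
sem = sem_from_reliability(sd=15, reliability=0.97)

# The FSIQ of 112 and SEM of 2.6 are the figures used in the text.
low, high = confidence_interval(112, sem=2.6)
print(f"SEM = {sem:.1f}; 95% CI for 112 = {low:.0f} to {high:.0f}")
# prints: SEM = 2.6; 95% CI for 112 = 107 to 117
```

Passing a larger SEM, such as 4 or 5 for an index score, widens the interval accordingly.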

How should you interpret score differences between indices?

When one index score is notably higher or lower than others, this is called a discrepancy or scatter. But not all differences are meaningful. You need to consider both:

Statistical significance: Is the difference large enough that it’s unlikely to have occurred by chance? Typically, a difference of 10–15 points between index scores reaches statistical significance (p < .05). The test manual provides exact critical values.

Base rate (clinical rarity): Even if a difference is statistically significant, it might be common in the general population. For example, a 15-point difference between VCI and PSI occurs in roughly 25% of normal adults — it’s statistically significant but not clinically unusual. A 25-point difference, occurring in only 5–10% of the population, is both significant and rare, warranting closer examination.

The distinction matters because clinicians sometimes over-interpret normal variation as pathological. A profile with some scatter is the norm, not the exception — perfectly flat profiles are actually quite rare.
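
The statistical significance criterion can be approximated from the SEMs of the two scores being compared. The sketch below uses the standard error of a difference, sqrt(SEM_a² + SEM_b²), and assumes the two scores' measurement errors are independent. The SEM values are hypothetical picks from the 3.5–5 point range mentioned earlier, and `critical_difference` is an illustrative name. Base rates cannot be derived this way; they have to come from the manual's tables.

```python
import math

def critical_difference(sem_a: float, sem_b: float, z: float = 1.96) -> float:
    """Smallest score difference that is significant at the level implied by z.

    Uses the standard error of a difference: sqrt(SEM_a^2 + SEM_b^2).
    """
    return z * math.sqrt(sem_a**2 + sem_b**2)

def is_significant(score_a: float, score_b: float,
                   sem_a: float, sem_b: float, z: float = 1.96) -> bool:
    """True when the observed gap meets or exceeds the critical difference."""
    return abs(score_a - score_b) >= critical_difference(sem_a, sem_b, z)

# Hypothetical SEMs at the low and high end of the range for index scores.
print(round(critical_difference(3.5, 3.5), 1))  # about 9.7 points
print(round(critical_difference(5.0, 5.0), 1))  # about 13.9 points

# The VCI 120 vs. PSI 85 profile from earlier, with assumed SEMs.
print(is_significant(120, 85, sem_a=4.0, sem_b=4.5))  # True
```

With these assumed SEMs the critical value falls between roughly 10 and 14 points, in line with the 10–15 point rule of thumb.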

What do individual subtest scores tell you?

Below the index level, individual subtest scores (scaled scores with a mean of 10 and SD of 3) provide the finest grain of analysis. On the WAIS-V, for example:

Vocabulary — breadth and depth of word knowledge; strongly influenced by education and reading history
Similarities — abstract verbal reasoning; identifying conceptual relationships between words
Block Design — visual-spatial construction; analyzing and reproducing geometric patterns
Matrix Reasoning — nonverbal fluid reasoning; finding patterns in visual sequences
Digit Span — working memory; repeating and mentally manipulating number sequences
Coding — processing speed; rapidly copying symbol-number pairs under time pressure

However, individual subtest scores have lower reliability than composite scores (typically .80–.90 vs. .95+ for composites). This means they carry more measurement error and should be interpreted cautiously. A single low subtest score could reflect genuine weakness, testing conditions (fatigue, distraction), or simply measurement noise.

The research on short-form IQ estimation demonstrates why composites are more reliable: aggregating multiple indicators cancels out random error, producing more stable and valid estimates.
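
One classical way to see why aggregation helps is the Spearman-Brown formula, which predicts reliability when a measure is lengthened by a factor of k. The sketch below is a simplification: it assumes the combined parts are parallel (equally reliable and measuring the same construct), which real subtests are not, so the numbers are illustrative rather than a reproduction of any published composite reliability.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability when a measure is lengthened by a factor of k.

    Assumes the added parts are parallel to the original measure.
    """
    return k * reliability / (1 + (k - 1) * reliability)

# Start from an assumed subtest reliability of .85 and combine 1, 2, and 4 parts.
for k in (1, 2, 4):
    print(k, round(spearman_brown(0.85, k), 3))
```

Starting from .85, doubling gives about .92 and quadrupling about .96, which is the same direction of improvement seen when moving from subtests to composites.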

How do you determine if a score is a strength or weakness?

There are two frames of reference:

Normative comparison (compared to the population): Is the score above or below the population average of 100 (for indices) or 10 (for subtests)? An index score of 115 is a normative strength; 85 is a normative weakness.

Ipsative comparison (compared to the person’s own average): Is the score above or below this individual’s own mean? A person with an FSIQ of 125 who scores 105 on Processing Speed has an ipsative weakness — their processing speed is significantly below their own overall level, even though it’s average compared to the population.

Both perspectives matter. A normative weakness affects absolute performance (can the person do grade-level work?). An ipsative weakness may explain frustrations or difficulties that seem inconsistent with overall ability (a bright student who can’t finish timed tests).
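
The two frames of reference can be applied side by side. The following sketch makes several assumptions that should be read as placeholders: the normative band of 15 points mirrors the 115/85 examples above, the ipsative cutoff of 10 points is hypothetical (real critical values come from the test manual), and the personal baseline is taken as the simple mean of the index scores rather than the FSIQ. The profile itself is invented for illustration.

```python
from statistics import mean

POPULATION_MEAN = 100
NORMATIVE_BAND = 15      # 1 SD: 115+ is a strength, 85 or below a weakness
IPSATIVE_CRITICAL = 10   # hypothetical cutoff; use the manual's critical values

def classify(deviation: float, cutoff: float) -> str:
    """Label a deviation from a reference point as strength, weakness, or neither."""
    if deviation >= cutoff:
        return "strength"
    if deviation <= -cutoff:
        return "weakness"
    return "neither"

def profile(indices: dict[str, float]) -> dict[str, tuple[str, str]]:
    """Map each index to (normative label, ipsative label)."""
    personal_mean = mean(indices.values())
    return {
        name: (
            classify(score - POPULATION_MEAN, NORMATIVE_BAND),
            classify(score - personal_mean, IPSATIVE_CRITICAL),
        )
        for name, score in indices.items()
    }

# Hypothetical high-ability examinee with average processing speed.
scores = {"VCI": 130, "FRI": 128, "VSI": 126, "WMI": 124, "PSI": 105}
for name, labels in profile(scores).items():
    print(name, labels)
```

In this invented profile, PSI is neither a strength nor a weakness normatively but is an ipsative weakness, which is the pattern described for the bright student who struggles with timed tests.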

What are common misinterpretations to avoid?

Having worked in psychometric assessment, I see these errors most frequently:

1. Treating IQ as permanent and fixed. IQ scores show good stability over time (test-retest correlations of .85–.95 over months, .70–.85 over years), but they can and do change, especially in children, after significant life events, or with major environmental changes. The Flynn Effect demonstrates that even population-level IQ changes over generations.

2. Over-interpreting small score differences. A 3-point difference between two index scores is meaningless noise. Even a 7-point difference rarely reaches statistical significance. Yet parents sometimes agonize over why their child scored 108 on one index and 103 on another — these scores are essentially identical.

3. Ignoring the testing context. Was the person anxious? Fatigued? Tested in a noisy room? Medicated or ill? Testing conditions significantly affect scores, particularly on timed tasks (Processing Speed, Working Memory). A PSI score obtained while a child was fighting the flu should be interpreted differently than one obtained under optimal conditions.

4. Equating IQ with worth, potential, or destiny. IQ tests measure a specific set of cognitive abilities that predict certain outcomes (academic performance, job training success). They don’t measure creativity, wisdom, social intelligence, motivation, character, or dozens of other qualities that matter for a fulfilling life.

5. Comparing scores across different tests. A WISC-V FSIQ of 112 is not directly comparable to a Stanford-Binet 5 FSIQ of 112 or a score from an online IQ test. Different tests use different norms, include different subtests, and may be normed on different populations. Our analysis of WAIS-IV vs. Stanford-Binet comparisons illustrates these discrepancies.

When should you seek retesting?

Retesting may be warranted when:

Testing conditions were suboptimal (illness, anxiety, environmental distractions)
Scores seem inconsistent with observed behavior and academic/professional performance
A significant life change has occurred since the last testing (brain injury, treatment, educational intervention)
The previous test is more than 2–3 years old for children (whose abilities are still developing) or the norms are outdated (the Flynn Effect means older norms may inflate scores by ~3 points per decade)
High-stakes decisions (gifted placement, disability classification, forensic evaluation) require the highest confidence in the scores

Standard practice recommends waiting at least 12 months between administrations of the same test to minimize practice effects. If earlier retesting is needed, a different test (e.g., switching from WAIS-V to Stanford-Binet) can reduce practice effects.
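
To get a feel for how much outdated norms might matter, the Flynn Effect figure mentioned above (roughly 3 points per decade) can be turned into simple arithmetic. This is only a back-of-the-envelope sketch: the linear rate is an assumption that may not hold for every period, population, or test, and the function names and example years are hypothetical. It is meant to help decide whether norm age is worth raising with the examiner, not to correct an individual score.

```python
FLYNN_POINTS_PER_YEAR = 0.3  # about 3 points per decade, the rate cited above

def norm_inflation(norming_year: int, testing_year: int) -> float:
    """Estimated points by which outdated norms inflate a score (linear assumption)."""
    years_elapsed = max(0, testing_year - norming_year)
    return FLYNN_POINTS_PER_YEAR * years_elapsed

def flynn_adjusted(score: float, norming_year: int, testing_year: int) -> float:
    """Observed score minus the estimated inflation from norm age."""
    return score - norm_inflation(norming_year, testing_year)

# Hypothetical case: norms collected in 2006, test administered in 2026.
print(round(norm_inflation(2006, 2026), 1))
print(round(flynn_adjusted(110, norming_year=2006, testing_year=2026), 1))
```

With norms 20 years old, the estimated inflation is about 6 points, which is larger than the ±5 point margin of a typical 95% confidence interval.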

How to read a test report: a step-by-step approach

1. Start with the FSIQ and its confidence interval: get the general range of overall ability
2. Check index scores: look for significant discrepancies (15+ points between highest and lowest). If scatter is large, the FSIQ may not be the best summary
3. Compare indices to each other: use both statistical significance tables and base rate tables from the test manual
4. Look at subtest scores within each index: note any subtests that deviate markedly from the index mean
5. Consider the testing context: behavioral observations, test conditions, the examiner’s qualitative notes
6. Integrate with other data: IQ scores should be interpreted alongside academic records, behavioral observations, medical history, and other assessment data

A well-written psychological report does all of this interpretation for you. But understanding the principles empowers you to ask better questions and avoid being misled by oversimplified summaries.

For more on the science behind cognitive measurement, explore our psychological measurement and testing research.
