Nonmemory-Based Performance Validity Tests

Improving Detection of Noncredible Results with Nonmemory-Based Performance Validity Tests
Published: October 2, 2020

Performance validity tests (PVTs) are the neuropsychologist’s safeguard against test results that look like cognitive impairment but actually reflect insufficient effort, deliberate underperformance, or response bias. The classical PVT toolkit is dominated by memory-based instruments — the Test of Memory Malingering (TOMM), the Word Memory Test, the Medical Symptom Validity Test — that work by exploiting the gap between true memory performance and the floor performance a feigning examinee tends to produce. The memory orientation made historical sense, since malingered cognitive impairment most often presents as exaggerated forgetting, but it leaves a methodological gap: clinicians need PVTs that share little measurement variance with memory-based instruments, so that a multi-test validity battery can detect noncredible performance through more than one route.

Webber, Critchfield, and Soble (2020), in Assessment, evaluate the convergent, discriminant, and concurrent validity of four nonmemory-based PVTs — the Dot Counting Test (DCT), the WAIS-IV Reliable Digit Span (RDS), Reliable Digit Span Revised (RDS-R), and the WAIS-IV Digit Span Age-Corrected Scaled Score (ACSS) — and report a result that has practical consequences for which combinations belong in a validity battery and which are redundant.

The framework: MND criteria and the AACN consensus

The conceptual anchor for performance validity testing is Slick, Sherman, and Iverson’s (1999) diagnostic criteria for malingered neurocognitive dysfunction (MND). The criteria specify that a diagnosis of malingered impairment requires evidence from multiple sources — external incentive, behavioral indicators, and psychometric markers including PVT failures — meeting structured thresholds for “definite”, “probable”, or “possible” MND. The framework was the field’s first formal proposal to anchor effort-based diagnoses in psychometric data rather than clinical impression alone.

The American Academy of Clinical Neuropsychology consensus conference statement (Heilbronner et al., 2009) extended the MND framework into a practice standard: every neuropsychological evaluation should include validity testing, and a single PVT failure is generally insufficient evidence for a noncredible performance determination. Two or more PVTs from independent measurement domains were recommended as the minimum, on the reasoning that any single PVT can fail spuriously due to genuine cognitive impairment, confusion, or random error, and that convergent failures across instruments are more diagnostic than a single-test signal.
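The arithmetic behind the two-PVT minimum can be sketched in a few lines. The example below is an illustration only: it assumes that every PVT shares the same false-positive rate and that failures are independent across tests, and the 10% rate is a hypothetical figure rather than a value from the consensus statement. The function name `prob_at_least` is likewise invented for this sketch.

```python
from math import comb


def prob_at_least(k_fail: int, n_tests: int, fp_rate: float) -> float:
    """Probability that a credible examinee fails at least k_fail of n_tests PVTs.

    Assumes independent tests sharing one false-positive rate (illustrative only).
    """
    return sum(
        comb(n_tests, k) * fp_rate**k * (1 - fp_rate) ** (n_tests - k)
        for k in range(k_fail, n_tests + 1)
    )


FP_RATE = 0.10  # hypothetical per-test false-positive rate

one_of_two = prob_at_least(1, 2, FP_RATE)  # any failure in a two-PVT battery
two_of_two = prob_at_least(2, 2, FP_RATE)  # both PVTs failed
print(f"at least one failure: {one_of_two:.2f}")
print(f"two failures:         {two_of_two:.2f}")
```

Under these assumptions a credible examinee has a 0.19 chance of failing at least one of two PVTs but only a 0.01 chance of failing both. Correlated PVTs would fail together more often than this independence model suggests, which is one reason the recommendation stresses independent measurement domains.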

Sherman, Slick, and Iverson (2020) updated the MND criteria into MND-2, sharpening several thresholds and explicitly addressing the role of embedded validity indicators (PVT signals derived from standard cognitive subtests rather than from dedicated effort instruments). The Webber-Critchfield-Soble 2020 paper sits in this lineage: it asks which nonmemory-based PVTs and embedded indicators contribute incremental validity to a battery that is otherwise dominated by memory-based instruments.

The four nonmemory-based PVTs evaluated

Dot Counting Test (DCT). Examinees count dots under timed conditions; some arrays are ungrouped (randomly scattered) and others are grouped (structured). Genuinely impaired examinees show a predictable pattern: longer counting times on ungrouped arrays than on grouped ones, because the structured arrays facilitate efficient counting strategies. Feigning examinees often miss this pattern, producing flatter or inverted timing profiles. The DCT is brief, requires no specialized stimuli, and operates on a perceptual rather than memory substrate.

Reliable Digit Span (RDS). The longest forward and longest backward span an examinee passes on both trials of the WAIS-IV Digit Span subtest, summed. Below an empirically derived cut-score (most often ≤6), the result is flagged as suggestive of noncredible performance. RDS exploits the fact that digit span is highly resistant to most genuine impairments — even moderate dementia produces only modest reductions — so very low spans in an examinee whose other cognition seems intact suggest deliberate underperformance.
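The RDS definition above translates directly into a short calculation. The sketch below assumes a hypothetical record format, a dictionary mapping each span length to the pass/fail outcome of its two trials; the function names and the sample data are invented for illustration, and the default cut of ≤6 is simply the one quoted in the text.

```python
from __future__ import annotations


def longest_reliable_span(trials: dict[int, tuple[bool, bool]]) -> int:
    """Longest span length at which BOTH trials were passed (0 if none)."""
    return max(
        (length for length, (t1, t2) in trials.items() if t1 and t2),
        default=0,
    )


def reliable_digit_span(forward: dict, backward: dict) -> int:
    """RDS: longest reliable forward span plus longest reliable backward span."""
    return longest_reliable_span(forward) + longest_reliable_span(backward)


def rds_flag(rds: int, cut: int = 6) -> bool:
    """True when RDS falls at or below the cut-score."""
    return rds <= cut


# Hypothetical examinee: span length -> (trial 1 passed, trial 2 passed)
forward = {2: (True, True), 3: (True, True), 4: (True, True),
           5: (True, False), 6: (False, False)}
backward = {2: (True, True), 3: (True, False), 4: (False, False)}

rds = reliable_digit_span(forward, backward)
print(rds, rds_flag(rds))
```

In this example the longest forward span passed on both trials is 4 and the longest backward span is 2, so RDS is 6 and the result is flagged at a cut of ≤6. Note that the forward span of 5 does not count, because only one of its two trials was passed.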

Reliable Digit Span Revised (RDS-R). A modification by Schroeder et al. that incorporates Sequencing trials from the WAIS-IV Digit Span subtest, computing an extended sum across forward, backward, and sequencing components. RDS-R was proposed to capture a broader span profile and improve sensitivity in subtle feigning cases.

WAIS-IV Digit Span ACSS. The age-corrected scaled score for the full Digit Span subtest, used as a stand-alone PVT cut-off rather than as a sub-scoring index. Webber and colleagues found that this simpler approach — using the full subtest’s standard score with cuts at ≤5 or ≤6 — produced classification accuracy equal to or better than RDS and RDS-R in their sample, raising the question of whether the more elaborate sub-scoring procedures are worth their complexity.

What the data showed

Webber and colleagues reported that the four nonmemory-based PVTs were significantly correlated with each other but uncorrelated with memory-based PVTs administered in the same battery. The correlation pattern is the practical case for including nonmemory-based instruments: they cluster as a coherent factor distinct from memory-based PVTs, so a battery that draws from both clusters has redundancy where it matters (within a cluster, where multiple measures of the same underlying construct provide convergent signal) and independence where it matters (between clusters, where the diagnostic value comes from convergence across measurement methods).

The headline classification finding: WAIS-IV Digit Span ACSS at cut-scores of ≤5 or ≤6 produced equal or better accuracy than RDS or RDS-R for distinguishing valid-unimpaired examinees from noncredible performers. The simpler instrument matched or beat the more elaborate sub-scoring procedures. Combining DCT with any of the digit-span variants improved classification accuracy in valid-unimpaired examinees but was less effective for valid-impaired examinees — patients with genuine cognitive deficits who passed PVTs were classified more reliably with a single instrument than with combinations, presumably because the combined cuts produce more false-positive flags in genuinely impaired performers.
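Comparing cut-scores of this kind comes down to sensitivity and specificity at each cut. The sketch below shows that calculation on made-up scaled scores; the data are hypothetical and do not reproduce Webber and colleagues' sample, and the function name is an assumption of this example.

```python
def sensitivity_specificity(noncredible, credible, cut):
    """Classification accuracy when scores at or below `cut` count as PVT failures.

    sensitivity: share of noncredible examinees correctly flagged
    specificity: share of credible examinees correctly passed
    """
    sensitivity = sum(s <= cut for s in noncredible) / len(noncredible)
    specificity = sum(s > cut for s in credible) / len(credible)
    return sensitivity, specificity


# Hypothetical age-corrected scaled scores (illustrative only)
noncredible_scores = [3, 4, 5, 5, 6, 6, 7, 8, 9, 10]
credible_scores = [5, 6, 7, 8, 8, 9, 10, 10, 11, 12]

for cut in (5, 6):
    sens, spec = sensitivity_specificity(noncredible_scores, credible_scores, cut)
    print(f"cut <= {cut}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these invented scores, moving the cut from ≤5 to ≤6 raises sensitivity from 0.40 to 0.60 while lowering specificity from 0.90 to 0.80. The same trade-off is what makes cut selection harder in valid-impaired groups, where credible scores sit closer to the cut.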

The pairing the authors highlighted as practically optimal: DCT plus WAIS-IV Digit Span ACSS. The combination uses two independent measurement methods (perceptual counting plus auditory-verbal attention span), pulls from the broadly validated nonmemory-based cluster, and uses the simplest digit-span scoring approach available. For a clinician building a multi-instrument validity battery, this is the recommendation that fell out of the data.

Where this fits in the practical PVT workflow

The clinically actionable summary is that a contemporary neuropsychological battery should include at least two PVTs, drawn from at least two distinct measurement domains, with at least one of them nonmemory-based. The reason is structural rather than statistical: an examinee who feigns memory impairment may pass a perceptual-counting PVT, and an examinee who feigns generalized cognitive impairment may pass an instrument that targets memory specifically. Convergent failures across uncorrelated domains are diagnostic; a single failure on a single instrument is suggestive but not sufficient.
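The convergence logic in this summary can be written down as a simple tally. The sketch below is a hedged illustration, not a clinical decision rule: the `PvtResult` record, the domain labels, and the wording of the returned summaries are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PvtResult:
    name: str
    domain: str  # illustrative label, e.g. "memory" or "perceptual counting"
    failed: bool


def summarize_battery(results: list[PvtResult]) -> str:
    """Describe the failure pattern across a validity battery."""
    failures = [r for r in results if r.failed]
    failed_domains = {r.domain for r in failures}
    if len(failures) >= 2 and len(failed_domains) >= 2:
        return "convergent failures across domains"
    if failures:
        return "failure without cross-domain convergence: suggestive only"
    return "no PVT failures"


# Hypothetical battery outcome
battery = [
    PvtResult("TOMM", "memory", failed=True),
    PvtResult("DCT", "perceptual counting", failed=True),
    PvtResult("Digit Span ACSS", "digit span", failed=False),
]
print(summarize_battery(battery))
```

Here two failures from different domains produce the "convergent" summary, whereas a single failure would be reported as suggestive only. Any such tally is one input to the clinical reading rather than a replacement for it.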

For batteries that already include the WAIS-IV — common in contemporary clinical practice — the embedded ACSS PVT is essentially free: the subtest is being administered anyway, and the validity check is a matter of comparing the age-corrected scaled score against an established cut. Adding the DCT is a five-minute procedure that contributes signal from a different measurement substrate. The combination is a reasonable default for clinicians who want validity coverage without lengthening the protocol substantially.

The framework remains less effective for valid-impaired examinees — patients with genuine cognitive impairment who are nonetheless giving credible effort. False-positive rates rise in this group across all PVTs, and the combinatorial approach does not solve the problem. The clinical reading in such cases requires informed clinical judgment rather than mechanical application of cut-scores: a patient with documented severe TBI may legitimately produce digit spans below typical PVT thresholds without feigning. The PVT signal is one input among several, not a verdict.

Where this fits in the broader validity literature

Performance validity testing has grown from a niche forensic concern into a standard component of every major neuropsychological evaluation, and the methodological literature has matured along with it. The MND criteria (Slick et al., 1999; Sherman et al., 2020) provide the diagnostic framework; the AACN consensus statement (Heilbronner et al., 2009) provides the practice standard; the validation literature on individual PVTs — including Webber and colleagues’ analysis of the nonmemory-based instruments — provides the empirical basis for choosing which tests to include.

The unresolved methodological challenges are substantial: how to handle valid-impaired examinees with elevated false-positive rates, how to set culturally and linguistically appropriate cut-scores, how to integrate symptom validity tests (which target self-reported symptoms rather than performance) into the same diagnostic framework, and how to weigh PVT failures against external evidence of effort or motivation. The Webber 2020 contribution sits within this larger project, supplying one piece of the structural-validity puzzle: which nonmemory-based PVTs cluster together, which are redundant with each other, and which combinations buy the most diagnostic information per minute of administration.

Frequently Asked Questions

Why do neuropsychologists use multiple PVTs rather than just one?

Single PVTs have non-trivial false-positive rates, particularly in patients with genuine cognitive impairment. The AACN consensus statement (Heilbronner et al., 2009) recommends at least two PVTs from independent measurement domains, on the principle that convergent failures across uncorrelated tests carry much stronger diagnostic weight than a single-test signal.

What is the Reliable Digit Span (RDS) and how is it different from RDS-R?

RDS is the sum of the longest forward and longest backward digit spans an examinee passes on both trials, with a cut-score (typically ≤6) flagging suspect performance. RDS-R extends this by adding the WAIS-IV Sequencing trials to the sum, on the rationale that the additional component improves sensitivity. Webber and colleagues (2020) found that the simpler ACSS-based cut performed at least as well as either RDS variant.

Why is digit span a useful PVT?

Digit span is one of the most robust cognitive measures: even moderate dementia produces only modest reductions, and most genuine impairments preserve the ability to repeat short sequences. Very low spans in an examinee whose other cognition seems intact suggest deliberate underperformance rather than legitimate impairment.

How does the Dot Counting Test work?

Examinees count groups of dots arranged either ungrouped (random scatter) or grouped (structured arrays). Genuinely impaired examinees show longer counting times on ungrouped arrays than on grouped ones, because grouping facilitates efficient counting. Feigning examinees often miss this asymmetry, producing flatter or inverted timing profiles that the DCT scoring system flags.
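The timing asymmetry described above can be sketched as a comparison of mean counting times. This is not the DCT scoring system itself; the function names, the sample times, and the 0.5-second `margin` are hypothetical choices made only to illustrate the expected, flat, and inverted profiles.

```python
from statistics import mean


def timing_advantage(ungrouped_secs, grouped_secs) -> float:
    """Mean ungrouped time minus mean grouped time, in seconds.

    Positive values mean grouping sped counting up, the expected pattern.
    """
    return mean(ungrouped_secs) - mean(grouped_secs)


def profile_label(advantage: float, margin: float = 0.5) -> str:
    """Label a timing profile; `margin` is a hypothetical tolerance,
    not a published DCT cut-score."""
    if advantage > margin:
        return "expected"
    if advantage < -margin:
        return "inverted"
    return "flat"


# Hypothetical counting times in seconds
expected = timing_advantage([6.0, 7.0, 8.0], [3.0, 4.0, 5.0])
inverted = timing_advantage([4.0, 4.0], [6.0, 6.0])
print(profile_label(expected), profile_label(inverted))
```

A positive advantage (grouped arrays counted faster) is the pattern associated with credible performance; values near zero or below correspond to the flat or inverted profiles that raise concern.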

What does it mean if a patient fails one PVT but passes another?

It is suggestive but not diagnostic. The MND-2 criteria (Sherman et al., 2020) and AACN practice standard require multiple lines of evidence — including PVT failures, external incentive context, and behavioral indicators — to meet thresholds for “probable” or “definite” malingered neurocognitive dysfunction. A single PVT failure is one input among several, not a verdict on its own.

References

  • Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., & Millis, S. R. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23(7), 1093–1129. https://doi.org/10.1080/13854040903155063
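  • Sherman, E. M. S., Slick, D. J., & Iverson, G. L. (2020). Multidimensional malingering criteria for neuropsychological assessment: A 20-year update of the malingered neuropsychological dysfunction criteria. Archives of Clinical Neuropsychology, 35(6), 735–764. https://doi.org/10.1093/arclin/acaa019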
  • Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13(4), 545–561. https://doi.org/10.1076/clin.13.4.545.1885
  • Webber, T. A., Critchfield, E. A., & Soble, J. R. (2020). Convergent, discriminant, and concurrent validity of nonmemory-based performance validity tests. Assessment, 27(7), 1399–1415. https://doi.org/10.1177/1073191118804874

Cite This Article

Jouve, X. (2020, October 2). Nonmemory-Based Performance Validity Tests. PsychoLogic. https://www.psychologic.online/nonmemory-performance-validity-tests/
