What is the significance of this research?

This research contributes to the ongoing refinement of neuropsychological assessments by offering an evidence-based approach to enhance test validity. The findings highlight the potential of nonmemory-based PVTs to complement traditional methods, ensuring more accurate and reliable results, particularly for individuals without cognitive impairments.

What are the future directions?

Further research is needed to explore the applicability of these findings to a broader range of clinical and non-clinical populations. Additionally, understanding why the combined method is less effective for valid-impaired examinees could inform the development of tailored PVT strategies that address this limitation.

This study provides valuable insights into the role of nonmemory-based PVTs in detecting noncredible performance. By highlighting effective combinations of tools like DCT and ACSS, the research supports a more nuanced approach to neuropsychological assessment, paving the way for continued improvements in validity testing.

Webber, T. A., Critchfield, E. A., & Soble, J. R. (2020). Convergent, Discriminant, and Concurrent Validity of Nonmemory-Based Performance Validity Tests. Assessment, 27(7), 1399-1415. https://doi.org/10.1177/1073191118804874

Nonmemory-Based Performance Validity Tests

Published: October 2, 2020 · Last reviewed: May 7, 2026


Performance validity tests (PVTs) are the neuropsychologist’s safeguard against test results that look like cognitive impairment but actually reflect insufficient effort, deliberate underperformance, or response bias. The classical PVT toolkit is dominated by memory-based instruments — the Test of Memory Malingering (TOMM), the Word Memory Test, the Medical Symptom Validity Test — that work by exploiting the gap between true memory performance and the floor performance a feigning examinee tends to produce. The memory orientation made historical sense, since malingered cognitive impairment most often presents as exaggerated forgetting, but it leaves a methodological gap: clinicians need PVTs that share little measurement variance with memory-based instruments, so that a multi-test validity battery can detect noncredible performance through more than one route.

Webber, Critchfield, and Soble (2020), in Assessment, evaluate the convergent, discriminant, and concurrent validity of four nonmemory-based PVTs — the Dot Counting Test (DCT), the WAIS-IV Reliable Digit Span (RDS), Reliable Digit Span Revised (RDS-R), and the WAIS-IV Digit Span Age-Corrected Scaled Score (ACSS) — and report a result that has practical consequences for which combinations belong in a validity battery and which are redundant.

The framework: MND criteria and the AACN consensus

The conceptual anchor for performance validity testing is Slick, Sherman, and Iverson’s (1999) diagnostic criteria for malingered neurocognitive dysfunction (MND). The criteria specify that a diagnosis of malingered impairment requires evidence from multiple sources — external incentive, behavioral indicators, and psychometric markers including PVT failures — meeting structured thresholds for “definite”, “probable”, or “possible” MND. The framework was the field’s first formal proposal to anchor effort-based diagnoses in psychometric data rather than clinical impression alone.

The American Academy of Clinical Neuropsychology consensus conference statement (Heilbronner et al., 2009) extended the MND framework into a practice standard: every neuropsychological evaluation should include validity testing, and a single PVT failure is generally insufficient evidence for a noncredible performance determination. Two or more PVTs from independent measurement domains were recommended as the minimum, on the reasoning that any single PVT can fail spuriously due to genuine cognitive impairment, confusion, or random error, and that convergent failures across instruments are more diagnostic than a single-test signal.
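The arithmetic behind the multiple-PVT recommendation can be sketched in a few lines. This is a minimal illustration rather than a calculation taken from the consensus statement: it assumes the PVTs fail independently of one another in credible examinees, and the 10% per-test false-positive rate is a hypothetical value chosen for readability. Real PVTs are rarely fully independent, so actual figures will differ.

```python
def prob_all_fail(false_positive_rate: float, n_tests: int) -> float:
    """Chance a credible examinee fails every one of n independent PVTs."""
    return false_positive_rate ** n_tests


def prob_any_fail(false_positive_rate: float, n_tests: int) -> float:
    """Chance a credible examinee fails at least one of n independent PVTs."""
    return 1.0 - (1.0 - false_positive_rate) ** n_tests


# Hypothetical per-test false-positive rate (an assumption, not a published value).
fp = 0.10
for n in (1, 2, 3):
    print(n, round(prob_all_fail(fp, n), 4), round(prob_any_fail(fp, n), 4))
```

Under these assumptions, requiring two failures cuts the spurious-flag rate from 10% to 1%, whereas flagging on any single failure in a two-test battery raises it to 19%. That asymmetry is why convergent failures carry more weight than a lone failure.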

Sherman, Slick, and Iverson (2020) updated the MND criteria into MND-2, sharpening several thresholds and explicitly addressing the role of embedded validity indicators (PVT signals derived from standard cognitive subtests rather than from dedicated effort instruments). The Webber-Critchfield-Soble 2020 paper sits in this lineage: it asks which nonmemory-based PVTs and embedded indicators contribute incremental validity to a battery that is otherwise dominated by memory-based instruments.

The four nonmemory-based PVTs evaluated

Dot Counting Test (DCT). Examinees count dots under timed conditions; some arrays are ungrouped (randomly scattered) and others are arranged in structured groups. Genuinely impaired examinees show a predictable pattern: longer counting times on ungrouped arrays than on grouped ones, because the structured arrays facilitate efficient counting strategies. Feigning examinees often miss this pattern, producing flatter or inverted timing profiles. The DCT is brief, requires no specialized stimuli, and operates on a perceptual rather than memory substrate.

Reliable Digit Span (RDS). The sum of the longest forward span and the longest backward span that an examinee passes on both trials of the WAIS-IV Digit Span subtest. A score at or below an empirically derived cut-score (most often ≤6) is flagged as suggestive of noncredible performance. RDS exploits the fact that digit span is highly resistant to most genuine impairments (even moderate dementia produces only modest reductions), so very low spans in an examinee whose other cognition seems intact suggest deliberate underperformance.
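The RDS definition above translates directly into a short computation. The sketch below is illustrative only: the dictionary layout for trial results and the function names are hypothetical conveniences, not part of any published scoring software, and the example examinee is made up.

```python
from typing import Dict, Tuple

# Maps span length -> (trial 1 correct, trial 2 correct). Hypothetical data layout.
TrialResults = Dict[int, Tuple[bool, bool]]


def longest_reliable_span(results: TrialResults) -> int:
    """Longest span length at which both trials were repeated correctly."""
    reliable = [length for length, (t1, t2) in results.items() if t1 and t2]
    return max(reliable, default=0)


def reliable_digit_span(forward: TrialResults, backward: TrialResults) -> int:
    """RDS: longest reliable forward span plus longest reliable backward span."""
    return longest_reliable_span(forward) + longest_reliable_span(backward)


def rds_flag(rds: int, cut: int = 6) -> bool:
    """True when RDS falls at or below the cut-score (<=6, as cited in the text)."""
    return rds <= cut


# Made-up examinee, for illustration only.
forward = {2: (True, True), 3: (True, True), 4: (True, False), 5: (False, False)}
backward = {2: (True, True), 3: (True, False), 4: (False, False)}
rds = reliable_digit_span(forward, backward)
print(rds, rds_flag(rds))
```

Here the forward span of 4 does not count because only one of its two trials was passed, so the reliable forward span is 3 and the reliable backward span is 2, giving an RDS of 5, which falls at or below the ≤6 cut.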

Reliable Digit Span Revised (RDS-R). A modification by Schroeder et al. that incorporates Sequencing trials from the WAIS-IV Digit Span subtest, computing an extended sum across forward, backward, and sequencing components. RDS-R was proposed to capture a broader span profile and improve sensitivity in subtle feigning cases.

WAIS-IV Digit Span ACSS. The age-corrected scaled score for the full Digit Span subtest, used directly as a validity cut-off rather than through a derived sub-scoring index. Webber and colleagues found that this simpler approach, using the full subtest’s scaled score with cuts at ≤5 or ≤6, produced classification accuracy equal to or better than RDS and RDS-R in their sample, raising the question of whether the more elaborate sub-scoring procedures are worth their complexity.

What the data showed

Webber and colleagues reported that the four nonmemory-based PVTs were significantly correlated with each other but uncorrelated with memory-based PVTs administered in the same battery. The correlation pattern is the practical case for including nonmemory-based instruments: they cluster as a coherent factor distinct from memory-based PVTs, so a battery that draws from both clusters has redundancy where it matters (within a cluster, where multiple measures of the same underlying construct provide convergent signal) and independence where it matters (between clusters, where the diagnostic value comes from convergence across measurement methods).

The headline classification finding: WAIS-IV Digit Span ACSS at cut-scores of ≤5 or ≤6 produced equal or better accuracy than RDS or RDS-R for distinguishing valid-unimpaired examinees from noncredible performers. The simpler instrument matched or beat the more elaborate sub-scoring procedures. Combining DCT with any of the digit-span variants improved classification accuracy in valid-unimpaired examinees but was less effective for valid-impaired examinees — patients with genuine cognitive deficits who passed PVTs were classified more reliably with a single instrument than with combinations, presumably because the combined cuts produce more false-positive flags in genuinely impaired performers.

The pairing the authors highlighted as practically optimal: DCT plus WAIS-IV Digit Span ACSS. The combination uses two independent measurement methods (perceptual counting plus auditory-verbal short-term memory), pulls from the broadly validated nonmemory-based cluster, and uses the simplest digit-span scoring approach available. For a clinician building a multi-instrument validity battery, this is the recommendation that fell out of the data.

Where this fits in the practical PVT workflow

The clinically actionable summary is that a contemporary neuropsychological battery should include at least two PVTs, drawn from at least two distinct measurement domains, with at least one of them nonmemory-based. The reason is structural rather than statistical: an examinee who feigns memory impairment may pass a perceptual-counting PVT, and an examinee who feigns generalized cognitive impairment may pass an instrument that targets memory specifically. Convergent failures across uncorrelated domains are diagnostic; a single failure on a single instrument is suggestive but not sufficient.
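The battery logic described above can be expressed as a small decision sketch. Everything here is a hedged illustration rather than a clinical tool: the `PvtResult` structure, the domain labels, and the summary strings are hypothetical, the DCT pass/fail status is supplied as an input because no DCT cut-score is given in this article, and the output is a coarse summary, not a diagnostic determination.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PvtResult:
    name: str     # e.g. "DCT" or "Digit Span ACSS"
    domain: str   # measurement domain label (hypothetical taxonomy)
    failed: bool  # True when the score fell past that test's cut


def acss_failed(acss: int, cut: int = 5) -> bool:
    """Digit Span ACSS flag; the article mentions cuts of <=5 or <=6."""
    return acss <= cut


def summarize_battery(results: List[PvtResult]) -> str:
    """Coarse summary of a validity battery; not a diagnostic decision."""
    domains = {r.domain for r in results}
    if len(results) < 2 or len(domains) < 2:
        return "battery too narrow: add a PVT from another domain"
    failed_domains = {r.domain for r in results if r.failed}
    if not failed_domains:
        return "no PVT failures"
    if len(failed_domains) == 1:
        return "failure in one domain: suggestive, not sufficient"
    return "convergent failures across domains"


# Made-up battery: DCT status entered by hand, ACSS of 4 checked against the cut.
battery = [
    PvtResult("DCT", "perceptual counting", failed=True),
    PvtResult("Digit Span ACSS", "auditory-verbal span", failed=acss_failed(4)),
    PvtResult("TOMM", "memory", failed=False),
]
print(summarize_battery(battery))
```

Grouping failures by domain rather than counting raw failures reflects the point that two failures on highly correlated instruments add less information than failures on instruments that measure different things.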

For batteries that already include the WAIS-IV — common in contemporary clinical practice — the embedded ACSS PVT is essentially free: the subtest is being administered anyway, and the validity check is a matter of comparing the age-corrected scaled score against an established cut. Adding the DCT is a five-minute procedure that contributes signal from a different measurement substrate. The combination is a reasonable default for clinicians who want validity coverage without lengthening the protocol substantially.

The framework remains less effective for valid-impaired examinees — patients with genuine cognitive impairment who are nonetheless giving credible effort. False-positive rates rise in this group across all PVTs, and the combinatorial approach does not solve the problem. The clinical reading in such cases requires informed clinical judgment rather than mechanical application of cut-scores: a patient with documented severe TBI may legitimately produce digit spans below typical PVT thresholds without feigning. The PVT signal is one input among several, not a verdict.

Where this fits in the broader validity literature

Performance validity testing has grown from a niche forensic concern into a standard component of every major neuropsychological evaluation, and the methodological literature has matured along with it. The MND criteria (Slick et al., 1999; Sherman et al., 2020) provide the diagnostic framework; the AACN consensus statement (Heilbronner et al., 2009) provides the practice standard; the validation literature on individual PVTs — including Webber and colleagues’ analysis of the nonmemory-based instruments — provides the empirical basis for choosing which tests to include.

The unresolved methodological challenges are substantial: how to handle valid-impaired examinees with elevated false-positive rates, how to set culturally and linguistically appropriate cut-scores, how to integrate symptom validity tests (which target self-reported symptoms rather than performance) into the same diagnostic framework, and how to weigh PVT failures against external evidence of effort or motivation. The Webber 2020 contribution sits within this larger project, supplying one piece of the structural-validity puzzle: which nonmemory-based PVTs cluster together, which are redundant with each other, and which combinations buy the most diagnostic information per minute of administration.

Frequently Asked Questions

Why do neuropsychologists use multiple PVTs rather than just one?

Single PVTs have non-trivial false-positive rates, particularly in patients with genuine cognitive impairment. The AACN consensus statement (Heilbronner et al., 2009) recommends at least two PVTs from independent measurement domains, on the principle that convergent failures across uncorrelated tests carry much stronger diagnostic weight than a single-test signal.

What is the Reliable Digit Span (RDS) and how is it different from RDS-R?

RDS is the sum of the longest forward and longest backward digit spans an examinee passes on both trials, with a cut-score (typically ≤6) flagging suspect performance. RDS-R extends this by adding the WAIS-IV Sequencing trials to the sum, on the rationale that the additional component improves sensitivity. Webber and colleagues (2020) found that the simpler ACSS-based cut performed at least as well as either RDS variant.

Why is digit span a useful PVT?

Digit span is one of the most robust cognitive measures: even moderate dementia produces only modest reductions, and most genuine impairments preserve the ability to repeat short sequences. Very low spans in an examinee whose other cognition seems intact suggest deliberate underperformance rather than legitimate impairment.

How does the Dot Counting Test work?

Examinees count groups of dots arranged either ungrouped (random scatter) or grouped (structured arrays). Genuinely impaired examinees show longer counting times on ungrouped arrays than on grouped ones, because grouping facilitates efficient counting. Feigning examinees often miss this asymmetry, producing flatter or inverted timing profiles that the DCT scoring system flags.
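The timing asymmetry described above can be illustrated with a short sketch. This is not the DCT's published scoring procedure: the function name, the three profile labels, and the one-second threshold are all hypothetical choices made only to show the grouped-versus-ungrouped comparison.

```python
from statistics import mean
from typing import Sequence


def dct_timing_profile(
    ungrouped_secs: Sequence[float],
    grouped_secs: Sequence[float],
    min_advantage_secs: float = 1.0,  # hypothetical threshold, not a published cut
) -> str:
    """Label the grouped-vs-ungrouped timing pattern for one examinee."""
    advantage = mean(ungrouped_secs) - mean(grouped_secs)
    if advantage >= min_advantage_secs:
        return "expected"  # grouped arrays counted faster than ungrouped
    if advantage > -min_advantage_secs:
        return "flat"      # little difference between array types
    return "inverted"      # grouped arrays counted slower than ungrouped


# Made-up counting times in seconds, for illustration only.
print(dct_timing_profile([6.0, 7.0, 8.0], [2.0, 3.0, 2.0]))
print(dct_timing_profile([4.0, 4.0, 4.0], [4.0, 4.5, 3.5]))
print(dct_timing_profile([3.0, 3.0, 3.0], [6.0, 6.0, 6.0]))
```

The three calls show an expected profile, a flat profile, and an inverted profile respectively; only the first matches the pattern that grouping should produce.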

What does it mean if a patient fails one PVT but passes another?

It is suggestive but not diagnostic. The MND-2 criteria (Sherman et al., 2020) and AACN practice standard require multiple lines of evidence — including PVT failures, external incentive context, and behavioral indicators — to meet thresholds for “probable” or “definite” malingered neurocognitive dysfunction. A single PVT failure is one input among several, not a verdict on its own.

References

Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., & Millis, S. R. (2009). American Academy of Clinical Neuropsychology Consensus Conference Statement on the neuropsychological assessment of effort, response bias, and malingering. The Clinical Neuropsychologist, 23(7), 1093–1129. https://doi.org/10.1080/13854040903155063
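Sherman, E. M. S., Slick, D. J., & Iverson, G. L. (2020). Multidimensional malingering criteria for neuropsychological assessment: A 20-year update of the malingered neuropsychological dysfunction criteria. Archives of Clinical Neuropsychology, 35(6), 735–764. https://doi.org/10.1093/arclin/acaa019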
Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13(4), 545–561. https://doi.org/10.1076/clin.13.4.545.1885
Webber, T. A., Critchfield, E. A., & Soble, J. R. (2020). Convergent, discriminant, and concurrent validity of nonmemory-based performance validity tests. Assessment, 27(7), 1399–1415. https://doi.org/10.1177/1073191118804874

Xavier Jouve, Ph.D., Psychometrician

Xavier Jouve, Ph.D., is a psychometrician and quantitative psychologist specializing in cognitive ability measurement, item response theory, and test development. He is Head of Research at Cogn-IQ, where he has designed and validated seven cognitive assessment instruments — including the JCTI (inductive reasoning), JCCES (crystallized intelligence), IAW (vocabulary), JCFS (figurative sequences), JCWS (verbal reasoning), GIE (general knowledge), and WN (logical inference) — collectively normed on over 13,000 examinees. His work applies 2PL IRT modeling, computerized adaptive testing, and advanced composite scoring methods (including the modified Tellegen & Briggs Formula 4 with cubic correction) to produce research-grade cognitive measures available online. ORCID: 0009-0006-1283-045X


