
When it comes to diagnosing students with developmental language disorders (DLD) and literacy deficits, the choice of assessment tools is critical. Many commonly used language tests have serious psychometric flaws that undermine their validity, leading to misidentifications that can have lasting educational and social consequences. This post examines the psychometric properties of several widely used comprehensive language tests and highlights why clinicians and educators should exercise caution when selecting assessments.
Understanding Sensitivity and Specificity
Sensitivity and specificity are fundamental indicators of an assessment's accuracy. Sensitivity refers to a test's ability to correctly identify students who truly have a language or literacy disorder. Specificity refers to a test's ability to correctly identify students who do not have a disorder. According to Plante and Vance (1994), a sensitivity of 90% or higher is considered good, 80%-89% is fair, and anything below 80% produces unacceptably high rates of misidentification.
A test with poor sensitivity can fail to identify students who genuinely require support, while a test with poor specificity may lead to the unnecessary labeling of typically developing students as having a disorder. Either scenario can negatively impact a child’s education and development.
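To make these definitions concrete, here is a minimal sketch (in Python, with entirely made-up counts from a hypothetical validation study, not data from any test discussed here) showing how the two values are computed from a study comparing test results against children's true diagnostic status:

```python
# Illustrative sketch with hypothetical counts; not data from any real test.

def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of children WITH a disorder whom the test correctly flags."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of typically developing children whom the test correctly clears."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical validation study: 50 children with confirmed DLD, 50 typical peers.
tp, fn = 37, 13  # 37 of the 50 children with DLD scored below the cut score
tn, fp = 46, 4   # 46 of the 50 typical children scored above the cut score

print(f"Sensitivity: {sensitivity(tp, fn):.0%}")  # 74% -- below the 80% floor
print(f"Specificity: {specificity(tn, fp):.0%}")  # 92% -- acceptable
```

A test like this hypothetical one would miss roughly one in four children who genuinely have a disorder, which is exactly the kind of failure rate the criteria above are meant to screen out.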
Concerns with Common Language Assessments
Several commonly used language assessments fail to meet acceptable sensitivity thresholds, raising concerns about their suitability for diagnosing DLD and literacy deficits.
- Clinical Evaluation of Language Fundamentals-Fifth Edition (CELF-5): Sensitivity is reported as 80% at a -1.33 SD cut score and 85% at -1.5 SD (see the note on cut scores after this list), but the reference standard used to establish these values is flawed, calling them into question.
- Comprehensive Assessment of Spoken Language – Second Edition (CASL-2): Sensitivity is only 74% at -1 SD, making it unsuitable for diagnostic purposes.
- Oral and Written Language Scales Second Edition (OWLS-II): No sensitivity and specificity studies were conducted. Additionally, the standardization sample included individuals with diagnosed disabilities, contaminating the normative data.
- Receptive, Expressive & Social Communication Assessment–Elementary (RESCA-E): The test authors explicitly caution that it cannot be used for diagnostic purposes due to a lack of sensitivity and specificity data.
- Test of Language Development-Intermediate: 5 (TOLD-I:5) & Test of Language Development-Primary: 5 (TOLD-P:5): While composite scores appear adequate, the diagnostic categories are arbitrary and unreliable.
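A brief note on the cut scores cited above: they are expressed in standard deviation units below the normative mean. Assuming the standard-score scale used by most comprehensive language tests (mean 100, SD 15), the conversion is simple arithmetic, sketched below:

```python
# Convert an SD-based cut score to a standard score, assuming the
# common mean-100, SD-15 scale used by most comprehensive language tests.
MEAN, SD = 100, 15

def cut_to_standard_score(sd_cut: float) -> float:
    return MEAN + sd_cut * SD

for cut in (-1.0, -1.33, -1.5):
    print(f"{cut:+.2f} SD -> standard score {cut_to_standard_score(cut):.1f}")
# -1.00 SD -> 85.0, -1.33 SD -> 80.1, -1.50 SD -> 77.5
```

In other words, the CELF-5 figures above describe how the test performs when standard scores of roughly 80 (or 77) and below are treated as disordered.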
Contaminated Normative Samples and Weak Academic Rigor
Another significant issue with some of these tests is their contaminated normative samples. When individuals with diagnosed language and literacy disorders are included in the standardization sample, it skews the normative data, making it difficult to distinguish between typical and atypical performance. This reduces the test’s ability to accurately diagnose language impairments.
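To see why this matters, consider a toy simulation (all numbers are illustrative assumptions, not modeled on any specific test): when children who score well below their typical peers are mixed into the normative sample, the sample mean drops and the standard deviation inflates, so the same raw score yields a z-score closer to average:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy populations (illustrative assumptions only): typical children score
# ~N(100, 15); children with DLD average about one SD lower.
typical = rng.normal(100, 15, size=1700)
dld = rng.normal(85, 15, size=300)

clean_norms = typical                                # disorders excluded
contaminated_norms = np.concatenate([typical, dld])  # disorders included

child_score = 78.0  # a child with a genuine language disorder

for label, norms in [("clean", clean_norms), ("contaminated", contaminated_norms)]:
    z = (child_score - norms.mean()) / norms.std()
    print(f"{label:>12} norms: z = {z:+.2f}")
# Clean norms place the child near -1.5 SD (below a -1.33 SD cut);
# contaminated norms place the same child near -1.2 SD (above it).
```

Against contaminated norms, a child with a genuine disorder can score above the diagnostic cut and go unidentified.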
Moreover, many assessments lack academic rigor in their test items, meaning that their content may not align with the real-world language and literacy demands placed on students in educational settings. As a result, a child’s performance on these tests may not accurately reflect their actual language abilities or struggles in school.
Selecting High-Quality Standardized Assessments
Despite these concerns, some tests demonstrate acceptable psychometric properties. For example, the Test of Integrated Language and Literacy Skills (TILLS) reports sensitivity values ranging from 81% to 97%, depending on age and cut score, indicating much greater diagnostic accuracy for identifying students with genuine language and literacy disorders.
However, even a psychometrically sound comprehensive test is inadequate as the sole basis for diagnosing language and literacy disorders. A thorough evaluation requires multiple measures to capture the full scope of a student's difficulties. Comprehensive tests provide a starting point, but they often fail to assess critical areas in depth, such as narrative skills, pragmatic language, and grade-level reading comprehension and written composition. To ensure accurate identification and appropriate intervention, clinicians should supplement them with specialized standardized assessments as well as clinical, grade-level evaluations of narrative, reading comprehension, and written composition. This multifaceted approach gives professionals a more complete and accurate picture of a student's language and literacy abilities and reduces the risk of misdiagnosis and inadequate support.
The Takeaway for Parents and Professionals
Parents, educators, and clinicians should be aware of the limitations of commonly used language assessments. Choosing a test with weak psychometric properties can result in misdiagnoses, inappropriate interventions, and missed opportunities for the child to receive the support they need.
When selecting an assessment, it is crucial to:
- Examine sensitivity and specificity values (sensitivity should be at least 80%, and ideally 90% or higher)
- Investigate the normative sample (avoid tests with contaminated samples)
- Consider the academic validity of test items (ensure they reflect the real-world demands of language and literacy)
- Supplement with additional measures of narrative, pragmatic language, reading, and writing (the areas comprehensive tests often fail to assess in depth)
By making informed decisions about assessment tools, professionals can provide more accurate diagnoses and support students in achieving their full potential.
References:
- Dollaghan, C. (2007). The handbook for evidence-based practice in communication disorders. Baltimore, MD: Paul H. Brookes Publishing Co.
- LEADERS Project. (2014). Test review: Clinical Evaluation of Language Fundamentals-Fifth Edition (CELF-5).
- Michigan Kent County ISD Speech and Language Eligibility Guidelines, Test Comparisons Chart (Appendix 2H).
- Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and Hearing Services in Schools, 25(1), 15-24.
Related Posts:
- Help, Student Tested Average on ALL Standardized Tests but is Still Struggling
- Comprehensive Assessment of Elementary Aged Children with Subtle Language and Literacy Deficits
- Help, My Student has a Huge Score Discrepancy Between Tests and I Don’t Know Why?
- On the Life-Saving Value of Clinical Assessments
- Comprehensive Assessment of Preschool and Kindergarten-Aged Children
- How Early can “Dyslexia” be Diagnosed in Children?
- Neuropsychological or Language/Literacy: Which Assessment is Right for My Child?
- Quality Assessments for Students with Suspected/Confirmed “APD”