Strata Academy
Diagnostic test accuracy – sensitivity, specificity, and likelihood ratios
Core metrics, pre-test probability, and pairing with QUADAS-2 appraisal
Quick answer
Sensitivity and specificity describe test performance; likelihood ratios update pre-test probability to post-test probability. Always pair the statistical interpretation with a QUADAS-2 risk-of-bias assessment and a check of reporting against STARD.
1. Core metrics
Sensitivity is the proportion of people with the condition who test positive on the index test. Specificity is the proportion without the condition who test negative.
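These properties are often treated as intrinsic to the test at a given threshold because they do not depend directly on prevalence, although they can shift with patient spectrum (see section 3). Their clinical usefulness depends on pre-test probability – how likely the condition is in the patient in front of you, informed by how common it is in your population.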
Likelihood ratios combine sensitivity and specificity to update probability: LR+ = sensitivity / (1 − specificity); LR− = (1 − sensitivity) / specificity.
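A minimal Python sketch of these definitions, assuming a standard 2×2 table of index test result against the reference standard. The class name `TwoByTwo` and the counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a diagnostic accuracy 2x2 table (index test vs reference standard)."""
    tp: int  # condition present, index test positive
    fp: int  # condition absent, index test positive
    fn: int  # condition present, index test negative
    tn: int  # condition absent, index test negative

    @property
    def sensitivity(self) -> float:
        # Proportion of people with the condition who test positive
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of people without the condition who test negative
        return self.tn / (self.tn + self.fp)

    @property
    def lr_positive(self) -> float:
        # LR+ = sensitivity / (1 - specificity); undefined if specificity is 1
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_negative(self) -> float:
        # LR- = (1 - sensitivity) / specificity
        return (1 - self.sensitivity) / self.specificity


# Hypothetical counts, not taken from any real study
table = TwoByTwo(tp=90, fp=30, fn=10, tn=270)
print(f"Sensitivity {table.sensitivity:.2f}, specificity {table.specificity:.2f}")
print(f"LR+ {table.lr_positive:.1f}, LR- {table.lr_negative:.2f}")
```

Here sensitivity and specificity are both 0.90, giving an LR+ of 9 and an LR− of about 0.11.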
An LR+ of 10 multiplies the pre-test odds of disease by 10; an LR− of 0.1 divides them by 10. Memorise the anchors: LR+ > 10 and LR− < 0.1 usually produce large, often conclusive shifts in probability.
- Predictive values (PPV/NPV) depend on prevalence – do not generalise across settings
- ROC curves plot sensitivity vs 1 − specificity across thresholds
- Pre-specify the threshold that defines a 'positive' test before results are known
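Why predictive values do not travel between settings can be sketched with Bayes' theorem. This is a minimal sketch; the function name `predictive_values` and the sensitivity, specificity and prevalence values are hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied where the condition has the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test (sensitivity 0.90, specificity 0.90) in two settings
for prevalence in (0.01, 0.30):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

With identical sensitivity and specificity, PPV is about 0.08 at 1% prevalence but about 0.79 at 30%.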
2. Pre-test and post-test probability
Start with an estimate of pre-test probability from prevalence, clinical context, or referral pathway. Apply the likelihood ratio to move to post-test probability, either with the odds form of Bayes' theorem (post-test odds = pre-test odds × LR) or with a Fagan nomogram.
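The odds-form update takes only a few lines. This sketch assumes a single test result and a pre-test probability strictly between 0 and 1; the function name `post_test_probability` and the 20% starting value are hypothetical.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio via the odds form of Bayes' theorem."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)  # probability -> odds
    post_test_odds = pre_test_odds * likelihood_ratio                  # apply the LR
    return post_test_odds / (1 + post_test_odds)                       # odds -> probability


# Hypothetical pre-test probability of 20%, updated with the anchor LRs
print(f"After LR+ = 10:  {post_test_probability(0.20, 10):.0%}")
print(f"After LR- = 0.1: {post_test_probability(0.20, 0.1):.1%}")
```

From a 20% pre-test probability (odds 0.25), an LR+ of 10 gives odds of 2.5, or about 71%; an LR− of 0.1 gives odds of 0.025, or about 2.4%. A Fagan nomogram performs the same calculation graphically.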
In low-prevalence screening, even a highly specific test can yield a low positive predictive value, because false positives among the many well people can outnumber the true positives.
In high-prevalence specialist clinics, negative predictive value falls, so sensitivity dominates – a negative result is less reassuring and missed cases are costly.
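Both settings can be made concrete with natural frequencies, meaning expected counts in a fixed number of people. A minimal sketch; the function name `expected_counts`, the test characteristics and both prevalences are hypothetical.

```python
def expected_counts(n: int, prevalence: float, sensitivity: float, specificity: float) -> dict[str, float]:
    """Expected 2x2 counts when a test is applied to n people (natural frequencies)."""
    diseased = n * prevalence
    well = n - diseased
    return {
        "true_pos": diseased * sensitivity,
        "false_neg": diseased * (1 - sensitivity),
        "false_pos": well * (1 - specificity),
        "true_neg": well * specificity,
    }


# Hypothetical test: sensitivity 0.90, specificity 0.95, applied to 10,000 people
screening = expected_counts(10_000, prevalence=0.01, sensitivity=0.90, specificity=0.95)
clinic = expected_counts(10_000, prevalence=0.40, sensitivity=0.90, specificity=0.95)
print("Screening:", {k: round(v) for k, v in screening.items()})
print("Specialist clinic:", {k: round(v) for k, v in clinic.items()})
```

At 1% prevalence, roughly 495 false positives sit alongside 90 true positives, so only about 15% of positives have the condition. At 40% prevalence, the 400 missed cases become the main concern.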
Students should practise one worked example per coursework module; examiners often test LR interpretation without providing PPV/NPV.
3. Design features to appraise
Patients should represent the intended clinical spectrum – not only severe cases from tertiary centres if the test will be used in primary care.
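All enrolled patients should receive the reference standard, or any differential verification must be explained. Partial verification, where the decision to verify depends on the index test result (often mainly positives are verified), biases accuracy estimates, typically inflating sensitivity and deflating specificity.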
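A deterministic sketch of partial verification bias. It assumes every index-positive patient is verified, only a random fraction of index-negative patients are verified, and the analysis uses only verified patients. The function name and counts are hypothetical.

```python
def naive_accuracy_under_partial_verification(
    tp: float, fp: float, fn: float, tn: float, verify_neg_fraction: float
) -> tuple[float, float]:
    """Sensitivity and specificity estimated from verified patients only.

    All index-positive patients (tp, fp) are verified; only a fraction of
    index-negative patients (fn, tn) receive the reference standard.
    """
    verified_fn = fn * verify_neg_fraction
    verified_tn = tn * verify_neg_fraction
    sensitivity = tp / (tp + verified_fn)
    specificity = verified_tn / (verified_tn + fp)
    return sensitivity, specificity


# Hypothetical true counts under full verification: sensitivity 0.80, specificity ~0.94
full = naive_accuracy_under_partial_verification(80, 50, 20, 850, verify_neg_fraction=1.0)
partial = naive_accuracy_under_partial_verification(80, 50, 20, 850, verify_neg_fraction=0.2)
print(f"Full verification:    sens {full[0]:.2f}, spec {full[1]:.2f}")
print(f"20% of negatives only: sens {partial[0]:.2f}, spec {partial[1]:.2f}")
```

Verifying only 20% of negatives pushes the naive estimates to about 0.95 and 0.77. If the verified negatives are a random sample, reweighting them by 1/fraction (here ×5) restores the full-verification counts.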
Index test and reference standard should be interpreted blindly where feasible. Knowledge of one result can bias the other.
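Case–control diagnostic designs often inflate sensitivity and specificity because cases and controls are selected for known disease status, typically clear-cut cases and healthy controls, which leaves out the harder-to-classify patients seen in practice.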
4. ROC curves and AUC
ROC curves show trade-offs between sensitivity and specificity across cut-offs. The area under the curve (AUC) summarises discrimination but hides the chosen threshold.
Ask which operating point was used clinically and whether it was pre-specified. A high AUC with a suboptimal threshold may still misclassify patients.
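An empirical ROC curve and its AUC can be computed directly from test scores. This sketch assumes higher scores indicate disease and that a score at or above the cut-off counts as positive; all function names and scores are hypothetical. It also reports sensitivity and specificity at one named operating point, which the AUC alone does not convey.

```python
def sens_spec_at(scores_diseased: list[float], scores_healthy: list[float], cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff is called positive."""
    sens = sum(s >= cutoff for s in scores_diseased) / len(scores_diseased)
    spec = sum(s < cutoff for s in scores_healthy) / len(scores_healthy)
    return sens, spec


def roc_points(scores_diseased: list[float], scores_healthy: list[float]) -> list[tuple[float, float]]:
    """Empirical ROC points (1 - specificity, sensitivity), strictest cut-off first."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores_diseased) | set(scores_healthy), reverse=True):
        sens, spec = sens_spec_at(scores_diseased, scores_healthy, cutoff)
        points.append((1 - spec, sens))
    return points


def auc_trapezoid(points: list[tuple[float, float]]) -> float:
    """Area under the ROC points using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_concordance(scores_diseased: list[float], scores_healthy: list[float]) -> float:
    """Probability a random diseased score exceeds a random healthy score (ties count half)."""
    pairs = [1.0 if d > h else 0.5 if d == h else 0.0 for d in scores_diseased for h in scores_healthy]
    return sum(pairs) / len(pairs)


# Hypothetical biomarker scores
diseased = [2.1, 3.4, 3.9, 4.5, 5.0, 6.2]
healthy = [1.0, 1.8, 2.5, 3.0, 3.4, 4.1]

print(f"AUC (trapezoid):   {auc_trapezoid(roc_points(diseased, healthy)):.3f}")
print(f"AUC (concordance): {auc_concordance(diseased, healthy):.3f}")
sens, spec = sens_spec_at(diseased, healthy, cutoff=3.5)  # one hypothetical operating point
print(f"At cut-off 3.5: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Both AUC calculations agree (about 0.82 here) because the empirical AUC equals the probability that a randomly chosen diseased patient scores higher than a randomly chosen healthy one. The single number says nothing about which of the curve's points will be used in practice.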
For continuous biomarkers, report how the cut-off was derived – data-driven optimisation without external validation overfits.
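A common data-driven method is to choose the cut-off that maximises Youden's J (sensitivity + specificity − 1) in the development data. This sketch applies that method and then checks the chosen cut-off on a separate set of scores. The function names and both score sets are hypothetical; exact fractions are used so that tied J values compare reliably.

```python
from fractions import Fraction


def youden_at(scores_diseased: list[float], scores_healthy: list[float], cutoff: float) -> Fraction:
    """Youden's J at one cut-off (score >= cutoff is positive), as an exact fraction."""
    sens = Fraction(sum(s >= cutoff for s in scores_diseased), len(scores_diseased))
    spec = Fraction(sum(s < cutoff for s in scores_healthy), len(scores_healthy))
    return sens + spec - 1


def youden_cutoff(scores_diseased: list[float], scores_healthy: list[float]) -> tuple[float, Fraction]:
    """Return (cutoff, J) maximising J; ties keep the lowest cut-off."""
    best_cutoff, best_j = None, Fraction(-1)
    for cutoff in sorted(set(scores_diseased) | set(scores_healthy)):
        j = youden_at(scores_diseased, scores_healthy, cutoff)
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical development and validation scores
dev_diseased = [2.1, 3.4, 3.9, 4.5, 5.0, 6.2]
dev_healthy = [1.0, 1.8, 2.5, 3.0, 3.4, 4.1]
val_diseased = [2.8, 3.3, 3.6, 4.0, 4.8, 5.5]
val_healthy = [1.5, 2.2, 3.1, 3.5, 3.7, 4.4]

cutoff, j_dev = youden_cutoff(dev_diseased, dev_healthy)
j_val = youden_at(val_diseased, val_healthy, cutoff)
print(f"Chosen cut-off {cutoff}: J = {float(j_dev):.2f} (development), {float(j_val):.2f} (validation)")
```

In this toy data, several development cut-offs share the maximum J, and the chosen cut-off achieves a lower J on the separate scores. With real data, the size of any drop has to be measured by external validation rather than assumed.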
Compare AUCs across studies only when patient spectra and reference standards are similar.
5. QUADAS-2 and STARD
Use the four QUADAS-2 domains (patient selection, index test, reference standard, flow and timing) to judge risk of bias. Applicability concerns are rated separately and apply to the first three domains only.
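One way to keep the two judgements apart when extracting data is a small record per study. In this sketch, risk of bias is rated for all four domains and applicability for the first three. The class names, domain keys and study identifier are hypothetical. The overall summary rule ("low" only when every domain is low) is an assumption here, and review teams may define their own.

```python
from dataclasses import dataclass
from enum import Enum


class Judgement(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


BIAS_DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")
APPLICABILITY_DOMAINS = ("patient selection", "index test", "reference standard")


@dataclass
class Quadas2Assessment:
    study_id: str
    risk_of_bias: dict[str, Judgement]   # one judgement per bias domain
    applicability: dict[str, Judgement]  # concern level per applicability domain

    def __post_init__(self) -> None:
        if set(self.risk_of_bias) != set(BIAS_DOMAINS):
            raise ValueError("risk of bias needs a judgement for each of the four domains")
        if set(self.applicability) != set(APPLICABILITY_DOMAINS):
            raise ValueError("applicability is judged for the first three domains only")

    def overall_risk_of_bias(self) -> str:
        # Assumed rule: low overall only if every bias domain is low
        if all(j is Judgement.LOW for j in self.risk_of_bias.values()):
            return "low risk of bias"
        return "at risk of bias"

    def overall_applicability(self) -> str:
        # Assumed rule: low concern overall only if every applicability domain is low
        if all(j is Judgement.LOW for j in self.applicability.values()):
            return "low concern regarding applicability"
        return "concerns regarding applicability"


study = Quadas2Assessment(
    study_id="hypothetical-study-01",
    risk_of_bias={
        "patient selection": Judgement.LOW,
        "index test": Judgement.LOW,
        "reference standard": Judgement.LOW,
        "flow and timing": Judgement.HIGH,  # e.g. partial verification
    },
    applicability={
        "patient selection": Judgement.LOW,
        "index test": Judgement.LOW,
        "reference standard": Judgement.LOW,
    },
)
print(study.overall_risk_of_bias())
print(study.overall_applicability())
```

A study can have low applicability concerns and still be at risk of bias, which is why the two are recorded separately.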
The STARD checklist supports complete reporting: participant flow, index test details, reference standard, analysis population, and handling of indeterminate results.
A well-reported STARD paper can still have high QUADAS-2 bias if verification was partial. Reporting and bias are different lenses.
For AI imaging classifiers, pair QUADAS-2 with the reporting expectations of CLAIM (Checklist for Artificial Intelligence in Medical Imaging).
6. Common student mistakes
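- Quoting sensitivity without prevalence or pre-test probability context.
- Using PPV from a tertiary cohort to counsel primary-care patients.
- Treating AUC as sufficient without threshold analysis.
- Applying RoB 2, which was designed for randomised trials, to diagnostic accuracy studies instead of QUADAS-2.
- Ignoring indeterminate test results instead of reporting how they were handled in the analysis population.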