Strata Academy

QUADAS-2 explained – appraising diagnostic accuracy studies

Four domains, patient flow, applicability concerns, and pairing with STARD reporting

Quick answer

QUADAS-2 appraises risk of bias and applicability in diagnostic accuracy studies across four domains: patient selection, index test, reference standard, and flow/timing. Pair with STARD reporting and likelihood-ratio interpretation.

1. What is QUADAS-2?

QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) evaluates risk of bias and concerns about applicability in studies comparing an index test against a reference standard.

Unlike RoB 2, which appraises randomised trials, QUADAS-2 is not concerned with randomisation. It asks whether the test comparison was fair, complete, and relevant to your clinical question.

QUADAS-2 replaced the original QUADAS tool with clearer signalling questions and separate applicability ratings alongside the bias ratings for the first three domains (flow and timing is rated for risk of bias only).

Diagnostic reviews in Cochrane and major journals expect QUADAS-2 or equivalent per included study, presented in tables or traffic-light plots.

2. When to use QUADAS-2

Use QUADAS-2 when a study estimates sensitivity, specificity, predictive values, likelihood ratios, or AUC for a diagnostic test versus a reference standard.

This includes imaging, laboratory biomarkers, point-of-care tests, clinical decision rules, and AI-based classifiers when evaluated against a clinical reference.

Do not use QUADAS-2 for intervention trials, prognostic models without a diagnostic contrast, or screening studies where the index test is not compared to a defined reference standard in the same patients.

For screening programme evaluations with multiple linked tests, applicability concerns often dominate – document the care pathway explicitly.

3. Four QUADAS-2 domains

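Each domain is rated for risk of bias (low / high / unclear). The first three domains (patient selection, index test, reference standard) are also rated separately for applicability concerns (low / high / unclear); flow and timing has no applicability rating. Both dimensions matter for practice decisions.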

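Patient selection asks, for risk of bias, whether enrolment was consecutive or random and avoided case-control designs and inappropriate exclusions. For applicability, it asks whether the enrolled spectrum matches the intended use population – primary care vs tertiary centre changes sensitivity and specificity materially.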

Index test domain examines whether results could have been influenced by knowledge of the reference standard or by clinical data available when the test was interpreted.

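Reference standard domain asks whether the comparator is likely to classify the target condition correctly and whether it was interpreted without knowledge of the index test result. Incorporation bias, where the index test forms part of the reference standard, is also assessed here. Whether every patient received the same reference standard is a flow and timing question, covered in the next section.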
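One way to record these ratings per study is sketched below. This is a minimal illustration, not an official QUADAS-2 template: the class name DomainRating, the domain keys, and the helper flagged are all hypothetical. It assumes that flow and timing carries a risk-of-bias rating only, as in the published tool.

```python
from dataclasses import dataclass
from typing import List, Optional

RATINGS = {"low", "high", "unclear"}
DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")


@dataclass(frozen=True)
class DomainRating:
    """One QUADAS-2 domain judgement for one study (hypothetical structure)."""
    domain: str
    risk_of_bias: str
    applicability: Optional[str]  # None for flow_and_timing

    def __post_init__(self):
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain}")
        if self.risk_of_bias not in RATINGS:
            raise ValueError(f"invalid risk of bias: {self.risk_of_bias}")
        if self.domain == "flow_and_timing":
            # Flow and timing is not rated for applicability.
            if self.applicability is not None:
                raise ValueError("flow_and_timing has no applicability rating")
        elif self.applicability not in RATINGS:
            raise ValueError(f"invalid applicability: {self.applicability}")


def flagged(ratings: List[DomainRating]) -> List[str]:
    """Return domains rated anything other than 'low' on either dimension."""
    return [
        r.domain
        for r in ratings
        if r.risk_of_bias != "low" or r.applicability not in (None, "low")
    ]


# Made-up ratings for a single study.
study = [
    DomainRating("patient_selection", "low", "high"),
    DomainRating("index_test", "low", "low"),
    DomainRating("reference_standard", "unclear", "low"),
    DomainRating("flow_and_timing", "low", None),
]
print(flagged(study))
```

Keeping bias and applicability as separate fields mirrors the tool's design and avoids collapsing them into one score.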

4. Patient flow and verification bias

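Partial verification – where index-test positives are far more likely than negatives to receive the reference standard – is a classic source of bias. It typically inflates sensitivity and deflates specificity, because unverified false negatives and true negatives drop out of the 2x2 table.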
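The direction of this bias can be checked with simple arithmetic. The sketch below uses made-up counts (1,000 patients, true sensitivity 0.80, true specificity 0.90) and assumes every index-positive patient is verified but only 10% of index-negative patients are. All names and numbers are illustrative assumptions.

```python
def sens_spec(tp, fn, fp, tn):
    """Sensitivity and specificity from 2x2 cell counts."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical full cohort if everyone had received the reference standard.
tp, fn, fp, tn = 160, 40, 80, 720

# Assumption: all index positives verified, 10% of index negatives verified.
negatives_verified_pct = 10
verified_fn = fn * negatives_verified_pct // 100  # 4 false negatives remain
verified_tn = tn * negatives_verified_pct // 100  # 72 true negatives remain

true_sens, true_spec = sens_spec(tp, fn, fp, tn)
obs_sens, obs_spec = sens_spec(tp, verified_fn, fp, verified_tn)

print(f"true:     sensitivity {true_sens:.3f}, specificity {true_spec:.3f}")
print(f"observed: sensitivity {obs_sens:.3f}, specificity {obs_spec:.3f}")
```

In this setup the observed sensitivity rises above the true value while the observed specificity falls well below it, which is why the number of unverified index negatives deserves attention during appraisal.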

Differential verification – different reference standards for different patients – threatens validity and should usually be rated high risk of bias in flow and timing.

Draw or locate a patient flow diagram. Count how many entered, how many had both tests, and how many were excluded without verification.
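The counts from a flow diagram can be tallied as in the following sketch. The function name flow_summary and the example numbers are hypothetical; the point is only to make losses at each stage explicit.

```python
def flow_summary(enrolled, index_done, both_done, analysed):
    """Summarise patient losses at each stage of a diagnostic accuracy study.

    enrolled   - patients entering the study
    index_done - patients who received the index test
    both_done  - patients who received index test and reference standard
    analysed   - patients included in the final 2x2 table
    """
    if not (enrolled >= index_done >= both_done >= analysed >= 0) or enrolled == 0:
        raise ValueError("counts must be non-increasing and enrolled must be > 0")
    return {
        "no_index_test": enrolled - index_done,
        "unverified": index_done - both_done,
        "excluded_after_both_tests": both_done - analysed,
        "proportion_analysed": analysed / enrolled,
    }


# Made-up example: 500 enrolled, 480 tested, 430 verified, 420 analysed.
summary = flow_summary(500, 480, 430, 420)
for key, value in summary.items():
    print(key, value)
```

A large "unverified" count relative to the analysed total is a prompt to look for partial verification before trusting the accuracy estimates.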

STARD reporting items align closely with QUADAS-2 flow concerns – use both tools together.

5. Applicability vs risk of bias

A study can be low risk of bias internally yet high applicability concern if the population or test setting differs from your practice.

Example: a PET scan study in tertiary oncology may not apply to your district general hospital pathway for staging.

Applicability concerns should be reported explicitly in appraisal summaries – not folded into a single 'quality' score.

When synthesising diagnostic reviews, consider whether pooled estimates combine studies with incompatible patient spectrums.

6. Linking to diagnostic statistics

After QUADAS-2, interpret sensitivity and specificity with confidence intervals, pre-test probability, and likelihood ratios.
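As a brief illustration, likelihood ratios follow from sensitivity and specificity, and post-test probability follows from pre-test odds multiplied by the likelihood ratio. The values below (sensitivity 0.90, specificity 0.80, pre-test probability 0.20) are made up for the example, and the function names are hypothetical.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    return lr_pos, lr_neg


def post_test_probability(pre_test, lr):
    """Convert pre-test probability to post-test probability via odds."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


lr_pos, lr_neg = likelihood_ratios(sens=0.90, spec=0.80)
after_positive = post_test_probability(0.20, lr_pos)
after_negative = post_test_probability(0.20, lr_neg)

print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.3f}")
print(f"post-test probability after a positive result: {after_positive:.3f}")
print(f"post-test probability after a negative result: {after_negative:.3f}")
```

The same test shifts probability very differently at other pre-test probabilities, which is one reason the applicability of the study population matters.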

Small studies produce unstable estimates – wide CIs around sensitivity in rare diseases are common and should temper conclusions.
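To see how sample size drives interval width, the sketch below computes a Wilson score interval for sensitivity. The Wilson method is one common choice, not necessarily the one a given paper used; the counts (18 of 20 versus 180 of 200 diseased patients detected) are invented for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Same point estimate (sensitivity 0.90), different numbers of diseased patients.
small = wilson_interval(18, 20)
large = wilson_interval(180, 200)

print(f"18/20:   {small[0]:.3f} to {small[1]:.3f}")
print(f"180/200: {large[0]:.3f} to {large[1]:.3f}")
```

With only 20 diseased patients the interval spans roughly 0.70 to 0.97, so a headline sensitivity of 90% is compatible with a test that misses nearly a third of cases.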

AUC alone obscures threshold choice. Ask which cut-off was used and whether it was pre-specified.
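The following sketch shows how one AUC value can sit alongside quite different operating points. The scores are invented, a result counts as positive when the score is at or above the threshold, and the function names are hypothetical.

```python
def operating_point(diseased, healthy, threshold):
    """Sensitivity and specificity when score >= threshold counts as positive."""
    sens = sum(s >= threshold for s in diseased) / len(diseased)
    spec = sum(s < threshold for s in healthy) / len(healthy)
    return sens, spec


def auc(diseased, healthy):
    """AUC as the share of diseased/healthy pairs ranked correctly (ties count half)."""
    wins = sum((d > h) + 0.5 * (d == h) for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))


# Hypothetical index test scores.
diseased_scores = [0.9, 0.8, 0.7, 0.4]
healthy_scores = [0.6, 0.3, 0.2, 0.1]

print(f"AUC: {auc(diseased_scores, healthy_scores):.4f}")
for threshold in (0.25, 0.5, 0.65):
    sens, spec = operating_point(diseased_scores, healthy_scores, threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

The AUC is identical across all three rows, yet the lowest threshold misses no cases while the highest produces no false positives. A cut-off chosen after seeing the data can therefore make the test look better than a pre-specified one would.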

See our diagnostic accuracy statistics guide for post-test probability worked examples.

7. Common mistakes

Applying RoB 2 to diagnostic studies because the paper mentions 'random sampling'.

Ignoring spectrum bias when only severely ill patients were enrolled.

Relying on AUC alone without threshold analysis or consideration of clinical consequences.

Rating applicability concerns as low because the authors are from a prestigious centre.

Pooling sensitivities from studies with different reference standards without comment.

8. StrataResearch and QUADAS-2

Diagnostic accuracy manuscripts are routed to QUADAS-2-aligned appraisal with domain-structured feedback.

Upload via quick analysis to compare automated domain coverage against your manual QUADAS-2 worksheet.

Pair with STARD reporting checks when preparing diagnostic study manuscripts for coursework.
