Strata Academy

Diagnostic Accuracy Appraisal: QUADAS-2 & STARD Workflow

From clinical question to likelihood ratios: patient flow, verification bias, applicability concerns, and STARD reporting checklist

Quick answer

Appraise diagnostic accuracy studies with QUADAS-2, which rates risk of bias across four domains (patient selection, index test, reference standard, flow and timing) and applicability concerns across the first three, and pair it with STARD to check reporting completeness. Draw the patient flow first, then assess whether the index test was compared fairly to an adequate reference standard in an appropriate patient spectrum.

1. Define the diagnostic question

Before opening the PDF, specify the index test (the test under evaluation), the reference standard (gold standard comparator), the target condition, and the intended use setting. A troponin study in ED chest pain differs materially from a research-laboratory assay in asymptomatic screening.

Diagnostic accuracy studies estimate sensitivity, specificity, predictive values, likelihood ratios, or AUC. They are not intervention trials, so RoB 2 does not apply. QUADAS-2 is the risk-of-bias tool Cochrane recommends for diagnostic accuracy studies; STARD 2015 is the reporting guideline.

Clarify whether you need to know if the test works (accuracy) or whether it changes management (impact). This guide focuses on accuracy appraisal; test-treatment pathways require different evidence frameworks.

2. STARD reporting scan (first pass)

STARD provides 30 items covering title, abstract, methods, results, and discussion for diagnostic accuracy studies. Use it as a structured reading guide before deep appraisal — missing items often flag bias domains.

Priority STARD 2015 items for students: eligibility, recruitment, and sampling (Items 6–9), index test and reference standard methods including cut-offs and blinding (Items 10–13), participant flow shown in a diagram (Item 19), and cross-tabulation of index test results against the reference standard (Item 23).

STARD completeness does not guarantee low risk of bias. A well-reported case–control diagnostic design may still be at high risk of bias in patient selection. Reporting quality and risk of bias are complementary assessments.

  1. Read abstract — do sensitivity/specificity match the clinical question?
  2. Locate patient flow diagram — count enrolled, tested, verified, analysed
  3. Identify reference standard — same for all patients?
  4. Check index test blinding — could knowledge of reference standard influence interpretation?
  5. Find 2×2 table — raw true/false positive and negative counts
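Once the raw counts from step 5 are in hand, the headline estimates can be recomputed and checked against the abstract. The following is a minimal sketch; the `TwoByTwo` class and the counts are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Raw counts from a diagnostic 2x2 table (index test vs reference standard)."""
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative

    @property
    def sensitivity(self) -> float:
        # Proportion of reference-positive patients the index test detects.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of reference-negative patients the index test clears.
        return self.tn / (self.tn + self.fp)

    @property
    def prevalence(self) -> float:
        # Proportion of analysed patients with the target condition.
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)


# Hypothetical counts, for illustration only.
table = TwoByTwo(tp=90, fp=40, fn=10, tn=360)
print(f"sensitivity={table.sensitivity:.2f}")
print(f"specificity={table.specificity:.2f}")
print(f"prevalence={table.prevalence:.2f}")
```

If the recomputed values disagree with the reported ones, recheck which patients were excluded between enrolment and analysis.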

3. QUADAS-2 domain appraisal

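QUADAS-2 evaluates four domains. Each is rated for risk of bias (low / high / unclear); the first three (patient selection, index test, reference standard) are also rated separately for applicability concerns (low / high / unclear), while flow and timing has no applicability rating. Work through the signalling questions in order, and do not assign a domain judgement before answering them. QUADAS-2 does not produce a summary score.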

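Patient selection: was a consecutive or random sample enrolled, and were inappropriate exclusions avoided? Case–control designs that assemble known cases and healthy controls often inflate accuracy and warrant a high risk-of-bias rating. For applicability, ask separately whether the included patients match the intended use population.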

Index test: was interpretation blind to reference standard results? Were thresholds pre-specified? Data-driven cut-offs optimised on the same dataset inflate apparent performance.

Reference standard: is it the best available comparator, likely to classify the target condition correctly? Was it interpreted blind to the index test result?

Flow and timing: was the interval between tests appropriate? Did all enrolled patients receive a reference standard, and the same one, and were all of them included in the analysis? Partial verification (mainly index-test positives receive the reference standard) is the classic verification bias; differential verification (different standards for different patients) also threatens validity.
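One way to keep the domain judgements organised is a small record per domain. The sketch below is an assumed structure, not part of QUADAS-2 itself: the names `DomainJudgement` and `domains_of_concern` are hypothetical. It reflects that flow and timing carries no applicability rating and that no summary score is computed.

```python
from dataclasses import dataclass
from typing import List, Optional

RATINGS = ("low", "high", "unclear")
DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")


@dataclass(frozen=True)
class DomainJudgement:
    domain: str
    risk_of_bias: str
    applicability: Optional[str] = None  # stays None for flow and timing

    def __post_init__(self) -> None:
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain}")
        if self.risk_of_bias not in RATINGS:
            raise ValueError(f"invalid risk-of-bias rating: {self.risk_of_bias}")
        if self.domain == "flow and timing":
            if self.applicability is not None:
                raise ValueError("flow and timing has no applicability rating")
        elif self.applicability not in RATINGS:
            raise ValueError(f"invalid applicability rating: {self.applicability}")


def domains_of_concern(judgements: List[DomainJudgement]) -> List[str]:
    """List domains rated high or unclear on either axis; no score is computed."""
    flagged = []
    for j in judgements:
        if j.risk_of_bias != "low" or j.applicability not in (None, "low"):
            flagged.append(j.domain)
    return flagged


# Hypothetical appraisal of a single study, for illustration only.
study = [
    DomainJudgement("patient selection", "high", "high"),
    DomainJudgement("index test", "low", "low"),
    DomainJudgement("reference standard", "unclear", "low"),
    DomainJudgement("flow and timing", "low"),
]
print(domains_of_concern(study))
```

Listing the domains of concern, instead of adding them up, keeps the reason for each judgement visible in summary tables.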

4. Patient flow and verification bias

Draw or reconstruct the flow diagram before interpreting sensitivity and specificity. Ask: of all patients who had the index test, how many had reference standard verification?

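Partial verification bias occurs when patients with positive index tests are more likely to proceed to the reference standard than patients with negative index tests. If only verified patients are analysed, sensitivity is typically overestimated and specificity underestimated, because index-negative patients (false negatives and true negatives alike) are undercounted.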
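A small numeric sketch can show the direction of the distortion when index-negative patients are under-verified. The cohort counts and verification fractions below are hypothetical, and the function `observed_accuracy` is an illustrative helper that works on expected counts, so the result is deterministic.

```python
def observed_accuracy(tp, fp, fn, tn, verify_pos=1.0, verify_neg=1.0):
    """Sensitivity and specificity among verified patients only.

    verify_pos and verify_neg are the fractions of index-positive and
    index-negative patients who receive the reference standard.
    """
    v_tp, v_fp = tp * verify_pos, fp * verify_pos  # verified index positives
    v_fn, v_tn = fn * verify_neg, tn * verify_neg  # verified index negatives
    sensitivity = v_tp / (v_tp + v_fn)
    specificity = v_tn / (v_tn + v_fp)
    return sensitivity, specificity


# Hypothetical cohort: true sensitivity 0.80, true specificity 0.90.
cohort = dict(tp=80, fp=90, fn=20, tn=810)

full = observed_accuracy(**cohort)
# All index positives verified, but only 10% of index negatives.
partial = observed_accuracy(**cohort, verify_pos=1.0, verify_neg=0.1)

print(f"full verification:    sens={full[0]:.2f} spec={full[1]:.2f}")
print(f"partial verification: sens={partial[0]:.2f} spec={partial[1]:.2f}")
```

In this hypothetical cohort the apparent sensitivity rises and the apparent specificity falls under partial verification, which is why the flow diagram should be read before the accuracy estimates.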

Differential verification applies different reference standards to different subgroups — for example, CT for high-risk patients and clinical follow-up for low-risk patients. Rate high risk of bias in flow and timing unless adequately corrected.

Incorporation bias occurs when the index test forms part of the reference standard definition — common when comparing imaging modalities against composite clinical diagnosis that already incorporated the index result.

5. Interpreting results clinically

Sensitivity and specificity alone are insufficient for bedside decisions. Convert results to likelihood ratios (LR+ and LR−) and apply them to pre-test probability using Fagan nomogram logic or Bayesian framing.
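The conversion follows from the standard definitions: LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, and post-test odds = pre-test odds × LR. A minimal sketch, with hypothetical input values chosen only for illustration:

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical test: sensitivity 0.90, specificity 0.80, pre-test probability 0.20.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
after_positive = post_test_probability(0.20, lr_pos)
after_negative = post_test_probability(0.20, lr_neg)

print(f"LR+={lr_pos:.2f} LR-={lr_neg:.3f}")
print(f"post-test probability after a positive result: {after_positive:.2f}")
print(f"post-test probability after a negative result: {after_negative:.2f}")
```

The same arithmetic underlies the Fagan nomogram, which reads the post-test probability off a line drawn through the pre-test probability and the likelihood ratio.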

Predictive values depend on prevalence: in a low-prevalence population, even a highly specific test can produce many false positives for each true positive if it is applied indiscriminately. Always ask: what was the prevalence in this study, and does it match my patients?
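The dependence on prevalence can be seen by holding sensitivity and specificity fixed and varying prevalence. This is a sketch with hypothetical values (sensitivity 0.90, specificity 0.95); `predictive_values` is an illustrative helper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence, using expected proportions."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test characteristics, applied in three settings.
results = {}
for prevalence in (0.01, 0.10, 0.50):
    results[prevalence] = predictive_values(0.90, 0.95, prevalence)
    ppv, npv = results[prevalence]
    print(f"prevalence={prevalence:.2f} PPV={ppv:.2f} NPV={npv:.3f}")
```

With these inputs the positive predictive value at 1% prevalence is far lower than at 50% prevalence, even though the test itself is unchanged.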

Confidence intervals around sensitivity and specificity widen rapidly in small studies. A reported sensitivity of 95% (95% CI 85–99%) with n=50 diseased patients carries more uncertainty than the point estimate suggests.
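To see how interval width depends on the number of diseased patients, a confidence interval can be recomputed from the raw counts. The sketch below uses the Wilson score interval, which is one of several accepted methods and not necessarily the one a given study used; the counts (48 of 50 and 480 of 500 detected) are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Same observed sensitivity (0.96), different numbers of diseased patients.
small = wilson_interval(48, 50)
large = wilson_interval(480, 500)

print(f"n=50:  {small[0]:.3f} to {small[1]:.3f}")
print(f"n=500: {large[0]:.3f} to {large[1]:.3f}")
```

The interval for the smaller study is several times wider, so its lower bound is the number to weigh when a missed diagnosis is costly.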

BMJ Statistics Notes and CEBM resources provide accessible introductions to likelihood ratios and diagnostic reasoning — pair statistical interpretation with QUADAS-2 judgements.

6. Diagnostic systematic reviews

When synthesising multiple diagnostic accuracy studies, complete QUADAS-2 for each included study and present the judgements in summary tables. Cochrane recommends hierarchical models (the bivariate model or the hierarchical summary ROC model) for pooled sensitivity and specificity, not separate pooling of each proportion, which ignores the correlation between sensitivity and specificity across studies.

Investigate heterogeneity in patient spectrum and index test version. A pooled estimate mixing primary care and tertiary referral populations may be clinically meaningless even if statistically estimable.

GRADE for diagnostic test accuracy follows a structured approach considering study limitations (QUADAS-2), inconsistency, indirectness, imprecision, and publication bias. Report certainty alongside pooled estimates.
