Strata Academy
Risk of Bias in Systematic Reviews: Study-Level vs Review-Level
ROB 2 for included trials, AMSTAR 2 for review quality, ROBIS for review conduct bias, and how the layers connect
Quick answer
Risk of bias operates at two levels in systematic reviews. Study-level tools (ROB 2 for RCTs, ROBINS-I for non-randomised studies, QUADAS-2 for diagnostics) assess included primary studies. Review-level tools assess the review itself: AMSTAR 2 for methodological quality and ROBIS for bias in review conduct. Do not conflate a well-conducted review with unbiased primary evidence.
1. Two levels of risk of bias
When reading or conducting a systematic review, distinguish bias in the review process from bias in the primary studies it includes. A meticulously searched and screened review can still pool trials at high risk of bias. In that case the pooled estimate (the diamond on the forest plot) inherits the flaws of the primary data.
Study-level risk of bias asks: within each included trial, were design and conduct flaws likely to distort the effect estimate? Review-level risk of bias asks: did the reviewers introduce bias through incomplete searching, selective inclusion, or inappropriate synthesis?
Students often apply ROB 2 to the systematic review paper itself — this is incorrect. ROB 2 domains (randomisation, deviations, missing data, measurement, selective reporting) apply to randomised trials included in the review.
Reporting quality (PRISMA) and methodological quality (AMSTAR 2) are related but distinct. A review can report well yet search poorly, or search comprehensively yet pool clinically heterogeneous studies inappropriately.
2. ROB 2 for included randomised trials
ROB 2 (Risk of Bias 2) is Cochrane's recommended tool for randomised trials, with versions for individually randomised parallel-group, cluster-randomised, and crossover designs. It covers five domains: randomisation process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result.
Each domain receives an algorithm-guided judgement: low risk, some concerns, or high risk. An overall judgement is derived from domain ratings — not an arithmetic average.
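The point that the overall judgement is not an average can be made concrete with a minimal sketch. It assumes the rule in the published ROB 2 guidance: any high-risk domain makes the result high risk, all-low domains give low risk, and everything else is some concerns unless concerns in several domains together substantially lower confidence. The names Judgement, DOMAINS, overall_rob2, and the concerns_compound flag are illustrative and not part of any official tool; the flag stands in for a reviewer judgement that code cannot make.

```python
from enum import Enum


class Judgement(Enum):
    LOW = "low risk"
    SOME_CONCERNS = "some concerns"
    HIGH = "high risk"


# Illustrative labels for the five ROB 2 domains.
DOMAINS = (
    "randomisation",
    "deviations",
    "missing_data",
    "measurement",
    "reported_result",
)


def overall_rob2(domain_ratings, concerns_compound=False):
    """Derive an overall judgement from per-domain judgements.

    concerns_compound is a reviewer decision (hypothetical parameter):
    True when 'some concerns' in several domains together substantially
    lower confidence in the result.
    """
    ratings = [domain_ratings[d] for d in DOMAINS]
    if Judgement.HIGH in ratings:
        return Judgement.HIGH
    n_concerns = ratings.count(Judgement.SOME_CONCERNS)
    if n_concerns >= 2 and concerns_compound:
        return Judgement.HIGH
    if n_concerns >= 1:
        return Judgement.SOME_CONCERNS
    return Judgement.LOW


# One high-risk domain outweighs four low-risk domains.
trial = {d: Judgement.LOW for d in DOMAINS}
trial["missing_data"] = Judgement.HIGH
print(overall_rob2(trial).value)
```

An arithmetic average would rate this trial close to low risk, whereas the worst domain determines the overall judgement here.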
In your review, complete ROB 2 for every included RCT at the outcome level where domains differ by outcome (e.g. subjective vs objective endpoints). Present results in a traffic-light plot or summary table linked to GRADE assessments.
Dual independent assessment with a resolver is expected for dissertation-level work. Document assessor training: ROB 2 judgements depend on understanding the signalling questions, not on ticking boxes.
- Domain 1 — sequence generation and allocation concealment
- Domain 2 — deviations from intended interventions (check whether the analysis followed the intention-to-treat principle)
- Domain 3 — missing outcome data and attrition patterns
- Domain 4 — blinding of outcome assessors for subjective outcomes
- Domain 5 — compare trial registry/protocol to published outcomes
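For the dual independent assessment described above, one way to prepare for the resolver is to list every study and domain where the two assessors disagree. The sketch below assumes each assessor's judgements are stored as a nested dictionary (study, then domain, then rating); the function name, data layout, and trial labels are hypothetical.

```python
def find_disagreements(assessor_a, assessor_b):
    """Return (study, domain, rating_a, rating_b) for every mismatch.

    Only studies rated by both assessors are compared. A domain missing
    from assessor_b is reported with rating_b set to None.
    """
    disagreements = []
    for study in sorted(assessor_a.keys() & assessor_b.keys()):
        for domain in sorted(assessor_a[study]):
            rating_a = assessor_a[study][domain]
            rating_b = assessor_b[study].get(domain)
            if rating_a != rating_b:
                disagreements.append((study, domain, rating_a, rating_b))
    return disagreements


# Hypothetical ratings for two trials and two domains.
first = {
    "Trial A": {"randomisation": "low", "missing_data": "some concerns"},
    "Trial B": {"randomisation": "high", "missing_data": "low"},
}
second = {
    "Trial A": {"randomisation": "low", "missing_data": "high"},
    "Trial B": {"randomisation": "high", "missing_data": "low"},
}

for study, domain, a, b in find_disagreements(first, second):
    print(f"{study} / {domain}: assessor 1 = {a}, assessor 2 = {b}")
```

The output is the worklist for the resolver; agreed ratings never appear in it, so the discussion can focus on the signalling questions behind each mismatch.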
3. AMSTAR 2 for review methodological quality
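AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews) evaluates the conduct of the systematic review itself across 16 items, seven of which its authors designate as critical domains. Critical flaws include: no protocol registered before the review began, no comprehensive search, no list of excluded studies with reasons, no risk-of-bias assessment of included studies, and no consideration of risk of bias when interpreting results. Duplicate screening is also assessed, but as a non-critical item.
AMSTAR 2 produces an overall confidence rating: high, moderate, low, or critically low. A single critical flaw limits confidence to low, and more than one critical flaw gives critically low regardless of strengths elsewhere.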
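The rating scheme can be sketched as a small function. It assumes the thresholds suggested in the AMSTAR 2 guidance: no or one non-critical weakness gives high, more than one non-critical weakness gives moderate, one critical flaw gives low, and more than one critical flaw gives critically low. The function name and its count-based inputs are illustrative. The AMSTAR 2 authors present the scheme as a guide rather than a scoring formula, so treat the output as a starting point for judgement.

```python
def amstar2_confidence(critical_flaws, noncritical_weaknesses):
    """Map flaw counts to an overall confidence rating (illustrative).

    critical_flaws: number of critical domains with a flaw.
    noncritical_weaknesses: number of non-critical items with a weakness.
    """
    if critical_flaws < 0 or noncritical_weaknesses < 0:
        raise ValueError("counts must be non-negative")
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Two critical flaws dominate, however few non-critical weaknesses there are.
print(amstar2_confidence(critical_flaws=2, noncritical_weaknesses=0))
```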
Use AMSTAR 2 when appraising someone else's review for journal club, coursework, or guideline development. When conducting your own review, pre-specify AMSTAR 2-aligned methods in PROSPERO so you avoid critical flaws by design.
AMSTAR 2 does not replace ROBIS for bias in review conclusions — it focuses on whether methods were adequate, not whether the review team's judgements were biased.
4. ROBIS for bias in review conduct
ROBIS (Risk Of Bias In Systematic reviews) assesses whether the review process introduced bias affecting the interpretation of findings. It covers four domains: study eligibility criteria, identification and selection of studies, data collection and appraisal, and synthesis and findings.
Unlike AMSTAR 2's checklist approach, ROBIS asks signalling questions that lead to a judgement of low, high, or unclear concern for each domain, then an overall risk-of-bias judgement about whether the review's conclusions can be trusted.
ROBIS is particularly useful when deciding whether to trust a review's pooled estimate for clinical or policy decisions. A review can score well on AMSTAR 2 items yet be judged at high risk of bias on ROBIS if post-hoc subgroup analyses drove conclusions not supported by the protocol.
For umbrella reviews (reviews of reviews), ROBIS applies to each included systematic review. AMSTAR 2 may also be used per included review.
- Domain 1 — were eligibility criteria appropriate and pre-specified?
- Domain 2 — was study identification comprehensive and selection unbiased?
- Domain 3 — were data extraction and RoB assessment adequate?
- Domain 4 — were synthesis methods appropriate and did conclusions match the evidence?
5. Integrating RoB into synthesis and GRADE
Study-level RoB judgements feed directly into GRADE certainty assessments. If most contributing RCTs are at high risk of bias in domains that matter for the outcome, GRADE typically downgrades certainty by one level for risk of bias, or by two levels when the limitations are very serious.
In meta-analysis, consider subgroup or sensitivity analyses restricted to low-risk studies. If the pooled estimate changes materially, report this — it signals that overall certainty should remain low despite a narrow confidence interval.
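A sensitivity analysis of this kind can be sketched with a fixed-effect inverse-variance pool, run once on all studies and once on the low-risk subset. The trial labels, effect sizes (imagined here as log risk ratios), standard errors, and RoB ratings below are invented for illustration, and the choice of a fixed-effect model is an assumption; a random-effects model may suit a real review better.

```python
import math


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling.

    estimates: list of (effect, standard_error) pairs.
    Returns (pooled_effect, ci_lower, ci_upper) with a 95% interval.
    """
    if not estimates:
        raise ValueError("no studies to pool")
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / total
    se_pooled = math.sqrt(1.0 / total)
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled


# Hypothetical data: effect is a log risk ratio, rob is the overall judgement.
studies = [
    {"id": "T1", "effect": -0.40, "se": 0.10, "rob": "high"},
    {"id": "T2", "effect": -0.35, "se": 0.15, "rob": "high"},
    {"id": "T3", "effect": -0.05, "se": 0.12, "rob": "low"},
    {"id": "T4", "effect": -0.10, "se": 0.20, "rob": "low"},
]

all_studies = pool_fixed_effect([(s["effect"], s["se"]) for s in studies])
low_only = pool_fixed_effect(
    [(s["effect"], s["se"]) for s in studies if s["rob"] == "low"]
)

for label, (est, lower, upper) in (("all", all_studies), ("low risk", low_only)):
    print(f"{label}: {est:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

In this invented data set the apparent benefit shrinks towards the null once high-risk trials are removed, which is the pattern that should be reported and reflected in the certainty rating.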
Narrative synthesis without meta-analysis still requires RoB tables. Describe how bias patterns across studies influenced the direction and strength of conclusions.
Review-level quality (AMSTAR 2, ROBIS) informs whether you should trust the review's synthesis at all. Study-level RoB informs how much to trust the underlying evidence — both are needed.
6. Practical workflow for student reviews
At protocol stage: pre-specify study-level RoB tool matched to included designs (ROB 2 for RCTs, ROBINS-I for non-randomised interventions, QUADAS-2 for diagnostic accuracy). Plan dual assessment and how RoB links to GRADE.
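The design-to-tool matching above can be written down as a lookup so that the protocol fails loudly when an included design has no pre-specified tool. The design keys and the function name are hypothetical; the tool names come from the text.

```python
# Study-level RoB tool per design, as listed in the protocol-stage guidance.
ROB_TOOL_BY_DESIGN = {
    "rct": "ROB 2",
    "non_randomised_intervention": "ROBINS-I",
    "diagnostic_accuracy": "QUADAS-2",
}


def tools_for_protocol(designs):
    """Return the RoB tool for each eligible design, or raise if one is missing."""
    unknown = [d for d in designs if d not in ROB_TOOL_BY_DESIGN]
    if unknown:
        raise ValueError(f"no RoB tool pre-specified for: {', '.join(unknown)}")
    return {d: ROB_TOOL_BY_DESIGN[d] for d in designs}


print(tools_for_protocol(["rct", "non_randomised_intervention"]))
```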
During conduct: complete RoB during or immediately after data extraction. Resolve disagreements before meta-analysis. Do not selectively exclude high-risk studies post hoc unless this was pre-specified as a sensitivity analysis.
When appraising a published review: apply AMSTAR 2 for methods, ROBIS for trust in conclusions, and inspect whether authors reported study-level RoB. Check the PRISMA flow diagram for unexplained drops in record or study numbers between stages.
In write-up: present study-level RoB summary figures, state review limitations transparently, and align GRADE statements with observed bias patterns. Distinguish 'we conducted a rigorous review' from 'the evidence is trustworthy'.