Strata Academy

Risk of Bias in Systematic Reviews: Study-Level vs Review-Level

ROB 2 for included trials, AMSTAR 2 for review quality, ROBIS for review conduct bias, and how the layers connect

Quick answer

Risk of bias operates at two levels in systematic reviews. Study-level tools (ROB 2 for RCTs, ROBINS-I for non-randomised studies, QUADAS-2 for diagnostics) assess included primary studies. Review-level tools assess the review itself: AMSTAR 2 for methodological quality and ROBIS for bias in review conduct. Do not conflate a well-conducted review with unbiased primary evidence.

ROB 2 applies to included RCTs — not to the systematic review as a whole.
AMSTAR 2 appraises whether the review methods were sound.
ROBIS assesses bias in how the review was conducted and reported.
GRADE integrates study-level RoB into certainty of evidence statements.

1. Two levels of risk of bias

When reading or conducting a systematic review, distinguish bias in the review process from bias in the primary studies it includes. A meticulously searched and screened review can still pool trials at high risk of bias, in which case the pooled estimate (the diamond on the forest plot) simply reflects flawed primary data.

Study-level risk of bias asks: within each included trial, were design and conduct flaws likely to distort the effect estimate? Review-level risk of bias asks: did the reviewers introduce bias through incomplete searching, selective inclusion, or inappropriate synthesis?

Students often apply ROB 2 to the systematic review paper itself — this is incorrect. ROB 2 domains (randomisation, deviations, missing data, measurement, selective reporting) apply to randomised trials included in the review.

Reporting quality (PRISMA) and methodological quality (AMSTAR 2) are related but distinct. A review can report well yet search poorly, or search comprehensively yet pool clinically heterogeneous studies inappropriately.

Note: Never write 'this systematic review has low risk of bias in Domain 2' using ROB 2. Use AMSTAR 2 or ROBIS for the review; use ROB 2 (or design-appropriate tools) for included studies.

2. ROB 2 for included randomised trials

ROB 2 (Risk of Bias 2) is Cochrane's recommended tool for randomised trials, with variants for individually randomised parallel-group, cluster-randomised, and crossover designs. It covers five domains: randomisation process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result.

Each domain receives an algorithm-guided judgement: low risk, some concerns, or high risk. The overall judgement is derived from the domain ratings and is at least as severe as the worst of them; it is not an arithmetic average.
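The step from domain ratings to an overall judgement can be sketched in a few lines of Python. This is a minimal illustration of the rule described in the ROB 2 guidance, not an official implementation: the function name `overall_rob2`, the domain labels, and the optional `escalate_at` threshold are assumptions made for this example. The guidance leaves it to the reviewers to decide whether some concerns across several domains add up to high risk, so that step is optional here.

```python
from typing import Mapping, Optional

LOW, SOME, HIGH = "low", "some concerns", "high"


def overall_rob2(domains: Mapping[str, str],
                 escalate_at: Optional[int] = None) -> str:
    """Derive an overall judgement from the five domain judgements.

    escalate_at is a hypothetical knob: if set, that many "some concerns"
    domains are treated as high risk overall. In a real assessment this
    is a reviewer judgement, not a fixed count.
    """
    judgements = list(domains.values())
    unknown = set(judgements) - {LOW, SOME, HIGH}
    if unknown:
        raise ValueError(f"unrecognised judgement(s): {sorted(unknown)}")

    # Any high-risk domain makes the whole result high risk.
    if HIGH in judgements:
        return HIGH
    n_some = judgements.count(SOME)
    if escalate_at is not None and n_some >= escalate_at:
        return HIGH
    # Otherwise the worst remaining rating wins.
    return SOME if n_some else LOW


# Illustrative trial, not taken from any real review.
trial = {
    "D1 randomisation": LOW,
    "D2 deviations": SOME,
    "D3 missing data": LOW,
    "D4 measurement": LOW,
    "D5 reported result": LOW,
}
print(overall_rob2(trial))  # some concerns
```

The worst-rating logic is the point of the sketch: four low-risk domains do not dilute one domain with concerns.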

In your review, complete ROB 2 for every included RCT at the outcome level where domains differ by outcome (e.g. subjective vs objective endpoints). Present results in a traffic-light plot or summary table linked to GRADE assessments.

Dual independent assessment with a resolver is expected for dissertation-level work. Document training — ROB 2 requires understanding signalling questions, not checkbox completion.

Domain 1 — sequence generation and allocation concealment
Domain 2 — deviations from intended interventions (ITT matters)
Domain 3 — missing outcome data and attrition patterns
Domain 4 — blinding of outcome assessors for subjective outcomes
Domain 5 — compare trial registry/protocol to published outcomes

Tip: Extract registry IDs during data extraction so Domain 5 selective reporting assessment is feasible for every trial.

3. AMSTAR 2 for review methodological quality

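AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews) evaluates the conduct of the systematic review itself across 16 items, seven of which its developers flag as critical domains. Critical flaws include: no protocol registered before the review began, no comprehensive search, no list of excluded studies with reasons, and no risk-of-bias assessment of included studies. Duplicate screening is also assessed, although the developers' default scheme treats it as a non-critical item.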

AMSTAR 2 produces an overall confidence rating: high, moderate, low, or critically low. Multiple critical flaws downgrade to critically low regardless of strengths elsewhere.
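As a rough sketch, the rating scheme published by the AMSTAR 2 developers can be written as a small function. The name `amstar2_confidence` and its two count arguments are assumptions for this example; deciding which items count as critical for a given review, and whether each one is actually flawed, remains a reviewer judgement that the code does not capture.

```python
def amstar2_confidence(critical_flaws: int, noncritical_weaknesses: int) -> str:
    """Map counts of flaws to an overall confidence rating.

    Follows the default scheme suggested by the tool's developers:
    critical flaws dominate, non-critical weaknesses only separate
    high from moderate.
    """
    if critical_flaws < 0 or noncritical_weaknesses < 0:
        raise ValueError("counts must be non-negative")
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# A review with strong methods elsewhere but two critical flaws.
print(amstar2_confidence(critical_flaws=2, noncritical_weaknesses=0))  # critically low
```

Note that the non-critical count is ignored once a critical flaw is present, which is why strengths elsewhere cannot rescue the rating.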

Use AMSTAR 2 when appraising someone else's review for journal club, coursework, or guideline development. When conducting your own review, pre-specify AMSTAR 2-aligned methods in PROSPERO so you avoid critical flaws by design.

AMSTAR 2 does not replace ROBIS for bias in review conclusions — it focuses on whether methods were adequate, not whether the review team's judgements were biased.

AMSTAR 2 focus	Example flaw	Prevention in your review
Comprehensive search	PubMed-only search	MEDLINE + Embase + CENTRAL + registries
Duplicate selection	Single reviewer screening	Dual independent screening + resolver
RoB of included studies	No RoB tool used	Pre-specify ROB 2 / ROBINS-I / QUADAS-2
Excluded studies listed	No exclusion list	PRISMA flow with reasons
Meta-analysis methods	Inappropriate pooling	Pre-specify model and heterogeneity plan

4. ROBIS for bias in review conduct

ROBIS (Risk Of Bias In Systematic reviews) assesses whether the review process introduced bias affecting the interpretation of findings. It covers four domains: study eligibility criteria, identification and selection of studies, data collection and appraisal, and synthesis and findings.

Unlike AMSTAR 2's checklist approach, ROBIS asks signalling questions that lead to a low, high, or unclear level of concern for each domain, followed by an overall judgement of whether the review is at low, high, or unclear risk of bias and so whether its conclusions are trustworthy.

ROBIS is particularly useful when deciding whether to trust a review's pooled estimate for clinical or policy decisions. A review can score well on AMSTAR 2 items yet be rated at high risk of bias on ROBIS if post-hoc subgroup analyses drove conclusions that the protocol did not support.

For umbrella reviews (reviews of reviews), ROBIS applies to each included systematic review. AMSTAR 2 may also be used per included review.

Domain 1 — were eligibility criteria appropriate and pre-specified?
Domain 2 — was study identification comprehensive and selection unbiased?
Domain 3 — were data extraction and RoB assessment adequate?
Domain 4 — were synthesis methods appropriate, and did the conclusions match the evidence?

5. Integrating RoB into synthesis and GRADE

Study-level RoB judgements feed directly into GRADE certainty assessments. If most of the weight for an outcome comes from RCTs at high risk of bias in domains that matter for that outcome, GRADE typically downgrades certainty by one level for serious risk of bias, or by two levels when the limitations are very serious.
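The bookkeeping behind a GRADE rating can be illustrated with a short Python sketch. GRADE is a structured judgement rather than arithmetic, so treat this only as a way to record decisions already made: the function name `grade_certainty` and the shape of the `downgrades` mapping are assumptions for this example, and rating up is left out for brevity.

```python
from typing import Mapping

LEVELS = ["very low", "low", "moderate", "high"]


def grade_certainty(start: str, downgrades: Mapping[str, int]) -> str:
    """Apply recorded downgrades (0, 1 or 2 levels per domain) to a starting level."""
    if start not in LEVELS:
        raise ValueError(f"unknown starting level: {start}")
    if any(d not in (0, 1, 2) for d in downgrades.values()):
        raise ValueError("each domain is downgraded by 0, 1 or 2 levels")
    index = LEVELS.index(start) - sum(downgrades.values())
    # Certainty cannot fall below "very low".
    return LEVELS[max(index, 0)]


# Evidence from RCTs starts at high; serious risk of bias costs one level.
print(grade_certainty("high", {"risk of bias": 1, "imprecision": 0}))  # moderate
```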

In meta-analysis, consider subgroup or sensitivity analyses restricted to low-risk studies. If the pooled estimate changes materially, report this — it signals that overall certainty should remain low despite a narrow confidence interval.
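A sensitivity analysis of this kind might look like the following Python sketch, which pools all trials and then only those at low risk of bias. It assumes a fixed-effect inverse-variance model purely to keep the example short; your protocol may pre-specify a random-effects model instead. The `Trial` record, the `pool_fixed` helper, and all numbers are invented for illustration.

```python
import math
from typing import List, NamedTuple, Tuple


class Trial(NamedTuple):
    name: str
    effect: float  # effect estimate on an additive scale, e.g. log risk ratio
    se: float      # standard error of the effect estimate
    rob: str       # overall ROB 2 judgement


def pool_fixed(trials: List[Trial]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooled estimate with a 95% CI."""
    if not trials:
        raise ValueError("no trials to pool")
    weights = [1.0 / t.se ** 2 for t in trials]
    total = sum(weights)
    estimate = sum(w * t.effect for w, t in zip(weights, trials)) / total
    se = math.sqrt(1.0 / total)
    return estimate, estimate - 1.96 * se, estimate + 1.96 * se


# Invented numbers for illustration only, not data from any real trial.
trials = [
    Trial("A", -0.40, 0.20, "high"),
    Trial("B", -0.10, 0.20, "low"),
    Trial("C", -0.30, 0.20, "high"),
    Trial("D", 0.00, 0.20, "low"),
]

all_est = pool_fixed(trials)
low_est = pool_fixed([t for t in trials if t.rob == "low"])
print("all trials:    %.2f (%.2f to %.2f)" % all_est)
print("low risk only: %.2f (%.2f to %.2f)" % low_est)
```

In this made-up data set the apparent benefit shrinks towards zero once the high-risk trials are dropped, which is the pattern the paragraph above asks you to report.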

Narrative synthesis without meta-analysis still requires RoB tables. Describe how bias patterns across studies influenced the direction and strength of conclusions.

Review-level quality (AMSTAR 2, ROBIS) informs whether you should trust the review's synthesis at all. Study-level RoB informs how much to trust the underlying evidence — both are needed.

6. Practical workflow for student reviews

At protocol stage: pre-specify study-level RoB tool matched to included designs (ROB 2 for RCTs, ROBINS-I for non-randomised interventions, QUADAS-2 for diagnostic accuracy). Plan dual assessment and how RoB links to GRADE.

During conduct: complete RoB during or immediately after data extraction. Resolve disagreements before meta-analysis. Do not selectively exclude high-risk studies post hoc unless pre-specified as sensitivity analysis.
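One way to make disagreement resolution auditable is to compare the two reviewers' judgements programmatically and pass only the mismatches to the resolver. The sketch below assumes each reviewer's judgements are stored as a dictionary keyed by (study, domain); the function name `find_disagreements` and the study labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (study, domain)
Row = Tuple[Key, Optional[str], Optional[str]]


def find_disagreements(a: Dict[Key, str], b: Dict[Key, str]) -> List[Row]:
    """Return every (study, domain) where the two reviewers differ.

    A judgement missing from one reviewer shows up as None, so
    incomplete assessments are flagged as well.
    """
    keys = sorted(set(a) | set(b))
    return [(k, a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)]


# Hypothetical judgements from two independent reviewers.
reviewer_a = {
    ("Trial 1", "D1"): "low",
    ("Trial 1", "D2"): "some concerns",
    ("Trial 2", "D1"): "high",
}
reviewer_b = {
    ("Trial 1", "D1"): "low",
    ("Trial 1", "D2"): "high",
}

for key, first, second in find_disagreements(reviewer_a, reviewer_b):
    print(key, first, second)
```

Keeping the output of this comparison alongside the final consensus table documents what the resolver was asked to decide.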

When appraising a published review: apply AMSTAR 2 for methods, ROBIS for trust in conclusions, and inspect whether authors reported study-level RoB. Check PRISMA flow for unexplained attrition.

In write-up: present study-level RoB summary figures, state review limitations transparently, and align GRADE statements with observed bias patterns. Distinguish 'we conducted a rigorous review' from 'the evidence is trustworthy'.

Frequently asked questions

Should I use AMSTAR 2 or ROBIS to appraise a systematic review?

Both serve different purposes and are often used together. AMSTAR 2 identifies whether review methods met quality standards (checklist). ROBIS judges whether the review process likely biased the conclusions (domains with overall judgement). For journal club, completing AMSTAR 2 first is usually faster; add ROBIS when deciding whether to trust the pooled estimate.

Can I use ROB 2 for cohort studies included in my review?

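No. ROB 2 is for randomised trials only. Non-randomised intervention studies require ROBINS-I. Prognostic cohorts fall outside the scope of ROBINS-I: depending on context, use a prognosis-specific tool such as QUIPS (prognostic factor studies) or PROBAST (prediction models), or a quality checklist like the Newcastle–Ottawa Scale. Diagnostic studies use QUADAS-2.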

Does a high AMSTAR 2 rating mean the treatment works?

No. AMSTAR 2 assesses review conduct, not treatment effect, and it yields an overall confidence rating rather than a numeric score. A rigorously conducted review of biased trials may correctly report low-certainty evidence of benefit. Always read study-level RoB and GRADE alongside AMSTAR 2.

How many reviewers should assess risk of bias?

Cochrane recommends two independent reviewers with a third resolver for disagreements. Single assessment should be reported as a limitation. Pilot ROB 2 on 2–3 studies to calibrate signalling question interpretation before scaling.
