Strata Academy

AMSTAR 2 Explained: Appraise Systematic Review Quality

16 items, critical domains, critically low to high confidence ratings, and how AMSTAR 2 differs from PRISMA reporting

Quick answer

AMSTAR 2 is a 16-item tool for methodological quality of systematic reviews. Seven items are critical — flaws there cap the overall confidence rating (critically low to high). Pair with PRISMA for reporting and ROBIS for review-process bias.

1. What is AMSTAR 2?

AMSTAR 2 (A MeaSurement Tool to Assess systematic Reviews 2) is a 16-item checklist for evaluating the methodological quality of systematic reviews – including reviews that do not include meta-analysis.

It was developed to address limitations of the original AMSTAR tool, with clearer response options (yes / partial yes / no) and explicit rules for overall confidence ratings.

AMSTAR 2 produces an overall confidence in the results of the review: critically low, low, moderate, or high. Seven items are designated critical – serious flaws in those items cap the overall rating regardless of strengths elsewhere.

Students often confuse AMSTAR 2 with PRISMA. PRISMA asks whether authors reported what they did transparently; AMSTAR 2 asks whether what they did was methodologically sound.

For UK medical students on intercalated research years or MSc evidence synthesis modules, AMSTAR 2 is the standard tool for appraising published systematic reviews in coursework. Examiners expect item-by-item scoring with evidence, not a vague 'good review' summary.

Original AMSTAR (2007) still appears in older papers. If a review cites AMSTAR 1, note the version mismatch when comparing to current teaching — items and critical domains differ from AMSTAR 2.

Industry-sponsored reviews: Item 16 (conflicts) interacts with Item 15 (publication bias). A methodologically strong review of predominantly industry trials may still warrant cautious interpretation — document both AMSTAR items.

2. Seven critical domains

The seven critical items drive the overall AMSTAR 2 judgement. If any critical item is rated 'No', the overall confidence cannot exceed low; multiple critical flaws yield critically low confidence.

When appraising a review, locate evidence for each critical item in the protocol registration, methods supplement, and excluded-studies appendices – not only the main text.

Partial yes on critical items still counts as a flaw under official guidance. Students should quote the exact AMSTAR 2 wording in coursework rather than paraphrasing loosely.

Open the supplementary search strategy before scoring Item 4. Medline-only searches without Embase or subject-specific databases often fail comprehensive search criteria in clinical topics.

Item 7 — list of excluded studies with reasons — is among the most commonly failed critical items. If authors say 'available on request' without providing the list, score No.

3. Full 16-item overview

Non-critical items still matter for overall confidence. Duplicate screening and data extraction, PICO components defined a priori, investigation of heterogeneity, and conflict-of-interest statements all appear in the full checklist.

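Item 1 asks whether the research questions and inclusion criteria included the PICO components; Item 2, a critical item, asks whether the review methods were established before the review began. Items 5 and 6 examine whether study selection and data extraction were performed in duplicate – a common weakness in rapid reviews commissioned by NHS bodies.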

Item 8 covers whether included studies were described in adequate detail. Item 14 asks whether authors gave a satisfactory explanation for, and discussion of, any heterogeneity observed in the results, including appropriate caution about subgroup claims.

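Item 16 requires the review authors to report their own potential conflicts of interest, including any funding they received for conducting the review. Funding of the included primary studies is covered separately by Item 10. Undeclared industry funding in included trials may bias the body of evidence even if the review methods are sound.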

Item 10 (funding of included studies) supports interpretation alongside Item 16. Reviews of predominantly industry-funded trials with positive pooled effects warrant explicit discussion even when the overall AMSTAR 2 rating is moderate.

4. AMSTAR 2 vs PRISMA vs ROBIS

PRISMA 2020 assesses reporting transparency – flow diagram, search dates, eligibility criteria visible in the manuscript. A review can tick many PRISMA boxes while still using a single screener or inappropriate meta-analysis model.

AMSTAR 2 assesses methodological conduct – were decisions made in a way that limits bias in the review's conclusions? It produces a confidence rating for the review as a whole.

ROBIS assesses risk of bias in the systematic review process across four domains (study eligibility criteria, identification and selection of studies, data collection and study appraisal, synthesis and findings). Overlap with AMSTAR exists, but ROBIS frames judgements as low/high/unclear risk of bias for the review result.

For high-stakes appraisal – guideline work, dissertation systematic review chapters – use all three lenses. For journal club, AMSTAR 2 plus PRISMA flow reconciliation is often sufficient depth.

5. Interpreting overall confidence

Critically low confidence: more than one critical flaw, with or without non-critical weaknesses. The review conclusions should not be relied upon without independent verification.

Low confidence: one critical flaw, with or without non-critical weaknesses. Use findings cautiously and emphasise uncertainty in coursework discussion.

Moderate confidence: more than one non-critical weakness and no critical flaws. Common in well-conducted reviews with minor limitations such as single data extraction.

High confidence: no critical flaws and at most one non-critical weakness. Rare in practice – do not assign high without item-by-item justification.

State the rating and which items failed. Examiners want specificity ('failed Item 7 – no excluded studies list') not vague praise.
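The rating rules can be written as a short function. The sketch below is illustrative only, not an official scoring tool. It assumes the published AMSTAR 2 scheme: critical items 2, 4, 7, 9, 11, 13 and 15; more than one critical flaw gives critically low, exactly one gives low, more than one non-critical weakness gives moderate, otherwise high. The function name, the response strings, and the `partial_yes_is_flaw` switch are assumptions made for this example; the default follows this guide's stricter reading of partial yes.

```python
from collections.abc import Mapping

# Critical items in AMSTAR 2: protocol, search, excluded studies, RoB technique,
# meta-analysis methods, RoB in interpretation, publication bias.
CRITICAL_ITEMS = frozenset({2, 4, 7, 9, 11, 13, 15})
# "n/a" stands for "no meta-analysis conducted" (hypothetical label).
VALID_RESPONSES = {"yes", "partial yes", "no", "n/a"}


def overall_confidence(responses: Mapping[int, str],
                       partial_yes_is_flaw: bool = True) -> str:
    """Return the overall confidence rating for one appraised review.

    `responses` maps each item number (1-16) to a response string.
    Whether 'partial yes' counts as a weakness is a judgement call,
    so it is exposed as a parameter (an assumption of this sketch).
    """
    if set(responses) != set(range(1, 17)):
        raise ValueError("expected a response for each of items 1-16")
    bad = set(responses.values()) - VALID_RESPONSES
    if bad:
        raise ValueError(f"unrecognised responses: {sorted(bad)}")

    flawed = {"no", "partial yes"} if partial_yes_is_flaw else {"no"}
    critical = sum(1 for item, r in responses.items()
                   if item in CRITICAL_ITEMS and r in flawed)
    non_critical = sum(1 for item, r in responses.items()
                       if item not in CRITICAL_ITEMS and r in flawed)

    if critical > 1:
        return "critically low"
    if critical == 1:
        return "low"
    if non_critical > 1:
        return "moderate"
    return "high"


# Example: every item satisfied except the excluded-studies list (Item 7).
example = {item: "yes" for item in range(1, 17)}
example[7] = "no"
print(overall_confidence(example))  # prints "low"
```

The returned label is only a starting point; the item-by-item justification still has to be written by hand.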

6. Worked example – appraising a published review

Apply AMSTAR 2 item by item to a Cochrane review or high-impact systematic review in your specialty. Cochrane reviews often score well on search and ROB but may still fail individual items — practice finding evidence for each score.

Start with PROSPERO registration date vs screening start, then the search appendix, then the excluded studies list. These three checks cover multiple critical items within the first hour of appraisal.

7. If you are conducting your own review

Design methods to pass critical items from day one: register your protocol on PROSPERO before screening, document the full search strategy with database justification, and maintain an excluded-studies spreadsheet with reason codes.

Match risk-of-bias tools to each included study design – ROB 2 for RCTs, ROBINS-I for non-randomised interventions, QUADAS-2 for diagnostic accuracy. Applying one generic checklist to all designs fails Item 9.
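One way to keep tool selection explicit is a lookup table keyed by study design. The sketch below is a minimal illustration: the design labels, the function name, and the study IDs are hypothetical, and only the three tools named above are included. Studies whose design has no entry are returned separately, so they are flagged rather than quietly assessed with a generic checklist.

```python
# Design labels are hypothetical keys chosen for this example;
# the tool names are those mentioned in the paragraph above.
ROB_TOOL_BY_DESIGN = {
    "rct": "RoB 2",
    "non_randomised_intervention": "ROBINS-I",
    "diagnostic_accuracy": "QUADAS-2",
}


def assign_rob_tools(studies: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Map each study ID to a risk-of-bias tool based on its design.

    Returns (assignments, unmatched). Studies whose design is not in the
    table are listed in `unmatched` and need a manual decision.
    """
    assignments: dict[str, str] = {}
    unmatched: list[str] = []
    for study_id, design in studies.items():
        tool = ROB_TOOL_BY_DESIGN.get(design)
        if tool is None:
            unmatched.append(study_id)
        else:
            assignments[study_id] = tool
    return assignments, unmatched


# Invented study IDs, for illustration only.
included = {
    "S01": "rct",
    "S02": "non_randomised_intervention",
    "S03": "case_series",
}
assignments, unmatched = assign_rob_tools(included)
print(assignments)  # {'S01': 'RoB 2', 'S02': 'ROBINS-I'}
print(unmatched)    # ['S03']
```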

Pre-specify your meta-analysis model (random vs fixed effects) in the protocol. Post-hoc switching after seeing the forest plot undermines Item 11 and GRADE credibility.

See our systematic review methodology guide for the full workflow from question to synthesis. Build AMSTAR 2 compliance into your Covidence/Rayyan workflow rather than retrofitting before submission.

  1. Register protocol on PROSPERO or OSF before screening.
  2. Run searches with librarian support; save strategies verbatim.
  3. Screen and extract in duplicate with conflict resolution.
  4. Maintain excluded-studies log with coded reasons.
  5. Apply design-appropriate ROB tools and use them in synthesis.
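Step 4 in the list above can be kept as structured data rather than free text. The following is a minimal sketch, assuming a small set of reason codes defined in advance; the codes, class name, and study IDs are hypothetical, and a real review would define its own codes in the protocol. It rejects entries without a recognised reason, exports the log as CSV for a supplementary appendix, and tallies exclusions by reason.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

# Hypothetical reason codes, for illustration only.
REASON_CODES = {
    "POP": "wrong population",
    "INT": "wrong intervention",
    "DES": "ineligible study design",
    "OUT": "no relevant outcome",
}


@dataclass(frozen=True)
class Exclusion:
    study_id: str
    reason_code: str

    def __post_init__(self):
        # Refuse entries whose reason is missing or not pre-defined.
        if self.reason_code not in REASON_CODES:
            raise ValueError(f"unknown reason code: {self.reason_code!r}")


def to_csv(log: list[Exclusion]) -> str:
    """Serialise the log as CSV text with one row per excluded study."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["study_id", "reason_code", "reason"])
    for entry in log:
        writer.writerow([entry.study_id, entry.reason_code,
                         REASON_CODES[entry.reason_code]])
    return buffer.getvalue()


def tally(log: list[Exclusion]) -> Counter:
    """Count exclusions per reason code."""
    return Counter(entry.reason_code for entry in log)


# Invented study IDs, for illustration only.
log = [Exclusion("S101", "POP"), Exclusion("S102", "DES"), Exclusion("S103", "POP")]
print(tally(log)["POP"])  # prints 2
print(to_csv(log))
```

Validating the reason code at entry time means the published list cannot contain exclusions without a stated reason, which is what Item 7 looks for.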

8. AMSTAR 2 and rapid or living reviews

Rapid reviews commissioned by NICE, WHO, or NHS bodies may legitimately omit duplicate data extraction or limit grey literature searching. AMSTAR 2 still applies, but appraisers should note the review type and whether shortcuts were transparently declared.

Living systematic reviews update search and synthesis on a schedule. Check whether each update re-ran full search and risk-of-bias assessment or only appended new studies — partial updates may fail Item 4 or Item 9 on later versions.

Scoping reviews and evidence maps are not full systematic reviews. AMSTAR 2 may be inappropriate — confirm with your supervisor before scoring a scoping review against all 16 items.

When appraising COVID-era rapid reviews, many failed Item 7 (excluded studies list) due to time pressure. Note this as a limitation when using their conclusions in coursework without independent verification.

For your own dissertation, do not label a narrative review 'systematic' unless it meets AMSTAR-critical criteria. Examiners increasingly run AMSTAR 2 on student submissions.

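Dual independent screening is Item 5, which offers only Yes or No. Single-reviewer screening with a second checker only for conflicts is unlikely to score Yes unless two reviewers independently screened a sample and reached good agreement. Document your Covidence workflow precisely when writing methods.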

9. Journal club checklist (AMSTAR 2)

Before scoring, confirm the paper is a systematic review — not a narrative review or scoping review mislabelled as systematic. Read the methods for dual screening and comprehensive search.

Present critical items first in journal club: registration, search, excluded studies, ROB tools, meta-analysis methods, ROB in interpretation, publication bias. One failed critical item caps confidence at low.

Show the supplement on screen — search strategy and excluded studies list. If the lead author cannot locate them, that is your headline finding.

End with overall confidence rating and one sentence on whether you would change practice based on this review alone.

Cochrane reviews are not automatically high confidence — run AMSTAR 2 anyway. Even Cochrane outputs can fail Item 7 or Item 15 when updates truncate appendices.

10. Common student errors

AMSTAR 2 errors in coursework usually stem from treating it like a reading checklist rather than a methods audit. Avoid these patterns.

11. StrataResearch and AMSTAR 2

Systematic review and meta-analysis uploads are routed to AMSTAR 2, PRISMA 2020, ROBIS, and GRADE-aligned pathways – not to single-study tools like ROB 2 alone.

Compare your manual AMSTAR worksheet to structured output on a review PDF via quick analysis. Discrepancies often reveal items hidden in supplements you had not opened.

Use StrataResearch output as a structured second opinion, not a substitute for reading the review methods yourself.

Item 4 and Item 7 failures are the commonest discrepancies between student AMSTAR worksheets and automated screening — always open the PDF supplement before finalising your overall confidence rating.

Teaching tip: photocopy the blank AMSTAR 2 form and score three reviews in your specialty — pattern recognition for failed critical items develops faster than reading the guidance alone.
