Strata Academy

How to Critically Appraise a Paper (Step-by-Step)

A practical workflow: PICO, match the right checklist (RoB 2, AMSTAR 2, QUADAS-2), judge bias, read the statistics, and summarise for journal club

Quick answer

Critical appraisal steps: (1) define your PICO question, (2) identify the true study design, (3) apply the matching framework (RoB 2 for RCTs, ROBINS-I for non-randomised interventions, QUADAS-2 for diagnostics, AMSTAR 2 + PRISMA for reviews), (4) evaluate the statistics using confidence intervals, not p-values alone, (5) judge applicability to your setting.

1. Why critical appraisal matters

Reading a paper is not the same as trusting it. Critical appraisal asks whether the study design, conduct, analysis, and reporting are strong enough for the conclusions the authors draw.

Supervisors, journal clubs, and clinical guidelines all depend on this skill. A structured approach stops you from being swayed only by a significant p-value or a confident abstract.

In coursework and theses, examiners look for explicit framework use – not a vague paragraph saying 'the study had some limitations'.

Validity – could the results be explained by bias or chance?
Precision – how uncertain are the estimates?
Applicability – does this study answer your patient or research question?

Tip: Keep your clinical or research question written down before you open the PDF. Appraisal is always question-driven.

2. First pass: title, abstract, and PICO

Start with the abstract, but never stop there. Identify the population, intervention or exposure, comparator, outcomes, and study design (PICO/PECO).

Ask: Is this paper attempting to establish causation, describe association, measure diagnostic accuracy, or synthesise existing studies? The answer determines which appraisal tool you need next.

Check trial registration (ClinicalTrials.gov, ISRCTN) and protocol documents if the paper is a trial or review – compare registered outcomes to published outcomes.
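
The registered-versus-published comparison can be done by hand, but a small script makes the discrepancies explicit. The sketch below is illustrative only: the function name and the outcome lists are hypothetical, and it assumes you have typed the outcome names yourself from the registry entry and the paper, using the same wording for the same outcome.

```python
def compare_outcomes(registered, published):
    """Compare registered outcome names with those reported in the paper."""
    reg = {o.strip().lower() for o in registered}
    pub = {o.strip().lower() for o in published}
    return {
        "missing_from_paper": sorted(reg - pub),   # registered but not reported
        "added_in_paper": sorted(pub - reg),       # reported but never registered
        "reported_as_registered": sorted(reg & pub),
    }


# Hypothetical lists copied by hand from a registry record and a published paper.
registered = ["All-cause mortality", "Hospital readmission", "Quality of life"]
published = ["All-cause mortality", "Length of stay", "Quality of life"]

result = compare_outcomes(registered, published)
print(result["missing_from_paper"])  # ['hospital readmission']
print(result["added_in_paper"])      # ['length of stay']
```

Exact string matching is a deliberate simplification: outcomes that were renamed or measured at a different time point still need a human read.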

Who was studied, and can you generalise to your setting?
What was done or measured?
What outcomes matter for your decision?
Is the design named correctly (RCT, cohort, case–control, cross-sectional, systematic review)?
Who funded the study and are conflicts declared?

3. Match study design to the right framework

Using the wrong checklist is the most common student error. Randomised trials need RoB 2, not tools built for cohort studies. Systematic reviews need AMSTAR 2 and PRISMA – not RoB 2 applied to the review as if it were a trial.

Use our interactive framework picker on the guides hub if you are unsure after reading the methods section.

RCT → RoB 2 + CONSORT reporting
Non-randomised intervention → ROBINS-I + STROBE
Cohort / case–control → NOS or design-specific tools + STROBE
Diagnostic accuracy → QUADAS-2 + STARD
Systematic review / meta-analysis → AMSTAR 2, PRISMA, ROBIS, GRADE
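
The routing above can be written as a simple lookup. This is a minimal sketch: the dictionary keys and the function name `pick_frameworks` are hypothetical, and the mapping only restates the list in this section rather than any official decision rule.

```python
# Design-to-framework routing, restating the list in this section.
FRAMEWORKS = {
    "rct": ["RoB 2", "CONSORT"],
    "non-randomised intervention": ["ROBINS-I", "STROBE"],
    "cohort": ["NOS or design-specific tool", "STROBE"],
    "case-control": ["NOS or design-specific tool", "STROBE"],
    "diagnostic accuracy": ["QUADAS-2", "STARD"],
    "systematic review": ["AMSTAR 2", "PRISMA", "ROBIS", "GRADE"],
}


def pick_frameworks(design):
    """Return the tools listed for a study design, or fail loudly if unmapped."""
    # Normalise case, whitespace, and the en-dash used in 'case–control'.
    key = design.strip().lower().replace("\u2013", "-")
    if key not in FRAMEWORKS:
        raise ValueError(f"No framework mapped for design: {design!r}")
    return FRAMEWORKS[key]


print(pick_frameworks("RCT"))                  # ['RoB 2', 'CONSORT']
print(pick_frameworks("Diagnostic accuracy"))  # ['QUADAS-2', 'STARD']
```

Raising an error for an unmapped design is intentional: a silent default would reproduce the wrong-checklist mistake this section warns about.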

Tip: Read our RoB 2 hub for a full domain walkthrough if the paper is an RCT.

4. Appraise risk of bias domain by domain

Official tools break bias into domains (e.g. randomisation, deviations from intended interventions, missing outcome data). Work through the signalling questions from the official tool rather than relying on gut feeling.

Distinguish risk of bias from reporting quality. A poorly written paper may still be low bias if methods were sound; conversely, polished writing cannot fix fundamental design flaws.

For each domain, note your judgement and one sentence of justification – examiners and journal club audiences expect reasoning, not only a traffic-light colour.

Selection bias – who entered the study and who was analysed?
Performance bias – were groups treated the same apart from the intervention under study (blinding, co-interventions)?
Detection bias – could outcome assessment differ between groups?
Attrition bias – is missing data related to outcome?
Reporting bias – are all prespecified outcomes reported?

Note: Do not conflate 'not reported' with 'not done'. Note unclear reporting as a limitation, but separate it from judged bias where possible.
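
One way to keep a judgement and its justification together for each domain is a small record type. The sketch below is an assumption-laden illustration: the class and function names are hypothetical, the three levels follow RoB 2 wording, and the worst-domain roll-up is a simplification. The official tool guidance has its own rules for the overall judgement, which can be stricter when several domains raise concerns.

```python
from dataclasses import dataclass

LEVELS = ["low", "some concerns", "high"]  # ordered from best to worst


@dataclass
class DomainJudgement:
    domain: str
    judgement: str      # must be one of LEVELS
    justification: str  # one sentence of reasoning, not just a colour

    def __post_init__(self):
        if self.judgement not in LEVELS:
            raise ValueError(f"Unknown judgement: {self.judgement!r}")
        if not self.justification.strip():
            raise ValueError(f"Domain {self.domain!r} needs a justification")


def overall(judgements):
    """Simplified roll-up: the overall rating equals the worst domain rating."""
    return max((j.judgement for j in judgements), key=LEVELS.index)


# Hypothetical appraisal of an imaginary trial.
judgements = [
    DomainJudgement("Randomisation process", "low",
                    "Central computer-generated allocation."),
    DomainJudgement("Deviations from intended interventions", "some concerns",
                    "Open-label design and crossover was not described."),
    DomainJudgement("Missing outcome data", "low",
                    "Under 5% loss to follow-up in both arms."),
]
print(overall(judgements))  # some concerns
```

Rejecting an empty justification enforces the habit described above: every rating carries its reasoning.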

5. Evaluate the statistics (not just the p-value)

Check whether the analysis matches the design: logistic regression for binary outcomes, survival methods for time-to-event data, paired tests only when pairing exists.

Look for effect sizes with confidence intervals, not only p-values. For trials, expect intention-to-treat as the primary analysis; a per-protocol analysis belongs alongside it as a clearly justified secondary analysis.
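
When a paper reports only event counts and a p-value, you can often compute the effect size and its interval yourself. The sketch below uses the standard log-scale (Wald) approximation for a risk ratio; the function name and the trial counts are hypothetical, and the method assumes no zero cells.

```python
import math


def risk_ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    # Standard error of log(RR) for two independent binomial groups.
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical trial: 30/200 events on treatment, 45/200 on control.
rr, lower, upper = risk_ratio_ci(30, 200, 45, 200)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# RR 0.67 (95% CI 0.44 to 1.01)
```

The point estimate suggests a one-third reduction in risk, but the interval reaches 1.01, so the data are also compatible with no effect. That is the information a bare p-value hides.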

For subgroup analyses, ask whether they were pre-specified. Post-hoc fishing without multiplicity adjustment is a red flag.
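
To see what a multiplicity adjustment does to a set of subgroup p-values, you can apply one yourself. The sketch below implements the Holm step-down procedure, which is one common choice and not necessarily the method a given paper used; the function name and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of hypotheses not yet tested, cap at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values from four subgroup analyses.
subgroup_p = [0.01, 0.04, 0.03, 0.20]
print([round(p, 3) for p in holm_adjust(subgroup_p)])
# [0.04, 0.09, 0.09, 0.2]
```

Three of the four raw p-values are below 0.05, but only one survives adjustment. Unadjusted post-hoc subgroup findings deserve that level of scepticism.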

Was the sample size justified a priori?
Are confidence intervals reported for main estimates?
Is multiple testing acknowledged or adjusted where needed?
How was missing data handled?
For reviews: was heterogeneity explored (I², τ², prediction intervals)?
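
For the heterogeneity question, it helps to know where I² and τ² come from. The sketch below computes Cochran's Q, I², and the DerSimonian-Laird estimate of τ² from study effects and their variances. The function name and the three log risk ratios are hypothetical, and DerSimonian-Laird is only one of several τ² estimators a review might use.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, I² (%), and DerSimonian-Laird τ² for study-level effects."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("Need matching effects and variances for 2+ studies")
    weights = [1 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)
    return {"Q": q, "I2": i2, "tau2": tau2}


# Hypothetical log risk ratios from three trials, each with variance 0.04.
stats = heterogeneity([-0.5, -0.1, 0.2], [0.04, 0.04, 0.04])
print(f"Q = {stats['Q']:.2f}, I² = {stats['I2']:.1f}%, τ² = {stats['tau2']:.3f}")
# Q = 6.17, I² = 67.6%, τ² = 0.083
```

I² describes the share of variability beyond chance, while τ² is on the scale of the effect itself, which is why reviews are expected to report more than I² alone.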

Tip: Our statistics guides cover p-values, power, regression, missing data, and meta-analysis in depth.

6. Interpret results in context

Separate statistical significance from clinical or practical importance. A large sample can make trivial differences statistically significant; a small study can reach p < 0.05 and still be underpowered, in which case the estimate is imprecise and may overstate the true effect.
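
A short calculation shows how sample size alone drives the p-value. The sketch below runs a two-sample z-test on the same 1 mmHg difference in blood pressure at two sample sizes. All numbers are hypothetical, including the SD of 15 mmHg and the 5 mmHg threshold for a clinically important difference, and the z-test assumes a known common SD.

```python
import math
from statistics import NormalDist


def two_sample_z(mean_diff, sd, n_per_group, z_crit=1.96):
    """Two-sided p-value and 95% CI for a difference in means (known common SD)."""
    se = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p, (mean_diff - z_crit * se, mean_diff + z_crit * se)


MIN_IMPORTANT_DIFF = 5.0  # hypothetical threshold in mmHg

p_large, ci_large = two_sample_z(1.0, 15.0, 10_000)
p_small, ci_small = two_sample_z(1.0, 15.0, 50)

print(f"n=10000 per arm: p={p_large:.6f}, CI {ci_large[0]:.2f} to {ci_large[1]:.2f}")
print(f"n=50 per arm:    p={p_small:.3f}, CI {ci_small[0]:.2f} to {ci_small[1]:.2f}")
print("Large trial excludes an important effect:", ci_large[1] < MIN_IMPORTANT_DIFF)
```

With 10,000 participants per arm the difference is highly significant, yet the whole interval sits far below the 5 mmHg threshold. With 50 per arm the same difference is not significant, and the interval is too wide to rule an important effect in or out.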

Read the limitations section critically, then add your own – especially generalisability, confounding, and whether outcomes were patient-centred.

Consider whether the discussion overstates causation from observational data or extrapolates beyond the studied population.

7. Summarise for your supervisor or journal club

A good appraisal ends with a plain-language verdict: strengths, main biases, key numbers, and whether you would act on this evidence for your question.

Structured tools encode this workflow: study-type routing, framework-aligned domains, and explicit scoring – so your appraisal is reproducible and auditable.

One-sentence study aim in your own words
Design + framework used
Top 2–3 strengths and top 2–3 concerns
Bottom line for your PICO question
What would change your mind (future evidence)

8. Appraising systematic reviews differently

When the paper is a systematic review, add PRISMA flow reconciliation, AMSTAR 2 quality, risk of bias in included studies, and GRADE certainty. The failure modes are different from single-trial appraisal.
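
PRISMA flow reconciliation is mostly arithmetic: the counts at each stage should add up. The sketch below checks a simplified three-stage flow. The function name, parameter names, and counts are hypothetical, and a full PRISMA 2020 diagram has further boxes (for example reports sought but not retrieved) that you would need to add.

```python
def reconcile_prisma(identified, duplicates_removed, screened,
                     excluded_at_screening, full_text_assessed,
                     excluded_at_full_text, included):
    """Return a list of stages where the reported counts do not add up."""
    problems = []
    if identified - duplicates_removed != screened:
        problems.append(
            f"Screening: {identified} - {duplicates_removed} != {screened}")
    if screened - excluded_at_screening != full_text_assessed:
        problems.append(
            f"Full text: {screened} - {excluded_at_screening} != {full_text_assessed}")
    if full_text_assessed - excluded_at_full_text != included:
        problems.append(
            f"Inclusion: {full_text_assessed} - {excluded_at_full_text} != {included}")
    return problems


# Hypothetical counts read off a review's flow diagram.
problems = reconcile_prisma(
    identified=1200, duplicates_removed=200, screened=1000,
    excluded_at_screening=900, full_text_assessed=100,
    excluded_at_full_text=78, included=25,
)
print(problems)  # ['Inclusion: 100 - 78 != 25']
```

A mismatch is not proof of misconduct, but three unexplained studies at the inclusion stage is exactly the kind of discrepancy to raise in an appraisal.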

See our systematic review methodology guide for the full workflow.

9. AI tools – use with caution

Chat tools can summarise text but often mismatch frameworks and invent checklist items. For coursework, disclose AI use and verify every claim against the PDF.

Framework-aligned appraisal tools route study type automatically – compare AI chat output to structured appraisal on the same paper.

10. Practice with a structured workflow

Pick one paper from your reading list each week. Appraise it with the same template until domain thinking becomes automatic.

Upload the PDF to StrataResearch quick analysis and compare your manual RoB 2 or AMSTAR 2 worksheet to the structured output – disagreement is where learning happens.

Frequently asked questions

What does critical appraisal mean?

Critical appraisal is a structured assessment of whether a study's design, conduct, analysis, and reporting support its conclusions. It asks if results are valid, precise, and applicable to your question — not whether you agree with the authors.

How do you critically appraise a research paper step by step?

Define your PICO first, identify the true study design, then apply the matching checklist (RoB 2, ROBINS-I, QUADAS-2, or AMSTAR 2 + PRISMA). Judge bias domain by domain, check effect sizes with confidence intervals, and finish with applicability to your setting.

Which checklist should I use for critical appraisal?

RCT → RoB 2 (+ CONSORT for reporting). Non-randomised intervention → ROBINS-I. Diagnostic accuracy → QUADAS-2 (+ STARD). Systematic review / meta-analysis → AMSTAR 2, PRISMA 2020, and GRADE for certainty.

What is the difference between internal and external validity?

Internal validity concerns whether the study design and conduct support causal or accurate inference — risk of bias. External validity concerns generalisability — whether findings apply beyond the studied population, setting, and intervention as implemented.
