Strata Academy

How to Appraise a Cohort Study: STROBE, ROBINS-I & NOS

Prospective vs retrospective cohorts, confounding and immortal time bias, framework selection, and worked appraisal checklist

Quick answer

Cohort studies follow exposed and unexposed groups over time. Appraise them with STROBE for reporting completeness and ROBINS-I (or NOS) for risk of bias, not RoB 2. Focus on confounding control, selection at baseline, follow-up completeness, and whether the analysis matches the question (an ITT-like vs a per-protocol effect).

RoB 2 is for randomised trials; cohort studies need ROBINS-I or NOS.
Check baseline comparability and what was adjusted in multivariable models.
Immortal time bias and selection at enrolment are common fatal flaws.
Complete reporting (STROBE) ≠ low risk of bias.

1. Recognising cohort design

A cohort study identifies groups by exposure (or risk factor) status and follows them forward in time to compare outcome incidence. Prospective cohorts enrol before outcomes occur; retrospective cohorts use existing records but still compare groups defined by exposure at a baseline time point.

Case-control studies start from outcome status and look backward, so they need a different framework (often the NOS case-control variant, or ROBINS-I when the question is about an intervention). Cross-sectional studies measure exposure and outcome at the same time, which makes them weaker for causal inference.

Students often mislabel a study of 'patients who received treatment X' as an RCT when allocation was by clinician choice. If there is no random assignment, use observational appraisal tools.

Prospective cohort — exposure defined at enrolment, follow-up planned
Retrospective cohort — exposure and outcomes from records; watch for misclassification
Historical controls — high risk of selection and confounding
Registry-linked cohorts — strong for follow-up if linkage is complete

2. Which framework to use

STROBE checks whether the paper reports what a cohort study should contain — eligibility, follow-up, numbers at each stage, confounders measured. It does not by itself tell you whether the estimate is trustworthy.

ROBINS-I assesses risk of bias in non-randomised studies of interventions — ideal when comparing treated vs untreated groups or policy changes.

Newcastle–Ottawa Scale (NOS) is widely used in reviews for cohort and case-control studies; it is simpler but less granular than ROBINS-I. The tools work at different levels: AMSTAR 2 appraises a systematic review itself, while NOS or ROBINS-I appraises the cohorts it includes.

Question type	Reporting	Risk of bias
Cohort of intervention vs usual care	STROBE	ROBINS-I
Cohort of prognostic factor	STROBE	QUIPS or adapted checklists
Cohort in systematic review	STROBE + PRISMA	ROBINS-I or NOS
Randomised trial mislabelled as cohort	CONSORT	Reclassify — may be quasi-RCT

3. Confounding and adjustment

Confounding occurs when a third factor is associated with both exposure and outcome and distorts the crude association. Age, severity, comorbidity, and socioeconomic status are frequent confounders in clinical cohorts.

Ask what variables were measured at baseline, what was entered into multivariable models, and whether propensity scores or instrumental variables were used appropriately. Unadjusted estimates from observational data rarely support causal claims.
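As a rough illustration of why crude and adjusted estimates can diverge, the sketch below compares a crude risk ratio with a Mantel–Haenszel risk ratio stratified by baseline severity. The counts are invented for illustration, and stratifying on a single variable stands in for the multivariable models a real paper would use.

```python
# Hypothetical cohort, stratified by baseline severity (all counts invented).
strata = {
    "mild":   {"exp_events": 20,  "exp_n": 200, "unexp_events": 80, "unexp_n": 800},
    "severe": {"exp_events": 320, "exp_n": 800, "unexp_events": 80, "unexp_n": 200},
}


def crude_rr(strata):
    """Risk ratio ignoring severity: pool all strata into one 2x2 table."""
    exp_events = sum(s["exp_events"] for s in strata.values())
    exp_n = sum(s["exp_n"] for s in strata.values())
    unexp_events = sum(s["unexp_events"] for s in strata.values())
    unexp_n = sum(s["unexp_n"] for s in strata.values())
    return (exp_events / exp_n) / (unexp_events / unexp_n)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    numerator = 0.0
    denominator = 0.0
    for s in strata.values():
        total = s["exp_n"] + s["unexp_n"]
        numerator += s["exp_events"] * s["unexp_n"] / total
        denominator += s["unexp_events"] * s["exp_n"] / total
    return numerator / denominator


crude = crude_rr(strata)
adjusted = mantel_haenszel_rr(strata)
print(f"Crude RR: {crude:.2f}")
print(f"Severity-adjusted RR: {adjusted:.2f}")
```

In this made-up cohort the exposed group consists mostly of severe patients, so the crude risk ratio is about 2.1 while the stratified estimate is 1.0. The entire crude association is produced by severity.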

Residual confounding remains even after adjustment — authors should discuss unmeasured confounders (e.g. smoking, frailty) and their likely direction.
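One way to put a number on residual confounding is the E-value: the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both exposure and outcome to fully explain away the observed estimate. The sketch below covers the point-estimate formula only, E = RR + sqrt(RR × (RR − 1)) for RR ≥ 1, taking the reciprocal first when RR < 1. The function name is an illustrative choice.

```python
import math


def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (point estimate only)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective estimates: work on the reciprocal
    return rr + math.sqrt(rr * (rr - 1))


for observed in (1.2, 2.0, 0.5):
    print(f"RR {observed}: E-value {e_value(observed):.2f}")
```

A small E-value means a modest unmeasured confounder could account for the finding; a large one means only a strong confounder could. It does not show that such a confounder is absent.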

Crude vs adjusted estimates — large changes suggest confounding
E-value or sensitivity analyses — stronger than assertion alone
Negative controls and falsification tests — advanced but informative when present
Immortal time bias — follow-up before treatment starts, when the outcome cannot occur in the 'treated' group by definition, is wrongly counted as exposed time (common in database studies)

Note: Immortal time bias inflates apparent benefit when treatment initiation is defined after a delay during which early deaths cannot occur in the 'treated' group. Read database cohort methods carefully.
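A small sketch of the mechanism, using invented patients: each record gives the month treatment started (or None if never treated), the month follow-up ended, and whether the patient died. The naive analysis classifies ever-treated patients as treated from month zero; the time-dependent analysis counts their pre-treatment months as untreated person-time. Both the data and the function name are hypothetical.

```python
# (treatment_start_month or None, follow_up_end_month, died)
patients = [
    (None, 3, True),
    (None, 5, True),
    (None, 12, False),
    (None, 12, False),
    (6, 12, False),
    (6, 12, False),
    (6, 12, False),
    (6, 12, True),
]


def mortality_rate_ratio(patients, time_dependent):
    """Treated vs untreated mortality rate ratio from person-time."""
    person_time = {"treated": 0.0, "untreated": 0.0}
    deaths = {"treated": 0, "untreated": 0}
    for start, end, died in patients:
        if start is None:
            person_time["untreated"] += end
            deaths["untreated"] += died
        elif time_dependent:
            # Months before treatment started belong to the untreated group.
            person_time["untreated"] += start
            person_time["treated"] += end - start
            deaths["treated"] += died
        else:
            # Naive: the immortal pre-treatment months are credited to treatment.
            person_time["treated"] += end
            deaths["treated"] += died
    treated_rate = deaths["treated"] / person_time["treated"]
    untreated_rate = deaths["untreated"] / person_time["untreated"]
    return treated_rate / untreated_rate


naive = mortality_rate_ratio(patients, time_dependent=False)
corrected = mortality_rate_ratio(patients, time_dependent=True)
print(f"Naive rate ratio: {naive:.2f}")
print(f"Time-dependent rate ratio: {corrected:.2f}")
```

With these invented numbers the naive rate ratio is about 0.33, an apparent two-thirds reduction in mortality, while the time-dependent ratio is about 1.17. The apparent benefit comes entirely from the 24 person-months during which the treated patients could not, by definition, have died.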

4. Follow-up, attrition, and competing risks

Incomplete follow-up related to prognosis causes attrition bias. Compare baseline characteristics of completers vs lost to follow-up. Sensitivity analyses restricted to complete cases or using inverse probability weighting strengthen credibility.
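A minimal sketch of how inverse probability weighting can correct for prognosis-related dropout, assuming dropout depends only on a measured baseline variable (here, severity). All counts are invented, and a real analysis would estimate completion probabilities from a model rather than from two strata.

```python
# stratum -> (enrolled, completed follow-up, events among completers)
cohort = {
    "mild": (100, 90, 9),
    "severe": (100, 50, 20),
}


def complete_case_risk(cohort):
    """Outcome risk among completers only, ignoring who dropped out."""
    events = sum(e for _, _, e in cohort.values())
    completers = sum(c for _, c, _ in cohort.values())
    return events / completers


def ipw_risk(cohort):
    """Outcome risk with completers weighted by 1 / P(complete | stratum)."""
    weighted_events = 0.0
    weighted_n = 0.0
    for enrolled, completed, events in cohort.values():
        weight = enrolled / completed
        weighted_events += weight * events
        weighted_n += weight * completed
    return weighted_events / weighted_n


cc_risk = complete_case_risk(cohort)
weighted_risk = ipw_risk(cohort)
print(f"Complete-case risk: {cc_risk:.3f}")
print(f"IPW risk: {weighted_risk:.3f}")
```

Because half of the severe patients are lost, the complete-case risk (about 0.21) under-represents them; weighting restores the enrolled case mix and gives 0.25. The correction only works to the extent that dropout is explained by the measured variables.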

Competing risks (death from other causes) matter when the outcome is non-fatal events in elderly populations. Kaplan–Meier censoring at competing events may overestimate incidence — cumulative incidence or competing-risks regression may be more appropriate.
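To make the overestimation concrete, the sketch below computes both quantities on a tiny invented dataset: one minus the Kaplan–Meier estimate with competing deaths treated as censored, and the cumulative incidence function using the Aalen–Johansen approach. Status codes (0 = censored, 1 = event of interest, 2 = competing event) and the function name are illustrative assumptions.

```python
# (time, status): 0 = censored, 1 = event of interest, 2 = competing event
records = [
    (1, 2), (2, 1), (3, 2), (4, 1), (5, 2),
    (6, 1), (7, 0), (8, 0), (9, 0), (10, 0),
]


def incidence_estimates(records):
    """Return (1 - KM censoring competing events, cumulative incidence)."""
    event_times = sorted({t for t, status in records if status != 0})
    km_survival = 1.0       # event-of-interest-free survival, competing events censored
    overall_survival = 1.0  # free of any event, just before the current time
    cumulative_incidence = 0.0
    for t in event_times:
        at_risk = sum(1 for time, _ in records if time >= t)
        d_interest = sum(1 for time, s in records if time == t and s == 1)
        d_competing = sum(1 for time, s in records if time == t and s == 2)
        # Only people still free of every event can have the event of interest.
        cumulative_incidence += overall_survival * d_interest / at_risk
        overall_survival *= 1 - (d_interest + d_competing) / at_risk
        km_survival *= 1 - d_interest / at_risk
    return 1 - km_survival, cumulative_incidence


naive_incidence, cif = incidence_estimates(records)
print(f"1 - KM (competing events censored): {naive_incidence:.3f}")
print(f"Cumulative incidence: {cif:.3f}")
```

Three of the ten invented patients have the event of interest and nobody is censored before the last event, so the cumulative incidence is 0.30, matching the observed proportion. The Kaplan–Meier approach gives about 0.39 because it treats patients who died of other causes as if they could still go on to have the event.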

Time-varying exposures require time-varying analysis — treating time-varying treatment as fixed baseline exposure biases results.

5. Ten-minute student appraisal checklist

Use this sequence in journal club or coursework — same logic as StrataResearch observational routing.

Confirm design: cohort (not case-control or cross-sectional).
Identify exposure, comparator, and primary outcome with time point.
Check STROBE flow: eligible → included → analysed at each stage.
Assess baseline comparability and statistical adjustment.
Apply ROBINS-I or NOS — note confounding and selection domains.
Read limitations for unmeasured confounding and reverse causation.
Judge whether the conclusion matches effect size and precision (CI width).
State applicability: does the cohort match your patient population?
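For the effect size and precision step, it can help to recompute the risk ratio and its confidence interval from the reported counts. A minimal sketch using the standard log-scale (Wald) interval; the counts are invented and the function name is an illustrative choice.

```python
import math


def risk_ratio_ci(exp_events, exp_n, unexp_events, unexp_n, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    rr = (exp_events / exp_n) / (unexp_events / unexp_n)
    se_log_rr = math.sqrt(
        1 / exp_events - 1 / exp_n + 1 / unexp_events - 1 / unexp_n
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30/200 events in the exposed, 15/200 in the unexposed.
rr, lower, upper = risk_ratio_ci(30, 200, 15, 200)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Here the point estimate is a doubling of risk, but the interval runs from roughly 1.1 to 3.6, so a conclusion of 'doubles the risk' would overstate what the data support. This interval reflects sampling error only and says nothing about confounding or other bias.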

6. Worked example (abstract-level)

Frequently asked questions

Can I use RoB 2 for a cohort study?

No. RoB 2 is for randomised trials. Use ROBINS-I for intervention cohorts or NOS for broader observational appraisal in reviews.

What is the difference between STROBE and ROBINS-I?

STROBE assesses reporting transparency. ROBINS-I assesses risk of bias in the effect estimate. A well-reported cohort can still be high risk of bias if confounding was poorly handled.

When is a cohort study 'good enough' for clinical practice?

High-quality cohorts with consistent findings across studies, plausible mechanisms, and a formal GRADE assessment may support low or moderate certainty evidence, but rarely evidence as strong as well-conducted RCTs for causal treatment claims.
