Strata Academy

How to Appraise a Cohort Study: STROBE, ROBINS-I & NOS

Prospective vs retrospective cohorts, confounding and immortal time bias, framework selection, and worked appraisal checklist

Quick answer

Cohort studies follow exposed and unexposed groups over time. Appraise them with STROBE for reporting completeness and with ROBINS-I (or NOS) for risk of bias, not RoB 2, which is designed for randomised trials. Focus on confounding control, selection at baseline, follow-up completeness, and whether the analysis matches the question (ITT-like vs per-protocol).

1. Recognising cohort design

A cohort study identifies groups by exposure (or risk factor) status and follows them forward in time to compare outcome incidence. Prospective cohorts enrol before outcomes occur; retrospective cohorts use existing records but still compare groups defined by exposure at a baseline time point.

Case-control studies start from outcome status and look backward, so they need a different framework (often the NOS case-control variant, or ROBINS-I when the question is about an intervention). Cross-sectional studies measure exposure and outcome at the same time, which makes them weaker for causal inference.

Students often mislabel a study of 'patients who received treatment X' as an RCT when allocation was by clinician choice. If there is no random assignment, use observational appraisal tools.

2. Which framework to use

STROBE checks whether the paper reports what a cohort study should contain — eligibility, follow-up, numbers at each stage, confounders measured. It does not by itself tell you whether the estimate is trustworthy.

ROBINS-I assesses risk of bias in non-randomised studies of interventions — ideal when comparing treated vs untreated groups or policy changes.
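
As a rough illustration of how domain-level judgements roll up, the sketch below takes the most severe rating across the seven ROBINS-I domains as the overall judgement. The domain names follow the 2016 version of the tool; the function name `overall_robins_i` and the example ratings are hypothetical. It assumes every domain has been rated and does not handle "no information" ratings.

```python
# Sketch: combine ROBINS-I domain judgements into an overall judgement.
# Assumption: the overall rating equals the most severe domain rating.
SEVERITY = ["low", "moderate", "serious", "critical"]

DOMAINS = [
    "confounding",
    "selection of participants",
    "classification of interventions",
    "deviations from intended interventions",
    "missing data",
    "measurement of outcomes",
    "selection of the reported result",
]

def overall_robins_i(ratings):
    """Return the most severe rating found across all seven domains."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"unrated domains: {missing}")
    return max((ratings[d] for d in DOMAINS), key=SEVERITY.index)

# Hypothetical appraisal of a single cohort study.
example = {d: "low" for d in DOMAINS}
example["confounding"] = "serious"
example["missing data"] = "moderate"
print(overall_robins_i(example))  # serious
```

Reviewers may still judge the overall risk to be higher than any single domain when several domains raise concerns, so treat the result as a floor rather than a verdict.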

The Newcastle–Ottawa Scale (NOS) is widely used in reviews for cohort and case-control studies; it is simpler but less granular than ROBINS-I. Keep the levels of appraisal separate: AMSTAR 2 is applied to a systematic review itself, while NOS or ROBINS-I is applied to the cohorts it includes.

3. Confounding and adjustment

Confounding occurs when a third factor is associated with both exposure and outcome and distorts the crude association. Age, severity, comorbidity, and socioeconomic status are frequent confounders in clinical cohorts.

Ask what variables were measured at baseline, what was entered into multivariable models, and whether propensity scores or instrumental variables were used appropriately. Unadjusted estimates from observational data rarely support causal claims.
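
To see why unadjusted estimates can mislead, the sketch below compares a crude risk ratio with a Mantel-Haenszel risk ratio stratified by disease severity. All counts are invented for illustration, and the names `strata`, `crude_rr` and `mantel_haenszel_rr` are hypothetical. In this toy cohort the exposed group is mostly severe cases, so severity confounds the crude comparison.

```python
# Hypothetical cohort, stratified by baseline severity (the confounder).
strata = {
    "mild":   {"exp_events": 20,  "exp_n": 200, "unexp_events": 80, "unexp_n": 800},
    "severe": {"exp_events": 240, "exp_n": 800, "unexp_events": 60, "unexp_n": 200},
}

def crude_rr(strata):
    """Risk ratio ignoring strata: pool all exposed and all unexposed."""
    exp_events = sum(s["exp_events"] for s in strata.values())
    exp_n = sum(s["exp_n"] for s in strata.values())
    unexp_events = sum(s["unexp_events"] for s in strata.values())
    unexp_n = sum(s["unexp_n"] for s in strata.values())
    return (exp_events / exp_n) / (unexp_events / unexp_n)

def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    num = den = 0.0
    for s in strata.values():
        total = s["exp_n"] + s["unexp_n"]
        num += s["exp_events"] * s["unexp_n"] / total
        den += s["unexp_events"] * s["exp_n"] / total
    return num / den

print(f"crude RR: {crude_rr(strata):.2f}")                     # 1.86
print(f"severity-adjusted RR: {mantel_haenszel_rr(strata):.2f}")  # 1.00
```

Within each severity stratum the risk is identical in exposed and unexposed participants, so the apparent 86% excess risk in the crude estimate is produced entirely by the imbalance in severity.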

Residual confounding can remain even after adjustment, so authors should discuss unmeasured confounders (e.g. smoking, frailty) and the likely direction of the resulting bias.

4. Follow-up, attrition, and competing risks

Incomplete follow-up that is related to prognosis causes attrition bias. Compare baseline characteristics of completers with those of participants lost to follow-up. Sensitivity analyses, such as comparing a complete-case analysis with one using inverse probability weighting, strengthen credibility.

Competing risks (death from other causes) matter when the outcome is a non-fatal event in an elderly population. Kaplan–Meier estimates that censor participants at competing events tend to overestimate incidence; a cumulative incidence function or competing-risks regression may be more appropriate.
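
The size of that overestimate can be checked on a toy dataset. The sketch below computes 1 minus the Kaplan-Meier estimate (competing deaths treated as censored) and the Aalen-Johansen cumulative incidence for the same ten hypothetical participants. The data, the status coding (0 = censored, 1 = event of interest, 2 = competing death) and the function names are assumptions made for illustration.

```python
# Hypothetical follow-up data: (time in years, status).
# status: 0 = censored, 1 = event of interest, 2 = competing death
follow_up = [(1, 2), (2, 1), (3, 2), (4, 1), (5, 2),
             (6, 1), (7, 2), (8, 1), (9, 2), (10, 0)]

def one_minus_km(data, cause=1):
    """1 - Kaplan-Meier for `cause`, treating every other status as censoring."""
    surv = 1.0
    for t in sorted({time for time, _ in data}):
        at_risk = sum(1 for time, _ in data if time >= t)
        events = sum(1 for time, s in data if time == t and s == cause)
        surv *= 1 - events / at_risk
    return 1 - surv

def cumulative_incidence(data, cause=1):
    """Aalen-Johansen estimate of the cumulative incidence of `cause`."""
    event_free = 1.0  # probability of being free of any event just before t
    cif = 0.0
    for t in sorted({time for time, _ in data}):
        at_risk = sum(1 for time, _ in data if time >= t)
        d_cause = sum(1 for time, s in data if time == t and s == cause)
        d_any = sum(1 for time, s in data if time == t and s != 0)
        cif += event_free * d_cause / at_risk
        event_free *= 1 - d_any / at_risk
    return cif

print(f"1 - KM: {one_minus_km(follow_up):.2f}")                        # 0.59
print(f"cumulative incidence: {cumulative_incidence(follow_up):.2f}")  # 0.40
```

Four of the ten participants actually experienced the event of interest, which matches the cumulative incidence of 0.40. The Kaplan-Meier approach reports 0.59 because it implicitly assumes that those who died of other causes could still go on to have the event.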

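Time-varying exposures require time-varying analysis. Treating a treatment that starts during follow-up as a fixed baseline exposure biases results, for example by counting the pre-treatment period as treated time (immortal time bias).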
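
A small person-time calculation shows how large this bias can be. In the sketch below, each hypothetical patient is a tuple of (treatment start in years, end of follow-up in years, died). The naive analysis assigns all follow-up of ever-treated patients to the treated group, while the time-varying analysis counts the time before treatment starts as untreated. The data and the function name `rate_ratio` are invented for illustration.

```python
# Hypothetical patients: (treatment_start, end_of_follow_up, died)
# treatment_start is None for patients who were never treated.
patients = [
    (2.0, 5.0, False),
    (1.0, 4.0, True),
    (3.0, 6.0, False),
    (None, 1.0, True),
    (None, 2.0, True),
    (None, 5.0, False),
]

def rate_ratio(patients, time_varying):
    """Mortality rate ratio (treated vs untreated) per person-year."""
    person_time = {"treated": 0.0, "untreated": 0.0}
    deaths = {"treated": 0, "untreated": 0}
    for start, end, died in patients:
        if start is None:
            person_time["untreated"] += end
            group = "untreated"
        elif time_varying:
            # Time before treatment starts is untreated person-time.
            person_time["untreated"] += start
            person_time["treated"] += end - start
            group = "treated"  # deaths here occur after treatment start
        else:
            # Naive: all follow-up counted as treated (immortal time included).
            person_time["treated"] += end
            group = "treated"
        deaths[group] += int(died)
    treated_rate = deaths["treated"] / person_time["treated"]
    untreated_rate = deaths["untreated"] / person_time["untreated"]
    return treated_rate / untreated_rate

print(f"naive rate ratio: {rate_ratio(patients, time_varying=False):.2f}")        # 0.27
print(f"time-varying rate ratio: {rate_ratio(patients, time_varying=True):.2f}")  # 0.78
```

The naive analysis makes treatment look strongly protective because treated patients had to survive long enough to receive it. Reallocating the pre-treatment person-time moves the rate ratio much closer to 1.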

5. Ten-minute student appraisal checklist

Use this sequence in journal club or coursework — same logic as StrataResearch observational routing.

  1. Confirm design: cohort (not case-control or cross-sectional).
  2. Identify exposure, comparator, and primary outcome with time point.
  3. Check STROBE flow: eligible → included → analysed at each stage.
  4. Assess baseline comparability and statistical adjustment.
  5. Apply ROBINS-I or NOS — note confounding and selection domains.
  6. Read limitations for unmeasured confounding and reverse causation.
  7. Judge whether the conclusion matches effect size and precision (CI width).
  8. State applicability: does the cohort match your patient population?
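
For step 7, it can help to recompute the effect size and its precision from the reported counts. The sketch below calculates a risk ratio with an approximate 95% confidence interval on the log scale; the function name `risk_ratio_ci` and both sets of counts are hypothetical, and the method assumes no cell has zero events.

```python
import math

def risk_ratio_ci(exp_events, exp_n, unexp_events, unexp_n, z=1.96):
    """Risk ratio with an approximate 95% CI (log-scale standard error)."""
    rr = (exp_events / exp_n) / (unexp_events / unexp_n)
    se_log = math.sqrt(1 / exp_events - 1 / exp_n
                       + 1 / unexp_events - 1 / unexp_n)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Same risk ratio, very different precision (hypothetical counts).
small = risk_ratio_ci(12, 100, 6, 100)
large = risk_ratio_ci(120, 1000, 60, 1000)
print("small cohort: RR %.2f (95%% CI %.2f to %.2f)" % small)
print("large cohort: RR %.2f (95%% CI %.2f to %.2f)" % large)
```

Both cohorts give a risk ratio of 2.0, but the small cohort's interval crosses 1 while the large cohort's does not. A conclusion of "doubles the risk" would be justified only by the second.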

6. Worked example (abstract-level)
