Strata Academy

Newcastle–Ottawa Scale (NOS) explained – cohort and case–control studies

Selection, comparability, and exposure/outcome domains; star ratings; limits for intervention causal inference

Quick answer

NOS assigns up to nine stars across selection, comparability, and exposure/outcome domains for cohort and case–control studies. Cochrane prefers ROBINS-I for non-randomised intervention questions; NOS remains common in observational meta-analyses and environmental epidemiology.

NOS stars measure selected observational quality features — not proof of causation.
Three domains: selection (4 stars), comparability (2 stars), exposure/outcome (3 stars).
Cohort and case–control versions differ — use the checklist matched to the design.
For intervention causal questions, ROBINS-I is Cochrane's preferred tool over NOS.
Report domain-level stars in meta-analyses; arbitrary ≥7 cut-offs need justification.

1. What is the Newcastle–Ottawa Scale?

NOS is a widely used quality assessment tool for non-randomised cohort and case–control studies, particularly in systematic reviews and meta-analyses of observational data.

It assigns up to nine stars across three domains: selection of study groups, comparability of groups, and ascertainment of exposure (case–control) or outcome (cohort).

NOS was not designed as a risk-of-bias tool for causal intervention inference – it assesses selected observational quality features that correlate imperfectly with validity. A study can earn seven stars yet still be seriously confounded if key prognostic variables were unmeasured.

When appraising a meta-analysis, note whether authors used NOS, ROBINS-I, or an ad hoc checklist – tool choice affects how much to trust pooled estimates. Specialty journals in nutrition, environmental health, and surgery still cite NOS frequently; Cochrane intervention reviews increasingly expect ROBINS-I instead.

For UK medical students, NOS often appears when appraising observational papers in epidemiology modules, public health SSCs, or when reading meta-analyses that pre-date ROBINS-I adoption. Knowing both tools helps you critique older literature without applying the wrong framework to new coursework.

The Ottawa Hospital Research Institute hosts the canonical NOS forms. Some journals modified item wording — when appraising a meta-analysis, use the same NOS variant the review authors applied for fair comparison.

Star inflation: students sometimes award comparability stars because 'multivariable regression' appears in the abstract without reading the covariate list. Always open the statistical methods table before scoring.

Tip: Download the official NOS forms for cohort and case–control studies — they are not interchangeable.

2. Three NOS domains

Work through each domain independently – do not collapse to a single gut-feel star count without domain-level notes.

Selection stars reward representativeness of exposed and non-exposed cohorts, adequate case definitions, and community-based controls in case–control designs. Hospital controls often share exposure risks with cases and can inflate associations.

Comparability stars require control for key confounders in design or analysis. One star is awarded for controlling the most important factor (often age and sex); a second requires control for additional important prognostic factors relevant to the outcome. Check which variables were adjusted, not just that 'multivariable regression' was used.

Outcome or exposure stars reward blinded assessment, independent record linkage, or adequate follow-up duration in cohort studies. In case–control studies, the emphasis shifts to reliable exposure ascertainment — recall bias is a major threat when exposure is self-reported years later.

Selection (max 4 stars) – representativeness, non-exposed selection, exposure ascertainment
Comparability (max 2 stars) – adjustment for key confounders
Outcome/exposure (max 3 stars) – blinded assessment; adequate follow-up (cohort)

Domain	Max stars	Cohort focus	Case–control focus
Selection	4	Representative exposed/unexposed cohorts	Adequate case definition; community controls
Comparability	2	Adjust for key confounders	Same — design or analysis adjustment
Outcome / exposure	3	Blinded outcome; adequate follow-up	Blinded exposure ascertainment; same method for cases and controls; non-response rate
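The domain maxima in the table can be enforced mechanically when recording scores. The following is a minimal Python sketch, not part of the official NOS forms: the `NosScore` class and its field names are hypothetical, chosen only to show how an impossible entry (for example three comparability stars) could be caught at data entry.

```python
from dataclasses import dataclass

# Domain maxima as given in the table above.
NOS_MAX = {"selection": 4, "comparability": 2, "outcome_exposure": 3}


@dataclass(frozen=True)
class NosScore:
    """Domain-level NOS stars for one study (hypothetical container)."""

    selection: int
    comparability: int
    outcome_exposure: int

    def __post_init__(self) -> None:
        # Reject star counts below zero or above the domain maximum.
        for domain, cap in NOS_MAX.items():
            stars = getattr(self, domain)
            if not 0 <= stars <= cap:
                raise ValueError(f"{domain}: {stars} is outside 0 to {cap}")

    @property
    def total(self) -> int:
        # Total out of nine; report it alongside the domain values, not alone.
        return self.selection + self.comparability + self.outcome_exposure


score = NosScore(selection=3, comparability=1, outcome_exposure=2)
print(score.total)  # 6
```

Keeping the three domains as separate fields, rather than storing only a total, means the domain breakdown is always available for the evidence table.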

3. Cohort vs case–control variants

NOS has design-specific prompts. Case–control studies emphasise adequate case definition, community controls rather than hospital controls, and reliable exposure ascertainment.

Cohort studies emphasise that the outcome was not present at start, follow-up duration was long enough for the disease biology, and loss to follow-up was acceptable. A cohort with 40% loss to follow-up rarely deserves full follow-up stars unless authors demonstrate loss was unrelated to outcome.

Read the version of NOS your review team or journal specifies – wording differs slightly between adaptations used in different meta-analysis traditions. The Ottawa Hospital Research Institute hosts the canonical forms.

Case–control designs are efficient for rare outcomes but prone to recall and selection bias – star ratings should reflect those threats. Nested case–control studies within prospective cohorts often score better on exposure measurement than population case–control studies relying on recall.

Cohort: outcome-free at baseline; follow-up long enough for disease biology
Case–control: cases defined consistently; controls from same underlying population
Both: comparability requires meaningful confounder control, not token adjustment
Record which NOS variant you used in your dissertation appendix

Note: Applying the cohort NOS form to a case–control paper (or vice versa) invalidates your appraisal.

4. NOS vs ROBINS-I

For causal questions about interventions in non-randomised data, Cochrane increasingly favours ROBINS-I because it targets intervention bias domains explicitly with signalling questions.

NOS remains common in historical meta-analyses, environmental epidemiology, and some specialty journals. When appraising a review, note which tool was used and whether it matched the causal question.

High NOS stars do not prove causation – only that common observational quality items were partially addressed. ROBINS-I asks directly about confounding, selection into the study, deviations from intended interventions, and selective reporting — threats NOS captures only indirectly.

In coursework, if the question is 'does treatment X cause outcome Y?', prefer ROBINS-I; if describing prognostic cohort quality in a non-intervention context (e.g. biomarker prediction), NOS may be acceptable. When in doubt, ask your supervisor which tool the module expects.

Feature	NOS	ROBINS-I
Primary use	Observational quality in meta-analyses	Risk of bias in non-randomised intervention studies
Output	Star count per domain (0–9 total)	Domain judgements: low to critical risk
Confounding	Partially via comparability stars	Dedicated domain with signalling questions
Cochrane intervention reviews	Legacy / field-dependent	Preferred for observational interventions
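The tool-selection guidance in this article can be written down as a small decision function. This is a sketch under stated assumptions: the function name, the design labels ("rct", "cohort", "case-control"), and the returned strings are all hypothetical, and the rule simply mirrors the advice here (randomised trials go to RoB 2, intervention questions in observational data go to ROBINS-I, otherwise the design-matched NOS form). Supervisor or journal guidance takes precedence over any such shortcut.

```python
def choose_appraisal_tool(design: str, intervention_question: bool) -> str:
    """Suggest an appraisal tool following the guidance in this article.

    `design` labels are illustrative assumptions, not a standard vocabulary.
    """
    design = design.strip().lower()
    if design == "rct":
        # Randomised trials are outside the scope of NOS.
        return "RoB 2"
    if design not in {"cohort", "case-control"}:
        # NOS only covers cohort and case-control designs.
        raise ValueError(f"NOS is not applicable to design: {design!r}")
    if intervention_question:
        # Causal intervention question in non-randomised data.
        return "ROBINS-I"
    # Prognostic or other non-intervention context: use the matching form.
    return f"NOS ({design} form)"


print(choose_appraisal_tool("cohort", intervention_question=False))
# NOS (cohort form)
```

Returning the form name together with the tool keeps the cohort and case–control variants from being swapped by accident.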

5. Interpreting star counts

Reviews sometimes dichotomise ≥7 stars as 'high quality' – thresholds are arbitrary and field-specific. The original NOS authors did not endorse a universal cut-off.

Report stars per domain and justify cut-points if you use them in meta-analysis sensitivity analyses. Examiners and reviewers increasingly reject blanket '≥7 stars included' statements without domain detail.

A study can score well on selection but poorly on comparability if key confounders were unmeasured – domain detail matters more than total stars. Always note which domain drove a low total.

Do not exclude studies from narrative synthesis solely on NOS thresholds without examining whether exclusion changes the direction of your conclusions. Sensitivity analyses that include and exclude lower-star studies strengthen your discussion.

Present domain stars, not only total (e.g. 4 + 1 + 2 = 7)
Justify any dichotomisation used for subgroup or sensitivity analysis
Compare NOS with study-level ROBINS-I when both are reported — they may disagree
Low stars in comparability often matter most for causal interpretation
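To illustrate why the total alone is uninformative, the sketch below formats a domain-level report line and names the weakest domain. It is an illustration only: the function names are hypothetical, and "weakest" is defined here, as an assumption, as the domain with the lowest share of its maximum stars (ties go to the first domain listed).

```python
# Domain maxima: selection 4, comparability 2, outcome/exposure 3.
NOS_MAX = {"selection": 4, "comparability": 2, "outcome_exposure": 3}


def weakest_domain(stars: dict[str, int]) -> str:
    """Return the domain with the lowest share of its maximum stars."""
    return min(NOS_MAX, key=lambda d: stars[d] / NOS_MAX[d])


def report_line(study: str, stars: dict[str, int]) -> str:
    """Format stars as 'selection + comparability + outcome = total'."""
    parts = " + ".join(str(stars[d]) for d in NOS_MAX)
    total = sum(stars[d] for d in NOS_MAX)
    return f"{study}: {parts} = {total} (weakest domain: {weakest_domain(stars)})"


# Two hypothetical studies with the same total but different weak points.
print(report_line("Study A", {"selection": 4, "comparability": 1, "outcome_exposure": 2}))
# Study A: 4 + 1 + 2 = 7 (weakest domain: comparability)
print(report_line("Study B", {"selection": 3, "comparability": 2, "outcome_exposure": 2}))
# Study B: 3 + 2 + 2 = 7 (weakest domain: outcome_exposure)
```

Both studies would pass a '≥7 stars' filter, yet only Study A has the comparability weakness that matters most for causal interpretation.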

6. Worked example – observational cohort

Apply NOS domain by domain to a published cohort study. A Nurses' Health Study paper is a useful example: it illustrates long follow-up, repeated exposure measurement, and multivariable adjustment, but still requires careful confounding judgement.

When scoring, quote evidence from the paper for each star awarded or withheld. 'Selection: 3/4 — exposed cohort drawn from registered nurses, reasonably representative of US female health professionals but not general population' is the level of detail examiners expect.
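One way to keep evidence attached to every star decision is to record it in a small structure and refuse to produce a domain note if any item lacks evidence. The sketch below is hypothetical throughout: `StarDecision` and `domain_note` are invented names, the bracketed strings are placeholders to be replaced with real quotations and page references, and the pattern of awarded and withheld stars is illustrative rather than a verdict on any particular paper.

```python
from dataclasses import dataclass


@dataclass
class StarDecision:
    item: str       # NOS item being scored
    awarded: bool   # True if the star is given
    evidence: str   # quotation or paraphrase from the paper
    location: str   # where the evidence sits, e.g. a page or section


def domain_note(domain: str, max_stars: int, decisions: list[StarDecision]) -> str:
    """Build a domain note, insisting that every decision cites evidence."""
    missing = [d.item for d in decisions if not d.evidence.strip()]
    if missing:
        raise ValueError(f"No evidence recorded for: {', '.join(missing)}")
    stars = sum(d.awarded for d in decisions)
    if stars > max_stars:
        raise ValueError(f"{domain}: {stars} stars exceeds maximum {max_stars}")
    lines = [f"{domain}: {stars}/{max_stars}"]
    for d in decisions:
        mark = "*" if d.awarded else "-"
        lines.append(f"  [{mark}] {d.item}: {d.evidence} ({d.location})")
    return "\n".join(lines)


# Placeholder entries for the selection domain of a cohort study.
selection = [
    StarDecision("Representativeness of exposed cohort", False,
                 "exposed cohort drawn from registered nurses", "[page]"),
    StarDecision("Selection of non-exposed cohort", True, "[quote]", "[page]"),
    StarDecision("Ascertainment of exposure", True, "[quote]", "[page]"),
    StarDecision("Outcome not present at start", True, "[quote]", "[page]"),
]
print(domain_note("Selection", 4, selection))
```

The same structure works for withheld stars, which need evidence just as much as awarded ones.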

7. NOS in observational meta-analysis

When NOS appears in a systematic review, authors often present mean or median stars, subgroup analyses by quality, or exclusion of low-star studies. Appraise whether those choices were pre-specified in the protocol.

Pooling observational studies with different NOS profiles can obscure heterogeneity driven by confounding control rather than chance. Inspect the forest plot alongside NOS tables.

Publication bias and selective reporting are not NOS domains — a review can include only high-NOS studies yet still miss unpublished null results. Pair NOS tables with funnel plot discussion when meta-analysis is performed.

For student reviews using observational evidence, consider presenting NOS per study in an appendix table even if your primary bias tool is ROBINS-I — it helps readers familiar with older literature.

Sensitivity analyses excluding low comparability studies should report whether pooled effect direction changed — not only whether heterogeneity I² decreased.

Extract NOS domain stars per included study.
Check whether review authors pre-specified quality thresholds.
Run sensitivity analysis excluding low comparability stars if feasible.
Discuss whether star differences explain forest plot heterogeneity.
State limitations of NOS for causal inference in the discussion.
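The sensitivity-analysis step can be sketched in a few lines. Everything here is an assumption made for illustration: the study data are invented, the field names are hypothetical, and fixed-effect inverse-variance pooling of log risk ratios is used only because it is the simplest pooling method to show (a real observational meta-analysis would often use a random-effects model in dedicated software). The point is the comparison of pooled direction with and without low-comparability studies.

```python
import math


def pooled_log_effect(studies: list[dict]) -> float:
    """Fixed-effect inverse-variance pooled log effect (weight = 1 / se^2)."""
    weights = [1.0 / s["se"] ** 2 for s in studies]
    weighted_sum = sum(w * s["log_rr"] for w, s in zip(weights, studies))
    return weighted_sum / sum(weights)


def comparability_sensitivity(studies: list[dict], min_comparability: int = 2) -> dict:
    """Compare the pooled estimate before and after dropping low-comparability studies."""
    kept = [s for s in studies if s["comparability"] >= min_comparability]
    if not kept:
        raise ValueError("No studies meet the comparability threshold")
    all_est = pooled_log_effect(studies)
    kept_est = pooled_log_effect(kept)
    return {
        "rr_all": math.exp(all_est),
        "rr_restricted": math.exp(kept_est),
        "n_excluded": len(studies) - len(kept),
        # Direction is judged on the log scale: above or below no effect.
        "direction_changed": (all_est > 0) != (kept_est > 0),
    }


# Invented data: two poorly adjusted studies show harm, two well adjusted do not.
studies = [
    {"id": "A", "log_rr": math.log(1.50), "se": 0.10, "comparability": 0},
    {"id": "B", "log_rr": math.log(1.40), "se": 0.15, "comparability": 1},
    {"id": "C", "log_rr": math.log(0.90), "se": 0.20, "comparability": 2},
    {"id": "D", "log_rr": math.log(0.95), "se": 0.25, "comparability": 2},
]
result = comparability_sensitivity(studies)
print(result["n_excluded"], result["direction_changed"])  # 2 True
```

In this invented dataset the pooled risk ratio sits above 1 with all studies and below 1 once the poorly adjusted studies are removed, which is exactly the kind of directional change the discussion section should report.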

8. Item-by-item scoring tips

Selection star 1 (representativeness of exposed cohort): community-based or population register samples score better than single-specialty clinic series. UK Biobank and CPRD cohorts have known selection properties — note them when scoring.

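Outcome stars 2 and 3 (follow-up length and completeness in cohort studies): follow-up must be long enough for the outcome to occur and complete enough that bias from loss is unlikely. Cancer registries with linkage often score well; questionnaire follow-up with >30% loss rarely does. Note that the fourth selection star on the cohort form covers a different point: demonstrating that the outcome was not present at the start of the study.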

Comparability star 1: control for the most important factor in design or analysis, often age and sex. Reporting adjusted ORs without listing the covariates does not earn the star.

Comparability star 2: control for additional important factors. In cardiovascular cohorts, smoking, diabetes, and blood pressure; in cancer, stage and comorbidity. Read the multivariable model table, not the abstract claim of 'fully adjusted'.

Outcome star (cohort): independent blind assessment or record linkage to national mortality/registries (ONS, NHS Digital) supports high ascertainment quality. Self-report of hospitalisation is weaker.

NOS item	Award star when	Withhold when
Representative exposed cohort	Community or population sampling	Convenience clinic series only
Comparability — 2nd star	Additional key confounders adjusted	Only age/sex adjusted for complex exposure
Adequate follow-up (cohort)	Long enough duration; minimal loss	High loss or differential dropout
Blind outcome assessment	Assessor unaware of exposure	Self-report without validation

10. Journal club checklist (NOS)

Confirm the design is cohort or case–control before opening NOS. If the paper randomised treatment, stop and use RoB 2 instead.

Present domain stars on three rows (selection, comparability, outcome/exposure), not only a total out of nine. Explain which domain drove a low total.

If the clinical question is causal intervention, state that ROBINS-I would be preferred in Cochrane reviews — even while scoring NOS for module requirements.

For meta-analysis papers, ask whether authors used NOS thresholds to exclude studies and whether that changed the pooled direction.

When writing coursework, paste the NOS form into your appendix and highlight each star decision in a third column with page references. This format consistently scores well in epidemiology marking schemes at UK medical schools.

Design check before NOS variant selection
Domain stars on slides, not total alone
Name comparability covariates explicitly
Acknowledge ROBINS-I for intervention causation
Question arbitrary quality thresholds in meta-analysis

11. Common mistakes

Students repeat predictable errors when first encountering NOS in journal club or meta-analysis coursework. Avoiding them keeps your appraisal defensible.

Applying NOS to RCTs or diagnostic accuracy studies.
Giving comparability stars without checking which covariates were adjusted and whether they are sufficient.
Ignoring loss to follow-up in cohort studies – especially informative censoring related to outcome.
Treating NOS as interchangeable with ROBINS-I in intervention meta-analyses without supervisor approval.
Awarding full outcome stars because the outcome was 'objective' when ascertainment was still differential.
Using total stars alone in slides without domain breakdown.

12. StrataResearch and NOS

Cohort and case–control manuscripts may receive NOS-aligned secondary appraisal alongside design-appropriate primary frameworks.

Intervention comparisons route primarily to ROBINS-I; NOS context may appear when design is prognostic rather than interventional.

Upload observational papers via quick analysis to see which framework StrataResearch selects before completing manual worksheets. Framework mismatch — scoring an intervention cohort with NOS alone — is flagged when study type suggests ROBINS-I.

When preparing dissertation evidence tables, export domain-level NOS stars alongside ROBINS-I judgements if your review includes both prognostic and interventional observational studies — readers expect transparent tool mapping.

Teaching tip: appraise the same cohort paper with NOS and ROBINS-I side by side once — the contrast clarifies why Cochrane moved toward ROBINS-I for intervention questions.

In environmental and nutritional epidemiology modules, NOS remains the expected tool — know the domain definitions even when ROBINS-I is your default for clinical intervention reviews.

Record total and domain stars in your reference manager notes — you will thank yourself when writing the discussion chapter months later.

Domain-level reporting is non-negotiable in systematic review appendices.

Design detection routes intervention papers to ROBINS-I first
NOS-aligned output for prognostic cohort and case–control designs
Domain-structured feedback supports evidence tables in dissertations

Frequently asked questions

What is the Newcastle–Ottawa Scale?

NOS is a nine-star quality assessment tool for cohort and case–control studies, widely used in observational meta-analyses. Stars are awarded across selection, comparability, and exposure/outcome domains.

What is a good NOS score?

There is no official cut-off. Reviews sometimes use ≥7 stars as 'high quality', but thresholds are arbitrary. Report domain-level stars and justify any dichotomisation used in sensitivity analyses.

Should I use NOS or ROBINS-I?

For non-randomised intervention studies and Cochrane-style causal questions, use ROBINS-I. NOS remains acceptable for prognostic observational meta-analyses and is still common in some specialty fields — follow your supervisor and journal guidance.

Can I use NOS for RCTs?

No. RCTs should be appraised with RoB 2 for risk of bias and CONSORT for reporting. NOS is designed for cohort and case–control observational designs.

Does high NOS mean the study shows causation?

No. NOS reflects whether selected quality features were partially addressed — representativeness, confounder control, and measurement. Residual confounding and selection bias can remain even in high-star studies.
