Strata Academy

Regression Essentials for Critical Appraisal

Linear, logistic, and survival models – what authors should report

Quick answer

Match the regression model to the outcome type, insist on effect sizes with 95% CIs, and scrutinise how confounders were chosen. Adjusted associations are not causal without design support.

Model type must match outcome: continuous → linear; binary → logistic; time-to-event → survival (Cox).
Adjusted coefficients describe association holding other variables constant — not causation unless design supports it.
Odds ratios approximate relative risks when outcomes are rare (<10%); with common outcomes, ORs exaggerate effects.
Confounder selection should be pre-specified from clinical knowledge — stepwise data-driven selection inflates false positives.
Missing data, clustering, and time-varying exposures require appropriate methods — naive models may be invalid.

1. Why regression appears in clinical papers

Regression models estimate associations between predictors and an outcome while adjusting for other variables — typically confounders that would otherwise distort the relationship of interest. They appear in most observational papers and in many secondary analyses of trials.

You will encounter regression in cohort studies ('adjusted hazard ratio for mortality'), case-control studies ('adjusted odds ratio for disease'), cross-sectional surveys ('β coefficient for depression score'), and trial sub-studies ('treatment effect adjusted for baseline severity'). The same statistical machinery underpins all of these — but the estimand and causal interpretation differ by design.

The model type must match the outcome: continuous outcomes → linear regression; binary outcomes → logistic regression; time-to-event outcomes → survival models such as Cox proportional hazards. Mis-specified models — linear regression on bounded scores, logistic regression on correlated cluster data without adjustment — produce misleading coefficients.

Appraisal question one: does the model answer the same question the abstract claims? A paper titled 'Effect of smoking on lung function' that adjusts for age but not occupational exposure may not support its conclusion. A trial sub-analysis that adjusts for post-randomisation variables may bias the treatment effect.

Appraisal question two: is the result an adjusted association or a causal estimate? Randomisation supports causal interpretation for the primary ITT analysis; multivariable adjustment in observational data does not, no matter how many covariates are included.

Regression adjusts for measured confounders — not unmeasured ones
Same software output, different causal meaning by study design
Abstract claims often exceed what adjustment can support
Check whether exposure and outcome timing is coherent

2. Linear regression

Linear regression models the mean of a continuous outcome as a linear function of predictors. Coefficients represent change in the outcome per unit change in the predictor, holding other variables in the model constant.

Units matter enormously in interpretation. 'β = −0.5 per year of age' and 'β = −5 per decade' describe the same effect, yet the numbers differ tenfold. Authors should state units explicitly; if they do not, work them out from the methods and variable definitions before quoting results in coursework.
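
To make the point about units concrete, here is a minimal sketch using made-up blood pressure data. The values, variable names, and the `ols_slope` helper are illustrative assumptions, not taken from any study. It fits the same simple regression twice, once with age in years and once with age in decades.

```python
# Hypothetical data: systolic blood pressure (mmHg) by age.
ages_years = [30, 40, 50, 60, 70]
sbp_mmhg = [118, 123, 128, 133, 138]


def ols_slope(x, y):
    """Slope of the simple least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Same data, two scales for the predictor.
beta_per_year = ols_slope(ages_years, sbp_mmhg)
ages_decades = [age / 10 for age in ages_years]
beta_per_decade = ols_slope(ages_decades, sbp_mmhg)

print(f"{beta_per_year:.2f} mmHg per year of age")
print(f"{beta_per_decade:.2f} mmHg per decade of age")
```

The two slopes differ by exactly a factor of ten although the fitted relationship is identical, which is why a coefficient quoted without units cannot be interpreted.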

Check whether outcomes or predictors were transformed (log, square root, standardised z-scores). Log-transforming a skewed biomarker may be appropriate — but the coefficient then represents change on the log scale, not the original units.
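
As a rough illustration of reading a coefficient on the log scale, the sketch below assumes the outcome was transformed with the natural log and the predictor was left untransformed. Under that assumption, exponentiating the coefficient gives a multiplicative change in the outcome's geometric mean. The function name and the example coefficient are hypothetical.

```python
import math


def percent_change_from_log_beta(beta):
    """Percent change in the geometric mean of the outcome per 1-unit
    increase in the predictor, for a model fitted to ln(outcome)."""
    return (math.exp(beta) - 1.0) * 100.0


# Hypothetical coefficient from a model of ln(biomarker) on a predictor.
beta_log_scale = 0.10
pct = percent_change_from_log_beta(beta_log_scale)
print(f"beta = {beta_log_scale} on the log scale is about {pct:.1f}% per unit")
```

If the authors used log base 10 or log base 2 instead, the back-transformation differs, so check the methods section before converting.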

Assumptions include linearity (effect is constant across predictor values), independent errors, and homoscedasticity (roughly equal residual spread). Authors rarely report formal assumption checks — note the gap and consider whether non-linearity would matter clinically.

Prefer coefficients with 95% confidence intervals and a clear statement of which variables were entered together in the final model. Partial regression coefficients without the full model specification are hard to appraise.

Report coefficients with 95% CIs
Clarify units (e.g. mmHg per year of age)
Distinguish unstandardised from standardised coefficients
Check for influential outliers and whether they were explored

Output	Meaning	Appraisal note
β coefficient	Change in outcome per 1-unit predictor increase	Verify units and scale
R²	Proportion of variance explained	Low R² does not invalidate predictors
Adjusted R²	Penalised for number of predictors	Compare nested models cautiously
95% CI for β	Range compatible with data	Does interval exclude clinically important effects?

3. Logistic regression and odds ratios

Logistic regression models the log-odds of a binary outcome as a linear function of predictors. Exponentiated coefficients are odds ratios (ORs). ORs above 1 indicate higher odds of the outcome; below 1, lower odds.
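
The step from a reported coefficient to an odds ratio can be sketched as below. This assumes a Wald-type 95% interval (coefficient plus or minus 1.96 standard errors on the log-odds scale); the coefficient and standard error are invented for illustration.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Exponentiate a log-odds coefficient and its Wald confidence limits."""
    point = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return point, lower, upper


# Hypothetical logistic regression output: beta = 0.693, SE = 0.250.
or_point, or_lower, or_upper = odds_ratio_with_ci(0.693, 0.250)
print(f"OR {or_point:.2f} (95% CI {or_lower:.2f} to {or_upper:.2f})")
```

Because the interval is built on the log-odds scale, it is asymmetric around the OR once exponentiated. An interval that excludes 1 corresponds to a coefficient interval that excludes 0.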

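Odds ratios approximate relative risks when outcomes are rare (typically <10% in the study population). With common outcomes, ORs lie further from 1 than the corresponding risk ratios and so exaggerate effects if read as relative risks. In a case-control study the proportion of cases is fixed by the sampling design, so judge rarity in the source population rather than in the sample: a case-control study of MI in hospital patients may draw on a population where MI is far more common than 10%. Quote ORs cautiously, or seek risk ratios where the design allows and authors provide them.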
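
A short arithmetic sketch shows how far the two measures drift apart. The risks below are invented so that the risk ratio is 2 in both scenarios; only the baseline frequency of the outcome changes.

```python
def risk_ratio(risk_exposed, risk_unexposed):
    """Ratio of cumulative risks."""
    return risk_exposed / risk_unexposed


def odds_ratio(risk_exposed, risk_unexposed):
    """Ratio of odds, computed from the same two risks."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


# (risk in exposed, risk in unexposed): rare outcome, then common outcome.
scenarios = {"rare": (0.02, 0.01), "common": (0.60, 0.30)}
for label, (p1, p0) in scenarios.items():
    print(f"{label}: RR = {risk_ratio(p1, p0):.2f}, OR = {odds_ratio(p1, p0):.2f}")
```

With these inputs the OR is about 2.02 for the rare outcome and 3.5 for the common one, although the risk ratio is 2 in both cases.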

Distinguish adjusted ORs from unadjusted comparisons. Ask which covariates were included, whether they were prespecified in a protocol, and whether the model was built in one step (enter all confounders) or via stepwise selection.

Stepwise selection (forward, backward, or both) without external validation inflates false positives — common in exploratory observational papers. STROBE and good practice favour pre-specified confounder sets based on causal diagrams or clinical knowledge.

Interaction terms test whether the exposure–outcome association differs by a third variable (e.g. treatment effect differs by sex). Interactions should be pre-specified; post-hoc interaction claims are exploratory and frequently false positives.

OR ≈ RR when outcome is rare
Common outcomes: prefer risk ratio or risk difference if available
Adjusted OR controls for included variables only
Pre-specify confounders; avoid unvalidated stepwise models

Note: High correlation between predictors (multicollinearity) can make individual coefficients unstable — sign reversal and huge standard errors signal this problem.

4. Survival analysis and Cox models

Kaplan–Meier curves describe time-to-event outcomes in the presence of censoring (patients followed until event, loss, or end of study). Log-rank tests compare curves without adjustment. They are useful for an unadjusted comparison but are not a substitute for multivariable models when confounding exists.
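
To show how censoring enters the Kaplan–Meier estimate, here is a minimal hand-rolled sketch on eight hypothetical patients. It is meant for intuition only, not as a replacement for a validated survival library. Censored patients count in the risk set up to their censoring time but never trigger a drop in the curve.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Observed events at exactly time t.
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

The curve steps down only at months 2, 3, 5, 8 and 9. The patients censored at months 3, 8 and 12 shrink the later risk sets, which makes each subsequent event count for a larger drop.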

Cox proportional hazards models estimate hazard ratios (HRs) adjusting for covariates. HRs represent relative instantaneous rates of the event, which is not the same as a risk ratio at a fixed time point. An HR of 2 does not mean double the 5-year risk: even when hazards are proportional, the cumulative risk ratio is closer to 1 than the HR, and the two agree only approximately when the event is rare over follow-up.
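
The gap between a hazard ratio and a risk ratio can be sketched numerically. The example assumes proportional hazards hold exactly, in which case survival in the exposed group equals baseline survival raised to the power of the HR. The baseline risks are invented for illustration.

```python
def risk_ratio_at_time(baseline_risk, hazard_ratio):
    """Cumulative risk ratio at a fixed time, assuming proportional hazards.

    baseline_risk : cumulative risk in the reference group by that time
    hazard_ratio  : constant HR for exposed vs reference
    """
    baseline_survival = 1 - baseline_risk
    exposed_risk = 1 - baseline_survival ** hazard_ratio
    return exposed_risk / baseline_risk


# Hypothetical 5-year baseline risks with the same HR of 2.
for baseline in (0.01, 0.20, 0.50):
    rr = risk_ratio_at_time(baseline, 2.0)
    print(f"baseline risk {baseline:.2f}: HR = 2.00, risk ratio = {rr:.2f}")
```

With a 1% baseline risk the risk ratio is about 1.99, close to the HR. With a 50% baseline risk it falls to 1.5.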

Check censoring assumptions: non-informative censoring means censoring is unrelated to prognosis beyond what is in the model. Informative censoring (e.g. sicker patients drop out) biases HRs.

Proportional hazards means the ratio of hazards is constant over time. Look for a reported check, such as a test based on Schoenfeld residuals or a graphical assessment. If hazards cross (survival curves diverge then converge, or cross outright), a simple Cox model may be invalid, and time-varying coefficients or alternative models may be needed.

Competing risks matter when one event prevents another (death vs cancer recurrence). Standard Cox models treat competing events as censoring, which yields cause-specific hazards; reading these, or one minus the Kaplan–Meier estimate, as the real-world risk of the event overstates it. Look for cause-specific or subdistribution hazard models, with cumulative incidence functions, when competing risks are clinically important.

Immortal time bias and time-varying exposures require specialised models — a naive Cox model that treats post-diagnosis exposure as baseline exposure may be invalid. Landmark analysis or time-varying covariate models address this.

HR ≠ risk ratio at fixed time point
Check proportional hazards assumption
Competing risks: death may censor recurrence inappropriately
Immortal time bias: exposure defined after time zero but analysed as if known at baseline

Tip: When Kaplan–Meier curves cross, inspect whether authors used an appropriate survival model or only reported an unadjusted log-rank p-value.

5. Confounding, colliders, and what adjustment cannot fix

Confounding occurs when a third variable is associated with both exposure and outcome and lies on a non-causal path. Adjustment blocks that path — if the confounder is measured without error. Age, sex, and severity often confound observational treatment comparisons.

Over-adjustment is a real problem. Adjusting for variables on the causal pathway between exposure and outcome (mediators) can block the effect you want to estimate. Adjusting for colliders (common effects of exposure and outcome) can open spurious paths and create bias.
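
Collider bias is easier to believe once simulated. In the sketch below, exposure and outcome are generated independently, so any association is spurious by construction. "Admission" is a hypothetical collider that becomes more likely as either variable increases. All names, the threshold, and the seed are illustrative assumptions.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 20000
exposure = [rng.gauss(0, 1) for _ in range(n)]
outcome = [rng.gauss(0, 1) for _ in range(n)]  # independent of exposure

# Collider: admission depends on both exposure and outcome.
admitted = [(e + o) > 1.0 for e, o in zip(exposure, outcome)]

r_all = pearson_r(exposure, outcome)
r_admitted = pearson_r(
    [e for e, a in zip(exposure, admitted) if a],
    [o for o, a in zip(outcome, admitted) if a],
)
print(f"whole population: r = {r_all:.2f}")
print(f"admitted only:    r = {r_admitted:.2f}")
```

In the whole population the correlation is close to zero. Restricting to admitted patients, which is equivalent to conditioning on the collider, produces a clearly negative association that has no causal basis.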

Unmeasured confounding remains the central limitation of observational regression. E-values and sensitivity analyses quantify how strong an unmeasured confounder would need to be to explain away an observed association — look for these in stronger papers.
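
For orientation, the widely used E-value formula for a risk ratio can be sketched in a few lines. This assumes the estimate is on the risk-ratio scale (ORs and HRs for common outcomes need conversion first), and the function name is hypothetical. Estimates below 1 are inverted before the formula is applied.

```python
import math


def e_value(risk_ratio):
    """E-value for a point estimate on the risk-ratio scale.

    Minimum strength of association, on the risk-ratio scale, that an
    unmeasured confounder would need with both exposure and outcome to
    fully explain away the observed estimate.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))


for rr in (1.2, 2.0, 0.5):
    print(f"RR = {rr}: E-value = {e_value(rr):.2f}")
```

An observed risk ratio of 2 gives an E-value of about 3.41. Whether a confounder of that strength is plausible is a clinical judgement, not a statistical one.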

Propensity scores summarise confounding into a single probability of treatment. They are useful for matching or weighting but do not replace thoughtful confounder selection. Propensity scores do not adjust for unmeasured confounders either.

In randomised trials, primary ITT analysis should not require extensive adjustment for baseline covariates — randomisation balances confounders in expectation. Adjusted analyses may be pre-specified for precision (ANCOVA with baseline) but should not replace unadjusted ITT as primary.

Variable role	Adjust?	Risk if mishandled
Confounder	Yes — if measured	Residual confounding if omitted
Mediator	Usually no for total effect	Over-adjustment attenuates true effect
Collider	No	Collider stratification bias
Instrument	No (use instrumental-variable methods instead)	Can amplify bias from unmeasured confounding

6. Missing data, clustering, and model validity

Missing data must be handled explicitly: complete-case analysis, multiple imputation, inverse probability weighting, or full-information maximum likelihood. Complete-case deletion can bias adjusted estimates if missingness is related to the outcome or exposure; see our missing data guide and RoB 2 Domain 3.

For cluster trials, repeated measures, or hierarchical data (patients within hospitals), standard regression assumes independent observations. Ignoring clustering underestimates standard errors and inflates false positives. Mixed models, generalised estimating equations (GEE), or robust standard errors address this.
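
A back-of-envelope way to see how much clustering matters is the design effect, 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation. The sketch below assumes roughly equal cluster sizes; the numbers are invented for illustration.

```python
def design_effect(mean_cluster_size, icc):
    """Variance inflation from clustering, assuming similar cluster sizes."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n_total, mean_cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(mean_cluster_size, icc)


# Hypothetical study: 1000 patients in 50 hospitals (20 each), ICC = 0.05.
deff = design_effect(20, 0.05)
n_eff = effective_sample_size(1000, 20, 0.05)
print(f"design effect = {deff:.2f}")
print(f"effective sample size = {n_eff:.0f} of 1000")
print(f"naive standard errors too small by a factor of {deff ** 0.5:.2f}")
```

Even a modest ICC of 0.05 nearly halves the effective sample size here, which is why an analysis that ignores clustering produces intervals that are too narrow.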

For logistic and Cox models, the effective sample size is driven by the number of events rather than the number of participants, usually summarised as events per variable (EPV). Rough rules suggest at least 10 events per predictor for stable estimates, though context varies. A logistic model with 30 events and 15 predictors is likely overfit.
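
The EPV check is simple enough to do while reading a results table. The sketch below follows the common convention for logistic models of counting the smaller of events and non-events, and it treats the threshold of 10 as the rough rule of thumb described above rather than a firm cut-off. The non-event count is a made-up assumption, since it depends on the study.

```python
def events_per_variable(n_events, n_non_events, n_parameters):
    """EPV for a binary-outcome model, using the rarer outcome category."""
    if n_parameters <= 0:
        raise ValueError("model must have at least one parameter")
    limiting_count = min(n_events, n_non_events)
    return limiting_count / n_parameters


# 30 events and 15 predictors; 270 non-events is a hypothetical figure.
epv = events_per_variable(n_events=30, n_non_events=270, n_parameters=15)
flag = "likely overfit" if epv < 10 else "within the rough rule of thumb"
print(f"EPV = {epv:.1f}: {flag}")
```

Count parameters rather than variable names: a categorical predictor with four levels uses three parameters, and each spline or interaction term adds more.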

Non-linear effects require explicit modelling — splines, polynomial terms, or categorisation (with caution about cut-point choice). Treating a continuous dose as binary 'high vs low' loses information and can bias effects.

External validation on an independent dataset is rare in clinical papers but distinguishes predictive models from overfit exploratory ones. Machine learning papers should report calibration and discrimination on held-out data — not only training-set performance.

Complete-case analysis: note who was excluded
Clustering: mixed models or robust SEs required
EPV matters for logistic and Cox models
Non-linearity: check dose–response shape

Note: A statistically significant adjusted OR from a model with 20 predictors and 50 events is unlikely to replicate — flag overfitting in appraisal.

7. Appraisal checklist

Use this checklist when reading multivariable results tables, forest plots of adjusted estimates, or StrataResearch statistical feedback on regression analyses. The goal is to judge whether the model supports the authors' conclusions — not to recalculate coefficients.

STROBE reporting guidelines for observational studies specify minimum regression reporting standards: model type, variables entered, how continuous variables were handled, missing data approach, and software. CONSORT extensions apply to regression in trial sub-analyses.

Is the model type appropriate for the outcome and design?
Are effect sizes and 95% CIs reported for all key predictors?
Were confounders pre-specified from clinical/causal reasoning?
Is missing data handled appropriately — not only complete cases?
Was clustering accounted for if data are hierarchical?
Are interactions pre-specified or labelled exploratory?
Do authors distinguish adjusted association from causation?

Is the model type appropriate for the outcome?
Are effect sizes and CIs reported for all key predictors?
Is missing data handled in the model or via imputation?
Are interactions pre-specified or exploratory?
Does the abstract claim match the estimand?

Frequently asked questions

Does adjusting for confounders prove causation?

No. Adjustment removes bias from measured confounders on observed paths. Unmeasured confounding, selection bias, and reverse causation can remain. Causal claims require design support (randomisation, instrumental variables, careful natural experiments) — not merely a longer covariate list.

When should I prefer risk ratio over odds ratio?

When the outcome is common (>10% in the study population), odds ratios diverge from risk ratios and can exaggerate perceived effects. Risk ratios and risk differences are more intuitive for clinicians. Case-control studies inherently estimate odds ratios — interpret accordingly.

What is the difference between HR and RR?

A hazard ratio compares instantaneous event rates over time in Cox models. A risk ratio compares cumulative risks at a specific time point. They coincide approximately when events are rare and proportional hazards hold; they diverge for common events or when hazards are non-proportional.

Is stepwise variable selection acceptable?

Data-driven stepwise selection without external validation is discouraged for confirmatory analyses because it inflates false positives and produces overfit models. Pre-specify confounders from clinical knowledge or causal diagrams. If stepwise was used, treat findings as hypothesis-generating.

How many events do I need per variable in logistic regression?

Rules of thumb suggest at least 10 events per predictor variable for stable estimates, though this is debated. Models with many predictors relative to events produce unreliable coefficients — note EPV when appraising small observational studies.
