Strata Academy

Sample size and statistical power

Why size matters for interpreting negative and positive findings

Quick answer

Power is the probability that a study detects a real effect of a prespecified size. An underpowered study that finds no significant difference is inconclusive: its wide confidence intervals (CIs) are not proof of no difference.

1. Core concepts

Power is the probability of detecting an effect of a given size if it truly exists, usually set at 80% or 90%. Underpowered studies are common in clinical research.

Pre-specified sample size calculations should state the primary outcome, expected effect size, alpha, power, and assumed control event rate or standard deviation.

Effect size in the calculation should be clinically meaningful – detecting a trivial difference with huge n is statistically possible but wasteful.

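A Type I error is a false positive: concluding there is an effect when none exists. Its probability, α, is typically set at 0.05. A Type II error is a false negative: missing an effect that does exist. Its probability is β, and power = 1 − β.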
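To make the link between effect size, standard deviation, alpha and power concrete, here is a minimal sketch of a sample size calculation for a two-sided comparison of two means. It assumes equal group sizes and uses the normal approximation n = 2 × (z for 1 − α/2 + z for power)² × SD² / difference². The function name and the example inputs are illustrative, not taken from any particular trial, and a real protocol would normally rely on validated software.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided comparison of two means.

    delta: smallest clinically meaningful difference to detect
    sd:    assumed standard deviation of the outcome in each arm
    Normal approximation; a t-based calculation gives a slightly larger n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: detect a 5-point difference when the SD is 10.
    print(n_per_group(delta=5, sd=10))              # 80% power
    print(n_per_group(delta=5, sd=10, power=0.90))  # 90% power
```

With these hypothetical inputs the sketch gives 63 per group at 80% power and 85 per group at 90% power. Halving the target difference roughly quadruples the required n, which is why the choice of effect size deserves scrutiny.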

2. What to check in a manuscript

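Was sample size justified before recruitment began? Post-hoc power calculations after a non-significant result are generally discouraged. Power computed from the observed effect is simply a restatement of the p-value, so it adds no information and does not recover a failed study.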
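The following sketch illustrates why post-hoc power adds nothing. It assumes a two-sided z-test and treats the observed effect as if it were the true effect, which is the usual way 'observed power' is computed. Under those assumptions observed power depends only on the p-value. The function name is hypothetical.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Post-hoc 'observed power' of a two-sided z-test, given only its p-value.

    Assumes the observed effect equals the true effect. Requires 0 < p_value <= 1.
    """
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_value / 2)  # |z| statistic implied by the p-value
    z_crit = z.inv_cdf(1 - alpha / 2)   # rejection threshold
    # Probability of landing beyond either critical value if z_obs were the truth.
    return z.cdf(z_obs - z_crit) + z.cdf(-z_obs - z_crit)


if __name__ == "__main__":
    for p in (0.05, 0.20, 0.50):
        print(p, round(observed_power(p), 3))
```

A p-value of exactly 0.05 always corresponds to observed power of about 50%, and any larger p-value gives less. A non-significant result therefore always appears 'underpowered' by this measure, whatever the design.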

For cluster trials, crossover designs, or non-inferiority trials, specialised methods apply. A generic 'n per group' statement may conceal that the calculation was never inflated for clustering (the design effect).
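As an illustration of the clustering inflation, the sketch below applies the standard design effect, 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. It assumes equal cluster sizes. The function names and example values are hypothetical.

```python
from math import ceil


def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1 + (cluster_size - 1) * icc


def inflate_for_clustering(n_individual, cluster_size, icc):
    """Per-arm sample size for a cluster trial, from an individually randomised n."""
    return ceil(n_individual * design_effect(cluster_size, icc))


if __name__ == "__main__":
    # Hypothetical: 63 per arm if individually randomised, clusters of 20, ICC 0.05.
    print(design_effect(20, 0.05))
    print(inflate_for_clustering(63, 20, 0.05))
```

Even a small ICC matters: with clusters of 20 and an ICC of 0.05, the design effect is 1.95, so the required sample nearly doubles.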

If the authors claim a multiplicity adjustment for several primary outcomes, the sample size calculation should use the adjusted (stricter) alpha, not the unadjusted 0.05.
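A rough sketch of how a multiplicity adjustment feeds into sample size follows. It assumes a Bonferroni split of alpha across the primary outcomes, which is only one of several possible adjustments, and a two-sided comparison of means expressed as a standardised effect size d. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_bonferroni(d, n_outcomes, alpha=0.05, power=0.80):
    """Per-arm n for standardised effect d, testing each outcome at alpha / n_outcomes.

    Normal approximation, two-sided test, equal group sizes.
    """
    z = NormalDist()
    alpha_each = alpha / n_outcomes          # Bonferroni-adjusted alpha
    z_alpha = z.inv_cdf(1 - alpha_each / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, n_per_group_bonferroni(d=0.5, n_outcomes=k))
```

Under these assumptions, moving from one primary outcome to two raises the requirement for d = 0.5 from 63 to about 77 per group. A manuscript that reports adjusted p-values but an unadjusted sample size calculation is likely to be underpowered for its own analysis plan.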

Early stopping rules, if used, should be pre-specified with statistical monitoring plans.

3. Interpreting 'negative' trials

A non-significant p-value with wide confidence intervals is inconclusive, not proof of no difference. Ask whether the interval excludes clinically important harm or benefit.
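The check described above can be sketched as a comparison between the confidence interval and a minimal clinically important difference (MCID). This is a simplified illustration that assumes a normally distributed estimate of the difference (treatment minus control) with a known standard error. The function names, labels and example numbers are hypothetical.

```python
from statistics import NormalDist


def diff_ci(estimate, se, level=0.95):
    """Normal-approximation confidence interval for a difference."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se


def interpret(ci, mcid):
    """Classify a CI for (treatment - control) against a clinically important difference."""
    lo, hi = ci
    if lo > 0 or hi < 0:
        return "statistically significant"
    if -mcid < lo and hi < mcid:
        return "non-significant, important effects excluded"
    return "inconclusive: important effect not excluded"


if __name__ == "__main__":
    # Same point estimate, different precision (hypothetical numbers, MCID = 5).
    print(interpret(diff_ci(1.0, se=4.0), mcid=5))  # wide interval
    print(interpret(diff_ci(1.0, se=1.0), mcid=5))  # narrow interval
```

Both hypothetical trials are 'negative' by p-value, but only the second rules out a clinically important benefit or harm.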

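Non-inferiority and equivalence trials use different framing. A non-inferiority trial aims to show the new treatment is not worse than the comparator by more than a prespecified margin. An equivalence trial aims to show the difference lies within a margin in both directions. Neither sets out to show that two treatments are identical.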
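A minimal sketch of the two decision rules, assuming the confidence interval is for the difference (new minus standard) on a scale where higher is better, and that the margin is a positive, prespecified value. The function names and example intervals are hypothetical.

```python
def non_inferior(ci, margin):
    """True if the lower CI bound rules out a deficit larger than the margin."""
    lo, _ = ci
    return lo > -margin


def equivalent(ci, margin):
    """True if the whole CI lies inside (-margin, +margin)."""
    lo, hi = ci
    return -margin < lo and hi < margin


if __name__ == "__main__":
    margin = 2.0
    print(non_inferior((-1.5, 3.0), margin), equivalent((-1.5, 3.0), margin))
    print(non_inferior((-2.5, 1.0), margin), equivalent((-2.5, 1.0), margin))
```

The first interval supports non-inferiority but not equivalence, because its upper bound exceeds the margin. The second supports neither, even though its point estimate is close to zero.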

Observational studies rarely have formal power calculations; imprecision should be discussed via CI width and clinical context.

CASP and RoB 2 both intersect here: was the study large enough to answer the question the authors later claim?

4. Common mistakes

Treating p > 0.05 as 'no effect' without inspecting CIs.

Accepting post-hoc power as reassurance after a null result.

Ignoring dropout when interpreting intention-to-treat (ITT) results: missing outcome data shrink the effective sample size and, with it, power.

Assuming a significant result from an underpowered study is robust – winner's curse can inflate effect sizes.
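The winner's curse can be illustrated with a small simulation. The sketch below assumes a two-arm trial with unit-variance outcomes analysed by a z-test, and draws each trial's estimated difference directly from its sampling distribution. The true effect, group size, trial count and seed are arbitrary, hypothetical choices.

```python
import random
from statistics import NormalDist, mean


def simulate_winners_curse(true_effect=0.2, n_per_group=25, n_trials=5000, seed=1):
    """Return (empirical power, mean estimate among significant trials)."""
    rng = random.Random(seed)
    se = (2 / n_per_group) ** 0.5         # SE of a difference in means when SD = 1
    z_crit = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05
    estimates = [rng.gauss(true_effect, se) for _ in range(n_trials)]
    # Keep only the trials that would have been reported as significant.
    significant = [e for e in estimates if abs(e) / se > z_crit]
    return len(significant) / n_trials, mean(significant)


if __name__ == "__main__":
    power, mean_significant = simulate_winners_curse()
    print(f"power: {power:.2f}")
    print(f"true effect: 0.20, mean significant estimate: {mean_significant:.2f}")
```

With power this low, a trial can only reach significance when its estimate lands far from the true value, so the significant results overstate the effect on average. Raising n_per_group in the sketch shrinks the gap.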
