Strata Academy

Clinical vs Statistical Significance Explained

P-values, confidence intervals, minimally important differences, and absolute effects — when a significant result still does not matter clinically

Quick answer

Statistical significance means that a result at least as extreme as the one observed would be unlikely if there were no true effect (conventionally p < 0.05). Clinical significance means the effect is large enough to matter to patients, judged by absolute effects, minimally important differences, and guideline thresholds. A result can be statistically significant but clinically trivial, or clinically important but imprecise (wide CI).

p < 0.05 does not mean the effect is large or important for patients.
Always pair relative effects with absolute effects (ARR, NNT, events per 1,000).
Confidence interval width shows imprecision — a GRADE downgrade domain.
Pre-specify minimally important differences where guidelines exist.

1. Two different meanings of 'significant'

In statistics, 'significant' usually means the p-value falls below a pre-specified alpha (commonly 0.05): data at least this extreme would be unlikely if the true effect were null and the model were correct. This is a mathematical statement about chance, not about patient benefit.

In clinical practice, 'significant' means the effect is large enough to change management, burden, or outcomes that patients care about. A 2-point reduction on a 100-point scale may be statistically significant with n = 5,000 but clinically meaningless if the minimally important difference is 10 points.

Medical students must use both lenses: statistical inference (is there an effect?) and clinical interpretation (does it matter?). Examiners and journal clubs reward this distinction.

Statistical significance → data unlikely if there were no true effect (hypothesis testing framework)
Clinical significance → meaningful for patients, clinicians, or policy
They can align, diverge, or conflict — always report both perspectives
Null results can be clinically important (ruling out harm or benefit)

2. Limits of p-values alone

P-values depend on sample size. Large trials can detect trivial differences as statistically significant; small trials may miss clinically important effects (low power).
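To see how sample size alone moves the p-value, here is a minimal sketch using a two-sample z-test on a fixed 2-point mean difference. The standard deviation of 20 points, the arm sizes, and the function name `two_sample_z_p` are illustrative assumptions, not values from any particular trial; the 10-point MCID echoes the example in section 1.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p(mean_diff, sd, n_per_arm):
    """Two-sided p-value for a difference in means.

    Assumes equal standard deviations and equal arm sizes (a simplification).
    """
    se = sd * sqrt(2 / n_per_arm)          # standard error of the difference
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


MCID = 10       # assumed minimally important difference, in scale points
MEAN_DIFF = 2   # the same observed effect in every scenario

for n_per_arm in (50, 500, 2500):
    p = two_sample_z_p(MEAN_DIFF, sd=20, n_per_arm=n_per_arm)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per arm = {n_per_arm:>4}: p = {p:.4f} ({verdict}), "
          f"effect {MEAN_DIFF} vs MCID {MCID}")
```

Under these assumptions the p-value drops from roughly 0.62 at 50 per arm to below 0.001 at 2,500 per arm, while the effect stays at 2 points, well under the assumed MCID.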

P-values do not measure effect size. p = 0.001 does not mean a large clinical benefit — only that the observed effect is precise enough to reject the null.

Multiple comparisons inflate false positives without adjustment. Subgroup p-values in post-hoc analyses are hypothesis-generating, not confirmatory.
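The inflation can be made concrete with a short sketch. It assumes the tests are independent, which real subgroup analyses rarely are, so treat the numbers as a rough guide. Bonferroni is shown as one simple adjustment; it is an illustrative choice and other methods exist. Both function names are hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive across k independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha, k):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / k


for k in (1, 5, 10, 20):
    fwer = familywise_error_rate(0.05, k)
    print(f"{k:>2} tests: familywise error rate = {fwer:.2f}, "
          f"Bonferroni threshold = {bonferroni_threshold(0.05, k):.4f}")
```

With ten unadjusted tests at alpha 0.05, the chance of at least one false positive is about 40% under these assumptions.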

Note: Do not write 'highly significant' to mean 'very important clinically'. Use 'statistically significant' for p-values and 'clinically important' only with effect size and context.

3. Confidence intervals bridge statistics and clinical judgement

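A 95% confidence interval shows the range of effects compatible with the data. If the interval spans from trivial benefit to large benefit, the estimate is imprecise even though the result is statistically significant. If it spans from benefit to harm, it includes the null, so the result is both imprecise and not statistically significant.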

For binary outcomes, ask whether the CI for risk difference or NNT includes values that would change practice. For continuous outcomes, compare the CI to the minimally clinically important difference (MCID).

In meta-analysis, wide pooled CIs trigger GRADE imprecision downgrades — statistical and clinical significance frameworks connect here.

Scenario	Statistical picture	Clinical picture
Large n, tiny effect	p < 0.05, narrow CI	Clinically trivial — may not change practice
Small n, moderate effect	p > 0.05, wide CI	Potentially important but imprecise — need more data
CI excludes null and lies entirely beyond MCID	p < 0.05	Statistically and clinically persuasive
CI excludes null but crosses MCID	p < 0.05	Statistically significant, clinically uncertain
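The scenarios in the table can be sketched as a small classifier that compares a confidence interval with the null and the MCID. This is a simplified illustration: it assumes a continuous outcome where positive values mean benefit, a null of zero, and a single agreed MCID. The function name `interpret_ci` and the wording of the labels are hypothetical.

```python
def interpret_ci(lower, upper, mcid):
    """Label a 95% CI for a mean difference against the null (0) and the MCID.

    Assumes positive values indicate benefit and mcid > 0.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if mcid <= 0:
        raise ValueError("mcid must be positive")

    if lower >= mcid:
        return "statistically significant and clinically important"
    if lower > 0 and upper < mcid:
        return "statistically significant but clinically trivial"
    if lower > 0:
        # The interval excludes the null but crosses the MCID.
        return "statistically significant, clinical importance uncertain"
    if upper < 0:
        return "statistically significant in the unfavourable direction"
    if upper >= mcid:
        return "not statistically significant; important benefit not ruled out"
    return "not statistically significant; important benefit unlikely"


print(interpret_ci(1, 3, mcid=10))
print(interpret_ci(-3, 14, mcid=10))
```

The first call describes a precise but small effect. The second describes a small trial whose interval includes both no effect and an effect above the MCID.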

4. Absolute effects: ARR, NNT, and events per 1,000

Relative risk reduction sounds compelling ('50% reduction!') but hides baseline risk. Absolute risk reduction (ARR) and number needed to treat (NNT) translate effects into patient terms.

Example: RR 0.75 with control event rate 4% → intervention event rate 3% → absolute reduction 1 percentage point → NNT 100. Whether NNT 100 is clinically worthwhile depends on treatment cost, harms, and alternatives.

GRADE Summary of Findings tables present absolute effects per 1,000 for this reason — read them before the abstract conclusion.

ARR = control event rate minus intervention event rate
NNT = 1 / ARR (when ARR > 0) — round sensibly and state confidence interval if available
NNH for harms — same logic for adverse events
Baseline risk must match your patient's context; if the trial population differs, note this as indirectness

Tip: When authors report only relative effects, calculate absolute effects using the control group's event rate from the paper — show this working in coursework.
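As a worked version of that calculation, the sketch below derives absolute effects from a control event rate and a relative risk, using the section's example of RR 0.75 and a 4% control event rate. The function name `absolute_effects` and the returned keys are hypothetical, and the NNT is left unrounded so you can round it in line with your own reporting convention.

```python
def absolute_effects(control_event_rate, relative_risk):
    """Convert a relative risk into absolute measures.

    control_event_rate is a proportion (0.04 for 4%), not a percentage.
    """
    if not 0 <= control_event_rate <= 1:
        raise ValueError("control_event_rate must be a proportion between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative_risk must be non-negative")

    intervention_event_rate = control_event_rate * relative_risk
    arr = control_event_rate - intervention_event_rate
    return {
        "intervention_event_rate": intervention_event_rate,
        "rrr": 1 - relative_risk,
        "arr": arr,
        "fewer_events_per_1000": arr * 1000,
        # NNT is only defined here when the intervention reduces risk.
        "nnt": 1 / arr if arr > 0 else None,
    }


result = absolute_effects(control_event_rate=0.04, relative_risk=0.75)
print(f"RRR = {result['rrr']:.0%}")
print(f"ARR = {result['arr']:.1%}")
print(f"Fewer events per 1,000 = {result['fewer_events_per_1000']:.0f}")
print(f"NNT = {result['nnt']:.0f}")
```

The same relative risk applied to a 20% control event rate gives an ARR of 5 percentage points and an NNT of 20, which is why the baseline risk has to be stated alongside any relative effect.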

5. Minimally clinically important differences (MCID)

For continuous outcomes (pain, disability scores, quality of life), field-specific MCIDs define the smallest change patients perceive as beneficial. Compare mean differences to MCID, not only to zero.

MCIDs are context-specific: the same point change on a scale may matter in chronic pain but not in a surrogate laboratory marker.

Anchor-based and distribution-based methods exist for deriving MCIDs — cite established thresholds from guidelines or validation studies when available.

Pain scales — field-specific MCID thresholds (e.g. 10–20 mm on VAS in some conditions)
HbA1c — guideline targets matter more than any p-value
Surrogate endpoints — statistically significant change may not predict patient outcomes
Patient-reported outcomes — prioritise MCID over p-value in shared decisions

6. Clinical vs statistical significance in meta-analysis

A pooled odds ratio whose diamond barely excludes 1 may be statistically significant but clinically weak — especially if absolute event rates are low.
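A short sketch shows why low event rates weaken the clinical impact of a given odds ratio. It converts an odds ratio into an intervention-group risk at an assumed control risk, using the standard conversion between odds and risk. The pooled OR of 0.80 and the two control risks are hypothetical values chosen for illustration, as is the function name.

```python
def risk_from_odds_ratio(control_risk, odds_ratio):
    """Intervention-group risk implied by an odds ratio at a given control risk."""
    if not 0 <= control_risk < 1:
        raise ValueError("control_risk must be a proportion in [0, 1)")
    return odds_ratio * control_risk / (1 - control_risk + odds_ratio * control_risk)


POOLED_OR = 0.80  # hypothetical pooled estimate

for control_risk in (0.01, 0.20):
    treated_risk = risk_from_odds_ratio(control_risk, POOLED_OR)
    fewer_per_1000 = (control_risk - treated_risk) * 1000
    print(f"control risk {control_risk:.0%}: "
          f"about {fewer_per_1000:.0f} fewer events per 1,000")
```

Under these assumptions the same odds ratio prevents about 2 events per 1,000 at a 1% control risk and about 33 per 1,000 at a 20% control risk.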

Heterogeneity complicates interpretation: a significant pooled effect may not apply to all patient subgroups represented in included trials.

Forest plots show statistical precision; clinical importance requires absolute effect translation and GRADE certainty — see our meta-analysis and SoF guides.

7. Appraisal workflow for journal club

Read the primary outcome result. Note p-value and 95% CI. Calculate or locate absolute effect and NNT. Compare continuous outcomes to MCID or guideline threshold. State whether you would change practice for a typical patient in your setting.

Document imprecision: if the CI includes both meaningful benefit and meaningful harm, the trial is inconclusive for practice regardless of p-value.

For systematic reviews, repeat per outcome in the SoF table — do not rely on a single significant secondary endpoint in the abstract.

Identify primary outcome and pre-registration status
Record effect estimate, 95% CI, and p-value
Translate to absolute effect or compare to MCID
Assess precision: would another trial plausibly shift the conclusion?
State clinical bottom line separate from statistical conclusion

Frequently asked questions

Can a result be clinically significant but not statistically significant?

Yes. A moderate mean difference exceeding MCID with a wide confidence interval including zero is potentially clinically important but statistically inconclusive — often due to small sample size. More data may be needed.

Is p < 0.05 always clinically meaningful?

No. With large samples, trivial effects become statistically significant. Always interpret effect size, absolute measures, and clinical context — not the p-value alone.

What is the difference between ARR and RRR?

Relative risk reduction (RRR) is the proportional reduction in risk. Absolute risk reduction (ARR) is the actual difference in event rates between groups. ARR drives NNT and patient-facing decisions.

How does GRADE relate to clinical significance?

The GRADE imprecision domain downgrades certainty when confidence intervals are wide or include values on both sides of clinical thresholds. A highly statistically significant result with a wide CI spanning trivial and large effects may still warrant low certainty.

Should I report NNT in my dissertation?

Yes, when you discuss binary primary outcomes from trials or meta-analyses. Show the calculation from the control event rate and relative effect. Discuss whether the NNT is acceptable given harms and alternatives.
