Strata Academy

Cochrane heterogeneity explained – I², τ², and prediction intervals

Interpreting statistical heterogeneity in meta-analysis, when pooling is misleading, and GRADE inconsistency downgrades

Quick answer

Heterogeneity means study results differ more than chance alone would predict. Use I², τ², and prediction intervals together – never I² alone – and investigate clinical differences before trusting a pooled effect.

1. What is heterogeneity?

In meta-analysis, heterogeneity is variability among study results beyond what would be expected from sampling error alone. It asks a simple question: are these studies estimating the same underlying effect?

The Cochrane Handbook treats heterogeneity investigation as mandatory before interpreting a pooled estimate or claiming subgroup effects. Ignoring heterogeneity is one of the most common errors in student meta-analysis coursework.

Heterogeneity can arise from true clinical differences (populations, doses, settings), methodological differences (design, bias), or chance in small reviews. Your job as appraiser is to decide which explanation fits.

A statistically significant pooled effect can still be misleading when heterogeneity is high and unexplained – the mean may not represent any single patient population.

2. Clinical vs statistical heterogeneity

Clinical heterogeneity describes differences in PICO elements: populations, interventions, comparators, outcomes, timing, and setting. Statistical heterogeneity is the mathematical expression of variation in effect estimates after accounting for chance.

High statistical heterogeneity often signals clinical heterogeneity, measurement differences, or differential bias – but I² alone does not tell you which. You must return to the study table.

Two reviews can show similar I² with opposite clinical implications: one pools different doses of the same drug (may be fixable), another pools different interventions entirely (often not poolable).

When appraising a published review, check whether authors defined a priori which clinical differences would be acceptable for pooling.

3. I², τ², and Q statistic

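Cochran's Q tests the null hypothesis that all studies share one underlying effect (that is, that between-study variance is zero). With few studies, Q has low power – a non-significant Q is not proof of homogeneity. With many studies it can be over-sensitive, flagging differences too small to matter clinically.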

I² estimates the proportion of total variability due to heterogeneity rather than chance, expressed as 0–100%. It is widely reported but frequently misinterpreted in isolation.

τ² (tau-squared) estimates the between-study variance on the effect scale. It feeds random-effects weights and prediction intervals. Report τ² alongside I² when possible.
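As a rough illustration of how the three statistics relate, the sketch below computes Cochran's Q, I², and a DerSimonian-Laird estimate of τ² from study effects and their within-study variances. DerSimonian-Laird is only one of several τ² estimators and is assumed here for simplicity; the function name and the four log odds ratios are hypothetical, not data from any review.

```python
def heterogeneity_stats(effects, variances):
    """Return Cochran's Q, its degrees of freedom, I² (%), and DL tau²."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Inverse-variance (fixed-effect) weights
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Q: weighted squared deviations from the fixed-effect pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I²: share of Q in excess of its expected value under homogeneity (df)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird tau²: excess Q rescaled to the effect scale
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    return {"q": q, "df": df, "i2": i2, "tau2": tau2}


# Hypothetical log odds ratios, each with within-study variance 0.01
stats = heterogeneity_stats([0.0, 0.2, 0.4, 0.6], [0.01, 0.01, 0.01, 0.01])
print(stats)
```

For these made-up inputs Q is about 20 on 3 degrees of freedom, I² is about 85%, and τ² is about 0.057 on the log odds ratio scale. Note that I² is a percentage while τ² is in the units of the effect measure, which is why the two are worth reporting together.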

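Historical cut-offs (25%, 50%, 75% for low, moderate, high I²) are rough guides only, and the Cochrane Handbook's own bands deliberately overlap (0–40% might not be important, 30–60% moderate, 50–90% substantial, 75–100% considerable). Context – number of studies, direction of effects, clinical similarity – matters more than a threshold.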
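One reason thresholds mislead is that I² is a relative measure: it depends on how precise the studies are, not only on how far apart their results sit. The sketch below, using hypothetical point estimates and variances, holds the effect estimates fixed and changes only the within-study variance.

```python
def i_squared(effects, variances):
    """I² (%) from inverse-variance weights: max(0, (Q - df) / Q) * 100."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0


effects = [0.0, 0.2, 0.4, 0.6]                  # hypothetical point estimates
i2_precise = i_squared(effects, [0.01] * 4)     # large, precise studies
i2_imprecise = i_squared(effects, [0.10] * 4)   # small, imprecise studies
print(round(i2_precise), round(i2_imprecise))
```

The same four point estimates give an I² of about 85% when the studies are precise and 0% when they are imprecise, so a "low" I² from small trials says little about whether the underlying effects agree.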

4. Prediction intervals

A 95% prediction interval estimates the range in which the true effect of a new, comparable study is expected to lie. Whenever τ² is above zero it is wider than the confidence interval for the pooled mean, because it includes between-study variance as well as uncertainty in the mean.

Cochrane recommends reporting prediction intervals in random-effects meta-analyses when clinically relevant. They answer: if we ran another comparable trial tomorrow, what underlying effect might it have?

If the prediction interval crosses the null (or a clinically important threshold), the pooled mean effect may be unhelpful for decision-making even when statistically significant.
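To make the contrast concrete, here is a minimal sketch of the usual approximate formula, pooled mean ± t(k−2) × √(τ² + SE²), set beside the ordinary confidence interval. It assumes τ² has already been estimated and that the caller looks up the t critical value from a table; the function name, the data, and the τ² value are hypothetical.

```python
import math
from statistics import NormalDist


def random_effects_intervals(effects, variances, tau2, t_crit):
    """Return (mean, CI, PI) for a random-effects meta-analysis.

    tau2   -- between-study variance, estimated elsewhere (assumed given)
    t_crit -- two-sided 95% critical value of t with k - 2 degrees of
              freedom, looked up by the caller
    """
    k = len(effects)
    if k < 3:
        raise ValueError("a prediction interval needs at least three studies")

    # Random-effects weights add tau² to each within-study variance
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    se_mean = math.sqrt(1.0 / sum_w)

    # Confidence interval: uncertainty about the mean effect only
    z = NormalDist().inv_cdf(0.975)
    ci = (mean - z * se_mean, mean + z * se_mean)

    # Prediction interval: adds tau² for the spread of true effects
    half_width = t_crit * math.sqrt(tau2 + se_mean ** 2)
    pred = (mean - half_width, mean + half_width)
    return mean, ci, pred


# Hypothetical log odds ratios; tau² of 17/300 is an assumed estimate,
# and 4.303 is the tabulated t value for 2 degrees of freedom
mean, ci, pred = random_effects_intervals(
    [0.0, 0.2, 0.4, 0.6], [0.01] * 4, tau2=17.0 / 300.0, t_crit=4.303
)
print(f"mean={mean:.2f}  CI=({ci[0]:.2f}, {ci[1]:.2f})  PI=({pred[0]:.2f}, {pred[1]:.2f})")
```

With these made-up numbers the confidence interval (roughly 0.05 to 0.55) excludes the null, while the prediction interval (roughly −0.87 to 1.47) spans it comfortably. With only four studies the t multiplier is large, which is part of why prediction intervals from small reviews are so wide.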

Students should quote both CI and prediction interval when critiquing reviews in coursework – examiners increasingly expect this distinction.

5. Investigating heterogeneity

Standard approaches are pre-specified subgroup analyses (age, dose, risk of bias, geography), meta-regression (used with extreme caution, particularly when fewer than about ten studies are available), and separate syntheses when interventions are clinically distinct.

Post-hoc subgroups discovered after seeing the forest plot are exploratory. Label them as hypothesis-generating, not confirmatory.
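A common way to compare subgroups formally is a Q test for subgroup differences. The sketch below covers only the simplest case, two subgroups pooled with fixed-effect weights, where the test has 1 degree of freedom and the p-value can be computed with the standard library. The function names and the dose subgroups are hypothetical.

```python
import math


def fixed_effect_pool(effects, variances):
    """Inverse-variance pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum_w
    return estimate, 1.0 / sum_w


def two_subgroup_difference(group_a, group_b):
    """Q test for a difference between two subgroups (1 degree of freedom).

    Each group is an (effects, variances) pair.
    """
    est_a, var_a = fixed_effect_pool(*group_a)
    est_b, var_b = fixed_effect_pool(*group_b)

    # With two subgroups, Q_between reduces to a squared z statistic
    q_between = (est_a - est_b) ** 2 / (var_a + var_b)

    # Chi-square upper tail with 1 df: P(X > q) = erfc(sqrt(q / 2))
    p_value = math.erfc(math.sqrt(q_between / 2.0))
    return q_between, p_value


# Hypothetical low-dose and high-dose subgroups (log odds ratios)
low_dose = ([0.1, 0.1], [0.01, 0.01])
high_dose = ([0.5, 0.5], [0.01, 0.01])
q_between, p_value = two_subgroup_difference(low_dose, high_dose)
print(f"Q_between={q_between:.1f}, p={p_value:.5f}")
```

A small p-value here is evidence of a difference only if the subgroup was specified in the protocol. Run across many post-hoc groupings, the same test will produce apparent differences by chance alone.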

Sensitivity analyses – excluding high risk-of-bias studies, leave-one-out analyses, fixed vs random effects – test robustness of conclusions.
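Two of these checks are easy to sketch: a leave-one-out analysis and a fixed versus random-effects comparison. The example below assumes inverse-variance pooling with a DerSimonian-Laird τ²; the function names and the four studies (one deliberate outlier) are hypothetical.

```python
import math


def pool(effects, variances, random_effects=True):
    """Inverse-variance pooled estimate; DerSimonian-Laird tau² if random."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    tau2 = 0.0
    if random_effects and len(effects) > 1:
        q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
        c = sum_w - sum(w * w for w in weights) / sum_w
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # Re-weight with tau² added (tau² stays 0 for the fixed-effect model)
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    return {"mean": mean, "se": math.sqrt(1.0 / sum_w), "tau2": tau2}


def leave_one_out(effects, variances):
    """Re-pool with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results.append(pool(kept_effects, kept_variances))
    return results


effects = [0.1, 0.1, 0.1, 0.9]      # hypothetical; the last study is an outlier
variances = [0.01, 0.01, 0.01, 0.01]

fixed = pool(effects, variances, random_effects=False)
random_fx = pool(effects, variances)
loo = leave_one_out(effects, variances)
for i, result in enumerate(loo):
    print(f"without study {i + 1}: mean={result['mean']:.2f}, tau2={result['tau2']:.3f}")
```

In this made-up data set both models give a pooled mean of 0.30, but the standard error rises from 0.05 (fixed) to 0.20 (random). Dropping the fourth study moves the mean to 0.10 and the estimated τ² to zero, the kind of dependence on a single study that a robustness check is meant to expose.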

When heterogeneity remains unexplained and clinically important, narrative synthesis or presenting ranges may be more honest than a single pooled number.

6. Heterogeneity and GRADE

Unexplained or clinically important inconsistency can justify rating down the certainty of evidence for an outcome under the GRADE inconsistency domain.

Document whether review authors investigated heterogeneity, whether the pooled estimate remains clinically meaningful, and whether prediction intervals support the conclusion.

A review can report a statistically significant meta-analysis yet receive low GRADE certainty if inconsistency is serious and unexplained.

Pair GRADE inconsistency judgements with the forest plot and study characteristics table – not with I² alone.

7. Common mistakes

Reporting I² without τ², Q, or a prediction interval.

Pooling clearly different interventions because software allowed it.

Treating non-significant Q as proof of homogeneity.

Ignoring opposing study directions because the diamond sits favourably.

Using subgroup analyses without multiplicity caution or protocol pre-specification.

8. StrataResearch and meta-analysis statistics

Meta-analysis manuscripts receive heterogeneity and robustness feedback aligned to Cochrane concepts, alongside AMSTAR 2, PRISMA, ROBIS, and GRADE pathways.

Compare automated heterogeneity commentary to your manual forest plot reading – discrepancies often reveal outcomes or subgroups you had not prioritised.
