Measurement Field Guide
Figure 08 · Interference · Substitution

Turn the subject off and demand crowds into a substitute — the holdout looks better than it should, and lift gets understated.

Crowds out. Naïve lift understates. Condition out the substitute.

Substitution happens across channels: when the subject is suppressed in the holdout, demand reroutes to a substitute publisher or campaign. In causal-inference terms, this is interference (a SUTVA violation across cells) — one cell's outcome depends on the other cell's treatment because the substitute responds to total demand. The substitute runs hotter in holdout than in BAU, the holdout's outcome inflates, and naïve Δ understates the subject channel's effect. Concrete substitute pairs: TV display + YouTube pre-roll, paid-search brand + organic SEO, Meta + TikTok, email + push.

The bias mechanism

A clean two-cell test assumes everything except the subject channel is held constant. In reality, when the subject is turned off in the holdout, demand doesn't disappear — it gets crowded into a substitute channel. Users who would have come through the subject reach the same outcome via a sister publisher, app push, or organic search. The substitute therefore runs hotter in the holdout than in BAU, the holdout looks better than it should, and the measured Δ is understated.

Clean: substitutes balanced (hypothetical)
Substitute channel runs at the same intensity in both cells (no crowd-out). Δ is the subject channel's true causal effect.
  • Holdout · subject OFF: substitute at the same level as BAU
  • BAU · subject ON: substitute at the same level as holdout
  • Measured Δ +12% (true)

Biased: substitute crowds into holdout (in practice)
With the subject off, demand reroutes to the substitute, which surges in the holdout. The holdout's outcome rises; the measured Δ shrinks.
  • Holdout · subject OFF: substitute ↑ crowd-out
  • BAU · subject ON: substitute ↓ suppressed
  • Naïve Δ +6% (understated)

Naïve lift vs conditional lift

Naïve lift (as measured): +6% naïve Δ, understated by crowd-out. This is the raw Δ between cells, ignoring that the substitute ran hotter in the holdout. The holdout is propped up by the substitute, so the gap looks small.

Why it's wrong: credits the substitute's holdout surge against the subject channel, masking the true lift.

What it answers: only what the channel adds net of cannibalization — useful for some portfolio questions, misleading for channel-level decisions.

How to condition out the substitute

Use the substitute channel's traffic (clicks, impressions, or exposed users) as a covariate. Regress the outcome on cell assignment and substitute exposure, using units finer than the cell (users, geos, or days) so that both coefficients can be estimated. The coefficient on cell is the subject channel's lift, holding substitute exposure constant. This recovers the direct effect only under one assumption: nothing unobserved drives both substitute exposure and the outcome. The substitute is a mediator, so conditioning on it opens a collider path through any latent demand shock that hits both substitute exposure and the outcome, and that biases the coefficient. State the assumption explicitly; it is rarely free.

Naïve Δ: Δ = E[Y | BAU] − E[Y | Holdout]
Conditional model: Y = α + β_cell·cell + β_sub·substitute_traffic + ε
Lift estimate: subject lift = β_cell (at fixed substitute exposure)
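
The two estimates can be compared on simulated data. This is a minimal sketch, not a production estimator: the data-generating numbers (a true direct lift of 12 units, substitute traffic near 90 in holdout and 50 in BAU, an outcome slope of 0.15 per unit of substitute traffic) are assumptions chosen to echo the illustrative figure, and the `ols` helper is a hypothetical stand-in for whatever regression library you normally use. The noise on substitute traffic is drawn independently of the outcome noise, which is exactly the no-unobserved-confounder assumption stated above.

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(0)
rows, y = [], []
for i in range(20000):
    cell = i % 2                                   # 1 = BAU (subject on), 0 = holdout
    sub = 90 - 40 * cell + rng.gauss(0, 10)        # assumed: substitute runs hotter in holdout
    outcome = 100 + 12 * cell + 0.15 * sub + rng.gauss(0, 5)  # assumed true direct lift = 12
    rows.append([1.0, float(cell), sub])           # intercept, cell, substitute_traffic
    y.append(outcome)

bau = [v for r, v in zip(rows, y) if r[1] == 1.0]
holdout = [v for r, v in zip(rows, y) if r[1] == 0.0]
naive_delta = sum(bau) / len(bau) - sum(holdout) / len(holdout)

alpha, beta_cell, beta_sub = ols(rows, y)
print(f"naive delta = {naive_delta:.2f}, beta_cell = {beta_cell:.2f}, beta_sub = {beta_sub:.3f}")
```

In this setup the naïve Δ lands near 12 + 0.15 × (−40) = 6, while β_cell lands near the assumed direct lift of 12. The gap is the substitute's response, not a change in the subject channel.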
  1. Identify the substitute(s). Look for channels or publishers with overlapping audiences whose intensity moved during the test window. Get cell-level traffic counts for each.
  2. Confirm the substitute is causally upstream of the outcome, not a downstream consequence of the subject channel. Conditioning on a downstream effect introduces collider bias and can pull the lift toward zero.
  3. Stratify or regress. Either compute lift within strata of substitute exposure and re-aggregate, or run a single regression with substitute traffic as a covariate. Stratification is more robust to non-linearity.
  4. Widen the CIs. Conditioning consumes degrees of freedom and adds covariate uncertainty. Bootstrap or use the regression's full variance, not the naïve cell-Δ variance.
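
The stratify-and-bootstrap route in steps 3 and 4 can be sketched as follows. Everything here is illustrative: the simulated units, the stratum width of 10, the minimum of 10 units per cell per stratum, and the helper names `stratified_lift` and `bootstrap_ci` are assumptions, not a prescribed method. Strata without both cells present are dropped, so the estimate only covers the region where substitute exposure overlaps.

```python
import random
from collections import defaultdict

def stratified_lift(units, width=10.0, min_n=10):
    """Lift within strata of substitute exposure, re-aggregated by stratum size."""
    strata = defaultdict(lambda: ([], []))       # bin -> (holdout outcomes, BAU outcomes)
    for cell, sub, outcome in units:
        strata[int(sub // width)][cell].append(outcome)
    num = den = 0.0
    for holdout, bau in strata.values():
        if len(holdout) >= min_n and len(bau) >= min_n:   # keep overlapping strata only
            lift = sum(bau) / len(bau) - sum(holdout) / len(holdout)
            weight = len(holdout) + len(bau)
            num += weight * lift
            den += weight
    return num / den

def bootstrap_ci(units, stat, reps=200, seed=1):
    """Percentile bootstrap: resample units with replacement and recompute the statistic."""
    rng = random.Random(seed)
    n = len(units)
    draws = sorted(stat([units[rng.randrange(n)] for _ in range(n)]) for _ in range(reps))
    return draws[int(0.025 * reps)], draws[int(0.975 * reps) - 1]

rng = random.Random(0)
units = []
for i in range(4000):
    cell = i % 2                                  # 1 = BAU (subject on), 0 = holdout
    sub = 90 - 40 * cell + rng.gauss(0, 20)       # assumed: wide spread so strata overlap
    units.append((cell, sub, 100 + 12 * cell + 0.15 * sub + rng.gauss(0, 5)))

estimate = stratified_lift(units)
low, high = bootstrap_ci(units, stratified_lift)
print(f"stratified lift = {estimate:.2f}, 95% bootstrap CI = ({low:.2f}, {high:.2f})")
```

Resampling whole units and re-running the full stratification inside each replicate is what lets the interval reflect covariate uncertainty, rather than only the variance of the raw cell difference.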
Direct vs total effect

β_cell is the direct effect of the subject channel — holding substitute exposure fixed. Naïve Δ is the total effect under interference — letting the substitute respond. Neither is universally "right." For a channel-kill decision (the substitute WILL surge when you turn off subject), total effect is what you'll actually see. For an isolation analysis ("what does this channel do by itself?"), direct effect is the answer. Conditional lift isn't "the truth" so much as the direct-effect estimand — pick the one your decision needs, not the one that sounds most causal.
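
The relationship between the two estimands can be written as a linear mediation decomposition. This is a sketch under stated assumptions: the conditional model Y = α + β_cell·cell + β_sub·substitute_traffic + ε holds, cell is coded 1 for BAU and 0 for holdout, substitute traffic responds linearly to cell assignment, and both error terms have mean zero in each cell. The symbols γ_0, γ_cell, and ν are introduced here for illustration only.

```latex
\begin{aligned}
\text{substitute\_traffic} &= \gamma_0 + \gamma_{\text{cell}} \cdot \text{cell} + \nu \\[4pt]
\underbrace{E[Y \mid \text{BAU}] - E[Y \mid \text{Holdout}]}_{\text{na\"ive } \Delta \text{ (total)}}
  &= \underbrace{\beta_{\text{cell}}}_{\text{direct}}
   + \underbrace{\beta_{\text{sub}} \cdot \gamma_{\text{cell}}}_{\text{indirect, via substitute}}
\end{aligned}
```

Under substitution, γ_cell is negative (the substitute runs lower in BAU than in holdout) and β_sub is positive, so the indirect term is negative and the naïve Δ sits below β_cell. With the illustrative figures, +6% = +12% + (−6%).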

Mediator vs collider

The collider warning in checklist step 2 is correct, but the structural concern is subtler: the substitute is a mediator when its exposure responds to the subject's status. Conditioning on a mediator removes the indirect causal pathway, so β_cell no longer equals the total effect. Under substitution the indirect pathway is negative (subject on, substitute down, outcome down), which is why the direct effect (+12% in the illustration) is larger than the total (+6%). This isn't bias in a statistical sense; it's a clean estimate of a different quantity (the direct effect). Make the choice explicit, not implicit.

Complementarity is the mirror

Substitution is one direction (channels compete). Complementarity is the other (channels enable each other): display awareness drives later search conversion, brand TV lifts paid-search CTR, email reactivation feeds push engagement. Same SUTVA-violation mechanic; opposite sign. When complementarity dominates, naïve Δ overstates the subject's standalone effect because turning off subject in holdout also weakens the complement's performance. The figure shows one direction — interference is structural to multi-channel media, not a one-sided edge case.

Within vs across channels

Within a single channel's holdout test (this figure), naïve Δ understates because of crowd-out. Across channels, when you sum per-channel conditioned lifts (each from its own holdout), the aggregate overstates because each channel claims credit for the substitutable demand the others would have captured. Use the right framing for the question: single-channel = within (understates without conditioning); portfolio = across (overstates without joint modeling).
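
A toy accounting model makes the two directions concrete. It is a hypothetical sketch: two channels A and B, each with demand only it can capture, plus a shared pool that whichever channel is on will capture. The volumes (30, 20, 10) and the function name `outcome` are assumptions for illustration, and "standalone lift" here means the channel's lift with the other channel off.

```python
def outcome(a_on, b_on, unique_a=30, unique_b=20, shared=10):
    """Conversions when shared demand goes to whichever channel is on."""
    return unique_a * a_on + unique_b * b_on + shared * (a_on or b_on)

# Within: A's holdout while B keeps running. B absorbs the shared pool.
naive_a = outcome(1, 1) - outcome(0, 1)

# Standalone lifts: each channel measured with the other one off.
standalone_a = outcome(1, 0) - outcome(0, 0)
standalone_b = outcome(0, 1) - outcome(0, 0)

# Joint effect of running both channels versus neither.
joint = outcome(1, 1) - outcome(0, 0)

print(naive_a, standalone_a, standalone_a + standalone_b, joint)
```

Here A's naïve holdout lift (30) is below its standalone lift (40), while the sum of the two standalone lifts (70) exceeds the joint effect (60) because the shared pool is counted twice.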

Takeaway

Pollution is leakage within the cell. Substitution is crowd-out across channels: a SUTVA-violating interference effect. The bias isn't a defect to fix; it's a feature of multi-channel media. Decide which estimand matches your decision.
  • Channel-kill or channel-add? Total effect (naïve Δ, including substitute response).
  • Isolating one channel's intrinsic contribution? Direct effect (β_cell from conditioning, with the limitations above).
  • Portfolio sizing across many channels? Neither; you need joint modeling or a factorial design that randomizes both subject and substitute together.

The rigorous fix is a factorial design. Randomize both subject and substitute independently, giving four cells: (subject on/off) × (substitute on/off). This directly identifies the direct effects, the interaction, and the substitution coefficient. The main effects cost about the same sample as a two-cell test: each is estimated by averaging over the other factor (hidden replication), so they use the full sample. The extra budget buys a well-powered interaction term, which needs more. The real barrier is operational: you have to be able to turn off the substitute too, which means coordinating two channel owners, so most teams condition post-hoc instead. When stakes warrant (channel-deprecation decisions, budget reallocations > 30% of channel spend), a 2×2 design is the only way to get a causally clean answer. Otherwise, name the estimand you reported (direct or total), state the substitute you conditioned on (or didn't), and remember that the naïve number is also a real quantity: it's what the channel actually adds in the world where the substitute responds.
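
Reading a 2×2 test comes down to a few contrasts of the four cell means. The sketch below assumes you already have one mean outcome per cell; the numbers in `cell_means` are made up for illustration, and `factorial_effects` is a hypothetical helper, not part of any library. It returns point estimates only and does not compute standard errors.

```python
def factorial_effects(mean):
    """mean[(subject, substitute)] -> cell mean outcome, each factor coded 0 (off) or 1 (on)."""
    subject_sub_off = mean[(1, 0)] - mean[(0, 0)]     # subject lift with substitute off
    subject_sub_on = mean[(1, 1)] - mean[(0, 1)]      # subject lift with substitute on
    substitute_subj_off = mean[(0, 1)] - mean[(0, 0)]
    substitute_subj_on = mean[(1, 1)] - mean[(1, 0)]
    return {
        "subject_lift_substitute_off": subject_sub_off,
        "subject_lift_substitute_on": subject_sub_on,
        # Main effects average each factor's lift over both levels of the other factor.
        "subject_main_effect": (subject_sub_off + subject_sub_on) / 2,
        "substitute_main_effect": (substitute_subj_off + substitute_subj_on) / 2,
        # Negative interaction: the subject adds less when the substitute is on (substitution).
        "interaction": subject_sub_on - subject_sub_off,
    }

# Hypothetical cell means, keyed by (subject, substitute).
cell_means = {(0, 0): 100.0, (1, 0): 112.0, (0, 1): 109.0, (1, 1): 115.0}
effects = factorial_effects(cell_means)
for name, value in effects.items():
    print(f"{name}: {value:+.1f}")
```

A negative interaction indicates substitution and a positive one indicates complementarity, so the same contrast covers both directions of interference.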

Methods note

Numbers throughout are illustrative. The balanced-vs-crowded substitute split (60/60 vs 90/50) and the +12% true / +6% naïve deltas are chosen to make the crowd-out visible; real substitution depends on channel overlap, demand elasticity, and test design.

Further reading
  • Localized Shift vs Overall Causal Impact
  • Adstock & attribution window considerations
  • Test Design · Power, α, p-value, tails
  • Superiority vs Non-inferiority