Turn the subject off and demand crowds into a substitute — the holdout looks better than it should, and lift gets understated.
Crowds out. Naïve lift understates. Condition out the substitute.
Substitution happens across channels: when the subject is suppressed in the holdout, demand reroutes to a substitute publisher or campaign. In causal-inference terms, this is interference (a SUTVA violation across cells) — one cell's outcome depends on the other cell's treatment because the substitute responds to total demand. The substitute runs hotter in holdout than in BAU, the holdout's outcome inflates, and naïve Δ understates the subject channel's effect. Concrete substitute pairs: TV display + YouTube pre-roll, paid-search brand + organic SEO, Meta + TikTok, email + push.
A clean two-cell test assumes everything except the subject channel is held constant. In reality, when the subject is turned off in the holdout, demand doesn't disappear — it gets crowded into a substitute channel. Users who would have come through the subject reach the same outcome via a sister publisher, app push, or organic search. The substitute therefore runs hotter in the holdout than in BAU, the holdout looks better than it should, and the measured Δ is understated.
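The crowd-out mechanic above can be sketched with a few lines of arithmetic. This is a minimal illustration, not a model of any real test: the function name, the conversion counts, and the `rerouted_share` parameter (the fraction of subject-driven conversions that still happen via the substitute when the subject is off) are all hypothetical.

```python
# Hypothetical numbers for illustration only; none come from a real test.

def crowd_out_example(subject_conversions, substitute_conversions_bau, rerouted_share):
    """Compare the naive holdout delta with the subject channel's direct effect.

    rerouted_share is the assumed fraction of subject-driven conversions that
    still happen via the substitute when the subject is switched off.
    """
    bau_outcome = subject_conversions + substitute_conversions_bau
    # In holdout the subject is off, but part of its demand reroutes to the substitute.
    holdout_outcome = substitute_conversions_bau + rerouted_share * subject_conversions
    naive_delta = bau_outcome - holdout_outcome
    # What the subject adds if the substitute is held at its BAU level.
    direct_effect = subject_conversions
    return naive_delta, direct_effect


naive_delta, direct_effect = crowd_out_example(
    subject_conversions=1000, substitute_conversions_bau=400, rerouted_share=0.25
)
print(f"naive delta: {naive_delta:.0f}, direct effect: {direct_effect:.0f}")
```

With a quarter of the subject's demand rerouting, the holdout outcome is inflated by that quarter, and the naive delta falls short of the direct effect by the same amount.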
Why it's wrong: credits the substitute's holdout surge against the subject channel, masking the true lift.
What it answers: only what the channel adds net of cannibalization — useful for some portfolio questions, misleading for channel-level decisions.
What it represents: the lift attributable specifically to the subject channel, with the substitute's crowd-out conditioned out.
Use it for: cross-channel comparison, channel-specific budget reallocation, anything that requires single-channel attribution.
Use the substitute channel's traffic — clicks, impressions, exposed users — as a covariate. Run a regression on the cell-level outcome, with cell assignment and substitute exposure both as predictors. The coefficient on cell is the subject channel's lift, holding substitute exposure constant. This recovers the direct effect only under one assumption: nothing unobserved drives both substitute exposure and the outcome. The substitute is a mediator, so a latent demand shock hitting both would open a collider path and bias the coefficient — name that assumption, it is rarely free.
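As a rough sketch of the regression described above, the following fits outcome on cell assignment and substitute exposure by ordinary least squares, using only the standard library. The data are synthetic and generated from a known model (an assumption made so the recovered coefficient can be checked), and the helper names `solve_linear` and `ols` are illustrative. In practice a statistics package would also give standard errors.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(design, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def mean(xs):
    return sum(xs) / len(xs)


# Synthetic unit-level data (say, one row per geo-day); not from a real test.
# cell = 1 for BAU (subject on), 0 for holdout (subject off).
cells = [0, 0, 0, 0, 1, 1, 1, 1]
# The substitute runs hotter in holdout than in BAU.
substitute = [300, 320, 340, 360, 200, 220, 240, 260]
# Assumed data-generating model: outcome = 10 + 5 * cell + 0.02 * substitute.
outcome = [10 + 5 * c + 0.02 * s for c, s in zip(cells, substitute)]

naive_delta = mean(outcome[4:]) - mean(outcome[:4])
design = [[1.0, float(c), float(s)] for c, s in zip(cells, substitute)]
intercept, beta_cell, beta_substitute = ols(design, outcome)

print(f"naive delta: {naive_delta:.2f}")
print(f"beta_cell (substitute held constant): {beta_cell:.2f}")
```

Because the synthetic data contain no noise and no unobserved confounder, the coefficient on cell recovers the built-in direct effect while the naive delta comes out smaller. Real data would not be this clean, and the no-unobserved-confounding assumption would still need to be argued.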
1. Identify the substitute(s). Channels or publishers with overlapping audiences whose intensity moved during the test window. Get cell-level traffic counts for each.
2. Confirm the substitute is causally upstream of the outcome, not a downstream consequence of the subject channel. Conditioning on a downstream effect introduces collider bias and will pull the lift toward zero.
3. Stratify or regress. Either compute lift within strata of substitute exposure and re-aggregate, or run a single regression with substitute traffic as a covariate. Stratification is more robust to non-linearity.
4. Widen the CIs. Conditioning consumes degrees of freedom and adds covariate uncertainty. Bootstrap or use the regression's full variance, not the naïve cell-Δ variance.
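The stratify-and-bootstrap route in the checklist could look roughly like the sketch below. The rows, the two stratum labels, the size-weighted re-aggregation, and the choice to resample within each (stratum, cell) group are all assumptions made for illustration. Other weighting or resampling schemes may suit a given test better.

```python
import random
from collections import defaultdict

# Hypothetical unit-level rows: (cell, substitute-exposure stratum, outcome).
rows = (
    [("bau", "low", y) for y in (20, 21, 19, 20)]
    + [("holdout", "low", y) for y in (15, 16)]
    + [("bau", "high", y) for y in (22, 23)]
    + [("holdout", "high", y) for y in (17, 18, 16, 17)]
)


def mean(xs):
    return sum(xs) / len(xs)


def group_rows(rows):
    """Collect outcomes by (stratum, cell)."""
    groups = defaultdict(list)
    for cell, stratum, y in rows:
        groups[(stratum, cell)].append(y)
    return dict(groups)


def stratified_lift(groups):
    """Size-weighted average of within-stratum (BAU - holdout) differences."""
    strata = sorted({stratum for stratum, _ in groups})
    total = sum(len(v) for v in groups.values())
    lift = 0.0
    for s in strata:
        n = len(groups[(s, "bau")]) + len(groups[(s, "holdout")])
        lift += (n / total) * (mean(groups[(s, "bau")]) - mean(groups[(s, "holdout")]))
    return lift


def bootstrap_ci(groups, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap, resampling units within each (stratum, cell) group."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        resampled = {key: rng.choices(v, k=len(v)) for key, v in groups.items()}
        draws.append(stratified_lift(resampled))
    draws.sort()
    return draws[int((alpha / 2) * n_boot)], draws[int((1 - alpha / 2) * n_boot) - 1]


groups = group_rows(rows)
naive_delta = mean([y for c, _, y in rows if c == "bau"]) - mean(
    [y for c, _, y in rows if c == "holdout"]
)
point_lift = stratified_lift(groups)
ci_lo, ci_hi = bootstrap_ci(groups)
print(f"naive delta: {naive_delta:.2f}")
print(f"stratified lift: {point_lift:.2f} (95% CI {ci_lo:.2f} to {ci_hi:.2f})")
```

In this toy data the holdout units sit disproportionately in the high-exposure stratum, so the pooled comparison understates the within-stratum lift. The bootstrap interval reflects the small per-stratum samples rather than the pooled cell sizes.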
β_cell is the direct effect of the subject channel, holding substitute exposure fixed. Naïve Δ is the total effect under interference, letting the substitute respond. Neither is universally "right." For a channel-kill decision (the substitute WILL surge when you turn off the subject), the total effect is what you'll actually see. For an isolation analysis ("what does this channel do by itself?"), the direct effect is the answer. Conditional lift isn't "the truth" so much as the direct-effect estimand: pick the one your decision needs, not the one that sounds most causal.
The collider warning in checklist step 2 is correct, but the structural concern is subtler: the substitute is a mediator when its exposure responds to the subject's status. Conditioning on a mediator removes the indirect causal pathway, so β_cell no longer equals the total effect. Under substitution the indirect path is negative (subject on means the substitute runs cooler), so β_cell comes out larger than the naïve Δ, which is the same gap described as naïve Δ understating. This isn't bias in the statistical sense: it's a clean estimate of a different quantity (the direct effect). Make the choice explicit, not implicit.
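The relationship between the two estimands can be written out under a simple linear sketch. The symbols here are assumptions introduced for illustration: T is the cell indicator (1 = subject on), S is substitute exposure, Y is the outcome, and γ and δ are the substitute's effect on the outcome and the subject's effect on substitute exposure. The sketch also assumes T is randomized and the error terms ε and η are independent of each other.

```latex
\begin{align}
Y &= \alpha + \beta_{\text{cell}}\,T + \gamma\,S + \varepsilon, \\
S &= \delta_0 + \delta\,T + \eta, \\
\Delta_{\text{naive}} &= \mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0]
  = \underbrace{\beta_{\text{cell}}}_{\text{direct}} + \underbrace{\gamma\,\delta}_{\text{indirect}} .
\end{align}
```

Under substitution, γ > 0 (the substitute also drives the outcome) and δ < 0 (the substitute runs cooler when the subject is on), so γδ < 0 and the naïve Δ falls below β_cell. If ε and η share an unobserved driver, regressing Y on T and S no longer recovers β_cell.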
Substitution is one direction (channels compete). Complementarity is the other (channels enable each other): display awareness drives later search conversion, brand TV lifts paid-search CTR, email reactivation feeds push engagement. Same SUTVA-violation mechanic; opposite sign. When complementarity dominates, naïve Δ overstates the subject's standalone effect because turning off subject in holdout also weakens the complement's performance. The figure shows one direction — interference is structural to multi-channel media, not a one-sided edge case.
Within a single channel's holdout test (this figure), naïve Δ understates because of crowd-out. Across channels, when you sum per-channel conditional lifts (each from its own holdout, each conditioned on its substitutes), the aggregate overstates because each channel claims credit for the substitutable demand the others would have captured. Use the right framing for the question: single-channel = within (understates without conditioning); portfolio = across (overstates without joint modeling).
β_cell from conditioning, with the limitations above). Portfolio sizing across many channels? Neither; you need joint modeling or a factorial design that randomizes both subject and substitute together.
The rigorous fix is a factorial design. Randomize both subject and substitute independently in four cells: (subject on/off) × (substitute on/off). This directly identifies the direct effects, the interaction, and the substitution coefficient. The main effects cost about the same sample as a two-cell test: each is estimated by averaging over the other factor (hidden replication), so each uses the full sample. The extra budget buys a well-powered interaction term, which needs more. The real barrier is operational: you have to be able to turn off the substitute too, which means coordinating two channel owners, so most teams condition post hoc instead. When stakes warrant (channel-deprecation decisions, budget reallocations above 30% of channel spend), a 2×2 design is the only way to get a causally clean answer. Otherwise, name the estimand you reported (direct or total), state the substitute you conditioned on (or didn't), and remember that the naïve number is also a real quantity: it's what the channel actually adds in the world where the substitute responds.
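To make the 2×2 readout concrete, here is a minimal sketch that computes simple effects, main effects, and the interaction from four cell means. The cell means and the `factorial_effects` helper are hypothetical, and a real analysis would attach standard errors to each contrast.

```python
# Hypothetical cell means from a 2x2 test, keyed by (subject_on, substitute_on).
cell_means = {(0, 0): 10.0, (1, 0): 16.0, (0, 1): 14.0, (1, 1): 18.0}


def factorial_effects(m):
    """Simple effects, main effects, and interaction from 2x2 cell means."""
    subject_with_sub_off = m[(1, 0)] - m[(0, 0)]
    subject_with_sub_on = m[(1, 1)] - m[(0, 1)]
    substitute_with_subj_off = m[(0, 1)] - m[(0, 0)]
    substitute_with_subj_on = m[(1, 1)] - m[(1, 0)]
    return {
        "subject_with_sub_off": subject_with_sub_off,
        "subject_with_sub_on": subject_with_sub_on,
        # Main effects average each factor's effect over both levels of the other.
        "subject_main": (subject_with_sub_off + subject_with_sub_on) / 2,
        "substitute_main": (substitute_with_subj_off + substitute_with_subj_on) / 2,
        # A negative interaction means the subject adds less when the substitute is on.
        "interaction": subject_with_sub_on - subject_with_sub_off,
    }


effects = factorial_effects(cell_means)
for name, value in effects.items():
    print(f"{name}: {value:+.1f}")
```

In these made-up numbers the subject adds 6 when the substitute is off and 4 when it is on. The negative interaction of 2 is the portion of demand the two channels compete for.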