Measurement Field Guide
Figure 07 · Non-compliance · Pollution

The holdout cell is rarely truly clean. The raw lift you measure is a diluted answer.

Two flavors of leakage. Raw lift understates. Intrinsic lift restores.

Pollution happens inside the test's cells: marketing leaks into the holdout, or fails to reach part of BAU. In causal-inference language, this is non-compliance with assignment. Raw lift estimates the ITT (Intent-to-Treat) effect: the impact of assignment, not of treatment. Intrinsic lift is the CACE or LATE (Complier / Local Average Treatment Effect): the effect among compliers. The standard correction intrinsic = raw / (1−p) is the Wald estimator from instrumental-variables analysis, with assignment as a binary instrument for a binary treatment.
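A minimal sketch of the Wald estimator on per-user records, assuming each user carries an assignment flag `z` (1 = BAU, 0 = holdout), an actual-treatment flag `d` (1 = touched) and a binary conversion `y`. The data are simulated with hypothetical parameters: a 5% base conversion rate, a +1pp true effect of treatment, BAU fully reached, and one third of the holdout leaking. The ITT should come out diluted to roughly two thirds of the effect, and dividing by the first-stage gap in treated share should recover it.

```python
import random
from statistics import fmean

def wald_estimate(z, d, y):
    """Wald (IV) estimate of the complier effect, with its two ingredients.

    z: assignment per user (1 = BAU, 0 = holdout)
    d: actual treatment per user (1 = touched by marketing, 0 = not)
    y: outcome per user (1 = converted, 0 = not)
    Returns (itt, first_stage, late).
    """
    def cell_mean(values, cell):
        return fmean(v for v, zi in zip(values, z) if zi == cell)

    itt = cell_mean(y, 1) - cell_mean(y, 0)          # raw lift, as an absolute difference
    first_stage = cell_mean(d, 1) - cell_mean(d, 0)  # gap in treated share between cells
    return itt, first_stage, itt / first_stage

# Simulated test (hypothetical numbers): BAU fully treated, 1/3 of holdout leaks.
random.seed(7)
base_rate, effect, p_leak = 0.05, 0.01, 1 / 3
z, d, y = [], [], []
for _ in range(400_000):
    zi = int(random.random() < 0.5)
    di = 1 if zi == 1 else int(random.random() < p_leak)
    yi = int(random.random() < base_rate + effect * di)
    z.append(zi)
    d.append(di)
    y.append(yi)

itt, first_stage, late = wald_estimate(z, d, y)
print(f"ITT {itt:.4f}  first stage {first_stage:.3f}  LATE {late:.4f}")
```

`wald_estimate` is a hypothetical helper, not a library function; with real data, replace the simulation loop with your own `z`, `d`, `y` lists.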

Two flavors of pollution
Holdout pollution · detectable
Some users assigned to the holdout still receive marketing through cross-device exposure, retargeting from prior visits, or ads on shared logins. Their conversions partly reflect marketing impact that should have been excluded.
[Figure: Holdout cell, marketing OFF; ~33% of users still touched, visible in traffic logs]

Detection: we can see touches in the traffic data — the holdout user appears in click / impression logs even though they shouldn't have been served. The polluted share is observable.
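A sketch of reading p off the logs, assuming the touch log can be exported as `(user_id, channel)` pairs keyed by the same deterministic ID used for assignment. The function names and the log shape are hypothetical. Repeat touches count once, and touches on users outside the holdout are ignored. The binomial standard error is included because Var(p̂) feeds the confidence interval in the adjustment method.

```python
import math

def estimate_pollution(holdout_ids, touch_log, channel):
    """Share of holdout users with at least one logged touch on `channel`.

    holdout_ids: iterable of user IDs assigned to the holdout.
    touch_log: iterable of (user_id, channel) pairs (hypothetical export format).
    """
    holdout = set(holdout_ids)
    if not holdout:
        raise ValueError("holdout is empty")
    touched = {uid for uid, ch in touch_log if ch == channel and uid in holdout}
    return len(touched) / len(holdout)

def pollution_se(p_hat, n_holdout):
    """Binomial standard error of the estimated pollution share."""
    return math.sqrt(p_hat * (1 - p_hat) / n_holdout)

holdout = ["u1", "u2", "u3", "u4", "u5", "u6"]
touch_log = [
    ("u1", "ctv"), ("u1", "ctv"),  # repeat touches on one user count once
    ("u4", "ctv"),
    ("u2", "search"),              # different channel, ignored for "ctv"
    ("u9", "ctv"),                 # not in the holdout, ignored
]
p_hat = estimate_pollution(holdout, touch_log, "ctv")
print(p_hat, pollution_se(p_hat, len(holdout)))
```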

BAU exclusion · hidden
Some users assigned to BAU never receive marketing: they are frequency-capped out, ad-blocked, missed by the bidder, or opted out. They sit in the “marketing on” cell but never actually got marketed to.
[Figure: BAU cell, marketing ON; unknown share of users missed, with no log entry]

Detection: there is no event to count. Absence of a touch does not generate a record, so the share is not directly observable at the user level.

Raw lift vs intrinsic lift
Raw lift · as measured
The lift you read directly off the test, with both cells as they actually were — holdout including the polluted users.
+8% diluted lift · what the test reported

What it represents: the marketing effect you can expect under real-world operating conditions — including the leakage that's part of how the channel actually runs.

Use it for: in-market performance forecasts, year-over-year comparisons, anything where you want today's lived performance, not the platonic ideal.

Adjustment method

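Let p = share of holdout users actually touched by marketing (read from traffic logs), and assume every BAU user was reached. Under random assignment, and if a touched holdout user responds to marketing the same way a BAU user does, those p users are treated in either cell and contribute no contrast between the cells. The raw lift is therefore a weighted average of the intrinsic lift (on the untouched 1 − p share) and zero (on the touched p share): diluted by p. This leans on the IV exclusion restriction: assignment moves conversions only through actual marketing exposure, never on its own. The algebra is exact when lift is an absolute difference in conversion rates; for relative lift it is an approximation, because the touched holdout users also raise the baseline the ratio divides by.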

Observed (raw): raw lift = (1 − p) · intrinsic lift + p · 0
Solve for intrinsic: intrinsic lift = raw lift / (1 − p)
Worked example: raw = 8%, pollution p = 33%  →  intrinsic = 8% / 0.67 ≈ 12%
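The worked example as a tiny helper, assuming lift is expressed as a fraction (0.08 for +8%). `intrinsic_lift` is a hypothetical name; the guard rejects p ≥ 1, where the correction is undefined.

```python
def intrinsic_lift(raw_lift: float, p: float) -> float:
    """One-sided Wald correction for holdout pollution: raw / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return raw_lift / (1 - p)

print(f"{intrinsic_lift(0.08, 0.33):.4f}")  # 0.1194
```

0.1194 is the ≈ 12% above; with p exactly one third the result is 12%.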
  1. Estimate p from holdout users who appear in marketing traffic logs (touch tables, click streams). Use the same channel definition the test used.
  2. Compute raw lift directly from the cells as assigned (intent-to-treat).
  3. Divide by (1 − p) for the point estimate. For the CI, use the delta method, treating the raw-lift and p̂ estimates as independent: Var(intrinsic) ≈ Var(raw)/(1−p)² + raw² · Var(p̂)/(1−p)⁴. With raw = 8 ± 1pp and p = 33 ± 5pp (both ± at the same confidence level), intrinsic ≈ 12 ± 1.7pp.
  4. Sanity check: the inflation factor 1/(1 − p) grows without bound as p approaches 1, and a noisy p̂ widens the interval quickly. Report both raw and intrinsic, never just one.
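The delta-method interval from step 3, sketched under the assumption stated there: the raw-lift and p̂ estimates are independent, so there is no covariance term. `intrinsic_with_se` is a hypothetical helper; the first call reproduces the 8 ± 1pp and 33 ± 5pp example, and the second reruns it at a hypothetical p = 0.80 to put step 4's warning in numbers.

```python
import math

def intrinsic_with_se(raw, se_raw, p, se_p):
    """Point estimate and delta-method standard error for raw / (1 - p).

    Gradient of f(raw, p) = raw / (1 - p):
      df/draw = 1 / (1 - p),  df/dp = raw / (1 - p)**2
    Assumes the two estimates are independent (no covariance term).
    """
    k = 1 - p
    point = raw / k
    var = se_raw**2 / k**2 + raw**2 * se_p**2 / k**4
    return point, math.sqrt(var)

point, se = intrinsic_with_se(raw=0.08, se_raw=0.01, p=0.33, se_p=0.05)
print(f"intrinsic ≈ {point:.3f} ± {se:.3f}")

# Same inputs, but a much heavier pollution rate: point and interval both blow up.
point_hi, se_hi = intrinsic_with_se(raw=0.08, se_raw=0.01, p=0.80, se_p=0.05)
print(f"at p = 0.80: {point_hi:.3f} ± {se_hi:.3f}")
```

Because the standard error is linear in the input ± values, the result holds whether those ± values are one standard error or a 95% half-width, as long as both use the same level.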
Dose equivalence

The Wald correction assumes a touched holdout user got the same dose as a BAU user: same impressions, same frequency, same timing. That is usually false. Pollution is typically one stray cross-device touch, not the full ad-stock that BAU users accumulate. When polluted dose is lighter than BAU dose, raw / (1−p) overstates the true intrinsic effect: the polluted users barely moved the holdout baseline because they got little marketing, so the raw lift was diluted by less than p, and dividing by the full (1 − p) over-corrects. As long as the lighter dose has some non-negative effect, the true intrinsic lift sits between raw and raw / (1 − p).
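A sketch of how dose changes the correction, under a simplifying assumption not made in the text: a polluted holdout user receives a fixed fraction `dose` of the BAU effect (0 = no effect, 1 = full BAU dose). The diluted share is then p · dose rather than p, so the correction becomes raw / (1 − p · dose). At dose = 1 it reduces to raw / (1 − p); lighter doses give smaller values, down to raw itself at dose = 0. The dose fraction is rarely observable, so this works as a sensitivity sweep rather than an estimate.

```python
def dose_adjusted_intrinsic(raw, p, dose):
    """Intrinsic lift when polluted holdout users get `dose` x the BAU effect.

    Hypothetical linear-dose model: the holdout baseline rises by
    p * dose * effect, so raw = effect * (1 - p * dose).
    """
    if not 0 <= dose <= 1:
        raise ValueError("dose must be in [0, 1]")
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return raw / (1 - p * dose)

for dose in (0.0, 0.25, 0.5, 1.0):
    est = dose_adjusted_intrinsic(0.08, 0.33, dose)
    print(f"dose {dose:.2f}: intrinsic ≈ {est:.2%}")
```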

Per-channel pollution rate

Typical industry ranges: Connected TV 30–50% (household-level targeting → device sharing); walled-garden display 10–30% (cross-device + lookalike); open programmatic 5–15% (ID fragmentation actually helps the holdout); branded paid search 0–5% (intent-driven, hard to "miss" the holdout). The 33% example here is mid-range — use your channel's actual logs, not a default.
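Turning the quoted ranges into the multiplier 1 / (1 − p) shows how far the one-sided correction would scale each channel's raw lift. The ranges below are the illustrative ones above, hard-coded in a hypothetical lookup; swap in rates measured from your own logs.

```python
# Illustrative ranges from the text; replace with rates measured from your logs.
CHANNEL_POLLUTION_RANGES = {
    "Connected TV": (0.30, 0.50),
    "Walled-garden display": (0.10, 0.30),
    "Open programmatic": (0.05, 0.15),
    "Branded paid search": (0.00, 0.05),
}

def inflation_range(low, high):
    """Multiplier 1 / (1 - p) applied to raw lift at each end of a pollution range."""
    return 1 / (1 - low), 1 / (1 - high)

for channel, (low, high) in CHANNEL_POLLUTION_RANGES.items():
    lo_factor, hi_factor = inflation_range(low, high)
    print(f"{channel:<22} p {low:.0%}-{high:.0%}: x{lo_factor:.2f} to x{hi_factor:.2f}")
```

At the top of the Connected TV range the correction doubles the raw lift, which is where step 4's advice to report both numbers matters most.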

Both sides at once

When holdout pollution (rate p) and BAU exclusion (rate q) happen together, the combined adjustment is intrinsic = raw / (1 − p − q). This is the same Wald form, with the denominator now the gap in treated share between the cells: (1 − q) − p. When actual treatment is observed for every user in both cells, running 2SLS with assignment as the instrument for actual treatment gives exactly this point estimate, plus proper standard errors, and handles both sides jointly. In practice q is undetectable from logs, so it always requires a stated assumption (e.g., "we model 10% silent exclusion based on platform delivery rates"). Document it.
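A sketch of both points in this paragraph. `two_sided_intrinsic` applies raw / (1 − p − q) and sweeps a few assumed q values; a simulation then checks that a hand-rolled 2SLS point estimate equals the Wald ratio when assignment is the only instrument. The simulation can observe actual treatment in the BAU cell, which real logs cannot, and that gap is exactly why q needs a stated assumption. Standard errors are not computed here, and every name and parameter (a 5% base rate, +1pp effect, p = 0.33, q = 0.10) is hypothetical.

```python
import random
from statistics import fmean

def two_sided_intrinsic(raw, p, q):
    """Wald correction with holdout pollution p and BAU exclusion q."""
    gap = (1 - q) - p  # treated share in BAU minus treated share in holdout
    if gap <= 0:
        raise ValueError("need p + q < 1")
    return raw / gap

# Sensitivity to the unobservable q, holding raw = 8% and p = 33% fixed
for q_assumed in (0.0, 0.05, 0.10, 0.20):
    est = two_sided_intrinsic(0.08, 0.33, q_assumed)
    print(f"q = {q_assumed:.2f}: intrinsic ≈ {est:.2%}")

def ols(x, y):
    """Slope and intercept of an ordinary least-squares fit of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def two_stage_least_squares(z, d, y):
    """2SLS point estimate of the effect of d on y, using z as the instrument."""
    b, a = ols(z, d)                  # first stage: regress d on z
    d_hat = [a + b * zi for zi in z]  # fitted treatment
    slope, _ = ols(d_hat, y)          # second stage: regress y on fitted treatment
    return slope

def wald(z, d, y):
    """Difference in mean outcome over difference in treated share."""
    def cell_mean(values, cell):
        return fmean(v for v, zi in zip(values, z) if zi == cell)
    return (cell_mean(y, 1) - cell_mean(y, 0)) / (cell_mean(d, 1) - cell_mean(d, 0))

random.seed(11)
p, q, base_rate, effect = 0.33, 0.10, 0.05, 0.01
z, d, y = [], [], []
for _ in range(300_000):
    zi = int(random.random() < 0.5)
    # BAU users are reached unless silently excluded; holdout users leak at rate p
    di = int(random.random() >= q) if zi == 1 else int(random.random() < p)
    yi = int(random.random() < base_rate + effect * di)
    z.append(zi)
    d.append(di)
    y.append(yi)

est_2sls = two_stage_least_squares(z, d, y)
est_wald = wald(z, d, y)
print(f"2SLS {est_2sls:.4f} vs Wald {est_wald:.4f} (true effect {effect})")
```

With a single binary instrument and no covariates the two estimates agree up to floating-point error; a dedicated IV library earns its keep once covariates and standard errors enter.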

Takeaway: Raw lift describes how the channel performs today; intrinsic lift describes what the channel can do per exposure. Both numbers are honest; they answer different questions. The gap between them (intrinsic − raw) is recoverable lift: the lift you'd capture with cleaner targeting infrastructure. Use raw for forecasts that keep current operating leakage, and intrinsic for projecting scale to a clean population. Detectable pollution (holdout-side) is adjustable with traffic logs; hidden pollution (BAU-side) requires stated assumptions, so document them.

The "clean holdout" is structurally impossible now. Cross-device exposure, walled gardens, Apple's App Tracking Transparency (ATT) and third-party cookie deprecation have made zero pollution unattainable on most digital channels. Intrinsic lift is no longer measurable; it is estimable with assumptions. Practical detection methods: matching holdout IDs against platform "delivered" logs (most reliable, requires a deterministic ID), cookie-level cross-device matching (degraded post-ATT), and probabilistic matching for unauthenticated traffic (noisy). When p is genuinely hard to pin down, propensity-matched untouched analysis is the modern alternative: model each user's propensity to be touched, then compare BAU users against untouched holdout users with similar propensity, skipping the estimate of p altogether. The trade is a different assumption: matching only removes bias from characteristics that enter the propensity model.

Methods note

Numbers throughout are illustrative. The +8% raw / +12% intrinsic example and the p = 33% pollution rate are the simplest case that surfaces the dilution, and the per-channel ranges are indicative rather than benchmarks; real pollution rates vary by channel, identity infrastructure, and test design.

Further reading
  • Localized Shift vs Overall Causal Impact
  • Adstock & attribution window considerations
  • Test Design · Power, α, p-value, tails
  • Superiority vs Non-inferiority