Bayesian bridge — test → observations · Marketing Measurement Field Guide

Figure 06 · Bayesian bridge

A test measures reality at one moment. Observations help us infer whether it has shifted since.

Prior. Likelihood. Posterior. The bridge from then to now.

A measured iROAS is a point in time. MMM (Marketing Mix Modeling) regresses outcomes on spend over time — its β coefficient is the marginal effect per dollar implied by the daily data. Bayesian updating fuses the test result (set as a prior on β) with the daily likelihood to estimate today's iROAS. The industry name for this procedure is calibration: the test calibrates the MMM. The bridge assumes the MMM β estimates the same quantity the test estimated — true under linear, non-saturated, non-interacting spend; less true the further you stretch from those.
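As a rough illustration of what "regresses outcomes on spend" means, the sketch below fits GMV ≈ β × spend + organic by ordinary least squares on simulated daily data and reports β with its standard error. Everything in it is an assumption made for illustration: the function names, the simulated numbers, and the single-channel model with no covariates, saturation, or adstock. A production MMM is considerably richer.

```python
import math
import random


def simulate_days(spend_sd, n_days=90, true_beta=10.0, organic=5000.0,
                  noise_sd=300.0, seed=1):
    """Simulate daily (spend, GMV) pairs; all numbers are made up."""
    rng = random.Random(seed)
    spend = [200.0 + rng.gauss(0.0, spend_sd) for _ in range(n_days)]
    gmv = [organic + true_beta * x + rng.gauss(0.0, noise_sd) for x in spend]
    return spend, gmv


def ols_beta_and_se(spend, gmv):
    """Least-squares slope of GMV on spend, plus its standard error."""
    n = len(spend)
    mean_x = sum(spend) / n
    mean_y = sum(gmv) / n
    sxx = sum((x - mean_x) ** 2 for x in spend)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, gmv))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x  # plays the role of "organic"
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(spend, gmv))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se


# Same noise, same true beta; only the day-to-day variation in spend differs.
beta_varied, se_varied = ols_beta_and_se(*simulate_days(spend_sd=30.0))
beta_flat, se_flat = ols_beta_and_se(*simulate_days(spend_sd=1.0))
print(f"varied spend: beta = {beta_varied:.2f}, SE = {se_varied:.2f}")
print(f"flat spend:   beta = {beta_flat:.2f}, SE = {se_flat:.2f}")
```

The two runs differ only in how much spend varies from day to day. The flat-spend run produces a far larger standard error, because the slope formula divides by the variation in spend: when spend barely moves, the daily data say little about β.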

Step 1 · two readings of reality

Prior belief

Measured · Jan 2026

Causal RCT — holdout vs BAU. A direct measurement of iROAS at the time of the test.

15.0× iROAS · 95% CI [12 — 18]

Measured · causal

Slow · authoritative · stale by design

Recent observations

Observed · Apr 2026

Daily readouts of GMV ≈ β × spend + organic. β is the implied iROAS each day.

~10× implied β · spread [7 — 13]

Observed · in-flight

Fast · noisy · timely

Step 2 · the rationale

The test was a snapshot of reality. Has reality shifted since?

The Jan test gave us a true number for that moment. Today — months later — the question is whether the underlying iROAS is still the same. To answer it, we ask a simple question of the new data: “if nothing had changed, how likely would we be to see what we're seeing?” If it would be very unlikely, that rarity is the signal that reality has moved.

Mechanically, we score every candidate truth (iROAS = 8, 10, 12, 15, 18, …) by how well it fits two things at once: the recent data (the likelihood) and the Jan test (the prior). The scenario most compatible with both wins, not the one that best fits the data alone, which would simply track the recent ~10×. The result isn't a single number; it's a distribution of how compatible each scenario is with the test and the data together.

Scenario A · unlikely

Reality unchanged · iROAS = 15×

In this world, daily β should cluster near 15. We're seeing it cluster near 10. The observed data would be rare if this scenario were true.

P(iROAS = 15 | data) — low

Scenario B · most plausible

Partial shift · iROAS ≈ 12×

Compatible with both sides — close enough to the prior that the test still informs us, close enough to recent days that the data is not surprising.

P(iROAS = 12 | data) — high

Scenario C · unlikely

Full collapse · iROAS = 7×

Daily β would cluster near 7, well below the ~10 we observe. Possible, but the prior strongly disagrees, and 7 sits at the very edge of the recent spread.

P(iROAS = 7 | data) — low
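The scoring of scenarios A, B, and C can be sketched numerically. The example below is a minimal sketch, not the MMM itself: it assumes the Jan test and the daily β can each be summarized as a normal distribution, and it reads both [12, 18] and [7, 13] as 95% intervals. The helper names and the candidate grid are illustrative.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sd_from_95ci(lo, hi):
    """Assumption: the interval is a symmetric 95% normal interval."""
    return (hi - lo) / (2 * 1.96)


PRIOR_MEAN, PRIOR_SD = 15.0, sd_from_95ci(12.0, 18.0)  # Jan test
DATA_MEAN, DATA_SD = 10.0, sd_from_95ci(7.0, 13.0)     # recent daily beta


def score_candidates(candidates):
    """Weight each candidate iROAS by prior x likelihood, then normalize."""
    raw = {
        c: normal_pdf(c, PRIOR_MEAN, PRIOR_SD)      # how much we believed c
        * normal_pdf(DATA_MEAN, c, DATA_SD)         # how well c predicts the data
        for c in candidates
    }
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}


scores = score_candidates([7.0, 8.0, 10.0, 12.0, 15.0, 18.0])
for candidate, weight in scores.items():
    print(f"iROAS = {candidate:>4.0f}x  weight = {weight:.4f}")
```

On this grid the candidate at 12× receives the largest weight, 15× a much smaller one, and 7× almost none, which matches the ordering of the three scenarios. The weights are relative to the chosen grid, not absolute probabilities.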

Method

Bayesian regression, sampled with MCMC. The MMM doesn't pick a single iROAS — it draws thousands of plausible coefficient sets, scoring each by prior × likelihood, and keeps what survives. The posterior is the cloud of values consistent with both the test and the recent data.
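To make "draws thousands of plausible values, scored by prior × likelihood" concrete, here is a toy random-walk Metropolis sampler for a single β. It is a sketch under strong assumptions: one parameter only, normal prior and normal likelihood summary, both intervals read as 95%, and hand-picked step size and burn-in. Real MMM tooling samples many parameters at once with more sophisticated samplers.

```python
import math
import random
import statistics

PRIOR_MEAN, PRIOR_SD = 15.0, 3.0 / 1.96   # Jan test, 95% CI [12, 18]
DATA_MEAN, DATA_SD = 10.0, 3.0 / 1.96     # daily beta, spread [7, 13]


def log_posterior(beta):
    """Unnormalized log of prior x likelihood for a candidate beta."""
    log_prior = -0.5 * ((beta - PRIOR_MEAN) / PRIOR_SD) ** 2
    log_likelihood = -0.5 * ((DATA_MEAN - beta) / DATA_SD) ** 2
    return log_prior + log_likelihood


def metropolis(n_draws=20000, burn_in=2000, step=1.5, start=15.0, seed=0):
    rng = random.Random(seed)
    beta, current = start, log_posterior(start)
    draws = []
    for _ in range(n_draws + burn_in):
        proposal = beta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal)
        # Always accept uphill moves; accept downhill moves with
        # probability equal to the posterior ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        draws.append(beta)
    return draws[burn_in:]  # drop the warm-up draws


draws = metropolis()
post_mean = statistics.mean(draws)
post_sd = statistics.stdev(draws)
print(f"posterior mean ~ {post_mean:.2f}, sd ~ {post_sd:.2f}")
```

The retained draws are the "cloud": values near the region that both the prior and the data support are visited often, and values that either side rules out are rarely accepted.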

Step 3 · the answer

Posterior · today's truth

Estimated · Apr 2026

Bayesian update fuses prior & likelihood. Tighter, shifted, current.

~12× iROAS · CI [10.5 — 13.5]

Estimated current truth

What MMM acts on today

P(iROAS | data) ∝ P(data | iROAS) · P(iROAS)

posterior ∝ likelihood · prior

Each candidate iROAS is weighted by how much we already believed it (prior) and how well it predicts the recent data (likelihood). The peak of that product is the posterior estimate.
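For the special case where both the prior and the likelihood summary are treated as normal (an assumption made here for illustration, with symbols introduced for this sketch), the product has a closed form. Write the test prior as mean μ₀ with standard deviation σ₀, and the daily estimate as β̂ with standard error s:

```latex
\[
p(\beta \mid \text{data}) \;\propto\; p(\text{data} \mid \beta)\, p(\beta)
\]
\[
\beta \sim \mathcal{N}(\mu_0, \sigma_0^2), \qquad
\hat{\beta} \mid \beta \sim \mathcal{N}(\beta, s^2)
\]
\[
\mu_{\text{post}} = \frac{\mu_0/\sigma_0^2 + \hat{\beta}/s^2}{1/\sigma_0^2 + 1/s^2},
\qquad
\sigma_{\text{post}}^2 = \frac{1}{1/\sigma_0^2 + 1/s^2}
\]
```

The posterior mean is a precision-weighted average, so whichever side is more certain pulls harder. If both [12, 18] and [7, 13] are read as 95% intervals (σ₀ = s ≈ 1.53), the formula gives a posterior mean of 12.5 with σ_post ≈ 1.08, or a 95% interval of roughly [10.4, 14.6]. That is close to the rounded ≈12× shown above, with a somewhat wider interval under this particular reading of the inputs.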

Prior weight is a knob

The posterior shown above isn't the data's answer — it's your answer for a particular prior weight. A tight prior (high confidence in the Jan test) pulls the posterior near 15×; a weak prior lets daily data dominate and the posterior sits near 10×. Best practice: use the test posterior (point estimate plus uncertainty) as the prior, not just the point estimate. And decay the prior weight with elapsed time — a 3-month-old test on a stable channel is still strong; a 12-month-old test on a fast-moving channel is mostly noise.
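One way to see the knob in action is to inflate the prior's uncertainty as the test ages and watch the posterior slide toward the daily data. The sketch below assumes a normal-normal update and a hypothetical decay rule in which the prior's precision halves every `half_life_months`. Neither the rule nor the 6-month value comes from the source; they are placeholders to be tuned per channel.

```python
def conjugate_update(prior_mean, prior_sd, data_mean, data_sd):
    """Normal-normal update: precision-weighted average of prior and data."""
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / data_sd ** 2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, post_var ** 0.5


def aged_prior_sd(prior_sd, age_months, half_life_months):
    """Hypothetical decay: prior precision halves every half-life."""
    return prior_sd * 2.0 ** (age_months / (2.0 * half_life_months))


PRIOR_MEAN, PRIOR_SD = 15.0, 3.0 / 1.96   # Jan test, 95% CI [12, 18]
DATA_MEAN, DATA_SD = 10.0, 3.0 / 1.96     # daily beta, spread [7, 13]
HALF_LIFE_MONTHS = 6.0                    # placeholder value

posterior_by_age = {}
for age in (0, 3, 6, 12):
    sd = aged_prior_sd(PRIOR_SD, age, HALF_LIFE_MONTHS)
    posterior_by_age[age] = conjugate_update(PRIOR_MEAN, sd, DATA_MEAN, DATA_SD)[0]
    print(f"test age {age:>2} months -> posterior mean {posterior_by_age[age]:.2f}x")
```

With a fresh test the posterior sits midway between 15× and 10×; as the test ages, its weight shrinks and the posterior moves toward the daily estimate. A shorter half-life suits fast-moving channels, a longer one suits stable channels.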

When to re-test, not average

Weighted averaging is the right move when the test and the daily β are consistent within their uncertainty. When they disagree by more than that uncertainty allows — intervals barely overlapping, or more than ~2 pooled SE apart (e.g., test says 15×, daily says 5×) — the gap usually signals something broke: marketing strategy changed, the channel saturated, seasonality shifted, or the MMM specification is wrong. And note the asymmetry: the prior is a causal RCT estimate while the likelihood is an observational MMM coefficient — so a large gap can mean the β is biased (omitted variables, spend–demand collinearity), not that reality moved. Posterior = 10× is then a statistical compromise without causal meaning — investigate the discrepancy before acting on the average.
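The "~2 pooled SE" rule can be written as a small check to run before any averaging. This sketch assumes "pooled SE" means the standard error of the difference between two independent estimates, sqrt(se_test² + se_daily²). The function names and the 2.0 threshold default are illustrative. The daily standard error for the 5× example is also an assumption, since the text gives only the point value.

```python
import math


def gap_in_pooled_se(test_mean, test_se, daily_mean, daily_se):
    """Distance between the two estimates, in units of the SE of their difference."""
    pooled_se = math.sqrt(test_se ** 2 + daily_se ** 2)
    return abs(test_mean - daily_mean) / pooled_se


def should_investigate(test_mean, test_se, daily_mean, daily_se, threshold=2.0):
    """True when the estimates disagree by more than their uncertainty allows."""
    return gap_in_pooled_se(test_mean, test_se, daily_mean, daily_se) > threshold


TEST_SE = 3.0 / 1.96    # from the 95% CI [12, 18]
DAILY_SE = 3.0 / 1.96   # assumed: same half-width as the test interval

gap = gap_in_pooled_se(15.0, TEST_SE, 5.0, DAILY_SE)
flag = should_investigate(15.0, TEST_SE, 5.0, DAILY_SE)
print(f"test 15x vs daily 5x: gap = {gap:.1f} pooled SE, investigate = {flag}")
```

When the check fires, the posterior is still computable, but the sensible next step is to look for the cause of the gap (strategy change, saturation, seasonality, MMM bias) before trusting the blend.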

Likelihood mechanics

The "Spread [7–13]" above expresses the standard error of the implied β under the daily regression, not the range of single-day observations. If your channel's spend doesn't vary much over time, the MMM cannot identify β: the likelihood is nearly flat and the posterior just sits at the prior. The posterior reported here (≈12×, CI [10.5, 13.5]) is roughly the normal-normal conjugate result for these inputs; the actual MCMC samples from a higher-dimensional posterior that also includes covariate coefficients, saturation, and adstock parameters.

Takeaway. A test gives you a snapshot of reality at one point in time; new observations show whether, and by how much, reality might have shifted. The prior anchors the answer with causal authority. The daily data pulls it toward the present. Bayesian regression, explored via MCMC, finds the iROAS region most compatible with both sides. Three guard conditions: (1) the prior weight is your choice, so decay it with time; (2) if the disagreement is large, re-test instead of averaging; (3) the bridge is only valid when MMM β estimates the same quantity as test iROAS.

The practitioner pipeline. Run a causal incrementality test for the channel and metric you care about. Pass the test posterior — both point and CI — into your MMM as an informative prior on the corresponding β, with a weight that down-weights as the test ages. Run the MMM weekly or monthly; the posterior updates with new daily data. Set a re-test trigger: when the posterior drifts more than, say, 2 SE from the prior, the test is stale enough to commission a new one. The Bayesian bridge isn't a permanent substitute for testing — it's the machine that lets you act between tests, and the alarm bell that says when the next test is due.
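The re-test trigger at the end of the pipeline could look like the sketch below. It reads "2 SE from the prior" as two prior standard deviations away from the prior mean, which is one possible interpretation. The sequence of refreshed posterior means is invented purely to show the alarm firing.

```python
PRIOR_MEAN = 15.0
PRIOR_SD = 3.0 / 1.96   # from the test's 95% CI [12, 18]


def retest_due(prior_mean, prior_sd, posterior_mean, threshold_se=2.0):
    """Return (flag, drift), with drift measured in prior standard deviations."""
    drift = abs(posterior_mean - prior_mean) / prior_sd
    return drift > threshold_se, drift


# Hypothetical posterior means from four successive MMM refreshes.
refreshed_posteriors = [14.6, 13.8, 12.9, 11.4]

first_flag = None
for refresh, post_mean in enumerate(refreshed_posteriors, start=1):
    due, drift = retest_due(PRIOR_MEAN, PRIOR_SD, post_mean)
    print(f"refresh {refresh}: posterior {post_mean:.1f}x, "
          f"drift {drift:.2f} SE, re-test due: {due}")
    if due and first_flag is None:
        first_flag = refresh
```

In this made-up sequence the first three refreshes stay inside the threshold and the fourth crosses it, which is the point at which a new test would be commissioned.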