From Prediction to Intervention: The Missing Half of ML
You've built a model that predicts. Now what? The architecture of decision-making looks nothing like the architecture of forecasting.
There’s a moment in every ML project where the architecture changes. You stop thinking about features and start thinking about decisions. You stop asking “what patterns exist in the data?” and start asking “what interventions are available to us?”
This shift is subtle but profound. It changes everything: the math you use, the data you collect, the way you evaluate success. Yet most ML curricula, most ML teams, most ML pipelines never make this shift. They optimize for prediction accuracy and call it done.
This is a mistake. And it’s costing organizations billions in missed opportunities.
The Prediction-Intervention Gap
Let’s be clear about what prediction does and doesn’t do.
A predictive model answers: “What is likely to happen?” It looks at historical data, learns patterns, and forecasts future outcomes. If you build a model that predicts customer churn with 90% accuracy, you’ve solved the prediction problem. You know which customers are likely to leave.
But here’s the gap: knowing who will churn is not the same as knowing what to do about it.
Should you offer a discount? A personalized message? A premium feature? Different customers respond to different interventions. Some are price-sensitive. Others value recognition. Still others are simply done with your product. A churn prediction model doesn’t distinguish between these groups. It just says: “This customer will churn.”
The intervention question is different. It asks: “If we do X, what will happen?” This requires causal reasoning, not just pattern recognition. It requires understanding not just what is correlated with outcomes, but what causes outcomes.
This distinction matters everywhere. A healthcare system predicts which patients will have adverse events—but doesn’t know which intervention prevents them. A marketing team predicts which customers will convert—but doesn’t know which message drives conversion. A nonprofit predicts which families will escape poverty—but doesn’t know which program actually helps them.
The prediction-intervention gap is the reason billions are spent on programs that don’t work, campaigns that don’t convert, and treatments that don’t help. We optimize for forecasting accuracy and assume that translates to better decisions. It doesn’t.
Why the Architectures Differ
The architectural difference between prediction and intervention is fundamental.
Predictive models optimize for accuracy on held-out data. You train on historical examples, validate on a test set, and deploy the model that minimizes your chosen loss function. The objective is clear: predict the outcome as accurately as possible.
The data structure is simple: features and labels. You have X (features) and y (outcome), and you learn the relationship between them. Whether that relationship is causal or correlational doesn’t matter for prediction. If shoe size predicts height, that’s fine—you’re not trying to change height by changing shoe size.
Causal models optimize for intervention effectiveness. You’re not trying to predict what will happen under the current conditions. You’re trying to understand what will happen if you change something. The objective is different: identify which interventions work for which people.
The data structure is more complex. You need to understand not just features and outcomes, but the causal relationships between them. You need to distinguish between:
Confounders (variables that affect both treatment and outcome)
Mediators (variables through which treatment affects outcome)
Colliders (variables affected by both treatment and outcome)
This is why causal graphs matter. A feature matrix tells you correlations. A causal graph tells you causal relationships. And causal relationships are what determine whether an intervention will work.
Consider a simple example. You’re building a model to predict whether someone will buy a premium subscription. Your features include: age, income, previous purchase history, email engagement, and time since signup.
In a predictive model, all these features are equivalent. They’re just inputs to your model. If email engagement is highly predictive, great—use it.
In a causal model, you need to ask: why is email engagement predictive? Is it because engaged users are more likely to buy (direct effect)? Or is it because engaged users are already interested in your product, and that interest is what drives both engagement and purchase (confounding)?
If it’s the former, then showing more emails might increase purchases. If it’s the latter, then showing more emails won’t help—the underlying interest is what matters.
This is the causal reasoning that predictive models skip. And it’s essential for intervention.
The Tools: Uplift, HTE, and Policy Learning
Once you understand that prediction and intervention require different architectures, you can start building the right tools.
Uplift models are the bridge between prediction and intervention. Instead of predicting the outcome under current conditions, uplift models predict the effect of an intervention on an individual. They answer: “If we treat this person, how much better will their outcome be compared to if we don’t treat them?”
Mathematically, this is the treatment effect for an individual i:

τ_i = Y_i(1) − Y_i(0)

It's the difference between the individual's outcome with treatment, Y_i(1), and their outcome without it, Y_i(0).
Uplift models are typically built using one of several approaches (a code sketch of the two-model approach follows this list):
Two-model approach: Train separate models for treated and control groups, then compute the difference
Single-model approach: Train one model with treatment as a feature, then compute the difference between predictions with T=1 and T=0
Causal forests: Use random forests with modifications to estimate heterogeneous treatment effects
X-learner, S-learner, T-learner: Meta-learners that combine multiple models to estimate treatment effects
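To make the two-model approach concrete, here is a minimal sketch in Python using scikit-learn. The data is synthetic and the column names (treatment, converted) are illustrative, not from any particular dataset; the sketch assumes a randomized binary treatment.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "age": rng.integers(18, 70, n),
    "income": rng.normal(50_000, 15_000, n),
    "treatment": rng.integers(0, 2, n),  # randomized binary treatment
})
# Synthetic outcome: the treatment only helps lower-income customers
p = 0.2 + 0.1 * df["treatment"] * (df["income"] < 50_000)
df["converted"] = (rng.random(n) < p).astype(int)

features = ["age", "income"]
treated = df[df["treatment"] == 1]
control = df[df["treatment"] == 0]

# Two-model approach: fit one outcome model per arm
model_t = GradientBoostingClassifier().fit(treated[features], treated["converted"])
model_c = GradientBoostingClassifier().fit(control[features], control["converted"])

# Estimated uplift per person: P(convert | treated) - P(convert | control)
df["uplift"] = (model_t.predict_proba(df[features])[:, 1]
                - model_c.predict_proba(df[features])[:, 1])
print(df["uplift"].describe())
```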
The key insight is that different people respond differently to interventions. These differences are called heterogeneous treatment effects (HTE). Some customers respond to discounts. Others respond to personalization. Still others don't respond to anything. An uplift model captures this heterogeneity.
Once you have uplift estimates for each person, you can make better decisions. Instead of treating everyone the same way, you can target interventions to people who will actually respond to them.
Policy learning takes this one step further. Instead of just estimating treatment effects, policy learning optimizes which intervention to apply to each person. It answers: “For each person, which intervention will produce the best outcome?”
This is a different optimization problem. You’re not maximizing accuracy. You’re maximizing the total value of interventions across your population. You might have multiple interventions available (discount, personalization, premium feature, nothing), and you want to assign each person to the intervention that will help them most.
Policy learning algorithms include:
Q-learning: Learn the value of each action in each state, then choose the action with highest value
Doubly robust learning: Combine outcome regression and propensity score weighting to estimate optimal policies
Contextual bandits: Sequentially learn which interventions work for which contexts
The beauty of policy learning is that it’s adaptive. As you learn which interventions work, you can update your policy. You’re not stuck with a static model—you’re continuously optimizing based on new data.
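As a sketch of what the simplest possible policy looks like, suppose you already have per-person uplift estimates for each candidate intervention. The array below is entirely hypothetical, with a "do nothing" column pinned at zero uplift; a greedy plug-in policy just picks the argmax per person:

```python
import numpy as np

interventions = ["nothing", "discount", "personalization", "premium_feature"]

# Hypothetical uplift estimates, shape (n_people, n_interventions);
# in practice each column would come from an uplift model for that arm
uplift = np.array([
    [0.0,  0.08, 0.01, -0.02],   # responds to discounts
    [0.0, -0.01, 0.05,  0.00],   # responds to personalization
    [0.0,  0.00, 0.00,  0.00],   # responds to nothing
    [0.0,  0.02, 0.03,  0.09],   # responds to the premium feature
])

# Greedy policy: assign each person the intervention with highest estimated uplift
policy = [interventions[i] for i in uplift.argmax(axis=1)]
print(policy)
# ['discount', 'personalization', 'nothing', 'premium_feature']
```

Real policy learners (doubly robust methods, contextual bandits) are more careful about estimation error, budgets, and exploration, but the decision rule has this basic shape.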
Causal Graphs vs Feature Matrices
Here’s where the conceptual shift becomes concrete.
A feature matrix is what you’re used to working with. Rows are observations, columns are features. You feed it to your model, and out comes a prediction. Simple, scalable, works great for prediction.
A causal graph is different. It’s a directed acyclic graph (DAG) where nodes are variables and edges represent causal relationships. It explicitly encodes your assumptions about how the world works.
For example, consider a simple causal graph:
Age → Income
Income → Purchase
Email_Engagement → Purchase

This says: Age affects Income. Income affects Purchase. Email Engagement affects Purchase.
But now consider a more complex graph:
Age → Income
Income → Email_Engagement
Income → Purchase
Email_Engagement → Purchase

This says: Age affects Income. Income affects both Email Engagement and Purchase. Email Engagement also affects Purchase.
In this second graph, Email Engagement is a mediator—it’s part of the causal pathway from Income to Purchase. If you want to increase purchases, you could either increase income or increase email engagement. But they’re not independent—email engagement is partially determined by income.
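One practical consequence is that the graph is a first-class object you can build and query in code. Here is a sketch of the second graph using networkx, with variable names mirroring the example above:

```python
import networkx as nx

g = nx.DiGraph([
    ("Age", "Income"),
    ("Income", "Email_Engagement"),
    ("Income", "Purchase"),
    ("Email_Engagement", "Purchase"),
])

assert nx.is_directed_acyclic_graph(g)

# Prints both causal paths from Income to Purchase: the direct edge
# and the path mediated by Email_Engagement
print(list(nx.all_simple_paths(g, "Income", "Purchase")))
```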
This distinction matters for intervention. If you’re trying to increase purchases, you need to know whether to target income, email engagement, or both. A feature matrix doesn’t tell you this. A causal graph does.
Building causal graphs requires domain knowledge. You need to think about your problem, talk to domain experts, and encode your assumptions about causality. This is harder than just throwing data at a model. But it’s essential for getting intervention right.
Do-Calculus: A Light Touch
For those interested in the formal framework, do-calculus provides a mathematical language for reasoning about interventions.
The core idea is simple: there’s a difference between observation and intervention.
When you observe data, you’re looking at the world as it is. When you intervene, you’re changing something and seeing what happens.
In causal inference, we use the do operator to represent intervention:

P(Y | do(X = x))

This means “the probability of Y if we intervene to set X to x.” It is different from the observational quantity:

P(Y | X = x)

which means “the probability of Y given that we observe X = x.”
The difference is subtle but crucial. If you observe that people who take a medication recover faster, that doesn’t mean the medication causes recovery. Maybe sick people take the medication, and they recover anyway. Observation confuses correlation with causation.
But if you intervene—randomly assign people to take the medication or not—then you can estimate the causal effect.
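A tiny simulation makes the gap concrete. In the sketch below (all numbers invented), sickness is a confounder: sick people are more likely to take the medication and less likely to recover, while the medication itself does nothing. Conditioning makes the medication look harmful; randomizing reveals the true null effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

sick = rng.random(n) < 0.3                            # confounder
takes_med = rng.random(n) < np.where(sick, 0.9, 0.1)  # sick people take the med
recovers = rng.random(n) < np.where(sick, 0.4, 0.9)   # the med has NO effect here

# Observational P(recover | X): the medication looks harmful
print(recovers[takes_med].mean(), recovers[~takes_med].mean())  # ~0.50 vs ~0.88

# Interventional P(recover | do(X)): randomization breaks the
# sick -> medication link, and the true (zero) effect appears
assigned = rng.random(n) < 0.5
print(recovers[assigned].mean(), recovers[~assigned].mean())    # both ~0.75
```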
Do-calculus provides rules for computing do-probabilities from observational data when you have a causal graph. The three rules allow you to:
Remove observations that are independent of the intervention
Swap interventions and observations under certain conditions
Remove interventions under certain conditions
These rules let you reason formally about when you can estimate causal effects from observational data, and when you need randomized experiments.
For most practitioners, you don’t need to dive deep into do-calculus. But it’s useful to know it exists—it’s the formal foundation for causal inference.
From Theory to Practice
So how do you actually build this?
Step 1: Define your causal graph. What are the variables in your system? How do they relate causally? This requires domain knowledge and conversations with stakeholders. Don’t skip this step.
Step 2: Identify confounders. What variables affect both your treatment and outcome? You need to measure and control for these. If you don’t, your treatment effect estimates will be biased.
Step 3: Estimate treatment effects. Use an uplift modeling approach to estimate how each person responds to your intervention. Causal forests are a good starting point—they’re flexible, interpretable, and don’t require you to specify a parametric model.
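As one way to do this in Python, here is a sketch using the econml package's CausalForestDML estimator on synthetic data; the package choice and the randomized binary treatment are assumptions of the sketch, not requirements of the method:

```python
import numpy as np
from econml.dml import CausalForestDML

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 3))   # features
T = rng.integers(0, 2, n)     # randomized binary treatment
# Heterogeneous effect: the treatment only helps when the second feature is positive
Y = X @ np.array([0.5, 0.2, -0.1]) + T * (X[:, 1] > 0) + rng.normal(size=n)

est = CausalForestDML(discrete_treatment=True, random_state=0)
est.fit(Y, T, X=X)

cate = est.effect(X)          # per-person treatment effect estimates
print(cate[:5])
```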
Step 4: Validate your estimates. If you have randomized data (A/B tests), use it to validate your causal estimates. If you only have observational data, use sensitivity analysis to understand how robust your estimates are to violations of your assumptions.
Step 5: Optimize your policy. Once you have treatment effect estimates, use policy learning to decide which intervention to apply to each person. Start simple (e.g., treat the top 20% by estimated effect) and iterate.
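The “treat the top 20%” starting point is one line of thresholding on the estimated effects. A sketch, using a toy stand-in for the cate array from Step 3:

```python
import numpy as np

# Per-person effect estimates, e.g. the `cate` array from the Step 3 sketch
cate = np.array([0.12, -0.03, 0.30, 0.01, 0.07, 0.22, 0.00, -0.10, 0.15, 0.04])

threshold = np.quantile(cate, 0.80)  # treat the top 20% by estimated effect
treat = cate >= threshold
print(threshold, treat)
```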
Step 6: Monitor and iterate. Deploy your policy, measure outcomes, and update your causal graph and treatment effect estimates as you learn more.
This is harder than building a predictive model. It requires more domain knowledge, more careful thinking about causality, and more validation. But the payoff is enormous: instead of predicting what will happen, you’re optimizing what to do.
Why This Matters
The prediction-intervention gap isn’t just a technical problem. It’s a strategic problem.
Every organization has a prediction problem and an intervention problem. You want to predict which customers will churn and decide which intervention will prevent churn. You want to predict which patients will have adverse events and decide which treatment prevents them. You want to predict which programs will succeed and decide which to fund.
Most organizations solve the prediction problem and ignore the intervention problem. They build accurate models and assume that translates to better decisions. It doesn’t.
The organizations that win are the ones that solve both. They predict what will happen, but more importantly, they understand what interventions will change it. They optimize not for forecasting accuracy, but for decision quality.
This is why causal inference is becoming essential. It’s not a niche academic tool anymore. It’s a competitive advantage.
The architecture of decision-making looks nothing like the architecture of forecasting. But once you understand the difference, you can build systems that actually work—systems that don’t just predict the future, but shape it.
Further Reading
For those interested in diving deeper:
Causal Inference: The Mixtape by Scott Cunningham (free online) — Excellent introduction to causal inference with practical examples
Causal Inference for the Brave and True by Matheus Facure — Practical guide to causal inference in Python
Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning by Künzel et al. — Technical deep dive into meta-learners for HTE