Teaching the Machine to Play Ball
How soccer reveals the art of exploratory data analysis — and what finance can learn from it.
Every dataset has a heartbeat, but you only notice it once you stop forcing it to speak.
In 2016, Leicester City won the Premier League at 5000-to-1 odds. 5000. To. 1.
Analysts called it luck. Commentators called it magic. But when we later looked at the numbers — expected goals, pass recovery positions, pressing intensity — a different story appeared. Leicester hadn’t defied probability. They’d simply played a different game, one that the models hadn’t yet learned to see.
That’s when I realized something fundamental: data is not static. It flows, drifts, collides, and cooperates — much like players on a pitch. You can’t understand it from the stands; you have to step onto the field.
And before you try to predict what will happen, you have to learn to see what is happening.
That’s what this piece — and, in many ways, the book I’m writing — is about: teaching the machine to play ball.
Incidentally (though not coincidentally, pun intended), I’m coauthoring a book on this exact topic with my delightful colleagues Haipeng Gao, Weining Shen, and Guanyu Hu! But this piece is meant to stand on its own, whether it inspires you to read the book or not.
The Analyst’s Playing Field
Every game needs a field. Every analysis does too.
A soccer pitch is 105 by 68 meters — measured, trimmed, and bounded. An analyst’s field is a Jupyter notebook: a controlled environment where logic, creativity, and curiosity meet. Before a match begins, the referee inspects the turf. Before analysis begins, we clean the data, set up the environment, and check for uneven ground.
It’s easy to underestimate how much of analysis is infrastructure: installing Python, setting up virtual environments, loading libraries. Yet these invisible steps define the quality of everything that follows.
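Inspecting the turf before kickoff can be as simple as a few lines in a terminal. A minimal sketch of that invisible setup work, assuming `python3` is on your PATH (the library choices are just illustrative):

```shell
# Create an isolated environment for the analysis
python3 -m venv .venv

# Activate it (POSIX shells; on Windows use .venv\Scripts\activate)
. .venv/bin/activate

# Then install the notebook's toolkit (needs network access):
# python -m pip install pandas matplotlib jupyter
```

It looks trivial, and that is the point: the referee's walk across the pitch is boring precisely because it prevents interesting problems later.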
When the tools are aligned, the flow becomes effortless — like a team that knows where each player will be before the ball arrives.
In soccer, structure doesn’t stifle creativity; it enables it. The same is true in data work. When you respect the boundaries of your field, you gain the freedom to play beautifully inside it.
Seeing the Game Before Playing It
Before a coach designs tactics, they watch the match tape. Before a data scientist builds a model, they explore the data. Exploratory Data Analysis—EDA—is our pre-match reconnaissance, the art of understanding the terrain before we decide how to attack.
In soccer analytics, data comes in layers. Match-level data offers the scoreboard view: who played, who won, and how often. Player-level data zooms closer, revealing performance patterns—passes completed, distance covered, goals scored. Event-level data brings us right onto the pitch, tracing every touch, tackle, and shot.
Each level carries its own rhythm. Match data tells us what happened, player data tells us who did it, and event data tells us how it unfolded. The parallels to finance are uncanny: market-level trends, portfolio-level performance, and transaction-level detail. You can’t predict returns from a single trade, just as you can’t infer a team’s style from a single pass, yet each micro-moment matters.
EDA teaches patience. Instead of rushing to fit models, you listen first: what’s missing, what’s noisy, what’s beautifully consistent. You draw histograms not to decorate a report but to watch the data breathe. Scatterplots become maps of intention, revealing where the flow of play bends or breaks.
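That listening step is concrete, not mystical. A minimal sketch in pandas, using a made-up match-level table (the column names and values are illustrative, not from a real dataset):

```python
import pandas as pd

# Hypothetical match-level data: the scoreboard view.
matches = pd.DataFrame({
    "home_goals": [2, 1, 0, 3, 1, None],
    "away_goals": [1, 1, 2, 0, 1, 2],
    "attendance": [32000, 31500, None, 33000, 30500, 31000],
})

# Step 1: what's missing? Listen before you model.
missing = matches.isna().sum()

# Step 2: what's noisy, what's consistent? Basic distributional summaries.
summary = matches.describe()

print(missing["home_goals"], missing["attendance"])   # one gap in each
print(round(summary.loc["mean", "away_goals"], 2))    # 1.17
```

Two lines of reconnaissance already tell you where the ground is uneven before a single model touches the data.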
EDA, at its best, is not a procedure but a conversation—a slow unfolding of understanding between analyst and data. It’s the moment when curiosity replaces control.
When Features Become Tactics
Once you’ve learned to see, the next step is to decide how to act. In soccer, that means shaping tactics. In analytics, it means engineering features.
Feature engineering isn’t about adding columns to a spreadsheet; it’s about encoding context—transforming what we believe matters into variables the model can understand. In soccer, we might quantify momentum through rolling averages of goals scored, account for home advantage by separating performance by venue, or measure shot quality through expected-goals values.
Finance plays the same game under different names. Rolling volatility windows mirror rolling form metrics. Regime effects stand in for home advantage. Expected returns play the role of xG. Each is a hypothesis, rendered in numbers.
A good feature is an act of translation. It captures an intuition—that recent performance matters more than distant history, that conditions shape behavior, that quality often hides behind quantity—and gives it mathematical voice. In that sense, feature engineering is the tactical layer of data science: a 4-3-3 formation written in code, every variable aware of its role in the larger strategy.
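Here is what two of those translations might look like in pandas, on an invented match log for a single team (names and numbers are illustrative). Note the `shift(1)`: each feature only sees matches already played, so the tactic never peeks at the future:

```python
import pandas as pd

# Illustrative match log for one team, in chronological order.
log = pd.DataFrame({
    "goals":   [2, 0, 1, 3, 1, 2],
    "is_home": [1, 0, 1, 0, 1, 0],
})

# Momentum: rolling average of goals over the previous 3 matches,
# shifted so each row uses only information available before kickoff.
log["form_3"] = log["goals"].shift(1).rolling(3).mean()

# Home advantage: average past goals, computed separately by venue.
log["venue_form"] = (
    log.groupby("is_home")["goals"]
       .transform(lambda s: s.shift(1).expanding().mean())
)

print(log.round(2).to_string())
```

Each column is a hypothesis written down: recent form matters, venue matters. The model will tell us whether the hypothesis was any good.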
When the Model Starts to Think
Then comes the moment when the patterns we’ve shaped begin to take on a life of their own. We feed our carefully designed features into a model—perhaps a neural network—and watch it learn.
A neural network resembles a team in formation. Inputs are players; weights are the chemistry that links them; activations are their decisions in motion. Each layer processes the flow, passing information forward until, at the final whistle, a prediction emerges.
Traditional models are like strict managers: if X then Y. Neural networks are more like dynamic squads. They improvise, adapt, discover. They find patterns we didn’t specify—connections we might never have seen ourselves. In doing so, they remind us that intelligence isn’t about rules; it’s about relationships.
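To make the metaphor concrete, here is the forward pass of a toy two-layer network in NumPy. The weights are random rather than trained, so this is a sketch of the flow of information, not a working predictor:

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy "squad": 4 input features feed a hidden layer of 3 units,
# which feeds a single output -- the prediction at the final whistle.
x = rng.normal(size=4)          # the inputs: players on the ball
W1 = rng.normal(size=(3, 4))    # chemistry linking inputs to hidden units
W2 = rng.normal(size=(1, 3))    # chemistry linking hidden units to the output

def relu(z):
    return np.maximum(0.0, z)   # each unit's decision in motion

hidden = relu(W1 @ x)                 # the first layer processes the flow
logit = (W2 @ hidden)[0]              # information passed forward
prob = 1.0 / (1.0 + np.exp(-logit))   # sigmoid: a win probability in [0, 1]

print(0.0 < prob < 1.0)  # True
```

Training replaces the random chemistry with learned chemistry, nudging every weight toward combinations that improvise well together. But the shape of the play, inputs flowing through layers toward a prediction, is exactly this simple.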
The beauty of deep learning is not that it imitates the human brain, but that it forces us to re-examine our own. When the model spots something we missed, it invites humility. What did it notice that we ignored? What bias or blind spot do our human heuristics still carry? Teaching a machine to play ball is ultimately about learning to think in motion ourselves—to reason probabilistically, contextually, with the same fluid awareness that great players bring to the pitch.
The Bottom Line: Pattern Before Prediction
Every analysis, whether in sport or finance, follows the same quiet rhythm. First we prepare the field, then we explore, then we design tactics, and finally we play.
Modeling is often described as automation, but it’s really choreography. It’s the orchestration of relationships, the transformation of data into dialogue. Each stage mirrors a human virtue: discipline, curiosity, creativity, and courage.
To teach a machine to play ball is to blend all four. It demands rigor but rewards playfulness. It turns logic into motion, algorithms into art.
As I’ve been writing Soccer Analytics with Machine Learning, I’ve come to see it not as a sports manual but as a meditation on how we think with data—how we can combine structure with intuition, play with precision, and curiosity with care.
We analyze because we care. We model because we wonder. And if we do both well, the data begins to move again.
A rough, unedited early release of our book is out now! Check it out here.
Reads of The Week
For Wangari Digest readers tracking the future of sports and data, this piece by Viniit Mehta spotlights how AI-driven analytics are reshaping performance and investment trends. The $91M acquisition of IMPECT by Catapult Sports shows the rising value of proprietary sports data. Meanwhile, Indian startups like StepOut (football) and Rally Vision (squash) are making elite-level insights accessible, positioning themselves as key players in the fast-growing, exit-friendly sports tech space.
This research-backed piece by Nicole Williams explores how machine learning could transform project and portfolio management. Instead of just streamlining tasks, ML enables smarter forecasting, dynamic resource allocation, and early risk detection—turning portfolios into adaptive systems. It’s a compelling vision of how data and foresight could replace gut instinct and hindsight in how big projects get done.
If you are aiming to break into data engineering or sharpen your systems thinking, this piece by Dmitry Anoshin is a goldmine. It breaks down how to tackle data system design interviews with clarity and structure, offering a six-layer framework that covers everything from data sources to ML integration. Whether you’re prepping for your next role or building real systems, it’s a practical, well-organized guide to thinking like a senior data engineer.