Why I’m Teaching the Boring Parts of AI

The real innovation in enterprise AI isn’t happening in the models. It’s happening in the orchestration, evaluation, and governance layers.

May 19, 2026

AI, these days, is less about science and more about organizational systems (though I enjoy the science-heavy part, too). Image generated with Leonardo AI

When I transitioned from theoretical particle physics to enterprise AI, I expected the challenges to be primarily mathematical. I anticipated spending my days fine-tuning neural network architectures, optimizing hyperparameters, and debating the merits of different attention mechanisms.

I was wrong.

The mathematics of modern AI are undeniably fascinating, but they are increasingly commoditized. The foundation models available via API today are more capable than anything a typical enterprise could build from scratch. The real challenge—the problem that actually prevents organizations from realizing value from AI—is not the model itself. It is everything that surrounds the model.

It is the orchestration. It is the evaluation. It is the governance.

In short, it is the “boring” parts of AI. And these boring parts are exactly what I have decided to focus on teaching.

The Glamour vs. The Reality

The AI industry has a glamour problem. The discourse is dominated by discussions of parameter counts, benchmark scores, and the existential implications of artificial general intelligence. This is the exciting, visionary side of the field.

But when you sit down with a Chief Actuary at a major insurance firm, or a Head of Compliance at a global bank, the conversation shifts dramatically. They do not care about the latest benchmark on a generic reasoning task. They care about auditability. They care about data privacy. They care about whether a system will hallucinate a regulatory filing that could result in a multimillion-dollar fine.

They are grappling with the reality of deploying probabilistic systems into deterministic business environments.

This is where the glamour fades and the hard engineering begins. Building a robust AI system requires solving problems that are decidedly unsexy but absolutely critical. As the IFoA GenAI Working Party points out, the complexity and autonomy of a network of AI agents add a new dimension to risks, making them harder to manage and requiring dynamic governance frameworks.

Orchestration: The Unsung Hero

Consider orchestration. A single prompt to an LLM is rarely sufficient to complete a complex enterprise task. Real-world workflows require multiple steps: retrieving data from disparate sources, validating that data, processing it through various models, handling errors, and formatting the final output.

Designing these multi-step agentic workflows is an exercise in systems engineering. It requires choosing the right orchestration pattern—whether a sequential chain, a parallel fan-out, or a complex graph-based approach. It demands robust error handling, retry logic, and state management.
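To make this concrete, here is a minimal sketch of the simplest of those patterns, a sequential chain with per-step retries and a shared state object. Everything in it is an assumption made for illustration: `RunState`, `run_chain`, and the three toy steps are hypothetical names, not part of any particular orchestration framework, and a production system would add timeouts, persistence, and more selective exception handling.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical type for illustration: a step reads the shared data and
# returns the new keys it wants to add to it.
Step = Callable[[dict[str, Any]], dict[str, Any]]


@dataclass
class RunState:
    """Carries data between steps and keeps a log of what happened."""
    data: dict[str, Any] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


def run_chain(steps: list[tuple[str, Step]], state: RunState,
              max_attempts: int = 3, backoff_s: float = 0.0) -> RunState:
    """Run steps in order; retry a failing step up to max_attempts times."""
    for name, step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                state.data.update(step(state.data))
                state.log.append(f"{name}: ok (attempt {attempt})")
                break
            except Exception as exc:
                state.log.append(f"{name}: failed (attempt {attempt}): {exc}")
                if attempt == max_attempts:
                    raise RuntimeError(f"step '{name}' exhausted retries") from exc
                time.sleep(backoff_s * attempt)  # linear backoff between tries
    return state


# Toy steps standing in for retrieval, validation, and formatting.
calls = {"n": 0}


def retrieve(data: dict[str, Any]) -> dict[str, Any]:
    calls["n"] += 1
    if calls["n"] == 1:  # simulate one transient failure
        raise TimeoutError("source unavailable")
    return {"rows": [1, 2, 3]}


def validate(data: dict[str, Any]) -> dict[str, Any]:
    if not data["rows"]:
        raise ValueError("no rows retrieved")
    return {"valid": True}


def summarize(data: dict[str, Any]) -> dict[str, Any]:
    return {"report": f"{len(data['rows'])} rows processed"}


state = run_chain(
    [("retrieve", retrieve), ("validate", validate), ("summarize", summarize)],
    RunState(),
)
print(state.data["report"])
print("\n".join(state.log))
```

The point of the sketch is less the loop itself than the log: every attempt, successful or not, leaves a trace that later evaluation and governance work can build on.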

When an orchestration layer is designed well, it is invisible. The system simply works. But getting to that point requires rigorous architectural thinking that goes far beyond writing a clever prompt. As Gary Marcus highlights, [autonomous agents are often vulnerable to subtle but dangerous tool-chaining attacks](https://garymarcus.substack.com/p/breaking-autonomous-agents-are-a), which shows that orchestration is not just about functionality but also about security [2]. If an agent is granted access to a database and an email client without strict guardrails, a simple prompt injection can turn a helpful assistant into a massive security breach.
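One simple form such a guardrail could take is a policy check that sits between the agent and its tools. The sketch below is illustrative only: the tool names (`query_database`, `send_email`), the `ToolGuard` class, and the rule itself (no outbound message once sensitive data has been read in the same session) are assumptions, not a description of any specific product or of the attacks Marcus discusses.

```python
from dataclasses import dataclass, field

# Hypothetical tool names and policy; a real deployment would define its own.
SENSITIVE_READS = {"query_database"}
EXTERNAL_WRITES = {"send_email"}


@dataclass
class ToolGuard:
    allowed_tools: set[str]
    touched_sensitive: bool = False
    audit: list[str] = field(default_factory=list)

    def authorize(self, tool: str) -> bool:
        """Return True if the call may proceed; record every decision."""
        if tool not in self.allowed_tools:
            self.audit.append(f"deny {tool}: not on allowlist")
            return False
        if tool in EXTERNAL_WRITES and self.touched_sensitive:
            self.audit.append(f"deny {tool}: sensitive data already read")
            return False
        if tool in SENSITIVE_READS:
            self.touched_sensitive = True
        self.audit.append(f"allow {tool}")
        return True


guard = ToolGuard(allowed_tools={"query_database", "send_email"})
requested = ["send_email", "query_database", "send_email", "delete_table"]
decisions = [guard.authorize(tool) for tool in requested]
print(decisions)
print("\n".join(guard.audit))
```

In practice a denied call would more likely be routed to a human for approval than dropped outright, but the principle is the same: the decision is made by deterministic code outside the model, so a prompt injection cannot talk its way past it.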

Evaluation: Beyond the Vibe Check

Then there is evaluation. How do you know if an AI system is actually performing well?

In the early days of generative AI, evaluation often consisted of a “vibe check”—running a few queries and subjectively deciding if the answers looked reasonable. This is entirely inadequate for enterprise deployment.

Decision-grade evaluation requires a multi-dimensional framework. We must measure not just accuracy, but reliability, latency, and cost. We must build automated test suites that evaluate output characteristics against golden datasets. We must implement regression testing to ensure that an update to a prompt or a model does not silently degrade performance on edge cases.

Building a comprehensive evaluation scorecard is tedious work. It requires defining specific, measurable metrics and establishing acceptable thresholds for each. But without it, deploying an AI system is essentially flying blind. You cannot improve what you cannot measure, and in the context of enterprise AI, failing to measure performance accurately is a dereliction of duty.
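A scorecard of this kind can start very small. The following sketch assumes a golden dataset has already been run through the system and that each result records the expected answer, the actual answer, latency, and cost. The metric names, the threshold values, and the exact-match definition of accuracy are all placeholders chosen for illustration; real thresholds depend on the use case.

```python
from dataclasses import dataclass


@dataclass
class Result:
    expected: str
    actual: str
    latency_s: float
    cost_usd: float


# Hypothetical thresholds: ("min", x) means the metric must be >= x,
# ("max", x) means it must be <= x.
THRESHOLDS = {
    "accuracy": ("min", 0.90),
    "avg_latency_s": ("max", 2.0),
    "avg_cost_usd": ("max", 0.05),
}


def scorecard(results: list[Result]) -> dict[str, float]:
    """Aggregate per-example results into scorecard metrics."""
    n = len(results)
    return {
        "accuracy": sum(r.expected == r.actual for r in results) / n,
        "avg_latency_s": sum(r.latency_s for r in results) / n,
        "avg_cost_usd": sum(r.cost_usd for r in results) / n,
    }


def violations(card: dict[str, float],
               thresholds: dict[str, tuple[str, float]]) -> list[str]:
    """List every metric that breaks its threshold."""
    failed = []
    for metric, (kind, limit) in thresholds.items():
        value = card[metric]
        ok = value >= limit if kind == "min" else value <= limit
        if not ok:
            failed.append(f"{metric}={value:.3f} violates {kind} {limit}")
    return failed


# Toy golden-set run: three exact matches, one miss.
golden_run = [
    Result("approved", "approved", 1.0, 0.01),
    Result("rejected", "rejected", 1.5, 0.01),
    Result("approved", "rejected", 0.5, 0.01),
    Result("escalate", "escalate", 1.0, 0.01),
]

card = scorecard(golden_run)
failed = violations(card, THRESHOLDS)
print(card)
print(failed or "all thresholds met")
```

Run in a CI pipeline after every prompt or model change, a check like this turns a silent regression into a failed build.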

Governance: The Prerequisite for Trust

Finally, there is governance. In regulated industries, an AI system must be auditable. If a system makes a recommendation or generates a report, the organization must be able to trace exactly how that output was produced. What data was used? What logical steps were taken?
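At the implementation level, traceability begins with recording each step as it happens. Here is a minimal sketch of a tamper-evident audit trail, where every entry stores the inputs and output of one step plus a hash that chains it to the previous entry. The `AuditTrail` class and its field names are hypothetical, and the example inputs are invented; a production system would also need secure storage, timestamps, and access controls.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _digest(payload: dict) -> str:
    """Stable SHA-256 hash of a JSON-serializable payload."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class AuditTrail:
    entries: list[dict] = field(default_factory=list)

    def record(self, step: str, inputs: dict, output: str) -> None:
        """Append an entry whose hash covers its content and its predecessor."""
        prev = self.entries[-1]["hash"] if self.entries else ""
        body = {"step": step, "inputs": inputs, "output": output, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev = ""
        for entry in self.entries:
            body = {k: entry[k] for k in ("step", "inputs", "output", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


trail = AuditTrail()
trail.record("retrieve", {"source": "claims_2025.csv"}, "1200 rows")
trail.record("summarize", {"model": "example-model", "rows": 1200}, "draft report v1")

intact = trail.verify()
trail.entries[0]["output"] = "1300 rows"  # simulate after-the-fact tampering
tampered_ok = trail.verify()
print(intact, tampered_ok)
```

With a record like this, the questions above (what data was used, what steps were taken) can be answered from the log rather than reconstructed from memory.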

This is where causal AI becomes essential. Unlike purely correlative models, causal models provide a transparent chain of reasoning. They allow us to understand not just what the system predicted, but why.

Implementing robust governance also means establishing clear ownership, defining escalation paths, and ensuring compliance with data privacy regulations. It is the bureaucratic scaffolding that makes trust possible. The IFoA GenAI Working Party emphasizes that static governance frameworks are doomed to fail if they do not recognize how the capabilities of these agents might change over time. Governance must be as dynamic and adaptable as the systems it seeks to control.

The Shift from Development to Operations

The transition from building a prototype to running a production system is a fundamental shift in mindset. It is the shift from development to operations.

In development, the goal is to prove that something is possible. In operations, the goal is to ensure that it happens reliably, every single time, regardless of the input or the environment. This requires a different set of skills, a different set of tools, and a different set of priorities.

It requires embracing the boring parts.

Embracing the Boring

I have come to realize that the most valuable skill in enterprise AI today is not the ability to train a model. It is the ability to operationalize one.

The teams that will win in this era are not necessarily the ones with the most advanced algorithms. They are the ones that master the boring parts. They are the ones that build resilient orchestration layers, rigorous evaluation frameworks, and transparent governance structures.

This is the work we do at Wangari. We focus on the infrastructure that makes AI reliable and auditable for complex regulatory reporting.

And it is exactly why I am so passionate about teaching these concepts. The industry needs fewer prompt engineers and more AI systems engineers. We need professionals who understand how to bridge the gap between a fragile prototype and a robust production system.

The boring parts of AI may not make headlines, but they are the foundation upon which the future of enterprise technology will be built.

Meanwhile, at Wangari

If you are ready to master the “boring” (but essential) parts of enterprise AI, my upcoming course is designed for you.

**From Demo to Production: Operationalize an Enterprise-Grade Agentic AI Reporting System** is a 6-week intensive program that focuses entirely on the operational realities of AI deployment.

We will not spend time debating model architectures. Instead, we will dive deep into orchestration patterns, decision-grade evaluation metrics, automated testing, and governance frameworks. By the end of the course, you will have developed a complete Production Blueprint for your own AI system.

The cohort begins on June 9th. If you want to move your AI initiatives out of the lab and into production, I invite you to join us.

Enrollment is open now at GenAI Academy.

Reads of the Week

Emerging Risks of Agentic AI in Actuarial Work by Nnamdi Odozi and Josh Blake: An exploration of the ethical considerations and practical implications of deploying AI agents in highly regulated environments. The authors provide a sobering look at how the autonomy of these systems introduces entirely new categories of risk that traditional governance frameworks are ill-equipped to handle.
Breaking: Autonomous Agents are a Shitshow by Gary Marcus: A critical look at the vulnerabilities of autonomous agents, particularly regarding tool-chaining attacks and security nightmares. Marcus argues that until we solve the fundamental security flaws inherent in granting LLMs access to external tools, deploying them in enterprise environments is dangerously irresponsible.
Can analysis ever be automated? by Benn Stancil: A thoughtful discussion on the challenges and catch-22s of AI analysts and the automation of data analysis. Stancil explores the paradox that while AI can generate code and run queries faster than humans, it still lacks the contextual understanding required to know which questions are actually worth asking.
