How We Got Here
Around 2020, someone at a conference said "ML systems should be microservices" and the entire industry nodded along without thinking. The logic seemed sound: separate services for feature computation, model training, inference, and monitoring. Clean boundaries. Independent scaling. DevOps best practices.
It was a disaster.
I've spent the last two years migrating ML systems from microservice architectures back to monoliths at three different companies. Every time, the team's velocity increased 3-5x and infrastructure costs dropped 40-60%.
{
  "type": "pipeline",
  "title": "ML Microservices Nightmare",
  "steps": [
    { "label": "Feature Service", "annotation": "gRPC", "color": "red" },
    { "label": "Training Service", "annotation": "S3", "color": "red" },
    { "label": "Model Registry", "annotation": "HTTP", "color": "red" },
    { "label": "Inference Service", "annotation": "Kafka", "color": "red" },
    { "label": "Monitoring Service", "annotation": "Webhook", "color": "red" },
    { "label": "Retraining Service", "annotation": "gRPC ↩ Feature Service", "color": "red" },
    { "label": "+ API Gateway + Redis Cache + PostgreSQL", "color": "red" }
  ]
}
Why Microservices Fail for ML
ML workloads are fundamentally different from web services:
- Data locality matters. ML operations are data-intensive. Shipping gigabytes of feature data across network boundaries for every inference call is insane.
- Tight coupling is inherent. Your feature computation, model, and post-processing are intimately coupled. Pretending they're independent services doesn't make them so.
- Debugging distributed inference is a nightmare. When your model output is wrong, is it the feature service? The serialization? The model? The post-processing? With microservices, answering this takes hours. With a monolith, it takes minutes.
- Cold start kills latency. Spinning up separate inference pods on Kubernetes adds 5-30 seconds of latency that no user will tolerate.
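The data-locality point is easy to demonstrate. Here's a toy sketch (not a real benchmark; the feature count and scoring function are made up) of the extra work every cross-service call pays: serializing features on one side and deserializing them on the other, just to end up with the same answer an in-process call gets for free.

```python
import json

# Pretend these are the features one inference call needs.
features = [float(i) for i in range(100_000)]

def score(feats):
    # Stand-in for the model: any function of the features.
    return sum(feats) / len(feats)

# Monolith path: the feature list is handed to the model directly.
in_process = score(features)

# Microservice path: features are serialized, shipped across a service
# boundary, and deserialized before the model ever sees them.
payload = json.dumps(features)
over_the_wire = score(json.loads(payload))

assert in_process == over_the_wire  # same answer, extra cost on every call
```

Swap `json` for gRPC or Kafka and the shape of the overhead is the same: encode, transmit, decode, on every single inference.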
The Majestic ML Monolith
Here's what a well-designed ML monolith looks like:
- One service that handles feature computation, inference, and post-processing
- Horizontal scaling at the service level (not the component level)
- Model files loaded at startup, hot-swapped in memory
- Feature computation done in-process with vectorized operations
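The shape of that design fits in a few dozen lines. This is a minimal sketch, not production code, and the names (`ModelServer`, `swap_model`, `predict`) are illustrative rather than from any particular system; the point is that features, inference, and post-processing live in one process, with hot-swapping handled by an in-memory reference behind a lock.

```python
import threading

class ModelServer:
    """One process: feature computation, inference, post-processing."""

    def __init__(self, model):
        self._model = model            # loaded once at startup
        self._lock = threading.Lock()  # guards hot swaps

    def swap_model(self, new_model):
        # Hot-swap in memory: requests already holding the old
        # reference finish on the old model; new requests get the new one.
        with self._lock:
            self._model = new_model

    def _features(self, raw):
        # In-process feature computation: no network hop, no serialization.
        return [x * 2.0 for x in raw]

    def predict(self, raw):
        feats = self._features(raw)
        with self._lock:
            model = self._model
        score = model(feats)
        return max(0.0, min(1.0, score))  # post-processing, same process

# Any callable works as a "model" here; real code would load weights from disk.
server = ModelServer(model=lambda f: sum(f) / len(f))
print(server.predict([0.1, 0.2, 0.3]))   # mean of doubled features, ~0.4
server.swap_model(lambda f: max(f))      # hot swap, no restart, no pod churn
print(server.predict([0.1, 0.2, 0.3]))   # ~0.6
```

When the "model" output is wrong, you set a breakpoint and step through one process. That's the whole debugging story.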
It's boring. It's simple. It works. And your team can actually debug it without a PhD in distributed systems.
