☕ Welcome to The Coder Cafe! Today, we discuss cache use cases. When we think about caching, we often focus on where it happens: client-side, server-side, or in a CDN. Yet there’s a more important question to answer first: what’s the use case? In this post, we break down two common cache use cases, reducing latency and improving capacity, and we see why the line between the two is blurrier than it seems. Get cozy, grab a coffee, and let’s begin!
A Cache for Latency
Latency is the time between when a request is sent and when a response is received. A cache for latency exists to reduce the average latency of a service.
The classic access pattern looks like this [1] (a minimal code sketch follows the list):
We check the cache first.
On a cache hit, we return the data directly without touching the backend.
On a miss, we go to the backend, store the result in the cache for future requests, and return it.
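To make the pattern concrete, here is a minimal cache-aside sketch in Python. Everything in it is illustrative: the in-process dict, the query_backend stand-in, and the 60-second TTL are assumptions, not details from a real system.

```python
import time

CACHE: dict = {}     # in-memory cache: key -> (value, expiry timestamp)
TTL_SECONDS = 60     # illustrative TTL; a real system would tune this

def query_backend(key: str) -> str:
    # Stand-in for the real database call: network round-trip, disk I/O, query execution.
    return f"value-for-{key}"

def get(key: str) -> str:
    entry = CACHE.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value                  # cache hit: the backend is never touched
    value = query_backend(key)            # cache miss: pay the full backend cost
    CACHE[key] = (value, time.time() + TTL_SECONDS)  # store for future requests
    return value
```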
Why does this reduce latency? The cache keeps data in memory, which is significantly faster to read from than a remote database that may involve network round-trips, disk I/O, and query execution. On a hit, all of that work is skipped.
In Soft vs. Hard Dependency, we introduced two kinds of dependencies:
A soft dependency is a dependency the service can operate properly without.
A hard dependency is a dependency the service cannot operate properly without.
A cache for latency is a soft dependency. If the cache becomes unavailable, requests fall through to the backend. The system keeps working, just at a higher latency. Keep this in mind, because it’s the key difference we’ll come back to.
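Here is what that soft dependency looks like in code: any cache failure is treated exactly like a miss. The FlakyCache stub below is hypothetical; it just stands in for a remote cache client that can become unreachable.

```python
class FlakyCache:
    """Hypothetical stand-in for a remote cache client that may be unreachable."""
    def __init__(self):
        self.store = {}
        self.available = True   # flip to False to simulate an outage

    def get(self, key):
        if not self.available:
            raise ConnectionError("cache down")
        return self.store.get(key)

    def set(self, key, value):
        if not self.available:
            raise ConnectionError("cache down")
        self.store[key] = value

cache_client = FlakyCache()

def query_backend(key: str) -> str:
    return f"value-for-{key}"   # stand-in for the real database call

def get_with_soft_cache(key: str) -> str:
    try:
        value = cache_client.get(key)
        if value is not None:
            return value        # hit: fast path
    except ConnectionError:
        pass                    # cache down: degrade to the backend, don't fail the request
    value = query_backend(key)
    try:
        cache_client.set(key, value)
    except ConnectionError:
        pass                    # best-effort write-back
    return value
```

Setting cache_client.available = False makes every request slower, but none of them fail. That is the soft dependency in action.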
A Cache for Capacity
A cache for capacity exists to serve higher throughput than the backend can handle on its own.
The access pattern is identical to the latency case: cache first, then backend on a miss.
So what actually makes these two different?
The difference is not in the code; it’s in what the backend can absorb. In a capacity scenario, the backend would be overwhelmed if it received all the traffic directly. The cache absorbs a large portion of the requests, keeping the backend load manageable.
This changes the nature of the dependency. If the cache goes down, the backend is suddenly hit with all the traffic it was previously shielded from. Whether the system survives depends on the backend’s own capacity. If the backend can scale fast enough, the cache is still a soft dependency: there will be a rough period, but the system recovers. If the backend can’t cope with the load, the cache becomes a hard dependency. Without it, the system fails.
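A back-of-the-envelope calculation shows why the classification hinges on the backend, not on the cache. All the numbers below are invented for illustration.

```python
def backend_load(traffic_rps: float, hit_rate: float) -> float:
    """Requests per second that actually reach the backend."""
    return traffic_rps * (1.0 - hit_rate)

TRAFFIC = 10_000           # hypothetical total traffic, requests/second
BACKEND_CAPACITY = 2_000   # hypothetical maximum the backend can absorb

steady = backend_load(TRAFFIC, hit_rate=0.95)  # 500 rps: comfortable
cold = backend_load(TRAFFIC, hit_rate=0.0)     # 10,000 rps: the cache is gone

print(f"steady state: {steady:.0f} rps (capacity: {BACKEND_CAPACITY} rps)")
print(f"cache down: {cold:.0f} rps ->",
      "fine" if cold <= BACKEND_CAPACITY else "overload: hard dependency")
```

With these numbers, a cache that looks like a pure latency optimization at a 95% hit rate becomes a hard dependency the moment it disappears: 10,000 rps against a 2,000 rps backend.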
When a Latency Cache Becomes a Capacity Cache
Here’s a question worth asking: if the access pattern for both types is identical, how do we know which one we have?
In most cases, caches are introduced to reduce latency. But here’s what can happen over time:
Our system is stable.
Cache hit rates are high, backend load is low.
Traffic grows. The backend load stays low because the cache is absorbing most of it. Nothing breaks. No alerts fire.
Six months pass. Nothing has changed: no code, no configuration, no architecture decision. And yet the cache is no longer reducing latency. It’s keeping the backend alive.
The cache didn’t change. The code didn’t change. The system grew around the cache, and the cache quietly became load-bearing.
The same risk appears when a cache goes cold. For example:
A migration to a new cache instance
A data format change that requires purging existing entries
A cache restart after maintenance
Any of these can produce a large wave of cache misses in a short window. If we were running a latency cache, we would see higher latency for a while. If we were running a capacity cache, we would see a traffic spike that the backend can’t absorb.
The unsettling part is that the code is identical in both cases. The difference only becomes visible at failure time.
How to Manage This Risk
The root problem is that teams often don’t know which type of cache they’re running. They built it for latency, and that’s still how they think about it, even as the system outgrows that assumption.
A few approaches help here:
Periodically ask: could the backend handle the current traffic if the cache were completely removed? Load testing without the cache, or estimating backend capacity against current traffic levels, gives you a concrete answer.
Treat cache hit rate as a meaningful operational signal, not just a performance metric. A sustained drop in hit rate means the backend is absorbing more traffic than usual. If that trend continues, it’s an early warning that you may be drifting toward a capacity problem.
When migrating a cache or invalidating a large portion of its data, warm the new cache before routing live traffic to it. This prevents a cold-start burst from hitting the backend all at once (see the warming sketch after this list).
Finally, once we recognize that a cache is operating as a capacity cache, we should treat it accordingly: it’s no longer optional infrastructure, and it deserves proper alerting and a clear plan for what happens if it goes down.
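As a sketch of the warming step, assuming a hypothetical cache client with a set method and a fetch function that reads from the backend:

```python
import time

def warm_cache(new_cache, hot_keys, fetch, rate_per_second: int = 100) -> None:
    """Pre-populate a fresh cache with the hottest keys before cutover.

    hot_keys would typically come from access logs or the old cache's key
    statistics; the throttle keeps the warming reads themselves from
    overwhelming the backend.
    """
    for key in hot_keys:
        new_cache.set(key, fetch(key))     # one controlled backend read per key
        time.sleep(1.0 / rate_per_second)  # throttle so warming isn't its own traffic spike
```

Only once warming completes (or the hit rate on shadow traffic looks healthy) would we route live traffic to the new instance.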
Summary
A cache for latency serves data from memory to reduce average response time. It is a soft dependency: if unavailable, the system degrades in latency but continues to work.
A cache for capacity absorbs traffic that the backend couldn’t handle on its own. It can be a soft or a hard dependency, depending on whether the backend can absorb the load without it.
Both types share the same access pattern, which makes them easy to confuse.
A latency cache can silently become a capacity cache as traffic grows, without any code change.
When a capacity cache goes cold or fails, the backend can be overwhelmed. Hit rate monitoring, periodic load testing, and cache warming are practical ways to manage this risk.
[1] Even though variations exist.