Lesson 5
Caching — Why & When
Learn how caching speeds up applications and reduces database load, and which tradeoffs cache design involves.
20 min read · Beginner
The speed layer
Imagine you run a popular news site. Every time someone loads the homepage, your server queries the database for the latest headlines. With 10 users, that is fine. With 10,000 users per second, your database buckles — even though the headlines only change every few minutes.
Caching solves this by storing copies of frequently accessed data in a fast storage layer. Instead of hitting the database every time, the server checks the cache first. If the data is there (a cache hit), it returns immediately. If not (a cache miss), it fetches from the database, stores the result in the cache, and returns it.
Caching is one of the highest-impact optimizations in system architecture. It is also one of the trickiest, because cached data can become stale or inconsistent with the source of truth.
Cache layers in a system
Caches exist at every level of a system, each trading freshness for speed. Layers closer to the user respond faster but sit further from the source of truth. On a cache miss, the request falls through to the next layer toward the database:
| Layer | Speed | Typical TTL | What gets cached |
|---|---|---|---|
| Browser | Fastest | Hours to days | CSS, JS, images, fonts |
| CDN | Very fast | Minutes to hours | Static assets, API responses |
| App cache (Redis) | Fast | Seconds to minutes | Query results, sessions |
| DB buffer pool | Fast | Managed by DB | Frequently accessed pages |
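The browser and CDN rows are usually controlled through the HTTP `Cache-Control` response header. The sketch below shows one way to pick a header value per path. The helper name `cache_control_for`, the path rules, and the specific TTL values are assumptions chosen for illustration to fall inside the ranges in the table, not a recommended policy.

```python
def cache_control_for(path: str) -> str:
    """Return a Cache-Control header value for a response path (illustrative policy)."""
    static_suffixes = (".css", ".js", ".png", ".jpg", ".woff2")
    if path.endswith(static_suffixes):
        # Static assets: browsers and CDNs may both keep a copy for one day.
        return "public, max-age=86400"
    if path.startswith("/api/"):
        # max-age=0: browsers treat the response as stale right away.
        # s-maxage=300: shared caches such as a CDN may reuse it for 5 minutes.
        return "public, max-age=0, s-maxage=300"
    # Anything else is not stored by any cache.
    return "no-store"
```

Keeping `max-age` and `s-maxage` separate lets the CDN absorb traffic while browsers still check back often.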
Cache-aside pattern
Cache-aside is the most common application-level caching strategy. The application manages the cache explicitly: it checks the cache first and, on a miss, reads the database and populates the cache.
Read path: Check cache → if miss, query DB → store in cache → return.
Write path: Update DB → delete cache entry (invalidate).
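The read and write paths can be sketched in a few lines. This is a minimal in-process illustration, assuming a plain dictionary as the cache and two hypothetical callables, `load_from_db` and `save_to_db`, standing in for real database access. A production setup would typically use a shared cache such as Redis instead of a local dict.

```python
from typing import Any, Callable, Dict


class CacheAside:
    """Cache-aside: the application checks, fills, and invalidates the cache itself."""

    def __init__(
        self,
        load_from_db: Callable[[str], Any],
        save_to_db: Callable[[str, Any], None],
    ) -> None:
        self._load_from_db = load_from_db
        self._save_to_db = save_to_db
        self._cache: Dict[str, Any] = {}

    def read(self, key: str) -> Any:
        if key in self._cache:
            return self._cache[key]        # cache hit
        value = self._load_from_db(key)    # cache miss: go to the database
        self._cache[key] = value           # populate for the next reader
        return value

    def write(self, key: str, value: Any) -> None:
        self._save_to_db(key, value)       # update the source of truth first
        self._cache.pop(key, None)         # then invalidate; next read repopulates
```

Deleting the entry on write, rather than updating it, keeps the write path simple: the next read always rebuilds the entry from the database.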
Caching strategies compared
| Strategy | How it works | When to use | Tradeoff |
|---|---|---|---|
| Cache-aside | App manages cache reads/writes | General purpose, most apps | App code handles cache logic |
| Read-through | Cache loads from DB on miss | When you want simpler app code | Cache layer must know about DB |
| Write-through | Write to cache AND DB together | When consistency is critical | Slower writes |
| Write-behind | Write to cache, async flush to DB | High write volume | Risk of data loss on crash |
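The difference between the last two rows is easiest to see side by side. The sketch below is a simplified illustration, assuming a dictionary stands in for the database and that `flush` is called by some background job. The class names are hypothetical.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class WriteThroughCache:
    """Every write goes to the database and the cache before returning."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.db[key] = value       # synchronous database write (the slow part)
        self.cache[key] = value    # cache stays consistent with the database


class WriteBehindCache:
    """Writes land in the cache at once and reach the database later."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.cache: Dict[str, Any] = {}
        self.pending: Deque[Tuple[str, Any]] = deque()

    def write(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.pending.append((key, value))   # lost if the process crashes here

    def flush(self) -> int:
        """Apply queued writes to the database; returns how many were applied."""
        flushed = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
            flushed += 1
        return flushed
```

Until `flush` runs, the database lags behind the cache, which is exactly the data-loss window the table warns about.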
In practice
For a typical web app, start with cache-aside + TTL:
- Cache user profiles for 5 minutes
- Cache product listings for 60 seconds
- Invalidate on write (delete the cache key when data changes)
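A TTL cache for the three rules above might look like the following. It is a minimal in-memory sketch: the class name `TTLCache`, the constants, and the injectable `clock` parameter (handy for tests) are assumptions for illustration. With Redis, the equivalent is to set a key with an expiry and delete it on write.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

PROFILE_TTL_SECONDS = 5 * 60   # user profiles: 5 minutes
LISTING_TTL_SECONDS = 60       # product listings: 60 seconds


class TTLCache:
    """In-memory cache whose entries expire after a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None on a miss or an expired entry."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: drop it and report a miss
            return None
        return value

    def delete(self, key: str) -> None:
        """Invalidate on write: remove the key so the next read reloads it."""
        self._entries.pop(key, None)
```

Because `get` uses `None` to signal a miss, this sketch cannot cache `None` itself; a real implementation would use a sentinel for that case.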
Cache hit rate matters
The effectiveness of a cache is measured by its hit rate — the percentage of requests served from cache.
A 90% hit rate means only 1 in 10 requests reaches the database. That is a 10x reduction in database load.
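The arithmetic is worth spelling out, because small gains at the high end matter a lot. The helper below is a hypothetical illustration that takes the hit rate as a whole-number percentage to keep the math exact.

```python
def db_requests_per_sec(total_rps: int, hit_rate_percent: int) -> float:
    """Requests per second that miss the cache and reach the database."""
    if not 0 <= hit_rate_percent <= 100:
        raise ValueError("hit_rate_percent must be between 0 and 100")
    return total_rps * (100 - hit_rate_percent) / 100


print(db_requests_per_sec(10_000, 90))  # 1000.0
print(db_requests_per_sec(10_000, 99))  # 100.0
```

Going from a 90% to a 99% hit rate cuts database traffic by another factor of ten, so the last few percentage points are often where tuning pays off.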
Hit rate depends on:
- Cache size — can you fit the hot data in memory?
- TTL — how long before cached entries expire?
- Access patterns — is the same data requested repeatedly?
- Eviction policy — when full, what gets removed? (LRU is common)
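To make the eviction bullet concrete, here is a minimal LRU (least recently used) cache built on the standard library's `OrderedDict`. The class name and the `evictions` counter are assumptions for illustration. For memoizing function calls, Python also ships `functools.lru_cache`.

```python
from collections import OrderedDict
from typing import Any, Hashable


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.evictions = 0
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, default: Any = None) -> Any:
        if key not in self._data:
            return default
        self._data.move_to_end(key)      # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # drop the least recently used entry
            self.evictions += 1
```

With a capacity of 2, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, because `a` was touched more recently.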
The thundering herd problem
When a popular cache entry expires, thousands of requests can hit the database simultaneously — all seeing a cache miss at the same time. This is the thundering herd (or cache stampede).
Solutions:
| Technique | How it works |
|---|---|
| Probabilistic early expiration | Refresh cache before TTL expires, staggered randomly |
| Mutex / lock | Only one request rebuilds cache; others wait |
| Stale-while-revalidate | Serve stale data while refreshing in background |
| Longer TTL + event invalidation | Don’t rely on expiration; invalidate on data change |
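The mutex row can be sketched with a lock and a second cache check. This is a simplified, single-process illustration: the class name `SingleFlightCache` and the `loader` callable are hypothetical, and one global lock is used for brevity where a real system would more likely lock per key or use a distributed lock.

```python
import threading
from typing import Any, Callable, Dict


class SingleFlightCache:
    """On a miss, only one thread rebuilds the entry; the others wait for it."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader
        self._cache: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Any:
        if key in self._cache:              # fast path: no lock on a hit
            return self._cache[key]
        with self._lock:
            # Re-check: another thread may have filled the entry
            # while this one was waiting for the lock.
            if key in self._cache:
                return self._cache[key]
            value = self._loader(key)       # only one thread reaches the database
            self._cache[key] = value
            return value
```

The second check inside the lock is what prevents the herd: waiting threads find the fresh entry instead of each running the loader.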
For a homepage with 10,000 req/sec, a naive 60-second TTL can cause a database spike every minute. Add jitter to TTLs or use background refresh.
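Adding jitter takes only a few lines. The helper below is a hypothetical sketch that spreads a base TTL by a random fraction. The 10% default is an assumption, not a recommendation from the lesson.

```python
import random
from typing import Optional


def jittered_ttl(
    base_seconds: float,
    jitter_fraction: float = 0.1,
    rng: Optional[random.Random] = None,
) -> float:
    """Return base_seconds shifted by up to +/- jitter_fraction of itself."""
    spread = base_seconds * jitter_fraction
    uniform = rng.uniform if rng is not None else random.uniform
    return base_seconds + uniform(-spread, spread)
```

With a 60-second base and the default fraction, entries expire somewhere between 54 and 66 seconds after being set. Jitter helps when many keys, or many servers' local copies, would otherwise expire together; a single hot key in a shared cache still needs a lock or a background refresh.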
Cache invalidation: the hard problem
There is a famous quote, usually attributed to Phil Karlton: “There are only two hard things in Computer Science: cache invalidation and naming things.” When underlying data changes, your cache still holds the old version until something expires or removes it.
Common approaches:
- TTL-based expiration — entries expire after a set time. Simple but data can be stale until expiration.
- Write-through — update cache whenever you update the database. Consistent but adds write latency.
- Cache-aside with invalidation — on write, update DB and delete cache entry. Next read repopulates.
In practice
Before adding caching, measure. Is the database actually your bottleneck? Use query logs and APM tools. Cache the top 5 slowest or most frequent queries first. A single well-placed cache often matters more than caching everything.
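One way to find caching candidates is to rank queries by the total time they consume, which accounts for both slow queries and frequent ones. The function below is a rough sketch: it assumes you can export `(query, duration_ms)` pairs from your query log or APM tool, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_queries_by_total_time(
    samples: Iterable[Tuple[str, float]], n: int = 5
) -> List[Tuple[str, int, float]]:
    """Return up to n (query, call_count, total_ms) tuples, costliest first."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for query, duration_ms in samples:
        totals[query] += duration_ms
        counts[query] += 1
    ranked = sorted(totals, key=lambda q: totals[q], reverse=True)
    return [(q, counts[q], totals[q]) for q in ranked[:n]]
```

A query that takes 50 ms but runs thousands of times a minute will rank above a 2-second report that runs once an hour, which usually matches where a cache helps most.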
Key takeaways
- Caching stores frequently accessed data in fast storage to reduce load on slower systems
- Caches exist at every layer — browser, CDN, application, and database
- Cache-aside is the default pattern for application-level caching
- Hit rate determines effectiveness — optimize for high hit rates on hot data
- Thundering herd is a real problem — use jitter, locks, or background refresh
- Cache invalidation is hard — plan your strategy before you need it
Common mistakes
- Caching everything — unique, one-off queries see no benefit
- Setting TTL too long — users see stale data; too short defeats the purpose
- Ignoring cache stampedes — popular entries expiring simultaneously can crash your database
- Caching without monitoring — track hit rate, memory usage, and eviction count