Lesson 5

Caching — Why & When

Learn how caching speeds up applications and reduces database load, and what tradeoffs cache design involves.

20 min read · Beginner

The speed layer

Imagine you run a popular news site. Every time someone loads the homepage, your server queries the database for the latest headlines. With 10 requests per second, that is fine. With 10,000 requests per second, your database buckles, even though the headlines only change every few minutes.

Caching solves this by storing copies of frequently accessed data in a fast storage layer. Instead of hitting the database every time, the server checks the cache first. If the data is there (a cache hit), it returns immediately. If not (a cache miss), it fetches from the database, stores the result in the cache, and returns it.

Caching is one of the highest-impact optimizations in system architecture. It is also one of the trickiest, because cached data can become stale or inconsistent with the source of truth.

Cache layers in a system

Caches exist at every level, each trading freshness for speed:

Figure: cache layers. Layers closer to the user respond faster but sit further from the source of truth. On a cache miss, the request falls through to the next layer (left to right in the diagram).

Layer	Speed	Typical TTL	What gets cached
Browser	Fastest	Hours to days	CSS, JS, images, fonts
CDN	Very fast	Minutes to hours	Static assets, API responses
App cache (Redis)	Fast	Seconds to minutes	Query results, sessions
DB buffer pool	Fast	Managed by DB	Frequently accessed pages

Cache-aside pattern

The most common application-level caching strategy:

Figure: cache-aside pattern. The application manages the cache explicitly: it checks the cache and, on a miss, reads the database and populates the cache.

Read path: check cache → on a miss, query DB → store in cache → return.
Write path: update DB → delete the cache entry (invalidate).

Caching strategies compared

Strategy	How it works	When to use	Tradeoff
Cache-aside	App manages cache reads/writes	General purpose, most apps	App code handles cache logic
Read-through	Cache loads from DB on miss	When you want simpler app code	Cache layer must know about DB
Write-through	Write to cache AND DB together	When consistency is critical	Slower writes
Write-behind	Write to cache, async flush to DB	High write volume	Risk of data loss on crash
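
To make the difference between the last two rows concrete, here is a rough sketch of write-through and write-behind. The class names and the explicit `flush` method are assumptions made for illustration; a real write-behind cache would flush asynchronously on a timer or queue.

```python
class WriteThroughCache:
    """Every write goes to the backing store and the cache before returning."""

    def __init__(self, store):
        self.store = store        # hypothetical stand-in for the database
        self.cache = {}

    def set(self, key, value):
        self.store[key] = value   # synchronous DB write: slower, but durable
        self.cache[key] = value


class WriteBehindCache:
    """Writes land in the cache; a later flush pushes them to the store."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()        # keys written but not yet persisted

    def set(self, key, value):
        self.cache[key] = value   # fast: no DB round trip on the write path
        self.dirty.add(key)

    def flush(self):
        # Called explicitly here; anything still in `dirty` is lost on a crash.
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()


through_db, behind_db = {}, {}
wt = WriteThroughCache(through_db)
wb = WriteBehindCache(behind_db)
wt.set("views", 1)
wb.set("views", 1)
print(through_db, behind_db)      # {'views': 1} {}
wb.flush()
print(behind_db)                  # {'views': 1}
```

The window between `set` and `flush` is where the "risk of data loss on crash" in the table comes from.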

In practice

For a typical web app, start with cache-aside + TTL:

Cache user profiles for 5 minutes
Cache product listings for 60 seconds
Invalidate on write (delete the cache key when data changes)
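
A small in-process TTL cache shows how those three rules fit together. This is a sketch under stated assumptions: the `profile:` and `listing:` key prefixes, the 60-second default, and the injectable `clock` parameter (used so the example runs without waiting) are all illustrative choices, not part of any particular library.

```python
import time

# TTLs from the list above; the key prefixes are hypothetical naming choices.
TTL_SECONDS = {"profile": 300, "listing": 60}
DEFAULT_TTL = 60  # assumed fallback for any other key type


class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None                  # miss: never cached or invalidated
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]       # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        kind = key.split(":", 1)[0]
        ttl = TTL_SECONDS.get(kind, DEFAULT_TTL)
        self._entries[key] = (value, self._clock() + ttl)

    def invalidate(self, key):
        self._entries.pop(key, None)     # call this whenever the data changes


now = [0.0]                              # fake clock so the example is instant
ttl_cache = TTLCache(clock=lambda: now[0])
ttl_cache.set("listing:shoes", ["sneaker", "boot"])
ttl_cache.set("profile:42", {"name": "Ada"})
now[0] = 61
print(ttl_cache.get("listing:shoes"))    # None (expired after 60 s)
print(ttl_cache.get("profile:42"))       # {'name': 'Ada'} (valid until 300 s)
ttl_cache.invalidate("profile:42")       # invalidate on write
print(ttl_cache.get("profile:42"))       # None
```

With Redis as the cache, the same idea is usually expressed by setting an expiry on each key and deleting the key on write.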

Cache hit rate matters

The effectiveness of a cache is measured by its hit rate — the percentage of requests served from cache.

A 90% hit rate means only 1 in 10 requests reaches the database. That is a 10x reduction in database load.
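
The arithmetic is simply database load = request rate × (1 − hit rate). The helper below, whose name is made up for this example, works it through for a 10,000 requests-per-second site:

```python
def db_requests_per_second(total_rps, hit_rate):
    """Requests that miss the cache and therefore reach the database."""
    return total_rps * (1 - hit_rate)


for hit_rate in (0.50, 0.90, 0.99):
    load = db_requests_per_second(10_000, hit_rate)
    print(f"{hit_rate:.0%} hit rate -> {load:,.0f} DB requests/s")
```

Note that moving from 90% to 99% cuts database load by another factor of ten, which is why the last few points of hit rate are often worth chasing.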

Hit rate depends on:

Cache size — can you fit the hot data in memory?
TTL — how long before cached entries expire?
Access patterns — is the same data requested repeatedly?
Eviction policy — when full, what gets removed? (LRU is common)
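
As an illustration of the eviction policy mentioned in the last item, here is a minimal LRU (least recently used) cache built on Python's `collections.OrderedDict`. The class and its `capacity` parameter, counted in entries rather than megabytes, are simplifications for the sketch.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of entries
        self._data = OrderedDict()        # oldest use first, newest use last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)       # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")                # "a" is now the most recently used
lru.put("c", 3)             # over capacity: evicts "b"
print(list(lru._data))      # ['a', 'c']
```

LRU works well when recently requested data is likely to be requested again, which matches the "hot data" access pattern described above.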

The thundering herd problem

When a popular cache entry expires, thousands of requests can hit the database simultaneously — all seeing a cache miss at the same time. This is the thundering herd (or cache stampede).

Solutions:

Technique	How it works
Probabilistic early expiration	Refresh cache before TTL expires, staggered randomly
Mutex / lock	Only one request rebuilds cache; others wait
Stale-while-revalidate	Serve stale data while refreshing in background
Longer TTL + event invalidation	Don’t rely on expiration; invalidate on data change

For a homepage with 10,000 req/sec, a naive 60-second TTL can cause a database spike every minute. Add jitter to TTLs or use background refresh.
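
Adding jitter can be as small as one helper function. The sketch below assumes a symmetric spread of ±10% around the base TTL; both the function name and the spread value are illustrative choices.

```python
import random


def ttl_with_jitter(base_ttl, spread=0.1, rng=random):
    """Return base_ttl shifted by up to +/- spread (a fraction of base_ttl)."""
    return base_ttl * (1 + rng.uniform(-spread, spread))


rng = random.Random(0)            # seeded only to make the sketch repeatable
ttls = [ttl_with_jitter(60, spread=0.1, rng=rng) for _ in range(5)]
print([round(t, 1) for t in ttls])   # five values between 54 and 66 seconds
```

Jitter spreads out expirations for entries that were cached at the same moment. For a single very hot key, it is usually combined with a lock or background refresh, since every request for that key still sees the same expiry.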

Cache invalidation: the hard problem

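There is a famous quote, usually attributed to Phil Karlton: “There are only two hard things in Computer Science: cache invalidation and naming things.” Invalidation is hard because, when the underlying data changes, your cache still holds the old version until something removes or replaces it.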

Common approaches:

TTL-based expiration — entries expire after a set time. Simple but data can be stale until expiration.
Write-through — update cache whenever you update the database. Consistent but adds write latency.
Cache-aside with invalidation — on write, update DB and delete cache entry. Next read repopulates.

In practice

Before adding caching, measure. Is the database actually your bottleneck? Use query logs and APM tools. Cache the top 5 slowest or most frequent queries first. A single well-placed cache often matters more than caching everything.
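
One rough way to pick candidates is to aggregate a query log by total time spent. The log below is invented sample data, and ranking by total milliseconds (frequency times cost) is one reasonable choice among several; your APM tool may already provide this view.

```python
from collections import defaultdict

# Hypothetical query log: (query name, duration in milliseconds).
query_log = [
    ("get_headlines", 120), ("get_user", 5), ("get_headlines", 130),
    ("search", 400), ("get_user", 6), ("get_headlines", 110),
]

totals = defaultdict(lambda: {"count": 0, "total_ms": 0})
for name, ms in query_log:
    totals[name]["count"] += 1
    totals[name]["total_ms"] += ms

# Rank by total time: a cheap query run constantly can outweigh a rare slow one.
ranked = sorted(totals.items(), key=lambda kv: kv[1]["total_ms"], reverse=True)
for name, stats in ranked[:5]:
    print(name, stats["count"], stats["total_ms"])
```

Queries near the top of this ranking that also return the same result to many users are the strongest caching candidates.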

Key takeaways

Caching stores frequently accessed data in fast storage to reduce load on slower systems
Caches exist at every layer — browser, CDN, application, and database
Cache-aside is the default pattern for application-level caching
Hit rate determines effectiveness — optimize for high hit rates on hot data
Thundering herd is a real problem — use jitter, locks, or background refresh
Cache invalidation is hard — plan your strategy before you need it

Common mistakes

Caching everything — unique, one-off queries see no benefit
Choosing the wrong TTL: too long and users see stale data; too short and the hit rate drops, which defeats the purpose
Ignoring cache stampedes — popular entries expiring simultaneously can crash your database
Caching without monitoring — track hit rate, memory usage, and eviction count
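
Tracking hit rate does not need much machinery. As a minimal sketch, a wrapper with two counters is enough to start; the class name and attributes below are hypothetical, and managed caches such as Redis expose similar counters of their own.

```python
class InstrumentedCache:
    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def set(self, key, value):
        self._data[key] = value

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


stats_cache = InstrumentedCache()
stats_cache.get("a")              # miss
stats_cache.set("a", 1)
stats_cache.get("a")              # hit
stats_cache.get("a")              # hit
print(f"{stats_cache.hit_rate:.0%}")   # 67%
```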

Go deeper

Redis Documentation — the most popular in-memory cache
AWS Caching Best Practices — caching strategies in cloud architectures
ByteByteGo: How Discord Stores Billions of Messages — real-world caching at scale