Arch Tutor

Lesson 5

Caching — Why & When

Learn how caching speeds up applications and reduces database load, and what tradeoffs cache design involves.

20 min read · Beginner

The speed layer

Imagine you run a popular news site. Every time someone loads the homepage, your server queries the database for the latest headlines. With 10 users, that is fine. With 10,000 users per second, your database buckles — even though the headlines only change every few minutes.

Caching solves this by storing copies of frequently accessed data in a fast storage layer. Instead of hitting the database every time, the server checks the cache first. If the data is there (a cache hit), it returns immediately. If not (a cache miss), it fetches from the database, stores the result in the cache, and returns it.

Caching is one of the highest-impact optimizations in system architecture. It is also one of the trickiest, because cached data can become stale or inconsistent with the source of truth.

Cache layers in a system

Caches exist at every level, each trading freshness for speed:

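Figure: Cache layers. Browser → (miss) → CDN → (miss) → App Cache → (miss) → Database

Layers closer to the user are faster but further from the source of truth. On a cache miss, the request moves one layer to the right, and the response travels back the same way.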

Layer | Speed | Typical TTL | What gets cached
Browser | Fastest | Hours to days | CSS, JS, images, fonts
CDN | Very fast | Minutes to hours | Static assets, API responses
App cache (Redis) | Fast | Seconds to minutes | Query results, sessions
DB buffer pool | Fast | Managed by DB | Frequently accessed pages

Cache-aside pattern

The most common application-level caching strategy:

Figure: Cache-aside pattern (App Server, Redis, Database)

1. App Server checks Redis
2a. Hit → return
2b. Miss → query Database
3. Database returns data
4. App Server stores the data in Redis

The application manages the cache explicitly — check cache, on miss read DB and populate.

Read path: Check cache → if miss, query DB → store in cache → return.

Write path: Update DB → delete cache entry (invalidate).

Caching strategies compared

Strategy | How it works | When to use | Tradeoff
Cache-aside | App manages cache reads/writes | General purpose, most apps | App code handles cache logic
Read-through | Cache loads from DB on miss | When you want simpler app code | Cache layer must know about DB
Write-through | Write to cache AND DB together | When consistency is critical | Slower writes
Write-behind | Write to cache, async flush to DB | High write volume | Risk of data loss on crash
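To make the difference between the last two rows concrete, here is a rough sketch using plain dicts for both the cache and the database. The class names and the explicit `flush()` call are assumptions for illustration; a real write-behind cache would flush from a background task.

```python
from collections import deque


class WriteThroughStore:
    """Every write goes to the DB and the cache in the same operation."""

    def __init__(self):
        self.cache = {}
        self.db = {}

    def write(self, key, value):
        self.db[key] = value     # synchronous DB write (the slow part)
        self.cache[key] = value  # cache is updated together with the DB


class WriteBehindStore:
    """Writes land in the cache first and reach the DB later."""

    def __init__(self):
        self.cache = {}
        self.db = {}
        self.pending = deque()   # writes not yet persisted

    def write(self, key, value):
        self.cache[key] = value  # fast: only the cache is touched
        self.pending.append((key, value))

    def flush(self):
        """Persist queued writes; called explicitly here for the demo."""
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value


wt = WriteThroughStore()
wt.write("views", 1)

wb = WriteBehindStore()
wb.write("views", 1)
unflushed = "views" not in wb.db  # a crash at this point would lose the write
wb.flush()
print(wt.db, unflushed, wb.db)
```

The window between `write()` and `flush()` is where write-behind gets both its speed and its risk of data loss.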

In practice

For a typical web app, start with cache-aside + TTL:

  • Cache user profiles for 5 minutes
  • Cache product listings for 60 seconds
  • Invalidate on write (delete the cache key when data changes)

Cache hit rate matters

The effectiveness of a cache is measured by its hit rate — the percentage of requests served from cache.

A 90% hit rate means only 1 in 10 requests reaches the database. That is a 10x reduction in database load.
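The arithmetic behind that figure can be sketched as follows. The function name is made up for illustration, and the 10,000 requests per second is borrowed from the homepage example earlier in this lesson.

```python
def db_requests_per_second(total_rps, hit_rate):
    """Requests that miss the cache and therefore reach the database."""
    return total_rps * (1 - hit_rate)


for hit_rate in (0.0, 0.5, 0.9, 0.99):
    # round() hides floating-point noise such as 999.9999999999998
    print(hit_rate, round(db_requests_per_second(10_000, hit_rate)))
```

Going from a 90% to a 99% hit rate cuts database traffic by another factor of ten, which is why the last few points of hit rate matter so much.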

Hit rate depends on:

  • Cache size — can you fit the hot data in memory?
  • TTL — how long before cached entries expire?
  • Access patterns — is the same data requested repeatedly?
  • Eviction policy — when full, what gets removed? (LRU is common)
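As an illustration of the eviction policy mentioned in the last bullet, here is a minimal LRU cache built on Python's `collections.OrderedDict`, with hit and miss counters so the hit rate can be read off directly. The class and its method names are assumptions for this sketch, not part of any particular cache product.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")       # hit; "a" becomes the most recently used entry
lru.put("c", 3)    # cache is full, so "b" is evicted
print(lru.get("b"), lru.get("a"), lru.hit_rate())
```

With a capacity of two, the entry that was touched least recently ("b") is the one that disappears, so the later lookup for it is a miss.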

The thundering herd problem

When a popular cache entry expires, thousands of requests can hit the database simultaneously — all seeing a cache miss at the same time. This is the thundering herd (or cache stampede).

Solutions:

Technique | How it works
Probabilistic early expiration | Refresh cache before TTL expires, staggered randomly
Mutex / lock | Only one request rebuilds cache; others wait
Stale-while-revalidate | Serve stale data while refreshing in background
Longer TTL + event invalidation | Don’t rely on expiration; invalidate on data change
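The mutex row can be sketched with a lock and a second cache check inside it. This is a single-process illustration under simplifying assumptions: `load_headlines_from_db` is a hypothetical slow query, the cache is a plain dict, and one global lock is used where a real system would more likely lock per key.

```python
import threading
import time

cache = {}
rebuild_lock = threading.Lock()
db_queries = 0


def load_headlines_from_db():
    """Hypothetical slow query; counts how many times it actually runs."""
    global db_queries
    db_queries += 1
    time.sleep(0.05)
    return ["headline 1", "headline 2"]


def get_headlines():
    value = cache.get("headlines")
    if value is not None:
        return value  # cache hit, no lock needed
    with rebuild_lock:  # only one thread rebuilds at a time
        value = cache.get("headlines")  # re-check: another thread may have filled it
        if value is None:
            value = load_headlines_from_db()
            cache["headlines"] = value
        return value


# Simulate 50 requests arriving while the cache entry is missing.
threads = [threading.Thread(target=get_headlines) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_queries)  # 1: the other 49 requests waited and then read the cache
```

The second check inside the lock is what prevents the herd: threads that waited find the entry already rebuilt and skip the query. Across several app servers, the lock itself would have to live in shared storage rather than in process memory.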

For a homepage with 10,000 req/sec, a naive 60-second TTL can cause a database spike every minute. Add jitter to TTLs or use background refresh.
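Jitter can be as small as one helper. In this sketch the function name and the 10% spread are assumptions chosen for illustration.

```python
import random


def ttl_with_jitter(base_seconds, jitter_fraction=0.1, rng=random):
    """Return the base TTL shifted randomly by up to +/- jitter_fraction."""
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)


rng = random.Random(42)  # seeded only so the demo is repeatable
ttls = [ttl_with_jitter(60, 0.1, rng) for _ in range(5)]
print([round(t, 1) for t in ttls])  # values spread between 54 and 66 seconds
```

Jitter helps most when many entries are written at about the same time, because their expirations no longer line up. For a single very hot key, a lock or background refresh is still needed, since that one entry will expire at some moment regardless.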

Cache invalidation: the hard problem

There is a famous quote, usually attributed to Phil Karlton: “There are only two hard things in Computer Science: cache invalidation and naming things.” The difficulty is that when the underlying data changes, your cache still holds the old version until something removes or replaces it.

Common approaches:

  • TTL-based expiration — entries expire after a set time. Simple but data can be stale until expiration.
  • Write-through — update cache whenever you update the database. Consistent but adds write latency.
  • Cache-aside with invalidation — on write, update DB and delete cache entry. Next read repopulates.

In practice

Before adding caching, measure. Is the database actually your bottleneck? Use query logs and APM tools. Cache the top 5 slowest or most frequent queries first. A single well-placed cache often matters more than caching everything.

Key takeaways

  • Caching stores frequently accessed data in fast storage to reduce load on slower systems
  • Caches exist at every layer — browser, CDN, application, and database
  • Cache-aside is the default pattern for application-level caching
  • Hit rate determines effectiveness — optimize for high hit rates on hot data
  • Thundering herd is a real problem — use jitter, locks, or background refresh
  • Cache invalidation is hard — plan your strategy before you need it

Common mistakes

  • Caching everything — unique, one-off queries see no benefit
  • Choosing the wrong TTL: too long and users see stale data; too short and the hit rate drops, which defeats the purpose
  • Ignoring cache stampedes — popular entries expiring simultaneously can crash your database
  • Caching without monitoring — track hit rate, memory usage, and eviction count
