Lab 03 · The First Cache

Run it: make lab-03
Source: labs/lab-03-first-cache/main.go

The Problem

The reverse proxy in Lab 02 blindly forwards every request to the origin. A cache short-circuits that path: if we’ve seen this URL recently and have a stored response, serve it directly from memory without touching the origin.

The fundamental trade-off: freshness vs. cost. A cached response might be stale, but serving it is:

Orders of magnitude faster (memory read vs. network round trip)
Origin-free (no database query, no CPU work)
Independent of origin availability (a fresh entry can be served even while the origin is down)

The Cache Lifecycle: MISS → HIT → EXPIRED

Request arrives
    │
    ▼
┌─────────────────────────────────────┐
│  Lookup key = normalize(URL)         │
└─────────────────────────────────────┘
    │
    ├─► Entry not found → MISS
    │       │
    │       ▼
    │   Fetch from origin
    │   Store in cache with deadline = now + TTL
    │   Return response to client
    │
    ├─► Entry found, not expired → HIT
    │       │
    │       ▼
    │   Return cached response immediately
    │
    └─► Entry found, expired → EXPIRED (= MISS)
            │
            ▼
        Revalidate or re-fetch
        Replace cache entry
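The normalize(URL) step in the diagram is not spelled out here. A minimal sketch follows, assuming the goal is only to make trivially equivalent URLs share one key. The helper name normalizeKey and the specific rules (lowercase host, drop fragment, sort query parameters) are illustrative assumptions, not the lab's actual code.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeKey is a hypothetical helper that maps equivalent URLs to one
// cache key. The rules below are assumptions chosen for illustration.
func normalizeKey(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host) // host names are case-insensitive
	u.Fragment, u.RawFragment = "", "" // fragments are never sent to a server
	if u.Path == "" {
		u.Path = "/"
	}
	// Values.Encode sorts by parameter name, so ?b=2&a=1 and ?a=1&b=2
	// produce the same key.
	u.RawQuery = u.Query().Encode()
	return u.String(), nil
}

func main() {
	key, err := normalizeKey("HTTP://Example.COM/article/1?b=2&a=1#top")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(key) // http://example.com/article/1?a=1&b=2
}
```

Sorting query parameters is only safe when the origin does not care about parameter order, which is an assumption worth checking per application.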

The X-Cache response header tells the client (and debugging engineers) which branch was taken:

X-Cache: MISS      # first request for this URL
X-Cache: HIT       # served from cache

Implementation: `sync.Map` + TTL

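import (
    "net/http"
    "sync"
    "time"
)

// cacheEntry is one stored response plus the moment it stops being fresh.
type cacheEntry struct {
    response []byte
    headers  http.Header
    status   int
    expiry   time.Time
}

var cache sync.Map // map[string]*cacheEntry

// get returns the entry for key, or false on a miss or an expired entry.
func get(key string) (*cacheEntry, bool) {
    v, ok := cache.Load(key)
    if !ok {
        return nil, false
    }
    entry := v.(*cacheEntry)
    if time.Now().After(entry.expiry) {
        // Lazy expiry. CompareAndDelete (Go 1.20+) removes the key only if it
        // still maps to this entry, so a concurrent refresh is not discarded.
        cache.CompareAndDelete(key, v)
        return nil, false
    }
    return entry, true
}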

Why sync.Map? A standard map guarded by a sync.RWMutex would work, but sync.Map is optimized for a specific workload: many reads, few writes, and a stable key set. CDN caches have a hot set of URLs that are read millions of times per second and written (populated) far less often. Before Go 1.24, sync.Map served that workload from an atomically loaded “read map”, so reads of existing keys took no lock.

However, that design has a known weakness: new keys land in a mutex-protected dirty map, which is promoted to the read map only after enough read misses. For very write-heavy caches (cold start, high churn), a sharded map + sync.RWMutex pattern can be more efficient. Go 1.24 reimplemented sync.Map and reduced the cost of modifications, so measure on your Go version before choosing.
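The sharded map + sync.RWMutex pattern mentioned above can be sketched as follows. The shard count of 16, the FNV-1a hash, and the type names are assumptions for illustration; production code usually tunes the shard count and stores richer entries.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 16 // assumed; often sized relative to the number of cores

// shard is one independently locked slice of the key space.
type shard struct {
	mu sync.RWMutex
	m  map[string][]byte
}

type shardedCache struct {
	shards [shardCount]shard
}

func newShardedCache() *shardedCache {
	c := &shardedCache{}
	for i := range c.shards {
		c.shards[i].m = make(map[string][]byte)
	}
	return c
}

// shardFor hashes the key so that writers to different shards never contend.
func (c *shardedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &c.shards[h.Sum32()%shardCount]
}

func (c *shardedCache) Get(key string) ([]byte, bool) {
	s := c.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func (c *shardedCache) Set(key string, val []byte) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = val
}

func main() {
	c := newShardedCache()
	c.Set("/article/1", []byte("article body"))
	v, ok := c.Get("/article/1")
	fmt.Println(string(v), ok)
}
```

A write locks only one shard, so a burst of inserts during cold start blocks at most 1/16 of concurrent readers in this configuration.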

TTL: Where Does It Come From?

In Lab 03 the TTL is hardcoded. Lab 04 shows how to parse it properly from Cache-Control headers:

Cache-Control: public, max-age=300
→ TTL = 300 seconds

Cache-Control: no-store
→ Do not cache at all

Cache-Control: private
→ Do not store in shared (CDN) cache

Cache-Control: no-cache
→ Store but always revalidate before serving

Ignoring Cache-Control is a common cause of CDN misconfiguration. If you cache a private response in a shared cache, you may serve one user’s data to another. If you cache a no-store response, you violate the application’s contract.

Background Sweep: Avoiding Memory Leaks

A cache without eviction grows unboundedly. Lab 03 runs a background goroutine that sweeps expired entries:

go func() {
    ticker := time.NewTicker(30 * time.Second)
    for range ticker.C {
        var expired []string
        cache.Range(func(k, v any) bool {
            if time.Now().After(v.(*cacheEntry).expiry) {
                expired = append(expired, k.(string))
            }
            return true
        })
        for _, k := range expired {
            cache.Delete(k)
        }
    }
}()

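Note the two-phase delete: first collect expired keys, then delete them. This keeps the Range callback short, but it is a style choice rather than a requirement: Range holds no lock, and the sync.Map documentation allows the callback to call any method on the map, including Delete. Range also does not see a consistent snapshot, so an entry refreshed between the two phases can still be deleted by key.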

Production caches use more sophisticated eviction:

Policy	Description	Use case
TTL expiry	Remove at expiry	All caches
LRU	Evict least-recently-used	Bounded memory (Lab 08)
LFU	Evict least-frequently-used	Popularity-skewed workloads
ARC	Adaptive Replacement Cache	Self-tuning between LRU and LFU
S3-FIFO	Simple, Scalable, Segmented FIFO	Modern alternative to LRU (lower overhead)
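As a concrete reference point for the table, LRU is commonly built from a hash map plus a doubly linked list. The sketch below is a generic textbook version using container/list, not the Lab 08 implementation; it bounds the number of entries rather than bytes and is not safe for concurrent use without a lock.

```go
package main

import (
	"container/list"
	"fmt"
)

type lruItem struct {
	key   string
	value []byte
}

type lru struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> node in order
}

func newLRU(capacity int) *lru {
	return &lru{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the value and marks the key as most recently used.
func (c *lru) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*lruItem).value, true
}

// Put inserts or updates a key and evicts the least recently used entry
// when the capacity is exceeded.
func (c *lru) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruItem).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruItem{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruItem).key)
	}
}

func main() {
	c := newLRU(2)
	c.Put("/a", []byte("A"))
	c.Put("/b", []byte("B"))
	c.Get("/a")              // /a is now the most recently used
	c.Put("/c", []byte("C")) // evicts /b
	_, ok := c.Get("/b")
	fmt.Println("/b still cached:", ok) // false
}
```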

The Deliberate Limitations of Lab 03

The lab explicitly documents what it doesn’t do yet:

No Cache-Control parsing — TTL is hardcoded. Fixed in Lab 04.
No singleflight — concurrent misses all hammer origin. Fixed in Lab 06.
Unbounded memory — LRU eviction arrives in Lab 08.
No content negotiation — same key for Accept-Encoding: gzip and Accept-Encoding: br. Fixed in Lab 05 via Vary.
No conditional requests — always fetches full response, no 304. Fixed in Lab 04.

This incremental approach is pedagogically important: each lab adds exactly one concept so the interaction is clear.

Production Detail: Cache Serialization Format

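Real CDN disk caches store responses in compact binary formats. Varnish manages storage through pluggable backends (stevedores) chosen at startup, while VCL decides what gets cached. Nginx prefixes each cache file with a fixed binary header, followed by the key, the response headers, and the body. A simplified layout in the same spirit: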

[8 bytes: key hash]
[8 bytes: expiry timestamp]
[4 bytes: headers length]
[4 bytes: body length]
[headers (HTTP/1.1 text)]
[body bytes]

Lab 08 uses file-system storage with xxhash-named files, which is functionally equivalent but less efficient (filesystem metadata overhead).

For in-memory caches, Google’s Groupcache and Fastly’s own cache daemon use Protocol Buffers for serialization, enabling:

Zero-copy responses via io.WriterTo
Shared memory between processes
Binary compatibility across versions

What to Measure

# Hit ratio (requests)
sum(rate(cache_hits_total[5m])) /
sum(rate(cache_requests_total[5m]))

# Miss rate (triggers origin fetches)
rate(cache_misses_total[5m])

# Cache entries currently stored
cache_entries_current

# Evictions (if bounded cache)
rate(cache_evictions_total[5m])
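The queries above assume the process exports counters for hits and misses. A minimal sketch of the underlying bookkeeping, using only sync/atomic (Go 1.19+), is shown below. The stats type is hypothetical, no metrics library is involved, and the ratio is computed over the process lifetime rather than a 5-minute window.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// stats holds monotonically increasing counters, the shape a metrics
// exporter would publish as cache_hits_total and cache_misses_total.
type stats struct {
	hits   atomic.Uint64
	misses atomic.Uint64
}

func (s *stats) record(hit bool) {
	if hit {
		s.hits.Add(1)
	} else {
		s.misses.Add(1)
	}
}

// hitRatio returns hits / (hits + misses), or 0 before any request.
func (s *stats) hitRatio() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}

func main() {
	var s stats
	for _, hit := range []bool{false, true, true, true} {
		s.record(hit)
	}
	fmt.Println("hit ratio:", s.hitRatio()) // hit ratio: 0.75
}
```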

Try It

make lab-03

# First request — should be MISS
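# -D - writes response headers to stdout so grep can see them (-v sends them to stderr)
curl -s -o /dev/null -D - http://localhost:8080/article/1 | grep -i x-cache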

# Second request — should be HIT (< 1ms)
curl -s -o /dev/null -D - http://localhost:8080/article/1 | grep -i x-cache

# X-Origin-Hit should only increment on first request
curl http://localhost:8080/article/1 -H "X-Debug: origin-count"

The Hitchhiker's Guide to CDNs