Lab 08 · Tiered Cache: Memory + Disk

Run it: make lab-08
Source: labs/lab-08-tiered-cache/main.go

The Problem

A single in-memory cache has two opposing constraints:

Memory is fast but limited: A 32 GB edge node has ~28 GB usable after OS. At an average response size of 10 KB, that’s ~2.8 million cached objects.
Disk is large but slower: NVMe at 500 µs is 50× slower than DRAM at 100 ns — but a 4 TB NVMe holds 400 million 10 KB objects.

The solution is a two-tier cache: hot objects in L1 memory, warm objects in L2 disk. On an L1 miss, check L2 before going to origin.

Request
  │
  ▼
L1 (Memory LRU)  ─ HIT (< 1 µs) → response
  │ miss
  ▼
L2 (Disk NVMe)   ─ HIT (< 1 ms) → promote to L1 → response
  │ miss
  ▼
Origin           ─ fetch (80+ ms) → store in L2 → promote to L1 → response

This is the architecture of Nginx’s proxy_cache with memory + file tiers, Varnish’s Massive Storage Engine (MSE), and Cloudflare’s tiered cache storage.

L1: LRU Memory Cache

The LRU Data Structure

LRU (Least Recently Used) eviction requires O(1) Get and O(1) Put. The classic implementation uses a doubly-linked list + hash map:

Hash map: key → *list.Element  (O(1) lookup by key)
List:     MRU [elem4, elem2, elem1, elem3] LRU  (O(1) move to front, remove from back)

On Get(key):
  1. Lookup element in hash map          O(1)
  2. Move element to front of list       O(1)  ← "recently used"
  3. Return value

On Put(key, value):
  1. If key exists: update + move to front
  2. If cache full: remove LRU (list.Back()), delete from hash map
  3. Insert new element at front

Go’s container/list provides the doubly-linked list. Combined with a sync.RWMutex for concurrent access:

type LRUCache struct {
    cap   int
    mu    sync.RWMutex
    list  *list.List
    items map[string]*list.Element
}

type item struct {
    key   string
    value []byte
    expiry time.Time
}

func (c *LRUCache) Get(key string) ([]byte, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    el, ok := c.items[key]
    if !ok { return nil, false }
    entry := el.Value.(*item)
    if time.Now().After(entry.expiry) {
        c.list.Remove(el)
        delete(c.items, key)
        return nil, false
    }
    c.list.MoveToFront(el)  // Mark as recently used
    return entry.value, true
}

Note: Get takes a write lock (not read lock) because it modifies the list order. This is a subtle but important point: LRU Get is a mutation. Solutions include:

Accept write lock on every Get (simple, correct, ~50 ns per lock)
Use a CLOCK algorithm (approximate LRU, read-only Get — used by Linux VM)
Use a concurrent LRU like Ristretto (shard by key hash, reduces lock contention)

L2: Disk Cache with xxhash

Key-to-Filename Mapping

Storing arbitrary URL strings as filenames is error-prone (path traversal, length limits, special characters). Map the key to a hash:

import "github.com/cespare/xxhash/v2"

func diskPath(dir, key string) string {
    h := xxhash.Sum64String(key)
    // Use a two-level directory structure to avoid large directories:
    // "a1b2c3d4e5f60718" → "a1/b2/a1b2c3d4e5f60718"
    hex := fmt.Sprintf("%016x", h)
    return filepath.Join(dir, hex[:2], hex[2:4], hex)
}

The two-level directory sharding (common in systems like git’s object store) prevents the filesystem from having too many entries in a single directory. ext4 directories degrade at ~10 million entries; most filesystems struggle past 100k entries in a flat dir.

Why xxhash, Not MD5/SHA1?

Hash	Throughput	Output size	Collision resistance
MD5	~500 MB/s	128 bit	Low (broken for crypto)
SHA-1	~300 MB/s	160 bit	Low (broken for crypto)
SHA-256	~200 MB/s	256 bit	Cryptographically strong
xxhash64	~30 GB/s	64 bit	Sufficient for cache keys
xxhash128	~30 GB/s	128 bit	Sufficient for cache keys

For cache key → filename, we don’t need cryptographic resistance — we need speed and low collision probability. With 10 million cached items, xxhash64’s 64-bit space (1.8 × 10^19) gives a collision probability of ~2.7 × 10^-9. Acceptable.

xxhash is also used internally by Fastly and in ClickHouse, Kafka, and numerous storage systems.

Atomic File Writes

To prevent a partially-written cache file from being read:

func writeDisk(path string, data []byte) error {
    // Write to temp file first
    tmp := path + ".tmp"
    if err := os.WriteFile(tmp, data, 0644); err != nil {
        return err
    }
    // Atomic rename: no reader can see a partial write
    return os.Rename(tmp, path)
}

os.Rename is atomic on POSIX systems (guaranteed by the kernel). This is the same technique used by SQLite (WAL mode), databases, and every serious storage system.

Cache Promotion

When an item is fetched from L2 (disk) to serve a request, promote it to L1 (memory) so subsequent requests for the same item are served from memory:

if data, ok := l2.Get(key); ok {
    l1.Put(key, data, ttl)   // promote to memory
    return data, "L2-HIT"
}

Promotion policies to consider:

Always promote: simple; hot items quickly migrate to L1
Promote after N hits: avoids polluting L1 with items only needed once
Promote based on size: don’t promote large files to L1 (wastes memory with low incremental benefit)

Production Numbers

Tier	Technology	Latency	Size (single node)
L1 memory	DRAM	< 1 µs	4–128 GB
L1.5 NVMe	PCIe 4.0 NVMe	50–200 µs	1–8 TB
L2 disk	HDD RAID	3–10 ms	4–100 TB
L3 shield	Origin Shield PoP	5–20 ms	100s of TB
Origin	App server / S3	50–500 ms	unlimited

Cloudflare uses NVMe SSDs as L2 across all PoPs, with the L1 being in-process memory (implemented in Rust/C++). The transition from HDD to NVMe across CDN infrastructure (2015–2020) reduced L2 miss penalty by ~50× and enabled much larger object counts.

Try It

make lab-08

# First request: L2 miss → origin fetch → stored in L2 + promoted to L1
curl http://localhost:8080/article/1

# Second request: L1 HIT (< 1µs)
curl http://localhost:8080/article/1

# Kill and restart the process — L1 (memory) is gone but L2 (disk) remains
# Restart: first request should be L2 HIT, then promoted to L1

Keyboard shortcuts

The Hitchhiker's Guide to CDNs