Lab 08 · Tiered Cache: Memory + Disk
Run it:
make lab-08
Source:labs/lab-08-tiered-cache/main.go
The Problem
A single in-memory cache has two opposing constraints:
- Memory is fast but limited: A 32 GB edge node has ~28 GB usable after OS. At an average response size of 10 KB, that’s ~2.8 million cached objects.
- Disk is large but slower: NVMe at 500 µs is 50× slower than DRAM at 100 ns — but a 4 TB NVMe holds 400 million 10 KB objects.
The solution is a two-tier cache: hot objects in L1 memory, warm objects in L2 disk. On an L1 miss, check L2 before going to origin.
Request
│
▼
L1 (Memory LRU) ─ HIT (< 1 µs) → response
│ miss
▼
L2 (Disk NVMe) ─ HIT (< 1 ms) → promote to L1 → response
│ miss
▼
Origin ─ fetch (80+ ms) → store in L2 → promote to L1 → response
This is the architecture of Nginx’s proxy_cache with memory + file
tiers, Varnish’s Massive Storage Engine (MSE), and Cloudflare’s tiered
cache storage.
L1: LRU Memory Cache
The LRU Data Structure
LRU (Least Recently Used) eviction requires O(1) Get and O(1) Put.
The classic implementation uses a doubly-linked list + hash map:
Hash map: key → *list.Element (O(1) lookup by key)
List: MRU [elem4, elem2, elem1, elem3] LRU (O(1) move to front, remove from back)
On Get(key):
1. Lookup element in hash map O(1)
2. Move element to front of list O(1) ← "recently used"
3. Return value
On Put(key, value):
1. If key exists: update + move to front
2. If cache full: remove LRU (list.Back()), delete from hash map
3. Insert new element at front
Go’s container/list provides the doubly-linked list. Combined with
a sync.RWMutex for concurrent access:
type LRUCache struct {
cap int
mu sync.RWMutex
list *list.List
items map[string]*list.Element
}
type item struct {
key string
value []byte
expiry time.Time
}
func (c *LRUCache) Get(key string) ([]byte, bool) {
c.mu.Lock()
defer c.mu.Unlock()
el, ok := c.items[key]
if !ok { return nil, false }
entry := el.Value.(*item)
if time.Now().After(entry.expiry) {
c.list.Remove(el)
delete(c.items, key)
return nil, false
}
c.list.MoveToFront(el) // Mark as recently used
return entry.value, true
}
Note: Get takes a write lock (not read lock) because it modifies the
list order. This is a subtle but important point: LRU Get is a mutation.
Solutions include:
- Accept write lock on every Get (simple, correct, ~50 ns per lock)
- Use a CLOCK algorithm (approximate LRU, read-only Get — used by Linux VM)
- Use a concurrent LRU like Ristretto (shard by key hash, reduces lock contention)
L2: Disk Cache with xxhash
Key-to-Filename Mapping
Storing arbitrary URL strings as filenames is error-prone (path traversal, length limits, special characters). Map the key to a hash:
import "github.com/cespare/xxhash/v2"
func diskPath(dir, key string) string {
h := xxhash.Sum64String(key)
// Use a two-level directory structure to avoid large directories:
// "a1b2c3d4e5f60718" → "a1/b2/a1b2c3d4e5f60718"
hex := fmt.Sprintf("%016x", h)
return filepath.Join(dir, hex[:2], hex[2:4], hex)
}
The two-level directory sharding (common in systems like git’s object store) prevents the filesystem from having too many entries in a single directory. ext4 directories degrade at ~10 million entries; most filesystems struggle past 100k entries in a flat dir.
Why xxhash, Not MD5/SHA1?
| Hash | Throughput | Output size | Collision resistance |
|---|---|---|---|
| MD5 | ~500 MB/s | 128 bit | Low (broken for crypto) |
| SHA-1 | ~300 MB/s | 160 bit | Low (broken for crypto) |
| SHA-256 | ~200 MB/s | 256 bit | Cryptographically strong |
| xxhash64 | ~30 GB/s | 64 bit | Sufficient for cache keys |
| xxhash128 | ~30 GB/s | 128 bit | Sufficient for cache keys |
For cache key → filename, we don’t need cryptographic resistance — we need speed and low collision probability. With 10 million cached items, xxhash64’s 64-bit space (1.8 × 10^19) gives a collision probability of ~2.7 × 10^-9. Acceptable.
xxhash is also used internally by Fastly and in ClickHouse, Kafka, and numerous storage systems.
Atomic File Writes
To prevent a partially-written cache file from being read:
func writeDisk(path string, data []byte) error {
// Write to temp file first
tmp := path + ".tmp"
if err := os.WriteFile(tmp, data, 0644); err != nil {
return err
}
// Atomic rename: no reader can see a partial write
return os.Rename(tmp, path)
}
os.Rename is atomic on POSIX systems (guaranteed by the kernel).
This is the same technique used by SQLite (WAL mode), databases, and
every serious storage system.
Cache Promotion
When an item is fetched from L2 (disk) to serve a request, promote it to L1 (memory) so subsequent requests for the same item are served from memory:
if data, ok := l2.Get(key); ok {
l1.Put(key, data, ttl) // promote to memory
return data, "L2-HIT"
}
Promotion policies to consider:
- Always promote: simple; hot items quickly migrate to L1
- Promote after N hits: avoids polluting L1 with items only needed once
- Promote based on size: don’t promote large files to L1 (wastes memory with low incremental benefit)
Production Numbers
| Tier | Technology | Latency | Size (single node) |
|---|---|---|---|
| L1 memory | DRAM | < 1 µs | 4–128 GB |
| L1.5 NVMe | PCIe 4.0 NVMe | 50–200 µs | 1–8 TB |
| L2 disk | HDD RAID | 3–10 ms | 4–100 TB |
| L3 shield | Origin Shield PoP | 5–20 ms | 100s of TB |
| Origin | App server / S3 | 50–500 ms | unlimited |
Cloudflare uses NVMe SSDs as L2 across all PoPs, with the L1 being in-process memory (implemented in Rust/C++). The transition from HDD to NVMe across CDN infrastructure (2015–2020) reduced L2 miss penalty by ~50× and enabled much larger object counts.
Try It
make lab-08
# First request: L2 miss → origin fetch → stored in L2 + promoted to L1
curl http://localhost:8080/article/1
# Second request: L1 HIT (< 1µs)
curl http://localhost:8080/article/1
# Kill and restart the process — L1 (memory) is gone but L2 (disk) remains
# Restart: first request should be L2 HIT, then promoted to L1