The Hitchhiker’s Guide to CDNs

“Don’t Panic.”

This guide is for engineers who want to understand Content Delivery Networks from first principles — not the marketing brochure version, but the real, production-grade, failure-mode-and-all version that principal engineers at Cloudflare, Fastly, and AWS think about every day.


What This Guide Is

You are reading the companion to a 21-lab Go codebase. Each lab is a fully runnable program (go run ./labs/lab-XX-name/) that demonstrates one specific CDN concept. The code is intentionally duplicated across labs — each lab is self-contained, not a library — so you can read it in isolation.

This guide gives every lab the depth it deserves: the why behind every design decision, the failure modes, the real-world vendor implementations, and the production nuances that only seasoned engineers with scar tissue know.


Who This Is For

  • Principal engineers evaluating or building CDN infrastructure
  • Staff engineers integrating CDNs into large-scale distributed systems
  • Platform/infrastructure engineers owning edge architecture
  • Engineers who want to stop treating the CDN as a black box

Prerequisites: solid Go knowledge, comfort with HTTP internals, basic distributed systems familiarity (you know what a TCP connection is).


The 10,000-foot Architecture

Before diving into individual labs, ground yourself in the full picture:

User (browser / mobile app)
  │
  │   DNS resolves cdn.example.com to nearest PoP IP (Anycast BGP or GeoDNS)
  ▼
┌──────────────────────────────────────────────┐
│  Edge PoP  (e.g. Cloudflare NYC)             │
│                                              │
│  1. TLS termination          (ECDHE, TLS 1.3)│
│  2. HTTP/3 + QUIC or HTTP/2   (lab 18)       │
│  3. Signed-URL verification   (lab 16)       │
│  4. Edge compute (WASM)       (lab 17)       │
│  5. Cache lookup — L1 memory  (lab 08)       │
│  6. Cache lookup — L2 NVMe    (lab 08)       │
│  7. Request collapsing        (lab 06)       │
│  8. Compression               (lab 10)       │
│  9. Range request support     (lab 11)       │
└──────────────────┬───────────────────────────┘
                   │ cache MISS only
                   ▼
┌──────────────────────────────────────────────┐
│  Origin Shield  (e.g. Cloudflare Tiered Cache│
│  or Fastly Shield PoP)                       │
│                                              │
│  1. Consistent-hashed routing  (lab 12)      │
│  2. Singleflight collapse      (lab 13)      │
│  3. Gossip invalidation        (lab 14)      │
└──────────────────┬───────────────────────────┘
                   │ shield MISS only
                   ▼
┌──────────────────────────────────────────────┐
│  Origin  (S3 / App Server / Database)        │
│  (lab 01)                                    │
└──────────────────────────────────────────────┘

The CDN’s purpose is simple: serve as many requests as possible without touching the origin. Every lab in this series improves that ratio.
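The edge → shield → origin flow in the diagram can be sketched as a chain of cache tiers, each falling through to the next on a miss. This is a minimal illustration, not the labs' code: `Fetcher`, `Tier`, and `Origin` are hypothetical names, and the sketch has no TTLs, no eviction, and no locking.

```go
package main

import "fmt"

// Fetcher is a hypothetical interface for anything that can return an
// object by cache key: an edge PoP, a shield, or the origin itself.
type Fetcher interface {
	Fetch(key string) (string, error)
}

// Origin is the source of truth. It counts the requests that reach it.
type Origin struct {
	Hits int
}

func (o *Origin) Fetch(key string) (string, error) {
	o.Hits++
	return "body-of-" + key, nil
}

// Tier is one cache layer sitting in front of a slower Fetcher.
// Assumption: entries never expire and the map is not safe for
// concurrent use; real tiers need TTLs, eviction, and locking.
type Tier struct {
	Name   string
	Misses int
	store  map[string]string
	next   Fetcher
}

func NewTier(name string, next Fetcher) *Tier {
	return &Tier{Name: name, store: make(map[string]string), next: next}
}

// Fetch serves from the local store on a hit. On a miss it asks the next
// tier, stores the answer, and returns it.
func (t *Tier) Fetch(key string) (string, error) {
	if v, ok := t.store[key]; ok {
		return v, nil
	}
	t.Misses++
	v, err := t.next.Fetch(key)
	if err != nil {
		return "", err
	}
	t.store[key] = v
	return v, nil
}

func main() {
	origin := &Origin{}
	shield := NewTier("shield", origin)
	edgeNYC := NewTier("edge-nyc", shield)
	edgeLON := NewTier("edge-lon", shield)

	// Two requests at one PoP, then one at another, all for the same object.
	for _, edge := range []*Tier{edgeNYC, edgeNYC, edgeLON} {
		if _, err := edge.Fetch("/logo.png"); err != nil {
			fmt.Println("fetch failed:", err)
			return
		}
	}
	for _, t := range []*Tier{edgeNYC, edgeLON, shield} {
		fmt.Printf("%s misses: %d\n", t.Name, t.Misses)
	}
	fmt.Printf("origin requests: %d\n", origin.Hits)
}
```

With this request pattern the origin is contacted once: the second PoP misses locally but is answered by the shield, which is the point of the middle tier.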


The Numbers That Matter

| Metric | Typical production target |
|---|---|
| Cache hit ratio (by request) | 85–95% |
| Cache hit ratio (by bytes) | often higher (large objects) |
| Edge L1 miss-to-shield latency | 1–5 ms |
| Shield miss-to-origin latency | 10–100 ms |
| TLS handshake (session resume) | < 1 ms |
| TTFB (Time To First Byte) to user | < 50 ms at p99 |
| Availability SLA | 99.99% (52 min downtime/year) |

Cloudflare publicly reported ~60 million requests/second in peak traffic (2024). At that scale, improving the cache hit ratio by one percentage point keeps ~600,000 requests per second away from the origin.
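The arithmetic behind two of these figures is easy to check. The sketch below assumes origin load is simply total traffic times the miss ratio, and uses 90% and 91% as illustrative hit ratios; `OriginRPS` and `DowntimeMinutesPerYear` are hypothetical helper names.

```go
package main

import "fmt"

// OriginRPS returns the request rate that reaches the origin when a
// fraction hitRatio (0.0 to 1.0) of totalRPS is answered from cache.
func OriginRPS(totalRPS, hitRatio float64) float64 {
	return totalRPS * (1 - hitRatio)
}

// DowntimeMinutesPerYear converts an availability target (0.0 to 1.0)
// into the downtime it permits over a 365-day year.
func DowntimeMinutesPerYear(availability float64) float64 {
	const minutesPerYear = 365 * 24 * 60
	return (1 - availability) * minutesPerYear
}

func main() {
	const totalRPS = 60_000_000 // the ~60M req/s figure quoted above

	// Assumed hit ratios, one percentage point apart.
	before := OriginRPS(totalRPS, 0.90)
	after := OriginRPS(totalRPS, 0.91)

	fmt.Printf("origin load at 90%% hit ratio: %.0f req/s\n", before)
	fmt.Printf("origin load at 91%% hit ratio: %.0f req/s\n", after)
	fmt.Printf("saved by one percentage point: %.0f req/s\n", before-after)
	fmt.Printf("downtime allowed at 99.99%%: %.2f min/year\n",
		DowntimeMinutesPerYear(0.9999))
}
```

The saving from one point of hit ratio is the same in absolute terms at any starting ratio, but going from 90% to 91% cuts origin load by a tenth, which is why small hit-ratio gains matter so much at high ratios.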


How to Run the Labs

# Clone and install deps
git clone https://github.com/10xdev/cdn && cd cdn
go mod download

# Run any lab
make lab-01    # or: go run ./labs/lab-01-origin-server/

# Build all labs to verify compilation
go build ./...

Each lab:

  1. Starts an embedded mock origin on :9001
  2. Starts the edge/proxy on :8080 (some labs also use :8081 and :8082)
  3. Runs a self-contained demo with printed observations
  4. Blocks at the end so you can curl endpoints manually
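The four steps above might look roughly like this in a single `main.go`. This is a sketch under assumptions, not the repository's code: the handler bodies, the helper names (`startServer`, `originHandler`, `edgeHandler`, `demo`), and the `-serve` flag are all hypothetical. The labs are described as blocking unconditionally; the flag exists here only so the sketch can also run as a one-shot.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// startServer listens on addr and serves h in the background. It returns
// the bound address, so callers may pass "127.0.0.1:0" for a free port.
func startServer(addr string, h http.Handler) (string, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", err
	}
	go http.Serve(ln, h) // serve error ignored in this sketch
	return ln.Addr().String(), nil
}

// originHandler is a hypothetical stand-in for a lab's mock origin.
func originHandler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "origin says hello for %s", r.URL.Path)
	})
}

// edgeHandler forwards every request to the origin at originAddr.
// A real lab would add caching, collapsing, and so on around this.
func edgeHandler(originAddr string) http.Handler {
	target := &url.URL{Scheme: "http", Host: originAddr}
	return httputil.NewSingleHostReverseProxy(target)
}

// demo fetches one path through the edge and returns the response body.
func demo(edgeAddr, path string) (string, error) {
	resp, err := http.Get("http://" + edgeAddr + path)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	serve := flag.Bool("serve", false, "keep servers running after the demo")
	flag.Parse()

	// Step 1: embedded mock origin.
	originAddr, err := startServer("127.0.0.1:9001", originHandler())
	if err != nil {
		log.Fatal(err)
	}
	// Step 2: edge/proxy in front of it.
	edgeAddr, err := startServer("127.0.0.1:8080", edgeHandler(originAddr))
	if err != nil {
		log.Fatal(err)
	}
	// Step 3: self-contained demo with printed observations.
	body, err := demo(edgeAddr, "/hello")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("via edge %s: %q", edgeAddr, body)

	// Step 4: block so endpoints can be curled manually.
	if *serve {
		select {}
	}
}
```

Keeping the origin and the edge in one process is what makes each lab runnable with a single `go run`; the cost is that the two share a failure domain, which is fine for a demo.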

Lab Map

| # | Lab | Core Concept | Key Go API |
|---|---|---|---|
| 01 | Origin Server | Latency baseline | net/http |
| 02 | Reverse Proxy | Forwarding, connection pools | httputil.ReverseProxy |
| 03 | First Cache | Miss/hit, TTL | sync.Map |
| 04 | HTTP Cache Headers | ETag, 304, Cache-Control | RFC 7234 |
| 05 | Cache Key Design | Vary, tracking params | url.Values |
| 06 | Thundering Herd | Request collapsing | singleflight.Group |
| 07 | Stale Content | RFC 5861 SWR/SIE | custom TTL windows |
| 08 | Tiered Cache | LRU + disk | container/list + xxhash |
| 09 | Cache Tags | Surrogate-Key purge | sync.RWMutex |
| 10 | Compression | gzip/brotli/zstd negotiation | andybalholm/brotli |
| 11 | Range Requests | 206 Partial Content | http.ServeContent |
| 12 | Consistent Hashing | Stable node routing | buraksezer/consistent |
| 13 | Origin Shield | Tiered PoPs + singleflight | golang.org/x/sync |
| 14 | Gossip Cluster | Distributed invalidation | hashicorp/memberlist |
| 15 | Geo Routing | Haversine, PoP failover | custom |
| 16 | Signed URLs | HMAC-SHA256 token auth | crypto/hmac |
| 17 | Edge Compute | WASM sandboxing at edge | tetratelabs/wazero |
| 18 | HTTP/3 + QUIC | QUIC transport | quic-go/quic-go |
| 19 | HLS Streaming | Adaptive bitrate cache | custom |
| 20 | Observability | Prometheus, SLOs, logs | prometheus/client_golang |
| 21 | Full System | All layers together | All of the above |

Reading This Guide

Each chapter follows the same structure:

  1. The Problem — why this feature exists, what breaks without it
  2. The Protocol / Algorithm — the formal specification or academic basis
  3. The Implementation — walkthrough of the lab code with deep commentary
  4. Production Details — how Cloudflare, Fastly, and AWS CloudFront do it
  5. Failure Modes — what goes wrong and how to detect it
  6. What to Measure — metrics, alerts, and SLO indicators
  7. Try It — curl commands and things to observe

Let’s start at the beginning.