The Hitchhiker’s Guide to CDNs
“Don’t Panic.”
This guide is for engineers who want to understand Content Delivery Networks from first principles — not the marketing brochure version, but the real, production-grade, failure-mode-and-all version that principal engineers at Cloudflare, Fastly, and AWS think about every day.
What This Guide Is
You are reading the companion to a 21-lab Go codebase. Each lab is a
fully runnable program (go run ./labs/lab-XX-name/) that demonstrates one
specific CDN concept. The code is intentionally duplicated across labs — each
lab is self-contained, not a library — so you can read it in isolation.
This guide gives every lab the depth it deserves: the why behind every design decision, the failure modes, the real-world vendor implementations, and the production nuances that only seasoned engineers with scar tissue know.
Who This Is For
- Principal engineers evaluating or building CDN infrastructure
- Staff engineers integrating CDNs into large-scale distributed systems
- Platform/infrastructure engineers owning edge architecture
- Engineers who want to stop treating the CDN as a black box
Prerequisites: solid Go knowledge, comfort with HTTP internals, basic distributed systems familiarity (you know what a TCP connection is).
The 10,000-foot Architecture
Before diving into individual labs, ground yourself in the full picture:
User (browser / mobile app)
│
│ DNS resolves cdn.example.com to nearest PoP IP (Anycast BGP or GeoDNS)
▼
┌──────────────────────────────────────────────┐
│ Edge PoP (e.g. Cloudflare NYC) │
│ │
│ 1. TLS termination (ECDHE, TLS 1.3)│
│ 2. HTTP/3 + QUIC or HTTP/2 (lab 18) │
│ 3. Signed-URL verification (lab 16) │
│ 4. Edge compute (WASM) (lab 17) │
│ 5. Cache lookup — L1 memory (lab 08) │
│ 6. Cache lookup — L2 NVMe (lab 08) │
│ 7. Request collapsing (lab 06) │
│ 8. Compression (lab 10) │
│ 9. Range request support (lab 11) │
└──────────────────┬───────────────────────────┘
│ cache MISS only
▼
┌──────────────────────────────────────────────┐
│ Origin Shield (e.g. Cloudflare Tiered Cache│
│ or Fastly Shield PoP) │
│ │
│ 1. Consistent-hashed routing (lab 12) │
│ 2. Singleflight collapse (lab 13) │
│ 3. Gossip invalidation (lab 14) │
└──────────────────┬───────────────────────────┘
│ shield MISS only
▼
┌──────────────────────────────────────────────┐
│ Origin (S3 / App Server / Database) │
│ (lab 01) │
└──────────────────────────────────────────────┘
The CDN’s purpose is simple: serve as many requests as possible without touching the origin. Every lab in this series improves that ratio.
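The layered flow in the diagram can be sketched in a few lines of Go. This is a minimal illustration, not the lab code: the names Tier, Origin, and Lookup are hypothetical, the tiers are plain in-memory maps, and TTLs, eviction, and locking are all left out. It shows only the core idea: a request falls through edge L1, edge L2, and the shield before it reaches the origin, and the response fills the faster tiers on the way back.

```go
package main

import "fmt"

// Tier is one cache layer in the lookup chain (edge L1, edge L2, shield).
// Hypothetical type for illustration; not safe for concurrent use.
type Tier struct {
	Name  string
	store map[string]string
}

// NewTier returns an empty tier with the given name.
func NewTier(name string) *Tier {
	return &Tier{Name: name, store: make(map[string]string)}
}

// Origin stands in for the real backend and counts how often it is hit.
type Origin struct {
	Hits int
}

// Fetch simulates an origin response and records the request.
func (o *Origin) Fetch(key string) string {
	o.Hits++
	return "body-of-" + key
}

// Lookup walks the tiers in order. On a hit it back-fills the faster tiers
// above the one that answered. On a miss in every tier it fetches from the
// origin and fills all tiers. It returns the body and the layer that served it.
func Lookup(tiers []*Tier, origin *Origin, key string) (body, servedBy string) {
	for i, t := range tiers {
		if v, ok := t.store[key]; ok {
			for _, upper := range tiers[:i] {
				upper.store[key] = v
			}
			return v, t.Name
		}
	}
	v := origin.Fetch(key)
	for _, t := range tiers {
		t.store[key] = v
	}
	return v, "origin"
}

func main() {
	tiers := []*Tier{NewTier("edge-L1"), NewTier("edge-L2"), NewTier("shield")}
	origin := &Origin{}

	_, from := Lookup(tiers, origin, "/logo.png")
	fmt.Println("first request served by:", from)

	_, from = Lookup(tiers, origin, "/logo.png")
	fmt.Println("second request served by:", from)

	fmt.Println("origin hits:", origin.Hits)
}
```

The second request never leaves the first tier, which is the ratio the rest of the labs try to push up.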
The Numbers That Matter
| Metric | Typical production target |
|---|---|
| Cache hit ratio (by request) | 85–95% |
| Cache hit ratio (by bytes) | often higher (large objects) |
| Edge L1 miss-to-shield latency | 1–5 ms |
| Shield miss-to-origin latency | 10–100 ms |
| TLS handshake (session resume, server-side compute) | < 1 ms |
| TTFB (Time To First Byte) to user | < 50 ms at p99 |
| Availability SLA | 99.99% (~52.6 min downtime/year) |
Cloudflare publicly reported ~60 million requests/second in peak traffic (2024). At that scale, raising the cache hit ratio by one percentage point removes ~600,000 requests per second from the origin path.
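Both headline figures above are simple arithmetic, and it can help to see them spelled out. The sketch below is illustrative only: the function names are hypothetical, the year is taken as 365 days, and every cache miss is assumed to reach the origin (no shield in between).

```go
package main

import "fmt"

// OriginRequestsSaved returns how many origin requests per second are avoided
// when the hit ratio rises by deltaPoints percentage points at totalRPS.
// Assumes every miss would otherwise have gone to the origin.
func OriginRequestsSaved(totalRPS, deltaPoints float64) float64 {
	return totalRPS * deltaPoints / 100
}

// DowntimeMinutesPerYear converts an availability percentage into a yearly
// downtime budget, assuming a 365-day year.
func DowntimeMinutesPerYear(availabilityPercent float64) float64 {
	const minutesPerYear = 365 * 24 * 60
	return minutesPerYear * (100 - availabilityPercent) / 100
}

func main() {
	fmt.Printf("origin req/s saved: %.0f\n", OriginRequestsSaved(60_000_000, 1))
	fmt.Printf("downtime budget: %.2f min/year\n", DowntimeMinutesPerYear(99.99))
}
```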
How to Run the Labs
# Clone and install deps
git clone https://github.com/10xdev/cdn && cd cdn
go mod download
# Run any lab
make lab-01 # or: go run ./labs/lab-01-origin-server/
# Build all labs to verify compilation
go build ./...
Each lab:
- Starts an embedded mock origin on :9001
- Starts the edge/proxy on :8080 (sometimes :8081, :8082 too)
- Runs a self-contained demo with printed observations
- Blocks at the end so you can curl endpoints manually
Lab Map
| # | Lab | Core Concept | Key Go API |
|---|---|---|---|
| 01 | Origin Server | Latency baseline | net/http |
| 02 | Reverse Proxy | Forwarding, connection pools | httputil.ReverseProxy |
| 03 | First Cache | Miss/hit, TTL | sync.Map |
| 04 | HTTP Cache Headers | ETag, 304, Cache-Control | RFC 7234 (obsoleted by RFC 9111) |
| 05 | Cache Key Design | Vary, tracking params | url.Values |
| 06 | Thundering Herd | Request collapsing | singleflight.Group |
| 07 | Stale Content | RFC 5861 SWR/SIE | custom TTL windows |
| 08 | Tiered Cache | LRU + disk | container/list + xxhash |
| 09 | Cache Tags | Surrogate-Key purge | sync.RWMutex |
| 10 | Compression | gzip/brotli/zstd negotiation | andybalholm/brotli |
| 11 | Range Requests | 206 Partial Content | http.ServeContent |
| 12 | Consistent Hashing | Stable node routing | buraksezer/consistent |
| 13 | Origin Shield | Tiered PoPs + singleflight | golang.org/x/sync |
| 14 | Gossip Cluster | Distributed invalidation | hashicorp/memberlist |
| 15 | Geo Routing | Haversine, PoP failover | custom |
| 16 | Signed URLs | HMAC-SHA256 token auth | crypto/hmac |
| 17 | Edge Compute | WASM sandboxing at edge | tetratelabs/wazero |
| 18 | HTTP/3 + QUIC | QUIC transport | quic-go/quic-go |
| 19 | HLS Streaming | Adaptive bitrate cache | custom |
| 20 | Observability | Prometheus, SLOs, logs | prometheus/client_golang |
| 21 | Full System | All layers together | All of the above |
Reading This Guide
Each chapter follows the same structure:
- The Problem — why this feature exists, what breaks without it
- The Protocol / Algorithm — the formal specification or academic basis
- The Implementation — walkthrough of the lab code with deep commentary
- Production Details — how Cloudflare, Fastly, AWS CloudFront do it
- Failure Modes — what goes wrong and how to detect it
- What to Measure — metrics, alerts, and SLO indicators
- Try It — curl commands and things to observe
Let’s start at the beginning.