The Hitchhiker’s Guide to CDNs
“Don’t Panic.”
This guide is for engineers who want to understand Content Delivery Networks from first principles — not the marketing brochure version, but the real, production-grade, failure-mode-and-all version that principal engineers at Cloudflare, Fastly, and AWS think about every day.
What This Guide Is
You are reading the companion to a 21-lab Go codebase. Each lab is a
fully runnable program (go run ./labs/lab-XX-name/) that demonstrates one
specific CDN concept. The code is intentionally duplicated across labs — each
lab is self-contained, not a library — so you can read it in isolation.
This guide gives every lab the depth it deserves: the why behind every design decision, the failure modes, the real-world vendor implementations, and the production nuances that only seasoned engineers with scar tissue know.
Who This Is For
- Principal engineers evaluating or building CDN infrastructure
- Staff engineers integrating CDNs into large-scale distributed systems
- Platform/infrastructure engineers owning edge architecture
- Engineers who want to stop treating the CDN as a black box
Prerequisites: solid Go knowledge, comfort with HTTP internals, basic distributed systems familiarity (you know what a TCP connection is).
The 10,000-foot Architecture
Before diving into individual labs, ground yourself in the full picture:
User (browser / mobile app)
                   │
                   │  DNS resolves cdn.example.com to nearest PoP IP (Anycast BGP or GeoDNS)
                   ▼
┌──────────────────────────────────────────────┐
│  Edge PoP (e.g. Cloudflare NYC)              │
│                                              │
│  1. TLS termination (ECDH, TLS 1.3)          │
│  2. HTTP/3 + QUIC or HTTP/2        (lab 18)  │
│  3. Signed-URL verification        (lab 16)  │
│  4. Edge compute (WASM)            (lab 17)  │
│  5. Cache lookup: L1 memory        (lab 08)  │
│  6. Cache lookup: L2 NVMe          (lab 08)  │
│  7. Request collapsing             (lab 06)  │
│  8. Compression                    (lab 10)  │
│  9. Range request support          (lab 11)  │
└──────────────────┬───────────────────────────┘
                   │  cache MISS only
                   ▼
┌──────────────────────────────────────────────┐
│  Origin Shield (e.g. Cloudflare Tiered Cache │
│  or Fastly Shield PoP)                       │
│                                              │
│  1. Consistent-hashed routing      (lab 12)  │
│  2. Singleflight collapse          (lab 13)  │
│  3. Gossip invalidation            (lab 14)  │
└──────────────────┬───────────────────────────┘
                   │  shield MISS only
                   ▼
┌──────────────────────────────────────────────┐
│  Origin (S3 / App Server / Database)         │
│                                    (lab 01)  │
└──────────────────────────────────────────────┘
The CDN’s purpose is simple: serve as many requests as possible without touching the origin. Every lab in this series improves that ratio.
The Numbers That Matter
| Metric | Typical production target |
|---|---|
| Cache hit ratio (by request) | 85–95% |
| Cache hit ratio (by bytes) | often higher (large objects) |
| Edge L1 miss-to-shield latency | 1–5 ms |
| Shield miss-to-origin latency | 10–100 ms |
| TLS handshake (session resume) | < 1 ms |
| TTFB (Time To First Byte) to user | < 50 ms at p99 |
| Availability SLA | 99.99% (52 min downtime/year) |
Cloudflare publicly reported ~60 million requests/second in peak traffic (2024). At that scale, a 1% cache hit ratio improvement saves ~600,000 origin requests per second.
How to Run the Labs
# Clone and install deps
git clone https://github.com/10xdev/cdn && cd cdn
go mod download
# Run any lab
make lab-01 # or: go run ./labs/lab-01-origin-server/
# Build all labs to verify compilation
go build ./...
Each lab:
- Starts an embedded mock origin on :9001
- Starts the edge/proxy on :8080 (sometimes :8081, :8082 too)
- Runs a self-contained demo with printed observations
- Blocks at the end so you can curl endpoints manually
Lab Map
| # | Lab | Core Concept | Key Go API |
|---|---|---|---|
| 01 | Origin Server | Latency baseline | net/http |
| 02 | Reverse Proxy | Forwarding, connection pools | httputil.ReverseProxy |
| 03 | First Cache | Miss/hit, TTL | sync.Map |
| 04 | HTTP Cache Headers | ETag, 304, Cache-Control | RFC 7234 |
| 05 | Cache Key Design | Vary, tracking params | url.Values |
| 06 | Thundering Herd | Request collapsing | singleflight.Group |
| 07 | Stale Content | RFC 5861 SWR/SIE | custom TTL windows |
| 08 | Tiered Cache | LRU + disk | container/list + xxhash |
| 09 | Cache Tags | Surrogate-Key purge | sync.RWMutex |
| 10 | Compression | gzip/brotli/zstd negotiation | andybalholm/brotli |
| 11 | Range Requests | 206 Partial Content | http.ServeContent |
| 12 | Consistent Hashing | Stable node routing | buraksezer/consistent |
| 13 | Origin Shield | Tiered PoPs + singleflight | golang.org/x/sync |
| 14 | Gossip Cluster | Distributed invalidation | hashicorp/memberlist |
| 15 | Geo Routing | Haversine, PoP failover | custom |
| 16 | Signed URLs | HMAC-SHA256 token auth | crypto/hmac |
| 17 | Edge Compute | WASM sandboxing at edge | tetratelabs/wazero |
| 18 | HTTP/3 + QUIC | QUIC transport | quic-go/quic-go |
| 19 | HLS Streaming | Adaptive bitrate cache | custom |
| 20 | Observability | Prometheus, SLOs, logs | prometheus/client_golang |
| 21 | Full System | All layers together | All of the above |
Reading This Guide
Each chapter follows the same structure:
- The Problem — why this feature exists, what breaks without it
- The Protocol / Algorithm — the formal specification or academic basis
- The Implementation — walkthrough of the lab code with deep commentary
- Production Details — how Cloudflare, Fastly, AWS CloudFront do it
- Failure Modes — what goes wrong and how to detect it
- What to Measure — metrics, alerts, and SLO indicators
- Try It — curl commands and things to observe
Let’s start at the beginning.
Lab 01 · The Origin Server
Run it:
make lab-01
Source: labs/lab-01-origin-server/main.go
The Problem
Before you can understand what a CDN does, you need to understand what it protects. The origin server is the authoritative source of content — the thing that actually knows what the response should be. It might be:
- A Go/Python/Rails application querying a PostgreSQL database
- An S3 bucket serving static files
- A legacy monolith that someone is afraid to touch
- A media encoder writing MPEG-TS segments in real time
The origin’s fundamental problem is latency × concurrency. Every request pays the full cost of whatever work the origin must do: database queries, template rendering, business logic, external API calls.
The math
At 80 ms average latency per request, handling 1,000 requests/second requires 80 simultaneously active goroutines just to keep up. That’s 80 database connections, 80 in-flight external calls, 80 units of CPU work happening at once. At 500 rps it becomes 40. These numbers sound manageable until traffic spikes 10×.
Now imagine the home page of a news site during a breaking story: 50,000 users hit Refresh within the same second. At 80 ms latency, that is 50,000 × 0.08 = 4,000 requests in flight at the origin at once. Few origins handle that gracefully, but a CDN can serve all 50,000 from a single cached response stored at the edge.
What This Lab Shows
Lab 01 is intentionally simple: just the origin, no proxy, no cache.
User → Origin (:9001)
Every request pays the full --latency cost (default: 80 ms). You can
see this directly in the output — 12 sequential requests each taking ~80 ms,
for ~960 ms total.
The key observable: X-Origin-Hit increments for every single request.
When you add a cache in Lab 03, you’ll see this counter stop growing after
the first few requests.
The Origin Server Contract
A well-behaved origin sets these headers:
| Header | Purpose |
|---|---|
| Cache-Control: public, max-age=N | Tells CDN: cache for N seconds |
| Cache-Control: private | Tells CDN: don’t cache (user-specific) |
| Cache-Control: no-store | Never cache anywhere |
| ETag: "abc123" | Content fingerprint for conditional requests |
| Vary: Accept-Encoding | Different response per encoding |
| X-Served-By: origin | Debug header: which tier served this |
The lab origin sets Cache-Control: public, max-age=30 — correct for
publicly cacheable content. Labs 04–05 build on this contract in depth.
Production Detail: Origin Capacity Planning
CDN engineers think about origin capacity as the residual load after the CDN absorbs its share. If your CDN achieves a 90% hit ratio and you expect 10,000 req/s peak traffic:
Origin load = 10,000 × (1 - 0.90) = 1,000 req/s
Capacity-plan your origin for this number, not the full 10,000. But factor in cold-start scenarios: after a deploy, a CDN cache flush, or a network partition that invalidates a large fraction of cache simultaneously. Your origin must survive a sudden 10× spike above its steady-state CDN-assisted load.
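The capacity formula and the cold-cache case can be checked with a few lines. This is a sketch; originLoad is an illustrative name, and the cold-cache case simply assumes a hit ratio of zero:

```go
package main

import "fmt"

// originLoad returns the request rate that reaches the origin when the CDN
// serves hitRatio (0.0 to 1.0) of peak traffic from cache.
func originLoad(peakRPS, hitRatio float64) float64 {
	return peakRPS * (1 - hitRatio)
}

func main() {
	const peak = 10000.0
	steady := originLoad(peak, 0.90)
	cold := originLoad(peak, 0.0) // cache flushed: every request is a miss
	fmt.Printf("steady state: %.0f req/s\n", steady)
	fmt.Printf("cold cache:   %.0f req/s (%.0fx steady state)\n", cold, cold/steady)
}
```

With a 90% hit ratio the cold-cache load is ten times the steady-state load, which is where the 10× figure comes from. A 95% hit ratio would make the same event a 20× spike.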
This is why Cloudflare, Fastly, and AWS CloudFront all have “origin overload protection” features (origin shield, request collapsing, retries with circuit breakers) — labs 06 and 13.
Failure Modes
| Failure | Symptom | Fix |
|---|---|---|
| Origin latency spike | All edge responses slow | Stale-while-revalidate (lab 07) |
| Origin error rate spike | 502/503 from CDN | Stale-if-error (lab 07) |
| Origin cold start | High latency on deploy | Warm cache before cutover |
| DDoS bypass | Attacker hits origin IP directly | IP allowlist: CDN IPs only |
Security note: Always allowlist your origin to accept connections only from CDN IP ranges. If attackers discover your origin IP, they can bypass the CDN entirely and DDoS it directly. All major CDNs publish their IP ranges (Cloudflare:
https://cloudflare.com/ips).
What to Measure
# Origin request rate (should stay low and stable)
rate(origin_requests_total[1m])
# Origin p99 latency (your SLA baseline)
histogram_quantile(0.99, rate(origin_response_duration_seconds_bucket[5m]))
# Origin error rate (alert at >0.1%)
rate(origin_errors_total[5m]) / rate(origin_requests_total[5m])
Try It
make lab-01
# In another terminal:
curl http://localhost:9001/article/1 -v
# With higher latency:
go run ./labs/lab-01-origin-server/ --latency 200ms --requests 5
# With errors:
go run ./labs/lab-01-origin-server/ --error-rate 0.3
Watch X-Origin-Hit increment with every single request. When you reach
Lab 03 and add a cache, you’ll see it stop.
Lab 02 · The Naive Reverse Proxy
Run it:
make lab-02
Source: labs/lab-02-naive-proxy/main.go
The Problem
Adding a proxy between users and the origin is the first step in CDN architecture — before caching, before edge compute, before any optimization.
But why does a proxy even help if it doesn’t cache? Several reasons:
- TLS offloading: The CDN terminates TLS on fast, dedicated hardware so the origin doesn’t pay the cryptographic overhead for every user.
- Connection pooling: The proxy maintains persistent HTTP/1.1 keep-alive or HTTP/2 multiplexed connections to the origin, amortizing TCP handshake cost across many requests.
- Protocol upgrade: Users connect via HTTP/2 or HTTP/3; the CDN speaks HTTP/1.1 to a legacy origin.
- DDoS surface reduction: The origin is invisible to the internet.
- Header normalization: Strip tracking headers, add forwarding metadata.
- Rate limiting, WAF: Applied at the proxy before the origin even sees the request.
How httputil.ReverseProxy Works
Go’s standard library httputil.ReverseProxy is the canonical building
block for a reverse proxy:
proxy := &httputil.ReverseProxy{
    Director: func(req *http.Request) {
        req.URL.Scheme = "http"
        req.URL.Host = "origin:9001"
        // No manual hop-by-hop stripping here: after Director returns,
        // ReverseProxy handles Connection, Upgrade, and the other
        // hop-by-hop headers itself.
        //
        // Do not add X-Forwarded-For here either. ReverseProxy appends the
        // client IP from req.RemoteAddr automatically, so adding it in
        // Director would record the same address twice.
    },
    ModifyResponse: func(resp *http.Response) error {
        resp.Header.Set("X-Served-By", "proxy")
        return nil
    },
    ErrorHandler: func(w http.ResponseWriter, r *http.Request, err error) {
        // Called when the origin cannot be reached or the transport fails.
        http.Error(w, "Bad Gateway", http.StatusBadGateway)
    },
}
Director mutates the request before forwarding. It runs in the same
goroutine as the handler, so it must be fast and side-effect-free.
ModifyResponse mutates the response before sending back to the
client. Use this to add headers like X-Cache, normalize Content-Type,
or strip internal headers.
Transport is the HTTP client used to reach the origin. Default is
http.DefaultTransport, which maintains a connection pool. For production,
tune:
Transport: &http.Transport{
    MaxIdleConnsPerHost:   200,              // connection pool per origin
    MaxConnsPerHost:       500,              // max concurrent connections
    IdleConnTimeout:       90 * time.Second,
    ResponseHeaderTimeout: 30 * time.Second,
    DisableKeepAlives:     false,            // ALWAYS keep-alives on
    ForceAttemptHTTP2:     true,             // H2 to origin if supported
    TLSHandshakeTimeout:   5 * time.Second,
    // DialContext: custom dialer for DNS override, binding, etc.
}
Hop-by-Hop Headers
HTTP defines two classes of headers:
End-to-end headers: forwarded unchanged through all proxies to the
final recipient. Examples: Content-Type, ETag, Cache-Control,
Authorization.
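Hop-by-hop headers: meaningful only for the immediate connection. Must be stripped before forwarding. The classic list comes from RFC 2616 §13.5.1 (which misspelled Trailer as Trailers); RFC 7230 §6.1 keeps the rule and ties it to the Connection header:
Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization,
TE, Trailer, Transfer-Encoding, Upgrade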
Additionally, any header listed in the Connection header value is
hop-by-hop for that hop:
Connection: X-Custom-Header, Keep-Alive
→ Strip X-Custom-Header too
Failing to strip hop-by-hop headers causes subtle bugs: the origin may
try to negotiate an Upgrade on a connection it doesn’t have, or the
downstream client may receive a Transfer-Encoding: chunked header
that doesn’t match the actual response framing.
X-Forwarded-For and the IP Chain
When a proxy adds X-Forwarded-For: 1.2.3.4, and then another proxy
adds another layer, you get:
X-Forwarded-For: 1.2.3.4, 10.0.0.1
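The leftmost entry is whatever the first proxy received, including any value the client supplied itself, so it can be forged. Each proxy appends the address of the peer that connected to it, so entries become more trustworthy toward the right. Origin applications should walk the list from the right and take the first address that does not belong to a trusted proxy, which only works if they know which proxies are in front of them.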
In production, CDNs like Cloudflare expose the real client IP via:
CF-Connecting-IP: 1.2.3.4 (always the real client IP)
True-Client-IP: 1.2.3.4 (Cloudflare Enterprise)
This avoids the ambiguity of X-Forwarded-For in multi-proxy setups.
Security trap: Never trust X-Forwarded-For for access control if any user can send it directly. Validate the header only when you can confirm the request came through a trusted proxy.
The Proxy Overhead Measurement
The lab measures raw proxy overhead by timing the same request through:
- Direct origin call
- Through the proxy
Typical result: < 0.5 ms proxy overhead. This is negligible vs. actual origin latency (80+ ms). The overhead comes from:
- Goroutine scheduling (< 1 µs)
- Memory copy of request/response buffers
- Two additional TCP reads/writes
This is why the caching layer in Lab 03 — which adds zero network hops on a hit — provides dramatic speedups: it collapses 80 ms to < 0.1 ms.
Production Detail: Connection Pools
http.DefaultTransport uses a connection pool per host:port. When
Go’s HTTP client gets a response, it returns the underlying TCP connection
to the pool for reuse on the next request to the same origin.
At scale, pool sizing matters:
| Scenario | MaxIdleConnsPerHost |
|---|---|
| Single origin, low traffic | 2 to 10 (Go’s default is 2) |
| Single origin, high traffic | 100–500 |
| Origin cluster behind load balancer | 200+ (connections spread across backends) |
| Origin with connection limit (MySQL) | Stay below the origin’s max_connections; enforce the cap with MaxConnsPerHost |
Setting this too low forces new TCP handshakes under load, adding ~5 ms of SYN/ACK round-trip on every miss. At 10,000 cache misses/second, that’s 50 seconds/second of wasted TCP overhead.
Failure Modes
| Failure | Symptom | Fix |
|---|---|---|
| Origin timeout | 504 Gateway Timeout | Set ResponseHeaderTimeout; circuit break |
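| Origin unreachable or transport error | 502 Bad Gateway | ErrorHandler; retry idempotent requests |
| Origin 5xx | Status passed through to the client | Intercept in ModifyResponse; retry idempotent requests |
| Connection pool exhaustion | Latency spike | Increase MaxIdleConnsPerHost; queue requests |
| Memory leak | Unbounded growth | Always close resp.Body; drain it to EOF first if the connection should be reused |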
| Hop-by-hop not stripped | Protocol negotiation failure | Explicit header removal in Director |
Try It
make lab-02
# Direct origin (no proxy)
curl http://localhost:9001/article/1
# Through proxy
curl http://localhost:8080/article/1 -v
# Compare response headers — should see X-Served-By: proxy
# and X-Forwarded-For header in origin logs
Lab 03 · The First Cache
Run it:
make lab-03
Source: labs/lab-03-first-cache/main.go
The Problem
The reverse proxy in Lab 02 blindly forwards every request to the origin. A cache short-circuits that path: if we’ve seen this URL recently and have a stored response, serve it directly from memory without touching the origin.
The fundamental trade-off: freshness vs. cost. A cached response might be stale, but serving it is:
- Orders of magnitude faster (memory read vs. network round trip)
- Origin-free (no database query, no CPU work)
- Deterministic (no dependency on origin availability)
The Cache Lifecycle: MISS → HIT → EXPIRED
Request arrives
│
▼
┌─────────────────────────────────────┐
│ Lookup key = normalize(URL) │
└─────────────────────────────────────┘
│
├─► Entry not found → MISS
│ │
│ ▼
│ Fetch from origin
│ Store in cache with deadline = now + TTL
│ Return response to client
│
├─► Entry found, not expired → HIT
│ │
│ ▼
│ Return cached response immediately
│
└─► Entry found, expired → EXPIRED (= MISS)
│
▼
Revalidate or re-fetch
Replace cache entry
The X-Cache response header tells the client (and debugging engineers)
which branch was taken:
X-Cache: MISS # first request for this URL
X-Cache: HIT # served from cache
Implementation: sync.Map + TTL
type cacheEntry struct {
    response []byte
    headers  http.Header
    status   int
    expiry   time.Time
}

var cache sync.Map // map[string]*cacheEntry

func get(key string) (*cacheEntry, bool) {
    v, ok := cache.Load(key)
    if !ok {
        return nil, false
    }
    entry := v.(*cacheEntry)
    if time.Now().After(entry.expiry) {
        cache.Delete(key) // lazy expiry
        return nil, false
    }
    return entry, true
}
Why sync.Map? The standard map plus sync.RWMutex would work,
but sync.Map is optimized for a specific workload: many reads, few
writes, stable key set. CDN caches have a hot set of URLs that are read
millions of times per second and written (populated) far less often.
sync.Map achieves this via an atomic “read map” that requires no
locking on reads for existing keys.
However, sync.Map has a known weakness: its internal dirty map can
accumulate entries and requires a periodic promotion step. For very
write-heavy caches (cold start, high churn), a sharded map +
sync.RWMutex pattern can be more efficient.
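For comparison, a sharded map can be sketched in a few dozen lines. The shard count of 16, the FNV-1a hash, and the type names are assumptions chosen for illustration; a real cache would also store expiry and headers per entry:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const shardCount = 16 // illustrative; tune to core count and contention

type shard struct {
	mu sync.RWMutex
	m  map[string][]byte
}

// shardedCache spreads keys over independent locks so that writers to
// different shards never block each other.
type shardedCache struct {
	shards [shardCount]shard
}

func newShardedCache() *shardedCache {
	c := &shardedCache{}
	for i := range c.shards {
		c.shards[i].m = make(map[string][]byte)
	}
	return c
}

func (c *shardedCache) shardFor(key string) *shard {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &c.shards[h.Sum32()%shardCount]
}

func (c *shardedCache) Get(key string) ([]byte, bool) {
	s := c.shardFor(key)
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[key]
	return v, ok
}

func (c *shardedCache) Set(key string, val []byte) {
	s := c.shardFor(key)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = val
}

func main() {
	c := newShardedCache()
	c.Set("/article/1", []byte("cached body"))
	v, ok := c.Get("/article/1")
	fmt.Println(string(v), ok)
}
```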
TTL: Where Does It Come From?
In Lab 03 the TTL is hardcoded. Lab 04 shows how to parse it properly
from Cache-Control headers:
Cache-Control: public, max-age=300
→ TTL = 300 seconds
Cache-Control: no-store
→ Do not cache at all
Cache-Control: private
→ Do not store in shared (CDN) cache
Cache-Control: no-cache
→ Store but always revalidate before serving
Ignoring Cache-Control is the #1 cause of CDN misconfiguration.
If you cache a private response, you may serve one user’s data to
another. If you cache no-store, you violate the application’s contract.
Background Sweep: Avoiding Memory Leaks
A cache without eviction grows unboundedly. Lab 03 runs a background goroutine that sweeps expired entries:
go func() {
    ticker := time.NewTicker(30 * time.Second)
    for range ticker.C {
        var expired []string
        cache.Range(func(k, v any) bool {
            if time.Now().After(v.(*cacheEntry).expiry) {
                expired = append(expired, k.(string))
            }
            return true
        })
        for _, k := range expired {
            cache.Delete(k)
        }
    }
}()
Note the two-phase delete: first collect expired keys, then delete them. This is a stylistic choice that keeps the Range callback short. sync.Map does not hold a lock across Range, and the callback may safely call Delete (or any other method) on the map.
Production caches use more sophisticated eviction:
| Policy | Description | Use case |
|---|---|---|
| TTL expiry | Remove at expiry | All caches |
| LRU | Evict least-recently-used | Bounded memory (Lab 08) |
| LFU | Evict least-frequently-used | Popularity-skewed workloads |
| ARC | Adaptive Replacement Cache | Self-tuning between LRU and LFU |
| S3-FIFO | Simple, Scalable, Segmented FIFO | Modern alternative to LRU (lower overhead) |
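To make the LRU row concrete, here is a small sketch built on container/list, the same package the lab map lists for Lab 08. It is not the Lab 08 implementation: it counts entries rather than bytes and is not safe for concurrent use.

```go
package main

import (
	"container/list"
	"fmt"
)

type lruEntry struct {
	key   string
	value []byte
}

// lruCache evicts the least-recently-used entry once capacity is exceeded.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> node in order
}

func newLRU(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

func (c *lruCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // a read counts as a use
	return el.Value.(*lruEntry).value, true
}

func (c *lruCache) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruEntry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruEntry{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruEntry).key)
	}
}

func main() {
	c := newLRU(2)
	c.Put("/a", []byte("A"))
	c.Put("/b", []byte("B"))
	c.Get("/a")              // /a is now the most recently used
	c.Put("/c", []byte("C")) // evicts /b
	_, hasB := c.Get("/b")
	fmt.Println("still has /b:", hasB)
}
```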
The Deliberate Limitations of Lab 03
The lab explicitly documents what it doesn’t do yet:
- No Cache-Control parsing: TTL is hardcoded. Fixed in Lab 04.
- No singleflight: concurrent misses all hammer origin. Fixed in Lab 06.
- Unbounded memory: LRU eviction arrives in Lab 08.
- No content negotiation: same key for Accept-Encoding: gzip and Accept-Encoding: br. Fixed in Lab 05 via Vary.
- No conditional requests: always fetches full response, no 304. Fixed in Lab 04.
This incremental approach is pedagogically important: each lab adds exactly one concept so the interaction is clear.
Production Detail: Cache Serialization Format
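Real CDN disk caches store responses in compact binary formats. Varnish uses its own VCL-controlled storage. Nginx writes one file per entry, with a binary header ahead of the stored headers and body. A simplified layout in that spirit: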
[8 bytes: key hash]
[8 bytes: expiry timestamp]
[4 bytes: headers length]
[4 bytes: body length]
[headers (HTTP/1.1 text)]
[body bytes]
Lab 08 uses file-system storage with xxhash-named files, which is functionally equivalent but less efficient (filesystem metadata overhead).
For in-memory caches, Google’s Groupcache and Fastly’s own cache daemon use Protocol Buffers for serialization, enabling:
- Zero-copy responses via io.WriterTo
- Shared-memory between processes
- Binary compatibility across versions
What to Measure
# Hit ratio (requests)
sum(rate(cache_hits_total[5m])) /
sum(rate(cache_requests_total[5m]))
# Miss rate (triggers origin fetches)
rate(cache_misses_total[5m])
# Cache entries currently stored
cache_entries_current
# Evictions (if bounded cache)
rate(cache_evictions_total[5m])
Try It
make lab-03
# First request: should be MISS (-D - prints response headers to stdout)
curl -s -o /dev/null -D - http://localhost:8080/article/1 | grep -i x-cache
# Second request: should be HIT (< 1ms)
curl -s -o /dev/null -D - http://localhost:8080/article/1 | grep -i x-cache
# X-Origin-Hit should only increment on first request
curl http://localhost:8080/article/1 -H "X-Debug: origin-count"
Lab 04 · HTTP Cache Headers
Run it:
make lab-04
Source: labs/lab-04-http-cache-headers/main.go
The Problem
Lab 03 cached everything for a hardcoded TTL. That’s wrong in production: some content must never be cached (authentication tokens), some content is user-specific (shopping carts), and some content changes frequently (live feeds). HTTP defines a rich vocabulary for expressing caching intent — and CDNs are contractually obligated to honor it.
This lab implements the full RFC 7234 caching model: parsing
Cache-Control, handling conditional requests, and generating ETag-based
304 responses.
RFC 7234: The Caching Specification
HTTP caching is defined in RFC 7234 (HTTP/1.1 Caching, 2014), now superseded by RFC 9111 (HTTP Caching, 2022). The spec is 44 pages and covers:
- Freshness: when a cached response can be served without revalidation
- Validation: checking with the origin if the cached copy is still good
- Invalidation: removing cache entries when content changes
The Freshness Calculation
response_is_fresh = (freshness_lifetime > current_age)
freshness_lifetime = the first of these that is present:
  1. s-maxage directive (shared caches such as CDNs only)
  2. max-age directive
  3. Expires header - Date header (fallback)
  4. heuristic: 10% of (Date - Last-Modified) (last resort)
current_age = age_value (from Age header, added by an upstream cache)
            + (now - response_time)
The Age header is critical in multi-tier setups: when a shield proxy
serves a cached response to an edge proxy, it adds Age: 120 meaning
“this response is 120 seconds old”. The edge node calculates remaining
freshness as max-age - Age = 300 - 120 = 180 seconds left.
Cache-Control Directives
Request-side (Cache-Control from the client)
| Directive | Meaning |
|---|---|
| no-cache | Don’t use cached response; must revalidate |
| no-store | Don’t cache this request or its response |
| max-age=0 | Treat cached response as stale (same as no-cache in practice) |
| max-stale=N | Accept stale response up to N seconds past expiry |
| min-fresh=N | Only accept response fresh for at least N more seconds |
| only-if-cached | Fail with 504 if no cached copy (offline mode) |
Response-side (Cache-Control from the origin)
| Directive | Meaning |
|---|---|
| public | Shared caches (CDN) may store this |
| private | Only browser cache; CDN must not store |
| no-store | No cache anywhere |
| no-cache | Store but always revalidate before use |
| max-age=N | Fresh for N seconds |
| s-maxage=N | Overrides max-age for CDN/shared caches only |
| stale-while-revalidate=N | Serve stale for N seconds while revalidating (lab 07) |
| stale-if-error=N | Serve stale for N seconds if origin errors (lab 07) |
| immutable | Content won’t change during freshness window (no revalidation) |
| must-revalidate | Never serve stale, even if origin is down |
| proxy-revalidate | Same as must-revalidate, but applies only to shared caches |
The CDN Override: s-maxage
s-maxage is the CDN operator’s tool. Use it to set a long CDN TTL
while browsers cache for a shorter time:
Cache-Control: public, max-age=60, s-maxage=86400
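Browsers cache for 60 seconds. The CDN caches for 24 hours and is refreshed on demand via cache invalidation APIs. This is the standard pattern for content you can purge at the CDN but cannot purge from browsers, such as HTML pages or unversioned assets: the short browser TTL bounds how long a user can see stale content after a purge, while the CDN still absorbs almost all of the traffic.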
ETags: Content Fingerprinting
An ETag (Entity Tag) is an opaque validator for a specific version of a resource. It can be:
- Strong: "d41d8cd98f00b204e9800998ecf8427e" (byte-for-byte equality)
- Weak: W/"20230101" (semantically equivalent: same meaning, maybe different encoding)
CDN caches store the ETag alongside the response. On the next request (at or after expiry), the cache can send a conditional request:
GET /article/1 HTTP/1.1
If-None-Match: "d41d8cd98f00b204e9800998ecf8427e"
If the content hasn’t changed, the origin returns:
HTTP/1.1 304 Not Modified
ETag: "d41d8cd98f00b204e9800998ecf8427e"
Cache-Control: public, max-age=300
The cache then extends the TTL of the existing response without re-transferring the body. This is a huge bandwidth saving — imagine refreshing a 10 MB PDF that hasn’t changed; you pay ~200 bytes (304 response headers) instead of 10 MB.
ETag Generation Strategies
// MD5 of content (lab 04)
etagMD5 := fmt.Sprintf(`"%x"`, md5.Sum(body))
// xxhash (faster, non-cryptographic)
etagXX := fmt.Sprintf(`"%016x"`, xxhash.Sum64(body))
// Semantic versioning
etagVersioned := fmt.Sprintf(`"v%s-%s"`, version, contentHash)
// Timestamp-based (weak)
etagWeak := fmt.Sprintf(`W/"%d"`, lastModified.Unix())
Prefer xxhash over MD5 for performance: xxhash is 10–20× faster and provides sufficient collision resistance for ETags (not a security primitive, just a freshness validator).
Last-Modified / If-Modified-Since
Before ETags existed (HTTP/1.0 era), conditional requests used timestamps:
GET /article/1 HTTP/1.1
If-Modified-Since: Wed, 01 Jan 2025 00:00:00 GMT
Origin returns 304 Not Modified if content hasn’t changed since that
timestamp. This is weaker than ETags because:
- 1-second granularity: If content changes twice within the same second, the cache cannot detect the second change.
- Clock skew: HTTP dates are always GMT, but servers whose clocks disagree can compare timestamps incorrectly and return stale content.
- Database writes don’t always update mtime: Application-level content may be “modified” logically without a filesystem timestamp change.
Use ETags when possible. Use Last-Modified as a fallback for static
files where the mtime is reliable.
The 304 Path in Practice
CDN cache (entry expired, has ETag stored)
│
│ GET /article/1
│ If-None-Match: "abc123"
▼
Origin
│ → Content unchanged
│ 304 Not Modified
│ ETag: "abc123"
▼
CDN cache
│ → Reset TTL, keep cached body
│ → Serve existing body + new TTL to client
▼
Client receives 200 OK with cached body
The 304 shortcut saves:
- Body transfer bandwidth (origin → CDN)
- Origin CPU/DB work (content generation)
- CDN memory allocation (no new copy of body)
Production Detail: s-maxage=0
A common pattern for HTML pages that reference versioned assets:
Cache-Control: public, max-age=0, s-maxage=0, must-revalidate
“Must revalidate every time, but please cache the ETag so you can use 304 responses.” This ensures pages are always fresh while still using conditional requests to avoid full re-transfers.
Another pattern is Cloudflare’s “Edge Cache TTL” override: even if the
origin sends max-age=0, Cloudflare’s dashboard can set a longer CDN
TTL, overriding the origin’s preference. This is useful for origins you
don’t control. When you do control the origin, Fastly supports a header-based counterpart, Surrogate-Control, which sets a TTL that applies only to the CDN:
Surrogate-Control: max-age=86400
Fastly honors Surrogate-Control for CDN TTL and strips it before sending
to browsers. The browser then sees only Cache-Control with shorter TTL.
What to Measure
# 304 rate (healthy revalidation; saves bandwidth)
rate(http_responses_total{status="304"}[5m])
# Ratio of 304 vs full responses (should be 20-50% on well-cached APIs)
rate(http_responses_total{status="304"}[5m]) /
rate(http_responses_total{status="200"}[5m])
# Uncacheable responses (watch for unexpected growth)
rate(cache_store_skipped_total{reason="no-store"}[5m])
rate(cache_store_skipped_total{reason="private"}[5m])
Try It
make lab-04
# Observe Cache-Control header
curl http://localhost:9001/article/1 -v 2>&1 | grep -i cache
# Conditional request (ETag)
ETAG=$(curl -si http://localhost:9001/article/1 | grep -i '^etag:' | awk '{print $2}' | tr -d '\r')
curl http://localhost:9001/article/1 -H "If-None-Match: $ETAG" -v
# → Should return 304
# Test no-store (not cached)
curl http://localhost:9001/private/1 -v
Lab 05 · Cache Key Design
Run it:
make lab-05
Source: labs/lab-05-cache-key-design/main.go
The Problem
A cache key is the identifier under which a response is stored and looked up. If the key is wrong, everything downstream breaks:
- Too narrow (same key for different content): serve wrong content to wrong user, or collapse variations into one response.
- Too wide (include irrelevant query params): artificially low hit ratio, wasted storage, repeated origin fetches for identical content.
In production, cache key design is one of the highest-leverage activities a CDN engineer performs. When tracking parameters or an overly broad Vary are fragmenting the cache, a 10-minute key design review can lift hit ratio from 70% to 95%.
The Basic Key: URL Normalization
The naive key is req.URL.String(). This breaks immediately when:
/article/1?a=1&b=2 # one key
/article/1?b=2&a=1 # a different key, same resource
Query parameters don’t have a defined order. Two requests for the same resource with the same parameters in a different order mean the same thing to most origins, but a naive cache sees them as different URLs.
Normalization steps (applied by Lab 05):
func normalizeKey(u *url.URL) string {
    // 1. Lowercase the path. Only safe when the origin treats paths
    //    case-insensitively; URL paths are case-sensitive by default.
    path := strings.ToLower(u.Path)
    // 2. Remove tracking parameters
    q := u.Query()
    for _, p := range trackingParams {
        q.Del(p)
    }
    // 3. Sort remaining parameters deterministically:
    //    values within each key here, keys via Encode below.
    for _, v := range q {
        sort.Strings(v)
    }
    // 4. Leave out the fragment (#anchor). Browsers never send it,
    //    but it can appear in reconstructed URLs.
    key := path
    if encoded := q.Encode(); encoded != "" { // Encode sorts by key
        key += "?" + encoded
    }
    return key
}
Tracking Parameter Pollution
Marketing teams append tracking parameters to every URL. These are meaningless to the origin but fragment your cache:
| Parameter | Source |
|---|---|
| utm_source, utm_medium, utm_campaign, utm_term, utm_content | Google Analytics |
| fbclid | Facebook click ID |
| gclid, gad_source | Google Ads |
| mc_eid | Mailchimp email ID |
| _ga | Google Analytics cross-domain |
| msclkid | Microsoft Ads |
| twclid | Twitter click ID |
| ref, referral | Generic referral parameters |
Without stripping these, each user who clicks a Facebook ad link (?fbclid=XYZ)
generates a unique cache key even though they want the same article. A
single popular article shared on Facebook could generate millions of unique
cache keys — all for identical content.
Cloudflare, Fastly, and Akamai all provide ways to exclude these parameters from the cache key; check each vendor’s documentation for what is applied by default.
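A rough sketch of the stripping step on its own. The parameter set below is taken from the table above, minus the generic ref and referral names, which some applications use for real routing; treating every utm_ prefix as tracking is an assumption, as are the function names.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// trackingExact lists parameters that never affect the response.
var trackingExact = map[string]bool{
	"fbclid": true, "gclid": true, "gad_source": true, "mc_eid": true,
	"_ga": true, "msclkid": true, "twclid": true,
}

func isTracking(name string) bool {
	name = strings.ToLower(name)
	return strings.HasPrefix(name, "utm_") || trackingExact[name]
}

// stripTracking removes tracking parameters and returns the remaining
// query string with keys in sorted order (url.Values.Encode sorts by key).
func stripTracking(rawQuery string) string {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return rawQuery // leave unparseable queries untouched
	}
	for name := range q {
		if isTracking(name) {
			delete(q, name)
		}
	}
	return q.Encode()
}

func main() {
	fmt.Println(stripTracking("utm_source=fb&id=7&fbclid=XYZ"))
	fmt.Println(stripTracking("b=2&a=1"))
}
```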
The Vary Header
Vary tells the cache: “this response may differ based on these
request headers”. Example:
Vary: Accept-Encoding
This means there are potentially multiple stored versions of the same URL: one for clients that accept gzip, one for brotli, one for uncompressed.
Key expansion for Vary
cache_key = normalize(url) + "|" + canonicalize(vary_headers)
GET /article/1
Accept-Encoding: br → key: "/article/1|br"
Accept-Encoding: gzip → key: "/article/1|gzip"
Accept-Encoding: (absent) → key: "/article/1|"
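The key expansion above only works if the header value is canonicalized first, otherwise "gzip, br" and "br, gzip" land in different buckets. A minimal sketch follows, assuming that only Accept-Encoding is varied on and that q-values are ignored; encodingBucket and varyKey are hypothetical helper names, not functions taken from the lab.

```go
package main

import (
	"fmt"
	"strings"
)

// encodingBucket collapses an Accept-Encoding header into one of three
// buckets ("br", "gzip", or "" for uncompressed) so the cache stores at
// most one variant per bucket. Simplification: q-values are ignored, so a
// client sending "br;q=0" would still land in the "br" bucket.
func encodingBucket(acceptEncoding string) string {
	offered := map[string]bool{}
	for _, part := range strings.Split(strings.ToLower(acceptEncoding), ",") {
		name := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
		if name != "" {
			offered[name] = true
		}
	}
	switch {
	case offered["br"]:
		return "br"
	case offered["gzip"]:
		return "gzip"
	default:
		return ""
	}
}

// varyKey appends the canonicalized header value to the normalized URL.
func varyKey(normalizedURL, acceptEncoding string) string {
	return normalizedURL + "|" + encodingBucket(acceptEncoding)
}

func main() {
	for _, ae := range []string{"gzip, deflate, br", "br", "gzip;q=0.8", ""} {
		fmt.Printf("%q -> %s\n", ae, varyKey("/article/1", ae))
	}
}
```

Bucketing keeps the number of stored variants bounded no matter how many distinct header strings clients send.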
Common Vary values and their implications:
| Vary value | Cache behavior | Risk |
|---|---|---|
| Accept-Encoding | Store one copy per encoding | Fine; well-enumerated set |
| Accept-Language | Store per language | Can explode: 50+ languages |
| User-Agent | Store per UA string | Catastrophic; millions of unique strings |
| Cookie | Store per cookie | Never do this on shared cache |
| Authorization | Per auth token | Response must also be private |
Vary: * means the response is unique per request and must never be
cached in a shared cache. CDNs treat it as uncacheable.
Vary: User-Agent is one of the most destructive cache-key mistakes.
Nginx docs used to recommend it for mobile detection, causing cache hit
ratios to collapse toward 0% because nearly every browser build sends a distinct UA string.
The fix: classify the device at the edge before the cache lookup, forward
the result in a normalized header such as X-Device-Type: mobile|desktop,
and have the origin emit Vary: X-Device-Type instead of Vary: User-Agent.
Cache Keying in Production CDNs
Cloudflare Cache Rules
Cloudflare provides a Cache Rules UI and API to configure:
Cache Rule: "Strip marketing params"
When: hostname matches "example.com"
Cache Key: exclude query strings "utm_*", "fbclid", "gclid"
Fastly VCL
Fastly’s VCL (Varnish Configuration Language) gives full control:
sub vcl_hash {
# Normalize host
set req.hash += req.http.host;
# Normalize path (lowercase)
set req.hash += std.tolower(req.url.path);
# Strip tracking params from hash
declare local var.qs STRING;
set var.qs = regsuball(req.url.qs,
"(?:^|&)(?:utm_[^=]*|fbclid|gclid)[^&]*", "");
set req.hash += regsub(var.qs, "^&", "");
return(hash);
}
Akamai
Akamai uses “cache ID” rules configured via Property Manager or the APIs. Key parameters can be included/excluded per URL pattern.
Naive vs. Smart Key: The Hit Ratio Impact
The lab demonstrates this directly. Given traffic:
/article/1?utm_source=google&utm_medium=cpc
/article/1?utm_source=facebook&utm_medium=social
/article/1?utm_source=twitter
/article/1 ← direct visit
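| Key strategy | Cache hits | Hit ratio |
|---|---|---|
| Naive (full URL) | 0/4 | 0% |
| Strip utm_* | 3/4 | 75% |
| Strip utm_* + normalize | 3/4 | 75% |
The first request is always a cold miss, so 3/4 is the ceiling for this sample; sorting and lowercasing add hits only when requests differ in parameter order or path case. In production with millions of requests, the gap between the naive and the normalized key can be the difference between a $200/month origin bill and a $20,000/month one.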
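The effect of the key function on the four sample requests can be reproduced with a few lines of Go. This is a rough sketch rather than the lab's code: naiveKey, strippedKey, and countHits are illustrative names, the cache is modeled as a set of seen keys, and only utm_* parameters are stripped.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// naiveKey uses the raw URL as the cache key.
func naiveKey(rawURL string) string { return rawURL }

// strippedKey removes utm_* parameters and re-encodes the query, which
// also sorts the remaining parameters by name.
func strippedKey(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return rawURL
	}
	q := u.Query()
	for name := range q {
		if strings.HasPrefix(name, "utm_") {
			q.Del(name)
		}
	}
	u.RawQuery = q.Encode()
	return u.String()
}

// countHits replays the requests against an initially empty cache and
// counts how many find their key already present.
func countHits(requests []string, key func(string) string) int {
	seen := map[string]bool{}
	hits := 0
	for _, r := range requests {
		k := key(r)
		if seen[k] {
			hits++
		} else {
			seen[k] = true
		}
	}
	return hits
}

func main() {
	traffic := []string{
		"/article/1?utm_source=google&utm_medium=cpc",
		"/article/1?utm_source=facebook&utm_medium=social",
		"/article/1?utm_source=twitter",
		"/article/1",
	}
	fmt.Printf("naive:    %d/%d hits\n", countHits(traffic, naiveKey), len(traffic))    // 0/4
	fmt.Printf("stripped: %d/%d hits\n", countHits(traffic, strippedKey), len(traffic)) // 3/4
}
```

Replaying a sample of real access logs through both key functions is a cheap way to estimate the hit-ratio gain before changing production configuration.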
Production Checklist: Cache Key Design
- Strip all known tracking parameters
- Sort query string parameters alphabetically
- Lowercase path components
- Handle Vary explicitly per resource type
- Never vary on User-Agent or Cookie in shared cache
- Use Surrogate-Control to set CDN-specific TTLs
- Test key normalization with realistic traffic samples
Try It
make lab-05
# Same content, different tracking params — should be HIT on second request
curl "http://localhost:8080/article/1?utm_source=google"
curl "http://localhost:8080/article/1?utm_source=facebook"
# Second request should be a cache HIT
# Different Accept-Encoding → different Vary bucket
curl "http://localhost:8080/article/1" -H "Accept-Encoding: gzip"
curl "http://localhost:8080/article/1" -H "Accept-Encoding: br"
# Both should be stored separately
Lab 06 · The Thundering Herd
Run it:
make lab-06
Source: labs/lab-06-thundering-herd/main.go
The Problem
You have a cache. A popular URL’s TTL expires. In the next millisecond, 800 concurrent requests arrive for that URL. Your cache code:
func handler(w http.ResponseWriter, r *http.Request) {
if entry, ok := cache.Get(key); ok {
write(w, entry) // HIT
return
}
// MISS — 800 goroutines all reach here simultaneously
resp := fetch(key) // 800 fetches to origin — boom
cache.Set(key, resp)
}
This is the thundering herd (also: cache stampede, dog pile effect). A single cache expiry event becomes an instant DDoS against your origin.
At YouTube scale (2023), a single CDN node may have 10,000 concurrent viewers for a popular video. When that cache entry expires, 10,000 simultaneous origin requests arrive in sub-millisecond windows. Most origins cannot handle this.
Solution: singleflight.Group
Go’s golang.org/x/sync/singleflight package provides the exact
primitive needed: request collapsing (or request deduplication).
import "golang.org/x/sync/singleflight"
var group singleflight.Group
func fetch(key string) ([]byte, error) {
// The third return value (shared) reports whether the result went to
// multiple callers; it is ignored here and used for metrics below.
result, err, _ := group.Do(key, func() (interface{}, error) {
// This function runs ONCE, no matter how many concurrent
// callers invoke group.Do with the same key.
return fetchFromOrigin(key)
})
if err != nil {
return nil, err
}
return result.([]byte), nil
}
The semantics:
- First caller with a given key triggers the actual fetch
- All subsequent callers with the same key block and wait for the first caller’s result
- When the fetch completes, all waiting callers receive the same result
- No extra origin requests are made
800 concurrent misses for /video/popular
│
├── Goroutine 1: starts group.Do("video/popular") → actual fetch
├── Goroutine 2: group.Do("video/popular") → blocks, waiting
├── Goroutine 3: group.Do("video/popular") → blocks, waiting
├── ...
└── Goroutine 800: group.Do("video/popular") → blocks, waiting
[~80ms later: origin responds]
└── All 800 goroutines receive the same result simultaneously
→ 1 origin request, not 800
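To make the collapsing behavior concrete without pulling in golang.org/x/sync, here is a stripped-down, standard-library-only sketch of the same idea. It is an illustration under assumptions, not the real package: the group and call types and the stampede helper are invented for this example, and panics, Forget, and the shared flag are all left out.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight fetch and its eventual result.
type call struct {
	wg  sync.WaitGroup
	val []byte
	err error
}

// group collapses concurrent calls that use the same key.
type group struct {
	mu sync.Mutex
	m  map[string]*call
}

func (g *group) Do(key string, fn func() ([]byte, error)) ([]byte, error) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // a fetch is already in flight: wait for its result
		return c.val, c.err
	}
	c := new(call)
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // only the first caller reaches the origin
	c.wg.Done()

	g.mu.Lock()
	delete(g.m, key) // later callers start a new fetch
	g.mu.Unlock()
	return c.val, c.err
}

// stampede releases n goroutines at once against one key and returns how
// many times the simulated origin was called.
func stampede(n int) int64 {
	var g group
	var originCalls atomic.Int64
	var wg sync.WaitGroup
	start := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start
			g.Do("/video/popular", func() ([]byte, error) {
				originCalls.Add(1)
				time.Sleep(100 * time.Millisecond) // simulated origin latency
				return []byte("segment"), nil
			})
		}()
	}
	close(start)
	wg.Wait()
	return originCalls.Load()
}

func main() {
	// Typically prints 1; a goroutine scheduled after the first fetch
	// completes would start a second one.
	fmt.Println("origin calls for 800 requests:", stampede(800))
}
```

The count is not strictly guaranteed to be 1, because collapsing only covers callers that arrive while a fetch is in flight. That is also why it needs to be paired with a cache.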
The shared Return Value
group.Do returns three values: (v interface{}, err error, shared bool).
shared is true if the result was shared with other callers. This is
useful for metrics — you can measure how many requests were collapsed:
result, err, shared := group.Do(key, fetch)
if shared {
collapsedRequestsTotal.Inc()
}
Monitoring collapsed requests reveals thundering herd intensity. If you see thousands of requests being collapsed per second, your TTLs may be too short, or many entries may be expiring at the same moment (lab 07 addresses this with staggered expiry).
Cascade Failure Without Singleflight
The lab demonstrates what happens without singleflight:
800 requests arrive at t=0
→ 800 goroutines all observe cache miss
→ 800 goroutines all call fetchFromOrigin()
→ Origin receives 800 simultaneous connections
→ Origin CPU spikes to 100%
→ Origin response time increases from 80ms to 5000ms
→ Each 800-request wave takes 5s instead of 80ms
→ Next cache entry expires while previous wave is still in-flight
→ Another 800 requests stampede
→ Origin never recovers (cascade failure)
This is a well-documented pattern in distributed systems and the root cause of many high-profile outages. Facebook described this failure mode, and the lease mechanism they use against it, in their 2013 paper "Scaling Memcache at Facebook". Reddit’s 2012 outage was triggered by a thundering herd on a database backing a cached list.
Beyond singleflight: Production Patterns
1. Probabilistic Early Refresh (XFetch)
Instead of waiting for expiry, refresh slightly before expiry using probabilistic jitter. Each request has a small probability of triggering a background refresh:
// XFetch algorithm (Vattani et al., 2015)
func shouldRefresh(expiry time.Time, lastFetchDuration time.Duration, beta float64) bool {
remaining := time.Until(expiry).Seconds()
delta := lastFetchDuration.Seconds()
return -delta * beta * math.Log(rand.Float64()) > remaining
}
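math.Log(rand.Float64()) is always negative, so
-delta * beta * math.Log(rand.Float64()) is a positive random value. As
remaining approaches 0, that value becomes increasingly likely to exceed it,
so some request refreshes the entry shortly before expiry. Higher beta = more
aggressive prefetching. The cache stays warm almost all of the time, although
this is a probabilistic property rather than a guarantee.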
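The refresh probability has a closed form that helps when choosing beta. Since rand.Float64() is uniform on [0, 1), the condition -delta * beta * ln(U) > remaining is equivalent to U < exp(-remaining / (delta * beta)), so each request triggers a refresh with probability exp(-remaining / (delta * beta)). A small sketch of that calculation follows; refreshProbability is a hypothetical helper, with all durations given in seconds.

```go
package main

import (
	"fmt"
	"math"
)

// refreshProbability returns the chance that a single request triggers an
// early refresh under XFetch, given the seconds remaining until expiry,
// the duration of the last origin fetch (delta), and the beta parameter.
func refreshProbability(remainingSec, deltaSec, beta float64) float64 {
	if remainingSec <= 0 {
		return 1 // already expired: always refresh
	}
	return math.Exp(-remainingSec / (deltaSec * beta))
}

func main() {
	const delta = 0.08 // 80 ms origin fetch
	for _, remaining := range []float64{5, 1, 0.5, 0.08, 0.01} {
		fmt.Printf("remaining=%.2fs  p(refresh)=%.6f\n",
			remaining, refreshProbability(remaining, delta, 1.0))
	}
}
```

With beta = 1, a request that arrives exactly one fetch duration before expiry refreshes with probability e^-1, about 0.37. Requests arriving seconds earlier almost never do, which is what keeps the extra origin load small.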
2. Mutex per key (fine-grained locking)
type keyedMutex struct {
mu sync.Mutex
locks map[string]*sync.Mutex
}
func (km *keyedMutex) Lock(key string) {
km.mu.Lock()
l, ok := km.locks[key]
if !ok {
l = &sync.Mutex{}
km.locks[key] = l
}
km.mu.Unlock()
l.Lock()
}
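Only one goroutine per key can fetch from origin. Others wait.
Simpler to reason about than singleflight, but there is no result sharing:
each waiter must re-check the cache after acquiring the lock, or it re-fetches
from origin as soon as the mutex is released. The sketch also omits Unlock and
never removes entries from the locks map.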
3. Background refresh with locked TTL
Keep serving the stale entry while refreshing in background, preventing
any thundering herd entirely. See Lab 07 for the full stale-while-revalidate
implementation.
singleflight vs. Caching
Note that singleflight is not a cache. It deduplicates in-flight
requests, but once the first request completes, new requests will start
a new group.Do call (the key is removed from the group after completion).
singleflight + cache is the correct combination:
Request arrives
│
▼
Cache hit? → serve immediately
│ miss
▼
group.Do(key, fetch) → one origin request, all callers get result
│
▼
cache.Set(key, result, ttl)
│
▼
All waiters serve result
What to Measure
# Collapsed requests (singleflight saves)
rate(singleflight_shared_total[1m])
# Origin request rate — should be orders of magnitude lower than edge rate
rate(origin_requests_total[1m])
# Collapse ratio: how many requests per origin fetch
rate(edge_requests_total[1m]) / rate(origin_requests_total[1m])
# Target: 50-500x depending on content popularity
Try It
make lab-06
# The lab fires 800 concurrent requests to a cold cache entry
# Watch the output — it will show:
# "800 concurrent requests → 1 origin request"
# vs the naive version which fires all 800
Lab 07 · Stale Content & RFC 5861
Run it:
make lab-07
Source: labs/lab-07-stale-content/main.go
The Problem
When a cache entry expires, the CDN must go to the origin to get a fresh copy. During that fetch (typically 80–500 ms), what should the CDN do with incoming requests for that resource?
Two bad options:
- Block: Hold all requests until the origin responds. Adds 80–500 ms latency to the first request after every expiry. Under load, hundreds of requests pile up.
- Return 503: Refuse to serve. This is almost never acceptable.
The right option: serve the stale response while revalidating in the background. Users get a response immediately; the cache updates asynchronously.
This is the core idea of RFC 5861: HTTP Cache-Control Extensions for Stale Content.
RFC 5861: stale-while-revalidate
Cache-Control: max-age=60, stale-while-revalidate=30
Semantics:
- Fresh (first 60s): serve from cache without consulting origin
- Stale-while-revalidate (seconds 61–90): serve stale immediately, and trigger a background revalidation
- Hard stale (after 90s): must wait for fresh copy
Time (seconds)
0 60 90 ∞
|─ fresh ──|─ SWR ──|─ stale ─|
t=0: First fetch. Cache stores response.
t=30: Request → cache HIT (fresh)
t=65: Request → entry is stale (expired, within SWR window)
→ Serve stale immediately (no added latency)
→ Background goroutine fires origin request
→ When origin responds, update cache entry
t=95: Request → beyond SWR window → must fetch fresh before responding
The user at t=65 receives a response that is 65 seconds old, only 5 seconds past its freshness lifetime, which is an invisible difference for most content. The user at t=95 waits ~80ms for the origin response.
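The three windows can be derived mechanically from the header. The sketch below parses the integer directives out of a Cache-Control value and classifies an entry by its age. It is an illustration with hypothetical helper names (parseCacheControl, classify) and it ignores directives without a value, such as no-store.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCacheControl extracts integer-valued directives such as max-age=60.
// Directives without a numeric value are skipped.
func parseCacheControl(header string) map[string]int {
	out := map[string]int{}
	for _, part := range strings.Split(header, ",") {
		name, value, found := strings.Cut(strings.TrimSpace(part), "=")
		if !found {
			continue
		}
		if n, err := strconv.Atoi(strings.Trim(value, `"`)); err == nil {
			out[strings.ToLower(name)] = n
		}
	}
	return out
}

// classify maps the age of an entry (in seconds) onto the three windows
// described above.
func classify(ageSec int, directives map[string]int) string {
	maxAge := directives["max-age"]
	swr := directives["stale-while-revalidate"]
	switch {
	case ageSec < maxAge:
		return "fresh"
	case ageSec < maxAge+swr:
		return "stale-while-revalidate"
	default:
		return "expired"
	}
}

func main() {
	d := parseCacheControl("max-age=60, stale-while-revalidate=30")
	for _, age := range []int{30, 65, 95} {
		fmt.Printf("t=%d: %s\n", age, classify(age, d))
	}
}
```

The ages 30, 65, and 95 match the timeline above and fall into the fresh, stale-while-revalidate, and expired windows respectively.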
RFC 5861: stale-if-error
Cache-Control: max-age=60, stale-if-error=86400
If the origin returns a 5xx error, or is unreachable (connection refused, timeout), serve the stale cached copy for up to 86400 seconds (24 hours) beyond the original expiry.
This is the “graceful degradation” directive. Your CDN continues serving content even when the origin is completely down, for up to the specified duration.
t=0: Origin healthy. Cache populated.
t=100: Cache entry expired (max-age=60 passed)
t=101: Origin request → 500 Internal Server Error
→ stale-if-error: serve cached copy from t=0
→ Client sees their article, not a 500 error
t=86460: stale-if-error window expired
→ If origin still down → return 502 Bad Gateway
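The decision in this timeline reduces to one comparison. The following is a minimal sketch under assumptions: onOriginError is a made-up name, an origin status of 0 stands for "unreachable", and the error set (500, 502, 503, 504) is the one RFC 5861 names.

```go
package main

import "fmt"

// onOriginError decides what to return when revalidation fails.
// originStatus is the status returned by the origin, or 0 if it could not
// be reached at all. It returns the status to send to the client and
// whether the stale copy is being served.
func onOriginError(ageSec, maxAgeSec, staleIfErrorSec, originStatus int) (int, bool) {
	isError := originStatus == 0
	switch originStatus {
	case 500, 502, 503, 504:
		isError = true
	}
	if isError && ageSec < maxAgeSec+staleIfErrorSec {
		return 200, true // within the stale-if-error window: serve cached copy
	}
	return 502, false // window exhausted (or not an eligible error)
}

func main() {
	for _, age := range []int{101, 3600, 86460} {
		status, stale := onOriginError(age, 60, 86400, 500)
		fmt.Printf("t=%d: status=%d stale=%v\n", age, status, stale)
	}
}
```

A real implementation would pass through non-error origin responses instead of mapping everything else to 502. The sketch only covers the failure path shown in the timeline.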
The Background Revalidation Pattern
type cacheEntry struct {
response *http.Response
body []byte
expiry time.Time
swrDeadline time.Time // expiry + stale-while-revalidate
sieDeadline time.Time // expiry + stale-if-error
revalidating atomic.Bool
}
func (c *Cache) get(key string) (*cacheEntry, string) {
// c.mu is assumed here: a sync.RWMutex on Cache guarding c.store,
// because revalidate writes the map from another goroutine
c.mu.RLock()
entry, ok := c.store[key]
c.mu.RUnlock()
if !ok {
return nil, "MISS" // never cached
}
now := time.Now()
if now.Before(entry.expiry) {
return entry, "HIT"
}
if now.Before(entry.swrDeadline) {
// Serve stale, kick off background refresh
if entry.revalidating.CompareAndSwap(false, true) {
go c.revalidate(key, entry)
}
return entry, "STALE-REVALIDATING"
}
return nil, "EXPIRED" // must wait for fresh copy
}
func (c *Cache) revalidate(key string, old *cacheEntry) {
defer old.revalidating.Store(false)
resp, err := fetch(key)
if err != nil {
// Background refresh failed; stale-if-error logic handles it
return
}
c.mu.Lock()
c.store[key] = newEntry(resp)
c.mu.Unlock()
}
The atomic.Bool on revalidating prevents multiple background
goroutines for the same key — the first one wins, subsequent SWR requests
see revalidating=true and skip launching another goroutine.
Stale-if-Error in Practice
The lab simulates origin failure with a configurable error rate flag.
With stale-if-error, the sequence is:
1. Normal operation: cache populated, served fresh
2. Origin starts returning 503 (simulated)
3. CDN: entry is expired + origin erroring
→ Is there a stale-if-error window?
→ Yes → serve stale, log "STALE-ERROR"
4. Origin recovers
→ Next background revalidation succeeds
→ Cache entry updated
5. Normal operation resumes, no user-visible error
This is origin availability decoupled from user experience. For most content (articles, product pages, media), brief staleness is far preferable to a visible error.
When Not to Use stale-while-revalidate
Some content must always be fresh:
| Content type | Use SWR? | Reason |
|---|---|---|
| News articles | ✓ (short window) | Mild staleness acceptable |
| Product pages | ✓ | Price/stock staleness OK for seconds |
| Authentication state | ✗ | Must be current |
| Payment/checkout | ✗ | Cannot serve stale price |
| Medical information | ✗ | Accuracy is legal requirement |
| Real-time scores/feeds | ✗ (or very short) | Value proposition is freshness |
Use Cache-Control: no-cache combined with conditional requests (Lab 04)
instead of SWR for content where freshness is the product.
Production Detail: Cloudflare’s Default SWR
Cloudflare’s Always Online feature is related to stale-if-error: when it is
enabled for a zone, it can serve a previously stored copy of a page if the
origin is unreachable. Its defaults, triggering conditions, and retention
have changed over time, so check the current Cloudflare documentation before relying on it.
Cloudflare also exposes stale-while-revalidate support since 2022,
honoring the directive from origin responses.
Fastly’s equivalent is Shielding + Grace period: cached objects can be served from shield for up to a configurable grace period while background revalidation happens.
Timeline Annotation (What the Lab Prints)
t=0s: /article/1 → MISS → fetch → store (TTL=30s, SWR=15s, SIE=300s)
t=5s: /article/1 → HIT (25s remaining)
t=30s: /article/1 → HIT (0s remaining, within SWR window)
→ background revalidation started
t=31s: /article/1 → HIT (background revalidation complete, reset TTL)
[origin down]
t=50s: /article/1 → expires → origin 503
→ STALE-IF-ERROR (250s remaining in SIE window)
[origin up]
t=60s: background revalidation succeeds → back to normal HIT
Try It
make lab-07
# Normal behavior: articles served stale during background revalidation
curl http://localhost:8080/article/1
sleep 35
curl http://localhost:8080/article/1 # served stale, triggers background refresh
curl http://localhost:8080/article/1 # now fresh again
# Simulate origin down:
# Restart lab with --error-rate 1.0 to see stale-if-error in action
Lab 08 · Tiered Cache: Memory + Disk
Run it:
make lab-08
Source: labs/lab-08-tiered-cache/main.go
The Problem
A single in-memory cache has two opposing constraints:
- Memory is fast but limited: A 32 GB edge node has ~28 GB usable after OS. At an average response size of 10 KB, that’s ~2.8 million cached objects.
- Disk is large but slower: an NVMe read at ~100 µs is about 1,000× slower than a DRAM access at ~100 ns, but a 4 TB NVMe holds 400 million 10 KB objects.
The solution is a two-tier cache: hot objects in L1 memory, warm objects in L2 disk. On an L1 miss, check L2 before going to origin.
Request
│
▼
L1 (Memory LRU) ─ HIT (< 1 µs) → response
│ miss
▼
L2 (Disk NVMe) ─ HIT (< 1 ms) → promote to L1 → response
│ miss
▼
Origin ─ fetch (80+ ms) → store in L2 → promote to L1 → response
This is the architecture of Nginx’s proxy_cache with memory + file
tiers, Varnish’s Massive Storage Engine (MSE), and Cloudflare’s tiered
cache storage.
L1: LRU Memory Cache
The LRU Data Structure
LRU (Least Recently Used) eviction requires O(1) Get and O(1) Put.
The classic implementation uses a doubly-linked list + hash map:
Hash map: key → *list.Element (O(1) lookup by key)
List: MRU [elem4, elem2, elem1, elem3] LRU (O(1) move to front, remove from back)
On Get(key):
1. Lookup element in hash map O(1)
2. Move element to front of list O(1) ← "recently used"
3. Return value
On Put(key, value):
1. If key exists: update + move to front
2. If cache full: remove LRU (list.Back()), delete from hash map
3. Insert new element at front
Go’s container/list provides the doubly-linked list. Combined with
a sync.RWMutex for concurrent access:
type LRUCache struct {
cap int
mu sync.RWMutex
list *list.List
items map[string]*list.Element
}
type item struct {
key string
value []byte
expiry time.Time
}
func (c *LRUCache) Get(key string) ([]byte, bool) {
c.mu.Lock()
defer c.mu.Unlock()
el, ok := c.items[key]
if !ok { return nil, false }
entry := el.Value.(*item)
if time.Now().After(entry.expiry) {
c.list.Remove(el)
delete(c.items, key)
return nil, false
}
c.list.MoveToFront(el) // Mark as recently used
return entry.value, true
}
Note: Get takes a write lock (not read lock) because it modifies the
list order. This is a subtle but important point: LRU Get is a mutation.
Solutions include:
- Accept write lock on every Get (simple, correct, ~50 ns per lock)
- Use a CLOCK algorithm (approximate LRU, read-only Get — used by Linux VM)
- Use a sharded LRU (shard by key hash to reduce lock contention), or a contention-optimized cache such as Ristretto, which uses TinyLFU-style admission rather than strict LRU
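For reference, the first option (one lock on every operation) fits in a short, complete program that includes the Put side. This is a simplified sketch, not the lab's implementation: expiry is omitted, capacity is assumed to be at least 1, and the names LRU, NewLRU, and entry are illustrative.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is what each list element holds. The key is stored so that
// eviction can delete the matching map entry.
type entry struct {
	key   string
	value []byte
}

// LRU is a fixed-capacity cache guarded by a single mutex.
type LRU struct {
	mu       sync.Mutex
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *LRU) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // Get mutates recency order
	return el.Value.(*entry).value, true
}

func (c *LRU) Put(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		if oldest := c.order.Back(); oldest != nil {
			c.order.Remove(oldest)
			delete(c.items, oldest.Value.(*entry).key)
		}
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
}

func main() {
	c := NewLRU(2)
	c.Put("a", []byte("1"))
	c.Put("b", []byte("2"))
	c.Get("a")              // "a" becomes most recently used
	c.Put("c", []byte("3")) // evicts "b", the least recently used
	_, hasA := c.Get("a")
	_, hasB := c.Get("b")
	fmt.Println("a present:", hasA, "b present:", hasB) // a present: true b present: false
}
```

Capacity here counts entries. A production cache would normally bound total bytes instead, since response sizes vary widely.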
L2: Disk Cache with xxhash
Key-to-Filename Mapping
Storing arbitrary URL strings as filenames is error-prone (path traversal, length limits, special characters). Map the key to a hash:
import "github.com/cespare/xxhash/v2"
func diskPath(dir, key string) string {
h := xxhash.Sum64String(key)
// Use a two-level directory structure to avoid large directories:
// "a1b2c3d4e5f60718" → "a1/b2/a1b2c3d4e5f60718"
hex := fmt.Sprintf("%016x", h)
return filepath.Join(dir, hex[:2], hex[2:4], hex)
}
The two-level directory sharding (git’s object store uses a one-level version of the same idea) prevents the filesystem from having too many entries in a single directory. ext4 directories degrade at ~10 million entries; most filesystems struggle past 100k entries in a flat dir.
Why xxhash, Not MD5/SHA1?
| Hash | Throughput | Output size | Collision resistance |
|---|---|---|---|
| MD5 | ~500 MB/s | 128 bit | Low (broken for crypto) |
| SHA-1 | ~300 MB/s | 160 bit | Low (broken for crypto) |
| SHA-256 | ~200 MB/s | 256 bit | Cryptographically strong |
| xxhash64 | ~30 GB/s | 64 bit | Sufficient for cache keys |
| xxhash128 | ~30 GB/s | 128 bit | Sufficient for cache keys |
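For cache key → filename, we don’t need cryptographic resistance: we need speed and low collision probability. With n = 10 million cached items, the birthday bound n²/(2·2^64) over xxhash64’s 64-bit space (1.8 × 10^19 values) gives a collision probability of ~2.7 × 10^-6. Acceptable, provided each stored entry also records its full key, so that a colliding lookup is detected and treated as a miss instead of serving the wrong object.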
xxhash is also used internally by Fastly and in ClickHouse, Kafka, and numerous storage systems.
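The collision estimate is easy to recompute for other cache sizes. The sketch below uses the standard birthday approximation p ≈ n(n−1)/(2·2^b), which is accurate while p is small; collisionProbability is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of
// `items` uniformly hashed keys collide in a hash space of `hashBits`
// bits. The approximation holds while the result is much smaller than 1.
func collisionProbability(items float64, hashBits int) float64 {
	space := math.Exp2(float64(hashBits))
	return items * (items - 1) / (2 * space)
}

func main() {
	for _, n := range []float64{1e6, 1e7, 1e8, 4e8} {
		fmt.Printf("n=%.0e  64-bit: %.2e  128-bit: %.2e\n",
			n, collisionProbability(n, 64), collisionProbability(n, 128))
	}
}
```

Because the probability grows with the square of the item count, a disk tier holding hundreds of millions of objects is a stronger case for a 128-bit hash, or for always verifying the stored key.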
Atomic File Writes
To prevent a partially-written cache file from being read:
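func writeDisk(path string, data []byte) error {
// The sharded parent directories may not exist yet
if err := os.MkdirAll(filepath.Dir(path), 0755); err != nil {
return err
}
// Write to temp file first (same directory, so the rename stays on one
// filesystem)
tmp := path + ".tmp"
if err := os.WriteFile(tmp, data, 0644); err != nil {
os.Remove(tmp) // do not leave a partial temp file behind
return err
}
// Atomic rename: no reader can see a partial write
return os.Rename(tmp, path)
}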
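os.Rename is atomic on POSIX systems when the temp file and the destination
are on the same filesystem: readers see either the old file or the complete
new one. It does not by itself make the data durable across a crash (that
requires fsync). The same write-then-rename technique is used by many storage systems.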
Cache Promotion
When an item is fetched from L2 (disk) to serve a request, promote it to L1 (memory) so subsequent requests for the same item are served from memory:
if data, ok := l2.Get(key); ok {
l1.Put(key, data, ttl) // promote to memory
return data, "L2-HIT"
}
Promotion policies to consider:
- Always promote: simple; hot items quickly migrate to L1
- Promote after N hits: avoids polluting L1 with items only needed once
- Promote based on size: don’t promote large files to L1 (wastes memory with low incremental benefit)
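The "promote after N hits" policy can be sketched with two maps standing in for the tiers. Everything here is illustrative: the tiered type, its fields, and the origin callback are assumptions made for the example, a miss is stored in L2 only, and the code is not safe for concurrent use.

```go
package main

import "fmt"

// tiered models L1 (memory) and L2 (disk) as plain maps so the promotion
// logic is visible without any I/O.
type tiered struct {
	l1           map[string][]byte
	l2           map[string][]byte
	l2Hits       map[string]int
	promoteAfter int
	origin       func(key string) []byte
}

func newTiered(promoteAfter int, origin func(string) []byte) *tiered {
	return &tiered{
		l1:           map[string][]byte{},
		l2:           map[string][]byte{},
		l2Hits:       map[string]int{},
		promoteAfter: promoteAfter,
		origin:       origin,
	}
}

// Get returns the value and which tier served it.
func (t *tiered) Get(key string) ([]byte, string) {
	if v, ok := t.l1[key]; ok {
		return v, "L1-HIT"
	}
	if v, ok := t.l2[key]; ok {
		t.l2Hits[key]++
		if t.l2Hits[key] >= t.promoteAfter {
			t.l1[key] = v // hot enough: promote to memory
		}
		return v, "L2-HIT"
	}
	v := t.origin(key)
	t.l2[key] = v // under this policy a miss populates L2 only
	return v, "MISS"
}

func main() {
	c := newTiered(2, func(key string) []byte { return []byte("body of " + key) })
	for i := 0; i < 4; i++ {
		_, status := c.Get("/article/1")
		fmt.Println(status) // MISS, L2-HIT, L2-HIT, L1-HIT
	}
}
```

Setting promoteAfter to 1 promotes on the first L2 hit. Unlike the always-promote flow described earlier, a miss still lands in L2 only in this sketch.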
Production Numbers
| Tier | Technology | Latency | Size (single node) |
|---|---|---|---|
| L1 memory | DRAM | < 1 µs | 4–128 GB |
| L1.5 NVMe | PCIe 4.0 NVMe | 50–200 µs | 1–8 TB |
| L2 disk | HDD RAID | 3–10 ms | 4–100 TB |
| L3 shield | Origin Shield PoP | 5–20 ms | 100s of TB |
| Origin | App server / S3 | 50–500 ms | unlimited |
Cloudflare uses NVMe SSDs as L2 across all PoPs, with the L1 being in-process memory (implemented in Rust/C++). The transition from HDD to NVMe across CDN infrastructure (2015–2020) reduced L2 read latency by ~50× and enabled much larger object counts.
Try It
make lab-08
# First request: L2 miss → origin fetch → stored in L2 + promoted to L1
curl http://localhost:8080/article/1
# Second request: L1 HIT (< 1µs)
curl http://localhost:8080/article/1
# Kill and restart the process — L1 (memory) is gone but L2 (disk) remains
# Restart: first request should be L2 HIT, then promoted to L1
Lab 09 · Cache Tags & Bulk Purge
Run it:
make lab-09
Source: labs/lab-09-cache-tags/main.go
The Problem
When content changes, you need to invalidate the cached copies. The naive approach is to invalidate by URL:
DELETE /cache/article/42
But /article/42 might appear under many URLs:
/article/42
/article/42?format=mobile
/api/v1/articles/42
/api/v2/articles/42
/feed/latest ← includes article 42's content
/user/123/posts ← includes article 42 if user 123 wrote it
/search?q=keyword ← search results containing article 42
You can’t enumerate every affected URL. Content relationships are
graph-shaped, not path-shaped. You need a way to say: “invalidate
everything tagged with article-42.”
Surrogate Keys / Cache Tags
A cache tag (also: surrogate key, soft purge tag) is a label you apply to one or more cache entries. When content changes, you purge by tag, and all tagged entries are invalidated simultaneously.
The origin sets tags via a response header:
Surrogate-Key: article-42 author-123 category-tech
or the equivalent headers used by different vendors:
| Vendor | Header |
|---|---|
| Fastly | Surrogate-Key |
| Cloudflare | Cache-Tag |
| Akamai | Edge-Cache-Tag |
| Varnish | X-Tags |
| AWS CloudFront | (custom Lambda@Edge) |
The CDN strips this header from responses sent to browsers (it’s a CDN-internal directive) and maintains a tag → URL mapping internally.
Data Structure: Tag → URL Mapping
type TagStore struct {
mu sync.RWMutex
tagToURLs map[string]map[string]struct{} // tag → set of URLs
urlToTags map[string][]string // URL → list of tags
}
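func (s *TagStore) Tag(url string, tags []string) {
s.mu.Lock()
defer s.mu.Unlock()
// Drop mappings left over from an earlier tagging of this URL, so a
// tag that the origin no longer sends stops purging it
for _, old := range s.urlToTags[url] {
delete(s.tagToURLs[old], url)
}
s.urlToTags[url] = tags
for _, tag := range tags {
if s.tagToURLs[tag] == nil {
s.tagToURLs[tag] = make(map[string]struct{})
}
s.tagToURLs[tag][url] = struct{}{}
}
}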
func (s *TagStore) PurgeByTag(tag string) []string {
s.mu.Lock()
defer s.mu.Unlock()
urls := s.tagToURLs[tag]
purged := make([]string, 0, len(urls))
for url := range urls {
purged = append(purged, url)
// Remove reverse mapping
for _, t := range s.urlToTags[url] {
delete(s.tagToURLs[t], url)
}
delete(s.urlToTags, url)
}
delete(s.tagToURLs, tag)
return purged
}
The Purge API
POST /cache/purge
Content-Type: application/json
{"tags": ["article-42", "author-123"]}
Response:
{
"purged_urls": [
"/article/42",
"/article/42?format=mobile",
"/api/v1/articles/42",
"/feed/latest"
],
"count": 4
}
The CDN removes those entries from L1 and L2 storage. New requests will trigger origin fetches to repopulate.
Consistency Challenge: Distributed Purge
In a single-node setup, purge is a local operation. In a multi-PoP CDN, a purge request must propagate to every node that may have cached the tagged content.
Approaches:
1. Central purge broadcast
Application → Purge API → Central coordinator
│
┌─────────┼─────────┐
▼ ▼ ▼
NYC-01 LHR-01 NRT-01
Simple, but the coordinator is a single point of failure. Latency from coordinator to distant PoPs can be 100–200 ms, meaning stale content is served during the propagation window.
2. Gossip-based purge (Lab 14)
Purge messages propagate using epidemic (gossip) protocol. Each node tells a random subset of peers about the purge. Within O(log N) rounds, all nodes are notified. At scale (100+ nodes), gossip is more robust than central broadcast.
3. Versioned cache keys
Embed a content version in the cache key:
cache key = normalize(url) + "|v=" + content_version
When content changes, increment content_version at the application
layer. Old entries never get accessed again (they’re naturally evicted).
No explicit purge needed. Purge becomes a no-op for versioned content.
This is how Google Cloud CDN and most “CDN for static assets” setups
work: immutable assets with fingerprinted URLs (main.abc123.js).
Real-World Usage: CMS Integration
WordPress → publishes article update
→ WP plugin fires: POST /cdn/purge {"tags": ["post-42", "category-8", "tag-php"]}
→ CDN removes:
- /2025/01/article-about-php
- /category/php/
- /tag/php/
- / ← homepage (if it shows recent posts)
- /sitemap.xml
- RSS feed entries
Drupal, WordPress, and most CMSs have plugins for Fastly, Cloudflare, and Akamai that fire these purges on content save/publish events.
At The New York Times, Fastly Surrogate-Key purge is used to invalidate all representations of an article simultaneously — the canonical URL, AMP version, app API response, and share preview — with a single purge call containing the article’s surrogate key.
Tag Design Best Practices
| Pattern | Example | Notes |
|---|---|---|
| Entity ID | article-42 | Always tag with entity type + ID |
| Entity type | articles | Purge all articles in one call |
| Author | author-123 | Invalidate author profile changes |
| Category | cat-tech | Category page + all articles in it |
| Layout template | template-homepage | If homepage template changes |
| API version | api-v1 | Deprecating an API endpoint |
Avoid tags that can only ever match a single entry or a single user. A tag per user session, for example, is meaningless for a shared CDN cache and only inflates the tag index.
Try It
make lab-09
# Tag gets automatically set on origin responses
curl http://localhost:8080/article/42 -v
# Look for Surrogate-Key in origin response (stripped from CDN response to client)
# Article served from cache (HIT)
curl http://localhost:8080/article/42
# Purge by tag → invalidates all tagged entries
curl -X POST http://localhost:8080/cache/purge \
-H "Content-Type: application/json" \
-d '{"tags": ["article-42"]}'
# Next request should be a MISS (content re-fetched from origin)
curl http://localhost:8080/article/42
Lab 10 · Compression
Run it:
make lab-10
Source: labs/lab-10-compression/main.go
The Problem
Network bandwidth is neither free nor unlimited. Compressing HTTP responses before delivery:
- Reduces latency: smaller payload = faster transfer, especially on mobile networks (LTE: ~20 Mbps, high latency)
- Saves egress cost: CDN egress pricing ($0.01–0.09/GB); compression typically achieves 60–80% size reduction on text
- Improves user experience: a 500 KB page compressed to 120 KB loads 4× faster on a 1 Mbps mobile connection
The CDN is uniquely positioned to apply compression because:
- It has fast CPUs dedicated to edge functions
- Pre-compressing on cache store amortizes CPU cost over many serves
- Origin doesn’t need to compress repeatedly for each request
HTTP Content Negotiation
The client advertises its supported encodings:
Accept-Encoding: br, gzip, deflate, zstd;q=0.9
The server selects from the client’s list and responds with:
Content-Encoding: br
Content-Length: 12340
Vary: Accept-Encoding
Quality Values (q-values)
The ;q=N suffix is a preference weight from 0 to 1. The CDN should
select the encoding with the highest q-value that it supports:
Accept-Encoding: br;q=1.0, gzip;q=0.9, *;q=0.5
→ Prefer br, then gzip, then any other encoding
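A sketch of that selection follows, assuming the CDN has a fixed list of encodings it can produce. selectEncoding is an illustrative name; ties are broken by the server's own preference order, and an empty result means "send the response uncompressed".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// selectEncoding picks the supported encoding with the highest q-value in
// the Accept-Encoding header. Encodings without an explicit q default to
// 1.0, "*" covers encodings not listed by name, and q=0 means "not
// acceptable". It returns "" when nothing acceptable is supported.
func selectEncoding(acceptEncoding string, supported []string) string {
	weights := map[string]float64{}
	for _, part := range strings.Split(acceptEncoding, ",") {
		fields := strings.Split(part, ";")
		name := strings.ToLower(strings.TrimSpace(fields[0]))
		if name == "" {
			continue
		}
		q := 1.0
		for _, param := range fields[1:] {
			param = strings.TrimSpace(param)
			if strings.HasPrefix(param, "q=") {
				if f, err := strconv.ParseFloat(strings.TrimPrefix(param, "q="), 64); err == nil {
					q = f
				}
			}
		}
		weights[name] = q
	}
	best, bestQ := "", 0.0
	for _, enc := range supported { // server preference order breaks ties
		q, listed := weights[enc]
		if !listed {
			q, listed = weights["*"]
		}
		if listed && q > bestQ {
			best, bestQ = enc, q
		}
	}
	return best
}

func main() {
	supported := []string{"br", "gzip"}
	for _, h := range []string{
		"br;q=1.0, gzip;q=0.9, *;q=0.5",
		"gzip, deflate",
		"br;q=0, *;q=0.5",
		"",
	} {
		fmt.Printf("%q -> %q\n", h, selectEncoding(h, supported))
	}
}
```

For the four sample headers the function returns "br", "gzip", "gzip", and "" respectively.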
The Three Encodings
gzip (RFC 1952 + deflate)
The universal standard. Every HTTP client built since 1997 supports gzip. The format wraps a DEFLATE stream in a small header and a trailer carrying a CRC-32 checksum.
compression ratio: ~67% (text) 3 KB HTML → ~1 KB
throughput: ~400 MB/s (klauspost/compress implementation)
gzip is based on LZ77 sliding window compression + Huffman coding. The sliding window size (8 KB–32 KB) controls compression ratio vs. memory. Larger window = better ratio, more memory.
Brotli (RFC 7932)
Developed by Google, released 2015. Designed specifically for HTTP text compression. Combines LZ77 matching and Huffman coding (the same family of techniques as DEFLATE) with context modeling and a built-in static dictionary of common web text fragments.
compression ratio: ~82% (text) 3 KB HTML → ~540 bytes (≈15 percentage points better than gzip)
throughput: ~300 MB/s
browser support: all modern browsers (IE 11 and below: no)
Brotli at quality level 11 (max) achieves the best ratio but is very slow to compress (~10 MB/s). CDNs typically use quality 4–6 for on-the-fly compression and quality 11 for pre-compressed static assets.
Zstd (RFC 8878, which obsoletes RFC 8478)
Facebook’s Zstandard, released 2016. Extremely fast decompression.
compression ratio: ~70–80% (text)
throughput: ~2 GB/s compression, ~5 GB/s decompression
use case: origin-to-CDN links, inter-datacenter transfers
Zstd is not yet universally supported in browsers: Chrome and Firefox added support for Content-Encoding: zstd in 2024, and other clients lag behind. Its main CDN use case is origin-to-edge compression: Cloudflare uses zstd between their edge nodes and origin servers where both ends are controlled.
Storage Strategies
1. Store compressed, serve compressed
Store one compressed version per encoding. On request, check
Accept-Encoding and serve the matching stored version:
type cacheEntry struct {
rawBody []byte // uncompressed
gzipBody []byte // gzip compressed
brotliBody []byte // brotli compressed
}
- Pros: zero per-request CPU for compression
- Cons: 2–3× storage overhead (each encoding stored separately)
- Best for: static assets with long TTL, high request volume
2. Compress on-the-fly
Store uncompressed. Compress each response at serve time:
func compressResponse(w io.Writer, body []byte, encoding string) error {
var cw io.WriteCloser
switch encoding {
case "br":
cw = brotli.NewWriterLevel(w, brotli.DefaultCompression)
case "gzip":
gw, err := gzip.NewWriterLevel(w, gzip.BestSpeed)
if err != nil {
return err
}
cw = gw
default:
// Unknown or identity encoding: send the body as-is
_, err := w.Write(body)
return err
}
if _, err := cw.Write(body); err != nil {
cw.Close()
return err
}
// Close flushes the final compressed block, so its error must be returned
return cw.Close()
}
- Pros: 1× storage, always fresh compression
- Cons: CPU cost per request (~1 µs/KB for gzip, ~3 µs/KB for brotli)
- Best for: dynamic content with short TTL, low repetition
3. Pre-compressed at origin
Origin stores pre-compressed files:
/assets/app.js → not pre-compressed
/assets/app.js.gz → gzip pre-compressed
/assets/app.js.br → brotli pre-compressed
CDN serves app.js.gz or app.js.br based on Accept-Encoding.
No CPU overhead at CDN. Common for static site CDNs (S3 + CloudFront).
The Vary: Accept-Encoding Requirement
When the CDN stores multiple encodings of the same URL, it must include
Vary: Accept-Encoding in responses. This tells downstream caches (browsers,
ISP proxies) that the response differs by encoding.
Without Vary, a shared downstream cache (such as an ISP or corporate proxy) may
store the gzip version and later serve it to a client that never advertised
gzip support, which that client sees as a corrupted response.
Also: the CDN must maintain separate cache entries keyed by encoding.
See Lab 05 for how the cache key is expanded using Vary headers.
What Not to Compress
| Content type | Compress? | Reason |
|---|---|---|
| HTML, CSS, JS | ✓ always | Highly redundant text; 60–80% savings |
| JSON APIs | ✓ always | Often compresses 5–10× |
| SVG, XML | ✓ | XML is verbose |
| JPEG, PNG, WebP | ✗ | Already compressed; gzip adds overhead |
| MP4, WebM | ✗ | Already compressed |
| | ✗ | Usually already compressed internally |
| Already gzipped | ✗ | Double-compressing = larger output |
| < 1 KB | Optional | Overhead exceeds savings |
The CDN should check Content-Type before compressing and skip binary
formats. Most CDNs have a built-in list of compressible MIME types.
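A Content-Type check of this kind might look like the sketch below. The allowlist, the 1 KB threshold, and the name `shouldCompress` are illustrative assumptions, not any vendor's actual list.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// minCompressSize is an assumed threshold: below it, gzip or brotli
// framing overhead tends to outweigh the savings.
const minCompressSize = 1024

// compressibleTypes is an illustrative allowlist of non-text/* types.
var compressibleTypes = map[string]bool{
	"application/json":       true,
	"application/javascript": true,
	"application/xml":        true,
	"image/svg+xml":          true,
}

// shouldCompress reports whether a response with the given Content-Type,
// existing Content-Encoding, and body size is worth compressing.
func shouldCompress(contentType, contentEncoding string, size int) bool {
	if contentEncoding != "" && contentEncoding != "identity" {
		return false // already encoded; compressing again wastes CPU
	}
	if size < minCompressSize {
		return false
	}
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false // unknown or malformed type: leave the body alone
	}
	return strings.HasPrefix(mediaType, "text/") || compressibleTypes[mediaType]
}

func main() {
	fmt.Println(shouldCompress("text/html; charset=utf-8", "", 20480))
	fmt.Println(shouldCompress("image/jpeg", "", 20480))
	fmt.Println(shouldCompress("application/json", "gzip", 20480))
}
```

Parsing the media type first means parameters such as `charset=utf-8` do not defeat the lookup.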
Compression Savings Calculator
For a site serving 1 TB/month with 70% text responses (700 GB):
gzip saves 67%: 700 GB × 0.67 = 469 GB saved
At $0.05/GB egress: 469 GB × $0.05 = $23.45/month saved
brotli saves 82%: 700 GB × 0.82 = 574 GB saved
At $0.05/GB egress: 574 GB × $0.05 = $28.70/month saved
At petabyte scale (Netflix, YouTube), compression savings run to millions of dollars per month.
Try It
make lab-10
# Request with brotli (best compression)
curl http://localhost:8080/article/1 -H "Accept-Encoding: br" \
-v --output /dev/null 2>&1 | grep -i "content-encoding"
# Request with gzip
curl http://localhost:8080/article/1 -H "Accept-Encoding: gzip" \
--compressed -v
# No compression (compare sizes)
curl http://localhost:8080/article/1 -H "Accept-Encoding: identity" -v
# Compare content lengths:
for enc in br gzip identity; do
echo -n "$enc: "
curl -s http://localhost:8080/article/1 -H "Accept-Encoding: $enc" \
-o /tmp/response -w "%{size_download} bytes\n"
done
Lab 11 · Range Requests & Byte Serving
Run it:
make lab-11
Source: labs/lab-11-range-requests/main.go
The Problem
A user seeks to 00:45:00 in a 4-hour movie. The full file is 4 GB. Without range requests, the client must:
- Start streaming from the beginning
- Buffer through roughly 750 MB before reaching the 45-minute mark (45/240 of the file, assuming a constant bitrate)
- Or re-download the entire file after a connection drop
This is obviously unacceptable. HTTP Range Requests (RFC 7233) solve this by allowing clients to request specific byte ranges of a resource.
HTTP Range Requests: RFC 7233
The Request
GET /video/movie.mp4 HTTP/1.1
Range: bytes=2097152-4194303
Requesting bytes 2,097,152 to 4,194,303 (a 2 MB chunk starting at the 2 MB mark).
The Response
HTTP/1.1 206 Partial Content
Content-Range: bytes 2097152-4194303/10485760
Content-Length: 2097152
Content-Type: video/mp4
206 Partial Content indicates a successful range request. The full
resource size is 10,485,760 (10 MB) in this example.
Byte Range Syntax
| Format | Meaning |
|---|---|
| bytes=0-499 | First 500 bytes |
| bytes=500-999 | Second 500 bytes |
| bytes=-500 | Last 500 bytes |
| bytes=9500- | Bytes from 9500 to end |
| bytes=0-0,-1 | First and last byte |
Multipart Range Responses
A single request can ask for multiple disjoint ranges:
Range: bytes=0-50, 100-150
Response:
HTTP/1.1 206 Partial Content
Content-Type: multipart/byteranges; boundary=3d6b6a416f9b5
--3d6b6a416f9b5
Content-Type: text/plain
Content-Range: bytes 0-50/1270
[51 bytes of data]
--3d6b6a416f9b5
Content-Type: text/plain
Content-Range: bytes 100-150/1270
[51 bytes of data]
--3d6b6a416f9b5--
Multipart ranges are rarely used in practice — most video players request sequential single ranges.
Accept-Ranges Header
The server advertises range request support:
Accept-Ranges: bytes
If absent or Accept-Ranges: none, the client knows not to bother with
range requests.
How the CDN Handles Range Requests
Case 1: Full object cached
If the CDN has the full object cached, it can serve any range locally without contacting the origin:
func serveRange(w http.ResponseWriter, r *http.Request, body []byte) {
// Parse Range header
start, end := parseRange(r.Header.Get("Range"), len(body))
w.Header().Set("Content-Range", fmt.Sprintf("bytes %d-%d/%d", start, end, len(body)))
w.Header().Set("Content-Length", strconv.Itoa(end-start+1))
w.WriteHeader(http.StatusPartialContent)
w.Write(body[start : end+1])
}
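The range parsing step used above is not shown in this section. A minimal sketch of such a parser follows, under stated assumptions: `parseSingleRange` is a hypothetical helper (its signature differs from the lab's `parseRange`), it handles only a single range, and it treats every malformed header as unsatisfiable, whereas a stricter server would ignore a malformed header and return 200.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var errUnsatisfiable = errors.New("range not satisfiable")

// parseSingleRange parses "bytes=a-b", "bytes=a-", or "bytes=-n" against a
// resource of the given size and returns inclusive start and end offsets.
func parseSingleRange(header string, size int) (int, int, error) {
	if !strings.HasPrefix(header, "bytes=") || size <= 0 {
		return 0, 0, errUnsatisfiable
	}
	spec := strings.TrimPrefix(header, "bytes=")
	if strings.Contains(spec, ",") {
		return 0, 0, errUnsatisfiable // multiple ranges are out of scope here
	}
	first, last, found := strings.Cut(spec, "-")
	if !found {
		return 0, 0, errUnsatisfiable
	}
	first, last = strings.TrimSpace(first), strings.TrimSpace(last)
	if first == "" { // suffix form "bytes=-n": the last n bytes
		n, err := strconv.Atoi(last)
		if err != nil || n <= 0 {
			return 0, 0, errUnsatisfiable
		}
		if n > size {
			n = size
		}
		return size - n, size - 1, nil
	}
	start, err := strconv.Atoi(first)
	if err != nil || start < 0 || start >= size {
		return 0, 0, errUnsatisfiable
	}
	end := size - 1 // open-ended form "bytes=a-" runs to the last byte
	if last != "" {
		e, err := strconv.Atoi(last)
		if err != nil || e < start {
			return 0, 0, errUnsatisfiable
		}
		if e < end {
			end = e // an end past the resource is clamped to size-1
		}
	}
	return start, end, nil
}

func main() {
	for _, h := range []string{"bytes=0-499", "bytes=-500", "bytes=9500-"} {
		start, end, err := parseSingleRange(h, 10000)
		fmt.Println(h, start, end, err)
	}
}
```

On an error the caller would typically answer 416 Range Not Satisfiable with `Content-Range: bytes */<size>`.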
Case 2: Object not cached (miss)
The CDN must fetch the object from origin. Two strategies:
- Fetch full object: Request the complete file, cache it, serve the requested range. Simpler, but wasteful if the user only watches 5 minutes of a 4-hour movie.
- Forward range request: Pass the Range header upstream. The origin returns the exact bytes requested, which the CDN serves and caches. Problem: the CDN now has a partial object cached. Subsequent requests for different ranges must all go to origin.
Most CDNs use a hybrid approach: Range request forwarding with background fetch of the full object. Serve the requested range immediately (low TTFB), fetch the rest in background for future requests.
Case 3: Partial object cached (common in video)
A popular approach is segment-based caching: the CDN maps the byte range to fixed-size segments (e.g., 1 MB) and caches segments independently. Any range request is decomposed into the segments it overlaps; segments already in cache are served locally and only the missing ones are fetched from origin.
This is what Akamai Adaptive Media Delivery and AWS CloudFront Media Store do internally for large video files.
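The decomposition step can be sketched in a few lines. The key format and the name `segmentKeys` are assumptions made for illustration; real CDNs use their own internal key schemes.

```go
package main

import "fmt"

const segmentSize = 1 << 20 // 1 MiB, matching the example size in the text

// segmentKeys returns the cache key of every fixed-size segment that the
// inclusive byte range [start, end] overlaps.
func segmentKeys(url string, start, end int64) []string {
	if start < 0 || end < start {
		return nil
	}
	var keys []string
	for seg := start / segmentSize; seg <= end/segmentSize; seg++ {
		keys = append(keys, fmt.Sprintf("%s#seg=%d", url, seg))
	}
	return keys
}

func main() {
	// The earlier request example, bytes=2097152-4194303, overlaps
	// exactly two 1 MiB segments.
	fmt.Println(segmentKeys("/video/movie.mp4", 2097152, 4194303))
}
```

Each key is looked up independently, so a seek into a partly cached file only fetches the segments that are actually missing.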
Video Player Behavior
HTML5 video players (<video>) issue range requests with a characteristic
pattern:
- Initial fetch: Range: bytes=0-65535 (first 64 KB, where a fast-start MP4 keeps its moov atom)
- Seek: Range: bytes=<offset of target timestamp>-<offset+1MB>
- Buffer ahead: sequential range requests slightly ahead of playback
- On pause: cancel in-flight range request
- On resume: restart from current position
The CDN sees these as a sequence of range requests to the same URL. Caching the full object ensures all of these can be served locally after the first full miss.
Download Resumption
When a large file download is interrupted (connection drop, browser closed):
Resumed download:
GET /downloads/large-file.zip
Range: bytes=52428800-
↑ Resume from exactly where it stopped (50 MB mark)
Without range support, the user restarts the entire download. With range support, the download resumes from where it was.
The CDN must set ETag or Last-Modified on the initial response so
the client can validate the resource hasn’t changed before resuming:
If-Range: "abc123"
Range: bytes=52428800-
If-Range says: “If the ETag still matches, resume; otherwise send the full
file again.” This prevents serving corrupt data if the file was updated
between the initial download and the resume.
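The If-Range decision can be sketched as below. This covers only the ETag form of the header (If-Range may also carry a date), and `honorRange` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// honorRange reports whether the Range header should be applied, given the
// client's If-Range value and the object's current ETag. When it returns
// false, the server answers 200 with the full body instead of 206.
func honorRange(ifRange, currentETag string) bool {
	if ifRange == "" {
		return true // no precondition: the range applies
	}
	if strings.HasPrefix(ifRange, "W/") || strings.HasPrefix(currentETag, "W/") {
		return false // If-Range requires a strong comparison; weak tags never match
	}
	return ifRange == currentETag
}

func main() {
	fmt.Println(honorRange(`"abc123"`, `"abc123"`)) // unchanged object: resume
	fmt.Println(honorRange(`"abc123"`, `"def456"`)) // object changed: send full file
}
```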
Implementation: http.ServeContent
Go’s standard library provides a complete range request implementation:
func handler(w http.ResponseWriter, r *http.Request) {
body := getContent(r.URL.Path)
reader := bytes.NewReader(body)
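// Placeholder: pass the object's real modification time. With time.Now(),
// Last-Modified changes on every request and date validators never match.
modTime := time.Now()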
http.ServeContent(w, r, r.URL.Path, modTime, reader)
}
http.ServeContent handles:
- Range header parsing and validation
- 206 Partial Content responses
- 304 Not Modified via If-Modified-Since and If-None-Match
- Content-Range header generation
- Multipart ranges
For CDN caching layers, the lab implements manual range handling to show
the full mechanics. For production use of static files, http.ServeContent
or http.ServeFile are correct choices.
What to Measure
# Ratio of 206 vs 200 responses (high ratio = lots of video/download traffic)
rate(http_responses_total{status="206"}[5m]) /
rate(http_responses_total{status="200"}[5m])
# Range request cache hit ratio
rate(cache_hits_total{request_type="range"}[5m]) /
rate(cache_requests_total{request_type="range"}[5m])
# Large object hit ratio (bytes, not requests — often more meaningful)
rate(cache_hit_bytes_total[5m]) /
rate(cache_total_bytes_total[5m])
Try It
make lab-11
# Full file
curl http://localhost:8080/file/video.mp4 -v
# First 10 KB
curl http://localhost:8080/file/video.mp4 -H "Range: bytes=0-10239" -v
# Last 4 KB
curl http://localhost:8080/file/video.mp4 -H "Range: bytes=-4096" -v
# Resume from 50 MB
curl http://localhost:8080/file/large.bin -H "Range: bytes=52428800-" -v
# With If-Range (ETag-validated resume)
ETAG=$(curl -si http://localhost:8080/file/video.mp4 | grep -i '^etag:' | awk '{print $2}' | tr -d '\r')
curl http://localhost:8080/file/video.mp4 \
-H "Range: bytes=1024-2047" \
-H "If-Range: $ETAG" -v
Lab 12 · Consistent Hashing
Run it:
make lab-12
Source: labs/lab-12-consistent-hashing/main.go
The Problem
You have a pool of N CDN edge nodes. You want to route each URL to the same node consistently — so the same URL is always cached at the same node, maximizing cache reuse. How do you map URLs to nodes?
The Naive Approach: Modular Hashing
node := hash(url) % N // assign URL to node index
This works — until you add or remove a node. When N changes to N+1:
hash(url) % N → different node for almost every URL
Remapping fraction ≈ N/(N+1). With 10 nodes, adding one node remaps about 91% of all cache keys to different nodes. Roughly 91% of your cache is effectively invalidated at once, which sends a thundering herd against your origin.
This is why CDNs don’t use modular hashing for node selection.
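The remapping fraction is easy to check empirically. The sketch below hashes a set of synthetic URLs with FNV-1a (an arbitrary choice for illustration) and counts how many change node index when a mod-N pool grows from 10 to 11 nodes; the result should land near 10/11.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

func hashKey(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

// remappedFraction returns the share of keys whose node index changes
// when a mod-N pool changes size from oldN to newN.
func remappedFraction(keys []string, oldN, newN uint32) float64 {
	if len(keys) == 0 {
		return 0
	}
	moved := 0
	for _, k := range keys {
		h := hashKey(k)
		if h%oldN != h%newN {
			moved++
		}
	}
	return float64(moved) / float64(len(keys))
}

// sampleKeys generates synthetic URLs; real key sets will differ.
func sampleKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = fmt.Sprintf("/article/%d", i)
	}
	return keys
}

func main() {
	fmt.Printf("remapped: %.3f\n", remappedFraction(sampleKeys(100000), 10, 11))
}
```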
Consistent Hashing (Karger et al., 1997)
Consistent hashing places both nodes and keys on a virtual ring (a circle with positions 0 to 2^32 or 2^64). A key is assigned to the first node clockwise from the key’s position on the ring.
Ring (0 to 2^32)
0
───────────
/ N1 \
│ (pos=15) │
│ ● │
│ │
│ ● ● │
│ K1 N2 │
│ │
│ │
\ N3 K2 /
─────────────
max
K1 (pos=45) → first clockwise node → N2 (pos=60)
K2 (pos=90) → first clockwise node → N3 (pos=95)
When a node is added: only the keys that fall between the new node’s position and its predecessor need to move. Expected remapping when growing from N to N+1 nodes: only about 1/(N+1) of all keys.
When a node is removed: only keys assigned to that node need to move to the next node. Again, only 1/N remapped.
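A minimal ring makes the "only neighbours move" property concrete. This is a from-scratch sketch, not the library the lab uses: the `ring` type, FNV-1a hashing, and the `node#i` naming of ring positions are all assumptions for illustration. The `replicas` parameter places each node at several positions.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ring struct {
	points []uint32          // sorted ring positions
	owner  map[uint32]string // position -> node name
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// add places a node at `replicas` positions on the ring.
func (r *ring) add(node string, replicas int) {
	if r.owner == nil {
		r.owner = map[uint32]string{}
	}
	for i := 0; i < replicas; i++ {
		p := hash32(fmt.Sprintf("%s#%d", node, i))
		if _, taken := r.owner[p]; taken {
			continue // skip the rare position collision
		}
		r.owner[p] = node
		r.points = append(r.points, p)
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
}

// locate returns the first node clockwise from the key's position,
// or "" if the ring is empty.
func (r *ring) locate(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around past the highest position
	}
	return r.owner[r.points[i]]
}

func main() {
	r := &ring{}
	for _, n := range []string{"node-1", "node-2", "node-3"} {
		r.add(n, 100)
	}
	before := map[string]string{}
	for i := 0; i < 10000; i++ {
		k := fmt.Sprintf("/article/%d", i)
		before[k] = r.locate(k)
	}
	r.add("node-4", 100)
	moved := 0
	for k, old := range before {
		if r.locate(k) != old {
			moved++
		}
	}
	fmt.Printf("moved %d of %d keys\n", moved, len(before))
}
```

Every key that moves lands on the new node; no key is shuffled between the existing nodes, which is what keeps their caches warm.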
Virtual Nodes (Vnodes)
With only one ring position per node, the key distribution is uneven — some nodes get more keys than others, especially with few nodes.
The solution: each physical node occupies multiple positions on the ring
(virtual nodes). The examples below assume 100 vnodes per node; the
buraksezer/consistent library exposes the count as ReplicationFactor (default 20):
Physical node A → virtual nodes at positions: 15, 234, 567, 891, 1043, ...
Physical node B → virtual nodes at positions: 72, 310, 640, 958, 1200, ...
With 100 vnodes per node and 3 nodes: 300 ring positions. Key distribution becomes approximately uniform (σ ≈ 10% of mean load per node).
Tradeoff: more vnodes = better balance, but more ring metadata to maintain. Even 1000 nodes × 100 vnodes is only 100,000 ring positions, which is still trivial in memory.
Implementation with buraksezer/consistent
import (
	"hash/fnv"

	"github.com/buraksezer/consistent"
)

type Member string

func (m Member) String() string { return string(m) }

// hasher satisfies consistent.Hasher. FNV-1a is a stand-in here;
// any well-distributed 64-bit hash works.
type hasher struct{}

func (hasher) Sum64(data []byte) uint64 {
	h := fnv.New64a()
	h.Write(data)
	return h.Sum64()
}

// Create ring
cfg := consistent.Config{
	PartitionCount:    271,  // prime number for distribution
	ReplicationFactor: 40,   // vnodes per member
	Load:              1.25, // max load imbalance factor
	Hasher:            hasher{},
}
c := consistent.New(nil, cfg)
// Add nodes
c.Add(Member("node-1"))
c.Add(Member("node-2"))
c.Add(Member("node-3"))
// Route a key
member := c.LocateKey([]byte(url)) // returns the responsible node
// member.String() → "node-2"
Important API note: LocateKey returns a consistent.Member interface,
not a (Member, error) pair, so there is no error to check. Once the ring has
members it always returns one of them. An empty ring has no valid owner to
return, so guard the call with a member count check (for example
len(c.GetMembers()) > 0) rather than relying on its result.
The PartitionCount Parameter
The library’s PartitionCount (not to be confused with Kafka
partitions) divides the hash space into PartitionCount slices. Keys hash to
a partition, and each partition is assigned to a member. In terms of the config above:
PartitionCount: 271 → 271 hash space segments (prime, which helps keys spread evenly)
ReplicationFactor: 40 → each member is placed on the ring 40 times
With 3 members, each member owns ~90 partitions
(271/3 ≈ 90.3; the split cannot be exact because 271 is not divisible by 3).
Applications in CDN Architecture
1. Shield routing (Lab 13)
Origin Shield uses consistent hashing to route all requests for a URL
to the same shield PoP. This maximizes the shield’s cache utilization:
if 10 edge PoPs all forward misses for /popular-image to the same shield
node, that shield node only fetches from origin once.
2. Peer-to-peer CDN (BitTorrent-style)
CDN nodes use consistent hashing to decide which peer to request cached content from before going to origin. Key = object ID, ring = all CDN nodes in a region.
3. Memcache cluster routing
Client-side consistent hashing for memcached clusters. The application
client routes each key to the same cache server. Adding a new cache server
only remaps 1/N keys (instead of all keys with modular hashing).
This was described in Facebook’s 2013 memcache paper.
4. Load balancing with session stickiness
Route users to the same backend server (for session data stored in-process) using consistent hashing on the client IP or session cookie.
Failure Modes
| Failure | Consistent hashing behavior | Plain mod-N behavior |
|---|---|---|
| Add 1 node to 10 | 1/11 keys remap | 10/11 keys remap |
| Remove 1 node from 10 | 1/10 keys remap | 9/10 keys remap |
| Node flapping (add/remove rapidly) | Same 1/N segment shifts each time | Wholesale remapping |
| Uneven key distribution | Vnodes reduce imbalance | N/A |
Hot Keys
Consistent hashing assigns each key to exactly one node. If a key is extremely popular (a viral video URL), one node gets all the traffic.
Solutions:
- Consistent hash → multiple replicas: store popular objects on K nodes (the primary plus K-1 successors), distribute reads randomly among them.
- Application-level scatter: Nginx upstream zones with least_conn (route to least-loaded backend).
- Per-node in-memory cache: popular objects are already in L1 on every node; consistent hashing only affects L2 and origin routing.
Cloudflare’s Argo routing uses a variant: route based on real-time network latency and node load rather than pure hash, accepting the cache inefficiency for better tail latency.
Try It
make lab-12
# Route URLs to nodes — each URL consistently maps to the same node
curl http://localhost:8080/route/article/1
curl http://localhost:8080/route/article/1 # same node every time
# Show distribution
curl http://localhost:8080/stats
# Simulate node removal — minimal remapping
curl -X DELETE http://localhost:8080/nodes/node-2
curl http://localhost:8080/stats # articles on node-2 moved to next node
Lab 13 · Origin Shield
Run it:
make lab-13
Source: labs/lab-13-origin-shield/main.go
The Problem
A CDN with 200 PoPs worldwide, each with an independent cache. Your origin handles peak traffic fine: 100 req/s of cache misses.
Then a popular video goes viral. Every PoP simultaneously gets cache misses for that video URL. 200 PoPs × simultaneous misses = 200 simultaneous origin requests. Origin collapses.
Even with singleflight within a single PoP (Lab 06), there’s no
deduplication across PoPs. Each PoP independently decides to fetch
from the origin.
Origin Shield: A Designated Parent PoP
The solution: designate one PoP as the shield (or parent PoP). All 200 edge PoPs forward their misses to the shield instead of to the origin. The shield may have the cached copy; if not, it fetches from origin once and serves all 200 edge misses from that single fetch.
200 Edge PoPs (all miss simultaneously)
│
├── NYC: forward to shield
├── LHR: forward to shield
├── NRT: forward to shield
│ ...
└── SYD: forward to shield
│
▼
Shield PoP (e.g. IAD)
│
├── Shield HIT → serve all 200 edges
│
└── Shield MISS → 1 origin request
│
▼
Origin
Result: 200 origin requests → 1 origin request.
The shield applies singleflight itself: even if 200 edges arrive within
a millisecond, the shield collapses all 200 into a single upstream fetch.
Combined with shield-level caching, origin sees at most 1 request per
content piece per TTL period regardless of CDN scale.
Vendor Implementations
| Vendor | Shield name | Designation |
|---|---|---|
| Fastly | Shielding / POP-to-POP | Any PoP can be shield |
| CloudFront | Origin Shield | Single regional shield |
| Cloudflare | Tiered Cache | Smart Tiering (auto) |
| Akamai | SureRoute / Tiered Distribution | Hierarchical |
Fastly Shielding allows any PoP to be designated as shield, with routing based on latency to origin. You configure it per-service in VCL:
sub vcl_recv {
if (req.backend == F_origin && !req.http.Fastly-FF) {
set req.backend = shield:IAD; # route through IAD shield
}
}
CloudFront Origin Shield is a dedicated regional tier between the edge PoPs and your origin. You enable it with:
{
"OriginShield": {
"Enabled": true,
"OriginShieldRegion": "us-east-1"
}
}
CloudFront bills Origin Shield per request, quoted here at $0.0050–0.0087 per 10,000 requests depending on region, which is still vastly cheaper than paying for origin infrastructure to handle unshielded traffic.
Implementation
Three-Tier Architecture
Client → Edge (:8080, :8081) → Shield (:8082) → Origin (:9001)
Each tier is a separate Go process. The edge nodes use consistent
hashing (Lab 12) to select which shield node handles each URL, and
singleflight to collapse concurrent same-key requests within the edge:
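// At the edge, for a cache miss:
result, err, shared := sfGroup.Do(cacheKey, func() (interface{}, error) {
	return fetchFromShield(cacheKey)
})
// err is delivered to every collapsed caller; shared reports whether
// more than one caller received this result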
The shield does the same before forwarding to origin:
// At the shield, for a cache miss:
result, err, _ := sfGroup.Do(cacheKey, func() (interface{}, error) {
	return fetchFromOrigin(cacheKey)
})
// check err before caching result: a failed fetch must not be stored
The Shield Selection
For a shield tier with multiple shield nodes, use consistent hashing to select which shield handles each URL:
Edge → consistent_hash(url) → ShieldNode-X → Origin
All edges route requests for URL X to the same shield node, maximizing shield cache hit ratio. If a shield node fails, consistent hashing automatically routes to the next node (only 1/N of URLs are remapped).
The Math: Origin Request Reduction
Without origin shield:
E = number of edge PoPs (200)
T = TTL (300 seconds)
R = request rate per URL (1000/s across all PoPs)
Origin requests per URL = E = 200 (on each TTL expiry)
With origin shield:
S = number of shield nodes (2–5 typically)
Origin requests per URL = S = 2–5 (one per shield node per TTL)
Reduction factor: 200 ÷ 3 ≈ 67× fewer origin requests.
In practice, consistent hashing (Lab 12) pins each URL to a single shield
node, and singleflight on that node collapses concurrent misses. Origin then
sees about 1 request per URL per TTL regardless of CDN scale.
Shield Latency Tradeoff
Origin shielding adds one network hop. Edge → Shield adds latency:
Without shield: Edge → Origin = 150 ms
With shield: Edge → Shield → Origin = 5 ms + 150 ms = 155 ms
5 ms overhead for edge-to-shield hop (same region, dedicated link). The tradeoff is worth it because:
- Most requests are cache hits at the edge and never take the extra hop
- The 5 ms penalty applies only to requests that miss at the edge
For a well-shielded CDN serving popular content:
Hit ratio at edge: 85% → 0 ms overhead
Hit ratio at shield: 12% → 5 ms overhead
Cache miss: 3% → 155 ms (5 + 150)
Average latency beyond an edge hit: 0.85×0 + 0.12×5 + 0.03×155 = 5.25 ms
The shield hop itself contributes only 5 ms, and only on the 15% of requests that miss at the edge, so origin protection vastly outweighs the latency cost.
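The weighted average above is simple to reproduce and to compare against an unshielded setup. The sketch below reuses the hit ratios and latencies from the example; the unshielded case assumes the same 85% edge hit ratio with every edge miss going straight to origin at 150 ms, which is an assumption made here for comparison and not a figure from the lab.

```go
package main

import "fmt"

// tier describes where a share of requests is answered.
type tier struct {
	name      string
	share     float64 // fraction of all requests answered at this tier
	latencyMs float64 // latency beyond an edge hit
}

// expectedLatency returns the share-weighted mean latency in ms.
func expectedLatency(tiers []tier) float64 {
	total := 0.0
	for _, t := range tiers {
		total += t.share * t.latencyMs
	}
	return total
}

func main() {
	withShield := []tier{
		{"edge hit", 0.85, 0},
		{"shield hit", 0.12, 5},
		{"origin fetch", 0.03, 155},
	}
	// Assumed for comparison: no shield, all edge misses hit origin directly.
	withoutShield := []tier{
		{"edge hit", 0.85, 0},
		{"origin fetch", 0.15, 150},
	}
	fmt.Printf("with shield:    %.2f ms\n", expectedLatency(withShield))
	fmt.Printf("without shield: %.2f ms\n", expectedLatency(withoutShield))
}
```

Under these assumptions the shield lowers the average, because requests that would have paid the full origin round trip are answered from the shield instead.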
Failure Modes
| Failure | Behavior without shield | Behavior with shield |
|---|---|---|
| Origin spike | 200 PoPs × misses = 200 requests | 1–3 shield requests |
| Origin down | 200 PoPs serve stale or error | 1–3 shield requests (stale-if-error) |
| Shield node down | Edge falls back to origin directly | Consistent hash routes to next node |
| Shield cache invalidation | Must purge all edges too | Purge shield = automatic edge invalidation |
Try It
make lab-13
# Start all three tiers
# Lab automatically starts edge1(:8080), edge2(:8081), shield(:8082), origin(:9001)
# Request through edge 1
curl http://localhost:8080/article/1 -H "X-Debug: tiers"
# Response should show: Edge MISS → Shield MISS → Origin HIT
# Same request through edge 2 (different PoP)
curl http://localhost:8081/article/1 -H "X-Debug: tiers"
# Should show: Edge MISS → Shield HIT (shield already has it)
# Repeat both — edges should be HIT now
curl http://localhost:8080/article/1
curl http://localhost:8081/article/1
Lab 14 · Gossip Cluster & Distributed Purge
Run it:
make lab-14
Source: labs/lab-14-gossip-cluster/main.go
The Problem
You have 100 CDN edge nodes. Content changes. You need every node to know about the invalidation within seconds.
Why Not a Central Coordinator?
A single “purge coordinator” that notifies all nodes:
Application → Coordinator → [Node1, Node2, ..., Node100]
Problems:
- Single point of failure: coordinator down = no purges propagate
- O(N) work per purge: coordinator sends 100 messages
- N connection overhead: coordinator maintains 100 persistent connections
- Thundering herd on coordinator: during deployments, thousands of purges
- Partitioned PoPs: nodes behind a network partition miss purges silently
The solution used by Cassandra, CockroachDB, and Cloudflare’s edge network: gossip protocol (epidemic dissemination).
Gossip Protocol: Epidemic Dissemination
Named by analogy to biological epidemics: one infected node tells a few others, who each tell a few more. Within O(log N) rounds, all nodes are informed.
Algorithm:
Round 1: Node A has new information
→ A tells: B, E
Round 2: B and E spread:
→ B tells: C, F
→ E tells: G, D
Round 3: C, F, G, D spread:
→ C tells: H, I
→ F tells: J, K
→ G tells: L, M
→ D tells: N, A (A already knows)
With 100 nodes and fanout=3 (the rounds above use fanout=2 for readability): propagates to all in about log₃(100) ≈ 4.2 rounds
Each round is a small fixed-cost message. Total network messages per gossip cycle: O(N log N). Compare to broadcast: O(N). Gossip is somewhat more expensive per event but far more resilient, because no single node has to reach everyone.
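A small simulation shows how the round count behaves in practice. This is a simplified push-gossip model, not the SWIM implementation: each informed node picks `fanout` peers uniformly at random per round, and picks can land on itself or on nodes that already know. Because of those wasted sends, the simulated count is usually somewhat higher than the idealized logarithm. `gossipRounds` and the seed parameter are illustrative.

```go
package main

import (
	"fmt"
	"math/rand"
)

// gossipRounds simulates push gossip among n nodes and returns the number
// of rounds until every node is informed. Node 0 starts with the message.
func gossipRounds(n, fanout int, seed int64) int {
	if n <= 1 || fanout < 1 {
		return 0
	}
	rng := rand.New(rand.NewSource(seed))
	informed := make([]bool, n)
	informed[0] = true
	count, rounds := 1, 0
	for count < n {
		rounds++
		senders := count // nodes informed at the start of this round
		for s := 0; s < senders; s++ {
			for f := 0; f < fanout; f++ {
				target := rng.Intn(n) // may hit an already-informed node
				if !informed[target] {
					informed[target] = true
					count++
				}
			}
		}
	}
	return rounds
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("n=%d fanout=3 rounds=%d\n", n, gossipRounds(n, 3, 1))
	}
}
```

Growing the cluster tenfold adds only a few rounds, which is the logarithmic behaviour the text describes.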
hashicorp/memberlist
The memberlist library implements the SWIM protocol (Scalable Weakly-consistent
Infection-style Process Group Membership protocol) with enhancements from “Lifeguard”:
- Member discovery: nodes find each other via gossip
- Failure detection: probe + indirect probe to detect crashes
- Broadcast: attach arbitrary data to membership messages (e.g., cache purge events)
import "github.com/hashicorp/memberlist"
// Configure the local node
config := memberlist.DefaultLocalConfig()
config.Name = "edge-node-1"
config.BindAddr = "0.0.0.0"
config.BindPort = 7946
config.Delegate = &myDelegate{} // receives user data
config.Events = &myEventDelegate{} // membership change callbacks
list, err := memberlist.Create(config)
if err != nil {
	log.Fatalf("memberlist create: %v", err)
}
// Join an existing cluster
if _, err := list.Join([]string{"edge-node-2:7946", "edge-node-3:7946"}); err != nil {
	log.Printf("join failed: %v", err)
}
// Node metadata is supplied by the delegate's NodeMeta method;
// UpdateNode re-reads it and gossips the change with membership
if err := list.UpdateNode(5 * time.Second); err != nil {
	log.Printf("update node: %v", err)
}
// Or use the TransmitLimitedQueue for arbitrary messages
queue := &memberlist.TransmitLimitedQueue{
	NumNodes:       func() int { return list.NumMembers() },
	RetransmitMult: 3,
}
queue.QueueBroadcast(&PurgeMessage{Tags: []string{"article-42"}})
Implementing Distributed Cache Purge
1. Purge message format
type PurgeMessage struct {
ID string `json:"id"` // UUID for deduplication
Tags []string `json:"tags"`
URLs []string `json:"urls"`
Origin string `json:"origin"` // which node originated the purge
Timestamp time.Time `json:"ts"`
}
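// Invalidates must match memberlist.Broadcast. Returning false keeps every
// queued purge; returning true would let a new purge drop older ones.
func (m *PurgeMessage) Invalidates(other memberlist.Broadcast) bool { return false }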
func (m *PurgeMessage) Message() []byte { return mustMarshal(m) }
func (m *PurgeMessage) Finished() {}
2. The Delegate
The memberlist.Delegate interface is how you plug in custom logic:
type cacheDelegate struct {
queue *memberlist.TransmitLimitedQueue
seen sync.Map // deduplication: message ID → struct{}
}
func (d *cacheDelegate) NotifyMsg(b []byte) {
var msg PurgeMessage
if err := json.Unmarshal(b, &msg); err != nil {
	return // drop malformed payloads
}
// Deduplication: skip messages we've already processed
if _, loaded := d.seen.LoadOrStore(msg.ID, struct{}{}); loaded {
return
}
// Apply the purge locally
for _, tag := range msg.Tags { localCache.PurgeByTag(tag) }
for _, url := range msg.URLs { localCache.Delete(url) }
}
func (d *cacheDelegate) GetBroadcasts(overhead, limit int) [][]byte {
return d.queue.GetBroadcasts(overhead, limit)
}
3. Gossip anti-entropy
Beyond event-driven purge, gossip implements anti-entropy: nodes periodically compare state with a random peer and reconcile differences. This catches missed messages (due to network partitions, node restarts, message drops under load).
Every 30s:
→ Node A picks random peer B
→ A sends a digest of its cache state (Bloom filter or version vectors)
→ B responds with any items A is missing
→ A applies the delta
This ensures eventual consistency: even if a purge message is dropped, anti-entropy catches the discrepancy, typically within a few 30-second rounds.
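One anti-entropy exchange can be sketched with plain maps. This is an illustration of the digest-and-delta idea only: `purgeLog` and its methods are hypothetical, and the digest here is a full ID set where a real system would likely send something more compact.

```go
package main

import "fmt"

// purgeLog maps a purge ID to the Unix time it was issued.
type purgeLog map[string]int64

// digest lists the purge IDs this node has applied.
func (p purgeLog) digest() map[string]bool {
	d := make(map[string]bool, len(p))
	for id := range p {
		d[id] = true
	}
	return d
}

// missingFrom returns the entries of p that are absent from a peer's digest.
func (p purgeLog) missingFrom(peerDigest map[string]bool) purgeLog {
	delta := purgeLog{}
	for id, ts := range p {
		if !peerDigest[id] {
			delta[id] = ts
		}
	}
	return delta
}

// merge records each new entry of delta and calls apply once per new purge.
func (p purgeLog) merge(delta purgeLog, apply func(id string)) {
	for id, ts := range delta {
		if _, seen := p[id]; !seen {
			p[id] = ts
			apply(id)
		}
	}
}

func main() {
	a := purgeLog{"purge-1": 100}
	b := purgeLog{"purge-1": 100, "purge-2": 130} // B saw a purge that A missed

	delta := b.missingFrom(a.digest()) // A sends its digest; B computes the delta
	a.merge(delta, func(id string) { fmt.Println("A applies", id) })
	fmt.Println("A now tracks", len(a), "purges")
}
```

Because merge skips IDs it has already seen, repeating the exchange is harmless, which matters when rounds overlap with normal gossip delivery.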
SWIM Protocol: Failure Detection
SWIM’s failure detection is probabilistic but fast:
1. Every T_probe seconds: node A probes random node B with a ping
2. If B doesn't respond within T_timeout:
→ A asks K other random nodes to probe B indirectly (indirect probe)
3. If no indirect probe succeeds:
→ A marks B as SUSPECT, gossips the suspicion
4. If B doesn't refute (send alive message) within T_suspect:
→ B is declared DEAD, gossipped as such
5. Dead members are removed from the membership list
This gives O(1) probe messages per node and detects failures in ~3–5 seconds with default settings. Compare to a central heartbeat system: O(N) messages per probe cycle.
Push-Pull Gossip for State Synchronization
memberlist also implements push-pull gossip:
Node A pushes its full local state to random node B
Node B pushes its full local state back to A
Both reconcile differences
This is more expensive (full state exchange) but faster convergence for new nodes joining the cluster. Frequency: once per 30–60 seconds.
For cache invalidation: push-pull can sync the full set of currently-valid cache tags, ensuring a node that was offline for 5 minutes catches up on all purges it missed.
Real-World: Cloudflare’s Cache Purge
Cloudflare’s cache purge propagates across 300+ PoPs using a gossip-adjacent system. Their 2022 blog post describes how a purge request:
- Hits Cloudflare’s API endpoint
- Is distributed via their internal notification system (similar to gossip)
- Reaches all PoPs within 150 ms at the 95th percentile
At Cloudflare scale, this requires highly optimized serialization (Protocol Buffers), binary gossip protocols, and infrastructure tuned for low-latency small-message delivery.
Try It
make lab-14
# Three nodes form a gossip cluster automatically
# Look for "Cluster formed: 3 members" in the output
# Issue a purge on node 1
curl -X POST http://localhost:8080/cache/purge \
-H "Content-Type: application/json" \
-d '{"tags": ["article-42"]}'
# Within ~100ms, the purge propagates to nodes 2 and 3
# Verify by checking their cache state:
curl http://localhost:8081/cache/stats
curl http://localhost:8082/cache/stats
# Both should show article-42 as purged
Lab 15 · Geographic Routing & PoP Failover
Run it:
make lab-15
Source: labs/lab-15-geo-routing/main.go
The Problem
A CDN node in Singapore is useless to a user in Berlin. Round-trip latency on a Singapore → Berlin path is ~160 ms. A Frankfurt PoP would serve Berlin in ~5 ms.
Geographic routing — directing each user to the nearest CDN PoP — is one of the most impactful optimizations in CDN infrastructure. The difference between 160 ms and 5 ms TTFB is the difference between a bounced visitor and a retained one.
Routing Mechanisms
1. Anycast BGP (used by Cloudflare, Fastly)
The same IP address is announced from every PoP via BGP. Internet routing automatically directs packets to the topologically nearest PoP:
198.51.100.0/24 (documentation range) announced from:
- Frankfurt PoP → European users reach Frankfurt
- Tokyo PoP → Asian users reach Tokyo
- Chicago PoP → US Midwest users reach Chicago
BGP anycast routing is handled entirely by the internet’s routing infrastructure. The CDN operator’s job is to configure BGP announcements correctly and monitor AS path lengths.
Advantage: Zero application-level routing logic. Failover is automatic (BGP withdraws the broken PoP’s announcement).
Disadvantage: BGP convergence is slow (~30–180 seconds for a prefix withdrawal to propagate globally). A PoP that goes down may continue receiving traffic for minutes.
DNS-level failover is faster (~30 seconds with low TTL), but requires additional coordination.
2. GeoDNS (used by many second-tier CDNs)
DNS returns different IP addresses based on the client’s IP’s geographic region:
User from Germany resolves cdn.example.com:
→ DNS returns 203.0.113.10 (Frankfurt PoP)
User from Japan resolves cdn.example.com:
→ DNS returns 203.0.113.20 (Tokyo PoP)
Advantage: Simple to implement; works with any CDN infrastructure.
Disadvantage: DNS caching (TTL 60s–300s) means failover is slow. During failover, users who cached the old IP are still routed to the dead PoP and see connection timeouts or refusals until the TTL expires.
3. Application-Layer Routing (HTTP Redirect)
User → cdn.example.com → Routing server
→ 302 Redirect to "ams01.cdn.example.com"
This lab implements application-layer routing. A routing server receives all requests, calculates the optimal PoP, and either redirects or proxies to it.
Haversine Distance Calculation
The lab computes geographic distance using the haversine formula, which gives the great-circle distance between two points on a sphere:
func haversine(lat1, lon1, lat2, lon2 float64) float64 {
const R = 6371 // Earth radius in km
φ1 := lat1 * math.Pi / 180
φ2 := lat2 * math.Pi / 180
Δφ := (lat2 - lat1) * math.Pi / 180
Δλ := (lon2 - lon1) * math.Pi / 180
a := math.Sin(Δφ/2)*math.Sin(Δφ/2) +
math.Cos(φ1)*math.Cos(φ2)*
math.Sin(Δλ/2)*math.Sin(Δλ/2)
c := 2 * math.Atan2(math.Sqrt(a), math.Sqrt(1-a))
return R * c // distance in km
}
Given client location, find the closest PoP:
func nearestPoP(clientLat, clientLon float64, pops []PoP) *PoP {
	var nearest *PoP // stays nil if no PoP is healthy
	minDist := math.MaxFloat64
	for i := range pops {
		pop := &pops[i] // use a pointer: PoP holds an atomic.Bool and must not be copied
		if !pop.healthy.Load() {
			continue // skip unhealthy PoPs
		}
		d := haversine(clientLat, clientLon, pop.Lat, pop.Lon)
		if d < minDist {
			minDist = d
			nearest = pop
		}
	}
	return nearest
}
The 5 PoPs
The lab simulates 5 geographically distributed PoPs:
| PoP | City | Coords | Port |
|---|---|---|---|
| NYC | New York | 40.71°N, 74.00°W | :9010 |
| LHR | London | 51.51°N, 0.13°W | :9011 |
| NRT | Tokyo | 35.65°N, 139.76°E | :9012 |
| SYD | Sydney | 33.87°S, 151.21°E | :9013 |
| GRU | São Paulo | 23.43°S, 46.47°W | :9014 |
Health Checking & Failover
Each PoP exposes a /health endpoint. The router runs periodic health
checks:
type PoP struct {
Name string
Addr string
Lat float64
Lon float64
healthy atomic.Bool
}
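func (r *Router) healthCheckLoop() {
	client := &http.Client{Timeout: 2 * time.Second} // assumed bound per probe
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		for i := range r.pops {
			pop := &r.pops[i]
			go func() {
				resp, err := client.Get(pop.Addr + "/health")
				if err != nil {
					pop.healthy.Store(false)
					return
				}
				resp.Body.Close() // release the connection
				pop.healthy.Store(resp.StatusCode == http.StatusOK)
			}()
		}
	}
}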
atomic.Bool for the health state means reads in the routing hot path
require no lock. Health checks run concurrently with requests; a false
health state is propagated within one health-check interval.
When the nearest PoP is unhealthy, routing falls back to the next-nearest healthy PoP automatically.
Real-World PoP Selection
Geographic distance is a proxy for network latency, but not a perfect one. BGP path length, network peering relationships, and inter-AS latency can cause a geographically farther PoP to have lower latency.
Production CDNs use active latency measurements:
- Cloudflare Argo: routes traffic based on real-time network telemetry measured across the actual internet paths between PoPs
- Fastly: uses Anycast BGP (network handles routing) plus performance-based override for known poor paths
- AWS CloudFront: uses latency-based routing in Route 53
The haversine approach in this lab is a reasonable first approximation of latency ordering and costs almost nothing at runtime.
Client Location Detection
In production, client location comes from:
- IP geolocation: MaxMind GeoLite2 database or IP-API, maps IP → country/city/coords
- CDN headers: Cloudflare adds CF-IPCountry, CF-IPCity, CF-IPLatitude, CF-IPLongitude to every request automatically
- GPS/browser API: browser can provide precise location (user permission required)
- CDN PoP metadata: the PoP itself knows its geographic location; route users to the PoP they connected to
The lab accepts lat/lon as query parameters for testability.
PoP Infrastructure Design
When selecting where to locate PoPs, the key criteria are:
- Internet Exchange Points (IXPs): co-locate at major IXPs (DE-CIX Frankfurt, AMS-IX Amsterdam, LINX London) for direct peering with hundreds of ISPs, reducing latency and cost
- Traffic density: PoPs near large populations (NYC, London, Tokyo, São Paulo, Mumbai) serve the most users
- Data center tier: Tier III or higher (99.982%+ design availability, redundant power/cooling)
- Network diversity: multiple transit providers per PoP prevents single-provider outages from taking down the PoP
Try It
make lab-15
# Route a request from NYC (40.71, -74.00) — should go to NYC PoP
curl "http://localhost:8080/?lat=40.71&lon=-74.00" -v
# Route from London (51.51, -0.13) — should go to LHR PoP
curl "http://localhost:8080/?lat=51.51&lon=-0.13" -v
# Route from Tokyo — should go to NRT PoP
curl "http://localhost:8080/?lat=35.65&lon=139.76" -v
# Simulate LHR failure — London user should reroute to nearest healthy PoP
curl -X DELETE "http://localhost:8080/pops/LHR"
curl "http://localhost:8080/?lat=51.51&lon=-0.13" -v
# Should now route to NYC or GRU (next closest)
Lab 16 · Signed URLs & Token Authentication
Run it:
make lab-16
Source: labs/lab-16-signed-urls/main.go
The Problem
Public CDN caching works for content anyone can access. But what about:
- A Netflix video: only the paying subscriber should be able to access it
- A signed download link: expires in 1 hour
- A presigned S3 URL: locked to a specific IP address
- A livestream: only viewers who joined should stay authorized, and a shared URL should not work for anyone else
The CDN must enforce authorization at the edge, before delivering content, without calling the origin for every request (that would destroy the CDN’s performance advantage).
Signed URLs solve this: the application server generates a URL that contains a cryptographic signature. The CDN verifies the signature without contacting the origin.
HMAC-SHA256: The Signature Primitive
HMAC (Hash-based Message Authentication Code) uses a secret key and a hash function (SHA-256 here) to produce an authentication code:
HMAC-SHA256(key, message) = H((key XOR opad) || H((key XOR ipad) || message))
Properties:
- Unforgeability: without the key, it’s computationally infeasible to produce a valid MAC for a different message
- Key-binding: same message + different key → different MAC
- Non-collision: different messages → different MACs (with overwhelmingly high probability)
This is the same primitive used by JWT (HS256 variant), AWS Signature V4, and cookie signing in Django/Rails.
import (
"crypto/hmac"
"crypto/sha256"
"encoding/hex"
)
func sign(key []byte, message string) string {
mac := hmac.New(sha256.New, key)
mac.Write([]byte(message))
return hex.EncodeToString(mac.Sum(nil))
}
// Verify — always use hmac.Equal, never ==
func verify(key []byte, message, signature string) bool {
expected := sign(key, message)
// hmac.Equal is constant-time to prevent timing attacks
return hmac.Equal([]byte(signature), []byte(expected))
}
The Canonical String
The signature must cover all inputs that should be tamper-proof. The lab uses:
func canonicalString(method, path string, expires int64, clientIP string) string {
// Deterministic: same inputs always produce same string
return fmt.Sprintf("%s\n%s\n%d\n%s",
strings.ToUpper(method), // GET
path, // /video/movie.mp4
expires, // Unix timestamp
clientIP, // 1.2.3.4 (or "" if not IP-bound)
)
}
This canonical string is signed. The URL then carries:
/video/movie.mp4?expires=1735689600&ip=1.2.3.4&sig=a1b2c3d4...&keyver=v2
What to include in the canonical string
| Parameter | Include? | Why |
|---|---|---|
| HTTP method | Recommended | Prevent GET token being used for DELETE |
| URL path | Required | Prevent token for /video/1 being used for /video/2 |
| Expiry timestamp | Required | Time-bound the token |
| Client IP | Optional | IP-locked tokens prevent sharing; breaks VPNs |
| Content type | Optional | Prevent download link being used for upload |
| Key version | Via URL | Enables rotation without invalidating all tokens |
Timing-Safe Comparison: hmac.Equal
Never use == or bytes.Equal to compare HMACs. These perform
byte-by-byte comparison and short-circuit on the first mismatch.
An attacker can measure response time to determine how many bytes of their forged signature match the real signature (timing oracle). With enough requests:
sig[0] == correct? → 200 ns (comparison moves on to the next byte)
sig[0] != correct? → 100 ns (short-circuit)
→ Recover the signature one byte at a time → at most 256×32 = 8192 guesses instead of 2^256
hmac.Equal (and subtle.ConstantTimeCompare) always compare the full
input regardless of where the first mismatch is, eliminating the timing
oracle:
// WRONG — timing oracle vulnerability
if signature != expected {
return false
}
// CORRECT — constant-time comparison
if !hmac.Equal([]byte(signature), []byte(expected)) {
return false
}
This is OWASP Top 10 territory (A07: Identification and Authentication Failures).
Key Rotation
Secrets must be rotatable without invalidating all outstanding tokens.
The URL carries a keyver parameter:
/video/movie.mp4?sig=abc123&keyver=v1&expires=...
/video/movie.mp4?sig=xyz789&keyver=v2&expires=...
The CDN maintains multiple keys:
var signingKeys = map[string][]byte{
"v1": []byte("old-secret-key"), // still accepted for in-flight URLs
"v2": []byte("new-secret-key-2025"), // current signing key
}
func verifySignedURL(r *http.Request) bool {
keyver := r.URL.Query().Get("keyver")
key, ok := signingKeys[keyver]
if !ok { return false }
// Verify with the key for this version
return hmac.Equal(
[]byte(computeExpectedSig(key, r)),
[]byte(r.URL.Query().Get("sig")),
)
}
Rotation procedure:
- Generate new key; add as v2 to CDN config (old key still active)
- Configure application server to sign new URLs with v2
- Wait for all outstanding v1 tokens to expire (or force-expire them)
- Remove v1 from CDN config
Expiry Validation
func checkExpiry(r *http.Request) bool {
expiresStr := r.URL.Query().Get("expires")
expires, err := strconv.ParseInt(expiresStr, 10, 64)
if err != nil { return false }
return time.Now().Unix() < expires
}
Clock skew: CDN nodes across PoPs may have slight clock differences. Add a small grace period (30–60 seconds) to tolerate this:
return time.Now().Unix() < expires + 60 // 60-second grace window
IP Binding
Binding a signed URL to the client’s IP prevents the link from being shared. When a user logs into your streaming service and requests a video URL:
Application: token = sign(path, expires, clientIP="1.2.3.4")
CDN: verify(path, expires, clientIP=request.RemoteAddr)
If the user shares the URL with a friend (IP 5.6.7.8), the CDN
rejects with 403 Forbidden.
Tradeoff: IP binding breaks users on:
- VPNs (IP changes between token generation and use)
- Mobile networks (IP changes during handoff)
- Large corporate NAT (all employees share one IP — one user’s token would be usable by all)
Most streaming services IP-bind only for high-value content or use short-TTL tokens (15 minutes) instead.
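To tie the pieces of this lab together, here is a hedged end-to-end sketch: signing, then verifying key version, expiry (with the 60-second grace window), IP binding, and the HMAC. The helper names (`signURL`, `verifyURL`, `mac`), the key values, and the fixed GET method are assumptions made for illustration; the lab's own functions may be organized differently.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"net"
	"net/url"
	"strconv"
	"time"
)

// Hypothetical key set for this sketch; real keys come from a secret store.
var keys = map[string][]byte{"v1": []byte("old-secret"), "v2": []byte("current-secret")}

const graceSeconds = 60 // clock-skew tolerance

// mac signs the canonical string: method, path, expiry, client IP.
func mac(key []byte, method, path string, expires int64, ip string) string {
	m := hmac.New(sha256.New, key)
	fmt.Fprintf(m, "%s\n%s\n%d\n%s", method, path, expires, ip)
	return hex.EncodeToString(m.Sum(nil))
}

// signURL builds a signed URL for GET requests; ip may be "" for unbound tokens.
func signURL(keyver, path string, expires int64, ip string) string {
	q := url.Values{}
	q.Set("expires", strconv.FormatInt(expires, 10))
	q.Set("keyver", keyver)
	if ip != "" {
		q.Set("ip", ip)
	}
	q.Set("sig", mac(keys[keyver], "GET", path, expires, ip))
	return path + "?" + q.Encode()
}

// verifyURL checks key version, expiry, IP binding and signature, in that order.
func verifyURL(rawURL, remoteAddr string, nowUnix int64) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	q := u.Query()
	key, ok := keys[q.Get("keyver")]
	if !ok {
		return errors.New("unknown key version")
	}
	expires, err := strconv.ParseInt(q.Get("expires"), 10, 64)
	if err != nil || nowUnix >= expires+graceSeconds {
		return errors.New("expired or malformed expiry")
	}
	ip := q.Get("ip")
	if ip != "" {
		// RemoteAddr is "host:port"; compare only the host part.
		host, _, err := net.SplitHostPort(remoteAddr)
		if err != nil || host != ip {
			return errors.New("client IP mismatch")
		}
	}
	want := mac(key, "GET", u.Path, expires, ip)
	if !hmac.Equal([]byte(want), []byte(q.Get("sig"))) {
		return errors.New("bad signature")
	}
	return nil
}

func main() {
	now := time.Now().Unix()
	signed := signURL("v2", "/video/movie.mp4", now+300, "1.2.3.4")
	fmt.Println(signed)
	fmt.Println("owner: ", verifyURL(signed, "1.2.3.4:54321", now))
	fmt.Println("friend:", verifyURL(signed, "5.6.7.8:40000", now))
	fmt.Println("late:  ", verifyURL(signed, "1.2.3.4:54321", now+3600))
}
```

Because the `ip` and `expires` values are part of the signed canonical string, editing either query parameter changes the expected MAC and the request fails the final check. Behind a load balancer the client address would come from a trusted forwarding header rather than `RemoteAddr`; that case is outside this sketch.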
Vendor Implementation
| Vendor | Signed URL mechanism |
|---|---|
| Cloudflare | Signed URLs + Token Auth (Workers or built-in) |
| Fastly | Signed tokens via VCL |
| AWS CloudFront | Signed URLs (RSA) or Signed Cookies |
| Akamai | Edge Auth Token |
AWS CloudFront uses RSA signatures (asymmetric): the application signs with a private key; CloudFront verifies with the public key. This means the CDN never needs to know the private key — useful when you don’t fully trust the CDN operator with the signing secret.
Try It
make lab-16
# Generate a signed URL (the lab provides a /sign endpoint for testing)
SIGNED=$(curl -s "http://localhost:8080/sign?path=/video/movie.mp4&ttl=300")
echo "Signed URL: $SIGNED"
# Access with valid signature
curl "$SIGNED" -v
# Access without signature — should be 403
curl "http://localhost:8080/video/movie.mp4" -v
# Expired token (manipulate the expires param)
EXPIRED=$(echo "$SIGNED" | sed 's/expires=[0-9]*/expires=1000000000/')
curl "$EXPIRED" -v # should be 403
# Wrong IP (change the ip param if IP-bound)
curl "$SIGNED" -v # will succeed from your IP
# → Serving signed content from a different IP would fail
Lab 17 · Edge Compute via WebAssembly
Run it:
make lab-17
Source: labs/lab-17-edge-compute/main.go
The Problem
Every CDN feature we’ve built so far is fixed at deployment time: the routing logic, the cache key normalization, the compression settings. What if you want application-specific logic at the edge that changes independently of the CDN infrastructure?
Use cases:
- Custom request routing logic (A/B test, feature flag)
- Bot and device detection
- Request authentication and rate limiting
- Header manipulation (add, remove, rewrite)
- Edge-rendered personalisation fragments
- URL rewriting and canonical redirects
Traditionally this required deploying custom CDN software (Nginx modules, Varnish VMODs) or Lua/JS scripts (Nginx Lua, CloudFront Lambda@Edge). WebAssembly (WASM) provides a more universal and safer alternative.
WebAssembly at the Edge
WebAssembly is a binary instruction format designed for safe, fast execution in sandboxed environments. Key properties for edge compute:
- Sandboxed: WASM modules cannot access the filesystem, network, or system calls directly. All I/O is mediated by the host.
- Language-agnostic: Compile Go, Rust, C, AssemblyScript, or any WASM target to the same binary format.
- Near-native speed: WASM runtime compiles to machine code; typical overhead is 5–10% vs. native.
- Instant startup: WASM modules start in ~50 µs. Lambda/container cold starts are 100 ms–10 s.
Production edge compute platforms using WASM
| Platform | Runtime | Guest languages |
|---|---|---|
| Cloudflare Workers | V8 isolates (JS + WASM) | JS, Rust, Go, Python |
| Fastly Compute | Lucet → Wasmtime | Rust, JS, Go, C |
| Deno Deploy | V8 + Deno WASM | JS, TS |
| Fermyon Spin | Wasmtime | Rust, Go, Python |
| wazero (this lab) | Pure Go WASM runtime | Any WASI target |
wazero: Pure Go WASM Host
tetratelabs/wazero is a zero-dependency, pure Go WASM runtime that
implements WASI (WebAssembly System Interface). It runs WASM modules in-process:
import (
"github.com/tetratelabs/wazero"
"github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"
)
// Create a runtime
ctx := context.Background()
r := wazero.NewRuntime(ctx)
defer r.Close(ctx)
// Instantiate the WASI environment (stdin/stdout/stderr for the module)
wasi_snapshot_preview1.MustInstantiate(ctx, r)
// Load the WASM binary, then compile and instantiate it
wasmBinary, _ := os.ReadFile("detector.wasm")
mod, _ := r.Instantiate(ctx, wasmBinary) // wazero 1.x name; older pre-1.0 releases called this InstantiateModuleFromBinary
// Call a function exported by the WASM module
fn := mod.ExportedFunction("detect_bot")
result, _ := fn.Call(ctx, /* args... */)
API note: The lab uses wasi_snapshot_preview1.Instantiate (not MustInstantiate). MustInstantiate panics on error; Instantiate returns an error, which is more appropriate in a production handler.
The Guest WASM Module
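The WASM “guest” is compiled from Go with GOOS=wasip1 GOARCH=wasm. The //export directive below is the TinyGo spelling; with the standard Go toolchain, exporting a function to the host needs //go:wasmexport (Go 1.24+) and a -buildmode=c-shared build so the exports stay callable after main returns: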
//go:build wasip1
package main
import (
"strings"
"unsafe"
)
// DetectBot checks if the User-Agent is a known bot
//
//export detect_bot
func DetectBot(uaPtr uint32, uaLen uint32) uint32 {
ua := ptrToString(uaPtr, uaLen)
if isBotUA(ua) { return 1 }
return 0
}
func isBotUA(ua string) bool {
lower := strings.ToLower(ua)
bots := []string{"googlebot", "bingbot", "slurp", "duckduckbot",
"baiduspider", "yandexbot", "sogou", "facebot",
"ia_archiver", "curl/", "python-requests", "go-http"}
for _, bot := range bots {
if strings.Contains(lower, bot) { return true }
}
return false
}
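// ptrToString copies length bytes starting at a linear-memory offset into a Go string
func ptrToString(ptr, length uint32) string {
// unsafe.Slice (Go 1.17+) builds the byte view without the fixed-size array cast
buf := unsafe.Slice((*byte)(unsafe.Pointer(uintptr(ptr))), length)
return string(buf) // string() copies, so the result does not alias the host's buffer
}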
func main() {} // required for WASI
Build:
GOOS=wasip1 GOARCH=wasm go build -o detector.wasm ./guest/
Host-Guest Memory Communication
Each WASM module instance has a flat, 32-bit-addressed linear memory that the host can read and write directly. To pass a string from Go host to WASM guest:
// 1. Call the guest's allocate function to get a memory pointer
allocate := mod.ExportedFunction("allocate")
ptr, _ := allocate.Call(ctx, uint64(len(ua)))
// 2. Write the string into WASM memory
mod.Memory().Write(uint32(ptr[0]), []byte(ua))
// 3. Call the WASM function with the pointer and length
fn := mod.ExportedFunction("detect_bot")
result, _ := fn.Call(ctx, ptr[0], uint64(len(ua)))
This shared-memory model is efficient but requires careful memory management. The lab uses a simple approach (allocate once per request); production implementations use memory pools or arena allocators.
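The arena idea mentioned above can be sketched without a WASM runtime by letting a plain byte slice stand in for guest linear memory. Everything here (the `arena` type and its methods) is hypothetical and only illustrates the pointer-plus-length convention and the reset-per-request pattern; it is not how wazero or the lab manages memory.

```go
package main

import (
	"errors"
	"fmt"
)

// arena is a bump allocator over a byte slice standing in for guest linear memory.
type arena struct {
	mem  []byte
	next uint32
}

// alloc reserves n bytes and returns their offset (the "pointer" the guest sees).
func (a *arena) alloc(n uint32) (uint32, error) {
	if uint64(a.next)+uint64(n) > uint64(len(a.mem)) {
		return 0, errors.New("arena exhausted")
	}
	ptr := a.next
	a.next += n
	return ptr, nil
}

// writeString copies s into the arena and returns the (ptr, len) pair to pass to the guest.
func (a *arena) writeString(s string) (uint32, uint32, error) {
	ptr, err := a.alloc(uint32(len(s)))
	if err != nil {
		return 0, 0, err
	}
	copy(a.mem[ptr:], s)
	return ptr, uint32(len(s)), nil
}

// readString is what the guest side does with a (ptr, len) pair.
func (a *arena) readString(ptr, n uint32) string {
	return string(a.mem[ptr : ptr+n])
}

// reset releases every allocation at once, typically at the end of a request.
func (a *arena) reset() { a.next = 0 }

func main() {
	a := &arena{mem: make([]byte, 64)}
	ptr, n, err := a.writeString("Googlebot/2.1")
	fmt.Println(ptr, n, err, a.readString(ptr, n))
	a.reset() // the next request reuses the same bytes
}
```

Resetting the arena instead of freeing individual allocations is what keeps the per-request cost to one pointer assignment.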
Fail-Open vs. Fail-Closed
When the WASM module errors (invalid input, out-of-memory, assertion failure), the edge has two choices:
Fail-open: serve the request normally, log the WASM error:
if err != nil {
slog.Warn("WASM error, serving anyway", "err", err) // log/slog; the standard log package has no Warn
next.ServeHTTP(w, r) // continue without bot detection
return
}
Fail-closed: reject the request on WASM error:
if err != nil {
http.Error(w, "Service Unavailable", http.StatusServiceUnavailable) // 503
return
}
The lab uses fail-open for bot detection — if the WASM module crashes, it’s better to serve the user (potentially a bot, but probably a real user) than to block everyone.
Fail-closed is appropriate for: authentication checks, fraud detection, rate limiting where the risk of missing a check exceeds the risk of false rejection.
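One way to make the choice explicit is to pass the policy into the middleware instead of hard-coding it per check. The following is a sketch under assumptions: `failPolicy`, `check`, `guard`, and the two sample checks are hypothetical names, and the check function stands in for whatever the WASM call returns.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
	"net/http"
	"net/http/httptest"
)

type failPolicy int

const (
	failOpen failPolicy = iota
	failClosed
)

// check is any edge check that can itself fail (for example a WASM call).
type check func(r *http.Request) (allow bool, err error)

// guard applies c and decides what a check error means based on policy.
func guard(policy failPolicy, c check, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		allow, err := c(r)
		switch {
		case err != nil && policy == failClosed:
			http.Error(w, "Service Unavailable", http.StatusServiceUnavailable)
		case err != nil: // fail-open: log and continue
			slog.Warn("edge check failed, serving anyway", "err", err)
			next.ServeHTTP(w, r)
		case !allow:
			http.Error(w, "Forbidden", http.StatusForbidden)
		default:
			next.ServeHTTP(w, r)
		}
	})
}

// Sample checks: one that errors, one that works and denies.
func brokenCheck(*http.Request) (bool, error) { return false, errors.New("wasm trap") }
func denyCheck(*http.Request) (bool, error)   { return false, nil }

// statusWith runs one request through guard and reports the response status.
func statusWith(policy failPolicy, c check) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	guard(policy, c, ok).ServeHTTP(rec, httptest.NewRequest("GET", "/article/1", nil))
	return rec.Code
}

func main() {
	fmt.Println("fail-open, broken check:  ", statusWith(failOpen, brokenCheck))
	fmt.Println("fail-closed, broken check:", statusWith(failClosed, brokenCheck))
	fmt.Println("working check that denies:", statusWith(failOpen, denyCheck))
}
```

Note that the policy only governs errors in the check itself; a check that runs successfully and says "deny" is honored under either policy.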
Cloudflare Workers Architecture
Cloudflare Workers run JavaScript (or WASM via JS) in V8 isolates — the same V8 engine used by Chrome/Node.js, but isolated per-worker:
V8 Isolate (128 MB memory limit; 10 ms CPU per request on the free plan):
→ One isolate per worker code deployment
→ Thousands of concurrent isolates per PoP
→ Cold start: ~5 ms (pre-warmed isolates: 0 ms)
→ Network: fetch() API proxied through Cloudflare infrastructure
Workers are intentionally limited to prevent abuse: no persistent state, no direct filesystem access, no raw network sockets. State must go through Workers KV (eventually consistent store), D1 (SQLite), or Durable Objects.
The WASM approach in this lab is more similar to Fastly Compute, which gives WASM modules more direct access to request/response objects.
Try It
make lab-17
# Normal request — WASM module processes the User-Agent
curl http://localhost:8080/article/1 \
-H "User-Agent: Mozilla/5.0 (compatible; Chrome)" -v
# Bot request — should be identified and optionally blocked/tagged
curl http://localhost:8080/article/1 \
-H "User-Agent: Googlebot/2.1" -v
# Curl (also a bot)
curl http://localhost:8080/article/1 -v
# Check X-Bot-Detected header in response
# Python-requests (bot)
curl http://localhost:8080/article/1 \
-H "User-Agent: python-requests/2.28.0" -v
Lab 18 · HTTP/3 and QUIC
Run it:
make lab-18
Source: labs/lab-18-http3-quic/main.go
The Problem
TCP was designed in 1974. Every HTTP version from 0.9 to 2.0 runs on TCP. But TCP has a fundamental flaw for modern web performance: Head-of-Line (HoL) Blocking.
In HTTP/2, all streams share a single TCP connection. If one TCP packet is lost, all streams stall until the lost packet is retransmitted:
HTTP/2 connection (single TCP)
Stream 1: HTML [====| |=====>] ← stalled by lost packet
Stream 2: CSS [====| |=====>] ← stalled by lost packet
Stream 3: image [====| LOST |=====>] ← packet lost here
All streams wait for the retransmission of stream 3's packet.
On a path with 2% packet loss (common on mobile, satellite, congested networks), HTTP/2 throughput can be worse than HTTP/1.1 because of amplified HoL blocking.
QUIC solves this by rebuilding transport from scratch on top of UDP.
QUIC: A New Transport
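QUIC (RFC 9000; originally “Quick UDP Internet Connections”, though the IETF standard treats QUIC as a name rather than an acronym) is a transport protocol built on UDP. It replicates TCP’s reliability guarantees while eliminating HoL blocking between streams: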
QUIC connection (over UDP)
Stream 1: HTML [=============>] ← independent stream
Stream 2: CSS [=============>] ← independent stream
Stream 3: image [===== LOST →] ← only this stream pauses for retransmit
Streams 1 and 2 continue unaffected.
Key QUIC Features
Connection IDs (CIDs)
In TCP, a connection is identified by (src IP, src port, dst IP, dst port).
Changing any element tears down the connection.
QUIC connections are identified by an opaque Connection ID (0–20 bytes in QUIC v1; the example below uses 8 bytes):
QUIC Connection: CID=0xdeadbeef01234567
Can migrate: src IP changes (mobile handoff) → connection survives
Can migrate: src port changes (NAT rebinding) → connection survives
This enables Connection Migration: a mobile user moving from WiFi to LTE doesn’t break QUIC connections. TCP connections would require a full TLS+TCP handshake on the new network.
0-RTT Resumption
A client that previously connected to a server can resume with 0 RTT:
1st connection: 1-RTT (QUIC Initial + crypto handshake)
2nd connection: 0-RTT (client sends data immediately with cached session ticket)
0-RTT data is not forward secret, and an attacker who captures it can replay it to the server. For safe 0-RTT:
- Non-mutating requests only (GET, HEAD, OPTIONS)
- Servers must use replay protection (nonce tracking or time-window limits)
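The "non-mutating requests only" rule can be enforced in a handler chain. The sketch below assumes the TLS/QUIC terminator marks requests that arrived in early data with an `Early-Data: 1` header, as RFC 8470 describes for intermediaries; the middleware name and demo helper are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// safeForReplay reports whether a method is safe to process from 0-RTT data.
func safeForReplay(method string) bool {
	switch method {
	case http.MethodGet, http.MethodHead, http.MethodOptions:
		return true
	}
	return false
}

// rejectUnsafeEarlyData answers 425 Too Early for mutating requests flagged as
// early data, so the client retries once the handshake has completed.
func rejectUnsafeEarlyData(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Early-Data") == "1" && !safeForReplay(r.Method) {
			http.Error(w, "Too Early", http.StatusTooEarly)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request through the middleware (demo helper).
func statusFor(method string, earlyData bool) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(method, "/cart", nil)
	if earlyData {
		req.Header.Set("Early-Data", "1")
	}
	rec := httptest.NewRecorder()
	rejectUnsafeEarlyData(ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("GET in early data: ", statusFor("GET", true))
	fmt.Println("POST in early data:", statusFor("POST", true))
	fmt.Println("POST after 1-RTT:  ", statusFor("POST", false))
}
```

This covers only the method rule; replay protection for idempotent-but-sensitive GETs still needs the server-side measures listed above.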
QPACK Header Compression
HTTP/2 uses HPACK for header compression. HPACK relies on in-order delivery: both endpoints update a single dynamic table in lockstep, which TCP guarantees. Over QUIC, where streams arrive independently, that dependency would reintroduce head-of-line blocking.
QPACK (RFC 9204) is HTTP/3’s header compression scheme. It moves dynamic-table updates onto dedicated encoder and decoder streams, so packet loss on one request stream need not stall header decoding on the others.
TLS 1.3 Integration
QUIC encrypts the transport layer itself. There is no unencrypted QUIC. QUIC integrates TLS 1.3 handshake into its own handshake:
TCP/TLS 1.3: QUIC/TLS 1.3:
[TCP SYN] [Initial Packet (Client Hello)]
[TCP SYN-ACK] [Initial Packet (Server Hello)]
[TCP ACK] [Handshake Packet (Finished)]
[TLS ClientHello]
[TLS ServerHello] ← QUIC fuses these into fewer round-trips
[TLS Finished]
[First request] [First request]
Result: QUIC 1-RTT vs. TCP+TLS 2-RTT for new connections.
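As a worked example of what those round trips cost, the following sketch multiplies an assumed round-trip time by the number of handshake round trips plus one for the request itself. The 80 ms RTT and the function name are illustrative assumptions, and the model ignores server processing time and packet loss.

```go
package main

import "fmt"

// ttfbMillis estimates time to first response byte in milliseconds:
// handshake round trips plus one more for the request/response exchange.
func ttfbMillis(rttMs, handshakeRTTs int) int {
	return rttMs * (handshakeRTTs + 1)
}

func main() {
	const rtt = 80 // assumed mobile round-trip time in ms
	fmt.Println("TCP + TLS 1.3 (2 RTT):", ttfbMillis(rtt, 2), "ms")
	fmt.Println("QUIC new conn (1 RTT):", ttfbMillis(rtt, 1), "ms")
	fmt.Println("QUIC 0-RTT resume:    ", ttfbMillis(rtt, 0), "ms")
}
```

The absolute saving scales with RTT, which is why the benefit is larger for distant or mobile clients than for clients a few milliseconds from the PoP.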
ECDSA vs. RSA Certificates
The lab generates a self-signed ECDSA P-256 certificate (not RSA). ECDSA offers smaller key sizes for equivalent security:
| Algorithm | Key size (128-bit security) | Signature size | Handshake CPU |
|---|---|---|---|
| RSA | 3072 bits | 384 bytes | ~3ms |
| ECDSA P-256 | 256 bits | 64 bytes | ~0.3ms |
For a CDN terminating millions of TLS connections per second, ECDSA is significantly more efficient. Cloudflare uses ECDSA certificates by default.
// Generate ECDSA P-256 key
privKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
// Self-signed certificate
template := &x509.Certificate{
SerialNumber: big.NewInt(1),
Subject: pkix.Name{CommonName: "localhost"},
NotBefore: time.Now(),
NotAfter: time.Now().Add(365 * 24 * time.Hour),
DNSNames: []string{"localhost"},
}
certDER, _ := x509.CreateCertificate(rand.Reader, template, template, &privKey.PublicKey, privKey)
quic-go Implementation
The lab uses github.com/quic-go/quic-go:
import (
"crypto/tls"
"net/http"
"github.com/quic-go/quic-go/http3"
)
// HTTP/3 server runs on UDP
server := &http3.Server{
Addr: ":443",
Handler: mux,
TLSConfig: &tls.Config{
Certificates: []tls.Certificate{cert},
NextProtos: []string{"h3"}, // ALPN for HTTP/3
},
}
// HTTP/3 over QUIC listens on UDP 443
go server.ListenAndServeTLS(certFile, keyFile)
// Also run HTTP/1.1 + HTTP/2 on TCP 443 (for clients that don't support H3)
go http.ListenAndServeTLS(":443", certFile, keyFile, mux)
// Alt-Svc header tells clients "this server speaks H3 on port 443"
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Alt-Svc", `h3=":443"; ma=86400`)
// ... serve content
})
Alt-Svc: Protocol Upgrade Negotiation
Browsers discover H3 support via the Alt-Svc response header:
Alt-Svc: h3=":443"; ma=86400
- h3=":443": server speaks HTTP/3 on port 443
- ma=86400: this advertisement is valid for 86400 seconds (24h)
On first request, the browser uses TCP (H1/H2). On subsequent requests, it connects via QUIC instead. The browser caches Alt-Svc per origin.
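To make the header format concrete, here is a simplified parser sketch. The `altSvc` struct and `parseAltSvc` function are hypothetical, and the parser takes shortcuts (it splits on commas and semicolons without handling quoted delimiters, and skips the special `clear` value), so treat it as an illustration of the field layout rather than a complete RFC 7838 implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// altSvc is one advertised alternative service.
type altSvc struct {
	Protocol      string // ALPN token, e.g. "h3"
	Authority     string // host:port, host may be empty, e.g. ":443"
	MaxAgeSeconds int
}

// parseAltSvc extracts the alternatives from an Alt-Svc header value.
func parseAltSvc(header string) []altSvc {
	var out []altSvc
	for _, entry := range strings.Split(header, ",") {
		parts := strings.Split(entry, ";")
		proto, authority, ok := strings.Cut(strings.TrimSpace(parts[0]), "=")
		if !ok {
			continue // "clear" or a malformed entry
		}
		svc := altSvc{
			Protocol:      proto,
			Authority:     strings.Trim(authority, `"`),
			MaxAgeSeconds: 86400, // RFC 7838 default of 24 hours when ma is absent
		}
		for _, p := range parts[1:] {
			k, v, _ := strings.Cut(strings.TrimSpace(p), "=")
			if k == "ma" {
				if n, err := strconv.Atoi(v); err == nil {
					svc.MaxAgeSeconds = n
				}
			}
		}
		out = append(out, svc)
	}
	return out
}

func main() {
	for _, s := range parseAltSvc(`h3=":443"; ma=3600, h2=":8443"`) {
		fmt.Printf("%+v\n", s)
	}
}
```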
H1 vs. H2 vs. H3 Performance
Relative performance depends on network conditions:
| Protocol | 0% loss | 1% loss | 5% loss |
|---|---|---|---|
| HTTP/1.1 (6 connections) | baseline | -30% | -60% |
| HTTP/2 (1 connection) | +15% | -50% | -70% |
| HTTP/3 (QUIC) | +10% | -5% | -20% |
H3 shines on lossy/high-latency networks. On a clean datacenter network, H2 and H3 are comparable. This is why CDN edge nodes gain more from H3 than CDN-to-origin connections (origin is typically on a reliable path).
Firewall Considerations
QUIC runs on UDP. Many corporate firewalls block all non-DNS UDP traffic. When QUIC is blocked, clients fall back to TCP:
Client: send QUIC Initial packet (UDP 443)
Firewall: drops UDP 443
Client: timeout after ~150ms
Client: fall back to TCP + TLS 1.3
To avoid paying that timeout on every connection, browsers race the two transports (“QUIC Happy Eyeballs”): TCP and QUIC attempts run in parallel and whichever succeeds first is used. Chrome and Firefox implement this.
Browser QUIC adoption statistics (2024): ~28% of all web requests (dominated by Google services which pioneered QUIC via gQUIC).
Try It
make lab-18
# HTTP/1.1 request
curl -k http://localhost:8080/ -v
# HTTP/3 request (skip certificate validation for self-signed cert)
curl -k --http3 https://localhost:8443/ -v
# Benchmark: compare H1 vs H3 latency
time curl -k http://localhost:8080/large-file -o /dev/null
time curl -k --http3 https://localhost:8443/large-file -o /dev/null
# Verify Alt-Svc header is present
curl -k https://localhost:8443/ -I | grep Alt-Svc
# Should show: Alt-Svc: h3=":8443"; ma=86400
# Check what protocol was negotiated (curl verbose shows it)
curl -k --http3 https://localhost:8443/ -v 2>&1 | grep "Using HTTP"
Lab 19 · HLS Streaming & Segment Caching
Run it:
make lab-19
Source: labs/lab-19-hls-streaming/main.go
The Problem
Video delivery is the dominant use case for CDN infrastructure — in 2024, video represents ~65% of all internet traffic. Unlike web pages (one-shot request-response), video streaming is:
- High-bandwidth: a 4K stream is 25 Mbps; 1 million concurrent viewers require 25 Tbps of aggregate bandwidth
- Time-sensitive: a 2-second buffer stall causes viewer abandonment rates to jump 20%
- Long-duration: sessions last 30–120 minutes; cache TTLs matter differently for live vs. VOD
The CDN must cache aggressively to serve millions of concurrent viewers from edge rather than hammering the origin’s encoder/packager.
HLS: HTTP Live Streaming
HLS (RFC 8216) is the dominant streaming protocol for CDNs. It works by slicing video into short segments and serving them over plain HTTP:
Client CDN Origin Encoder
│ │ │
│── GET master.m3u8 ──────>│── (cache miss) ─────────>│
│<──── master playlist ────│<──── master playlist ─────│
│ │ (cache TTL: 60s) │
│── GET 720p/playlist.m3u8>│── (cache miss) ─────────>│
│<──── variant playlist ───│<──── variant playlist ────│
│ │ (cache TTL: 5s for live) │
│── GET seg001.ts ────────>│── (cache miss) ─────────>│
│<──── segment ────────────│<──── segment ─────────────│
│ │ (cache TTL: 24h immutable)│
│── GET seg002.ts ────────>│── (cache HIT) ────────────│ ← no origin hit
Three Types of Content, Three TTLs
HLS has three distinct content types with fundamentally different caching characteristics:
1. Media Segments (.ts, .fmp4) — TTL: 24 hours, immutable
Segments are effectively immutable: once seg001.ts is generated and
named, it never changes. The name uniquely identifies the content.
// Immutable segment: cache for 24 hours and let clients skip revalidation
w.Header().Set("Cache-Control", "public, max-age=86400, immutable")
w.Header().Set("ETag", `"seg001-v1"`)
This is identical to the approach used for hashed static assets (main.abc123.js).
CDN hit ratios for segment requests should be ~99% once the initial viewers
warm the cache.
Thundering herd implication: When a new segment is published, the first viewer to request it causes a cache miss to origin. All subsequent viewers hit the cache. For popular streams (100k+ viewers), request collapsing (lab 06) keeps that initial miss to a single origin request even when many viewers ask at the same instant. This is excellent.
2. Variant Playlist (.m3u8 per quality level) — TTL: 5 seconds (live), longer for VOD
The variant playlist (e.g., 720p/playlist.m3u8) lists the available
segments:
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:42
#EXTINF:6.006,
seg042.ts
#EXTINF:6.006,
seg043.ts
#EXTINF:6.006,
seg044.ts
For live streams, the playlist changes every segment duration (typically 2–6 seconds). It must not be cached too long or viewers fall behind the live edge.
// Short TTL for live variant playlist
w.Header().Set("Cache-Control", "public, max-age=5")
w.Header().Set("ETag", etag) // still ETag for conditional requests
The thundering herd problem here: Every viewer polls the variant playlist every ~5 seconds. With 100k viewers, that’s 20k requests/second to the CDN for a single stream’s variant playlist — all simultaneously (viewers synchronize on segment boundaries).
Singleflight at the CDN level is essential here. The lab’s populateCache
function uses singleflight.Group to collapse concurrent playlist requests.
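For readers who have not used singleflight, the following is a minimal hand-rolled stand-in that shows the collapsing mechanism with only the standard library. The `collapser` type and `herd` helper are hypothetical; the lab itself uses `singleflight.Group` from golang.org/x/sync, which also handles panics and result sharing more carefully.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight origin fetch that other callers can wait on.
type call struct {
	done chan struct{}
	val  []byte
	err  error
}

// collapser is a minimal stand-in for singleflight.Group.
type collapser struct {
	mu       sync.Mutex
	inflight map[string]*call
}

// Do runs fetch once per key at a time; concurrent callers share the result.
func (c *collapser) Do(key string, fetch func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if c.inflight == nil {
		c.inflight = make(map[string]*call)
	}
	if existing, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		<-existing.done // follower: wait for the leader's result
		return existing.val, existing.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = fetch() // leader: the only caller that reaches the origin
	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	close(cl.done) // publish the result to all followers
	return cl.val, cl.err
}

// herd fires n concurrent requests for one playlist and returns how many
// reached the simulated origin.
func herd(n int) int64 {
	var c collapser
	var originHits int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Do("/stream/720p/playlist.m3u8", func() ([]byte, error) {
				atomic.AddInt64(&originHits, 1)
				time.Sleep(200 * time.Millisecond) // simulated origin latency
				return []byte("#EXTM3U"), nil
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&originHits)
}

func main() {
	fmt.Println("origin requests for 20 concurrent viewers:", herd(20))
}
```

Collapsing only helps while a fetch is in flight; once it completes, the short-TTL cache entry is what absorbs the remaining polls until the next expiry.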
3. Master Playlist (.m3u8 top-level) — TTL: 60 seconds
The master playlist lists the variant streams:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360
360p/playlist.m3u8
This changes rarely (new quality levels, DRM changes). A 60-second TTL allows clients to adapt to changes within a minute.
// Medium TTL for master playlist
w.Header().Set("Cache-Control", "public, max-age=60")
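The three TTL classes can be collected into one function so that a handler sets the header from the request path. This is a sketch under assumptions: the function name, the `master.m3u8` naming convention, the one-hour VOD playlist TTL, and the `no-store` default are illustrative choices, not the lab's actual code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cacheControlFor picks a Cache-Control value from the HLS object type.
func cacheControlFor(urlPath string, live bool) string {
	ext := path.Ext(urlPath)
	switch {
	case ext == ".ts" || ext == ".m4s":
		// Media segment: never changes once published.
		return "public, max-age=86400, immutable"
	case ext == ".m3u8" && strings.HasSuffix(urlPath, "/master.m3u8"):
		// Master playlist: changes rarely.
		return "public, max-age=60"
	case ext == ".m3u8" && live:
		// Live variant playlist: changes every segment duration.
		return "public, max-age=5"
	case ext == ".m3u8":
		// VOD variant playlist: assumed one-hour TTL for this sketch.
		return "public, max-age=3600"
	default:
		return "no-store"
	}
}

func main() {
	for _, p := range []string{
		"/stream/master.m3u8",
		"/stream/720p/playlist.m3u8",
		"/stream/720p/seg001.ts",
	} {
		fmt.Printf("%-28s %s\n", p, cacheControlFor(p, true))
	}
}
```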
Segment Prefetch
After parsing a variant playlist, the CDN can proactively fetch the next segments from origin before any client requests them:
func (c *Cache) prefetchSegments(playlistURL string, playlist []byte) {
urls := parseSegmentURLs(playlist) // extract seg URLs from M3U8
for _, url := range urls {
if !c.Has(url) {
go c.warmSegment(url) // fetch in background
}
}
}
This converts cache misses on segment requests to cache hits:
Without prefetch:
Viewer arrives → GET seg042.ts (miss) → wait for origin → play
Next viewer → GET seg042.ts (hit) → instant play
With prefetch:
New playlist published → CDN prefetches seg042.ts
Viewer arrives → GET seg042.ts (hit) → instant play
All viewers get cache hits
Prefetch is standard on CDNs like Cloudflare Stream and Fastly.
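The prefetch loop above calls parseSegmentURLs without showing it. A plausible version is sketched here; unlike the one-argument call in the snippet, this hypothetical variant also takes the playlist URL so that relative segment names can be resolved, and it ignores every tag line rather than interpreting them.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/url"
	"strings"
)

// parseSegmentURLs returns the absolute URL of every media segment in a
// variant playlist. Lines starting with '#' are tags or comments; every other
// non-empty line is a segment URI, resolved relative to the playlist URL.
func parseSegmentURLs(playlistURL string, playlist []byte) ([]string, error) {
	base, err := url.Parse(playlistURL)
	if err != nil {
		return nil, err
	}
	var out []string
	sc := bufio.NewScanner(bytes.NewReader(playlist))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		ref, err := url.Parse(line)
		if err != nil {
			continue // skip malformed URIs rather than abort the prefetch
		}
		out = append(out, base.ResolveReference(ref).String())
	}
	return out, sc.Err()
}

func main() {
	playlist := []byte("#EXTM3U\n#EXT-X-TARGETDURATION:6\n#EXTINF:6.006,\nseg042.ts\n#EXTINF:6.006,\nseg043.ts\n")
	urls, err := parseSegmentURLs("http://localhost:8080/stream/720p/playlist.m3u8", playlist)
	fmt.Println(urls, err)
}
```

In a real deployment the prefetcher would also bound its concurrency, since launching one goroutine per segment per playlist refresh can itself become a load spike on the origin.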
LL-HLS: Low-Latency HLS
Standard HLS has a live latency of 3–5 segments (~15–30 seconds). This is acceptable for broadcast TV but too high for sports, gaming streams, or live auctions.
LL-HLS (Low-Latency HLS, specified in draft-pantos-hls-rfc8216bis, the draft second edition of the HLS spec) reduces latency to 2–5 seconds:
- Partial Segments: segments are delivered as they’re being encoded in partial 200ms chunks
- Playlist Delta Updates: only changed lines of the playlist are sent
- Blocking Playlist Request: client sends the _HLS_msn=44 parameter; CDN holds the request until segment 44 is available (HTTP long poll)
Client: GET /playlist.m3u8?_HLS_msn=44&_HLS_part=0
CDN: [holds request until segment 44 part 0 is available]
CDN: → 200 OK with updated playlist ← instant delivery at segment publish
LL-HLS requires CDN support. As of 2024, Cloudflare, Fastly, and AWS CloudFront all support LL-HLS.
CMAF: Common Media Application Format
Traditional HLS uses MPEG-2 TS (.ts) container. MPEG-DASH uses fMP4.
These are incompatible, requiring separate encoder pipelines.
CMAF (ISO/IEC 23000-19) standardizes on fMP4 as the container for both HLS and DASH:
CMAF Encoder:
Input → fMP4 chunks → HLS playlist (.m3u8 + .cmfv/.cmfa)
→ DASH manifest (.mpd + .cmfv/.cmfa)
One encode, two protocol manifests. Netflix, Apple, and major CDNs use
CMAF. The .ts format is legacy at this point; new deployments should
use fMP4/CMAF.
VOD vs. Live Caching Strategy
| Aspect | VOD | Live |
|---|---|---|
| Segment TTL | Forever (immutable) | Forever (immutable — same!) |
| Variant playlist TTL | Minutes to hours | 2–10 seconds |
| Master playlist TTL | Hours | 30–60 seconds |
| Cache fill | Can prefetch everything | Must chase live edge |
| Thundering herd | Only at launch | Every 5 seconds, always |
| Cache-Control header | immutable | Short max-age |
Try It
make lab-19
# Fetch master playlist
curl http://localhost:8080/stream/master.m3u8 -v
# Fetch 720p variant playlist
curl http://localhost:8080/stream/720p/playlist.m3u8 -v
# Note Cache-Control: max-age=5
# Fetch a segment (first one is a cache miss, note timing)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null
# Fetch the same segment again (cache hit, much faster)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null
# Hit the cache stats endpoint
curl http://localhost:8080/metrics/cache -s | python3 -m json.tool
# Simulate thundering herd on playlist
for i in $(seq 1 20); do
curl -s http://localhost:8080/stream/720p/playlist.m3u8 -o /dev/null &
done
wait
# Check singleflight collapsed these into one origin request
curl http://localhost:8080/metrics/cache
Lab 20 · Observability: Metrics, Logging & SLOs
Run it:
make lab-20
Source: labs/lab-20-observability/main.go
The Problem
A CDN you cannot observe is a CDN you cannot operate. Without metrics:
- You don’t know your cache hit ratio is degrading
- You don’t know latency spiked at 3 AM while you slept
- You can’t tell if a deploy improved or degraded performance
- You can’t commit to SLAs because you can’t measure whether you meet your SLOs
Production CDN observability has three pillars:
- Metrics: numeric time-series data (Prometheus)
- Structured logs: machine-parseable event records (slog)
- Traces: distributed request tracking (OpenTelemetry — not in this lab)
Prometheus: The Metrics System
Prometheus uses a pull model: the metrics server scrapes your
application’s /metrics endpoint at regular intervals (typically 15–60s).
Your application doesn’t push; it exposes a snapshot of current state.
Metric Types
Counter — monotonically increasing. Never decreases.
var requestsTotal = prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "cdn_requests_total",
Help: "Total number of requests served",
},
[]string{"method", "status", "cache"},
)
// Increment on each request
requestsTotal.WithLabelValues("GET", "200", "hit").Inc()
Use counters for: request count, bytes transferred, error count, cache hits.
Gauge — can go up or down. Represents current state.
var cacheSize = prometheus.NewGauge(prometheus.GaugeOpts{
Name: "cdn_cache_size_bytes",
Help: "Current cache size in bytes",
})
// Set on cache eviction/addition
cacheSize.Set(float64(currentSize))
Use gauges for: active connections, cache size, queue depth, goroutine count.
Histogram — counts observations into cumulative buckets. Percentiles are estimated from the buckets at query time (histogram_quantile).
var requestDuration = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "cdn_request_duration_seconds",
Help: "Request duration distribution",
Buckets: []float64{.001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5},
},
[]string{"cache"}, // "hit" or "miss"
)
// Record each request's duration
start := time.Now()
// ... serve request ...
requestDuration.WithLabelValues(cacheStatus).Observe(time.Since(start).Seconds())
Use histograms for: request latency, response size, queue wait time.
The Cardinality Trap
Cardinality = the number of unique combinations of label values. High cardinality is Prometheus’s kryptonite.
// WRONG — user_id can have millions of values!
requestsTotal.WithLabelValues("GET", "200", userId).Inc()
// → Millions of time series → Prometheus OOM → pager at 3 AM
Safe labels:
- HTTP method: 5 values (GET, POST, PUT, DELETE, HEAD)
- HTTP status code category: 5 values (1xx–5xx) or discrete codes (~30 values)
- Cache status: 3 values (hit, miss, bypass)
- Region: 10–20 values (US, EU, APAC, …)
- Host: only if you have a bounded number of hosts
Never use as labels:
- User IDs, session IDs, account IDs
- Full URL paths with IDs embedded (/user/123/profile)
- IP addresses
- Trace IDs, request IDs (use logs for per-request data)
Rule of thumb: any label that can have more than ~1000 distinct values in production will cause cardinality explosion.
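One common way to stay inside that budget is to normalize values before they become labels. The sketch below collapses status codes into classes and replaces identifier-like path segments with a placeholder. The function names and the "looks like an ID" heuristic are assumptions for illustration; where a router is available, its matched route pattern is a more reliable source than guessing from the path.

```go
package main

import (
	"fmt"
	"strings"
)

// statusClass collapses a status code into a small fixed set of label values.
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "other"
	}
	return fmt.Sprintf("%dxx", code/100)
}

// routeLabel replaces path segments that look like identifiers with ":id",
// so /user/123/profile and /user/456/profile share one time series.
func routeLabel(p string) string {
	segs := strings.Split(p, "/")
	for i, s := range segs {
		if isID(s) {
			segs[i] = ":id"
		}
	}
	return strings.Join(segs, "/")
}

// isID treats all-digit segments and long hex-or-dash segments (such as
// UUIDs) as identifiers. This is a heuristic, not a guarantee.
func isID(s string) bool {
	if s == "" {
		return false
	}
	digits, hexLike := true, len(s) >= 16
	for _, r := range s {
		if r < '0' || r > '9' {
			digits = false
		}
		if !strings.ContainsRune("0123456789abcdefABCDEF-", r) {
			hexLike = false
		}
	}
	return digits || hexLike
}

func main() {
	fmt.Println(routeLabel("/user/123/profile"), statusClass(404))
	fmt.Println(routeLabel("/obj/550e8400-e29b-41d4-a716-446655440000"), statusClass(200))
	fmt.Println(routeLabel("/image/hero.jpg"), statusClass(503))
}
```

The normalized route is only a safe label if the number of distinct routes is itself bounded; for a CDN fronting arbitrary customer paths, dropping the path label entirely is often the better choice.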
CDN Metrics Catalog
The lab implements these metrics:
// === Request counters ===
cdn_requests_total{method, status, cache} // "cache" ∈ {hit, miss, bypass}
cdn_bytes_served_total{cache} // bytes served, labelled by cache status
// === Latency ===
cdn_request_duration_seconds{cache} // histogram, per cache status
// === Cache state ===
cdn_cache_entries // gauge: count of items in cache
cdn_cache_size_bytes // gauge: bytes used
// === Origin ===
cdn_origin_requests_total{status} // requests forwarded to origin
cdn_origin_duration_seconds // histogram: origin TTFB
// === Compression ===
cdn_compression_ratio // histogram: compressed/uncompressed
Structured Access Logging with slog
Go 1.21 introduced log/slog, a structured logging package. Every
request should produce a structured JSON log line:
logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
// Per-request log (inside middleware)
logger.Info("request",
"method", r.Method,
"path", r.URL.Path,
"status", status,
"bytes", bytesWritten,
"duration", time.Since(start).Milliseconds(),
"cache", cacheStatus,
"ip", r.RemoteAddr,
"ua", r.Header.Get("User-Agent"),
"referer", r.Header.Get("Referer"),
)
Output:
{
"time": "2025-01-15T14:23:01Z",
"level": "INFO",
"msg": "request",
"method": "GET",
"path": "/image/hero.jpg",
"status": 200,
"bytes": 102400,
"duration": 3,
"cache": "hit",
"ip": "1.2.3.4:54321",
"ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
}
Structured logs enable direct processing in log aggregators (Loki, Splunk, Elasticsearch) without parsing regex patterns.
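The logging call above reads `status` and `bytesWritten`, which `net/http` does not expose on its own. A common approach, sketched here under the assumption that the lab does something similar, is to wrap the `http.ResponseWriter`. The `statusRecorder` type and the `demo` helper are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusRecorder wraps http.ResponseWriter to capture the status code and
// the number of body bytes written, so a middleware can log them afterwards.
type statusRecorder struct {
	http.ResponseWriter
	status       int
	bytesWritten int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

func (r *statusRecorder) Write(b []byte) (int, error) {
	if r.status == 0 {
		r.status = http.StatusOK // implicit 200 when WriteHeader was never called
	}
	n, err := r.ResponseWriter.Write(b)
	r.bytesWritten += n
	return n, err
}

// demo runs a handler against the recorder and returns what the access
// log would see. It uses httptest so the sketch runs without a network.
func demo() (status, bytesWritten int) {
	h := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNotFound)
		w.Write([]byte("missing"))
	})
	rec := &statusRecorder{ResponseWriter: httptest.NewRecorder()}
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/image/hero.jpg", nil))
	return rec.status, rec.bytesWritten
}

func main() {
	status, n := demo()
	fmt.Println(status, n)
}
```

In a real middleware the recorder would wrap the incoming `w`, and the log line would be emitted after `next.ServeHTTP` returns.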
Key PromQL Recipes
Cache Hit Ratio
# Instant hit ratio (last 5 minutes)
# sum() aggregates away method/status so both sides have matching labels
sum(rate(cdn_requests_total{cache="hit"}[5m]))
/
sum(rate(cdn_requests_total[5m]))
Target: > 0.90 (90% hit ratio). Below 0.80 indicates a caching problem.
Byte Hit Ratio
# Bytes served from cache vs. total bytes served
sum(rate(cdn_bytes_served_total{cache="hit"}[5m]))
/
sum(rate(cdn_bytes_served_total[5m]))
Byte hit ratio is more meaningful than request hit ratio for billing purposes (CDN vendors charge for bytes to/from origin).
p99 Latency
# 99th percentile request latency
histogram_quantile(0.99,
sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le)
)
p99 Latency by Cache Status
# Compare hit vs. miss latency
histogram_quantile(0.99,
sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le, cache)
)
Expect cache hits to be 5–100x faster than misses.
Error Rate (5xx)
# Percentage of 5xx responses
sum(rate(cdn_requests_total{status=~"5.."}[5m]))
/
sum(rate(cdn_requests_total[5m]))
Origin Request Rate
# Origin requests per second (should be low relative to total)
rate(cdn_origin_requests_total[5m])
Requests Per Second
sum(rate(cdn_requests_total[1m]))
SLOs and Error Budgets
An SLO (Service Level Objective) defines the target reliability:
SLO: 99.9% of requests return a successful response (2xx/3xx)
within 500ms at p99, measured over 30 days
An error budget is the allowed amount of failure:
30-day error budget = 30 * 24 * 60 * 60 * (1 - 0.999) = 2592 seconds = 43.2 minutes
If your error budget is consumed, you stop feature deployments and focus on reliability until the budget refills.
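The budget arithmetic is easy to get wrong by a factor of 60, so it can help to compute it in code. This is a minimal sketch of the time-based reading used above; `errorBudgetMinutes` is a hypothetical helper name, not part of the lab.

```go
package main

import "fmt"

// errorBudgetMinutes returns how many minutes of total failure an SLO
// allows over a window of the given length in days.
func errorBudgetMinutes(slo float64, windowDays int) float64 {
	windowMinutes := float64(windowDays) * 24 * 60
	return windowMinutes * (1 - slo)
}

func main() {
	fmt.Printf("99.9%% over 30 days: %.1f minutes\n", errorBudgetMinutes(0.999, 30))
	fmt.Printf("99.99%% over 30 days: %.1f minutes\n", errorBudgetMinutes(0.9999, 30))
}
```

Each extra nine shrinks the budget by a factor of ten, which is why moving from 99.9% to 99.99% is an operational change rather than a configuration change.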
SLO Burn Rate
The burn rate measures how fast you’re consuming the error budget:
# 1-hour burn rate (how fast are we consuming monthly budget?)
(
sum(rate(cdn_requests_total{status=~"5.."}[1h]))
/
sum(rate(cdn_requests_total[1h]))
)
/ (1 - 0.999) # error budget fraction
A burn rate of 1.0 = consuming budget at exactly the sustainable rate. A burn rate of 14.4 consumes 2% of the 30-day budget every hour and would exhaust the whole budget in about 50 hours (720 h / 14.4) → page immediately. The Google SRE Workbook recommends multi-window alerting:
- Fast burn (1h + 5m windows): alert for rapid consumption
- Slow burn (3d + 6h windows): alert for gradual degradation
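The relationship between burn rate, budget lifetime, and the two-window check can be sketched in a few lines. The function names here are hypothetical, and the two-window rule shown (both windows must exceed the threshold) is my reading of the multi-window approach, not code from the lab.

```go
package main

import "fmt"

// burnRate is the observed error ratio divided by the budget fraction (1 - SLO).
func burnRate(errorRatio, slo float64) float64 {
	return errorRatio / (1 - slo)
}

// hoursToExhaust returns how long the whole budget for a window of
// windowDays would last at a constant burn rate.
func hoursToExhaust(rate float64, windowDays int) float64 {
	return float64(windowDays) * 24 / rate
}

// shouldAlert fires only when BOTH the long and the short window exceed
// the threshold: the long window shows the burn is significant, the short
// window shows it is still happening right now.
func shouldAlert(longWindowRate, shortWindowRate, threshold float64) bool {
	return longWindowRate > threshold && shortWindowRate > threshold
}

func main() {
	r := burnRate(0.02, 0.999) // 2% of requests failing against a 99.9% SLO
	fmt.Printf("burn rate: %.1f\n", r)
	fmt.Printf("30-day budget lasts: %.1f hours\n", hoursToExhaust(r, 30))
	fmt.Println("page:", shouldAlert(r, r, 14.4))
}
```

Requiring the short window as well means the alert resets quickly once the incident is over, instead of staying red until the long window drains.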
Grafana Dashboard
The lab exposes:
- /metrics: Prometheus metrics endpoint
- /metrics/cache: JSON cache diagnostics
Point Grafana at Prometheus and import a CDN dashboard. The docker-compose.yml
in Lab 21 wires up the full stack (Prometheus + Grafana).
Try It
make lab-20
# Send some traffic to generate metrics
for i in $(seq 1 100); do
curl -s http://localhost:8080/item/$((RANDOM % 20)) -o /dev/null
done
# View Prometheus metrics
curl -s http://localhost:8080/metrics | grep cdn_
# View cache diagnostics
curl -s http://localhost:8080/metrics/cache | python3 -m json.tool
# Compute hit ratio manually from raw counters
# Sum across all hit series (one per method/status combination)
HITS=$(curl -s http://localhost:8080/metrics | grep 'cdn_requests_total{.*cache="hit"' | awk '{sum+=$2} END{print sum+0}')
TOTAL=$(curl -s http://localhost:8080/metrics | grep 'cdn_requests_total' | grep -v '^#' | awk '{sum+=$2} END{print sum}')
echo "Hit ratio: $(echo "scale=3; $HITS/$TOTAL" | bc)"
Lab 21 · The Full System
Run it:
make lab-21
Source: labs/lab-21-full-system/main.go
Compose: labs/lab-21-full-system/docker-compose.yml
The Architecture
This final lab wires together everything from Labs 1–20 into a production-representative CDN system. It is a microcosm of how real CDNs like Cloudflare, Fastly, and AWS CloudFront are structured.
┌─────────────────────────────────────────────────┐
│ CDN System │
│ │
Internet ──> │ Edge NYC (:8080) ──\ │
│ (singleflight, \ │
│ signed URL verify, → Shield (:8082) ──> Origin (:9001)
│ 30s TTL, metrics) / (singleflight,
│ / 300s TTL,
│ Edge LHR (:8081) ──/ metrics)
│ (same config)
│ │
│ Prometheus (:9090) Grafana (:3000) │
└─────────────────────────────────────────────────┘
Component Responsibilities
| Component | Port | Role |
|---|---|---|
| Origin | :9001 | Source of truth. Serves all content. Simulates 50ms processing delay. |
| Shield | :8082 | Aggregation layer. One connection to origin for many edge requests. 300s TTL. |
| Edge NYC | :8080 | User-facing edge in New York. Validates signed URLs. 30s TTL. |
| Edge LHR | :8081 | User-facing edge in London. Same config as NYC. 30s TTL. |
| Prometheus | :9090 | Scrapes metrics from all nodes. |
| Grafana | :3000 | Dashboards over Prometheus. |
Multi-Tier TTL Design
The TTL cascade is intentional and critical:
User ── Edge (30s TTL) ── Shield (300s TTL) ── Origin
Why Edge TTL < Shield TTL?
The edge serves users directly. Fresh content reaches users within 30 seconds of origin publication. But the edge collapses requests from many users into one request to the shield.
The shield’s 300s TTL means: for any given piece of content, the shield makes at most one request to origin per 5 minutes. A popular item might be requested by 10,000 users/minute across both edges — the shield ensures origin sees only 1 request every 5 minutes for that item.
Without shield (illustrative numbers):
Assume the edges together produce ~333 cache misses/min across all items
→ Every edge miss becomes an origin request
→ ~333 requests/min reach origin
With shield (300s TTL):
The same ~333 edge misses/min
→ All go to shield
→ Shield hit ratio ~98% (at most 1 miss per item per 5 min)
→ ~7 requests/min reach origin
This is a roughly 50× reduction in origin load.
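For a single cached item, the worst-case origin request rate follows directly from the TTLs. The sketch below assumes each cache refetches an item at most once per TTL (which is what singleflight plus a TTL cache gives you) and uses the lab's topology of two edges and one shield. `originReqPerMin` is a hypothetical helper, and the result is a per-item bound, so it is a different measure from an aggregate miss rate across all items.

```go
package main

import "fmt"

// originReqPerMin returns the worst-case origin request rate for ONE item,
// given how many caches fetch from origin directly and their TTL in seconds.
// Each such cache refetches the item at most once per TTL.
func originReqPerMin(cachesFacingOrigin int, ttlSeconds float64) float64 {
	return float64(cachesFacingOrigin) * 60 / ttlSeconds
}

func main() {
	noShield := originReqPerMin(2, 30)    // both edges fetch from origin
	withShield := originReqPerMin(1, 300) // only the shield fetches from origin
	fmt.Printf("without shield: %.1f req/min per item\n", noShield)
	fmt.Printf("with shield:    %.1f req/min per item\n", withShield)
	fmt.Printf("reduction:      %.0fx\n", noShield/withShield)
}
```

The reduction factor is the number of edges multiplied by the ratio of shield TTL to edge TTL, so it grows as edges are added.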
Singleflight at Two Layers
Both edge and shield run singleflight.Group:
type CachingProxy struct {
	cache  *Cache
	origin string
	group  singleflight.Group // deduplicates concurrent misses
}

func (p *CachingProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	key := cacheKey(r)
	if item, ok := p.cache.Get(key); ok {
		serveFromCache(w, item)
		return
	}
	// Multiple concurrent requests for the same key?
	// singleflight collapses them into ONE upstream request.
	result, err, _ := p.group.Do(key, func() (interface{}, error) {
		return p.fetchFromUpstream(r)
	})
	// Every collapsed caller shares the same result and the same error,
	// so check both before the type assertion to avoid a panic.
	item, ok := result.(*CacheItem)
	if err != nil || !ok || item == nil {
		http.Error(w, "upstream fetch failed", http.StatusBadGateway)
		return
	}
	p.cache.Set(key, item)
	serveFromCache(w, item)
}
The thundering herd cascade: without singleflight, a popular item that expires while 1,000 requests are in flight at an edge would send 1,000 concurrent requests from that edge to the shield, and the shield would forward every one of them to origin. Singleflight at the edge reduces 1,000 → 1 per edge node. Singleflight at the shield then collapses the resulting edge misses (2 in this lab, one per edge) into 1 request to origin.
Signed URL Verification
The edge validates HMAC-signed URLs before serving any content:
func (e *Edge) verifySignedURL(r *http.Request) bool {
	q := r.URL.Query()
	sig := q.Get("sig")
	if sig == "" {
		return false // or true for public content
	}
	expires, err := strconv.ParseInt(q.Get("expires"), 10, 64)
	if err != nil || time.Now().Unix() > expires {
		return false // missing, malformed, or expired
	}
	key, ok := e.signingKeys[q.Get("keyver")]
	if !ok {
		return false // unknown key version
	}
	canonical := fmt.Sprintf("GET\n%s\n%d\n", r.URL.Path, expires)
	expected := computeHMAC(key, canonical)
	// hmac.Equal compares in constant time to avoid timing side channels.
	return hmac.Equal([]byte(sig), []byte(expected))
}
The shield and origin do not re-verify — they trust the edge. This is the standard trust boundary design: validation happens at the first authorized boundary, not repeatedly at every tier.
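The verifier above calls `computeHMAC` without showing it, and it needs a signing counterpart. The sketch below is one plausible shape, assuming HMAC-SHA256 with hex encoding and the same canonical string; the lab's actual helper may differ. `signedURL` and `validSig` are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
	"strconv"
)

// computeHMAC returns the hex-encoded HMAC-SHA256 of msg under key.
// (Assumed implementation; the source only shows the call site.)
func computeHMAC(key []byte, msg string) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(msg))
	return hex.EncodeToString(mac.Sum(nil))
}

// canonical builds the string that is signed: method, path, expiry.
func canonical(path string, expires int64) string {
	return fmt.Sprintf("GET\n%s\n%d\n", path, expires)
}

// signedURL returns path plus the expires, keyver, and sig query parameters.
func signedURL(key []byte, keyver, path string, expires int64) string {
	q := url.Values{}
	q.Set("expires", strconv.FormatInt(expires, 10))
	q.Set("keyver", keyver)
	q.Set("sig", computeHMAC(key, canonical(path, expires)))
	return path + "?" + q.Encode()
}

// validSig recomputes the signature and compares it in constant time.
func validSig(key []byte, path string, expires int64, sig string) bool {
	expected := computeHMAC(key, canonical(path, expires))
	return hmac.Equal([]byte(sig), []byte(expected))
}

func main() {
	key := []byte("demo-secret") // illustration only; load real keys from a secret store
	fmt.Println(signedURL(key, "v1", "/article/1", 1900000000))
}
```

Because the path is part of the canonical string, a signature minted for `/article/1` does not validate for `/article/2`.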
Docker Compose
# labs/lab-21-full-system/docker-compose.yml
services:
  origin:
    build: .
    command: ["./cdn-lab21", "-role=origin", "-addr=:9001"]
    ports: ["9001:9001"]
  shield:
    build: .
    command: ["./cdn-lab21", "-role=shield", "-addr=:8082", "-upstream=http://origin:9001"]
    ports: ["8082:8082"]
    depends_on: [origin]
  edge-nyc:
    build: .
    command: ["./cdn-lab21", "-role=edge", "-addr=:8080", "-upstream=http://shield:8082", "-pop=NYC"]
    ports: ["8080:8080"]
    depends_on: [shield]
  edge-lhr:
    build: .
    command: ["./cdn-lab21", "-role=edge", "-addr=:8081", "-upstream=http://shield:8082", "-pop=LHR"]
    ports: ["8081:8081"]
    depends_on: [shield]
  prometheus:
    image: prom/prometheus:latest
    volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml"]
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    depends_on: [prometheus]
Prometheus Configuration
# labs/lab-21-full-system/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'cdn-edge'
    static_configs:
      - targets: ['edge-nyc:8080', 'edge-lhr:8081']
  - job_name: 'cdn-shield'
    static_configs:
      - targets: ['shield:8082']
  - job_name: 'cdn-origin'
    static_configs:
      - targets: ['origin:9001']
Observing the System Under Load
With the system running, generate load and observe the cascade:
# Generate 1000 requests across 50 unique URLs
for i in $(seq 1 1000); do
curl -s "http://localhost:8080/item/$((RANDOM % 50))" -o /dev/null
done
# Check metrics at each tier
# Edge NYC hit ratio
curl -s http://localhost:8080/metrics | grep cdn_requests_total
# Shield hit ratio
curl -s http://localhost:8082/metrics | grep cdn_requests_total
# Origin request count (should be tiny compared to edge total)
curl -s http://localhost:9001/metrics | grep cdn_requests_total
You should see:
- Edge hit ratio: ~80–90% (after warmup)
- Shield hit ratio: ~95–99%
- Origin requests: ~1–5% of edge total
Failure Modes & Resilience
Origin failure
Origin down → Shield gets 502/503 from origin
→ Shield returns stale-if-error (from Cache-Control)
→ Edge returns stale content to users
This is the “stale-if-error” pattern from Lab 7, applied system-wide. Users see slightly stale content rather than errors.
Shield failure
Shield down → Edge cannot reach upstream
→ Edge serves stale (if available) or 503
In production, the shield tier has multiple nodes behind a load balancer. A single shield failure routes to another shield node.
Edge failure
Edge-NYC down → Geo routing redirects NYC users to Edge-LHR
→ Higher latency but service continues
This is the health-check failover from Lab 15. Each edge registers with the geo-routing layer and is removed from rotation when health checks fail.
Path to Production
To harden this system for real traffic:
- Replace in-memory cache with Redis: enables shared cache state across edge instances and survives restarts
- Add TLS termination: Let’s Encrypt (via the ACME protocol) for automatic certificate provisioning
- Add rate limiting: token bucket per IP/user with Redis-backed counters
- Add WAF rules: block common attack patterns (SQLi, XSS, path traversal)
- Add CDN purge API: authenticated endpoint to purge cache keys by tag
- Add distributed tracing: OpenTelemetry spans across edge → shield → origin
- Add chaos testing: kill origin/shield randomly to validate resilience
Try It
# Start the full system with Docker Compose
cd labs/lab-21-full-system
docker compose up --build
# In another terminal: generate signed URL and fetch content
TOKEN=$(curl -s "http://localhost:8080/sign?path=/article/1&ttl=300")
curl -s "$TOKEN" -v
# View Prometheus metrics
open http://localhost:9090
# View Grafana (default credentials: admin/admin)
open http://localhost:3000
# Generate load test
for i in $(seq 1 5000); do
curl -s "http://localhost:8080/item/$((RANDOM % 100))" -o /dev/null &
done
wait
# Observe the request waterfall through the tiers
curl -s http://localhost:8080/metrics | grep cdn_requests_total | head -5
curl -s http://localhost:8082/metrics | grep cdn_requests_total | head -5
curl -s http://localhost:9001/metrics | grep cdn_requests_total | head -5
Appendix A · Production Deployment Guide
This appendix covers deploying a CDN to public internet traffic. It is written for a principal engineer who has worked through the labs and is now hardening the system for production.
CDN Vendor Decision Matrix
For most organizations, building CDN infrastructure from scratch is not the right choice. Use a managed CDN vendor unless you have >1 Tbps of traffic and specific requirements that no vendor meets.
| Vendor | Best for | Differentiators | Pricing model |
|---|---|---|---|
| Cloudflare | Security-first CDN, DDoS protection, global network | Largest anycast network, Workers edge compute, free tier, Zero Trust | Per-seat or flat-rate (no bandwidth cost on higher tiers) |
| Fastly | Developers, real-time purging, full VCL control | Instant purge API, Varnish/VCL programmability, Compute@Edge WASM | Per-GB + per-request |
| AWS CloudFront | AWS-native applications, Lambda@Edge, tight IAM integration | Deep AWS integration, Lambda@Edge, S3 origins, OAC | Per-GB + per-request + per-origin HTTPS |
| Akamai | Enterprise, compliance, media delivery | Largest network footprint, MPLS backbone, strict SLAs | Enterprise contracts |
| BunnyCDN | Cost-sensitive mid-tier | Very low per-GB pricing, simple API | Per-GB only |
When to build your own CDN
Build only if all of these are true:
- Traffic > 1 Tbps sustained
- Specific protocol requirements (custom routing, custom protocols)
- Cost at scale exceeds vendor pricing significantly
- In-house expertise to operate CDN infrastructure 24/7
Companies that built their own: Netflix (Open Connect), Google, Meta, ByteDance/TikTok. Everyone else uses vendors.
Origin Protection
Your origin servers must not be directly reachable from the internet when you’re using a CDN. If they are, attackers can bypass the CDN:
Attacker discovers origin IP → sends traffic directly → DDoS bypasses CDN
Origin Protection Methods
1. Cloudflare-to-Origin mTLS (Authenticated Origin Pulls)
# nginx: only accept connections presenting Cloudflare's client certificate
ssl_verify_client on;
ssl_client_certificate /etc/nginx/cloudflare-origin-pull-ca.pem;
Cloudflare presents its certificate when connecting to your origin. Your origin rejects any connection without it.
2. IP allowlist at origin
Only allow traffic from CDN IP ranges:
# Cloudflare IPs: https://www.cloudflare.com/ips/
iptables -A INPUT -p tcp --dport 443 -s 103.21.244.0/22 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j DROP
Cloudflare, Fastly, and AWS publish their IP ranges. Use the vendor’s API to pull the current list (IPs change).
3. Shared secret header
CDN adds a secret header; origin validates it:
CDN → Origin: X-CDN-Secret: <32-byte-random-secret>
Origin: reject any request missing this header
This is simpler but less secure than mTLS (header could leak in logs).
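On the origin side, the header check can be a small middleware. This is a sketch under assumptions: the header name `X-CDN-Secret` comes from the text above, while `requireCDNSecret` and `statusFor` are hypothetical names. The comparison uses `crypto/subtle` so that response timing does not reveal how many leading bytes of a guess were correct.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireCDNSecret rejects any request whose X-CDN-Secret header does not
// match the configured secret. An empty configured secret rejects everything,
// so a missing config value cannot silently disable the check.
func requireCDNSecret(secret string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-CDN-Secret")
		if secret == "" || subtle.ConstantTimeCompare([]byte(got), []byte(secret)) != 1 {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-memory request through the middleware and
// returns the resulting status code.
func statusFor(secret, presented string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if presented != "" {
		req.Header.Set("X-CDN-Secret", presented)
	}
	rec := httptest.NewRecorder()
	requireCDNSecret(secret, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("s3cret", "s3cret"))
	fmt.Println(statusFor("s3cret", "wrong"))
	fmt.Println(statusFor("s3cret", ""))
}
```

To limit the leak risk mentioned above, keep this header out of access logs and rotate the secret on a schedule.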
4. Private networking
Put origin on a private VPC; CDN connects via private peering:
- Cloudflare Magic WAN
- Fastly Secure Edge Connector
- AWS CloudFront + VPC Origins
WAF: Web Application Firewall
A WAF sits in front of your origin (at the CDN edge) and blocks malicious traffic before it reaches your application.
Common WAF Rule Sets
OWASP Core Rule Set (CRS): covers OWASP Top 10 attacks
- SQLi: GET /api/user?id=1 OR 1=1
- XSS: GET /search?q=<script>alert(1)</script>
- Path traversal: GET /files/../../../etc/passwd
- Remote file inclusion, SSRF
Bot management rules:
- Block known malicious user agents
- Challenge suspicious traffic (CAPTCHA)
- Rate-limit scraper patterns
Custom rules for your application:
- Block geographic regions not served
- Rate-limit unauthenticated API endpoints
- Block requests without required headers
Cloudflare WAF Setup
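# Via Cloudflare API: deploy a managed ruleset in the managed-firewall phase.
# Managed rulesets are deployed with the "execute" action. Each managed ruleset
# (Cloudflare Managed, OWASP Core) has its own ID; confirm the ID below against
# the rulesets listed for your zone before using it.
curl -X PUT "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets/phases/http_request_firewall_managed/entrypoint" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "rules": [{
      "action": "execute",
      "expression": "true",
      "action_parameters": {
        "id": "efb7b8c949ac4650a09736fc376e9aee"
      }
    }]
  }'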
WAF False Positive Management
WAFs generate false positives. Common causes:
- API requests containing JSON with SQL-like syntax
- Developer tools testing edge cases
- Search queries containing special characters
Use audit mode first (log but don’t block), then move to block mode after reviewing false positives. Most vendors support per-rule enable/disable.
TLS Configuration
Certificate Provisioning
Cloudflare: Handles certificate provisioning automatically via Universal SSL or Advanced Certificate Manager. Zero configuration required.
Self-managed: Use ACME (Let’s Encrypt) with automatic renewal:
# certbot with nginx
certbot --nginx -d cdn.example.com --agree-tos --email ops@example.com
# Auto-renewal via cron
0 0,12 * * * certbot renew --quiet
TLS Minimum Version
Disable TLS 1.0 and 1.1 (deprecated; known attacks):
ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:...
TLS 1.3 should be preferred everywhere. TLS 1.2 minimum is the current industry standard (PCI DSS requires at least 1.2).
HSTS
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
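Once HSTS is deployed and a client sees this header over HTTPS, the browser will refuse to make plaintext HTTP requests to this host (and, with includeSubDomains, its subdomains) for 31,536,000 seconds (1 year); it upgrades them to HTTPS automatically. Submit to the HSTS preload list for even stronger enforcement, which also covers the very first visit.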
Cost Optimization
CDN costs are typically:
- Bandwidth to end users: $0.01–0.08/GB depending on vendor and region
- Requests: $0.001–0.01 per 10,000 requests
- Origin egress: typically covered by CDN bandwidth; watch for AWS data transfer charges separately
Optimization Strategies
1. Maximize cache hit ratio
Every cache hit is a request that never reaches origin. CDN bandwidth is cheaper than:
- Origin bandwidth (higher utilization costs)
- Origin compute (request processing)
- Origin database queries
Target > 90% byte hit ratio. Each 1% improvement in hit ratio on 100 TB/day = 1 TB/day saved from origin, potentially saving thousands per month.
2. Compression
Compressing assets before caching multiplies your CDN capacity:
10 TB/day uncompressed HTML/CSS/JS
→ Brotli compression ~70% ratio
→ 3 TB/day stored and served
→ 70% bandwidth cost reduction
3. Choose regions wisely
Premium regions (Australia, South America, India) cost 2–5× more per GB than US/EU. If your user base is primarily US/EU, serving Asia from US CDN PoPs (higher latency but lower cost) may be acceptable for static assets.
4. Origin Shield
Origin shield (covered in Lab 13) reduces origin-bound requests. CDN vendors charge less for shield-to-origin bandwidth than edge-to-origin in some pricing models. More importantly, it reduces origin compute costs.
Deployment Checklist
Before going public with your CDN:
- Origin is not directly reachable from internet (IP allowlist or mTLS)
- WAF is enabled with OWASP CRS in audit mode → block mode after validation
- TLS 1.2+ minimum; TLS 1.3 preferred
- HSTS header set on all HTTPS responses
- Cache-Control headers set correctly on all content types
- Sensitive paths (admin, API tokens, auth callbacks) bypassed from CDN caching
- Purge API configured and tested
- Metrics and alerting configured (hit ratio, latency, error rate)
- SLOs defined and error budget dashboards live
- Load test with traffic 10× expected peak
- Incident runbook written and shared with on-call team
Appendix B · HTTP Caching Headers Reference
A complete reference for every HTTP header relevant to CDN caching. Cross-referenced against RFC 7234 (HTTP/1.1 Caching, now obsoleted by RFC 9111), RFC 9110 (HTTP Semantics), RFC 9111 (HTTP Caching), and RFC 5861 (stale extensions).
Cache-Control
The primary mechanism for controlling caching behavior. Both request and response headers; semantics differ.
Response Cache-Control Directives
max-age=<seconds>
Content is fresh for this many seconds after the Date response header.
Cache-Control: max-age=3600
After 3600 seconds, the cached response is stale. CDN must revalidate or refetch from origin.
Gotcha: max-age is relative to Date, not to when the CDN received the response. If origin is slow and the response travels for 5 seconds, the CDN’s effective max-age is reduced by 5 seconds.
s-maxage=<seconds>
Overrides max-age for shared caches only (CDN, proxy). Does not
affect browser cache. Ideal for: short browser TTL + long CDN TTL.
Cache-Control: max-age=60, s-maxage=3600
Browser caches the page for 1 minute; CDN caches it for 1 hour.
no-cache
Does not mean “don’t cache”. Means: store the response, but revalidate
with origin before every use. Equivalent to max-age=0, must-revalidate.
Cache-Control: no-cache
The CDN will send a conditional request (If-None-Match or If-Modified-Since)
for every cache hit. Origin returns 304 if unchanged (fast), or 200 + new body.
no-store
Do not cache at all. Not in memory, not on disk, not in shared caches.
Cache-Control: no-store
Use for: authentication tokens, session data, personalized responses.
private
Cache only in the browser (private cache). CDN/proxies must not cache.
Cache-Control: private, max-age=3600
Use for: user-specific pages (shopping cart, account dashboard).
public
Cache in all caches (browser + CDN), even for responses to requests
with Authorization headers.
Cache-Control: public, max-age=86400
Without public (or another directive that explicitly permits it, such as
s-maxage or must-revalidate), responses to authenticated requests are not
cached by CDNs by default (even if max-age is set).
must-revalidate
Once stale, must revalidate before serving. Do not serve stale content even if the origin is unavailable.
Cache-Control: max-age=3600, must-revalidate
Contrast with stale-if-error which allows serving stale when origin fails.
proxy-revalidate
Same as must-revalidate but applies only to shared caches (CDN/proxy),
not browsers.
immutable
The response body will not change during its freshness lifetime.
Browser (and some CDNs) will not send conditional revalidation requests
until after max-age expires.
Cache-Control: public, max-age=31536000, immutable
Use for: hashed assets (main.abc123.js), HLS segments, versioned files.
Saves one conditional request per asset per page load.
stale-while-revalidate=<seconds>
After max-age expires, continue serving the stale response while
revalidating in the background. See Lab 7.
Cache-Control: max-age=60, stale-while-revalidate=600
From 60s to 660s after Date: serve stale, trigger background revalidation.
After 660s: must revalidate before serving.
stale-if-error=<seconds>
If origin returns 5xx (or is unreachable), serve the stale cached response for up to this many seconds beyond the normal expiry.
Cache-Control: max-age=3600, stale-if-error=86400
Serve stale for up to 24 hours during origin outage. Combine with
stale-while-revalidate for both performance and resilience.
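The freshness windows described above can be written as a small decision function. This is a sketch of the logic only, not the lab's implementation: `decide` is a hypothetical name, all times are in seconds, and `originDown` stands in for whatever signal (5xx or connection failure) the cache uses.

```go
package main

import "fmt"

// decide returns what a cache may do with a stored response of the given
// age, for a response carrying max-age, stale-while-revalidate (swr), and
// stale-if-error (sie) values.
func decide(age, maxAge, swr, sie int, originDown bool) string {
	switch {
	case age < maxAge:
		return "serve fresh"
	case originDown && age < maxAge+sie:
		return "serve stale (stale-if-error)"
	case age < maxAge+swr:
		return "serve stale, revalidate in background"
	default:
		return "revalidate before serving"
	}
}

func main() {
	// Cache-Control: max-age=60, stale-while-revalidate=600, stale-if-error=86400
	for _, age := range []int{30, 100, 700} {
		fmt.Printf("age=%ds healthy: %s\n", age, decide(age, 60, 600, 86400, false))
		fmt.Printf("age=%ds outage:  %s\n", age, decide(age, 60, 600, 86400, true))
	}
}
```

The outage case is checked before the stale-while-revalidate window so that a known-bad origin is not hit with background revalidations it cannot answer.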
no-transform
Prohibit intermediate caches (CDN) from transforming the response body. Prevents CDN from applying compression, image optimization, or minification.
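To see how these response directives combine from a CDN's point of view, here is a deliberately small parser. It is a sketch with stated gaps: it ignores quoted values, heuristic freshness, and the Authorization rules, and `sharedCacheTTL` is a hypothetical name rather than a lab function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sharedCacheTTL returns the freshness lifetime in seconds that a shared
// cache would apply, and whether it may store the response at all.
func sharedCacheTTL(cacheControl string) (ttl int, storable bool) {
	maxAge, sMaxAge := -1, -1
	noCache := false
	for _, part := range strings.Split(cacheControl, ",") {
		directive := strings.ToLower(strings.TrimSpace(part))
		name, val, _ := strings.Cut(directive, "=")
		switch name {
		case "no-store", "private":
			return 0, false // a shared cache must not store this response
		case "no-cache":
			noCache = true // storable, but must revalidate before every use
		case "max-age":
			if n, err := strconv.Atoi(val); err == nil {
				maxAge = n
			}
		case "s-maxage":
			if n, err := strconv.Atoi(val); err == nil {
				sMaxAge = n
			}
		}
	}
	switch {
	case noCache:
		return 0, true
	case sMaxAge >= 0:
		return sMaxAge, true // s-maxage overrides max-age for shared caches
	case maxAge >= 0:
		return maxAge, true
	default:
		return 0, true // no explicit lifetime; heuristic freshness not modeled here
	}
}

func main() {
	for _, h := range []string{
		"max-age=60, s-maxage=3600",
		"private, max-age=3600",
		"no-cache",
		"public, max-age=86400",
	} {
		ttl, ok := sharedCacheTTL(h)
		fmt.Printf("%-28q ttl=%d storable=%v\n", h, ttl, ok)
	}
}
```

Returning early on `no-store` and `private` makes the precedence explicit: no TTL directive can make those responses storable in a shared cache.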
Request Cache-Control Directives
no-cache
Force the CDN to revalidate with origin before returning a cached response. Used by browsers on a hard refresh (for example Ctrl+Shift+R); a normal reload typically sends max-age=0 instead.
no-store
Request that the response not be cached (hint; servers may ignore).
max-age=<seconds> (request)
Accept only responses not older than this many seconds (fresh or revalidated).
max-stale=<seconds> (request)
Accept stale responses up to this many seconds past their expiry.
min-fresh=<seconds> (request)
Accept only responses that will remain fresh for at least this many more seconds.
only-if-cached (request)
Return a cached response or 504 Gateway Timeout. Used by clients that want to avoid network requests.
Expires (Legacy)
The original HTTP/1.0 cache expiry mechanism. Specifies an absolute date:
Expires: Tue, 21 Oct 2025 07:28:00 GMT
Cache-Control: max-age overrides Expires when both are present.
Expires is still useful for backwards compatibility with very old clients
and proxies that don’t understand Cache-Control.
Gotcha: If Expires is in the past, the response is immediately stale. If the value is malformed or invalid, it’s treated as expired.
ETag
An opaque validator for a specific version of a resource:
ETag: "686897696a7c876b7e"
ETag: W/"686897696a7c876b7e" (weak ETag)
Strong ETag ("...") means byte-for-byte identical content.
Weak ETag (W/"...") means semantically equivalent content (may have
different whitespace, compression, etc.).
Revalidation with ETag:
Client: GET /resource HTTP/1.1
If-None-Match: "686897696a7c876b7e"
Origin: 304 Not Modified (if ETag matches — no body)
200 OK + new body + new ETag (if changed)
ETag best practices:
- Use content hash (SHA-1, MD5, xxhash) for strong ETags
- For database-backed content: use version_number or updated_at
- Avoid using timestamps alone: they don’t reflect content changes and can cause spurious mismatches
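A content-hash ETag and the matching If-None-Match check might look like the sketch below. The choice of a truncated SHA-256 is mine for illustration (the list above names other hashes), `strongETag` and `matches` are hypothetical names, and splitting the header on commas is a simplification that would misparse an ETag containing a comma.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// strongETag derives a quoted strong ETag from a hash of the body, so the
// tag changes exactly when the bytes change.
func strongETag(body []byte) string {
	sum := sha256.Sum256(body)
	return `"` + hex.EncodeToString(sum[:8]) + `"`
}

// matches reports whether an If-None-Match header value matches etag.
// If-None-Match uses weak comparison, so a W/ prefix is ignored on both sides.
func matches(ifNoneMatch, etag string) bool {
	if strings.TrimSpace(ifNoneMatch) == "*" {
		return true
	}
	want := strings.TrimPrefix(etag, "W/")
	for _, candidate := range strings.Split(ifNoneMatch, ",") {
		if strings.TrimPrefix(strings.TrimSpace(candidate), "W/") == want {
			return true
		}
	}
	return false
}

func main() {
	tag := strongETag([]byte("hello"))
	fmt.Println("ETag:", tag)
	fmt.Println("304?", matches(tag, tag))
	fmt.Println("304?", matches(`"stale-tag"`, tag))
}
```

When `matches` returns true the handler would reply 304 Not Modified with no body; otherwise it sends 200 with the new body and ETag.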
Last-Modified / If-Modified-Since
The older revalidation mechanism (HTTP/1.0):
Response: Last-Modified: Fri, 15 Nov 2024 08:12:31 GMT
Request: If-Modified-Since: Fri, 15 Nov 2024 08:12:31 GMT
Response: 304 Not Modified (if not modified since that date)
200 OK + new body (if modified)
Use ETag when possible — it’s more precise. Last-Modified is only
second-resolution; files updated multiple times per second may have
the same Last-Modified but different content.
Vary
Tells caches that the response varies based on request headers:
Vary: Accept-Encoding
Vary: Accept-Language
Vary: Accept, Accept-Encoding
With Vary: Accept-Encoding, the CDN stores a separate cached response
for each Accept-Encoding value:
GET /page.html Accept-Encoding: gzip → cache key: /page.html#gzip
GET /page.html Accept-Encoding: br → cache key: /page.html#brotli
GET /page.html (no Accept-Encoding) → cache key: /page.html#identity
Warning: Vary: User-Agent or Vary: Cookie creates cardinality
explosion — a separate cache entry per unique User-Agent string (thousands).
Avoid unless necessary. CDNs often ignore Vary: Cookie for this reason.
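The per-encoding cache keys shown above only stay bounded if the raw header is normalized first, because clients send many spellings of the same preference. The sketch below is an assumption about how a cache might do this, not the lab's code: `normalizeEncoding` and `cacheKey` are hypothetical names, and q-values (including `q=0`) are ignored for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEncoding collapses an Accept-Encoding header into one of three
// variants, preferring brotli over gzip over no compression.
func normalizeEncoding(acceptEncoding string) string {
	offered := map[string]bool{}
	for _, part := range strings.Split(acceptEncoding, ",") {
		name, _, _ := strings.Cut(part, ";") // drop parameters such as q=0.8
		offered[strings.ToLower(strings.TrimSpace(name))] = true
	}
	switch {
	case offered["br"]:
		return "brotli"
	case offered["gzip"]:
		return "gzip"
	default:
		return "identity"
	}
}

// cacheKey appends the normalized variant to the path, so every request
// maps to one of at most three cache entries per URL.
func cacheKey(path, acceptEncoding string) string {
	return path + "#" + normalizeEncoding(acceptEncoding)
}

func main() {
	fmt.Println(cacheKey("/page.html", "gzip, deflate, br"))
	fmt.Println(cacheKey("/page.html", "br;q=1.0, gzip;q=0.8"))
	fmt.Println(cacheKey("/page.html", "gzip"))
	fmt.Println(cacheKey("/page.html", ""))
}
```

The same idea applies to any header named in Vary: map the raw value onto a small set of variants before it becomes part of the key.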
Surrogate-Control / Surrogate-Key (CDN-specific)
Not in RFCs; vendor-specific cache control for CDN-only directives:
Surrogate-Control: max-age=86400
This header is intended for the CDN only. The CDN strips it before
forwarding to the browser, so the browser uses its own Cache-Control.
Surrogate-Key (Fastly) / Cache-Tag (Cloudflare):
Surrogate-Key: article-123 author-456 category-sports
Cache-Tag: article-123 author-456 category-sports
Associates the cached response with logical tags. Enables instant purge by tag rather than by URL. See Lab 9.
CDN-Cache-Control
A CDN-specific Cache-Control variant, standardized in RFC 9213 (Targeted HTTP Cache Control):
CDN-Cache-Control: max-age=600
Cache-Control: max-age=60
CDN respects CDN-Cache-Control (600s TTL) while browsers respect
Cache-Control (60s TTL). Supported by Cloudflare, Fastly, and others.
More explicit than s-maxage because it targets CDNs specifically rather
than all shared caches.
Pragma: no-cache (Legacy)
HTTP/1.0 equivalent of Cache-Control: no-cache. Ignore in new code;
handle for backwards compatibility:
Pragma: no-cache → treated as Cache-Control: no-cache by modern caches
Age
Set by the CDN/proxy to indicate how old a cached response is:
Age: 1234
Age: 1234 means this response was fetched from origin 1234 seconds ago.
Remaining freshness = max-age - Age. If Age >= max-age, the response
is stale even before it leaves the CDN.
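The remaining-freshness formula is simple enough to check by hand, but a helper keeps the stale boundary consistent across call sites. A minimal sketch, with `remainingFreshness` as a hypothetical name and all values in seconds:

```go
package main

import "fmt"

// remainingFreshness applies remaining = max-age - Age. A response whose
// Age has reached max-age is stale and has no freshness left.
func remainingFreshness(maxAge, age int) (remaining int, stale bool) {
	remaining = maxAge - age
	if remaining <= 0 {
		return 0, true
	}
	return remaining, false
}

func main() {
	left, stale := remainingFreshness(3600, 1234)
	fmt.Printf("max-age=3600 Age=1234: %ds left, stale=%v\n", left, stale)
	left, stale = remainingFreshness(60, 1234)
	fmt.Printf("max-age=60   Age=1234: %ds left, stale=%v\n", left, stale)
}
```

A downstream cache that receives the response adds its own residence time to Age, so freshness only ever shrinks as the response moves toward the user.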
Warning (Deprecated in RFC 9111)
Formerly used to indicate stale or revalidation state:
Warning: 110 - "Response is Stale"
Warning: 214 - "Transformation Applied"
RFC 9111 deprecated all Warning headers. Do not generate them; ignore if received.
Quick Reference Table
| Header | Direction | Purpose |
|---|---|---|
| Cache-Control: max-age | Response | Cache TTL in seconds |
| Cache-Control: s-maxage | Response | CDN-only TTL override |
| Cache-Control: no-cache | Response | Revalidate before use |
| Cache-Control: no-store | Response | Never cache |
| Cache-Control: private | Response | Browser cache only |
| Cache-Control: public | Response | All caches including CDN |
| Cache-Control: immutable | Response | Never revalidate during freshness |
| Cache-Control: stale-while-revalidate | Response | Async background refresh |
| Cache-Control: stale-if-error | Response | Serve stale on origin failure |
| ETag | Response | Version identifier |
| Last-Modified | Response | Last change timestamp |
| Vary | Response | Differentiate cache by request headers |
| Age | Response | Time in cache (seconds) |
| Expires | Response | Absolute expiry date (legacy) |
| Surrogate-Control | Response | CDN-only TTL (stripped before browser) |
| Surrogate-Key / Cache-Tag | Response | Logical purge grouping |
| If-None-Match | Request | Conditional request by ETag |
| If-Modified-Since | Request | Conditional request by date |
| Cache-Control: no-cache | Request | Force CDN revalidation |
Appendix C · PromQL Recipes for CDN Monitoring
A cookbook of production-ready PromQL queries for CDN observability. Each query assumes the metric names from Lab 20. Adapt label names to your actual instrumentation.
Hit Ratio Queries
Request-level hit ratio (5-minute window)
# sum() aggregates away method/status so both sides have matching labels
sum(rate(cdn_requests_total{cache="hit"}[5m]))
/
sum(rate(cdn_requests_total[5m]))
Byte hit ratio (more meaningful for billing)
sum(rate(cdn_bytes_served_total{cache="hit"}[5m]))
/
sum(rate(cdn_bytes_served_total[5m]))
Hit ratio by PoP
# Assumes a pop label on the metric (not part of the Lab 20 catalog)
sum by (pop) (rate(cdn_requests_total{cache="hit"}[5m]))
/
sum by (pop) (rate(cdn_requests_total[5m]))
Hit ratio trend (1-hour average over past 24 hours)
avg_over_time(
(
sum(rate(cdn_requests_total{cache="hit"}[1h]))
/
sum(rate(cdn_requests_total[1h]))
)[24h:1h]
)
Latency Queries
p50, p95, p99 request latency (across all requests)
histogram_quantile(0.50, sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le))
p99 latency, split by cache status
histogram_quantile(0.99,
sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le, cache)
)
This reveals the latency gap between cache hits and misses.
p99 origin latency (time spent fetching from origin)
histogram_quantile(0.99,
sum(rate(cdn_origin_duration_seconds_bucket[5m])) by (le)
)
Latency heatmap (for Grafana heatmap panel)
sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le)
Traffic Volume Queries
Requests per second
sum(rate(cdn_requests_total[1m]))
Requests per second by status code class
sum(rate(cdn_requests_total[1m])) by (status)
Bytes served per second
sum(rate(cdn_bytes_served_total[1m]))
Bandwidth in Mbps
sum(rate(cdn_bytes_served_total[1m])) * 8 / 1e6
Top 10 most-requested paths (requires path label — use sparingly)
topk(10, sum(rate(cdn_requests_total[5m])) by (path))
Warning: Only add a path label if your URL space is bounded. Unbounded paths cause cardinality explosion.
Error Rate Queries
Overall error rate (5xx)
sum(rate(cdn_requests_total{status=~"5.."}[5m]))
/
sum(rate(cdn_requests_total[5m]))
Error rate by status code
sum(rate(cdn_requests_total{status=~"5.."}[5m])) by (status)
Origin error rate (errors returned by origin)
sum(rate(cdn_origin_requests_total{status=~"5.."}[5m]))
/
sum(rate(cdn_origin_requests_total[5m]))
Error rate that would violate 99.9% SLO
# Returns a value only while the error ratio exceeds the 0.1% budget fraction
sum(rate(cdn_requests_total{status=~"5.."}[5m]))
/
sum(rate(cdn_requests_total[5m]))
> 0.001
SLO & Error Budget Queries
Error budget consumption rate (ratio to 30-day budget)
# Burn rate > 1 means you'll exhaust the monthly budget before the month ends
(
sum(rate(cdn_requests_total{status=~"5.."}[1h]))
/
sum(rate(cdn_requests_total[1h]))
)
/ (1 - 0.999)
Remaining error budget (fraction, 30-day window)
1 - (
sum(increase(cdn_requests_total{status=~"5.."}[30d]))
/
sum(increase(cdn_requests_total[30d]))
/
0.001 # error budget = 1 - SLO = 1 - 0.999
)
Multi-window burn rate (Google SRE approach)
# Fast burn: 1-hour window; a 14.4× burn rate consumes 2% of the 30-day budget per hour
(
sum(rate(cdn_requests_total{status=~"5.."}[1h]))
/
sum(rate(cdn_requests_total[1h]))
)
/
(1 - 0.999)
> 14.4
Cache Efficiency Queries
Cache entry count
cdn_cache_entries
Cache size in MB
cdn_cache_size_bytes / 1e6
Cache miss rate (requests going to origin)
# The two metrics carry different labels, so aggregate both sides before dividing
sum(rate(cdn_origin_requests_total[5m]))
/
sum(rate(cdn_requests_total[5m]))
Median compression ratio
histogram_quantile(0.50,
sum(rate(cdn_compression_ratio_bucket[5m])) by (le)
)
Alerting Rules
These are example Prometheus alerting rules (for a rules file loaded through rule_files in prometheus.yml):
groups:
  - name: cdn_alerts
    rules:
      # Hit ratio dropped below 80%: caching problem
      - alert: CDNHitRatioLow
        expr: |
          sum(rate(cdn_requests_total{cache="hit"}[5m]))
          /
          sum(rate(cdn_requests_total[5m]))
          < 0.80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CDN hit ratio below 80% for 10 minutes"
          description: "Current hit ratio: {{ $value | humanizePercentage }}"
      # p99 latency above 1 second
      - alert: CDNHighLatency
        expr: |
          histogram_quantile(0.99,
            sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le)
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CDN p99 latency above 1s"
      # Error rate above 1% (a 10× burn against a 99.9% SLO)
      - alert: CDNHighErrorRate
        expr: |
          sum(rate(cdn_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(cdn_requests_total[5m]))
          > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "CDN error rate above 1%"
          description: "Current error rate: {{ $value | humanizePercentage }}"
      # Fast burn: spending 2% of the 30-day error budget per hour
      # (full exhaustion in about 50 hours if sustained)
      - alert: CDNFastBurn
        expr: |
          (
            sum(rate(cdn_requests_total{status=~"5.."}[1h]))
            /
            sum(rate(cdn_requests_total[1h]))
          )
          / (1 - 0.999) > 14.4
        for: 2m
        labels:
          severity: critical
          page: "true"
        annotations:
          summary: "CDN burning error budget 14.4× faster than sustainable rate"
Grafana Dashboard Layout
Recommended panel organization:
Row 1: Traffic Overview
- Total RPS (stat)
- Bandwidth Mbps (stat)
- Error rate % (stat)
Row 2: Cache Performance
- Hit ratio over time (graph)
- Byte hit ratio over time (graph)
- Cache size (graph)
Row 3: Latency
- p50/p95/p99 latency (graph)
- Latency heatmap (heatmap panel)
- Origin latency p99 (graph)
Row 4: Errors
- Error rate over time (graph)
- Error rate by status code (graph)
- SLO burn rate (graph with threshold annotation)
Row 5: Infrastructure
- Cache entry count (graph)
- Goroutine count (graph)
- Memory usage (graph)
Appendix D · Mental Models & Decision Trees
A collection of decision frameworks and mental models for CDN engineering. These are the heuristics principal engineers use to reason about CDN behavior quickly, without needing to run simulations.
Should I Cache This?
Is the response identical for all users?
├── No → Cache-Control: private (browser only) or no-store
│        Examples: shopping cart, account dashboard, personalized feed
│
└── Yes → Can it be revalidated cheaply (ETag / Last-Modified)?
    ├── Yes → Cache with short max-age + conditional request support
    │         Examples: API responses with version IDs, database records
    │
    └── No → How often does the content change?
        ├── Never → max-age=31536000, immutable
        │           Examples: hashed assets, video segments
        ├── Rarely → max-age=86400 (1 day)
        │            Examples: fonts, PDFs, images
        ├── Daily → max-age=3600 (1 hour) + stale-while-revalidate
        │           Examples: product listings, blog articles
        ├── Frequently → max-age=60 + stale-while-revalidate=300
        │                Examples: inventory counts, prices
        └── Real-time → max-age=5 (live streams) or no-store
                        Examples: live sports scores, financial ticks
What TTL Should I Use?
The TTL tradeoff is always: freshness vs. CDN efficiency.
| Content update frequency | Recommended TTL | Pattern |
|---|---|---|
| Never (hashed assets) | 1 year | immutable |
| Months (legal pages) | 1–7 days | Long TTL + surrogate keys for emergency purge |
| Days (blog posts) | 1 hour + SWR 24h | High hit ratio + fresh within a day |
| Hours (product pages) | 5 min + SWR 1h | SWR bridges the gap |
| Minutes (inventory) | 60s + SWR 300s | Accept slight staleness |
| Seconds (live playlist) | 5s | No SWR (viewers need current segment list) |
| Never cache (auth, cart) | no-store | Keep off CDN entirely |
SWR = stale-while-revalidate. Serve stale content instantly while fetching fresh copy in the background.
Thundering Herd Decision Tree
Multiple concurrent requests for the same uncached resource?
│
└── Is this a shared cache (CDN, origin shield)?
    ├── Yes → Use singleflight.Group to collapse concurrent misses
    │         → Only ONE upstream request, all waiters receive the result
    │
    └── No → Multiple browser tabs? Multiple microservices?
        └── Client-side: stagger requests with jitter
            Service-side: use a distributed lock (Redis SETNX)
TTL expiry causes a burst of misses at cache expiry time?
│
└── Use stale-while-revalidate:
    → Expired content stays servable for a grace window (e.g. 30s more) and is served instantly
    → Background fetch rehydrates the cache
    → Removes the expiry spike for requests that arrive inside the grace window
    OR use XFetch (probabilistic early refresh):
    → Randomly start refreshing BEFORE expiry
    → No single expiry moment → no thundering herd
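The collapsing behaviour in the first tree can be sketched with the standard library alone. The `Group` type below is a simplified, hypothetical stand-in for `singleflight.Group` from `golang.org/x/sync/singleflight`: it has no panic handling, no `Forget`, and no shared-result flag. It is meant only to show the mechanism: the first caller for a key runs the fetch and later callers wait for its result.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight fetch and its result.
type call struct {
	wg  sync.WaitGroup
	val string
	err error
}

// Group collapses concurrent calls that share a key into one execution.
type Group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn once per key at a time. Callers that arrive while a fetch for
// the same key is in flight block and receive that fetch's result.
func (g *Group) Do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // follower: wait for the leader's result
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // leader: the only upstream request
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key) // the next miss starts a fresh fetch
	g.mu.Unlock()
	return c.val, c.err
}

func main() {
	var g Group
	var originHits int64
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("/video/seg-1.ts", func() (string, error) {
				atomic.AddInt64(&originHits, 1)
				time.Sleep(50 * time.Millisecond) // simulated origin latency
				return "segment bytes", nil
			})
		}()
	}
	wg.Wait()
	fmt.Println("origin fetches:", atomic.LoadInt64(&originHits))
}
```

Because all 50 goroutines normally start well inside the 50 ms simulated fetch, the origin counter typically ends at 1. The result is not cached: a request that arrives after the fetch completes triggers a new one, which is why collapsing is combined with a cache in practice.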
Cache Key Design Checklist
When adding a new query parameter or header to your application, ask:
- Does it change the response body?
  Yes → Include it in cache key
  No → Exclude it (reduces cache fragmentation)
- Is it unbounded (e.g., user ID, session token)?
  Yes → Never include in cache key (creates cache pollution)
  Strip it, or bypass CDN caching entirely
- Is it a tracking parameter (utm_source, fbclid, gclid)?
  Strip it from the cache key (never affects content)
  Use Cache-Key VCL / Cloudflare transform rules / Fastly custom VCL
- Does the response vary by Accept-Encoding?
  Yes → Add Vary: Accept-Encoding
  CDN creates separate cache entries for gzip, brotli, identity
- Does the response vary by Accept-Language?
  Yes → Vary: Accept-Language or use URL path per language
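The tracking-parameter and fragmentation items in the checklist can be illustrated with a small normalization function. This is a sketch under assumptions: the `cacheKey` name, the deny-list contents, and the choice to key on host plus path plus sorted query are illustrative and do not reflect any vendor's actual behaviour.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// trackingParams lists query parameters that never affect the response body.
// The list is illustrative; extend it for your own traffic.
var trackingParams = map[string]bool{"fbclid": true, "gclid": true}

// cacheKey builds a normalized key: lowercase host, path, and the remaining
// query parameters in sorted order with tracking parameters removed.
func cacheKey(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for name := range q {
		if trackingParams[name] || strings.HasPrefix(name, "utm_") {
			q.Del(name)
		}
	}
	key := strings.ToLower(u.Host) + u.EscapedPath()
	// Encode sorts by parameter name, so ?a=1&b=2 and ?b=2&a=1 share one entry.
	if enc := q.Encode(); enc != "" {
		key += "?" + enc
	}
	return key, nil
}

func main() {
	for _, raw := range []string{
		"https://CDN.example.com/p/1?utm_source=x&size=l&color=red&fbclid=abc",
		"https://cdn.example.com/p/1?color=red&size=l",
	} {
		key, err := cacheKey(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(key)
	}
}
```

Both URLs normalize to the same key, so they share a single cache entry instead of fragmenting into two.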
Cache Tier Selection
Traffic volume per origin?
├── < 100 RPS → Single CDN layer is sufficient
│
├── 100–10k RPS → Add CDN with origin shield
│                 Edge (30s TTL) → Shield (300s TTL) → Origin
│
└── > 10k RPS → Multi-tier CDN + distributed origin shield
                + in-process cache at origin for hot items
                + singleflight at every tier
Memory cache sizing rule of thumb:
Working set size (bytes) = median_item_size × unique_items_per_hour × revalidation_window (in hours)
If you have 10k unique products, each averaging 10 KB, refreshed every 5 minutes:
10,000 items × 10 KB = 100 MB for the full working set
If you can only cache 50 MB, you’ll have a ~50% miss ratio in steady state (assuming uniform access). Add an LRU layer with hot-item bias.
Compression Algorithm Selection
Content type?
├── Already compressed (JPEG, PNG, WOFF2, video)
│   → Skip compression (output is rarely smaller and can be slightly larger)
│
└── Compressible (HTML, CSS, JS, JSON, SVG, WASM, plain text)
    │
    └── Client supports brotli (Accept-Encoding: br)?
        ├── Yes → Use brotli (20–30% better ratio than gzip; similar CPU at mid quality levels, far more at level 11)
        │
        └── No → Client supports gzip?
            ├── Yes → Use gzip (universal support, fast)
            └── No → Serve identity (uncompressed)
Response size?
├── < 1 KB → Skip compression (overhead > savings for tiny responses)
└── ≥ 1 KB → Compress compressible content
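The two trees above combine into a single selection function. The sketch below is a simplification with hypothetical names (`chooseEncoding`, `minCompressSize`, the `compressible` set). It ignores q-values in Accept-Encoding, which a production implementation would need to honour.

```go
package main

import (
	"fmt"
	"strings"
)

// minCompressSize is the 1 KB threshold from the decision tree.
const minCompressSize = 1024

// compressible is an illustrative, incomplete set of media types worth compressing.
var compressible = map[string]bool{
	"text/html":        true,
	"text/css":         true,
	"text/plain":       true,
	"text/javascript":  true,
	"application/json": true,
	"application/wasm": true,
	"image/svg+xml":    true,
}

// chooseEncoding returns "br", "gzip", or "identity" for a response.
// Simplification: q-values (e.g. "gzip;q=0") are ignored.
func chooseEncoding(acceptEncoding, contentType string, size int) string {
	mediaType := strings.ToLower(strings.TrimSpace(strings.Split(contentType, ";")[0]))
	if size < minCompressSize || !compressible[mediaType] {
		return "identity"
	}
	offered := map[string]bool{}
	for _, part := range strings.Split(acceptEncoding, ",") {
		name := strings.ToLower(strings.TrimSpace(strings.Split(part, ";")[0]))
		offered[name] = true
	}
	switch {
	case offered["br"]:
		return "br"
	case offered["gzip"]:
		return "gzip"
	default:
		return "identity"
	}
}

func main() {
	fmt.Println(chooseEncoding("gzip, deflate, br", "text/html; charset=utf-8", 4096))
	fmt.Println(chooseEncoding("gzip", "application/json", 4096))
	fmt.Println(chooseEncoding("br", "image/jpeg", 50000))
	fmt.Println(chooseEncoding("br, gzip", "text/css", 200))
}
```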
Origin Shield Sizing
Number of shield nodes:
If traffic_to_origin_without_shield = T requests/s
And shield_TTL = N seconds
And unique_items_requested_per_N_seconds = U
Then shield reduces origin to: U / N requests/s (one miss per item per TTL window)
Shield nodes needed = max(1, ceil(T / node_capacity))
For a single-node shield serving 10,000 edge requests/s with 300s TTL:
- If working set has 1,000 unique items: ~3.3 origin requests/s
- If working set has 100,000 unique items: ~333 origin requests/s
The shield’s cache size must comfortably hold the working set.
HTTP/3 Adoption Decision
Is your user base on mobile or high-latency networks?
├── Yes → Prioritize HTTP/3; measurable 15–30% improvement on lossy links
│
└── No → HTTP/2 is sufficient for datacenter-quality paths
Does your CDN vendor support HTTP/3?
├── No → Wait for vendor support before implementing
│
└── Yes → Enable HTTP/3 and advertise it with Alt-Svc; HTTP/2 remains the fallback
          (no downside: clients that don't support H3 use H2 automatically)
Can you deploy QUIC? (UDP 443 not blocked at your edge)
├── No → QUIC blocked by firewall; HTTP/3 won't work
│        Consider: if customers are on corporate networks, H3 gains are limited
│
└── Yes → Enable H3; measure adoption rate after 30 days
SLO Tier Selection
| Traffic level | Appropriate SLO | Reasoning |
|---|---|---|
| < 10k users | 99.0% (87.6h/year downtime) | Small teams can’t sustain higher |
| 10k–1M users | 99.9% (8.76h/year downtime) | Standard web service |
| 1M+ users | 99.95% (4.38h/year downtime) | Revenue-critical |
| Enterprise SLA | 99.99% (52.6 min/year downtime) | Very expensive to maintain |
The rule: SLO should be set below what you can actually achieve. If your system is at 99.97% reliability, set SLO at 99.9%. The gap between actual and SLO is your buffer for incidents.
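Setting the SLO at 99.99% when you achieve 99.97% means you spend error budget about three times faster than it accrues (0.03% observed unreliability against a 0.01% budget). The budget is permanently exhausted, which prevents the team from shipping new features.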
Purge Strategy Selection
How frequently do you need to invalidate cached content?
│
├── Rarely (< 1/day)
│   → URL-based purge is sufficient (specify exact URLs to purge)
│
├── Regularly (dozens per day)
│   → Use surrogate keys / cache tags
│   → Tag responses: "article-123", "author-456"
│   → Purge by tag on content update: purge("article-123")
│
└── Constantly (100s per day, CMS-driven)
    → Surrogate keys + webhook from CMS on publish event
    → CDN purge API called by publish webhook
    → Consider short TTLs (60s) + stale-while-revalidate as alternative
How many URLs does a content update affect?
├── 1 URL → URL purge
├── 10–100 URLs → Surrogate key covering those URLs
└── 1000+ URLs → Short TTL is better than purging 1000 URLs
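Purging by surrogate key comes down to a tag-to-URL index kept alongside the cache. The sketch below is an in-memory illustration with a hypothetical `TagIndex` type. It is not concurrency-safe and says nothing about how a real CDN distributes purges across PoPs.

```go
package main

import (
	"fmt"
	"sort"
)

// TagIndex maps surrogate keys to cached URLs and back (hypothetical type).
type TagIndex struct {
	urlsByTag map[string]map[string]struct{}
	tagsByURL map[string][]string
}

func NewTagIndex() *TagIndex {
	return &TagIndex{
		urlsByTag: make(map[string]map[string]struct{}),
		tagsByURL: make(map[string][]string),
	}
}

// Add records that the cached response for url carries the given tags.
func (t *TagIndex) Add(url string, tags ...string) {
	for _, tag := range tags {
		if t.urlsByTag[tag] == nil {
			t.urlsByTag[tag] = make(map[string]struct{})
		}
		t.urlsByTag[tag][url] = struct{}{}
	}
	t.tagsByURL[url] = append(t.tagsByURL[url], tags...)
}

// Purge returns the sorted URLs to evict for tag and removes them from
// every tag they were listed under.
func (t *TagIndex) Purge(tag string) []string {
	urls := make([]string, 0, len(t.urlsByTag[tag]))
	for url := range t.urlsByTag[tag] {
		urls = append(urls, url)
	}
	sort.Strings(urls)
	for _, url := range urls {
		for _, other := range t.tagsByURL[url] {
			delete(t.urlsByTag[other], url)
		}
		delete(t.tagsByURL, url)
	}
	delete(t.urlsByTag, tag)
	return urls
}

func main() {
	idx := NewTagIndex()
	idx.Add("/articles/123", "article-123", "author-456")
	idx.Add("/authors/456", "author-456")
	idx.Add("/articles/789", "article-789")

	// An author profile update invalidates every page tagged with the author.
	fmt.Println(idx.Purge("author-456"))
}
```

One purge call evicts both the article and the author page, while the unrelated article stays cached.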
Security Checklist for CDN Engineers
Before any CDN deployment to production:
- Origin not reachable from internet: IP allowlist or mTLS
- Signed URLs for private content: HMAC-SHA256 with constant-time comparison
- WAF enabled: OWASP CRS at minimum
- TLS 1.2 minimum: no TLS 1.0/1.1, no SSLv3
- HSTS set: with preload for critical domains
- no-store on sensitive paths: auth callbacks, session tokens, API keys
- Purge API authenticated: not publicly accessible
- Metrics endpoint restricted: /metrics should not be public
- Timing-safe signature comparison: hmac.Equal, not ==
- Key rotation procedure documented: tested at least once
- Rate limiting on public endpoints: prevents DoS via cache miss amplification