The Hitchhiker’s Guide to CDNs

“Don’t Panic.”

This guide is for engineers who want to understand Content Delivery Networks from first principles — not the marketing brochure version, but the real, production-grade, failure-mode-and-all version that principal engineers at Cloudflare, Fastly, and AWS think about every day.


What This Guide Is

You are reading the companion to a 21-lab Go codebase. Each lab is a fully runnable program (go run ./labs/lab-XX-name/) that demonstrates one specific CDN concept. The code is intentionally duplicated across labs — each lab is self-contained, not a library — so you can read it in isolation.

This guide gives every lab the depth it deserves: the why behind every design decision, the failure modes, the real-world vendor implementations, and the production nuances that only seasoned engineers with scar tissue know.


Who This Is For

  • Principal engineers evaluating or building CDN infrastructure
  • Staff engineers integrating CDNs into large-scale distributed systems
  • Platform/infrastructure engineers owning edge architecture
  • Engineers who want to stop treating the CDN as a black box

Prerequisites: solid Go knowledge, comfort with HTTP internals, basic distributed systems familiarity (you know what a TCP connection is).


The 10,000-foot Architecture

Before diving into individual labs, ground yourself in the full picture:

User (browser / mobile app)
  │
  │   DNS resolves cdn.example.com to nearest PoP IP (Anycast BGP or GeoDNS)
  ▼
┌──────────────────────────────────────────────┐
│  Edge PoP  (e.g. Cloudflare NYC)             │
│                                              │
│  1. TLS termination           (ECDH, TLS 1.3)│
│  2. HTTP/3 + QUIC or HTTP/2   (lab 18)       │
│  3. Signed-URL verification   (lab 16)       │
│  4. Edge compute (WASM)       (lab 17)       │
│  5. Cache lookup — L1 memory  (lab 08)       │
│  6. Cache lookup — L2 NVMe    (lab 08)       │
│  7. Request collapsing        (lab 06)       │
│  8. Compression               (lab 10)       │
│  9. Range request support     (lab 11)       │
└──────────────────┬───────────────────────────┘
                   │ cache MISS only
                   ▼
┌──────────────────────────────────────────────┐
│  Origin Shield  (e.g. Cloudflare Tiered Cache│
│  or Fastly Shield PoP)                       │
│                                              │
│  1. Consistent-hashed routing  (lab 12)      │
│  2. Singleflight collapse      (lab 13)      │
│  3. Gossip invalidation        (lab 14)      │
└──────────────────┬───────────────────────────┘
                   │ shield MISS only
                   ▼
┌──────────────────────────────────────────────┐
│  Origin  (S3 / App Server / Database)        │
│  (lab 01)                                    │
└──────────────────────────────────────────────┘

The CDN’s purpose is simple: serve as many requests as possible without touching the origin. Every lab in this series improves that ratio.


The Numbers That Matter

Metric | Typical production target
Cache hit ratio (by request) | 85–95%
Cache hit ratio (by bytes) | often higher (large objects)
Edge L1 miss-to-shield latency | 1–5 ms
Shield miss-to-origin latency | 10–100 ms
TLS handshake (session resume) | < 1 ms
TTFB (Time To First Byte) to user | < 50 ms at p99
Availability SLA | 99.99% (52 min downtime/year)

Cloudflare publicly reported ~60 million requests/second in peak traffic (2024). At that scale, improving the cache hit ratio by one percentage point saves ~600,000 origin requests per second (60,000,000 × 0.01).


How to Run the Labs

# Clone and install deps
git clone https://github.com/10xdev/cdn && cd cdn
go mod download

# Run any lab
make lab-01    # or: go run ./labs/lab-01-origin-server/

# Build all labs to verify compilation
go build ./...

Each lab:

  1. Starts an embedded mock origin on :9001
  2. Starts the edge/proxy on :8080 (sometimes :8081, :8082 too)
  3. Runs a self-contained demo with printed observations
  4. Blocks at the end so you can curl endpoints manually

Lab Map

# | Lab | Core Concept | Key Go API
01 | Origin Server | Latency baseline | net/http
02 | Reverse Proxy | Forwarding, connection pools | httputil.ReverseProxy
03 | First Cache | Miss/hit, TTL | sync.Map
04 | HTTP Cache Headers | ETag, 304, Cache-Control | RFC 7234
05 | Cache Key Design | Vary, tracking params | url.Values
06 | Thundering Herd | Request collapsing | singleflight.Group
07 | Stale Content | RFC 5861 SWR/SIE | custom TTL windows
08 | Tiered Cache | LRU + disk | container/list + xxhash
09 | Cache Tags | Surrogate-Key purge | sync.RWMutex
10 | Compression | gzip/brotli/zstd negotiation | andybalholm/brotli
11 | Range Requests | 206 Partial Content | http.ServeContent
12 | Consistent Hashing | Stable node routing | buraksezer/consistent
13 | Origin Shield | Tiered PoPs + singleflight | golang.org/x/sync
14 | Gossip Cluster | Distributed invalidation | hashicorp/memberlist
15 | Geo Routing | Haversine, PoP failover | custom
16 | Signed URLs | HMAC-SHA256 token auth | crypto/hmac
17 | Edge Compute | WASM sandboxing at edge | tetratelabs/wazero
18 | HTTP/3 + QUIC | QUIC transport | quic-go/quic-go
19 | HLS Streaming | Adaptive bitrate cache | custom
20 | Observability | Prometheus, SLOs, logs | prometheus/client_golang
21 | Full System | All layers together | All of the above

Reading This Guide

Each chapter follows the same structure:

  1. The Problem — why this feature exists, what breaks without it
  2. The Protocol / Algorithm — the formal specification or academic basis
  3. The Implementation — walkthrough of the lab code with deep commentary
  4. Production Details — how Cloudflare, Fastly, AWS CloudFront do it
  5. Failure Modes — what goes wrong and how to detect it
  6. What to Measure — metrics, alerts, and SLO indicators
  7. Try It — curl commands and things to observe

Let’s start at the beginning.

Lab 01 · The Origin Server

Run it: make lab-01
Source: labs/lab-01-origin-server/main.go


The Problem

Before you can understand what a CDN does, you need to understand what it protects. The origin server is the authoritative source of content — the thing that actually knows what the response should be. It might be:

  • A Go/Python/Rails application querying a PostgreSQL database
  • An S3 bucket serving static files
  • A legacy monolith that someone is afraid to touch
  • A media encoder writing MPEG-TS segments in real time

The origin’s fundamental problem is latency × concurrency. Every request pays the full cost of whatever work the origin must do: database queries, template rendering, business logic, external API calls.

The math

At 80 ms average latency per request, handling 1,000 requests/second requires 80 simultaneously active goroutines just to keep up (concurrency = arrival rate × latency: 1,000 × 0.08 s = 80). That’s 80 database connections, 80 in-flight external calls, 80 units of CPU work happening at once. At 500 rps it becomes 40. These numbers sound manageable until traffic spikes 10×.

Now imagine the home page of a news site during a breaking story: 50,000 users hit Refresh within the same second. At 80 ms latency, that is 50,000 × 0.08 = 4,000 requests in flight at the origin at once. No origin handles that gracefully, but a CDN can serve all 50,000 from a single cached response stored at the edge.
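
The arithmetic in this section is Little's law: requests in flight equal arrival rate times latency. A minimal sketch that reproduces the three figures; the helper name inFlight and the integer units are assumptions made for illustration, not part of the lab code:

```go
package main

import "fmt"

// inFlight applies Little's law: average concurrent requests equals
// arrival rate (requests/second) times latency (milliseconds / 1000).
// Hypothetical helper for illustration only.
func inFlight(rps, latencyMs int) int {
	return rps * latencyMs / 1000
}

func main() {
	fmt.Println(inFlight(1000, 80))  // steady state from the text
	fmt.Println(inFlight(500, 80))   // half the traffic
	fmt.Println(inFlight(50000, 80)) // breaking-news spike
}
```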


What This Lab Shows

Lab 01 is intentionally simple: just the origin, no proxy, no cache.

User → Origin (:9001)

Every request pays the full --latency cost (default: 80 ms). You can see this directly in the output — 12 sequential requests each taking ~80 ms, for ~960 ms total.

The key observable: X-Origin-Hit increments for every single request. When you add a cache in Lab 03, you’ll see this counter stop growing after the first few requests.


The Origin Server Contract

A well-behaved origin sets these headers:

Header | Purpose
Cache-Control: public, max-age=N | Tells CDN: cache for N seconds
Cache-Control: private | Tells CDN: don’t cache (user-specific)
Cache-Control: no-store | Never cache anywhere
ETag: "abc123" | Content fingerprint for conditional requests
Vary: Accept-Encoding | Different response per encoding
X-Served-By: origin | Debug header: which tier served this

The lab origin sets Cache-Control: public, max-age=30 — correct for publicly cacheable content. Labs 04–05 build on this contract in depth.


Production Detail: Origin Capacity Planning

CDN engineers think about origin capacity as the residual load after the CDN absorbs its share. If your CDN achieves a 90% hit ratio and you expect 10,000 req/s peak traffic:

Origin load = 10,000 × (1 - 0.90) = 1,000 req/s

Capacity-plan your origin for this number, not the full 10,000. But factor in cold-start scenarios: after a deploy, a CDN cache flush, or a network partition that invalidates a large fraction of cache simultaneously. Your origin must survive a sudden 10× spike above its steady-state CDN-assisted load.
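
The residual-load formula and the cold-cache spike can be sketched as two small functions. The names originLoad and coldCacheMultiplier are hypothetical, and the model assumes the hit ratio drops all the way to zero after a flush:

```go
package main

import "fmt"

// originLoad returns the request rate that still reaches the origin when the
// CDN serves hitRatio (between 0 and 1) of peakRPS from cache.
func originLoad(peakRPS, hitRatio float64) float64 {
	return peakRPS * (1 - hitRatio)
}

// coldCacheMultiplier is the factor by which origin load grows if the whole
// cache is lost at once. It is undefined (+Inf) for a hit ratio of exactly 1.
func coldCacheMultiplier(hitRatio float64) float64 {
	return 1 / (1 - hitRatio)
}

func main() {
	fmt.Printf("steady-state origin load: %.0f req/s\n", originLoad(10000, 0.90))
	fmt.Printf("cold-cache multiplier:    %.0fx\n", coldCacheMultiplier(0.90))
}
```

At a 90% hit ratio the multiplier is 10, which is where the 10× figure above comes from; at 95% it would be 20.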

This is why Cloudflare, Fastly, and AWS CloudFront all have “origin overload protection” features (origin shield, request collapsing, retries with circuit breakers) — labs 06 and 13.


Failure Modes

Failure | Symptom | Fix
Origin latency spike | All edge responses slow | Stale-while-revalidate (lab 07)
Origin error rate spike | 502/503 from CDN | Stale-if-error (lab 07)
Origin cold start | High latency on deploy | Warm cache before cutover
DDoS bypass | Attacker hits origin IP directly | IP allowlist: CDN IPs only

Security note: Always allowlist your origin to accept connections only from CDN IP ranges. If attackers discover your origin IP, they can bypass the CDN entirely and DDoS it directly. All major CDNs publish their IP ranges (Cloudflare: https://cloudflare.com/ips).


What to Measure

# Origin request rate (should stay low and stable)
rate(origin_requests_total[1m])

# Origin p99 latency (your SLA baseline)
histogram_quantile(0.99, rate(origin_response_duration_seconds_bucket[5m]))

# Origin error rate (alert at >0.1%)
rate(origin_errors_total[5m]) / rate(origin_requests_total[5m])

Try It

make lab-01
# In another terminal:
curl http://localhost:9001/article/1 -v

# With higher latency:
go run ./labs/lab-01-origin-server/ --latency 200ms --requests 5

# With errors:
go run ./labs/lab-01-origin-server/ --error-rate 0.3

Watch X-Origin-Hit increment with every single request. When you reach Lab 03 and add a cache, you’ll see it stop.

Lab 02 · The Naive Reverse Proxy

Run it: make lab-02
Source: labs/lab-02-naive-proxy/main.go


The Problem

Adding a proxy between users and the origin is the first step in CDN architecture — before caching, before edge compute, before any optimization.

But why does a proxy even help if it doesn’t cache? Several reasons:

  1. TLS offloading: The CDN terminates TLS on fast, dedicated hardware so the origin doesn’t pay the cryptographic overhead for every user.
  2. Connection pooling: The proxy maintains persistent HTTP/1.1 keep-alive or HTTP/2 multiplexed connections to the origin, amortizing TCP handshake cost across many requests.
  3. Protocol upgrade: Users connect via HTTP/2 or HTTP/3; the CDN speaks HTTP/1.1 to a legacy origin.
  4. DDoS surface reduction: The origin is invisible to the internet.
  5. Header normalization: Strip tracking headers, add forwarding metadata.
  6. Rate limiting, WAF: Applied at the proxy before the origin even sees the request.

How httputil.ReverseProxy Works

Go’s standard library httputil.ReverseProxy is the canonical building block for a reverse proxy:

proxy := &httputil.ReverseProxy{
    // Director rewrites the outbound request before it is sent.
    Director: func(req *http.Request) {
        req.URL.Scheme = "http"
        req.URL.Host   = "origin:9001"
        // No manual header work is needed here: after Director returns,
        // ReverseProxy removes hop-by-hop headers and appends the client IP
        // to X-Forwarded-For on its own. Adding X-Forwarded-For here as
        // well would list the client twice.
    },
    ModifyResponse: func(resp *http.Response) error {
        resp.Header.Set("X-Served-By", "proxy")
        return nil
    },
    ErrorHandler: func(w http.ResponseWriter, r *http.Request, err error) {
        // Log the cause; the client only sees a generic 502.
        log.Printf("proxy error for %s: %v", r.URL.Path, err)
        http.Error(w, "Bad Gateway", http.StatusBadGateway)
    },
}

Director mutates the request before forwarding. It runs in the same goroutine as the handler, so it must be fast and side-effect-free.

ModifyResponse mutates the response before sending back to the client. Use this to add headers like X-Cache, normalize Content-Type, or strip internal headers.

Transport is the HTTP client used to reach the origin. Default is http.DefaultTransport, which maintains a connection pool. For production, tune:

Transport: &http.Transport{
    MaxIdleConnsPerHost: 200,          // connection pool per origin
    MaxConnsPerHost:     500,          // max concurrent connections
    IdleConnTimeout:     90 * time.Second,
    ResponseHeaderTimeout: 30 * time.Second,
    DisableKeepAlives:   false,        // ALWAYS keep-alives on
    ForceAttemptHTTP2:   true,         // H2 to origin if supported
    TLSHandshakeTimeout: 5 * time.Second,
    // DialContext: custom dialer for DNS override, binding, etc.
}

Hop-by-Hop Headers

HTTP defines two classes of headers:

End-to-end headers: forwarded unchanged through all proxies to the final recipient. Examples: Content-Type, ETag, Cache-Control, Authorization.

Hop-by-hop headers: meaningful only for the immediate connection. Must be stripped before forwarding. The classic list comes from RFC 2616 §13.5.1; RFC 7230 §6.1 keeps the rule and makes the Connection header the way to nominate further ones:

Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization,
TE, Trailer, Transfer-Encoding, Upgrade

Additionally, any header listed in the Connection header value is hop-by-hop for that hop:

Connection: X-Custom-Header, Keep-Alive
→ Strip X-Custom-Header too

Failing to strip hop-by-hop headers causes subtle bugs: the origin may try to negotiate an Upgrade on a connection it doesn’t have, or the downstream client may receive a Transfer-Encoding: chunked header that doesn’t match the actual response framing.
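
httputil.ReverseProxy performs this stripping for you, but a hand-written proxy has to do it itself. A minimal sketch of the rule described above, including the headers nominated by Connection; the function name stripHopByHop is an assumption for illustration:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// hopByHop lists the headers that apply to a single connection only.
var hopByHop = []string{
	"Connection", "Keep-Alive", "Proxy-Authenticate", "Proxy-Authorization",
	"TE", "Trailer", "Transfer-Encoding", "Upgrade",
}

// stripHopByHop removes hop-by-hop headers in place, including any header
// named in the Connection header value.
func stripHopByHop(h http.Header) {
	// Read the Connection nominations before Connection itself is deleted.
	for _, v := range h.Values("Connection") {
		for _, name := range strings.Split(v, ",") {
			if name = strings.TrimSpace(name); name != "" {
				h.Del(name)
			}
		}
	}
	for _, name := range hopByHop {
		h.Del(name)
	}
}

func main() {
	h := http.Header{}
	h.Set("Connection", "X-Custom-Header, Keep-Alive")
	h.Set("X-Custom-Header", "1")
	h.Set("Keep-Alive", "timeout=5")
	h.Set("Content-Type", "text/html")
	stripHopByHop(h)
	fmt.Println(h) // only the end-to-end header remains
}
```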


X-Forwarded-For and the IP Chain

When a proxy adds X-Forwarded-For: 1.2.3.4, and then another proxy adds another layer, you get:

X-Forwarded-For: 1.2.3.4, 10.0.0.1

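The leftmost IP is whatever the first hop reported as the client, which may be a value the client sent itself. Each proxy appends the address of the peer it received the request from, so the rightmost entry was written by the proxy closest to the origin. Origin applications should walk the list from the right, skip the addresses of proxies they trust, and take the first untrusted IP. This only works if they know which proxies are in front of them.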

In production, CDNs like Cloudflare expose the real client IP via:

CF-Connecting-IP: 1.2.3.4        (always the real client IP)
True-Client-IP: 1.2.3.4          (Cloudflare Enterprise)

This avoids the ambiguity of X-Forwarded-For in multi-proxy setups.

Security trap: Never trust X-Forwarded-For for access control if any user can send it directly. Validate the header only when you can confirm the request came through a trusted proxy.
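
One way to apply this rule is to walk the chain from the right and stop at the first address outside the trusted proxy ranges. The sketch below assumes the peer address has already had its port removed and that trusted proxies are given as CIDR strings; clientIP and isTrusted are hypothetical names, not lab code:

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// isTrusted reports whether addr falls inside any of the given CIDR ranges.
// Malformed CIDR strings are ignored.
func isTrusted(addr netip.Addr, cidrs []string) bool {
	for _, c := range cidrs {
		if p, err := netip.ParsePrefix(c); err == nil && p.Contains(addr) {
			return true
		}
	}
	return false
}

// clientIP returns the best guess at the real client address. peer is the
// TCP peer IP (no port); xff is the raw X-Forwarded-For value.
func clientIP(peer, xff string, trustedCIDRs []string) string {
	addr, err := netip.ParseAddr(peer)
	if err != nil || !isTrusted(addr, trustedCIDRs) {
		return peer // not from a trusted proxy: ignore the header entirely
	}
	hops := strings.Split(xff, ",")
	result := peer
	for i := len(hops) - 1; i >= 0; i-- {
		hop := strings.TrimSpace(hops[i])
		a, err := netip.ParseAddr(hop)
		if err != nil {
			break // malformed entry: trust nothing further to the left
		}
		result = hop
		if !isTrusted(a, trustedCIDRs) {
			break // first untrusted address from the right is the client
		}
	}
	return result
}

func main() {
	trusted := []string{"10.0.0.0/8"}
	fmt.Println(clientIP("10.0.0.2", "1.2.3.4, 10.0.0.1", trusted))
	fmt.Println(clientIP("10.0.0.2", "6.6.6.6, 1.2.3.4, 10.0.0.1", trusted)) // spoofed leftmost entry
	fmt.Println(clientIP("203.0.113.9", "10.0.0.1", trusted))                // direct, untrusted peer
}
```

The second call shows why the leftmost entry is unsafe: the client supplied 6.6.6.6 itself, and the right-to-left walk never reaches it.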


The Proxy Overhead Measurement

The lab measures raw proxy overhead by timing the same request through:

  1. Direct origin call
  2. Through the proxy

Typical result: < 0.5 ms proxy overhead. This is negligible vs. actual origin latency (80+ ms). The overhead comes from:

  • Goroutine scheduling (< 1 µs)
  • Memory copy of request/response buffers
  • Two additional TCP reads/writes

This is why the caching layer in Lab 03 — which adds zero network hops on a hit — provides dramatic speedups: it collapses 80 ms to < 0.1 ms.


Production Detail: Connection Pools

http.DefaultTransport uses a connection pool per host:port. Once a response body has been read to EOF and closed, Go’s HTTP client returns the underlying TCP connection to the pool for reuse on the next request to the same origin.

At scale, pool sizing matters:

Scenario | MaxIdleConnsPerHost
Single origin, low traffic | 10 (Go’s built-in default is 2)
Single origin, high traffic | 100–500
Origin cluster behind load balancer | 200+ (connections spread across backends)
Origin with connection limit (MySQL) | Match origin’s max_connections

Setting this too low forces new TCP handshakes under load, adding ~5 ms of SYN/ACK round-trip on every miss. At 10,000 cache misses/second, that’s 50 seconds/second of wasted TCP overhead.


Failure Modes

Failure | Symptom | Fix
Origin timeout | 504 Gateway Timeout | Set ResponseHeaderTimeout; circuit break
Origin 5xx | 502 Bad Gateway | ErrorHandler; retry on idempotent requests
Connection pool exhaustion | Latency spike | Increase MaxIdleConnsPerHost; queue requests
Memory leak | Unbounded growth | Always close resp.Body; read it to EOF first, even if discarding, so the connection can be reused
Hop-by-hop not stripped | Protocol negotiation failure | Explicit header removal in Director

Try It

make lab-02

# Direct origin (no proxy)
curl http://localhost:9001/article/1

# Through proxy
curl http://localhost:8080/article/1 -v

# Compare response headers — should see X-Served-By: proxy
# and X-Forwarded-For header in origin logs

Lab 03 · The First Cache

Run it: make lab-03
Source: labs/lab-03-first-cache/main.go


The Problem

The reverse proxy in Lab 02 blindly forwards every request to the origin. A cache short-circuits that path: if we’ve seen this URL recently and have a stored response, serve it directly from memory without touching the origin.

The fundamental trade-off: freshness vs. cost. A cached response might be stale, but serving it is:

  • Orders of magnitude faster (memory read vs. network round trip)
  • Origin-free (no database query, no CPU work)
  • Deterministic (no dependency on origin availability)

The Cache Lifecycle: MISS → HIT → EXPIRED

Request arrives
    │
    ▼
┌─────────────────────────────────────┐
│  Lookup key = normalize(URL)         │
└─────────────────────────────────────┘
    │
    ├─► Entry not found → MISS
    │       │
    │       ▼
    │   Fetch from origin
    │   Store in cache with deadline = now + TTL
    │   Return response to client
    │
    ├─► Entry found, not expired → HIT
    │       │
    │       ▼
    │   Return cached response immediately
    │
    └─► Entry found, expired → EXPIRED (= MISS)
            │
            ▼
        Revalidate or re-fetch
        Replace cache entry

The X-Cache response header tells the client (and debugging engineers) which branch was taken:

X-Cache: MISS      # first request for this URL
X-Cache: HIT       # served from cache

Implementation: sync.Map + TTL

type cacheEntry struct {
    response []byte
    headers  http.Header
    status   int
    expiry   time.Time
}

var cache sync.Map   // map[string]*cacheEntry

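func get(key string) (*cacheEntry, bool) {
    v, ok := cache.Load(key)
    if !ok {
        return nil, false
    }
    entry := v.(*cacheEntry)
    if time.Now().After(entry.expiry) {
        // Lazy expiry. CompareAndDelete (Go 1.20+) removes the key only if
        // it still maps to this entry, so a fresh entry stored concurrently
        // by another goroutine survives.
        cache.CompareAndDelete(key, entry)
        return nil, false
    }
    return entry, true
}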

Why sync.Map? The standard map plus sync.RWMutex would work, but sync.Map is optimized for a specific workload: many reads, few writes, stable key set. CDN caches have a hot set of URLs that are read millions of times per second and written (populated) far less often. sync.Map achieves this via an atomic “read map” that requires no locking on reads for existing keys.

However, sync.Map has a known weakness: its internal dirty map can accumulate entries and requires a periodic promotion step. For very write-heavy caches (cold start, high churn), a sharded map + sync.RWMutex pattern can be more efficient.
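
To show how the lookup fits into a request path, here is a minimal sketch of a TTL cache wrapped around an origin handler that sets X-Cache. It is not the lab's code: the type ttlCache, the injectable clock, and the use of httptest.ResponseRecorder to capture the origin response are all assumptions chosen to keep the example short and runnable without a network.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

type cacheEntry struct {
	body    []byte
	headers http.Header
	status  int
	expiry  time.Time
}

// ttlCache sits in front of an origin handler. now is injectable so the demo
// can advance time without sleeping.
type ttlCache struct {
	entries sync.Map // map[string]*cacheEntry
	origin  http.Handler
	ttl     time.Duration
	now     func() time.Time
}

func (c *ttlCache) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	key := r.URL.Path
	if v, ok := c.entries.Load(key); ok {
		if e := v.(*cacheEntry); c.now().Before(e.expiry) {
			writeEntry(w, e, "HIT")
			return
		}
	}
	// MISS or EXPIRED: fetch from the origin and record the response.
	rec := httptest.NewRecorder()
	c.origin.ServeHTTP(rec, r)
	e := &cacheEntry{
		body:    rec.Body.Bytes(),
		headers: rec.Header().Clone(),
		status:  rec.Code,
		expiry:  c.now().Add(c.ttl),
	}
	c.entries.Store(key, e)
	writeEntry(w, e, "MISS")
}

// writeEntry copies a stored response to the client and labels it.
func writeEntry(w http.ResponseWriter, e *cacheEntry, state string) {
	for k, vs := range e.headers {
		w.Header()[k] = append([]string(nil), vs...)
	}
	w.Header().Set("X-Cache", state)
	w.WriteHeader(e.status)
	w.Write(e.body)
}

// demo issues three requests: cold, warm, and after the TTL has elapsed.
func demo() (states []string, originHits int) {
	clock := time.Unix(0, 0)
	origin := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		originHits++
		fmt.Fprint(w, "article body")
	})
	c := &ttlCache{origin: origin, ttl: 30 * time.Second, now: func() time.Time { return clock }}
	for _, advance := range []time.Duration{0, time.Second, time.Minute} {
		clock = clock.Add(advance)
		rec := httptest.NewRecorder()
		c.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/article/1", nil))
		states = append(states, rec.Header().Get("X-Cache"))
	}
	return states, originHits
}

func main() {
	fmt.Println(demo()) // [MISS HIT MISS] 2
}
```

The third request arrives 61 seconds after the first, past the 30-second TTL, so it is treated as a miss and the origin is hit a second time.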


TTL: Where Does It Come From?

In Lab 03 the TTL is hardcoded. Lab 04 shows how to parse it properly from Cache-Control headers:

Cache-Control: public, max-age=300
→ TTL = 300 seconds

Cache-Control: no-store
→ Do not cache at all

Cache-Control: private
→ Do not store in shared (CDN) cache

Cache-Control: no-cache
→ Store but always revalidate before serving

Ignoring Cache-Control is the #1 cause of CDN misconfiguration. If you cache a private response, you may serve one user’s data to another. If you cache no-store, you violate the application’s contract.


Background Sweep: Avoiding Memory Leaks

A cache without eviction grows unboundedly. Lab 03 runs a background goroutine that sweeps expired entries:

go func() {
    ticker := time.NewTicker(30 * time.Second)
    for range ticker.C {
        var expired []string
        cache.Range(func(k, v any) bool {
            if time.Now().After(v.(*cacheEntry).expiry) {
                expired = append(expired, k.(string))
            }
            return true
        })
        for _, k := range expired {
            cache.Delete(k)
        }
    }
}()

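Note the two-phase delete: first collect the expired keys, then delete them. This is a readability choice, not a requirement. Range holds no lock that blocks other methods, and the sync.Map documentation allows calling Delete (or any other method) from inside the Range callback.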

Production caches use more sophisticated eviction:

Policy | Description | Use case
TTL expiry | Remove at expiry | All caches
LRU | Evict least-recently-used | Bounded memory (Lab 08)
LFU | Evict least-frequently-used | Popularity-skewed workloads
ARC | Adaptive Replacement Cache | Self-tuning between LRU and LFU
S3-FIFO | Simple, Scalable, Segmented FIFO | Modern alternative to LRU (lower overhead)
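
As a taste of the LRU row, a minimal sketch built on container/list. This is an illustration under simplifying assumptions (capacity counted in entries rather than bytes, no locking, no TTL), not the Lab 08 implementation; the type and method names are hypothetical:

```go
package main

import (
	"container/list"
	"fmt"
)

type lruItem struct {
	key   string
	value []byte
}

// lruCache is a fixed-capacity cache that evicts the least-recently-used key.
// It is not safe for concurrent use; a real cache would add a mutex.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRU(capacity int) *lruCache {
	return &lruCache{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the value and marks the key as most recently used.
func (c *lruCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*lruItem).value, true
}

// Put stores the value and evicts the oldest entry if over capacity.
func (c *lruCache) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*lruItem).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&lruItem{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*lruItem).key)
	}
}

func main() {
	c := newLRU(2)
	c.Put("/a", []byte("A"))
	c.Put("/b", []byte("B"))
	c.Get("/a")              // /a becomes most recently used
	c.Put("/c", []byte("C")) // over capacity: evicts /b
	_, ok := c.Get("/b")
	fmt.Println("/b still cached:", ok)
}
```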

The Deliberate Limitations of Lab 03

The lab explicitly documents what it doesn’t do yet:

  1. No Cache-Control parsing — TTL is hardcoded. Fixed in Lab 04.
  2. No singleflight — concurrent misses all hammer origin. Fixed in Lab 06.
  3. Unbounded memory — LRU eviction arrives in Lab 08.
  4. No content negotiation — same key for Accept-Encoding: gzip and Accept-Encoding: br. Fixed in Lab 05 via Vary.
  5. No conditional requests — always fetches full response, no 304. Fixed in Lab 04.

This incremental approach is pedagogically important: each lab adds exactly one concept so the interaction is clear.


Production Detail: Cache Serialization Format

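Real CDN disk caches store responses in compact binary formats. Varnish manages its own storage backends (memory or file). Nginx writes each cached response to a file that starts with a binary header (validity timestamps, a CRC32 of the key, offsets to the headers and body, ETag and Vary data), followed by the key, the response headers, and the body. A simplified, illustrative layout (not the exact Nginx on-disk structure):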

[8 bytes: key hash]
[8 bytes: expiry timestamp]
[4 bytes: headers length]
[4 bytes: body length]
[headers (HTTP/1.1 text)]
[body bytes]

Lab 08 uses file-system storage with xxhash-named files, which is functionally equivalent but less efficient (filesystem metadata overhead).

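For in-memory caches that pass entries between processes or nodes, a schema-based binary encoding is a common choice (Google’s groupcache, for example, uses Protocol Buffers for its peer-to-peer protocol). Depending on the design, a compact binary format can enable: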

  • Zero-copy responses via io.WriterTo
  • Shared-memory between processes
  • Binary compatibility across versions

What to Measure

# Hit ratio (requests)
sum(rate(cache_hits_total[5m])) /
sum(rate(cache_requests_total[5m]))

# Miss rate (triggers origin fetches)
rate(cache_misses_total[5m])

# Cache entries currently stored
cache_entries_current

# Evictions (if bounded cache)
rate(cache_evictions_total[5m])

Try It

make lab-03

# First request: should be MISS
# (curl -v prints headers on stderr, so redirect it into the pipe)
curl http://localhost:8080/article/1 -sv 2>&1 | grep -i x-cache

# Second request: should be HIT (< 1ms)
curl http://localhost:8080/article/1 -sv 2>&1 | grep -i x-cache

# X-Origin-Hit should only increment on first request
curl http://localhost:8080/article/1 -H "X-Debug: origin-count"

Lab 04 · HTTP Cache Headers

Run it: make lab-04
Source: labs/lab-04-http-cache-headers/main.go


The Problem

Lab 03 cached everything for a hardcoded TTL. That’s wrong in production: some content must never be cached (authentication tokens), some content is user-specific (shopping carts), and some content changes frequently (live feeds). HTTP defines a rich vocabulary for expressing caching intent — and CDNs are contractually obligated to honor it.

This lab implements the full RFC 7234 caching model: parsing Cache-Control, handling conditional requests, and generating ETag-based 304 responses.


RFC 7234: The Caching Specification

HTTP caching is defined in RFC 7234 (HTTP/1.1 Caching, 2014), now superseded by RFC 9111 (HTTP Caching, 2022). The spec is 44 pages and covers:

  • Freshness: when a cached response can be served without revalidation
  • Validation: checking with the origin if the cached copy is still good
  • Invalidation: removing cache entries when content changes

The Freshness Calculation

response_is_fresh = (freshness_lifetime > current_age)

freshness_lifetime = first available of, in priority order for a shared cache:
                   1. s-maxage directive (shared caches only; overrides max-age)
                   2. max-age directive
                   3. Expires header - Date header (fallback)
                   4. heuristic: 10% of (Date - Last-Modified) (last resort)

current_age       = age_value (from Age header, added by an upstream cache)
                  + (now - response_time)

The Age header is critical in multi-tier setups: when a shield proxy serves a cached response to an edge proxy, it adds Age: 120 meaning “this response is 120 seconds old”. The edge node calculates remaining freshness as max-age - Age = 300 - 120 = 180 seconds left.
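
The priority order and the Age arithmetic can be sketched in a few lines. The struct and function names here are hypothetical, and all values are whole seconds with a negative number meaning "header or directive absent":

```go
package main

import "fmt"

// freshnessInputs holds the values a shared cache extracts from a response.
// All fields are seconds; a negative value means "absent".
type freshnessInputs struct {
	SMaxAge          int // s-maxage directive
	MaxAge           int // max-age directive
	ExpiresMinusDate int // Expires header minus Date header
	SinceLastMod     int // Date header minus Last-Modified header
}

// freshnessLifetime picks the lifetime a shared cache would use, trying each
// source in priority order.
func freshnessLifetime(in freshnessInputs) int {
	switch {
	case in.SMaxAge >= 0:
		return in.SMaxAge
	case in.MaxAge >= 0:
		return in.MaxAge
	case in.ExpiresMinusDate >= 0:
		return in.ExpiresMinusDate
	case in.SinceLastMod >= 0:
		return in.SinceLastMod / 10 // heuristic: 10% of time since last change
	}
	return 0
}

// remainingFreshness returns the seconds left before the response is stale.
// ageHeader is the upstream Age value; resident is the time spent in this
// cache. A result of zero or less means the response is stale.
func remainingFreshness(lifetime, ageHeader, resident int) int {
	return lifetime - (ageHeader + resident)
}

func main() {
	in := freshnessInputs{SMaxAge: -1, MaxAge: 300, ExpiresMinusDate: -1, SinceLastMod: -1}
	lifetime := freshnessLifetime(in)
	fmt.Println(remainingFreshness(lifetime, 120, 0)) // 180, the shield-to-edge example
}
```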


Cache-Control Directives

Request-side (Cache-Control from the client)

Directive | Meaning
no-cache | Don’t use cached response; must revalidate
no-store | Don’t cache this request or its response
max-age=0 | Treat cached response as stale (same as no-cache in practice)
max-stale=N | Accept stale response up to N seconds past expiry
min-fresh=N | Only accept response fresh for at least N more seconds
only-if-cached | Fail with 504 if no cached copy (offline mode)

Response-side (Cache-Control from the origin)

Directive | Meaning
public | Shared caches (CDN) may store this
private | Only browser cache; CDN must not store
no-store | No cache anywhere
no-cache | Store but always revalidate before use
max-age=N | Fresh for N seconds
s-maxage=N | Overrides max-age for CDN/shared caches only
stale-while-revalidate=N | Serve stale for N seconds while revalidating (lab 07)
stale-if-error=N | Serve stale for N seconds if origin errors (lab 07)
immutable | Content won’t change during freshness window (no revalidation)
must-revalidate | Never serve stale, even if origin is down
proxy-revalidate | Same as must-revalidate but CDN-specific

The CDN Override: s-maxage

s-maxage is the CDN operator’s tool. Use it to set a long CDN TTL while browsers cache for a shorter time:

Cache-Control: public, max-age=60, s-maxage=86400

The browser caches for 60 seconds. The CDN caches for 24 hours and is refreshed on demand through its cache invalidation API. This is the standard pattern for content you can purge at the CDN but not in browsers, such as HTML pages and API responses: a browser cache cannot be invalidated remotely, so its TTL stays short while the CDN holds the long-lived copy.
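
A shared cache has to turn these directives into a store/don't-store decision and a TTL. The sketch below is a simplified parser, not the Lab 04 code: the function name sharedCacheTTL is an assumption, and it ignores Expires, heuristics, and the field-name forms of private and no-cache.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// sharedCacheTTL inspects a response Cache-Control value and reports whether
// a shared cache may store the response and for how long it stays fresh.
// A TTL of zero with storable=true means "store, but revalidate before use".
func sharedCacheTTL(cacheControl string) (ttl time.Duration, storable bool) {
	maxAge, sMaxAge := -1, -1
	noCache := false
	for _, part := range strings.Split(cacheControl, ",") {
		name, value, _ := strings.Cut(strings.TrimSpace(part), "=")
		switch strings.ToLower(name) {
		case "no-store", "private":
			return 0, false // these win over everything else
		case "no-cache":
			noCache = true
		case "max-age":
			if n, err := strconv.Atoi(strings.Trim(value, `"`)); err == nil && n >= 0 {
				maxAge = n
			}
		case "s-maxage":
			if n, err := strconv.Atoi(strings.Trim(value, `"`)); err == nil && n >= 0 {
				sMaxAge = n
			}
		}
	}
	switch {
	case noCache:
		return 0, true
	case sMaxAge >= 0: // s-maxage overrides max-age for shared caches
		return time.Duration(sMaxAge) * time.Second, true
	case maxAge >= 0:
		return time.Duration(maxAge) * time.Second, true
	}
	return 0, true // no explicit lifetime: caller falls back to Expires or heuristics
}

func main() {
	for _, cc := range []string{
		"public, max-age=60, s-maxage=86400",
		"public, max-age=300",
		"private",
		"no-cache",
	} {
		ttl, ok := sharedCacheTTL(cc)
		fmt.Printf("%-40q storable=%-5v ttl=%v\n", cc, ok, ttl)
	}
}
```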


ETags: Content Fingerprinting

An ETag (Entity Tag) is an opaque validator for a specific version of a resource. It can be:

  • Strong: "d41d8cd98f00b204e9800998ecf8427e" — byte-for-byte equality
  • Weak: W/"20230101" — semantically equivalent (same meaning, maybe different encoding)

CDN caches store the ETag alongside the response. On the next request (at or after expiry), the cache can send a conditional request:

GET /article/1 HTTP/1.1
If-None-Match: "d41d8cd98f00b204e9800998ecf8427e"

If the content hasn’t changed, the origin returns:

HTTP/1.1 304 Not Modified
ETag: "d41d8cd98f00b204e9800998ecf8427e"
Cache-Control: public, max-age=300

The cache then extends the TTL of the existing response without re-transferring the body. This is a huge bandwidth saving — imagine refreshing a 10 MB PDF that hasn’t changed; you pay ~200 bytes (304 response headers) instead of 10 MB.
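
On the origin side, the decision behind a 304 is a comparison between If-None-Match and the current ETag. A minimal sketch, assuming weak comparison (the W/ prefix is ignored) and ETags that contain no commas; notModified and opaque are hypothetical names:

```go
package main

import (
	"fmt"
	"strings"
)

// opaque strips the weak prefix so that W/"x" and "x" compare equal.
func opaque(tag string) string {
	return strings.TrimPrefix(strings.TrimSpace(tag), "W/")
}

// notModified reports whether an If-None-Match header value matches the
// current ETag, in which case the origin can answer 304 with no body.
func notModified(ifNoneMatch, currentETag string) bool {
	if strings.TrimSpace(ifNoneMatch) == "*" {
		return true // "*" matches any existing representation
	}
	if currentETag == "" {
		return false
	}
	// Simplification: splitting on commas breaks for ETags that contain one.
	for _, candidate := range strings.Split(ifNoneMatch, ",") {
		if opaque(candidate) == opaque(currentETag) {
			return true
		}
	}
	return false
}

func main() {
	current := `"d41d8cd98f00b204e9800998ecf8427e"`
	fmt.Println(notModified(`"d41d8cd98f00b204e9800998ecf8427e"`, current)) // true
	fmt.Println(notModified(`"stale-tag"`, current))                        // false
}
```

In Go handlers this logic rarely needs to be written by hand: http.ServeContent evaluates If-None-Match itself when the handler has set the ETag response header.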

ETag Generation Strategies

// MD5 of content (lab 04)
etag := fmt.Sprintf(`"%x"`, md5.Sum(body))

// xxhash (faster, non-cryptographic)
etag := fmt.Sprintf(`"%016x"`, xxhash.Sum64(body))

// Semantic versioning
etag := fmt.Sprintf(`"v%s-%s"`, version, contentHash)

// Timestamp-based (weak)
etag := fmt.Sprintf(`W/"%d"`, lastModified.Unix())

Prefer xxhash over MD5 for performance: xxhash is 10–20× faster and provides sufficient collision resistance for ETags (not a security primitive, just a freshness validator).


Last-Modified / If-Modified-Since

Before ETags existed (HTTP/1.0 era), conditional requests used timestamps:

GET /article/1 HTTP/1.1
If-Modified-Since: Wed, 01 Jan 2025 00:00:00 GMT

Origin returns 304 Not Modified if content hasn’t changed since that timestamp. This is weaker than ETags because:

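  1. 1-second granularity: If content changes twice within the same second, the second change carries the same timestamp and the cache won’t detect it.
  2. Clock skew: HTTP dates are always GMT, but servers in a distributed system whose clocks disagree may compare timestamps incorrectly and return stale content.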
  3. Database writes don’t always update mtime: Application-level content may be “modified” logically without a filesystem timestamp change.

Use ETags when possible. Use Last-Modified as a fallback for static files where the mtime is reliable.


The 304 Path in Practice

CDN cache (entry expired, has ETag stored)
    │
    │  GET /article/1
    │  If-None-Match: "abc123"
    ▼
Origin
    │  → Content unchanged
    │  304 Not Modified
    │  ETag: "abc123"
    ▼
CDN cache
    │  → Reset TTL, keep cached body
    │  → Serve existing body + new TTL to client
    ▼
Client receives 200 OK with cached body

The 304 shortcut saves:

  • Body transfer bandwidth (origin → CDN)
  • Origin CPU/DB work (content generation)
  • CDN memory allocation (no new copy of body)
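
The cache side of the 304 path can be sketched as a single revalidate step. This is an illustration, not lab code: the origin is modeled as an http.Handler and called through httptest so the example runs without a network, and the names entry, revalidate, demoOrigin, and demo are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

type entry struct {
	body   []byte
	etag   string
	expiry time.Time
}

// revalidate refreshes an expired entry against the origin. On 304 it keeps
// the stored body and only resets the expiry; on 200 it replaces the entry.
// It reports whether a full body was transferred.
func revalidate(origin http.Handler, path string, e *entry, ttl time.Duration, now time.Time) (bool, error) {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	if e.etag != "" {
		req.Header.Set("If-None-Match", e.etag)
	}
	rec := httptest.NewRecorder()
	origin.ServeHTTP(rec, req)

	switch rec.Code {
	case http.StatusNotModified:
		e.expiry = now.Add(ttl)
		return false, nil
	case http.StatusOK:
		e.body = rec.Body.Bytes()
		e.etag = rec.Header().Get("ETag")
		e.expiry = now.Add(ttl)
		return true, nil
	default:
		return false, fmt.Errorf("origin returned status %d", rec.Code)
	}
}

// demoOrigin is a stand-in origin that honors If-None-Match for one fixed ETag.
func demoOrigin(etag, body string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		fmt.Fprint(w, body)
	})
}

// demo revalidates once with a matching ETag and once with an outdated one.
func demo() (first, second bool, body string) {
	origin := demoOrigin(`"v2"`, "new body")
	now := time.Unix(0, 0)
	e := &entry{body: []byte("new body"), etag: `"v2"`}
	first, _ = revalidate(origin, "/article/1", e, 5*time.Minute, now)

	e.etag, e.body = `"v1"`, []byte("old body") // pretend the cached copy is outdated
	second, _ = revalidate(origin, "/article/1", e, 5*time.Minute, now)
	return first, second, string(e.body)
}

func main() {
	fmt.Println(demo()) // false true new body
}
```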

Production Detail: s-maxage=0

A common pattern for HTML pages that reference versioned assets:

Cache-Control: public, max-age=0, s-maxage=0, must-revalidate

“Must revalidate every time, but please cache the ETag so you can use 304 responses.” This ensures pages are always fresh while still using conditional requests to avoid full re-transfers.

Another pattern is a CDN-side TTL override. With Cloudflare’s “Edge Cache TTL” setting, the CDN can cache for longer even if the origin sends max-age=0, overriding the origin’s preference. This is useful for origins you don’t control. Fastly supports a header-based equivalent, Surrogate-Control, which the origin sends alongside Cache-Control:

Surrogate-Control: max-age=86400

Fastly honors Surrogate-Control for CDN TTL and strips it before sending to browsers. The browser then sees only Cache-Control with shorter TTL.


What to Measure

# 304 rate (healthy revalidation; saves bandwidth)
rate(http_responses_total{status="304"}[5m])

# Ratio of 304 vs full responses (should be 20-50% on well-cached APIs)
rate(http_responses_total{status="304"}[5m]) /
  rate(http_responses_total{status="200"}[5m])

# Uncacheable responses (watch for unexpected growth)
rate(cache_store_skipped_total{reason="no-store"}[5m])
rate(cache_store_skipped_total{reason="private"}[5m])

Try It

make lab-04

# Observe Cache-Control header
curl http://localhost:9001/article/1 -v 2>&1 | grep -i cache

# Conditional request (ETag)
ETAG=$(curl http://localhost:9001/article/1 -si | grep -i etag | awk '{print $2}' | tr -d '\r')
curl http://localhost:9001/article/1 -H "If-None-Match: $ETAG" -v
# → Should return 304

# Test no-store (not cached)
curl http://localhost:9001/private/1 -v

Lab 05 · Cache Key Design

Run it: make lab-05
Source: labs/lab-05-cache-key-design/main.go


The Problem

A cache key is the identifier under which a response is stored and looked up. If the key is wrong, everything downstream breaks:

  • Too narrow (same key for different content): serve wrong content to wrong user, or collapse variations into one response.
  • Too wide (include irrelevant query params): artificially low hit ratio, wasted storage, repeated origin fetches for identical content.

In production, cache key design is one of the highest-leverage activities a CDN engineer performs. A 10-minute key design review can improve hit ratio from 70% to 95%.


The Basic Key: URL Normalization

The naive key is req.URL.String(). This breaks immediately when:

/article/1?a=1&b=2         # different key from
/article/1?b=2&a=1         # this one, although both name the same resource

Query parameters don’t have a defined order. Two requests for the same resource with parameters in different order are identical, but a naive cache sees them as different URLs.

Normalization steps (applied by Lab 05):

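// normalizeKey builds a cache key from a URL without modifying it.
// trackingParams is the package-level list of parameter names to ignore.
func normalizeKey(u *url.URL) string {
    // 1. Lowercase the path (only safe if the origin treats paths as
    //    case-insensitive)
    path := strings.ToLower(u.Path)

    // 2. Remove tracking parameters
    q := u.Query()
    for _, p := range trackingParams {
        q.Del(p)
    }

    // 3. Sort remaining parameters deterministically: values within each
    //    key here, keys via Encode, which emits them sorted by key
    for _, v := range q {
        sort.Strings(v)
    }

    // 4. Leave out the fragment (#anchor: never sent to the server but can
    //    appear in reconstructed URLs) by building the key from path and
    //    query only
    if encoded := q.Encode(); encoded != "" {
        return path + "?" + encoded
    }
    return path
}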

Tracking Parameter Pollution

Marketing teams append tracking parameters to every URL. These are meaningless to the origin but fragment your cache:

| Parameter | Source |
|---|---|
| utm_source, utm_medium, utm_campaign, utm_term, utm_content | Google Analytics |
| fbclid | Facebook click ID |
| gclid, gad_source | Google Ads |
| mc_eid | Mailchimp email ID |
| _ga | Google Analytics cross-domain |
| msclkid | Microsoft Ads |
| twclid | Twitter click ID |
| ref, referral | Generic referral parameters |

Without stripping these, each user who clicks a Facebook ad link (?fbclid=XYZ) generates a unique cache key even though they want the same article. A single popular article shared on Facebook could generate millions of unique cache keys — all for identical content.

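Cloudflare, Fastly, and Akamai all let you exclude such parameters from the cache key, whether through cache-key rules, VCL, or property configuration. Do not assume any of them is stripped by default: check your vendor’s current behavior and keep your own curated list.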
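A minimal sketch of tracking-parameter stripping, assuming a prefix rule for utm_* plus an exact-match list taken from the table above. The names trackingExact, isTracking, and stripTracking are illustrative and not necessarily what Lab 05 uses. ref and referral are deliberately left out because some applications give them meaning.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// trackingExact lists parameters dropped on exact match (hypothetical list,
// copied from the table above).
var trackingExact = map[string]bool{
	"fbclid": true, "gclid": true, "gad_source": true, "mc_eid": true,
	"_ga": true, "msclkid": true, "twclid": true,
}

// isTracking reports whether a query parameter should be excluded from the key.
func isTracking(name string) bool {
	name = strings.ToLower(name)
	return strings.HasPrefix(name, "utm_") || trackingExact[name]
}

// stripTracking returns the path plus the surviving query parameters.
// url.Values.Encode emits keys in sorted order, so the result is stable.
func stripTracking(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for name := range q {
		if isTracking(name) {
			q.Del(name) // deleting during range is safe in Go
		}
	}
	if len(q) == 0 {
		return u.Path, nil
	}
	return u.Path + "?" + q.Encode(), nil
}

func main() {
	for _, raw := range []string{
		"/article/1?utm_source=google&utm_medium=cpc",
		"/article/1?fbclid=XYZ&page=2",
		"/article/1",
	} {
		key, err := stripTracking(raw)
		fmt.Println(key, err)
	}
}
```

Matching utm_ by prefix means new utm_* variants are covered without a list update; the exact-match list still needs periodic review.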


The Vary Header

Vary tells the cache: “this response may differ based on these request headers”. Example:

Vary: Accept-Encoding

This means there are potentially multiple stored versions of the same URL: one for clients that accept gzip, one for brotli, one for uncompressed.

Key expansion for Vary

cache_key = normalize(url) + "|" + canonicalize(vary_headers)

GET /article/1
Accept-Encoding: br          → key: "/article/1|br"
Accept-Encoding: gzip        → key: "/article/1|gzip"
Accept-Encoding: (absent)    → key: "/article/1|"
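The key expansion above can be sketched as follows. This is an illustration under stated assumptions: encodingBucket and varyKey are hypothetical helper names, the Accept-Encoding value is collapsed into three buckets (br, gzip, or empty), and q-values are ignored for brevity.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// encodingBucket collapses an Accept-Encoding value into "br", "gzip", or "".
// q-values are ignored here, so "br;q=0" would still land in the br bucket.
func encodingBucket(acceptEncoding string) string {
	hasGzip := false
	for _, part := range strings.Split(acceptEncoding, ",") {
		name := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
		switch strings.ToLower(name) {
		case "br":
			return "br"
		case "gzip":
			hasGzip = true
		}
	}
	if hasGzip {
		return "gzip"
	}
	return ""
}

// varyKey appends one canonicalized request-header value per header named
// in the response's Vary header.
func varyKey(baseKey, vary string, req http.Header) string {
	var parts []string
	for _, name := range strings.Split(vary, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		value := strings.TrimSpace(req.Get(name))
		if strings.EqualFold(name, "Accept-Encoding") {
			value = encodingBucket(value)
		}
		parts = append(parts, strings.ToLower(value))
	}
	return baseKey + "|" + strings.Join(parts, ",")
}

func main() {
	for _, ae := range []string{"gzip, deflate, br", "gzip", ""} {
		h := http.Header{}
		if ae != "" {
			h.Set("Accept-Encoding", ae)
		}
		fmt.Printf("%q -> %q\n", ae, varyKey("/article/1", "Accept-Encoding", h))
	}
}
```

Bucketing matters: without it, "gzip, deflate, br" and "br, gzip" would produce two keys for the same stored variant.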

Common Vary values and their implications:

| Vary value | Cache behavior | Risk |
|---|---|---|
| Accept-Encoding | Store one copy per encoding | Fine; well-enumerated set |
| Accept-Language | Store per language | Can explode: 50+ languages |
| User-Agent | Store per UA string | Catastrophic; millions of unique strings |
| Cookie | Store per cookie | Never do this on shared cache |
| Authorization | Per auth token | Response must also be private |

Vary: * means the response is unique per request and must never be cached in a shared cache. CDNs treat it as uncacheable.

Vary: User-Agent is one of the most destructive cache-key mistakes. It was once widely recommended for mobile detection, and it collapses hit ratios toward 0% because nearly every browser build sends a distinct UA string. The fix: classify the device once into a small normalized set (for example X-Device-Type: mobile|desktop), forward that header to the origin, and vary on X-Device-Type instead of User-Agent.


Cache Keying in Production CDNs

Cloudflare Cache Rules

Cloudflare provides a Cache Rules UI and API to configure:

Cache Rule: "Strip marketing params"
  When: hostname matches "example.com"
  Cache Key: exclude query strings "utm_*", "fbclid", "gclid"

Fastly VCL

Fastly’s VCL (Varnish Configuration Language) gives full control:

sub vcl_hash {
    # Normalize host
    set req.hash += req.http.host;

    # Normalize path (lowercase)
    set req.hash += std.tolower(req.url.path);

    # Strip tracking params from hash
    declare local var.qs STRING;
    set var.qs = regsuball(req.url.qs,
        "(?:^|&)(?:utm_[^=]*|fbclid|gclid)[^&]*", "");
    set req.hash += regsub(var.qs, "^&", "");

    return(hash);
}

Akamai

Akamai uses “cache ID” rules configured via Property Manager or the APIs. Key parameters can be included/excluded per URL pattern.


Naive vs. Smart Key: The Hit Ratio Impact

The lab demonstrates this directly. Given traffic:

/article/1?utm_source=google&utm_medium=cpc
/article/1?utm_source=facebook&utm_medium=social
/article/1?utm_source=twitter
/article/1        ← direct visit
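| Key strategy | Cache hits | Hit ratio |
|---|---|---|
| Naive (full URL) | 0/4 | 0% |
| Strip utm_* | 3/4 | 75% |
| Strip utm_* + normalize | 3/4 | 75% |

Against a cold cache the first request is always a miss, so 3/4 is the ceiling for this sample. Full normalization only adds hits when requests differ in parameter order or path case, which none of these four do.

In production with millions of requests, this can be the difference between a $200/month origin bill and a $20,000/month one.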
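Replaying the four sample requests against an initially empty cache makes the comparison concrete. This is a small sketch, not the lab's code: naiveKey, strippedKey, and countHits are hypothetical names, and only the naive and utm-stripping strategies are modeled.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sample is the traffic listed above.
var sample = []string{
	"/article/1?utm_source=google&utm_medium=cpc",
	"/article/1?utm_source=facebook&utm_medium=social",
	"/article/1?utm_source=twitter",
	"/article/1",
}

// naiveKey uses the full URL as the cache key.
func naiveKey(raw string) string { return raw }

// strippedKey drops utm_* parameters before building the key.
func strippedKey(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	q := u.Query()
	for name := range q {
		if strings.HasPrefix(name, "utm_") {
			q.Del(name)
		}
	}
	if len(q) == 0 {
		return u.Path
	}
	return u.Path + "?" + q.Encode()
}

// countHits replays requests against an empty cache and counts hits.
func countHits(requests []string, keyFn func(string) string) int {
	seen := map[string]bool{}
	hits := 0
	for _, r := range requests {
		k := keyFn(r)
		if seen[k] {
			hits++
		} else {
			seen[k] = true // first sighting is a miss that fills the cache
		}
	}
	return hits
}

func main() {
	fmt.Printf("naive:    %d/%d hits\n", countHits(sample, naiveKey), len(sample))
	fmt.Printf("stripped: %d/%d hits\n", countHits(sample, strippedKey), len(sample))
}
```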


Production Checklist: Cache Key Design

  • Strip all known tracking parameters
  • Sort query string parameters alphabetically
  • Lowercase path components
  • Handle Vary explicitly per resource type
  • Never vary on User-Agent or Cookie in shared cache
  • Use Surrogate-Control to set CDN-specific TTLs
  • Test key normalization with realistic traffic samples

Try It

make lab-05

# Same content, different tracking params — should be HIT on second request
curl "http://localhost:8080/article/1?utm_source=google"
curl "http://localhost:8080/article/1?utm_source=facebook"
# Second request should be a cache HIT

# Different Accept-Encoding → different Vary bucket
curl "http://localhost:8080/article/1" -H "Accept-Encoding: gzip"
curl "http://localhost:8080/article/1" -H "Accept-Encoding: br"
# Both should be stored separately

Lab 06 · The Thundering Herd

Run it: make lab-06
Source: labs/lab-06-thundering-herd/main.go


The Problem

You have a cache. A popular URL’s TTL expires. In the next millisecond, 800 concurrent requests arrive for that URL. Your cache code:

func handler(w http.ResponseWriter, r *http.Request) {
    if entry, ok := cache.Get(key); ok {
        write(w, entry)   // HIT
        return
    }
    // MISS — 800 goroutines all reach here simultaneously
    resp := fetch(key)    // 800 fetches to origin — boom
    cache.Set(key, resp)
}

This is the thundering herd (also: cache stampede, dog pile effect). A single cache expiry event becomes an instant DDoS against your origin.

At YouTube scale (2023), a single CDN node may have 10,000 concurrent viewers for a popular video. When that cache entry expires, 10,000 simultaneous origin requests arrive in sub-millisecond windows. Most origins cannot handle this.


Solution: singleflight.Group

Go’s golang.org/x/sync/singleflight package provides the exact primitive needed: request collapsing (or request deduplication).

import "golang.org/x/sync/singleflight"

var group singleflight.Group

func fetch(key string) ([]byte, error) {
    // The third return value ("shared") is discarded here; it is covered
    // in the next section.
    v, err, _ := group.Do(key, func() (interface{}, error) {
        // This function runs ONCE, no matter how many concurrent
        // callers invoke group.Do with the same key.
        return fetchFromOrigin(key)
    })
    if err != nil {
        return nil, err
    }
    return v.([]byte), nil
}

The semantics:

  • First caller with a given key triggers the actual fetch
  • All subsequent callers with the same key block and wait for the first caller’s result
  • When the fetch completes, all waiting callers receive the same result
  • No extra origin requests are made
800 concurrent misses for /video/popular
    │
    ├── Goroutine 1: starts group.Do("video/popular") → actual fetch
    ├── Goroutine 2: group.Do("video/popular") → blocks, waiting
    ├── Goroutine 3: group.Do("video/popular") → blocks, waiting
    ├── ...
    └── Goroutine 800: group.Do("video/popular") → blocks, waiting

[~80ms later: origin responds]

    └── All 800 goroutines receive the same result simultaneously
        → 1 origin request, not 800
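To see what happens inside request collapsing, here is a hand-rolled, stdlib-only stand-in for singleflight.Group. It is a teaching sketch, not the x/sync implementation: the collapser type, the waiting method, and the herd driver are all hypothetical, and panics inside the fetch function are not handled.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// call tracks one in-flight fetch and the callers waiting on it.
type call struct {
	done    chan struct{}
	val     []byte
	err     error
	waiters int
}

// collapser deduplicates concurrent fetches for the same key.
type collapser struct {
	mu    sync.Mutex
	calls map[string]*call
}

func newCollapser() *collapser { return &collapser{calls: map[string]*call{}} }

// Do runs fn at most once per key at a time; concurrent callers share the result.
func (c *collapser) Do(key string, fn func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if cl, ok := c.calls[key]; ok {
		cl.waiters++
		c.mu.Unlock()
		<-cl.done // block until the leader finishes
		return cl.val, cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.calls[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = fn() // only the leader reaches this line

	c.mu.Lock()
	delete(c.calls, key) // later callers start a new fetch
	c.mu.Unlock()
	close(cl.done)
	return cl.val, cl.err
}

// waiting reports how many callers are blocked behind the in-flight fetch.
func (c *collapser) waiting(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.calls[key]; ok {
		return cl.waiters
	}
	return 0
}

// herd fires n concurrent requests for one key and returns the origin call count.
func herd(n int) int {
	c := newCollapser()
	release := make(chan struct{})
	var mu sync.Mutex
	originCalls := 0
	fetch := func() ([]byte, error) {
		mu.Lock()
		originCalls++
		mu.Unlock()
		<-release // simulated slow origin
		return []byte("body"), nil
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Do("/video/popular", fetch)
		}()
	}
	// Hold the origin response until every other caller has piled up.
	for c.waiting("/video/popular") < n-1 {
		runtime.Gosched()
	}
	close(release)
	wg.Wait()
	return originCalls
}

func main() {
	fmt.Println("origin fetches for 800 concurrent requests:", herd(800))
}
```

The driver keeps the simulated origin blocked until all other callers have joined, so the count does not depend on goroutine scheduling.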

The shared Return Value

group.Do returns three values: (v interface{}, err error, shared bool).

shared is true if the result was shared with other callers. This is useful for metrics — you can measure how many requests were collapsed:

result, err, shared := group.Do(key, fetch)
if shared {
    collapsedRequestsTotal.Inc()
}

Monitoring collapsed requests reveals thundering herd intensity. If thousands of requests per second are being collapsed, either your TTLs are too short or many entries are expiring in lockstep (lab 07 addresses this with staggered expiry).


Cascade Failure Without Singleflight

The lab demonstrates what happens without singleflight:

800 requests arrive at t=0
→ 800 goroutines all observe cache miss
→ 800 goroutines all call fetchFromOrigin()
→ Origin receives 800 simultaneous connections
→ Origin CPU spikes to 100%
→ Origin response time increases from 80ms to 5000ms
→ Each 800-request wave takes 5s instead of 80ms
→ Next cache entry expires while previous wave is still in-flight
→ Another 800 requests stampede
→ Origin never recovers (cascade failure)

This is a well-documented pattern in distributed systems and the root cause of many high-profile outages. Facebook’s 2013 paper “Scaling Memcache at Facebook” describes thundering herds and the lease mechanism used to mitigate them. Reddit’s 2012 outage was triggered by a thundering herd on a database backing a cached list.


Beyond singleflight: Production Patterns

1. Probabilistic Early Refresh (XFetch)

Instead of waiting for expiry, refresh slightly before expiry using probabilistic jitter. Each request has a small probability of triggering a background refresh:

// XFetch algorithm (Vattani et al., 2015)
func shouldRefresh(expiry time.Time, lastFetchDuration time.Duration, beta float64) bool {
    remaining := time.Until(expiry).Seconds()
    delta := lastFetchDuration.Seconds()
    return -delta * beta * math.Log(rand.Float64()) > remaining
}

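math.Log(rand.Float64()) is negative, so -delta * beta * math.Log(rand.Float64()) is a positive random value that scales with the last fetch duration. As remaining shrinks toward 0, that value becomes increasingly likely to exceed it, so some request triggers the refresh shortly before expiry. Higher beta means more aggressive prefetching (beta = 1 is a common starting point). This makes it very likely, though not guaranteed, that a hot entry is refreshed before it expires.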
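The refresh decision has a closed form that is handy for tuning. Since rand.Float64() is uniform on [0, 1), the condition -delta * beta * ln(U) > remaining holds with probability exp(-remaining / (delta * beta)). The sketch below evaluates that expression; refreshProbability is a hypothetical helper, and the 80 ms fetch time is only an example value.

```go
package main

import (
	"fmt"
	"math"
)

// refreshProbability returns the chance that a single request triggers an
// early refresh, given the remaining TTL and last fetch duration (both in
// seconds) and the XFetch beta parameter.
func refreshProbability(remaining, delta, beta float64) float64 {
	if remaining <= 0 {
		return 1 // already expired: always refresh
	}
	if delta <= 0 || beta <= 0 {
		return 0 // no early refresh without a positive fetch time and beta
	}
	return math.Exp(-remaining / (delta * beta))
}

func main() {
	// Example: origin fetch takes 80 ms, beta = 1.
	for _, remaining := range []float64{1, 0.5, 0.2, 0.08, 0.01} {
		fmt.Printf("remaining=%.2fs  P(refresh)=%.6f\n",
			remaining, refreshProbability(remaining, 0.08, 1))
	}
}
```

The per-request probability is tiny until the remaining TTL is within a few multiples of delta * beta, which is why busy keys refresh early while idle keys simply expire.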

2. Mutex per key (fine-grained locking)

type keyedMutex struct {
    mu    sync.Mutex
    locks map[string]*sync.Mutex
}

func (km *keyedMutex) Lock(key string) {
    km.mu.Lock()
    l, ok := km.locks[key]
    if !ok {
        l = &sync.Mutex{}
        km.locks[key] = l
    }
    km.mu.Unlock()
    l.Lock()
}

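Only one goroutine per key can fetch from origin at a time; the others wait. This is simpler to reason about than singleflight, but the result is not handed to the waiters: each one must re-check the cache after acquiring the lock, otherwise it re-fetches as soon as the mutex is released. The snippet above also omits Unlock and never removes entries from the locks map, so a real implementation needs both.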
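A per-key mutex only prevents duplicate fetches when it is paired with a second cache lookup after the lock is acquired. The following sketch shows that double-checked pattern; the cache type, getOrFetch, and fetchCount are hypothetical names, and the locks map is never pruned, which a long-running service would need to address.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedMutex hands out one mutex per cache key.
type keyedMutex struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func (km *keyedMutex) lockFor(key string) *sync.Mutex {
	km.mu.Lock()
	defer km.mu.Unlock()
	l, ok := km.locks[key]
	if !ok {
		l = &sync.Mutex{}
		km.locks[key] = l
	}
	return l
}

type cache struct {
	mu   sync.RWMutex
	data map[string][]byte
	km   keyedMutex
}

func newCache() *cache {
	return &cache{
		data: map[string][]byte{},
		km:   keyedMutex{locks: map[string]*sync.Mutex{}},
	}
}

func (c *cache) lookup(key string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

// getOrFetch checks the cache, takes the per-key lock, then checks again
// before fetching, so waiters reuse what the first holder stored.
func (c *cache) getOrFetch(key string, fetch func(string) []byte) []byte {
	if v, ok := c.lookup(key); ok {
		return v
	}
	l := c.km.lockFor(key)
	l.Lock()
	defer l.Unlock()
	if v, ok := c.lookup(key); ok {
		return v // filled while we waited for the lock
	}
	v := fetch(key)
	c.mu.Lock()
	c.data[key] = v
	c.mu.Unlock()
	return v
}

// fetchCount runs n concurrent lookups for one key and counts origin fetches.
func fetchCount(n int) int {
	c := newCache()
	var mu sync.Mutex
	calls := 0
	fetch := func(key string) []byte {
		mu.Lock()
		calls++
		mu.Unlock()
		return []byte("body for " + key)
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.getOrFetch("/article/1", fetch)
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println("origin fetches for 50 concurrent requests:", fetchCount(50))
}
```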

3. Background refresh with locked TTL

Keep serving the stale entry while refreshing in background, preventing any thundering herd entirely. See Lab 07 for the full stale-while-revalidate implementation.


singleflight vs. Caching

Note that singleflight is not a cache. It deduplicates in-flight requests, but once the first request completes, new requests will start a new group.Do call (the key is removed from the group after completion). singleflight + cache is the correct combination:

Request arrives
    │
    ▼
Cache hit? → serve immediately
    │ miss
    ▼
group.Do(key, fetch) → one origin request, all callers get result
    │
    ▼
cache.Set(key, result, ttl)
    │
    ▼
All waiters serve result

What to Measure

# Collapsed requests (singleflight saves)
rate(singleflight_shared_total[1m])

# Origin request rate — should be orders of magnitude lower than edge rate
rate(origin_requests_total[1m])

# Collapse ratio: how many requests per origin fetch
rate(edge_requests_total[1m]) / rate(origin_requests_total[1m])
# Target: 50-500x depending on content popularity

Try It

make lab-06

# The lab fires 800 concurrent requests to a cold cache entry
# Watch the output — it will show:
#   "800 concurrent requests → 1 origin request"
# vs the naive version which fires all 800

Lab 07 · Stale Content & RFC 5861

Run it: make lab-07
Source: labs/lab-07-stale-content/main.go


The Problem

When a cache entry expires, the CDN must go to the origin to get a fresh copy. During that fetch (typically 80–500 ms), what should the CDN do with incoming requests for that resource?

Two bad options:

  1. Block: Hold all requests until the origin responds. Adds 80–500 ms latency to the first request after every expiry. Under load, hundreds of requests pile up.
  2. Return 503: Refuse to serve. This is almost never acceptable.

The right option: serve the stale response while revalidating in the background. Users get a response immediately; the cache updates asynchronously.

This is the core idea of RFC 5861: HTTP Cache-Control Extensions for Stale Content.


RFC 5861: stale-while-revalidate

Cache-Control: max-age=60, stale-while-revalidate=30

Semantics:

  • Fresh (first 60s): serve from cache without consulting origin
  • Stale-while-revalidate (seconds 61–90): serve stale immediately, and trigger a background revalidation
  • Hard stale (after 90s): must wait for fresh copy
Time (seconds)
0         60        90        ∞
|─ fresh ──|─ SWR  ──|─ stale ─|

t=0:  First fetch. Cache stores response.
t=30: Request → cache HIT (fresh)
t=65: Request → cache entry is stale (expired, within SWR window)
       → Serve stale immediately (no added latency)
       → Background goroutine fires origin request
       → When origin responds, update cache entry
t=95: Request → beyond SWR window (assuming no request arrived during it) → must fetch fresh before responding

The user at t=65 receives a response that is 5 seconds past its expiry, a difference that is invisible for most content. The user at t=95 waits ~80ms for the origin response.
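The three phases can be derived directly from the Cache-Control header. The sketch below is a simplified parser that only understands max-age and stale-while-revalidate; policy, parsePolicy, and classify are hypothetical names, and a production parser would also handle s-maxage, no-cache, quoted values, and the Age header.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// policy holds the two directives this sketch cares about, in seconds.
type policy struct {
	maxAge int
	swr    int
}

// parsePolicy extracts max-age and stale-while-revalidate; everything else
// is ignored.
func parsePolicy(cacheControl string) policy {
	var p policy
	for _, d := range strings.Split(cacheControl, ",") {
		name, value, found := strings.Cut(strings.TrimSpace(d), "=")
		if !found {
			continue
		}
		secs, err := strconv.Atoi(strings.TrimSpace(value))
		if err != nil || secs < 0 {
			continue
		}
		switch strings.ToLower(strings.TrimSpace(name)) {
		case "max-age":
			p.maxAge = secs
		case "stale-while-revalidate":
			p.swr = secs
		}
	}
	return p
}

// classify maps the age of a cached response to one of the three phases.
func classify(ageSeconds int, p policy) string {
	switch {
	case ageSeconds < p.maxAge:
		return "fresh"
	case ageSeconds < p.maxAge+p.swr:
		return "stale-while-revalidate"
	default:
		return "expired"
	}
}

func main() {
	p := parsePolicy("max-age=60, stale-while-revalidate=30")
	for _, age := range []int{0, 30, 65, 95} {
		fmt.Printf("t=%d: %s\n", age, classify(age, p))
	}
}
```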


RFC 5861: stale-if-error

Cache-Control: max-age=60, stale-if-error=86400

If the origin returns an error (RFC 5861 names 500, 502, 503, and 504) or is unreachable (connection refused, timeout), serve the stale cached copy for up to 86400 seconds (24 hours) beyond the original expiry.

This is the “graceful degradation” directive. Your CDN continues serving content even when the origin is completely down, for up to the specified duration.

t=0:   Origin healthy. Cache populated.
t=100: Cache entry expired (max-age=60 passed)
t=101: Origin request → 500 Internal Server Error
       → stale-if-error: serve cached copy from t=0
       → Client sees their article, not a 500 error

t=86460: stale-if-error window expired
          → If origin still down → return 502 Bad Gateway
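The timeline above reduces to a small decision function. This is a sketch under stated assumptions: the entry type, serve, and statusAt are hypothetical, the fetch callback stands in for the origin request, and writing the refreshed body back into the cache is omitted.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type entry struct {
	body        []byte
	expiry      time.Time
	sieDeadline time.Time // expiry + stale-if-error
}

var errOriginDown = errors.New("origin unavailable")

// serve returns the body to send and a status label. fetch is only called
// once the entry has expired.
func serve(e *entry, now time.Time, fetch func() ([]byte, error)) ([]byte, string) {
	if now.Before(e.expiry) {
		return e.body, "HIT"
	}
	body, err := fetch()
	if err == nil {
		return body, "REFRESHED"
	}
	if now.Before(e.sieDeadline) {
		return e.body, "STALE-ERROR" // origin failed, stale copy still allowed
	}
	return nil, "ERROR" // stale-if-error window exhausted
}

// statusAt evaluates serve for max-age=60, stale-if-error=86400 at a given age.
func statusAt(ageSeconds int, originUp bool) string {
	start := time.Unix(0, 0)
	e := &entry{
		body:        []byte("article"),
		expiry:      start.Add(60 * time.Second),
		sieDeadline: start.Add((60 + 86400) * time.Second),
	}
	fetch := func() ([]byte, error) {
		if originUp {
			return []byte("fresh article"), nil
		}
		return nil, errOriginDown
	}
	_, status := serve(e, start.Add(time.Duration(ageSeconds)*time.Second), fetch)
	return status
}

func main() {
	fmt.Println("t=30, origin down:   ", statusAt(30, false))
	fmt.Println("t=101, origin down:  ", statusAt(101, false))
	fmt.Println("t=101, origin up:    ", statusAt(101, true))
	fmt.Println("t=86461, origin down:", statusAt(86461, false))
}
```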

The Background Revalidation Pattern

type cacheEntry struct {
    response     *http.Response
    body         []byte
    expiry       time.Time
    swrDeadline  time.Time // expiry + stale-while-revalidate
    sieDeadline  time.Time // expiry + stale-if-error
    revalidating atomic.Bool
}

func (c *Cache) get(key string) (*cacheEntry, string) {
    // c.mu is assumed to be a sync.RWMutex guarding c.store; get and the
    // background revalidate goroutine touch the map concurrently.
    c.mu.RLock()
    entry, ok := c.store[key]
    c.mu.RUnlock()
    if !ok {
        return nil, "MISS" // never cached: caller fetches from origin
    }
    now := time.Now()

    if now.Before(entry.expiry) {
        return entry, "HIT"
    }

    if now.Before(entry.swrDeadline) {
        // Serve stale, kick off background refresh
        if entry.revalidating.CompareAndSwap(false, true) {
            go c.revalidate(key, entry)
        }
        return entry, "STALE-REVALIDATING"
    }

    return nil, "EXPIRED"  // must wait for fresh copy
}

func (c *Cache) revalidate(key string, old *cacheEntry) {
    defer old.revalidating.Store(false)

    resp, err := fetch(key)
    if err != nil {
        // Background refresh failed; the stale entry stays in place so
        // stale-if-error can still serve it
        return
    }
    c.mu.Lock()
    c.store[key] = newEntry(resp)
    c.mu.Unlock()
}

The atomic.Bool on revalidating prevents multiple background goroutines for the same key — the first one wins, subsequent SWR requests see revalidating=true and skip launching another goroutine.


Stale-if-Error in Practice

The lab simulates origin failure with a configurable error rate flag. With stale-if-error, the sequence is:

1. Normal operation: cache populated, served fresh
2. Origin starts returning 503 (simulated)
3. CDN: entry is expired + origin erroring
   → Is there a stale-if-error window?
   → Yes → serve stale, log "STALE-ERROR"
4. Origin recovers
   → Next background revalidation succeeds
   → Cache entry updated
5. Normal operation resumes, no user-visible error

This is origin availability decoupled from user experience. For most content (articles, product pages, media), brief staleness is far preferable to a visible error.


When Not to Use stale-while-revalidate

Some content must always be fresh:

| Content type | Use SWR? | Reason |
|---|---|---|
| News articles | ✓ (short window) | Mild staleness acceptable |
| Product pages | ✓ | Price/stock staleness OK for seconds |
| Authentication state | ✗ | Must be current |
| Payment/checkout | ✗ | Cannot serve stale price |
| Medical information | ✗ | Accuracy is legal requirement |
| Real-time scores/feeds | ✗ (or very short) | Value proposition is freshness |

Use Cache-Control: no-cache combined with conditional requests (Lab 04) instead of SWR for content where freshness is the product.


Production Detail: Cloudflare’s Default SWR

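Cloudflare’s Always Online feature acts as a form of stale-if-error: when the origin is unreachable or returns certain 5xx errors, Cloudflare can serve a previously stored copy instead of an error page. Defaults and retention windows depend on plan and zone settings, so confirm them in Cloudflare’s current documentation before relying on a specific window.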

Cloudflare also exposes stale-while-revalidate support since 2022, honoring the directive from origin responses.

Fastly honors stale-while-revalidate and stale-if-error from Cache-Control or Surrogate-Control, building on Varnish-style grace: an expired object can be served for a configurable period while it is revalidated in the background or while the origin is failing.


Timeline Annotation (What the Lab Prints)

t=0s:   /article/1 → MISS → fetch → store (TTL=30s, SWR=15s, SIE=300s)
t=5s:   /article/1 → HIT (25s remaining)
t=30s:  /article/1 → STALE (TTL expired, within SWR window)
         → background revalidation started
t=31s:  /article/1 → HIT (background revalidation complete, reset TTL)
[origin down]
t=50s:  /article/1 → expires → origin 503
         → STALE-IF-ERROR (250s remaining in SIE window)
[origin up]
t=60s:  background revalidation succeeds → back to normal HIT

Try It

make lab-07

# Normal behavior: articles served stale during background revalidation
curl http://localhost:8080/article/1
sleep 35
curl http://localhost:8080/article/1     # served stale, triggers background refresh
curl http://localhost:8080/article/1     # now fresh again

# Simulate origin down:
# Restart lab with --error-rate 1.0 to see stale-if-error in action

Lab 08 · Tiered Cache: Memory + Disk

Run it: make lab-08
Source: labs/lab-08-tiered-cache/main.go


The Problem

A single in-memory cache has two opposing constraints:

  • Memory is fast but limited: A 32 GB edge node has ~28 GB usable after OS. At an average response size of 10 KB, that’s ~2.8 million cached objects.
  • Disk is large but slower: NVMe at 500 µs is 5,000× slower than DRAM at 100 ns, but a 4 TB NVMe holds 400 million 10 KB objects.

The solution is a two-tier cache: hot objects in L1 memory, warm objects in L2 disk. On an L1 miss, check L2 before going to origin.

Request
  │
  ▼
L1 (Memory LRU)  ─ HIT (< 1 µs) → response
  │ miss
  ▼
L2 (Disk NVMe)   ─ HIT (< 1 ms) → promote to L1 → response
  │ miss
  ▼
Origin           ─ fetch (80+ ms) → store in L2 → promote to L1 → response

Variations of this architecture appear in Nginx’s proxy_cache (an in-memory key zone plus files on disk), Varnish’s Massive Storage Engine (MSE), and the storage layers of commercial CDNs.


L1: LRU Memory Cache

The LRU Data Structure

LRU (Least Recently Used) eviction requires O(1) Get and O(1) Put. The classic implementation uses a doubly-linked list + hash map:

Hash map: key → *list.Element  (O(1) lookup by key)
List:     MRU [elem4, elem2, elem1, elem3] LRU  (O(1) move to front, remove from back)

On Get(key):
  1. Lookup element in hash map          O(1)
  2. Move element to front of list       O(1)  ← "recently used"
  3. Return value

On Put(key, value):
  1. If key exists: update + move to front
  2. If cache full: remove LRU (list.Back()), delete from hash map
  3. Insert new element at front

Go’s container/list provides the doubly-linked list. Combined with a sync.RWMutex for concurrent access:

type LRUCache struct {
    cap   int
    mu    sync.RWMutex
    list  *list.List
    items map[string]*list.Element
}

type item struct {
    key   string
    value []byte
    expiry time.Time
}

func (c *LRUCache) Get(key string) ([]byte, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    el, ok := c.items[key]
    if !ok { return nil, false }
    entry := el.Value.(*item)
    if time.Now().After(entry.expiry) {
        c.list.Remove(el)
        delete(c.items, key)
        return nil, false
    }
    c.list.MoveToFront(el)  // Mark as recently used
    return entry.value, true
}

Note: Get takes a write lock (not read lock) because it modifies the list order. This is a subtle but important point: LRU Get is a mutation. Solutions include:

  • Accept write lock on every Get (simple, correct, ~50 ns per lock)
  • Use a CLOCK algorithm (approximate LRU, read-only Get — used by Linux VM)
  • Use a sharded concurrent cache such as Ristretto (batches access bookkeeping to reduce lock contention; its policy is TinyLFU-based rather than strict LRU)
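The Put side of the structure, including eviction, can be sketched as follows. This is a minimal version that takes the simple option from the list above (one exclusive lock for both Get and Put) and leaves out TTL handling; NewLRU and the field names are illustrative.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

type item struct {
	key   string
	value []byte
}

// LRUCache evicts the least recently used entry once capacity is reached.
type LRUCache struct {
	cap   int
	mu    sync.Mutex
	list  *list.List               // front = most recently used
	items map[string]*list.Element // key → element in list
}

func NewLRU(capacity int) *LRUCache {
	if capacity < 1 {
		capacity = 1
	}
	return &LRUCache{cap: capacity, list: list.New(), items: map[string]*list.Element{}}
}

func (c *LRUCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.list.MoveToFront(el) // a read counts as a use
	return el.Value.(*item).value, true
}

func (c *LRUCache) Put(key string, value []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		el.Value.(*item).value = value // update in place
		c.list.MoveToFront(el)
		return
	}
	if c.list.Len() >= c.cap {
		oldest := c.list.Back() // least recently used
		c.list.Remove(oldest)
		delete(c.items, oldest.Value.(*item).key)
	}
	c.items[key] = c.list.PushFront(&item{key: key, value: value})
}

func main() {
	c := NewLRU(2)
	c.Put("a", []byte("1"))
	c.Put("b", []byte("2"))
	c.Get("a")              // "a" becomes most recently used
	c.Put("c", []byte("3")) // evicts "b"
	_, okA := c.Get("a")
	_, okB := c.Get("b")
	fmt.Println("a cached:", okA, "b cached:", okB)
}
```

Capacity here counts entries; a production cache would usually bound total bytes instead, since response sizes vary widely.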

L2: Disk Cache with xxhash

Key-to-Filename Mapping

Storing arbitrary URL strings as filenames is error-prone (path traversal, length limits, special characters). Map the key to a hash:

import "github.com/cespare/xxhash/v2"

func diskPath(dir, key string) string {
    h := xxhash.Sum64String(key)
    // Use a two-level directory structure to avoid large directories:
    // "a1b2c3d4e5f60718" → "a1/b2/a1b2c3d4e5f60718"
    hex := fmt.Sprintf("%016x", h)
    return filepath.Join(dir, hex[:2], hex[2:4], hex)
}

The two-level directory sharding (the same idea as the fan-out directories in git’s object store) keeps any single directory small. With 256 × 256 = 65,536 leaf directories, 10 million objects average about 150 files per directory. Very large flat directories slow down listing and lookup on many filesystems, and some tools struggle well before the filesystem itself does.

Why xxhash, Not MD5/SHA1?

| Hash | Throughput | Output size | Collision resistance |
|---|---|---|---|
| MD5 | ~500 MB/s | 128 bit | Low (broken for crypto) |
| SHA-1 | ~300 MB/s | 160 bit | Low (broken for crypto) |
| SHA-256 | ~200 MB/s | 256 bit | Cryptographically strong |
| xxhash64 | ~30 GB/s | 64 bit | Sufficient for cache keys |
| xxhash128 | ~30 GB/s | 128 bit | Sufficient for cache keys |

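For cache key → filename, we don’t need cryptographic resistance; we need speed and low collision probability. With n = 10 million cached items in xxhash64’s 64-bit space (N ≈ 1.8 × 10^19), the birthday bound n²/2N gives a collision probability of ~2.7 × 10^-6. That is acceptable as long as each file also stores its full cache key and the reader verifies it, so that a collision becomes a miss rather than the wrong response.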
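The collision estimate follows from the birthday approximation: for n items hashed uniformly into a space of 2^b values, the probability of at least one collision is about 1 - exp(-n(n-1) / (2 · 2^b)). The helper below (collisionProbability is a hypothetical name) lets you plug in your own object counts.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance of at least one collision
// among n uniformly hashed keys in a space of 2^bits values.
func collisionProbability(n float64, bits int) float64 {
	space := math.Pow(2, float64(bits))
	// -Expm1(-x) computes 1 - e^-x without losing precision for tiny x.
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	for _, n := range []float64{1e6, 1e7, 1e8, 1e9} {
		fmt.Printf("n=%.0e  64-bit: %.3e  128-bit: %.3e\n",
			n, collisionProbability(n, 64), collisionProbability(n, 128))
	}
}
```

The probability grows with the square of the object count, so a node holding hundreds of millions of objects is a reason to consider a 128-bit hash.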

xxhash is also widely used in databases and storage systems; ClickHouse, for example, exposes it as a built-in hash function.

Atomic File Writes

To prevent a partially-written cache file from being read:

func writeDisk(path string, data []byte) error {
    // Write to temp file first
    tmp := path + ".tmp"
    if err := os.WriteFile(tmp, data, 0644); err != nil {
        return err
    }
    // Atomic rename: no reader can see a partial write
    return os.Rename(tmp, path)
}

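os.Rename maps to rename(2), which atomically replaces the destination on POSIX systems when both paths are on the same filesystem: a reader sees either the old file or the complete new one, never a partial write. Durability is a separate concern (fsync the file, and the directory, if the entry must survive a crash). Write-to-temp-then-rename is a standard technique in editors, package managers, and storage systems.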
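A slightly hardened variant is sketched below. It assumes three things a production cache is likely to need: the shard directories may not exist yet, two goroutines may write the same key at once (so the temp file needs a unique name), and the data should reach disk before the rename. writeAtomic and roundTrip are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic writes data to a uniquely named temp file next to path and
// renames it into place.
func writeAtomic(path string, data []byte) error {
	dir := filepath.Dir(path)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	// Same directory as the target, so the rename stays on one filesystem.
	tmp, err := os.CreateTemp(dir, filepath.Base(path)+".*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless failure once the rename succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil { // flush file contents to disk
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// roundTrip overwrites an existing entry in a throwaway directory and
// returns what a reader sees afterwards.
func roundTrip(data []byte) ([]byte, error) {
	root, err := os.MkdirTemp("", "l2cache")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)
	path := filepath.Join(root, "a1", "b2", "a1b2c3d4e5f60718")
	if err := writeAtomic(path, []byte("old body")); err != nil {
		return nil, err
	}
	if err := writeAtomic(path, data); err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}

func main() {
	got, err := roundTrip([]byte("new body"))
	fmt.Printf("%q %v\n", got, err)
}
```

os.CreateTemp creates files with mode 0600; adjust the permissions if other processes need to read the cache directory.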


Cache Promotion

When an item is fetched from L2 (disk) to serve a request, promote it to L1 (memory) so subsequent requests for the same item are served from memory:

if data, ok := l2.Get(key); ok {
    l1.Put(key, data, ttl)   // promote to memory
    return data, "L2-HIT"
}

Promotion policies to consider:

  • Always promote: simple; hot items quickly migrate to L1
  • Promote after N hits: avoids polluting L1 with items only needed once
  • Promote based on size: don’t promote large files to L1 (wastes memory with low incremental benefit)

Production Numbers

| Tier | Technology | Latency | Size (single node) |
|---|---|---|---|
| L1 memory | DRAM | < 1 µs | 4–128 GB |
| L1.5 NVMe | PCIe 4.0 NVMe | 50–200 µs | 1–8 TB |
| L2 disk | HDD RAID | 3–10 ms | 4–100 TB |
| L3 shield | Origin Shield PoP | 5–20 ms | 100s of TB |
| Origin | App server / S3 | 50–500 ms | unlimited |

Cloudflare uses NVMe SSDs as L2 across all PoPs, with the L1 being in-process memory (implemented in Rust/C++). The transition from HDD to NVMe across CDN infrastructure (2015–2020) reduced L2 hit latency by ~50× and enabled much larger object counts.


Try It

make lab-08

# First request: L2 miss → origin fetch → stored in L2 + promoted to L1
curl http://localhost:8080/article/1

# Second request: L1 HIT (< 1µs)
curl http://localhost:8080/article/1

# Kill and restart the process — L1 (memory) is gone but L2 (disk) remains
# Restart: first request should be L2 HIT, then promoted to L1

Lab 09 · Cache Tags & Bulk Purge

Run it: make lab-09
Source: labs/lab-09-cache-tags/main.go


The Problem

When content changes, you need to invalidate the cached copies. The naive approach is to invalidate by URL:

DELETE /cache/article/42

But /article/42 might appear under many URLs:

/article/42
/article/42?format=mobile
/api/v1/articles/42
/api/v2/articles/42
/feed/latest      ← includes article 42's content
/user/123/posts   ← includes article 42 if user 123 wrote it
/search?q=keyword ← search results containing article 42

You can’t enumerate every affected URL. Content relationships are graph-shaped, not path-shaped. You need a way to say: “invalidate everything tagged with article-42.”


Surrogate Keys / Cache Tags

A cache tag (also called a surrogate key) is a label you apply to one or more cache entries. When content changes, you purge by tag, and all tagged entries are invalidated together.

The origin sets tags via a response header:

Surrogate-Key: article-42 author-123 category-tech

or the equivalent headers used by different vendors:

| Vendor | Header |
|---|---|
| Fastly | Surrogate-Key |
| Cloudflare | Cache-Tag |
| Akamai | Edge-Cache-Tag |
| Varnish | X-Tags (custom, used with bans) or xkey (vmod_xkey) |
| AWS CloudFront | (custom Lambda@Edge) |

The CDN strips this header from responses sent to browsers (it’s a CDN-internal directive) and maintains a tag → URL mapping internally.


Data Structure: Tag → URL Mapping

type TagStore struct {
    mu       sync.RWMutex
    tagToURLs map[string]map[string]struct{}  // tag → set of URLs
    urlToTags map[string][]string             // URL → list of tags
}

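func (s *TagStore) Tag(url string, tags []string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    // Drop mappings left over from a previous tagging of this URL, so a
    // tag the entry no longer carries cannot purge it later.
    for _, old := range s.urlToTags[url] {
        delete(s.tagToURLs[old], url)
    }
    s.urlToTags[url] = tags
    for _, tag := range tags {
        if s.tagToURLs[tag] == nil {
            s.tagToURLs[tag] = make(map[string]struct{})
        }
        s.tagToURLs[tag][url] = struct{}{}
    }
}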

func (s *TagStore) PurgeByTag(tag string) []string {
    s.mu.Lock()
    defer s.mu.Unlock()
    urls := s.tagToURLs[tag]
    purged := make([]string, 0, len(urls))
    for url := range urls {
        purged = append(purged, url)
        // Remove reverse mapping
        for _, t := range s.urlToTags[url] {
            delete(s.tagToURLs[t], url)
        }
        delete(s.urlToTags, url)
    }
    delete(s.tagToURLs, tag)
    return purged
}

The Purge API

POST /cache/purge
Content-Type: application/json

{"tags": ["article-42", "author-123"]}

Response:

{
  "purged_urls": [
    "/article/42",
    "/article/42?format=mobile",
    "/api/v1/articles/42",
    "/feed/latest"
  ],
  "count": 4
}

The CDN removes those entries from L1 and L2 storage. New requests will trigger origin fetches to repopulate.
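A purge endpoint with this request and response shape might look like the sketch below. It is self-contained rather than wired into the TagStore above: tagIndex, purgeHandler, and purgeOnce are hypothetical names, the actual eviction from L1 and L2 is left out, and a real endpoint would require authentication.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// tagIndex is a minimal tag → URL set index.
type tagIndex struct {
	mu        sync.Mutex
	tagToURLs map[string]map[string]struct{}
}

func newTagIndex() *tagIndex {
	return &tagIndex{tagToURLs: map[string]map[string]struct{}{}}
}

func (t *tagIndex) add(url string, tags ...string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, tag := range tags {
		if t.tagToURLs[tag] == nil {
			t.tagToURLs[tag] = map[string]struct{}{}
		}
		t.tagToURLs[tag][url] = struct{}{}
	}
}

// purge removes and returns (sorted) every URL carrying any of the tags.
func (t *tagIndex) purge(tags []string) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	set := map[string]struct{}{}
	for _, tag := range tags {
		for url := range t.tagToURLs[tag] {
			set[url] = struct{}{}
		}
		delete(t.tagToURLs, tag)
	}
	for _, urls := range t.tagToURLs { // drop purged URLs from other tags
		for url := range set {
			delete(urls, url)
		}
	}
	out := make([]string, 0, len(set))
	for url := range set {
		out = append(out, url)
	}
	sort.Strings(out)
	return out
}

type purgeRequest struct {
	Tags []string `json:"tags"`
}

type purgeResponse struct {
	PurgedURLs []string `json:"purged_urls"`
	Count      int      `json:"count"`
}

func purgeHandler(idx *tagIndex) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		var req purgeRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		urls := idx.purge(req.Tags)
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(purgeResponse{PurgedURLs: urls, Count: len(urls)})
	}
}

// purgeOnce sends one in-process request through the handler.
func purgeOnce(idx *tagIndex, body string) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/cache/purge", strings.NewReader(body))
	purgeHandler(idx).ServeHTTP(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	idx := newTagIndex()
	idx.add("/article/42", "article-42", "author-123")
	idx.add("/feed/latest", "article-42")
	fmt.Println(purgeOnce(idx, `{"tags": ["article-42"]}`))
}
```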


Consistency Challenge: Distributed Purge

In a single-node setup, purge is a local operation. In a multi-PoP CDN, a purge request must propagate to every node that may have cached the tagged content.

Approaches:

1. Central purge broadcast

Application → Purge API → Central coordinator
                              │
                    ┌─────────┼─────────┐
                    ▼         ▼         ▼
                 NYC-01    LHR-01    NRT-01

Simple, but the coordinator is a single point of failure. Latency from coordinator to distant PoPs can be 100–200 ms, meaning stale content is served during the propagation window.

2. Gossip-based purge (Lab 14)

Purge messages propagate using epidemic (gossip) protocol. Each node tells a random subset of peers about the purge. Within O(log N) rounds, all nodes are notified. At scale (100+ nodes), gossip is more robust than central broadcast.

3. Versioned cache keys

Embed a content version in the cache key:

cache key = normalize(url) + "|v=" + content_version

When content changes, increment content_version at the application layer. Old entries never get accessed again (they’re naturally evicted). No explicit purge needed. Purge becomes a no-op for versioned content.

This is how most “CDN for static assets” setups work: immutable assets with fingerprinted URLs (main.abc123.js).
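Fingerprinted names are usually derived from the file contents at build time. A minimal sketch, assuming a SHA-256 digest truncated to 8 hex characters (the fingerprint helper and the truncation length are illustrative choices, not a standard):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
	"strings"
)

// fingerprint inserts a short content hash before the file extension, so
// any change to the content produces a new URL and therefore a new cache key.
func fingerprint(name string, content []byte) string {
	sum := sha256.Sum256(content)
	tag := hex.EncodeToString(sum[:])[:8]
	ext := path.Ext(name)
	return strings.TrimSuffix(name, ext) + "." + tag + ext
}

func main() {
	fmt.Println(fingerprint("main.js", []byte("console.log('v1')")))
	fmt.Println(fingerprint("main.js", []byte("console.log('v2')")))
}
```

Because the URL changes whenever the content does, such assets can be served with a very long max-age and the immutable directive, and never need an explicit purge.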


Real-World Usage: CMS Integration

WordPress → publishes article update
   → WP plugin fires: POST /cdn/purge {"tags": ["post-42", "category-8", "tag-php"]}
   → CDN removes:
      - /2025/01/article-about-php
      - /category/php/
      - /tag/php/
      - /                          ← homepage (if it shows recent posts)
      - /sitemap.xml
      - RSS feed entries

Drupal, WordPress, and most CMSs have plugins for Fastly, Cloudflare, and Akamai that fire these purges on content save/publish events.

At The New York Times, Fastly Surrogate-Key purge is used to invalidate all representations of an article simultaneously — the canonical URL, AMP version, app API response, and share preview — with a single purge call containing the article’s surrogate key.


Tag Design Best Practices

| Pattern | Example | Notes |
|---|---|---|
| Entity ID | article-42 | Always tag with entity type + ID |
| Entity type | articles | Purge all articles in one call |
| Author | author-123 | Invalidate author profile changes |
| Category | cat-tech | Category page + all articles in it |
| Layout template | template-homepage | If homepage template changes |
| API version | api-v1 | Deprecating an API endpoint |

Avoid tags that match only a single cache entry or a single client. A tag per user session, for example, is meaningless in a shared CDN cache and only inflates the tag index.


Try It

make lab-09

# Tag gets automatically set on origin responses
curl http://localhost:8080/article/42 -v
# Look for Surrogate-Key in origin response (stripped from CDN response to client)

# Article served from cache (HIT)
curl http://localhost:8080/article/42

# Purge by tag → invalidates all tagged entries
curl -X POST http://localhost:8080/cache/purge \
  -H "Content-Type: application/json" \
  -d '{"tags": ["article-42"]}'

# Next request should be a MISS (content re-fetched from origin)
curl http://localhost:8080/article/42

Lab 10 · Compression

Run it: make lab-10
Source: labs/lab-10-compression/main.go


The Problem

Network bandwidth is neither free nor unlimited. Compressing HTTP responses before delivery:

  1. Reduces latency: smaller payload = faster transfer, especially on mobile networks (LTE: ~20 Mbps, high latency)
  2. Saves egress cost: CDN egress is typically priced per GB ($0.01–0.09/GB), and compression typically achieves a 60–80% size reduction on text
  3. Improves user experience: a 500 KB page compressed to 120 KB loads 4× faster on a 1 Mbps mobile connection

The CDN is uniquely positioned to apply compression because:

  • It has fast CPUs dedicated to edge functions
  • Pre-compressing on cache store amortizes CPU cost over many serves
  • Origin doesn’t need to compress repeatedly for each request

HTTP Content Negotiation

The client advertises its supported encodings:

Accept-Encoding: br, gzip, deflate, zstd;q=0.9

The server selects from the client’s list and responds with:

Content-Encoding: br
Content-Length: 12340
Vary: Accept-Encoding

Quality Values (q-values)

The ;q=N suffix is a preference weight from 0 to 1. The CDN should select the encoding with the highest q-value that it supports:

Accept-Encoding: br;q=1.0, gzip;q=0.9, *;q=0.5
→ Prefer br, then gzip, then any other encoding
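Selection by q-value can be sketched as below. The negotiate helper is hypothetical and makes two assumptions: ties are broken by the server's own preference order, and an empty result means the response is sent uncompressed (identity). It does not implement the rarely used identity;q=0 case.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// negotiate returns the supported encoding with the highest client q-value.
// supported is listed in server preference order, which breaks ties.
func negotiate(acceptEncoding string, supported []string) string {
	weights := map[string]float64{}
	for _, part := range strings.Split(acceptEncoding, ",") {
		fields := strings.Split(part, ";")
		name := strings.ToLower(strings.TrimSpace(fields[0]))
		if name == "" {
			continue
		}
		q := 1.0 // default weight when no q parameter is given
		for _, param := range fields[1:] {
			k, v, ok := strings.Cut(strings.TrimSpace(param), "=")
			if ok && strings.EqualFold(strings.TrimSpace(k), "q") {
				if f, err := strconv.ParseFloat(strings.TrimSpace(v), 64); err == nil {
					q = f
				}
			}
		}
		weights[name] = q
	}
	best, bestQ := "", 0.0
	for _, enc := range supported {
		q, ok := weights[enc]
		if !ok {
			q, ok = weights["*"] // wildcard covers encodings not listed
		}
		if ok && q > bestQ { // q=0 means "not acceptable"
			best, bestQ = enc, q
		}
	}
	return best
}

func main() {
	supported := []string{"br", "gzip"}
	for _, header := range []string{
		"br;q=1.0, gzip;q=0.9, *;q=0.5",
		"gzip, deflate",
		"br;q=0, gzip;q=0.5",
		"",
	} {
		fmt.Printf("%q -> %q\n", header, negotiate(header, supported))
	}
}
```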

The Three Encodings

gzip (RFC 1952 + deflate)

The universal standard. Every HTTP client built since 1997 supports gzip. The format wraps a DEFLATE stream in a small header and a trailer carrying a CRC-32 checksum and the uncompressed length.

compression ratio: ~67% (text)    3 KB HTML → ~1 KB
throughput:        ~400 MB/s (klauspost/compress implementation)

gzip is based on LZ77 sliding window compression + Huffman coding. The sliding window size (8 KB–32 KB) controls compression ratio vs. memory. Larger window = better ratio, more memory.

Brotli (RFC 7932)

Developed by Google, released 2015. Designed specifically for HTTP text compression. Combines LZ77 matching and Huffman coding (the same building blocks as DEFLATE) with second-order context modeling and a built-in static dictionary of common HTML/CSS/JS tokens.

compression ratio: ~82% (text)    3 KB HTML → ~540 bytes  (≈15 percentage points better than gzip)
throughput:        ~300 MB/s
browser support:   all modern browsers (IE 11 and below: no)

Brotli at quality level 11 (max) achieves the best ratio but is very slow to compress (~10 MB/s). CDNs typically use quality 4–6 for on-the-fly compression and quality 11 for pre-compressed static assets.

Zstd (RFC 8878)

Facebook’s Zstandard, released 2016. Extremely fast decompression.

compression ratio: ~70–80% (text)
throughput:        ~500 MB/s compression, ~1.5 GB/s decompression (per core, fast levels)
use case:          origin-to-CDN links, inter-datacenter transfers

Zstd is not yet universally supported in browsers: Chrome and Firefox added Content-Encoding: zstd in 2024, so a gzip or Brotli fallback is still required. Its main CDN use case is on links where both ends are under one operator’s control, such as edge-to-shield or inter-datacenter transfers.


Storage Strategies

1. Store compressed, serve compressed

Store one compressed version per encoding. On request, check Accept-Encoding and serve the matching stored version:

type cacheEntry struct {
    rawBody    []byte   // uncompressed
    gzipBody   []byte   // gzip compressed
    brotliBody []byte   // brotli compressed
}
  • Pros: zero per-request CPU for compression
  • Cons: 2–3× storage overhead (each encoding stored separately)
  • Best for: static assets with long TTL, high request volume

2. Compress on-the-fly

Store uncompressed. Compress each response at serve time:

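// compressResponse writes body to w using the negotiated encoding.
// gzip is compress/gzip; brotli is assumed to be github.com/andybalholm/brotli.
func compressResponse(w io.Writer, body []byte, encoding string) error {
    switch encoding {
    case "br":
        bw := brotli.NewWriterLevel(w, brotli.DefaultCompression)
        if _, err := bw.Write(body); err != nil {
            return err
        }
        return bw.Close() // Close flushes the final block; do not drop its error
    case "gzip":
        gw, err := gzip.NewWriterLevel(w, gzip.BestSpeed)
        if err != nil {
            return err
        }
        if _, err := gw.Write(body); err != nil {
            return err
        }
        return gw.Close() // writes the CRC-32 trailer
    }
    _, err := w.Write(body) // identity: send the bytes unchanged
    return err
}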
  • Pros: 1× storage, always fresh compression
  • Cons: CPU cost per request (roughly 2.5 µs/KB for gzip and 3.3 µs/KB for brotli at the throughputs quoted above)
  • Best for: dynamic content with short TTL, low repetition

3. Pre-compressed at origin

Origin stores pre-compressed files:

/assets/app.js       → not pre-compressed
/assets/app.js.gz    → gzip pre-compressed
/assets/app.js.br    → brotli pre-compressed

CDN serves app.js.gz or app.js.br based on Accept-Encoding. No CPU overhead at CDN. Common for static site CDNs (S3 + CloudFront).


The Vary: Accept-Encoding Requirement

When the CDN stores multiple encodings of the same URL, it must include Vary: Accept-Encoding in responses. This tells downstream caches (browsers, ISP proxies) that the response differs by encoding.

Without Vary, a shared cache (such as an ISP proxy) may store the gzip version and later serve it to a client that never advertised gzip support. That client receives bytes it cannot decode: a corrupted response.

Also: the CDN must maintain separate cache entries keyed by encoding. See Lab 05 for how the cache key is expanded using Vary headers.
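One way to keep the number of variants small is to normalize Accept-Encoding before it enters the cache key, so that "gzip, deflate, br" and "br" share an entry. The sketch below is an illustration only: negotiateEncoding and variantKey are hypothetical names, the key format is an assumption, and wildcard (*) handling is left out for brevity.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// negotiateEncoding picks the encoding the cache will serve for a given
// Accept-Encoding header: br first, then gzip, otherwise identity.
// Encodings listed with q=0 are treated as refused.
func negotiateEncoding(acceptEncoding string) string {
	accepted := map[string]bool{}
	for _, part := range strings.Split(acceptEncoding, ",") {
		fields := strings.Split(part, ";")
		name := strings.ToLower(strings.TrimSpace(fields[0]))
		if name == "" {
			continue
		}
		q := 1.0
		for _, param := range fields[1:] {
			param = strings.TrimSpace(param)
			if strings.HasPrefix(param, "q=") {
				if v, err := strconv.ParseFloat(param[2:], 64); err == nil {
					q = v
				}
			}
		}
		accepted[name] = q > 0
	}
	for _, enc := range []string{"br", "gzip"} {
		if accepted[enc] {
			return enc
		}
	}
	return "identity"
}

// variantKey builds a cache key with one variant per normalized encoding.
func variantKey(url, acceptEncoding string) string {
	return url + "|enc=" + negotiateEncoding(acceptEncoding)
}

func main() {
	for _, ae := range []string{"gzip, deflate, br", "gzip;q=1.0, br;q=0", "", "deflate"} {
		fmt.Printf("%-22q -> %s\n", ae, variantKey("/article/1", ae))
	}
}
```

Normalizing first means the cache holds at most three variants per URL instead of one per distinct header string.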


What Not to Compress

| Content type | Compress? | Reason |
|---|---|---|
| HTML, CSS, JS | ✓ always | Highly redundant text; 60–80% savings |
| JSON APIs | ✓ always | Often compresses 5–10× |
| SVG, XML | ✓ | XML is verbose |
| JPEG, PNG, WebP | ✗ | Already compressed; gzip adds overhead |
| MP4, WebM | ✗ | Already compressed |
| PDF | ✗ | Usually already compressed internally |
| Already gzipped | ✗ | Double-compressing = larger output |
| < 1 KB | Optional | Overhead can exceed savings |

The CDN should check Content-Type before compressing and skip binary formats. Most CDNs have a built-in list of compressible MIME types.
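A minimal sketch of such a check follows. The function name shouldCompress, the 1 KB threshold, and the specific MIME list are assumptions chosen to mirror the table above, not any vendor's actual list.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// minCompressSize is the assumed threshold below which compression is skipped.
const minCompressSize = 1024

// compressibleTypes lists non-text/* media types worth compressing (illustrative).
var compressibleTypes = map[string]bool{
	"application/json":       true,
	"application/javascript": true,
	"application/xml":        true,
	"image/svg+xml":          true,
}

// shouldCompress reports whether a response body is worth compressing,
// based on its Content-Type, any existing Content-Encoding, and its size.
func shouldCompress(contentType, contentEncoding string, size int) bool {
	if contentEncoding != "" && !strings.EqualFold(contentEncoding, "identity") {
		return false // already encoded; compressing again makes it larger
	}
	if size < minCompressSize {
		return false // framing overhead can exceed the savings
	}
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false // unknown or malformed type: leave it alone
	}
	return strings.HasPrefix(mediaType, "text/") || compressibleTypes[mediaType]
}

func main() {
	fmt.Println(shouldCompress("text/html; charset=utf-8", "", 4096))
	fmt.Println(shouldCompress("image/jpeg", "", 4096))
	fmt.Println(shouldCompress("application/json", "gzip", 4096))
}
```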


Compression Savings Calculator

For a site serving 1 TB/month with 70% text responses (700 GB):

gzip saves 67%:     700 GB × 0.67 = 469 GB saved
At $0.05/GB egress: 469 GB × $0.05 = $23.45/month saved

brotli saves 82%:   700 GB × 0.82 = 574 GB saved
At $0.05/GB egress: 574 GB × $0.05 = $28.70/month saved

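Savings scale linearly with text volume: at 1 PB/month with the same traffic mix, the figures above become roughly $23,000 (gzip) and $29,000 (brotli) per month. Video-heavy services such as Netflix and YouTube gain far less on their media bytes, because video is already compressed.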


Try It

make lab-10

# Request with brotli (best compression)
curl http://localhost:8080/article/1 -H "Accept-Encoding: br" \
  -v --output /dev/null 2>&1 | grep -i "content-encoding"

# Request with gzip
curl http://localhost:8080/article/1 -H "Accept-Encoding: gzip" \
  --compressed -v

# No compression (compare sizes)
curl http://localhost:8080/article/1 -H "Accept-Encoding: identity" -v

# Compare content lengths:
for enc in br gzip identity; do
  echo -n "$enc: "
  curl -s http://localhost:8080/article/1 -H "Accept-Encoding: $enc" \
    -o /tmp/response -w "%{size_download} bytes\n"
done

Lab 11 · Range Requests & Byte Serving

Run it: make lab-11
Source: labs/lab-11-range-requests/main.go


The Problem

A user seeks to 00:45:00 in a 4-hour movie. The full file is 4 GB. Without range requests, the client must:

  1. Start streaming from the beginning
  2. Buffer through ~750 MB (45 minutes of a 4-hour, 4 GB file) before reaching the 45-minute mark
  3. Or re-download the entire file after a connection drop

This is obviously unacceptable. HTTP Range Requests (RFC 7233, since folded into RFC 9110) solve this by allowing clients to request specific byte ranges of a resource.


HTTP Range Requests: RFC 7233

The Request

GET /video/movie.mp4 HTTP/1.1
Range: bytes=2097152-4194303

Requesting bytes 2,097,152 to 4,194,303 (a 2 MB chunk starting at the 2 MB mark).

The Response

HTTP/1.1 206 Partial Content
Content-Range: bytes 2097152-4194303/10485760
Content-Length: 2097152
Content-Type: video/mp4

206 Partial Content indicates a successful range request. The full resource size is 10,485,760 (10 MB) in this example.

Byte Range Syntax

| Format | Meaning |
|---|---|
| bytes=0-499 | First 500 bytes |
| bytes=500-999 | Second 500 bytes |
| bytes=-500 | Last 500 bytes |
| bytes=9500- | Bytes from 9500 to end |
| bytes=0-0,-1 | First and last byte |

Multipart Range Responses

A single request can ask for multiple disjoint ranges:

Range: bytes=0-50, 100-150

Response:

HTTP/1.1 206 Partial Content
Content-Type: multipart/byteranges; boundary=3d6b6a416f9b5

--3d6b6a416f9b5
Content-Type: text/plain
Content-Range: bytes 0-50/1270

[51 bytes of data]
--3d6b6a416f9b5
Content-Type: text/plain
Content-Range: bytes 100-150/1270

[51 bytes of data]
--3d6b6a416f9b5--

Multipart ranges are rarely used in practice — most video players request sequential single ranges.

Accept-Ranges Header

The server advertises range request support:

Accept-Ranges: bytes

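If the server sends Accept-Ranges: none, the client knows not to bother with range requests. If the header is absent, the client may still try a range request and must be ready for a plain 200 response with the full body.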


How the CDN Handles Range Requests

Case 1: Full object cached

If the CDN has the full object cached, it can serve any range locally without contacting the origin:

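func serveRange(w http.ResponseWriter, r *http.Request, body []byte) {
    // parseRange is a helper not shown here. This version assumes it returns
    // inclusive offsets plus ok=false for malformed or unsatisfiable ranges.
    start, end, ok := parseRange(r.Header.Get("Range"), len(body))
    if !ok {
        // Tell the client the valid extent and reject the range.
        w.Header().Set("Content-Range", fmt.Sprintf("bytes */%d", len(body)))
        http.Error(w, "range not satisfiable", http.StatusRequestedRangeNotSatisfiable)
        return
    }

    w.Header().Set("Accept-Ranges", "bytes")
    w.Header().Set("Content-Range", fmt.Sprintf("bytes %d-%d/%d", start, end, len(body)))
    w.Header().Set("Content-Length", strconv.Itoa(end-start+1))
    w.WriteHeader(http.StatusPartialContent)
    w.Write(body[start : end+1]) // end is inclusive
}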
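serveRange depends on a range parser that is not shown. As a rough sketch of what such a helper has to handle, the function below parses a single-range header in the three forms from the syntax table. The name parseSingleRange, its signature, and the error values are assumptions for illustration; multi-range headers are deliberately rejected.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var (
	errMalformed     = errors.New("malformed or unsupported range header")
	errUnsatisfiable = errors.New("range not satisfiable")
)

// parseSingleRange parses "bytes=a-b", "bytes=a-" or "bytes=-n" against a
// resource of size bytes and returns inclusive start and end offsets.
func parseSingleRange(header string, size int64) (int64, int64, error) {
	const prefix = "bytes="
	if !strings.HasPrefix(header, prefix) || strings.Contains(header, ",") {
		return 0, 0, errMalformed
	}
	spec := strings.TrimSpace(header[len(prefix):])
	dash := strings.Index(spec, "-")
	if dash < 0 {
		return 0, 0, errMalformed
	}
	first := strings.TrimSpace(spec[:dash])
	last := strings.TrimSpace(spec[dash+1:])

	if first == "" { // suffix form: the last n bytes
		n, err := strconv.ParseInt(last, 10, 64)
		if err != nil || n < 0 {
			return 0, 0, errMalformed
		}
		if n == 0 || size == 0 {
			return 0, 0, errUnsatisfiable
		}
		if n > size {
			n = size
		}
		return size - n, size - 1, nil
	}

	start, err := strconv.ParseInt(first, 10, 64)
	if err != nil || start < 0 {
		return 0, 0, errMalformed
	}
	if start >= size {
		return 0, 0, errUnsatisfiable
	}
	end := size - 1 // open-ended form: through the last byte
	if last != "" {
		end, err = strconv.ParseInt(last, 10, 64)
		if err != nil || end < start {
			return 0, 0, errMalformed
		}
		if end >= size {
			end = size - 1 // clamp to the resource length
		}
	}
	return start, end, nil
}

func main() {
	for _, h := range []string{"bytes=0-499", "bytes=-500", "bytes=9500-", "bytes=20000-"} {
		start, end, err := parseSingleRange(h, 10000)
		fmt.Println(h, "->", start, end, err)
	}
}
```

A caller would typically map errUnsatisfiable to 416 and treat errMalformed by ignoring the Range header and serving the full 200 response.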

Case 2: Object not cached (miss)

The CDN must fetch the object from origin. Two strategies:

  1. Fetch full object: Request the complete file, cache it, serve the requested range. Simpler, but wasteful if the user only watches 5 minutes of a 4-hour movie.

  2. Forward range request: Pass the Range header upstream. The origin returns the exact bytes requested, which the CDN serves and caches. Problem: the CDN now has a partial object cached. Subsequent requests for different ranges must all go to origin.

Most CDNs use a hybrid approach: Range request forwarding with background fetch of the full object. Serve the requested range immediately (low TTFB), fetch the rest in background for future requests.

Case 3: Partial object cached (common in video)

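A popular approach is segment-based caching: the CDN divides the object into fixed-size segments (e.g., 1 MB) and caches each segment independently. A range request is decomposed into the segments it overlaps. Each segment is served from cache or fetched from origin on its own, and only the first and last segments can be partially used.

Segment-based caching of this kind is common in large-file and video delivery products such as Akamai Adaptive Media Delivery; nginx's slice module implements the same idea with fixed-size range subrequests.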
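The mapping from a byte range to segments is simple integer arithmetic. The sketch below assumes 1 MiB segments and a hypothetical segmentsFor helper; it is not taken from the lab or from any vendor's implementation.

```go
package main

import "fmt"

// segment identifies one fixed-size slice of an object in the cache.
type segment struct {
	Index int64 // segment number, starting at 0
	Start int64 // first byte offset covered by the segment (inclusive)
	End   int64 // last byte offset covered by the segment (inclusive)
}

// segmentsFor lists the segments overlapped by the inclusive byte range
// [start, end] of an object that is objectSize bytes long.
func segmentsFor(start, end, objectSize, segSize int64) []segment {
	if segSize <= 0 || start < 0 || end < start || start >= objectSize {
		return nil
	}
	if end >= objectSize {
		end = objectSize - 1
	}
	var segs []segment
	for i := start / segSize; i <= end/segSize; i++ {
		segEnd := (i+1)*segSize - 1
		if segEnd >= objectSize {
			segEnd = objectSize - 1 // the final segment may be short
		}
		segs = append(segs, segment{Index: i, Start: i * segSize, End: segEnd})
	}
	return segs
}

func main() {
	const MiB = 1 << 20
	// The 2 MB range from the earlier request example, against a 10 MB object.
	for _, s := range segmentsFor(2097152, 4194303, 10485760, MiB) {
		fmt.Printf("segment %d: bytes %d-%d\n", s.Index, s.Start, s.End)
	}
}
```

Each segment would be stored under its own cache key (for example the URL plus the segment index), so a later request for an overlapping range reuses whatever is already cached.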


Video Player Behavior

HTML5 video players (<video>) issue range requests with a characteristic pattern:

  1. Initial fetch: Range: bytes=0-65535 (first 64 KB — moov atom for MP4)
  2. Seek: Range: bytes=<offset of target timestamp>-<offset+1MB>
  3. Buffer ahead: sequential range requests slightly ahead of playback
  4. On pause: cancel in-flight range request
  5. On resume: restart from current position

The CDN sees these as a sequence of range requests to the same URL. Caching the full object ensures all of these can be served locally after the first full miss.


Download Resumption

When a large file download is interrupted (connection drop, browser closed):

Resumed download:
GET /downloads/large-file.zip
Range: bytes=52428800-

↑ Resume from exactly where it stopped (50 MB mark)

Without range support, the user restarts the entire download. With range support, the download resumes from where it was.

The CDN must set ETag or Last-Modified on the initial response so the client can validate the resource hasn’t changed before resuming:

If-Range: "abc123"
Range: bytes=52428800-

If-Range says: “If the ETag still matches, resume; otherwise send the full file again.” This prevents serving corrupt data if the file was updated between the initial download and the resume.


Implementation: http.ServeContent

Go’s standard library provides a complete range request implementation:

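func handler(w http.ResponseWriter, r *http.Request) {
    body := getContent(r.URL.Path)
    reader := bytes.NewReader(body)
    // Use the object's real modification time. getModTime is a placeholder
    // for however your cache records it. time.Now() would change on every
    // request and defeat If-Modified-Since and date-based If-Range checks.
    modTime := getModTime(r.URL.Path)
    http.ServeContent(w, r, r.URL.Path, modTime, reader)
}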

http.ServeContent handles:

  • Range header parsing and validation
  • 206 Partial Content responses
  • 304 Not Modified via If-Modified-Since, and via If-None-Match when the handler sets an ETag header
  • Content-Range header generation
  • Multipart ranges

For CDN caching layers, the lab implements manual range handling to show the full mechanics. For production use of static files, http.ServeContent or http.ServeFile are correct choices.


What to Measure

# Ratio of 206 vs 200 responses (high ratio = lots of video/download traffic)
rate(http_responses_total{status="206"}[5m]) /
  rate(http_responses_total{status="200"}[5m])

# Range request cache hit ratio
rate(cache_hits_total{request_type="range"}[5m]) /
  rate(cache_requests_total{request_type="range"}[5m])

# Large object hit ratio (bytes, not requests — often more meaningful)
rate(cache_hit_bytes_total[5m]) /
  rate(cache_total_bytes_total[5m])

Try It

make lab-11

# Full file
curl http://localhost:8080/file/video.mp4 -v

# First 10 KB
curl http://localhost:8080/file/video.mp4 -H "Range: bytes=0-10239" -v

# Last 4 KB
curl http://localhost:8080/file/video.mp4 -H "Range: bytes=-4096" -v

# Resume from 50 MB
curl http://localhost:8080/file/large.bin -H "Range: bytes=52428800-" -v

# With If-Range (ETag-validated resume)
ETAG=$(curl -si http://localhost:8080/file/video.mp4 | grep -i '^etag:' | awk '{print $2}' | tr -d '\r')
curl http://localhost:8080/file/video.mp4 \
  -H "Range: bytes=1024-2047" \
  -H "If-Range: $ETAG" -v

Lab 12 · Consistent Hashing

Run it: make lab-12
Source: labs/lab-12-consistent-hashing/main.go


The Problem

You have a pool of N CDN edge nodes. You want to route each URL to the same node consistently — so the same URL is always cached at the same node, maximizing cache reuse. How do you map URLs to nodes?

The Naive Approach: Modular Hashing

node := hash(url) % N   // assign URL to node index

This works — until you add or remove a node. When N changes to N+1:

hash(url) % N   →  different node for almost every URL

Remapping fraction ≈ N/(N+1). With 10 nodes, adding one node remaps about 91% (10/11) of all cache keys to different nodes. Roughly 91% of your cache is effectively invalidated at once, which sends a thundering herd to your origin.

This is why CDNs don’t use modular hashing for node selection.
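The remapping fraction is easy to check empirically. The sketch below hashes synthetic URLs and counts how many change nodes when the pool grows from 10 to 11; the key format and the use of SHA-256 as the hash are assumptions made purely for the demonstration.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashKey maps a key to a 64-bit value using the first 8 bytes of SHA-256.
func hashKey(key string) uint64 {
	sum := sha256.Sum256([]byte(key))
	return binary.BigEndian.Uint64(sum[:8])
}

// modRemapFraction returns the share of keys whose node changes when the
// pool grows from n to n+1 nodes under hash(key) % n routing.
func modRemapFraction(n uint64, keys int) float64 {
	moved := 0
	for i := 0; i < keys; i++ {
		h := hashKey(fmt.Sprintf("/article/%d", i))
		if h%n != h%(n+1) {
			moved++
		}
	}
	return float64(moved) / float64(keys)
}

func main() {
	fmt.Printf("10 -> 11 nodes: %.1f%% of keys remapped\n", 100*modRemapFraction(10, 50000))
}
```

The result should land close to 10/11, since a key keeps its node only when both moduli happen to agree.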


Consistent Hashing (Karger et al., 1997)

Consistent hashing places both nodes and keys on a virtual ring (a circle with positions 0 to 2^32 or 2^64). A key is assigned to the first node clockwise from the key’s position on the ring.

Ring (0 to 2^32)

         0
    ───────────
   /     N1     \
  │    (pos=15)  │
  │         ●   │
  │              │
  │    ●    ●   │
  │   K1   N2   │
  │              │
  │              │
   \   N3   K2  /
    ─────────────
         max

K1 (pos=45) → first clockwise node → N2 (pos=60)
K2 (pos=90) → first clockwise node → N3 (pos=95)

When a node is added: only the keys that fall between the new node’s position and its predecessor need to move. Expected remapping when growing from N to N+1 nodes: about 1/(N+1) of all keys, which is just the new node’s fair share.

When a node is removed: only keys assigned to that node need to move to the next node. Again, only 1/N remapped.


Virtual Nodes (Vnodes)

With only one ring position per node, the key distribution is uneven — some nodes get more keys than others, especially with few nodes.

The solution: each physical node occupies multiple positions on the ring (virtual nodes). Around 100 vnodes per node is a common choice (the buraksezer/consistent library itself defaults to a ReplicationFactor of 20):

Physical node A → virtual nodes at positions: 15, 234, 567, 891, 1043, ...
Physical node B → virtual nodes at positions: 72, 310, 640, 958, 1200, ...

With 100 vnodes per node and 3 nodes: 300 ring positions. Key distribution becomes approximately uniform (σ ≈ 10% of mean load per node).

Tradeoff: more vnodes = better balance, but more ring metadata to maintain. Even 1000 nodes × 100 vnodes is only 100,000 ring positions, which is still trivial in memory.
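The ring and its vnodes fit in a few dozen lines. The following is a minimal, standard-library-only sketch for experimenting with remapping; the type and function names (ring, locate, movedOnAdd) and the SHA-256 position hash are assumptions, and this is not how buraksezer/consistent is implemented internally.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// hash64 maps a string to a ring position using the first 8 bytes of SHA-256.
func hash64(s string) uint64 {
	sum := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// ring is a minimal consistent-hash ring with virtual nodes.
type ring struct {
	vnodes int
	points []uint64          // sorted vnode positions
	owner  map[uint64]string // position -> physical node
}

func newRing(vnodes int) *ring {
	return &ring{vnodes: vnodes, owner: make(map[uint64]string)}
}

// add places r.vnodes positions for node on the ring.
func (r *ring) add(node string) {
	for i := 0; i < r.vnodes; i++ {
		p := hash64(fmt.Sprintf("%s#%d", node, i))
		r.owner[p] = node
		r.points = append(r.points, p)
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
}

// locate returns the owner of the first vnode clockwise from the key,
// or "" if the ring is empty.
func (r *ring) locate(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hash64(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around past the highest position
	}
	return r.owner[r.points[i]]
}

// movedOnAdd builds a ring of n nodes, adds one more, and reports the share
// of keys whose owner changed and whether every moved key went to the new node.
func movedOnAdd(n, vnodes, keys int) (float64, bool) {
	r := newRing(vnodes)
	for i := 0; i < n; i++ {
		r.add(fmt.Sprintf("node-%d", i))
	}
	before := make([]string, keys)
	for k := range before {
		before[k] = r.locate(fmt.Sprintf("/article/%d", k))
	}
	newNode := fmt.Sprintf("node-%d", n)
	r.add(newNode)
	moved, onlyToNew := 0, true
	for k, old := range before {
		now := r.locate(fmt.Sprintf("/article/%d", k))
		if now != old {
			moved++
			if now != newNode {
				onlyToNew = false
			}
		}
	}
	return float64(moved) / float64(keys), onlyToNew
}

func main() {
	frac, onlyToNew := movedOnAdd(10, 100, 20000)
	fmt.Printf("10 -> 11 nodes: %.1f%% of keys moved, all to the new node: %v\n", 100*frac, onlyToNew)
}
```

Lowering vnodes to 1 and re-running shows how much the moved fraction (and per-node balance) fluctuates without virtual nodes.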


Implementation with buraksezer/consistent

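import (
    "hash/fnv"

    "github.com/buraksezer/consistent"
)

type Member string
func (m Member) String() string { return string(m) }

// hasher satisfies consistent.Hasher. FNV-1a keeps this example within the
// standard library; any well-distributed 64-bit hash works.
type hasher struct{}
func (hasher) Sum64(data []byte) uint64 {
    h := fnv.New64a()
    h.Write(data)
    return h.Sum64()
}

// Create ring
cfg := consistent.Config{
    PartitionCount:    271,   // number of hash-space partitions (prime)
    ReplicationFactor: 40,    // ring positions (vnodes) per member
    Load:              1.25,  // max load relative to the average
    Hasher:            hasher{},
}
c := consistent.New(nil, cfg)

// Add nodes
c.Add(Member("node-1"))
c.Add(Member("node-2"))
c.Add(Member("node-3"))

// Route a key
member := c.LocateKey([]byte(url))  // returns the responsible node
// member.String() → e.g. "node-2"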

Important API note: LocateKey returns a consistent.Member interface, not a (Member, error) pair. On a populated ring it always returns exactly one member. An empty ring has no owner to return, so check the member count (for example len(c.GetMembers()) > 0) before routing instead of relying on the result.


The PartitionCount Parameter

The library’s PartitionCount (not to be confused with Kafka partitions) divides the hash space into PartitionCount fixed slices. Each key hashes to a partition, and each partition is assigned to a member:

PartitionCount:    271  → 271 hash space segments (prime, to spread keys evenly)
ReplicationFactor: 40   → each member is placed at 40 positions on the ring

With 3 members, each member owns about 90 partitions (271/3 ≈ 90.3, so the split cannot be exact). Load: 1.25 caps any single member at roughly 25% above that average.


Applications in CDN Architecture

1. Shield routing (Lab 13)

Origin Shield uses consistent hashing to route all requests for a URL to the same shield PoP. This maximizes the shield’s cache utilization: if 10 edge PoPs all forward misses for /popular-image to the same shield node, that shield node only fetches from origin once.

2. Peer-to-peer CDN (BitTorrent-style)

CDN nodes use consistent hashing to decide which peer to request cached content from before going to origin. Key = object ID, ring = all CDN nodes in a region.

3. Memcache cluster routing

Client-side consistent hashing for memcached clusters. The application client routes each key to the same cache server. Adding a new cache server only remaps 1/N keys (instead of all keys with modular hashing). This was described in Facebook’s 2013 memcache paper.

4. Load balancing with session stickiness

Route users to the same backend server (for session data stored in-process) using consistent hashing on the client IP or session cookie.


Failure Modes

| Failure | Consistent hashing behavior | Plain mod-N behavior |
|---|---|---|
| Add 1 node to 10 | 1/11 keys remap | 10/11 keys remap |
| Remove 1 node from 10 | 1/10 keys remap | 9/10 keys remap |
| Node flapping (add/remove rapidly) | Same 1/N segment shifts each time | Wholesale remapping |
| Uneven key distribution | Vnodes reduce imbalance | N/A |

Hot Keys

Consistent hashing assigns each key to exactly one node. If a key is extremely popular (a viral video URL), one node gets all the traffic.

Solutions:

  1. Consistent hash → multiple replicas: store popular objects on K nodes (the primary plus K-1 successors), distribute reads randomly among them.
  2. Application-level scatter: Nginx upstream zones with least_conn (route to least-loaded backend).
  3. Per-node in-memory cache: popular objects are already in L1 on every node; consistent hashing only affects L2 and origin routing.

Cloudflare’s Argo routing uses a variant: route based on real-time network latency and node load rather than pure hash, accepting the cache inefficiency for better tail latency.


Try It

make lab-12

# Route URLs to nodes — each URL consistently maps to the same node
curl http://localhost:8080/route/article/1
curl http://localhost:8080/route/article/1  # same node every time

# Show distribution
curl http://localhost:8080/stats

# Simulate node removal — minimal remapping
curl -X DELETE http://localhost:8080/nodes/node-2
curl http://localhost:8080/stats  # articles on node-2 moved to next node

Lab 13 · Origin Shield

Run it: make lab-13
Source: labs/lab-13-origin-shield/main.go


The Problem

Consider a CDN with 200 PoPs worldwide, each with an independent cache. Your origin handles normal peak traffic fine: 100 req/s of cache misses.

Then a popular video goes viral. Every PoP simultaneously gets cache misses for that video URL. 200 PoPs × simultaneous misses = 200 simultaneous origin requests. Origin collapses.

Even with singleflight within a single PoP (Lab 06), there’s no deduplication across PoPs. Each PoP independently decides to fetch from the origin.


Origin Shield: A Designated Parent PoP

The solution: designate one PoP as the shield (or parent PoP). All 200 edge PoPs forward their misses to the shield instead of to the origin. The shield may have the cached copy; if not, it fetches from origin once and serves all 200 edge misses from that single fetch.

200 Edge PoPs (all miss simultaneously)
    │
    ├── NYC:  forward to shield
    ├── LHR:  forward to shield
    ├── NRT:  forward to shield
    │   ...
    └── SYD:  forward to shield
             │
             ▼
        Shield PoP (e.g. IAD)
             │
             ├── Shield HIT → serve all 200 edges
             │
             └── Shield MISS → 1 origin request
                      │
                      ▼
                   Origin

Result: 200 origin requests → 1 origin request.

The shield applies singleflight itself: even if 200 edges arrive within a millisecond, the shield collapses all 200 into a single upstream fetch. Combined with shield-level caching, origin sees at most 1 request per content piece per TTL period regardless of CDN scale.


Vendor Implementations

| Vendor | Shield name | Designation |
|---|---|---|
| Fastly | Shielding / POP-to-POP | Any PoP can be shield |
| CloudFront | Origin Shield | Single regional shield |
| Cloudflare | Tiered Cache | Smart Tiering (auto) |
| Akamai | SureRoute / Tiered Distribution | Hierarchical |

Fastly Shielding allows any PoP to be designated as shield, ideally one close to your origin. It is configured per service, and the resulting VCL logic looks roughly like this simplified sketch (the shield backend name is illustrative):

sub vcl_recv {
    if (req.backend == F_origin && !req.http.Fastly-FF) {
        set req.backend = shield:IAD;   # route through IAD shield
    }
}

CloudFront Origin Shield is a dedicated regional tier between the edge PoPs and your origin. You enable it with:

{
  "OriginShield": {
    "Enabled": true,
    "OriginShieldRegion": "us-east-1"
  }
}

CloudFront bills Origin Shield per request (quoted here as $0.0050–$0.0087 per 10,000 requests; rates vary by region, so check current pricing). That is still vastly cheaper than paying for origin infrastructure to handle unshielded traffic.


Implementation

Three-Tier Architecture

Client → Edge (:8080, :8081) → Shield (:8082) → Origin (:9001)

Each tier is a separate Go process. The edge nodes use consistent hashing (Lab 12) to select which shield node handles each URL, and singleflight to collapse concurrent same-key requests within the edge:

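// At the edge, for a cache miss:
result, err, shared := sfGroup.Do(cacheKey, func() (interface{}, error) {
    return fetchFromShield(cacheKey)
})
if err != nil {
    // shield unreachable: serve stale if available, otherwise return 502
}
// shared is true when the same result was also handed to other callers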

The shield does the same before forwarding to origin:

// At the shield, for a cache miss:
result, err, _ := sfGroup.Do(cacheKey, func() (interface{}, error) {
    return fetchFromOrigin(cacheKey)
})
if err != nil {
    // origin failed: fall back to a stale copy (stale-if-error) if one exists
}

The Shield Selection

For a shield tier with multiple shield nodes, use consistent hashing to select which shield handles each URL:

Edge → consistent_hash(url) → ShieldNode-X → Origin

All edges route requests for URL X to the same shield node, maximizing shield cache hit ratio. If a shield node fails, consistent hashing automatically routes to the next node (only 1/N of URLs are remapped).


The Math: Origin Request Reduction

Without origin shield:

E = number of edge PoPs (200)
T = TTL (300 seconds)
R = request rate per URL (1000/s across all PoPs)

Origin requests per URL = E = 200 (on each TTL expiry)

With origin shield:

S = number of shield nodes (2–5 typically)
Origin requests per URL = S = 2–5 (one per shield node per TTL)

Reduction factor: 200 ÷ 3 ≈ 67× fewer origin requests.

In practice, consistent hashing sends each URL to exactly one shield node, and singleflight at that node collapses concurrent misses, so the S requests become 1. Origin sees about one request per URL per TTL regardless of the number of edge PoPs.
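The reduction can be reproduced with a toy model of the tiers. The sketch below is sequential and in-memory, with hypothetical names (tier, originRequests); it only counts requests and does not model TTLs or concurrency, which is where singleflight would come in.

```go
package main

import "fmt"

// tier is a cache layer that falls through to its parent on a miss.
type tier struct {
	name    string
	cache   map[string]string
	parent  *tier // nil for the origin
	fetches int   // number of requests this tier received
}

func newTier(name string, parent *tier) *tier {
	return &tier{name: name, cache: map[string]string{}, parent: parent}
}

// get returns the object for url, filling the cache from the parent on a miss.
func (t *tier) get(url string) string {
	t.fetches++
	if t.parent == nil {
		return "body of " + url // the origin always has the object
	}
	if body, ok := t.cache[url]; ok {
		return body
	}
	body := t.parent.get(url)
	t.cache[url] = body
	return body
}

// originRequests simulates every edge PoP missing on the same URL once and
// returns how many requests reach the origin.
func originRequests(edges int, useShield bool) int {
	origin := newTier("origin", nil)
	upstream := origin
	if useShield {
		upstream = newTier("shield", origin)
	}
	for i := 0; i < edges; i++ {
		edge := newTier(fmt.Sprintf("edge-%d", i), upstream)
		edge.get("/video/viral.mp4")
	}
	return origin.fetches
}

func main() {
	fmt.Println("without shield:", originRequests(200, false)) // 200
	fmt.Println("with shield:   ", originRequests(200, true))  // 1
}
```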


Shield Latency Tradeoff

Origin shielding adds one network hop. Edge → Shield adds latency:

Without shield: Edge → Origin = 150 ms
With shield:    Edge → Shield → Origin = 5 ms + 150 ms = 155 ms

5 ms overhead for the edge-to-shield hop (same region, dedicated link). The tradeoff is worth it because:

  1. The vast majority of requests are cache hits at either edge or shield
  2. The extra hop only lengthens the small fraction of requests that miss both tiers; shield hits replace a 150 ms origin fetch with a 5 ms one

For a well-shielded CDN serving popular content:

Hit ratio at edge:    85%  → 0 ms upstream
Hit ratio at shield:  12%  → 5 ms upstream
Cache miss:            3%  → 155 ms (5 + 150)

Average upstream latency with shield:    0.85×0 + 0.12×5 + 0.03×155 = 5.25 ms
Average upstream latency without shield: 0.85×0 + 0.15×150 = 22.5 ms

In this example the shield protects the origin and also lowers average upstream latency; only the 3% of requests that miss both tiers pay the extra 5 ms.


Failure Modes

| Failure | Behavior without shield | Behavior with shield |
|---|---|---|
| Origin spike | 200 PoPs × misses = 200 requests | 1–3 shield requests |
| Origin down | 200 PoPs serve stale or error | 1–3 shield requests (stale-if-error) |
| Shield node down | Edge falls back to origin directly | Consistent hash routes to next node |
| Cache invalidation | Must purge every edge | Must purge shield and edges; purge the shield first so edges cannot refill from a stale shield copy |

Try It

make lab-13

# Start all three tiers
# Lab automatically starts edge1(:8080), edge2(:8081), shield(:8082), origin(:9001)

# Request through edge 1
curl http://localhost:8080/article/1 -H "X-Debug: tiers"
# Response should show: Edge MISS → Shield MISS → Origin HIT

# Same request through edge 2 (different PoP)
curl http://localhost:8081/article/1 -H "X-Debug: tiers"
# Should show: Edge MISS → Shield HIT (shield already has it)

# Repeat both — edges should be HIT now
curl http://localhost:8080/article/1
curl http://localhost:8081/article/1

Lab 14 · Gossip Cluster & Distributed Purge

Run it: make lab-14
Source: labs/lab-14-gossip-cluster/main.go


The Problem

You have 100 CDN edge nodes. Content changes. You need every node to know about the invalidation within seconds.

Why Not a Central Coordinator?

A single “purge coordinator” that notifies all nodes:

Application → Coordinator → [Node1, Node2, ..., Node100]

Problems:

  1. Single point of failure: coordinator down = no purges propagate
  2. O(N) work per purge: coordinator sends 100 messages
  3. N connection overhead: coordinator maintains 100 persistent connections
  4. Thundering herd on coordinator: during deployments, thousands of purges
  5. Partitioned PoPs: nodes behind a network partition miss purges silently

The solution used by Cassandra, CockroachDB, and Cloudflare’s edge network: gossip protocol (epidemic dissemination).


Gossip Protocol: Epidemic Dissemination

Named by analogy to biological epidemics: one infected node tells a few others, who each tell a few more. Within O(log N) rounds, all nodes are informed.

Algorithm:

Round 1: Node A has new information
  → A tells: B, E

Round 2: B and E spread:
  → B tells: C, F
  → E tells: G, D

Round 3: C, F, G, D spread:
  → C tells: H, I
  → F tells: J, K
  → G tells: L, M
  → D tells: N, A (A already knows)

With 100 nodes and fanout=3: propagates to all in roughly log₃(100) ≈ 4.2 rounds (the trace above uses fanout=2, which needs about log₂(100) ≈ 6.6)

Each round is a small fixed-cost message. Total network messages per gossip cycle: O(N log N). Compare to broadcast: O(N). Gossip is slightly more expensive per event but far more resilient, because no single node or link is required for the message to get through.


hashicorp/memberlist

The memberlist library implements the SWIM protocol (Scalable Weakly-consistent Infection-style Process Group Membership protocol) with enhancements from “Lifeguard”:

  • Member discovery: nodes find each other via gossip
  • Failure detection: probe + indirect probe to detect crashes
  • Broadcast: attach arbitrary data to membership messages (e.g., cache purge events)
import (
    "log"
    "time"

    "github.com/hashicorp/memberlist"
)

// Configure the local node
config := memberlist.DefaultLocalConfig()
config.Name     = "edge-node-1"
config.BindAddr = "0.0.0.0"
config.BindPort = 7946
config.Delegate = &myDelegate{}  // receives user data
config.Events   = &myEventDelegate{}  // membership change callbacks

list, err := memberlist.Create(config)
if err != nil {
    log.Fatalf("memberlist create: %v", err)
}

// Join an existing cluster; Join returns how many seed nodes were contacted
if _, err := list.Join([]string{"edge-node-2:7946", "edge-node-3:7946"}); err != nil {
    log.Printf("join failed: %v", err)
}

// Node metadata comes from the delegate's NodeMeta method.
// UpdateNode re-reads it and gossips the new value to the cluster.
if err := list.UpdateNode(5 * time.Second); err != nil {
    log.Printf("update node: %v", err)
}

// Or use the TransmitLimitedQueue for arbitrary messages
queue := &memberlist.TransmitLimitedQueue{
    NumNodes:       func() int { return list.NumMembers() },
    RetransmitMult: 3,
}
// The ID would normally be a fresh UUID; "purge-1" is a placeholder
queue.QueueBroadcast(&PurgeMessage{ID: "purge-1", Tags: []string{"article-42"}})

Implementing Distributed Cache Purge

1. Purge message format

type PurgeMessage struct {
    ID        string    `json:"id"`        // UUID for deduplication
    Tags      []string  `json:"tags"`
    URLs      []string  `json:"urls"`
    Origin    string    `json:"origin"`    // which node originated the purge
    Timestamp time.Time `json:"ts"`
}

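// memberlist.Broadcast interface. Purges are independent events, so a newly
// queued purge must never replace one that is still waiting to be sent.
func (m *PurgeMessage) Invalidates(other memberlist.Broadcast) bool { return false }
func (m *PurgeMessage) Message() []byte   { return mustMarshal(m) }
func (m *PurgeMessage) Finished()         {}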

2. The Delegate

The memberlist.Delegate interface is how you plug in custom logic:

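type cacheDelegate struct {
    queue *memberlist.TransmitLimitedQueue
    seen  sync.Map  // deduplication: message ID → struct{} (expire old IDs in production)
}

func (d *cacheDelegate) NotifyMsg(b []byte) {
    var msg PurgeMessage
    if err := json.Unmarshal(b, &msg); err != nil {
        return // drop malformed messages
    }

    // Deduplication: skip messages we've already processed
    if _, loaded := d.seen.LoadOrStore(msg.ID, struct{}{}); loaded {
        return
    }

    // Apply the purge locally
    for _, tag := range msg.Tags { localCache.PurgeByTag(tag) }
    for _, url := range msg.URLs { localCache.Delete(url) }

    // Re-queue on first receipt so this node helps spread the purge;
    // memberlist does not rebroadcast user messages on its own.
    d.queue.QueueBroadcast(&msg)
}

func (d *cacheDelegate) GetBroadcasts(overhead, limit int) [][]byte {
    return d.queue.GetBroadcasts(overhead, limit)
}

// Remaining methods required by memberlist.Delegate; no-ops in this sketch.
func (d *cacheDelegate) NodeMeta(limit int) []byte              { return nil }
func (d *cacheDelegate) LocalState(join bool) []byte            { return nil }
func (d *cacheDelegate) MergeRemoteState(buf []byte, join bool) {}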

3. Gossip anti-entropy

Beyond event-driven purge, gossip implements anti-entropy: nodes periodically compare state with a random peer and reconcile differences. This catches missed messages (due to network partitions, node restarts, message drops under load).

Every 30s:
  → Node A picks random peer B
  → A sends a digest of its cache state (Bloom filter or version vectors)
  → B responds with any items A is missing
  → A applies the delta

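This gives eventual consistency: even if a purge message is dropped, an anti-entropy exchange will catch the discrepancy, typically within a few 30-second rounds (a single round only helps if the randomly chosen peer already has the missing purge).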
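The exchange in the steps above can be sketched with plain sets of purge IDs. The names purgeLog, deltaFor, and antiEntropyRound are hypothetical, and sending the full ID list as the digest is a simplification; a production system would use something more compact such as a Bloom filter or version vectors.

```go
package main

import (
	"fmt"
	"sort"
)

// purgeLog is the set of purge IDs a node has applied.
type purgeLog map[string]struct{}

// digest returns the sorted IDs a node advertises to its peer.
func (p purgeLog) digest() []string {
	ids := make([]string, 0, len(p))
	for id := range p {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// deltaFor returns the IDs held by p that are absent from the requester's digest.
func (p purgeLog) deltaFor(requesterDigest []string) []string {
	have := make(map[string]struct{}, len(requesterDigest))
	for _, id := range requesterDigest {
		have[id] = struct{}{}
	}
	var delta []string
	for id := range p {
		if _, ok := have[id]; !ok {
			delta = append(delta, id)
		}
	}
	sort.Strings(delta)
	return delta
}

// antiEntropyRound models one exchange: a sends its digest to b, b answers
// with the purges a is missing, and a applies them. It returns the delta.
func antiEntropyRound(a, b purgeLog) []string {
	delta := b.deltaFor(a.digest())
	for _, id := range delta {
		a[id] = struct{}{} // a real node would also evict the affected objects
	}
	return delta
}

func main() {
	a := purgeLog{"purge-1": {}, "purge-2": {}}
	b := purgeLog{"purge-1": {}, "purge-2": {}, "purge-3": {}}
	fmt.Println("A learned:", antiEntropyRound(a, b))
	fmt.Println("A now has:", a.digest())
}
```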


SWIM Protocol: Failure Detection

SWIM’s failure detection is probabilistic but fast:

1. Every T_probe seconds: node A probes random node B with a ping
2. If B doesn't respond within T_timeout:
   → A asks K other random nodes to probe B indirectly (indirect probe)
3. If no indirect probe succeeds:
   → A marks B as SUSPECT, gossips the suspicion
4. If B doesn't refute (send alive message) within T_suspect:
   → B is declared DEAD, gossipped as such
5. Dead members are removed from the membership list

This gives O(1) probe messages per node and detects failures in ~3–5 seconds with default settings. Compare to a central heartbeat system: O(N) messages per probe cycle.


Push-Pull Gossip for State Synchronization

memberlist also implements push-pull gossip:

Node A pushes its full local state to random node B
Node B pushes its full local state back to A
Both reconcile differences

This is more expensive (full state exchange) but converges faster for new nodes joining the cluster. Frequency: once per 30–60 seconds.

For cache invalidation: push-pull can sync the full set of currently-valid cache tags, ensuring a node that was offline for 5 minutes catches up on all purges it missed.


Real-World: Cloudflare’s Cache Purge

Cloudflare’s cache purge propagates across 300+ PoPs using a gossip-adjacent system. Their 2022 blog post describes how a purge request:

  1. Hits Cloudflare’s API endpoint
  2. Is distributed via their internal notification system (similar to gossip)
  3. Reaches all PoPs within 150ms for 95th percentile

At Cloudflare scale, this requires highly optimized serialization (Protocol Buffers), binary gossip protocols, and infrastructure tuned for low-latency small-message delivery.


Try It

make lab-14

# Three nodes form a gossip cluster automatically
# Look for "Cluster formed: 3 members" in the output

# Issue a purge on node 1
curl -X POST http://localhost:8080/cache/purge \
  -H "Content-Type: application/json" \
  -d '{"tags": ["article-42"]}'

# Within ~100ms, the purge propagates to nodes 2 and 3
# Verify by checking their cache state:
curl http://localhost:8081/cache/stats
curl http://localhost:8082/cache/stats
# Both should show article-42 as purged

Lab 15 · Geographic Routing & PoP Failover

Run it: make lab-15
Source: labs/lab-15-geo-routing/main.go


The Problem

A CDN node in Singapore is of little use to a user in Berlin. Round-trip latency on a Singapore → Berlin path is roughly 160 ms. A Frankfurt PoP would serve Berlin in ~5 ms.

Geographic routing — directing each user to the nearest CDN PoP — is one of the most impactful optimizations in CDN infrastructure. The difference between 160 ms and 5 ms TTFB is the difference between a bounced visitor and a retained one.


Routing Mechanisms

1. Anycast BGP (used by Cloudflare, Fastly)

The same IP address is announced from every PoP via BGP. Internet routing automatically directs packets to the topologically nearest PoP:

203.0.113.0/24 (documentation prefix, used here as an example) announced from:
  - Frankfurt PoP → European users reach Frankfurt
  - Tokyo PoP → Asian users reach Tokyo
  - Chicago PoP → US Midwest users reach Chicago

BGP anycast routing is handled entirely by the internet’s routing infrastructure. The CDN operator’s job is to configure BGP announcements correctly and monitor AS path lengths.

Advantage: Zero application-level routing logic. Failover is automatic (BGP withdraws the broken PoP’s announcement).

Disadvantage: BGP convergence is slow (~30–180 seconds for a prefix withdrawal to propagate globally). A PoP that goes down may continue receiving traffic for minutes.

DNS-level failover is faster (~30 seconds with low TTL), but requires additional coordination.

2. GeoDNS (used by many second-tier CDNs)

DNS returns different IP addresses based on the client’s IP’s geographic region:

User from Germany resolves cdn.example.com:
  → DNS returns 203.0.113.10 (Frankfurt PoP)

User from Japan resolves cdn.example.com:
  → DNS returns 203.0.113.20 (Tokyo PoP)

Advantage: Simple to implement; works with any CDN infrastructure.

Disadvantage: DNS caching (TTL 60s–300s) means failover is slow. During failover, users who cached the old IP are still routed to the dead PoP and see timeouts or connection refused until the TTL expires.

3. Application-Layer Routing (HTTP Redirect)

User → cdn.example.com → Routing server
                           → 302 Redirect to "ams01.cdn.example.com"

This lab implements application-layer routing. A routing server receives all requests, calculates the optimal PoP, and either redirects or proxies to it.


Haversine Distance Calculation

The lab computes geographic distance using the haversine formula, which gives the great-circle distance between two points on a sphere:

func haversine(lat1, lon1, lat2, lon2 float64) float64 {
    const R = 6371 // Earth radius in km
    
    φ1 := lat1 * math.Pi / 180
    φ2 := lat2 * math.Pi / 180
    Δφ := (lat2 - lat1) * math.Pi / 180
    Δλ := (lon2 - lon1) * math.Pi / 180
    
    a := math.Sin(Δφ/2)*math.Sin(Δφ/2) +
         math.Cos(φ1)*math.Cos(φ2)*
         math.Sin(Δλ/2)*math.Sin(Δλ/2)
    
    c := 2 * math.Atan2(math.Sqrt(a), math.Sqrt(1-a))
    return R * c // distance in km
}

Given client location, find the closest PoP:

// nearestPoP returns the closest healthy PoP, or nil if none is healthy.
func nearestPoP(clientLat, clientLon float64, pops []PoP) *PoP {
    var nearest *PoP
    minDist := math.MaxFloat64
    for i := range pops {
        pop := &pops[i] // take a pointer: PoP holds an atomic.Bool and must not be copied
        if !pop.healthy.Load() { continue }  // skip unhealthy PoPs
        d := haversine(clientLat, clientLon, pop.Lat, pop.Lon)
        if d < minDist {
            minDist = d
            nearest = pop
        }
    }
    return nearest
}

The 5 PoPs

The lab simulates 5 geographically distributed PoPs:

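| PoP | City      | Coords             | Port  |
|-----|-----------|--------------------|-------|
| NYC | New York  | 40.71°N, 74.00°W   | :9010 |
| LHR | London    | 51.51°N, 0.13°W    | :9011 |
| NRT | Tokyo     | 35.65°N, 139.76°E  | :9012 |
| SYD | Sydney    | 33.87°S, 151.21°E  | :9013 |
| GRU | São Paulo | 23.43°S, 46.47°W   | :9014 |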

Health Checking & Failover

Each PoP exposes a /health endpoint. The router runs periodic health checks:

type PoP struct {
    Name    string
    Addr    string
    Lat     float64
    Lon     float64
    healthy atomic.Bool
}

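func (r *Router) healthCheckLoop() {
    client := &http.Client{Timeout: 2 * time.Second} // a hung PoP must not stall the check
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        for i := range r.pops {
            pop := &r.pops[i]
            go func() {
                resp, err := client.Get(pop.Addr + "/health")
                if err != nil {
                    pop.healthy.Store(false)
                    return
                }
                resp.Body.Close() // release the connection back to the pool
                pop.healthy.Store(resp.StatusCode == http.StatusOK)
            }()
        }
    }
}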

atomic.Bool for the health state means reads in the routing hot path require no lock. Health checks run concurrently with requests; a false health state is propagated within one health-check interval.

When the nearest PoP is unhealthy, routing falls back to the next-nearest healthy PoP automatically.
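
The fallback can be made explicit by ranking every healthy PoP by distance instead of tracking only the minimum. The sketch below is self-contained and uses its own simplified `site` type (a plain `Healthy bool` rather than the lab's `atomic.Bool`), so treat the names as illustrative.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// site is a simplified, copyable stand-in for the lab's PoP type.
type site struct {
	Name     string
	Lat, Lon float64
	Healthy  bool
}

// haversineKm returns the great-circle distance in kilometres.
func haversineKm(lat1, lon1, lat2, lon2 float64) float64 {
	const earthRadiusKm = 6371.0
	rad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat, dLon := rad(lat2-lat1), rad(lon2-lon1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(rad(lat1))*math.Cos(rad(lat2))*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusKm * math.Atan2(math.Sqrt(a), math.Sqrt(1-a))
}

// rankByDistance returns the healthy sites ordered nearest-first.
func rankByDistance(lat, lon float64, sites []site) []site {
	var ranked []site
	for _, s := range sites {
		if s.Healthy {
			ranked = append(ranked, s)
		}
	}
	sort.Slice(ranked, func(i, j int) bool {
		return haversineKm(lat, lon, ranked[i].Lat, ranked[i].Lon) <
			haversineKm(lat, lon, ranked[j].Lat, ranked[j].Lon)
	})
	return ranked
}

func main() {
	sites := []site{
		{"NYC", 40.71, -74.00, true},
		{"LHR", 51.51, -0.13, false}, // simulate a failed London PoP
		{"NRT", 35.65, 139.76, true},
		{"SYD", -33.87, 151.21, true},
		{"GRU", -23.43, -46.47, true},
	}
	// A London client: print the fallback order with distances.
	for _, s := range rankByDistance(51.51, -0.13, sites) {
		fmt.Printf("%s %.0f km\n", s.Name, haversineKm(51.51, -0.13, s.Lat, s.Lon))
	}
}
```

Keeping the full ranking also gives the router a ready-made retry list if the first choice fails between health checks.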


Real-World PoP Selection

Geographic distance is a proxy for network latency, but not a perfect one. BGP path length, network peering relationships, and inter-AS latency can cause a geographically farther PoP to have lower latency.

Production CDNs use active latency measurements:

  • Cloudflare Argo: routes traffic based on real-time network telemetry measured across the actual internet paths between PoPs
  • Fastly: uses Anycast BGP (network handles routing) plus performance-based override for known poor paths
  • AWS CloudFront: uses latency-based routing in Route 53

The haversine approach in this lab is a good approximation (within ~20% of actual latency in most cases) and zero-overhead at runtime.


Client Location Detection

In production, client location comes from:

  1. IP geolocation: MaxMind GeoLite2 database or IP-API, maps IP → country/city/coords
  2. CDN headers: Cloudflare adds CF-IPCountry by default; CF-IPCity, CF-IPLatitude, and CF-IPLongitude are added once the "Add visitor location headers" Managed Transform is enabled
  3. GPS/browser API: browser can provide precise location (user permission required)
  4. CDN PoP metadata: the PoP itself knows its geographic location; route users to the PoP they connected to

The lab accepts lat/lon as query parameters for testability.


PoP Infrastructure Design

When selecting where to locate PoPs, the key criteria are:

  1. Internet Exchange Points (IXPs): co-locate at major IXPs (DE-CIX Frankfurt, AMS-IX Amsterdam, LINX London) for direct peering with hundreds of ISPs, reducing latency and cost
  2. Traffic density: PoPs near large populations (NYC, London, Tokyo, São Paulo, Mumbai) serve the most users
  3. Data center tier: Tier III or higher (Tier III targets roughly 99.982% availability, Tier IV roughly 99.995%, with redundant power/cooling)
  4. Network diversity: multiple transit providers per PoP prevents single-provider outages from taking down the PoP

Try It

make lab-15

# Route a request from NYC (40.71, -74.00) — should go to NYC PoP
curl "http://localhost:8080/?lat=40.71&lon=-74.00" -v

# Route from London (51.51, -0.13) — should go to LHR PoP
curl "http://localhost:8080/?lat=51.51&lon=-0.13" -v

# Route from Tokyo — should go to NRT PoP
curl "http://localhost:8080/?lat=35.65&lon=139.76" -v

# Simulate LHR failure — London user should reroute to nearest healthy PoP
curl -X DELETE "http://localhost:8080/pops/LHR"
curl "http://localhost:8080/?lat=51.51&lon=-0.13" -v
# Should now route to NYC (the next-closest healthy PoP, roughly 5,570 km away)

Lab 16 · Signed URLs & Token Authentication

Run it: make lab-16
Source: labs/lab-16-signed-urls/main.go


The Problem

Public CDN caching works for content anyone can access. But what about:

  • A Netflix video: only the paying subscriber should be able to access it
  • A signed download link: expires in 1 hour
  • A presigned S3 URL: locked to a specific IP address
  • A livestream: viewers who joined must stay authorized, not share URLs

The CDN must enforce authorization at the edge, before delivering content, without calling the origin for every request (that would destroy the CDN’s performance advantage).

Signed URLs solve this: the application server generates a URL that contains a cryptographic signature. The CDN verifies the signature without contacting the origin.


HMAC-SHA256: The Signature Primitive

HMAC (Hash-based Message Authentication Code) uses a secret key and a hash function (SHA-256 here) to produce an authentication code:

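HMAC-SHA256(key, message) = H((key' XOR opad) || H((key' XOR ipad) || message))

where key' is the key padded (or first hashed, if too long) to the 64-byte SHA-256 block size, and ipad/opad are the constant bytes 0x36 and 0x5c repeated.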

Properties:

  • Unforgeability: without the key, it’s computationally infeasible to produce a valid MAC for a different message
  • Key-binding: same message + different key → different MAC
  • Non-collision: different messages → different MACs (with overwhelmingly high probability)

This is the same primitive used by JWT (HS256 variant), AWS Signature V4, and cookie signing in Django/Rails.

import (
    "crypto/hmac"
    "crypto/sha256"
    "encoding/hex"
)

func sign(key []byte, message string) string {
    mac := hmac.New(sha256.New, key)
    mac.Write([]byte(message))
    return hex.EncodeToString(mac.Sum(nil))
}

// Verify — always use hmac.Equal, never ==
func verify(key []byte, message, signature string) bool {
    expected := sign(key, message)
    // hmac.Equal is constant-time to prevent timing attacks
    return hmac.Equal([]byte(signature), []byte(expected))
}

The Canonical String

The signature must cover all inputs that should be tamper-proof. The lab uses:

func canonicalString(method, path string, expires int64, clientIP string) string {
    // Deterministic: same inputs always produce same string
    return fmt.Sprintf("%s\n%s\n%d\n%s", 
        strings.ToUpper(method),  // GET
        path,                      // /video/movie.mp4
        expires,                   // Unix timestamp
        clientIP,                  // 1.2.3.4 (or "" if not IP-bound)
    )
}

This canonical string is signed. The URL then carries:

/video/movie.mp4?expires=1735689600&ip=1.2.3.4&sig=a1b2c3d4...&keyver=v2

What to include in the canonical string

| Parameter        | Include?    | Why                                                 |
|------------------|-------------|-----------------------------------------------------|
| HTTP method      | Recommended | Prevent GET token being used for DELETE             |
| URL path         | Required    | Prevent token for /video/1 being used for /video/2  |
| Expiry timestamp | Required    | Time-bound the token                                |
| Client IP        | Optional    | IP-locked tokens prevent sharing; breaks VPNs       |
| Content type     | Optional    | Prevent download link being used for upload         |
| Key version      | Via URL     | Enables rotation without invalidating all tokens    |

Timing-Safe Comparison: hmac.Equal

Never use == or bytes.Equal to compare HMACs. These perform byte-by-byte comparison and short-circuit on the first mismatch.

An attacker can measure response time to determine how many bytes of their forged signature match the real signature (timing oracle). With enough requests:

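sig[0] correct?  → ~200 ns (comparison continues to sig[1])
sig[0] wrong?    → ~100 ns (short-circuits immediately)

→ Guess one byte at a time (up to 256 candidates per byte) → recover a 32-byte signature in at most 256×32 = 8,192 guesses instead of 2^256
  (in practice each guess needs many timing samples to rise above network noise)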

hmac.Equal (and subtle.ConstantTimeCompare) always compare the full input regardless of where the first mismatch is, eliminating the timing oracle:

// WRONG — timing oracle vulnerability
if signature != expected {
    return false
}

// CORRECT — constant-time comparison
if !hmac.Equal([]byte(signature), []byte(expected)) {
    return false
}

This is OWASP Top 10 territory (A07: Identification and Authentication Failures).


Key Rotation

Secrets must be rotatable without invalidating all outstanding tokens. The URL carries a keyver parameter:

/video/movie.mp4?sig=abc123&keyver=v1&expires=...
/video/movie.mp4?sig=xyz789&keyver=v2&expires=...

The CDN maintains multiple keys:

var signingKeys = map[string][]byte{
    "v1": []byte("old-secret-key"),      // still accepted for in-flight URLs
    "v2": []byte("new-secret-key-2025"), // current signing key
}

func verifySignedURL(r *http.Request) bool {
    keyver := r.URL.Query().Get("keyver")
    key, ok := signingKeys[keyver]
    if !ok { return false }
    
    // Verify with the key for this version
    return hmac.Equal(
        []byte(computeExpectedSig(key, r)),
        []byte(r.URL.Query().Get("sig")),
    )
}

Rotation procedure:

  1. Generate new key; add as v2 to CDN config (old key still active)
  2. Configure application server to sign new URLs with v2
  3. Wait for all outstanding v1 tokens to expire (or force-expire them)
  4. Remove v1 from CDN config
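
The four rotation steps map onto a small key ring. The sketch below is one possible shape, with hypothetical names (`keyRing`, `Add`, `Retire`); the lab itself uses the plain map shown earlier.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// keyRing holds every key the edge still accepts, plus the version used for new signatures.
type keyRing struct {
	mu      sync.RWMutex
	keys    map[string][]byte
	current string
}

func newKeyRing(version string, key []byte) *keyRing {
	return &keyRing{keys: map[string][]byte{version: key}, current: version}
}

// Add registers a new key and makes it the signing key (steps 1 and 2).
func (k *keyRing) Add(version string, key []byte) {
	k.mu.Lock()
	defer k.mu.Unlock()
	k.keys[version] = key
	k.current = version
}

// Retire drops an old key once its tokens have expired (step 4).
// The current signing key cannot be retired.
func (k *keyRing) Retire(version string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	if version != k.current {
		delete(k.keys, version)
	}
}

func hmacHex(key []byte, msg string) string {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(msg))
	return hex.EncodeToString(m.Sum(nil))
}

// Sign returns the signature and the key version that produced it.
func (k *keyRing) Sign(msg string) (sig, version string) {
	k.mu.RLock()
	defer k.mu.RUnlock()
	return hmacHex(k.keys[k.current], msg), k.current
}

// Verify accepts any key version still present in the ring.
func (k *keyRing) Verify(msg, sig, version string) bool {
	k.mu.RLock()
	key, ok := k.keys[version]
	k.mu.RUnlock()
	if !ok {
		return false
	}
	return hmac.Equal([]byte(sig), []byte(hmacHex(key, msg)))
}

func main() {
	ring := newKeyRing("v1", []byte("old-secret-key"))
	sig, ver := ring.Sign("GET\n/video/movie.mp4\n1735689600\n")
	ring.Add("v2", []byte("new-secret-key"))
	fmt.Println("v1 token valid during overlap:", ring.Verify("GET\n/video/movie.mp4\n1735689600\n", sig, ver))
	ring.Retire("v1")
	fmt.Println("v1 token valid after retire:", ring.Verify("GET\n/video/movie.mp4\n1735689600\n", sig, ver))
}
```

The overlap window between `Add` and `Retire` should be at least as long as the longest token TTL you issue.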

Expiry Validation

func checkExpiry(r *http.Request) bool {
    expiresStr := r.URL.Query().Get("expires")
    expires, err := strconv.ParseInt(expiresStr, 10, 64)
    if err != nil { return false }
    
    return time.Now().Unix() < expires
}

Clock skew: CDN nodes across PoPs may have slight clock differences. Add a small grace period (30–60 seconds) to tolerate this:

return time.Now().Unix() < expires + 60  // 60-second grace window

IP Binding

Binding a signed URL to the client’s IP prevents the link from being shared. When a user logs into your streaming service and requests a video URL:

Application: token = sign(path, expires, clientIP="1.2.3.4")
CDN: verify(path, expires, clientIP=request.RemoteAddr)

If the user shares the URL with a friend (IP 5.6.7.8), the CDN rejects with 403 Forbidden.

Tradeoff: IP binding misbehaves for users on:

  • VPNs (IP changes between token generation and use)
  • Mobile networks (IP changes during handoff)
  • Large corporate NAT (all employees share one IP, so one user’s token is usable by every colleague and the binding adds no protection)

Most streaming services IP-bind only for high-value content or use short-TTL tokens (15 minutes) instead.


Vendor Implementation

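| Vendor         | Signed URL mechanism                          |
|----------------|-----------------------------------------------|
| Cloudflare     | Signed URLs + Token Auth (Workers or built-in) |
| Fastly         | Signed tokens via VCL                         |
| AWS CloudFront | Signed URLs (RSA) or Signed Cookies           |
| Akamai         | Edge Auth Token                               |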

AWS CloudFront uses RSA signatures (asymmetric): the application signs with a private key; CloudFront verifies with the public key. This means the CDN never needs to know the private key — useful when you don’t fully trust the CDN operator with the signing secret.


Try It

make lab-16

# Generate a signed URL (the lab provides a /sign endpoint for testing)
SIGNED=$(curl -s "http://localhost:8080/sign?path=/video/movie.mp4&ttl=300")
echo "Signed URL: $SIGNED"

# Access with valid signature
curl "$SIGNED" -v

# Access without signature — should be 403
curl "http://localhost:8080/video/movie.mp4" -v

# Expired token (manipulate the expires param)
EXPIRED=$(echo "$SIGNED" | sed 's/expires=[0-9]*/expires=1000000000/')
curl "$EXPIRED" -v  # should be 403

# Wrong IP (change the ip param if IP-bound)
curl "$SIGNED" -v  # will succeed from your IP
# → Serving signed content from a different IP would fail

Lab 17 · Edge Compute via WebAssembly

Run it: make lab-17
Source: labs/lab-17-edge-compute/main.go


The Problem

Every CDN feature we’ve built so far is fixed at deployment time: the routing logic, the cache key normalization, the compression settings. What if you want application-specific logic at the edge that changes independently of the CDN infrastructure?

Use cases:

  • Custom request routing logic (A/B test, feature flag)
  • Bot and device detection
  • Request authentication and rate limiting
  • Header manipulation (add, remove, rewrite)
  • Edge-rendered personalisation fragments
  • URL rewriting and canonical redirects

Traditionally this required deploying custom CDN software (Nginx modules, Varnish VMODs) or Lua/JS scripts (Nginx Lua, CloudFront Lambda@Edge). WebAssembly (WASM) provides a more universal and safer alternative.


WebAssembly at the Edge

WebAssembly is a binary instruction format designed for safe, fast execution in sandboxed environments. Key properties for edge compute:

  • Sandboxed: WASM modules cannot access the filesystem, network, or system calls directly. All I/O is mediated by the host.
  • Language-agnostic: Compile Go, Rust, C, AssemblyScript, or any WASM target to the same binary format.
  • Near-native speed: WASM runtime compiles to machine code; typical overhead is 5–10% vs. native.
  • Instant startup: WASM modules start in ~50 µs. Lambda/container cold starts are 100 ms–10 s.

Production edge compute platforms using WASM

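| Platform           | Runtime                  | Guest languages      |
|--------------------|--------------------------|----------------------|
| Cloudflare Workers | V8 isolates (JS + WASM)  | JS, Rust, Go, Python |
| Fastly Compute     | Lucet → Wasmtime         | Rust, JS, Go, C      |
| Deno Deploy        | V8 + Deno WASM           | JS, TS               |
| Fermyon Spin       | Wasmtime                 | Rust, Go, Python     |
| wazero (this lab)  | Pure Go WASM runtime     | Any WASI target      |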

wazero: Pure Go WASM Host

tetratelabs/wazero is a zero-dependency, pure Go WASM runtime that implements WASI (WebAssembly System Interface). It runs WASM modules in-process:

import (
    "context"
    "log"
    "os"

    "github.com/tetratelabs/wazero"
    "github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"
)

// Create a runtime
ctx := context.Background()
r := wazero.NewRuntime(ctx)
defer r.Close(ctx)

// Instantiate the WASI environment (stdin/stdout/stderr for the module)
wasi_snapshot_preview1.MustInstantiate(ctx, r)

// Load, compile, and instantiate the WASM binary
wasmBinary, err := os.ReadFile("detector.wasm")
if err != nil {
    log.Fatal(err)
}
// wazero 1.x API; pre-1.0 releases called this InstantiateModuleFromBinary
mod, err := r.Instantiate(ctx, wasmBinary)
if err != nil {
    log.Fatal(err)
}

// Call a function exported by the WASM module
fn := mod.ExportedFunction("detect_bot")
result, err := fn.Call(ctx, /* args... */)

API note: The lab uses wasi_snapshot_preview1.Instantiate (not MustInstantiate). MustInstantiate panics on error; Instantiate returns an error, which is more appropriate in a production handler.


The Guest WASM Module

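The WASM “guest” is compiled from Go with GOOS=wasip1 GOARCH=wasm. Note that exporting a function to the host depends on the toolchain: the //export comment shown here is the TinyGo spelling, while the standard Go compiler only gained an equivalent, //go:wasmexport, in Go 1.24 (used together with -buildmode=c-shared):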

//go:build wasip1

package main

import (
    "strings"
    "unsafe"
)

// DetectBot checks if the User-Agent is a known bot
//
//export detect_bot
func DetectBot(uaPtr uint32, uaLen uint32) uint32 {
    ua := ptrToString(uaPtr, uaLen)
    if isBotUA(ua) { return 1 }
    return 0
}

func isBotUA(ua string) bool {
    lower := strings.ToLower(ua)
    bots := []string{"googlebot", "bingbot", "slurp", "duckduckbot",
                      "baiduspider", "yandexbot", "sogou", "facebot",
                      "ia_archiver", "curl/", "python-requests", "go-http"}
    for _, bot := range bots {
        if strings.Contains(lower, bot) { return true }
    }
    return false
}

// ptrToString converts a WASM memory pointer+length to a Go string
func ptrToString(ptr, length uint32) string {
    var buf []byte
    s := (*[1 << 30]byte)(unsafe.Pointer(uintptr(ptr)))
    buf = s[:length:length]
    return string(buf)
}

func main() {} // required for WASI

Build:

GOOS=wasip1 GOARCH=wasm go build -o detector.wasm ./guest/

Host-Guest Memory Communication

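Each WASM module owns a flat linear memory with 32-bit addressing, which the host can read and write directly. To pass a string from the Go host to the WASM guest (this assumes the guest also exports an allocate function, which the guest listing above omits):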

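// (fragment from a function that returns (bool, error))

// 1. Call the guest's allocate function to get a memory pointer
allocate := mod.ExportedFunction("allocate")
ptr, err := allocate.Call(ctx, uint64(len(ua)))
if err != nil {
    return false, err
}

// 2. Write the string into WASM memory (Write reports false if out of range)
if !mod.Memory().Write(uint32(ptr[0]), []byte(ua)) {
    return false, errors.New("write outside guest memory")
}

// 3. Call the WASM function with the pointer and length
fn := mod.ExportedFunction("detect_bot")
result, err := fn.Call(ctx, ptr[0], uint64(len(ua)))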

This shared-memory model is efficient but requires careful memory management. The lab uses a simple approach (allocate once per request); production implementations use memory pools or arena allocators.


Fail-Open vs. Fail-Closed

When the WASM module errors (invalid input, out-of-memory, assertion failure), the edge has two choices:

Fail-open: serve the request normally, log the WASM error:

if err != nil {
    slog.Warn("WASM error, serving anyway", "err", err) // log/slog; the standard log package has no Warn
    next.ServeHTTP(w, r)  // continue without bot detection
    return
}

Fail-closed: reject the request on WASM error:

if err != nil {
    http.Error(w, "Service Unavailable", http.StatusServiceUnavailable) // 503
    return
}

The lab uses fail-open for bot detection — if the WASM module crashes, it’s better to serve the user (potentially a bot, but probably a real user) than to block everyone.

Fail-closed is appropriate for: authentication checks, fraud detection, rate limiting where the risk of missing a check exceeds the risk of false rejection.


Cloudflare Workers Architecture

Cloudflare Workers run JavaScript (or WASM via JS) in V8 isolates — the same V8 engine used by Chrome/Node.js, but isolated per-worker:

V8 Isolate (< 128MB RAM, 10ms CPU):
  → One isolate per worker code deployment
  → Thousands of concurrent isolates per PoP
  → Cold start: ~5 ms (pre-warmed isolates: 0 ms)
  → Network: fetch() API proxied through Cloudflare infrastructure

Workers are intentionally limited to prevent abuse: no persistent state, no direct filesystem access, no raw network sockets. State must go through Workers KV (eventually consistent store), D1 (SQLite), or Durable Objects.

The WASM approach in this lab is more similar to Fastly Compute, which gives WASM modules more direct access to request/response objects.


Try It

make lab-17

# Normal request — WASM module processes the User-Agent
curl http://localhost:8080/article/1 \
  -H "User-Agent: Mozilla/5.0 (compatible; Chrome)" -v

# Bot request — should be identified and optionally blocked/tagged
curl http://localhost:8080/article/1 \
  -H "User-Agent: Googlebot/2.1" -v

# Curl (also a bot)
curl http://localhost:8080/article/1 -v
# Check X-Bot-Detected header in response

# Python-requests (bot)
curl http://localhost:8080/article/1 \
  -H "User-Agent: python-requests/2.28.0" -v

Lab 18 · HTTP/3 and QUIC

Run it: make lab-18
Source: labs/lab-18-http3-quic/main.go


The Problem

TCP was designed in 1974. Every HTTP version from 0.9 to HTTP/2 runs on TCP. But TCP has a fundamental flaw for modern web performance: Head-of-Line (HoL) Blocking.

In HTTP/2, all streams share a single TCP connection. If one TCP packet is lost, all streams stall until the lost packet is retransmitted:

HTTP/2 connection (single TCP)
Stream 1: HTML   [====|       |=====>]   ← stalled by lost packet
Stream 2: CSS    [====|       |=====>]   ← stalled by lost packet
Stream 3: image  [====|  LOST |=====>]   ← packet lost here

All streams wait for the retransmission of stream 3's packet.

On a path with 2% packet loss (common on mobile, satellite, congested networks), HTTP/2 throughput can be worse than HTTP/1.1 because of amplified HoL blocking.

QUIC solves this by rebuilding transport from scratch on top of UDP.


QUIC: A New Transport

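QUIC (RFC 9000; originally short for "Quick UDP Internet Connections", though the IETF standard treats QUIC as a name rather than an acronym) is a transport protocol built on UDP. It replicates TCP’s reliability guarantees while eliminating HoL blocking between streams: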

QUIC connection (over UDP)
Stream 1: HTML   [=============>]   ← independent stream
Stream 2: CSS    [=============>]   ← independent stream
Stream 3: image  [=====  LOST  →]   ← only this stream pauses for retransmit

Streams 1 and 2 continue unaffected.

Key QUIC Features

Connection IDs (CIDs)

In TCP, a connection is identified by (src IP, src port, dst IP, dst port). Changing any element tears down the connection.

QUIC connections are identified by opaque Connection IDs chosen by each endpoint (0 to 20 bytes long in QUIC v1; the 8-byte value below is just an example):

QUIC Connection: CID=0xdeadbeef01234567
  Can migrate: src IP changes (mobile handoff) → connection survives
  Can migrate: src port changes (NAT rebinding) → connection survives

This enables Connection Migration: a mobile user moving from WiFi to LTE doesn’t break QUIC connections. TCP connections would require a full TLS+TCP handshake on the new network.

0-RTT Resumption

A client that previously connected to a server can resume with 0 RTT:

1st connection: 1-RTT (QUIC Initial + crypto handshake)
2nd connection: 0-RTT (client sends data immediately with cached session ticket)

0-RTT data has two weaknesses: it is not forward secret, and an attacker can capture and replay it. For safe 0-RTT:

  • Non-mutating requests only (GET, HEAD, OPTIONS)
  • Servers must use replay protection (nonce tracking or time-window limits)

QPACK Header Compression

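HTTP/2 uses HPACK for header compression. HPACK relies on a single dynamic table kept in sync between endpoints, which works only because TCP delivers every header block in order. Over QUIC, streams arrive independently, so HPACK would either corrupt the table or force streams to wait on each other again.

QPACK (RFC 9204) is HTTP/3’s header compression scheme. It moves dynamic-table updates onto dedicated encoder and decoder streams, so a lost packet on one request stream need not block header decoding on the others.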


TLS 1.3 Integration

QUIC encrypts the transport layer itself. There is no unencrypted QUIC. QUIC integrates TLS 1.3 handshake into its own handshake:

TCP/TLS 1.3:          QUIC/TLS 1.3:
[TCP SYN]             [Initial Packet (Client Hello)]
[TCP SYN-ACK]         [Initial Packet (Server Hello)]
[TCP ACK]             [Handshake Packet (Finished)]
[TLS ClientHello]     
[TLS ServerHello]     ← QUIC fuses these into fewer round-trips
[TLS Finished]        
[First request]       [First request]

Result: QUIC 1-RTT vs. TCP+TLS 2-RTT for new connections.


ECDSA vs. RSA Certificates

The lab generates a self-signed ECDSA P-256 certificate (not RSA). ECDSA offers smaller key sizes for equivalent security:

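| Algorithm   | Key size (128-bit security) | Signature size | Handshake CPU |
|-------------|-----------------------------|----------------|---------------|
| RSA         | 3072 bits                   | 384 bytes      | ~3ms          |
| ECDSA P-256 | 256 bits                    | 64 bytes       | ~0.3ms        |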

For a CDN terminating millions of TLS connections per second, ECDSA is significantly more efficient. Cloudflare uses ECDSA certificates by default.

// Generate ECDSA P-256 key
privKey, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)

// Self-signed certificate
template := &x509.Certificate{
    SerialNumber: big.NewInt(1),
    Subject:      pkix.Name{CommonName: "localhost"},
    NotBefore:    time.Now(),
    NotAfter:     time.Now().Add(365 * 24 * time.Hour),
    DNSNames:     []string{"localhost"},
}
certDER, _ := x509.CreateCertificate(rand.Reader, template, template, &privKey.PublicKey, privKey)

quic-go Implementation

The lab uses github.com/quic-go/quic-go:

import (
    "crypto/tls"
    "log"
    "net/http"

    "github.com/quic-go/quic-go/http3"
)

// Alt-Svc header tells clients "this server speaks H3 on port 443"
mux := http.NewServeMux()
mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Alt-Svc", `h3=":443"; ma=86400`)
    // ... serve content
})

// HTTP/3 server runs on UDP
server := &http3.Server{
    Addr:    ":443",
    Handler: mux,
    TLSConfig: &tls.Config{
        Certificates: []tls.Certificate{cert},
        NextProtos:   []string{"h3"},  // ALPN for HTTP/3
    },
}

// Serve HTTP/3 on UDP 443. Clients that don't support H3 need a separate
// net/http server on TCP 443 for HTTP/1.1 + HTTP/2 (not shown).
go func() { log.Fatal(server.ListenAndServe()) }()

Alt-Svc: Protocol Upgrade Negotiation

Browsers discover H3 support via the Alt-Svc response header:

Alt-Svc: h3=":443"; ma=86400
  • h3=":443" — server speaks HTTP/3 on port 443
  • ma=86400 — this advertisement is valid for 86400 seconds (24h)

On first request, the browser uses TCP (H1/H2). On subsequent requests, it connects via QUIC instead. The browser caches Alt-Svc per origin.


H1 vs. H2 vs. H3 Performance

Relative performance depends on network conditions:

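| Protocol                 | 0% loss  | 1% loss | 5% loss |
|--------------------------|----------|---------|---------|
| HTTP/1.1 (6 connections) | baseline | -30%    | -60%    |
| HTTP/2 (1 connection)    | +15%     | -50%    | -70%    |
| HTTP/3 (QUIC)            | +10%     | -5%     | -20%    |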

H3 shines on lossy/high-latency networks. On a clean datacenter network, H2 and H3 are comparable. This is why CDN edge nodes gain more from H3 than CDN-to-origin connections (origin is typically on a reliable path).


Firewall Considerations

QUIC runs on UDP. Many corporate firewalls block all non-DNS UDP traffic. When QUIC is blocked, clients fall back to TCP:

Client: send QUIC Initial packet (UDP 443)
Firewall: drops UDP 443
Client: timeout after ~150ms
Client: fall back to TCP + TLS 1.3

Browsers avoid paying that timeout on every connection by racing the QUIC and TCP attempts and using whichever succeeds first, a strategy often described as “QUIC Happy Eyeballs”. Chrome and Firefox implement variants of this.

Browser QUIC adoption statistics (2024): ~28% of all web requests (dominated by Google services which pioneered QUIC via gQUIC).


Try It

make lab-18

# HTTP/1.1 request
curl -k http://localhost:8080/ -v

# HTTP/3 request (skip certificate validation for self-signed cert)
curl -k --http3 https://localhost:8443/ -v

# Benchmark: compare H1 vs H3 latency
time curl -k http://localhost:8080/large-file -o /dev/null
time curl -k --http3 https://localhost:8443/large-file -o /dev/null

# Verify Alt-Svc header is present
curl -k https://localhost:8443/ -I | grep Alt-Svc
# Should show: Alt-Svc: h3=":8443"; ma=86400

# Check what protocol was negotiated (curl verbose shows it)
curl -k --http3 https://localhost:8443/ -v 2>&1 | grep "Using HTTP"

Lab 19 · HLS Streaming & Segment Caching

Run it: make lab-19
Source: labs/lab-19-hls-streaming/main.go


The Problem

Video delivery is the dominant use case for CDN infrastructure — in 2024, video represents ~65% of all internet traffic. Unlike web pages (one-shot request-response), video streaming is:

  • High-bandwidth: a 4K stream is 25 Mbps; 1 million concurrent viewers require 25 Tbps of aggregate bandwidth
  • Time-sensitive: a 2-second buffer stall causes viewer abandonment rates to jump 20%
  • Long-duration: sessions last 30–120 minutes; cache TTLs matter differently for live vs. VOD

The CDN must cache aggressively to serve millions of concurrent viewers from edge rather than hammering the origin’s encoder/packager.


HLS: HTTP Live Streaming

HLS (RFC 8216) is the dominant streaming protocol for CDNs. It works by slicing video into short segments and serving them over plain HTTP:

Client                      CDN                     Origin Encoder
  │                          │                          │
  │── GET master.m3u8 ──────>│── (cache miss) ─────────>│
  │<──── master playlist ────│<──── master playlist ─────│
  │                          │ (cache TTL: 60s)          │
  │── GET 720p/playlist.m3u8>│── (cache miss) ─────────>│
  │<──── variant playlist ───│<──── variant playlist ────│
  │                          │ (cache TTL: 5s for live)  │
  │── GET seg001.ts ────────>│── (cache miss) ─────────>│
  │<──── segment ────────────│<──── segment ─────────────│
  │                          │ (cache TTL: 24h immutable)│
  │── GET seg002.ts ────────>│── (cache HIT) ────────────│  ← no origin hit

Three Types of Content, Three TTLs

HLS has three distinct content types with fundamentally different caching characteristics:


1. Media Segments (.ts, .fmp4) — TTL: 24 hours, immutable

Segments are effectively immutable: once seg001.ts is generated and named, it never changes, so the URL uniquely identifies the content.

// Immutable segment: cache for 24 hours without revalidation
w.Header().Set("Cache-Control", "public, max-age=86400, immutable")
w.Header().Set("ETag", `"seg001-v1"`)

This is identical to the approach used for hashed static assets (main.abc123.js). CDN hit ratios for segment requests should be ~99% once the initial viewers warm the cache.

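Thundering herd implication: When a new segment is published, the first viewer to request it causes a cache miss to origin, and all later viewers hit the cache. For popular streams (100k+ viewers) many requests for the new segment arrive at once, so this only holds if the CDN coalesces concurrent misses; with coalescing, the initial miss is a single request to origin per cache node. This is excellent.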


2. Variant Playlist (.m3u8 per quality level) — TTL: 5 seconds (live), longer for VOD

The variant playlist (e.g., 720p/playlist.m3u8) lists the available segments:

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:42

#EXTINF:6.006,
seg042.ts
#EXTINF:6.006,
seg043.ts
#EXTINF:6.006,
seg044.ts

For live streams, the playlist changes every segment duration (typically 2–6 seconds). It must not be cached too long or viewers fall behind the live edge.

// Short TTL for live variant playlist
w.Header().Set("Cache-Control", "public, max-age=5")
w.Header().Set("ETag", etag)  // still ETag for conditional requests

The thundering herd problem here: Every viewer polls the variant playlist every ~5 seconds. With 100k viewers, that’s 20k requests/second to the CDN for a single stream’s variant playlist — all simultaneously (viewers synchronize on segment boundaries).

Singleflight at the CDN level is essential here. The lab’s populateCache function uses singleflight.Group to collapse concurrent playlist requests.


3. Master Playlist (.m3u8 top-level) — TTL: 60 seconds

The master playlist lists the variant streams:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360
360p/playlist.m3u8

This changes rarely (new quality levels, DRM changes). A 60-second TTL allows clients to adapt to changes within a minute.

// Medium TTL for master playlist
w.Header().Set("Cache-Control", "public, max-age=60")
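
The three policies can be collected into one function keyed on the request path. This is a sketch: the file naming (`master.m3u8`, `.ts`/`.m4s` suffixes) follows the lab's URLs, while the one-hour VOD playlist TTL and the `no-store` default are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cacheControlFor maps an HLS request path to a Cache-Control value.
// live selects the short TTL for variant playlists of live streams.
func cacheControlFor(p string, live bool) string {
	switch {
	case strings.HasSuffix(p, ".ts"), strings.HasSuffix(p, ".m4s"):
		return "public, max-age=86400, immutable" // media segment
	case path.Base(p) == "master.m3u8":
		return "public, max-age=60" // master playlist
	case strings.HasSuffix(p, ".m3u8"):
		if live {
			return "public, max-age=5" // live variant playlist
		}
		return "public, max-age=3600" // VOD variant playlist (assumed TTL)
	default:
		return "no-store" // unknown content: do not cache (assumed default)
	}
}

func main() {
	for _, p := range []string{
		"/stream/master.m3u8",
		"/stream/720p/playlist.m3u8",
		"/stream/720p/seg001.ts",
	} {
		fmt.Printf("%-30s %s\n", p, cacheControlFor(p, true))
	}
}
```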

Segment Prefetch

After parsing a variant playlist, the CDN can proactively fetch the next segments from origin before any client requests them:

func (c *Cache) prefetchSegments(playlistURL string, playlist []byte) {
    urls := parseSegmentURLs(playlist)  // extract seg URLs from M3U8
    for _, url := range urls {
        if !c.Has(url) {
            go c.warmSegment(url)  // fetch in background
        }
    }
}

This converts cache misses on segment requests to cache hits:

Without prefetch:
  Viewer arrives → GET seg042.ts (miss) → wait for origin → play
  Next viewer → GET seg042.ts (hit) → instant play

With prefetch:
  New playlist published → CDN prefetches seg042.ts
  Viewer arrives → GET seg042.ts (hit) → instant play
  All viewers get cache hits

Prefetch is standard on CDNs like Cloudflare Stream and Fastly.
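
The prefetch loop depends on extracting segment URLs from the playlist. One way to do that is sketched below; `segmentURLs` is an illustrative helper (the lab's `parseSegmentURLs` may differ), and it handles only plain media playlists where every non-comment line is a segment URI.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/url"
	"strings"
)

// segmentURLs returns the absolute URL of every segment listed in playlist,
// resolving relative URIs against the playlist's own URL.
func segmentURLs(playlistURL string, playlist []byte) ([]string, error) {
	base, err := url.Parse(playlistURL)
	if err != nil {
		return nil, err
	}
	var urls []string
	sc := bufio.NewScanner(bytes.NewReader(playlist))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank lines, tags, and comments carry no segment URI
		}
		ref, err := url.Parse(line)
		if err != nil {
			return nil, err
		}
		urls = append(urls, base.ResolveReference(ref).String())
	}
	return urls, sc.Err()
}

func main() {
	playlist := []byte("#EXTM3U\n#EXT-X-TARGETDURATION:6\n#EXT-X-MEDIA-SEQUENCE:42\n\n" +
		"#EXTINF:6.006,\nseg042.ts\n#EXTINF:6.006,\nseg043.ts\n")
	urls, err := segmentURLs("http://localhost:8080/stream/720p/playlist.m3u8", playlist)
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

For live streams it is usually enough to prefetch only the newest entries, since older segments in the window were already fetched on a previous playlist refresh.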


LL-HLS: Low-Latency HLS

Standard HLS has a live latency of 3–5 segments (~15–30 seconds). This is acceptable for broadcast TV but too high for sports, gaming streams, or live auctions.

LL-HLS (Low-Latency HLS, specified in the second-edition HLS draft, draft-pantos-hls-rfc8216bis, rather than in RFC 8216 itself) reduces latency to 2–5 seconds:

  1. Partial Segments: segments are delivered as they’re being encoded in partial 200ms chunks
  2. Playlist Delta Updates: only changed lines of the playlist are sent
  3. Blocking Playlist Request: client sends _HLS_msn=44 parameter; CDN holds the request until segment 44 is available (HTTP long poll)

Client: GET /playlist.m3u8?_HLS_msn=44&_HLS_part=0
CDN:    [holds request until segment 44 part 0 is available]
CDN:    → 200 OK with updated playlist  ← instant delivery at segment publish

LL-HLS requires CDN support. As of 2024, Cloudflare, Fastly, and AWS CloudFront all support LL-HLS.


CMAF: Common Media Application Format

Traditional HLS uses MPEG-2 TS (.ts) container. MPEG-DASH uses fMP4. These are incompatible, requiring separate encoder pipelines.

CMAF (ISO/IEC 23000-19) standardizes on fMP4 as the container for both HLS and DASH:

CMAF Encoder:
  Input → fMP4 chunks → HLS playlist (.m3u8 + .cmfv/.cmfa)
                      → DASH manifest (.mpd + .cmfv/.cmfa)

One encode, two protocol manifests. Netflix, Apple, and major CDNs use CMAF. The .ts format is legacy at this point; new deployments should use fMP4/CMAF.


VOD vs. Live Caching Strategy

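| Aspect               | VOD                     | Live                             |
|----------------------|-------------------------|----------------------------------|
| Segment TTL          | Forever (immutable)     | Forever (immutable, same as VOD) |
| Variant playlist TTL | Minutes to hours        | 2–10 seconds                     |
| Master playlist TTL  | Hours                   | 30–60 seconds                    |
| Cache fill           | Can prefetch everything | Must chase live edge             |
| Thundering herd      | Only at launch          | Every 5 seconds, always          |
| Cache-Control header | immutable               | Short max-age                    |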

Try It

make lab-19

# Fetch master playlist
curl http://localhost:8080/stream/master.m3u8 -v

# Fetch 720p variant playlist
curl http://localhost:8080/stream/720p/playlist.m3u8 -v
# Note Cache-Control: max-age=5

# Fetch a segment (first one is a cache miss, note timing)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null

# Fetch the same segment again (cache hit, much faster)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null

# Hit the cache stats endpoint
curl http://localhost:8080/metrics/cache -s | python3 -m json.tool

# Simulate thundering herd on playlist
for i in $(seq 1 20); do
  curl -s http://localhost:8080/stream/720p/playlist.m3u8 -o /dev/null &
done
wait
# Check singleflight collapsed these into one origin request
curl http://localhost:8080/metrics/cache

Lab 20 · Observability: Metrics, Logging & SLOs

Run it: make lab-20
Source: labs/lab-20-observability/main.go


The Problem

A CDN you cannot observe is a CDN you cannot operate. Without metrics:

  • You don’t know your cache hit ratio is degrading
  • You don’t know latency spiked at 3 AM while you slept
  • You can’t tell if a deploy improved or degraded performance
  • You can’t define SLOs (or commit to SLAs) because you can’t measure the underlying indicators

Production CDN observability has three pillars:

  1. Metrics: numeric time-series data (Prometheus)
  2. Structured logs: machine-parseable event records (slog)
  3. Traces: distributed request tracking (OpenTelemetry — not in this lab)

Prometheus: The Metrics System

Prometheus uses a pull model: the metrics server scrapes your application’s /metrics endpoint at regular intervals (typically 15–60s). Your application doesn’t push; it exposes a snapshot of current state.

Metric Types

Counter — monotonically increasing. Never decreases.

var requestsTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "cdn_requests_total",
        Help: "Total number of requests served",
    },
    []string{"method", "status", "cache"},
)

// Increment on each request
requestsTotal.WithLabelValues("GET", "200", "hit").Inc()

Use counters for: request count, bytes transferred, error count, cache hits.

Gauge — can go up or down. Represents current state.

var cacheSize = prometheus.NewGauge(prometheus.GaugeOpts{
    Name: "cdn_cache_size_bytes",
    Help: "Current cache size in bytes",
})

// Set on cache eviction/addition
cacheSize.Set(float64(currentSize))

Use gauges for: active connections, cache size, queue depth, goroutine count.

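Histogram: samples observations into configurable buckets. Percentiles are not computed by the client; they are estimated from the bucket counts at query time with histogram_quantile.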

var requestDuration = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name:    "cdn_request_duration_seconds",
        Help:    "Request duration distribution",
        Buckets: []float64{.001, .005, .01, .025, .05, .1, .25, .5, 1, 2.5},
    },
    []string{"cache"},  // "hit" or "miss"
)

// Record each request's duration
start := time.Now()
// ... serve request ...
requestDuration.WithLabelValues(cacheStatus).Observe(time.Since(start).Seconds())

Use histograms for: request latency, response size, queue wait time.


The Cardinality Trap

Cardinality = the number of unique combinations of label values. High cardinality is Prometheus’s kryptonite.

// WRONG — user_id can have millions of values!
requestsTotal.WithLabelValues("GET", "200", userId).Inc()
// → Millions of time series → Prometheus OOM → pager at 3 AM

Safe labels:

  • HTTP method: 5 values (GET, POST, PUT, DELETE, HEAD)
  • HTTP status code category: 5 values (1xx–5xx) or discrete codes (~30 values)
  • Cache status: 3 values (hit, miss, bypass)
  • Region: 10–20 values (US, EU, APAC, …)
  • Host: only if you have a bounded number of hosts

Never use as labels:

  • User IDs, session IDs, account IDs
  • Full URL paths with IDs embedded (/user/123/profile)
  • IP addresses
  • Trace IDs, request IDs (use logs for per-request data)

Rule of thumb: any label that can have more than ~1000 distinct values in production will cause cardinality explosion.
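
One way to stay on the safe side of this rule is to normalize values before they become labels. The sketch below is an assumption-laden illustration, not code from the lab: `statusClass` and `routeLabel` are hypothetical helper names, and replacing only purely numeric path segments is a deliberately crude heuristic (a router's route template is a better label source when one is available).

```go
package main

import (
	"fmt"
	"strings"
)

// statusClass collapses a status code into a handful of label values.
func statusClass(code int) string {
	if code < 100 || code > 599 {
		return "other"
	}
	return fmt.Sprintf("%dxx", code/100)
}

// routeLabel replaces purely numeric path segments with ":id" so that
// /user/123/profile and /user/456/profile share one time series.
func routeLabel(path string) string {
	segs := strings.Split(path, "/")
	for i, s := range segs {
		if s != "" && isDigits(s) {
			segs[i] = ":id"
		}
	}
	return strings.Join(segs, "/")
}

// isDigits reports whether s consists only of ASCII digits.
func isDigits(s string) bool {
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(statusClass(404), routeLabel("/user/123/profile"))
}
```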


CDN Metrics Catalog

The lab implements these metrics:

// === Request counters ===
cdn_requests_total{method, status, cache}    // "cache" ∈ {hit, miss, bypass}
cdn_bytes_served_total{cache}                // bytes served, by cache status

// === Latency ===
cdn_request_duration_seconds{cache}          // histogram, per cache status

// === Cache state ===
cdn_cache_entries                            // gauge: count of items in cache
cdn_cache_size_bytes                         // gauge: bytes used

// === Origin ===
cdn_origin_requests_total{status}            // requests forwarded to origin
cdn_origin_duration_seconds                  // histogram: origin TTFB

// === Compression ===
cdn_compression_ratio                        // histogram: compressed/uncompressed

Structured Access Logging with slog

Go 1.21 introduced log/slog, a structured logging package. Every request should produce a structured JSON log line:

logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

// Per-request log (inside middleware)
logger.Info("request",
    "method",    r.Method,
    "path",      r.URL.Path,
    "status",    status,
    "bytes",     bytesWritten,
    "duration",  time.Since(start).Milliseconds(),
    "cache",     cacheStatus,
    "ip",        r.RemoteAddr,
    "ua",        r.Header.Get("User-Agent"),
    "referer",   r.Header.Get("Referer"),
)

Output:

{
  "time": "2025-01-15T14:23:01Z",
  "level": "INFO",
  "msg": "request",
  "method": "GET",
  "path": "/image/hero.jpg",
  "status": 200,
  "bytes": 102400,
  "duration": 3,
  "cache": "hit",
  "ip": "1.2.3.4:54321",
  "ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
  "referer": ""
}

Structured logs enable direct processing in log aggregators (Loki, Splunk, Elasticsearch) without parsing regex patterns.
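
The middleware snippet above logs `status` and `bytesWritten`, which net/http does not expose directly. A common approach is to wrap the ResponseWriter. The following is a minimal sketch under that assumption; `statusRecorder`, `accessLog`, `notFound`, and `capture` are hypothetical names, and the lab's own middleware may differ.

```go
package main

import (
	"log/slog"
	"net/http"
	"net/http/httptest"
	"os"
	"time"
)

// statusRecorder wraps http.ResponseWriter to capture what the handler sent.
type statusRecorder struct {
	http.ResponseWriter
	status int // 0 until the handler writes a header or body
	bytes  int
}

func (r *statusRecorder) WriteHeader(code int) {
	if r.status == 0 {
		r.status = code
	}
	r.ResponseWriter.WriteHeader(code)
}

func (r *statusRecorder) Write(b []byte) (int, error) {
	if r.status == 0 {
		r.status = http.StatusOK // net/http sends an implicit 200 on first Write
	}
	n, err := r.ResponseWriter.Write(b)
	r.bytes += n
	return n, err
}

// accessLog emits one structured line per request after the handler returns.
func accessLog(logger *slog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w}
		next.ServeHTTP(rec, req)
		if rec.status == 0 {
			rec.status = http.StatusOK // handler wrote nothing at all
		}
		logger.Info("request",
			"method", req.Method,
			"path", req.URL.Path,
			"status", rec.status,
			"bytes", rec.bytes,
			"duration", time.Since(start).Milliseconds(),
		)
	})
}

// notFound is a stand-in handler used only for the demonstration.
func notFound(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusNotFound)
	w.Write([]byte("missing"))
}

// capture runs handler once in-process and returns the recorded status and byte count.
func capture(handler http.HandlerFunc, target string) (int, int) {
	rec := &statusRecorder{ResponseWriter: httptest.NewRecorder()}
	handler(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.status, rec.bytes
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	h := accessLog(logger, http.HandlerFunc(notFound))
	// Exercise the middleware in-process; no listening socket is needed.
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/image/hero.jpg", nil))
}
```

Note that a wrapper like this hides optional interfaces such as http.Flusher unless they are forwarded explicitly, which matters for streaming responses.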


Key PromQL Recipes

Cache Hit Ratio

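# Hit ratio over the last 5 minutes.
# sum() is required: without it the division matches series label-by-label,
# so each "hit" series is divided by itself and the result is always 1.
sum(rate(cdn_requests_total{cache="hit"}[5m]))
/
sum(rate(cdn_requests_total[5m]))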

Target: > 0.90 (90% hit ratio). Below 0.80 indicates a caching problem.

Byte Hit Ratio

# Bytes served from cache vs. total bytes served (aggregated across cache statuses)
sum(rate(cdn_bytes_served_total{cache="hit"}[5m]))
/
sum(rate(cdn_bytes_served_total[5m]))

Byte hit ratio is more meaningful than request hit ratio for billing purposes (CDN vendors charge for bytes to/from origin).

p99 Latency

# 99th percentile request latency
histogram_quantile(0.99,
  sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le)
)
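
It helps to see why the result is an estimate. The sketch below approximates what histogram_quantile does: find the bucket containing the target rank and interpolate linearly inside it. It is a simplified illustration, not Prometheus's implementation: the `bucket` type and `quantile` function are hypothetical, the first bucket is assumed to start at 0, and the +Inf bucket and other edge cases are ignored.

```go
package main

import "fmt"

// bucket mirrors one cumulative histogram bucket:
// count is the number of observations <= upperBound.
type bucket struct {
	upperBound float64
	count      float64
}

// quantile estimates the q-quantile (0 < q < 1) by linear interpolation
// inside the bucket that contains the target rank. Buckets must be sorted
// by upperBound and hold cumulative counts. Returns 0 when there is no data.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) == 0 {
		return 0
	}
	total := buckets[len(buckets)-1].count
	if total == 0 {
		return 0
	}
	rank := q * total
	lower, prev := 0.0, 0.0
	for _, b := range buckets {
		if b.count >= rank && b.count > prev {
			// Assume observations are spread evenly within the bucket.
			return lower + (b.upperBound-lower)*(rank-prev)/(b.count-prev)
		}
		lower, prev = b.upperBound, b.count
	}
	return buckets[len(buckets)-1].upperBound
}

func main() {
	// Made-up data: 900 requests <= 10ms, 990 <= 100ms, 1000 <= 1s.
	b := []bucket{{0.01, 900}, {0.1, 990}, {1, 1000}}
	fmt.Printf("p50 ~ %.4fs, p99 ~ %.4fs\n", quantile(0.5, b), quantile(0.99, b))
}
```

The practical consequence is that the estimate can never be more precise than the bucket boundaries around it, so buckets should be dense near the latency threshold your SLO cares about.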

p99 Latency by Cache Status

# Compare hit vs. miss latency
histogram_quantile(0.99,
  sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le, cache)
)

Expect cache hits to be 5–100x faster than misses.

Error Rate (5xx)

# Fraction of responses that are 5xx (multiply by 100 for a percentage)
sum(rate(cdn_requests_total{status=~"5.."}[5m]))
/
sum(rate(cdn_requests_total[5m]))

Origin Request Rate

# Origin requests per second across all statuses (should be low relative to total)
sum(rate(cdn_origin_requests_total[5m]))

Requests Per Second

sum(rate(cdn_requests_total[1m]))

SLOs and Error Budgets

An SLO (Service Level Objective) defines the target reliability:

SLO: 99.9% of requests return a successful response (2xx/3xx)
     within 500ms at p99, measured over 30 days

An error budget is the allowed amount of failure:

30-day error budget = 30 * 24 * 60 * 60 * (1 - 0.999) = 2592 seconds = 43.2 minutes

If your error budget is consumed, you stop feature deployments and focus on reliability until the budget refills.

SLO Burn Rate

The burn rate measures how fast you’re consuming the error budget:

# 1-hour burn rate (how fast are we consuming monthly budget?)
(
  sum(rate(cdn_requests_total{status=~"5.."}[1h]))
  /
  sum(rate(cdn_requests_total[1h]))
)
/ (1 - 0.999)  # error budget fraction

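A burn rate of 1.0 means the budget is being consumed at exactly the sustainable rate and lasts the full 30 days. A burn rate above 14.4 consumes 2% of the monthly budget in one hour and would exhaust the whole budget in about 50 hours (30 days / 14.4), so page immediately. The Google SRE Workbook recommends multi-window alerting: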

  • Fast burn (1h + 5m windows): alert for rapid consumption
  • Slow burn (3d + 6h windows): alert for gradual degradation
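
The budget and burn-rate arithmetic is easy to get wrong by a unit, so a small calculator is a useful sanity check. This is a sketch with hypothetical function names, assuming a 30-day window and a constant burn rate:

```go
package main

import "fmt"

// errorBudgetSeconds returns the allowed "bad" time in the window for an availability SLO.
func errorBudgetSeconds(slo float64, windowDays int) float64 {
	return float64(windowDays) * 24 * 60 * 60 * (1 - slo)
}

// burnRate is the observed error ratio divided by the budgeted error ratio.
func burnRate(errorRatio, slo float64) float64 {
	return errorRatio / (1 - slo)
}

// hoursToExhaustion returns how long the full window's budget lasts
// if the given burn rate stays constant.
func hoursToExhaustion(rate float64, windowDays int) float64 {
	return float64(windowDays) * 24 / rate
}

func main() {
	const slo = 0.999
	fmt.Printf("30-day budget: %.0f seconds\n", errorBudgetSeconds(slo, 30))

	// 1.44% of requests failing against a 0.1% budget.
	r := burnRate(0.0144, slo)
	fmt.Printf("burn rate %.1f exhausts the budget in %.0f hours\n", r, hoursToExhaustion(r, 30))
}
```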

Grafana Dashboard

The lab exposes:

  • /metrics — Prometheus metrics endpoint
  • /metrics/cache — JSON cache diagnostics

Point Grafana at Prometheus and import a CDN dashboard. The docker-compose.yml in Lab 21 wires up the full stack (Prometheus + Grafana).


Try It

make lab-20

# Send some traffic to generate metrics
for i in $(seq 1 100); do
  curl -s http://localhost:8080/item/$((RANDOM % 20)) -o /dev/null
done

# View Prometheus metrics
curl -s http://localhost:8080/metrics | grep cdn_

# View cache diagnostics
curl -s http://localhost:8080/metrics/cache | python3 -m json.tool

# Compute hit ratio manually from raw counters.
# One scrape is reused for both numbers, and the hit series are summed
# because there is one series per method/status combination.
METRICS=$(curl -s http://localhost:8080/metrics)
HITS=$(echo "$METRICS" | grep '^cdn_requests_total{.*cache="hit"' | awk '{sum+=$2} END{print sum+0}')
TOTAL=$(echo "$METRICS" | grep '^cdn_requests_total{' | awk '{sum+=$2} END{print sum+0}')
echo "Hit ratio: $(echo "scale=3; $HITS/$TOTAL" | bc)"

Lab 21 · The Full System

Run it: make lab-21
Source: labs/lab-21-full-system/main.go
Compose: labs/lab-21-full-system/docker-compose.yml


The Architecture

This final lab wires together everything from Labs 1–20 into a production-representative CDN system. It is a microcosm of how real CDNs like Cloudflare, Fastly, and AWS CloudFront are structured.

                 ┌─────────────────────────────────────────────────┐
                 │                  CDN System                      │
                 │                                                   │
  Internet  ──>  │  Edge NYC (:8080)  ──\                           │
                 │  (singleflight,         \                         │
                 │   signed URL verify,     → Shield (:8082)  ──>  Origin (:9001)
                 │   30s TTL, metrics)     /  (singleflight,
                 │                        /   300s TTL,
                 │  Edge LHR (:8081)  ──/    metrics)
                 │  (same config)            
                 │                                                   │
                 │  Prometheus (:9090)  Grafana (:3000)             │
                 └─────────────────────────────────────────────────┘

Component Responsibilities

| Component | Port | Role |
|---|---|---|
| Origin | :9001 | Source of truth. Serves all content. Simulates 50ms processing delay. |
| Shield | :8082 | Aggregation layer. One connection to origin for many edge requests. 300s TTL. |
| Edge NYC | :8080 | User-facing edge in New York. Validates signed URLs. 30s TTL. |
| Edge LHR | :8081 | User-facing edge in London. Same config as NYC. 30s TTL. |
| Prometheus | :9090 | Scrapes metrics from all nodes. |
| Grafana | :3000 | Dashboards over Prometheus. |

Multi-Tier TTL Design

The TTL cascade is intentional and critical:

User ── Edge (30s TTL) ── Shield (300s TTL) ── Origin

Why Edge TTL < Shield TTL?

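The edge serves users directly. Its short 30-second TTL limits how long an edge holds a copy before checking upstream, while the edge still collapses requests from many users into one request to the shield. The edge TTL alone does not bound end-to-end staleness, though: an edge refill can be answered from the shield's cache, so unless the object is purged (or the edge accounts for the shield's Age header), a change at origin can take up to roughly edge TTL + shield TTL, about 330 seconds here, to reach users.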

The shield’s 300s TTL means: for any given piece of content, the shield makes at most one request to origin per 5 minutes. A popular item might be requested by 10,000 users/minute across both edges — the shield ensures origin sees only 1 request every 5 minutes for that item.

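Without shield (one item, 2 edges, singleflight at each edge):
  Each edge refills once per 30s TTL = 2 origin requests/min per edge
  2 edges × 2 requests/min = 4 origin requests/min (20 per 5 minutes)
  (every edge miss → origin request, regardless of how many users are waiting)

With shield (300s TTL):
  Edge misses are unchanged: 4 requests/min
  → All go to shield
  → Shield hit ratio ~95% (only 1 miss per 5 min out of 20 edge requests)
  → 0.2 requests/min (1 per 5 min) reach origin

This is a 20× reduction in origin load for this two-edge lab. The factor is (shield TTL / edge TTL) × number of edges, so it grows linearly as edges are added.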
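
The per-item arithmetic can be sketched as a small function. It is a back-of-the-envelope model under stated assumptions, not a measurement: demand is steady enough that every TTL expiry triggers exactly one refill, singleflight collapses concurrent misses at each node, and `originFetches` is a hypothetical name.

```go
package main

import "fmt"

// originFetches returns the upper bound on origin fetches for ONE item
// over windowSec seconds.
func originFetches(edges, edgeTTLSec, shieldTTLSec, windowSec int, withShield bool) int {
	if withShield {
		// The shield refills at most once per shield TTL, however many edges ask.
		return windowSec / shieldTTLSec
	}
	// Without a shield, every edge refills from origin once per edge TTL.
	return edges * (windowSec / edgeTTLSec)
}

func main() {
	for _, edges := range []int{2, 100} {
		without := originFetches(edges, 30, 300, 300, false)
		with := originFetches(edges, 30, 300, 300, true)
		fmt.Printf("%d edges, per 5 min: without shield=%d, with shield=%d\n", edges, without, with)
	}
}
```

Because the shield term does not depend on the number of edges, the benefit of a shield grows as the edge fleet grows.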


Singleflight at Two Layers

Both edge and shield run singleflight.Group:

type CachingProxy struct {
    cache  *Cache
    origin string
    group  singleflight.Group  // deduplicates concurrent misses
}

func (p *CachingProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    key := cacheKey(r)
    
    if item, ok := p.cache.Get(key); ok {
        serveFromCache(w, item)
        return
    }
    
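    // Multiple concurrent requests for the same key?
    // singleflight collapses them into ONE upstream request
    result, err, _ := p.group.Do(key, func() (any, error) {
        return p.fetchFromUpstream(r)
    })
    if err != nil {
        // Do not type-assert a failed result: that would panic on nil
        http.Error(w, "upstream fetch failed", http.StatusBadGateway)
        return
    }

    item, ok := result.(*CacheItem)
    if !ok || item == nil {
        http.Error(w, "unexpected upstream result", http.StatusBadGateway)
        return
    }
    p.cache.Set(key, item)
    serveFromCache(w, item)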
}

The thundering herd cascade: without singleflight, a popular item expiring while 1,000 users are requesting it at each edge would send 1,000 concurrent requests per edge to the shield, and the shield would forward every one of them to origin. Singleflight at each edge collapses its 1,000 concurrent misses into 1 upstream request. Singleflight at the shield then collapses the concurrent misses from the 2 edges into 1 request to origin.
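
To make the collapsing behaviour concrete without pulling in a dependency, here is a hand-rolled, simplified stand-in for singleflight.Group. It is a sketch of the idea, not the golang.org/x/sync implementation: `coalescer`, `call`, and `herd` are hypothetical names, results are plain strings, and a panic inside the fetch function is not handled.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight fetch that other callers can wait on.
type call struct {
	done chan struct{}
	val  string
	err  error
}

// coalescer runs at most one fetch per key at a time.
type coalescer struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn for key unless a fetch is already in flight, in which case
// the caller waits for the leader and shares its result.
func (c *coalescer) Do(key string, fn func() (string, error)) (string, error) {
	c.mu.Lock()
	if c.calls == nil {
		c.calls = make(map[string]*call)
	}
	if inflight, ok := c.calls[key]; ok {
		c.mu.Unlock()
		<-inflight.done // wait for the leader to finish
		return inflight.val, inflight.err
	}
	cl := &call{done: make(chan struct{})}
	c.calls[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = fn() // only the leader reaches upstream

	c.mu.Lock()
	delete(c.calls, key)
	c.mu.Unlock()
	close(cl.done) // release the followers
	return cl.val, cl.err
}

// herd fires n concurrent lookups for one key and reports how many
// fetches actually reached the simulated upstream.
func herd(n int) int64 {
	var c coalescer
	var upstream atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Do("/item/42", func() (string, error) {
				upstream.Add(1)
				time.Sleep(200 * time.Millisecond) // simulated slow origin
				return "body", nil
			})
		}()
	}
	wg.Wait()
	return upstream.Load()
}

func main() {
	fmt.Println("upstream fetches for 1000 concurrent requests:", herd(1000))
}
```

Collapsing only applies to requests that overlap in time with the in-flight fetch, which is why it complements the cache rather than replacing it.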


Signed URL Verification

The edge validates HMAC-signed URLs before serving any content:

func (e *Edge) verifySignedURL(r *http.Request) bool {
    sig := r.URL.Query().Get("sig")
    if sig == "" { return false }  // or true for public content
    
    expires, err := strconv.ParseInt(r.URL.Query().Get("expires"), 10, 64)
    if err != nil || time.Now().Unix() > expires {
        return false  // missing, malformed, or expired
    }
    
    keyver := r.URL.Query().Get("keyver")
    key, ok := e.signingKeys[keyver]
    if !ok { return false }
    
    canonical := fmt.Sprintf("GET\n%s\n%d\n", r.URL.Path, expires)
    expected := computeHMAC(key, canonical)
    
    return hmac.Equal([]byte(sig), []byte(expected))
}

The shield and origin do not re-verify — they trust the edge. This is the standard trust boundary design: validation happens at the first authorized boundary, not repeatedly at every tier.
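
For completeness, here is what the signing side could look like for the same canonical string. This is a sketch under assumptions the text does not confirm: `computeHMAC` is taken to be hex-encoded HMAC-SHA256, signing keys are raw byte slices keyed by version, and `signURL` and `verify` are hypothetical names (`verify` mirrors the edge check with an injectable clock so it can be exercised deterministically).

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// computeHMAC returns the hex-encoded HMAC-SHA256 of msg (an assumed encoding).
func computeHMAC(key []byte, msg string) string {
	m := hmac.New(sha256.New, key)
	m.Write([]byte(msg))
	return hex.EncodeToString(m.Sum(nil))
}

// signURL appends expires, keyver, and sig query parameters to path.
func signURL(path string, expires int64, keyver string, key []byte) string {
	canonical := fmt.Sprintf("GET\n%s\n%d\n", path, expires)
	q := url.Values{}
	q.Set("expires", strconv.FormatInt(expires, 10))
	q.Set("keyver", keyver)
	q.Set("sig", computeHMAC(key, canonical))
	return path + "?" + q.Encode()
}

// verify checks a signed URL at time now (Unix seconds).
func verify(rawURL string, now int64, keys map[string][]byte) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	q := u.Query()
	expires, err := strconv.ParseInt(q.Get("expires"), 10, 64)
	if err != nil || now > expires {
		return false
	}
	key, ok := keys[q.Get("keyver")]
	if !ok {
		return false
	}
	canonical := fmt.Sprintf("GET\n%s\n%d\n", u.Path, expires)
	// hmac.Equal compares in constant time.
	return hmac.Equal([]byte(q.Get("sig")), []byte(computeHMAC(key, canonical)))
}

func main() {
	keys := map[string][]byte{"v1": []byte("demo-key-not-for-production")}
	now := time.Now().Unix()
	signed := signURL("/article/1", now+300, "v1", keys["v1"])
	fmt.Println(signed)
	fmt.Println("valid now:", verify(signed, now, keys))
	fmt.Println("valid in 10 minutes:", verify(signed, now+600, keys))
}
```

The key version travels in the URL so that keys can be rotated: new URLs are signed with the newest key while older versions stay in the map until the URLs signed with them have expired.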


Docker Compose

# labs/lab-21-full-system/docker-compose.yml
services:
  origin:
    build: .
    command: ["./cdn-lab21", "-role=origin", "-addr=:9001"]
    ports: ["9001:9001"]

  shield:
    build: .
    command: ["./cdn-lab21", "-role=shield", "-addr=:8082", "-upstream=http://origin:9001"]
    ports: ["8082:8082"]
    depends_on: [origin]

  edge-nyc:
    build: .
    command: ["./cdn-lab21", "-role=edge", "-addr=:8080", "-upstream=http://shield:8082", "-pop=NYC"]
    ports: ["8080:8080"]
    depends_on: [shield]

  edge-lhr:
    build: .
    command: ["./cdn-lab21", "-role=edge", "-addr=:8081", "-upstream=http://shield:8082", "-pop=LHR"]
    ports: ["8081:8081"]
    depends_on: [shield]

  prometheus:
    image: prom/prometheus:latest
    volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml"]
    ports: ["9090:9090"]

  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    depends_on: [prometheus]

Prometheus Configuration

# labs/lab-21-full-system/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'cdn-edge'
    static_configs:
      - targets: ['edge-nyc:8080', 'edge-lhr:8081']

  - job_name: 'cdn-shield'
    static_configs:
      - targets: ['shield:8082']

  - job_name: 'cdn-origin'
    static_configs:
      - targets: ['origin:9001']

Observing the System Under Load

With the system running, generate load and observe the cascade:

# Generate 1000 requests across 50 unique URLs
for i in $(seq 1 1000); do
  curl -s "http://localhost:8080/item/$((RANDOM % 50))" -o /dev/null
done

# Check metrics at each tier
# Edge NYC hit ratio
curl -s http://localhost:8080/metrics | grep cdn_requests_total

# Shield hit ratio  
curl -s http://localhost:8082/metrics | grep cdn_requests_total

# Origin request count (should be tiny compared to edge total)
curl -s http://localhost:9001/metrics | grep cdn_requests_total

You should see:

  • Edge hit ratio: ~80–90% (after warmup)
  • Shield hit ratio: ~95–99%
  • Origin requests: ~1–5% of edge total

Failure Modes & Resilience

Origin failure

Origin down → Shield gets 502/503 from origin
           → Shield returns stale-if-error (from Cache-Control)
           → Edge returns stale content to users

This is the “stale-if-error” pattern from Lab 7, applied system-wide. Users see slightly stale content rather than errors.

Shield failure

Shield down → Edge cannot reach upstream
           → Edge serves stale (if available) or 503

In production, the shield tier has multiple nodes behind a load balancer. A single shield failure routes to another shield node.

Edge failure

Edge-NYC down → Geo routing redirects NYC users to Edge-LHR
             → Higher latency but service continues

This is the health-check failover from Lab 15. Each edge registers with the geo-routing layer and is removed from rotation when health checks fail.


Path to Production

To harden this system for real traffic:

  1. Replace in-memory cache with Redis: enables shared cache state across edge instances and survives restarts
  2. Add TLS termination: automatic certificate provisioning over the ACME protocol (for example, Let’s Encrypt)
  3. Add rate limiting: token bucket per IP/user with Redis-backed counters
  4. Add WAF rules: block common attack patterns (SQLi, XSS, path traversal)
  5. Add CDN purge API: authenticated endpoint to purge cache keys by tag
  6. Add distributed tracing: OpenTelemetry spans across edge → shield → origin
  7. Add chaos testing: kill origin/shield randomly to validate resilience

Try It

# Start the full system with Docker Compose
cd labs/lab-21-full-system
docker compose up --build

# In another terminal: generate signed URL and fetch content
TOKEN=$(curl -s "http://localhost:8080/sign?path=/article/1&ttl=300")
curl -s "$TOKEN" -v

# View Prometheus metrics
open http://localhost:9090

# View Grafana (default credentials: admin/admin)
open http://localhost:3000

# Generate load test
for i in $(seq 1 5000); do
  curl -s "http://localhost:8080/item/$((RANDOM % 100))" -o /dev/null &
done
wait

# Observe the request waterfall through the tiers
curl -s http://localhost:8080/metrics | grep cdn_requests_total | head -5
curl -s http://localhost:8082/metrics | grep cdn_requests_total | head -5  
curl -s http://localhost:9001/metrics | grep cdn_requests_total | head -5

Appendix A · Production Deployment Guide

This appendix covers deploying a CDN to public internet traffic. It is written for a principal engineer who has worked through the labs and is now hardening the system for production.


CDN Vendor Decision Matrix

For most organizations, building CDN infrastructure from scratch is not the right choice. Use a managed CDN vendor unless you have >1 Tbps of traffic and specific requirements that no vendor meets.

| Vendor | Best for | Differentiators | Pricing model |
|---|---|---|---|
| Cloudflare | Security-first CDN, DDoS protection, global network | Largest anycast network, Workers edge compute, free tier, Zero Trust | Per-seat or flat-rate (no bandwidth cost on higher tiers) |
| Fastly | Developers, real-time purging, full VCL control | Instant purge API, Varnish/VCL programmability, Compute@Edge WASM | Per-GB + per-request |
| AWS CloudFront | AWS-native applications, Lambda@Edge, tight IAM integration | Deep AWS integration, Lambda@Edge, S3 origins, OAC | Per-GB + per-request + per-origin HTTPS |
| Akamai | Enterprise, compliance, media delivery | Largest network footprint, MPLS backbone, strict SLAs | Enterprise contracts |
| BunnyCDN | Cost-sensitive mid-tier | Very low per-GB pricing, simple API | Per-GB only |

When to build your own CDN

Build only if all of these are true:

  • Traffic > 1 Tbps sustained
  • Specific protocol requirements (custom routing, custom protocols)
  • Cost at scale exceeds vendor pricing significantly
  • In-house expertise to operate CDN infrastructure 24/7

Companies that built their own: Netflix (Open Connect), Google, Meta, ByteDance/TikTok. Everyone else uses vendors.


Origin Protection

Your origin servers must not be directly reachable from the internet when you’re using a CDN. If they are, attackers can bypass the CDN:

Attacker discovers origin IP → sends traffic directly → DDoS bypasses CDN

Origin Protection Methods

1. Cloudflare-to-Origin mTLS (Authenticated Origin Pulls)

# nginx: only accept connections presenting Cloudflare's client certificate
ssl_verify_client on;
ssl_client_certificate /etc/nginx/cloudflare-origin-pull-ca.pem;

Cloudflare presents its certificate when connecting to your origin. Your origin rejects any connection without it.

2. IP allowlist at origin

Only allow traffic from CDN IP ranges:

# Cloudflare IPs: https://www.cloudflare.com/ips/
iptables -A INPUT -p tcp --dport 443 -s 103.21.244.0/22 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j DROP

Cloudflare, Fastly, and AWS publish their IP ranges. Use the vendor’s API to pull the current list (IPs change).
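
If the allowlist is enforced in the application rather than (or in addition to) the firewall, the check can be sketched with the standard net/netip package. This is an illustrative sketch: the `allowlist` type and its methods are hypothetical, and the single prefix is the one from the iptables example above, not a complete list of vendor ranges.

```go
package main

import (
	"fmt"
	"net/netip"
)

// allowlist holds the parsed CDN prefixes.
type allowlist []netip.Prefix

// parseAllowlist parses CIDR strings, failing on the first malformed entry.
func parseAllowlist(cidrs []string) (allowlist, error) {
	var out allowlist
	for _, c := range cidrs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return nil, fmt.Errorf("bad CIDR %q: %w", c, err)
		}
		out = append(out, p)
	}
	return out, nil
}

// allows reports whether remoteAddr ("ip:port", as in http.Request.RemoteAddr)
// falls inside any allowed prefix. Unparseable addresses are rejected.
func (a allowlist) allows(remoteAddr string) bool {
	ap, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return false
	}
	addr := ap.Addr().Unmap() // treat IPv4-mapped IPv6 as IPv4
	for _, p := range a {
		if p.Contains(addr) {
			return true
		}
	}
	return false
}

func main() {
	al, err := parseAllowlist([]string{"103.21.244.0/22"})
	if err != nil {
		panic(err)
	}
	fmt.Println(al.allows("103.21.245.7:51000"), al.allows("203.0.113.9:51000"))
}
```

The check must use the TCP peer address, not X-Forwarded-For, because request headers are attacker-controlled.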

3. Shared secret header

CDN adds a secret header; origin validates it:

CDN → Origin: X-CDN-Secret: <32-byte-random-secret>
Origin: reject any request missing this header

This is simpler but less secure than mTLS (header could leak in logs).
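
On the origin side, the header check can be sketched as a small middleware. The names `validSecret`, `requireCDNSecret`, and `statusFor` are hypothetical, and the 403 response is an assumption. The comparison uses crypto/subtle so the check does not leak how much of the secret matched through timing, and an empty configured secret never validates.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// validSecret compares in constant time and rejects an empty configured secret,
// so a missing configuration value cannot match a missing header.
func validSecret(got, want string) bool {
	if want == "" {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// requireCDNSecret rejects requests whose X-CDN-Secret header does not match.
func requireCDNSecret(secret string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !validSecret(r.Header.Get("X-CDN-Secret"), secret) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one in-process request carrying the given header value.
func statusFor(h http.Handler, header string) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if header != "" {
		req.Header.Set("X-CDN-Secret", header)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	origin := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprintln(w, "origin content")
	})
	h := requireCDNSecret("example-secret-value", origin)
	fmt.Println(statusFor(h, "example-secret-value"), statusFor(h, "guess"), statusFor(h, ""))
}
```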

4. Private networking

Put origin on a private VPC; CDN connects via private peering:

  • Cloudflare Magic WAN
  • Fastly Secure Edge Connector
  • AWS CloudFront + VPC Origins

WAF: Web Application Firewall

A WAF sits in front of your origin (at the CDN edge) and blocks malicious traffic before it reaches your application.

Common WAF Rule Sets

OWASP Core Rule Set (CRS): covers OWASP Top 10 attacks

  • SQLi: GET /api/user?id=1 OR 1=1
  • XSS: GET /search?q=<script>alert(1)</script>
  • Path traversal: GET /files/../../../etc/passwd
  • Remote file inclusion, SSRF

Bot management rules:

  • Block known malicious user agents
  • Challenge suspicious traffic (CAPTCHA)
  • Rate-limit scraper patterns

Custom rules for your application:

  • Block geographic regions not served
  • Rate-limit unauthenticated API endpoints
  • Block requests without required headers

Cloudflare WAF Setup

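# Via Cloudflare API: deploy a managed ruleset in the managed WAF phase.
# The ID below is the Cloudflare Managed Ruleset; the Cloudflare OWASP Core
# Ruleset is a separate managed ruleset with its own ID. Verify IDs for your account.
# Note: PUT replaces every rule in the phase entry point.
curl -X PUT "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets/phases/http_request_firewall_managed/entrypoint" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "rules": [{
      "action": "execute",
      "expression": "true",
      "action_parameters": {
        "id": "efb7b8c949ac4650a09736fc376e9aee"
      }
    }]
  }'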

WAF False Positive Management

WAFs generate false positives. Common causes:

  • API requests containing JSON with SQL-like syntax
  • Developer tools testing edge cases
  • Search queries containing special characters

Use audit mode first (log but don’t block), then move to block mode after reviewing false positives. Most vendors support per-rule enable/disable.


TLS Configuration

Certificate Provisioning

Cloudflare: Handles certificate provisioning automatically via Universal SSL or Advanced Certificate Manager. Zero configuration required.

Self-managed: Use ACME (Let’s Encrypt) with automatic renewal:

# certbot with nginx
certbot --nginx -d cdn.example.com --agree-tos --email ops@example.com

# Auto-renewal via cron
0 0,12 * * * certbot renew --quiet

TLS Minimum Version

Disable TLS 1.0 and 1.1 (deprecated; known attacks):

ssl_protocols TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers off;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:...

TLS 1.3 should be preferred everywhere. TLS 1.2 minimum is the current industry standard (PCI DSS requires at least 1.2).

HSTS

Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

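Once HSTS is deployed and a client sees this header, the browser will not send plain-HTTP requests to this host for 31,536,000 seconds (1 year): it rewrites http:// URLs to https:// before connecting and refuses to bypass certificate errors. Submit to the HSTS preload list for even stronger enforcement, since preloading also protects the very first visit.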


Cost Optimization

CDN costs are typically:

  • Bandwidth to end users: $0.01–0.08/GB depending on vendor and region
  • Requests: $0.001–0.01 per 10,000 requests
  • Origin egress: typically covered by CDN bandwidth; watch for AWS data transfer charges separately

Optimization Strategies

1. Maximize cache hit ratio

Every CDN hit replaces a CDN miss. CDN bandwidth is cheaper than:

  • Origin bandwidth (higher utilization costs)
  • Origin compute (request processing)
  • Origin database queries

Target > 90% byte hit ratio. Each 1% improvement in hit ratio on 100 TB/day = 1 TB/day saved from origin, potentially saving thousands per month.

2. Compression

Compressing assets before caching multiplies your CDN capacity:

10 TB/day uncompressed HTML/CSS/JS
→ Brotli compression, ~70% size reduction
→ 3 TB/day stored and served
→ 70% bandwidth cost reduction

3. Choose regions wisely

Premium regions (Australia, South America, India) cost 2–5× more per GB than US/EU. If your user base is primarily US/EU, serving Asia from US CDN PoPs (higher latency but lower cost) may be acceptable for static assets.

4. Origin Shield

Origin shield (covered in Lab 13) reduces origin-bound requests. CDN vendors charge less for shield-to-origin bandwidth than edge-to-origin in some pricing models. More importantly, it reduces origin compute costs.


Deployment Checklist

Before going public with your CDN:

  • Origin is not directly reachable from internet (IP allowlist or mTLS)
  • WAF is enabled with OWASP CRS in audit mode → block mode after validation
  • TLS 1.2+ minimum; TLS 1.3 preferred
  • HSTS header set on all HTTPS responses
  • Cache-Control headers set correctly on all content types
  • Sensitive paths (admin, API tokens, auth callbacks) bypassed from CDN caching
  • Purge API configured and tested
  • Metrics and alerting configured (hit ratio, latency, error rate)
  • SLOs defined and error budget dashboards live
  • Load test with traffic 10× expected peak
  • Incident runbook written and shared with on-call team

Appendix B · HTTP Caching Headers Reference

A complete reference for every HTTP header relevant to CDN caching. Cross-referenced against RFC 9110 (HTTP Semantics), RFC 9111 (HTTP Caching, which obsoletes RFC 7234), and RFC 5861 (stale extensions).


Cache-Control

The primary mechanism for controlling caching behavior. Both request and response headers; semantics differ.

Response Cache-Control Directives

max-age=<seconds>

Content is fresh for this many seconds after the Date response header.

Cache-Control: max-age=3600

After 3600 seconds, the cached response is stale. CDN must revalidate or refetch from origin.

Gotcha: max-age is relative to Date, not to when the CDN received the response. If origin is slow and the response travels for 5 seconds, the CDN’s effective max-age is reduced by 5 seconds.

s-maxage=<seconds>

Overrides max-age for shared caches only (CDN, proxy). Does not affect browser cache. Ideal for: short browser TTL + long CDN TTL.

Cache-Control: max-age=60, s-maxage=3600

Browser caches the page for 1 minute; CDN caches it for 1 hour.
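
The precedence rule can be sketched as a function a shared cache might apply. This is a simplified illustration, not a full RFC 9111 parser: `sharedTTL` is a hypothetical name, quoted values and directive arguments are not handled, and `no-cache` is treated as a TTL of 0 (always revalidate) even though the response may still be stored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sharedTTL returns the freshness lifetime in seconds that a shared cache
// (CDN) would apply for a response Cache-Control value.
// 0 means "do not serve from cache without going back to origin".
func sharedTTL(cacheControl string) int {
	maxAge, sMaxAge := -1, -1
	for _, part := range strings.Split(cacheControl, ",") {
		d := strings.ToLower(strings.TrimSpace(part))
		name, val, _ := strings.Cut(d, "=")
		switch name {
		case "no-store", "private", "no-cache":
			return 0
		case "max-age":
			if n, err := strconv.Atoi(val); err == nil && n >= 0 {
				maxAge = n
			}
		case "s-maxage":
			if n, err := strconv.Atoi(val); err == nil && n >= 0 {
				sMaxAge = n
			}
		}
	}
	if sMaxAge >= 0 {
		return sMaxAge // s-maxage wins for shared caches
	}
	if maxAge >= 0 {
		return maxAge
	}
	return 0 // no explicit lifetime; heuristic freshness is out of scope here
}

func main() {
	for _, cc := range []string{
		"max-age=60, s-maxage=3600",
		"max-age=60",
		"private, max-age=3600",
	} {
		fmt.Printf("%-28s -> CDN TTL %ds\n", cc, sharedTTL(cc))
	}
}
```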

no-cache

Does not mean “don’t cache”. Means: store the response, but revalidate with origin before every use. Equivalent to max-age=0, must-revalidate.

Cache-Control: no-cache

The CDN will send a conditional request (If-None-Match or If-Modified-Since) for every cache hit. Origin returns 304 if unchanged (fast), or 200 + new body.

no-store

Do not cache at all. Not in memory, not on disk, not in shared caches.

Cache-Control: no-store

Use for: authentication tokens, session data, personalized responses.

private

Cache only in the browser (private cache). CDN/proxies must not cache.

Cache-Control: private, max-age=3600

Use for: user-specific pages (shopping cart, account dashboard).

public

Cache in all caches (browser + CDN), even for responses to requests with Authorization headers.

Cache-Control: public, max-age=86400

Without public, responses to authenticated requests are not cached by CDNs by default (even if max-age is set).

must-revalidate

Once stale, must revalidate before serving. Do not serve stale content even if the origin is unavailable.

Cache-Control: max-age=3600, must-revalidate

Contrast with stale-if-error which allows serving stale when origin fails.

proxy-revalidate

Same as must-revalidate but applies only to shared caches (CDN/proxy), not browsers.

immutable

The response body will not change during its freshness lifetime. Browser (and some CDNs) will not send conditional revalidation requests until after max-age expires.

Cache-Control: public, max-age=31536000, immutable

Use for: hashed assets (main.abc123.js), HLS segments, versioned files. Saves one conditional request per asset per page load.

stale-while-revalidate=<seconds>

After max-age expires, continue serving the stale response while revalidating in the background. See Lab 7.

Cache-Control: max-age=60, stale-while-revalidate=600

From 60s to 660s after Date: serve stale, trigger background revalidation. After 660s: must revalidate before serving.

stale-if-error=<seconds>

If origin returns 5xx (or is unreachable), serve the stale cached response for up to this many seconds beyond the normal expiry.

Cache-Control: max-age=3600, stale-if-error=86400

Serve stale for up to 24 hours during origin outage. Combine with stale-while-revalidate for both performance and resilience.
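
How the two stale directives interact with max-age can be sketched as a decision function. It is a simplified model under stated assumptions: all times are in whole seconds, `originOK` represents the outcome of the revalidation attempt, and the name `decide` and its return strings are hypothetical.

```go
package main

import "fmt"

// decide returns what a cache does with a stored response of the given age.
//   - "fresh": serve from cache, no origin contact
//   - "stale-while-revalidate": serve stale now, refresh in the background
//   - "revalidate": contact origin first and serve its answer
//   - "stale-if-error": origin failed, serve the stale copy instead
//   - "error": origin failed and the stale copy is too old to use
func decide(ageSec, maxAgeSec, swrSec, sieSec int, originOK bool) string {
	switch {
	case ageSec < maxAgeSec:
		return "fresh"
	case ageSec < maxAgeSec+swrSec:
		// Served immediately; origin health only affects the background refresh.
		return "stale-while-revalidate"
	case originOK:
		return "revalidate"
	case ageSec < maxAgeSec+sieSec:
		return "stale-if-error"
	default:
		return "error"
	}
}

func main() {
	// Cache-Control: max-age=60, stale-while-revalidate=600, stale-if-error=86400
	for _, age := range []int{30, 300, 1000, 90000} {
		fmt.Printf("age=%6ds  origin up: %-24s origin down: %s\n",
			age, decide(age, 60, 600, 86400, true), decide(age, 60, 600, 86400, false))
	}
}
```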

no-transform

Prohibit intermediate caches (CDN) from transforming the response body. Prevents CDN from applying compression, image optimization, or minification.


Request Cache-Control Directives

no-cache

Force the CDN to revalidate with origin before returning a cached response. Used by browsers when the user clicks “Refresh”.

no-store

Request that the response not be cached (hint; servers may ignore).

max-age=<seconds> (request)

Accept only responses not older than this many seconds (fresh or revalidated).

max-stale=<seconds> (request)

Accept stale responses up to this many seconds past their expiry.

min-fresh=<seconds> (request)

Accept only responses that will remain fresh for at least this many more seconds.

only-if-cached (request)

Return a cached response or 504 Gateway Timeout. Used by clients that want to avoid network requests.


Expires (Legacy)

The original HTTP/1.0 cache expiry mechanism. Specifies an absolute date:

Expires: Tue, 21 Oct 2025 07:28:00 GMT

Cache-Control: max-age overrides Expires when both are present. Expires is still useful for backwards compatibility with very old clients and proxies that don’t understand Cache-Control.

Gotcha: If Expires is in the past, the response is immediately stale. If the value is malformed or invalid, it’s treated as expired.


ETag

An opaque validator for a specific version of a resource:

ETag: "686897696a7c876b7e"
ETag: W/"686897696a7c876b7e"   (weak ETag)

Strong ETag ("...") means byte-for-byte identical content. Weak ETag (W/"...") means semantically equivalent content (may have different whitespace, compression, etc.).

Revalidation with ETag:

Client: GET /resource HTTP/1.1
        If-None-Match: "686897696a7c876b7e"

Origin: 304 Not Modified  (if ETag matches — no body)
        200 OK + new body + new ETag (if changed)
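
An origin handler implementing this exchange could look like the sketch below. It assumes a strong ETag derived from a truncated SHA-256 of the body and does exact string matching only (no comma-separated lists, `*`, or weak comparison); `strongETag`, `serveWithETag`, and `fetch` are hypothetical names. For files and seekable content, the standard library's http.ServeContent handles conditional requests for you.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// strongETag derives a quoted strong validator from the body bytes.
func strongETag(body []byte) string {
	sum := sha256.Sum256(body)
	return `"` + hex.EncodeToString(sum[:8]) + `"`
}

// serveWithETag answers 304 when If-None-Match carries the current ETag.
func serveWithETag(body []byte) http.Handler {
	etag := strongETag(body)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified) // 304 carries no body
			return
		}
		w.Write(body)
	})
}

// fetch runs one in-process request and returns the status code and ETag.
func fetch(h http.Handler, ifNoneMatch string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/resource", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("ETag")
}

func main() {
	h := serveWithETag([]byte("hello"))
	status, etag := fetch(h, "")
	fmt.Println("first fetch:", status, etag)
	status, _ = fetch(h, etag)
	fmt.Println("revalidation:", status)
}
```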

ETag best practices:

  • Use content hash (SHA-1, MD5, xxhash) for strong ETags
  • For database-backed content: use version_number or updated_at
  • Avoid using timestamps alone — they don’t reflect content changes and can cause spurious mismatches

Last-Modified / If-Modified-Since

The older revalidation mechanism (HTTP/1.0):

Response: Last-Modified: Fri, 15 Nov 2024 08:12:31 GMT

Request:  If-Modified-Since: Fri, 15 Nov 2024 08:12:31 GMT
Response: 304 Not Modified  (if not modified since that date)
          200 OK + new body  (if modified)

Use ETag when possible — it’s more precise. Last-Modified is only second-resolution; files updated multiple times per second may have the same Last-Modified but different content.


Vary

Tells caches that the response varies based on request headers:

Vary: Accept-Encoding
Vary: Accept-Language
Vary: Accept, Accept-Encoding

With Vary: Accept-Encoding, the CDN stores a separate cached response for each Accept-Encoding value:

GET /page.html  Accept-Encoding: gzip      → cache key: /page.html#gzip
GET /page.html  Accept-Encoding: br        → cache key: /page.html#brotli
GET /page.html  (no Accept-Encoding)       → cache key: /page.html#identity

Warning: Vary: User-Agent or Vary: Cookie creates cardinality explosion — a separate cache entry per unique User-Agent string (thousands). Avoid unless necessary. CDNs often ignore Vary: Cookie for this reason.
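
The usual defence against this explosion is to normalize the varied header into a small set of values before it enters the cache key. The sketch below does this for Accept-Encoding. It is illustrative only: `normalizeEncoding` and `cacheKey` are hypothetical names, the preference order br > gzip > identity is an assumption, and only the literal `q=0` form is treated as "not acceptable".

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEncoding maps any Accept-Encoding value to one of three variants.
func normalizeEncoding(acceptEncoding string) string {
	has := map[string]bool{}
	for _, part := range strings.Split(acceptEncoding, ",") {
		name, params, _ := strings.Cut(strings.TrimSpace(part), ";")
		name = strings.ToLower(strings.TrimSpace(name))
		if strings.ReplaceAll(params, " ", "") == "q=0" {
			continue // explicitly refused by the client
		}
		has[name] = true
	}
	switch {
	case has["br"]:
		return "br"
	case has["gzip"]:
		return "gzip"
	default:
		return "identity"
	}
}

// cacheKey appends the normalized variant, so at most three entries exist per path.
func cacheKey(path, acceptEncoding string) string {
	return path + "#" + normalizeEncoding(acceptEncoding)
}

func main() {
	for _, ae := range []string{"gzip, deflate, br", "gzip;q=1.0, identity;q=0.5", "br;q=0, gzip", ""} {
		fmt.Printf("%-30q -> %s\n", ae, cacheKey("/page.html", ae))
	}
}
```

Without normalization, `gzip, deflate, br` and `br, gzip` would be stored as separate variants even though the CDN would serve the same bytes for both.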


Surrogate-Control / Surrogate-Key (CDN-specific)

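Not defined by the HTTP RFCs. Surrogate-Control originates in the W3C Edge Architecture Specification note, and support and semantics vary by vendor. It carries CDN-only cache directives: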

Surrogate-Control: max-age=86400

This header is intended for the CDN only. The CDN strips it before forwarding to the browser, so the browser uses its own Cache-Control.

Surrogate-Key (Fastly) / Cache-Tag (Cloudflare):

Surrogate-Key: article-123 author-456 category-sports
Cache-Tag: article-123 author-456 category-sports

Associates the cached response with logical tags. Enables instant purge by tag rather than by URL. See Lab 9.


CDN-Cache-Control

A CDN-targeted Cache-Control variant, standardized in RFC 9213 (Targeted HTTP Cache Control):

CDN-Cache-Control: max-age=600
Cache-Control: max-age=60

CDN respects CDN-Cache-Control (600s TTL) while browsers respect Cache-Control (60s TTL). Supported by Cloudflare, Fastly, and others. More explicit than s-maxage because it targets CDNs specifically rather than all shared caches.


Pragma: no-cache (Legacy)

HTTP/1.0 equivalent of Cache-Control: no-cache. Do not send it from new code; honor it on incoming requests for backwards compatibility:

Pragma: no-cache  →  treated as Cache-Control: no-cache when the request carries no Cache-Control header

Age

Set by the CDN/proxy to indicate how old a cached response is:

Age: 1234

Age: 1234 means this response was fetched from origin 1234 seconds ago. Remaining freshness = max-age - Age. If Age >= max-age, the response is stale even before it leaves the CDN.


Warning (Deprecated in RFC 9111)

Formerly used to indicate stale or revalidation state:

Warning: 110 - "Response is Stale"
Warning: 214 - "Transformation Applied"

RFC 9111 deprecated all Warning headers. Do not generate them; ignore if received.


Quick Reference Table

| Header | Direction | Purpose |
| --- | --- | --- |
| Cache-Control: max-age | Response | Cache TTL in seconds |
| Cache-Control: s-maxage | Response | Shared-cache (CDN/proxy) TTL override |
| Cache-Control: no-cache | Response | Revalidate before use |
| Cache-Control: no-store | Response | Never cache |
| Cache-Control: private | Response | Browser cache only |
| Cache-Control: public | Response | All caches including CDN |
| Cache-Control: immutable | Response | Never revalidate during freshness |
| Cache-Control: stale-while-revalidate | Response | Async background refresh |
| Cache-Control: stale-if-error | Response | Serve stale on origin failure |
| ETag | Response | Version identifier |
| Last-Modified | Response | Last change timestamp |
| Vary | Response | Differentiate cache by request headers |
| Age | Response | Time in cache (seconds) |
| Expires | Response | Absolute expiry date (legacy) |
| Surrogate-Control | Response | CDN-only TTL (stripped before browser) |
| Surrogate-Key / Cache-Tag | Response | Logical purge grouping |
| If-None-Match | Request | Conditional request by ETag |
| If-Modified-Since | Request | Conditional request by date |
| Cache-Control: no-cache | Request | Force CDN revalidation |

Appendix C · PromQL Recipes for CDN Monitoring

A cookbook of production-ready PromQL queries for CDN observability. Each query assumes the metric names from Lab 20. Adapt label names to your actual instrumentation.


Hit Ratio Queries

Request-level hit ratio (5-minute window)

sum(rate(cdn_requests_total{cache="hit"}[5m]))
/
sum(rate(cdn_requests_total[5m]))

Byte hit ratio (more meaningful for billing)

sum(rate(cdn_bytes_served_total{cache="hit"}[5m]))
/
sum(rate(cdn_bytes_served_total[5m]))

Hit ratio by PoP

sum(rate(cdn_requests_total{cache="hit"}[5m])) by (pop)
/
sum(rate(cdn_requests_total[5m])) by (pop)

Hit ratio trend (1-hour average over past 24 hours)

avg_over_time(
  (
    sum(rate(cdn_requests_total{cache="hit"}[1h]))
    /
    sum(rate(cdn_requests_total[1h]))
  )[24h:1h]
)

Latency Queries

p50, p95, p99 request latency (across all requests)

histogram_quantile(0.50, sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le))

p99 latency, split by cache status

histogram_quantile(0.99,
  sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le, cache)
)

This reveals the latency gap between cache hits and misses.

p99 origin latency (time spent fetching from origin)

histogram_quantile(0.99,
  sum(rate(cdn_origin_duration_seconds_bucket[5m])) by (le)
)

Latency heatmap (for Grafana heatmap panel)

sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le)

Traffic Volume Queries

Requests per second

sum(rate(cdn_requests_total[1m]))

Requests per second by status code

sum(rate(cdn_requests_total[1m])) by (status)

Bytes served per second

sum(rate(cdn_bytes_served_total[1m]))

Bandwidth in Mbps

sum(rate(cdn_bytes_served_total[1m])) * 8 / 1e6

Top 10 most-requested paths (requires path label — use sparingly)

topk(10, sum(rate(cdn_requests_total[5m])) by (path))

Warning: Only add path label if your URL space is bounded. Unbounded paths cause cardinality explosion.


Error Rate Queries

Overall error rate (5xx)

sum(rate(cdn_requests_total{status=~"5.."}[5m]))
/
sum(rate(cdn_requests_total[5m]))

Error rate by status code

sum(rate(cdn_requests_total{status=~"5.."}[5m])) by (status)

Origin error rate (errors returned by origin)

sum(rate(cdn_origin_requests_total{status=~"5.."}[5m]))
/
sum(rate(cdn_origin_requests_total[5m]))

Error rate that would violate 99.9% SLO

# If this returns a result, the 5xx ratio exceeds the 0.1% a 99.9% SLO allows
sum(rate(cdn_requests_total{status=~"5.."}[5m]))
/
sum(rate(cdn_requests_total[5m]))
> 0.001

SLO & Error Budget Queries

Error budget consumption rate (ratio to 30-day budget)

# Burn rate > 1 means you'll exhaust the monthly budget before the month ends
(
  sum(rate(cdn_requests_total{status=~"5.."}[1h]))
  /
  sum(rate(cdn_requests_total[1h]))
)
/ (1 - 0.999)

Remaining error budget (fraction, 30-day window)

1 - (
  sum(increase(cdn_requests_total{status=~"5.."}[30d]))
  /
  sum(increase(cdn_requests_total[30d]))
  /
  0.001  # error budget = 1 - SLO = 1 - 0.999
)

Multi-window burn rate (Google SRE approach)

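# Fast burn: 14.4× consumes 2% of a 30-day budget per hour (all of it in ~50 hours).
# Fire only when both the 1h window and the 5m window exceed the threshold.
(
  sum(rate(cdn_requests_total{status=~"5.."}[1h]))
  /
  sum(rate(cdn_requests_total[1h]))
)
/ (1 - 0.999) > 14.4
and
(
  sum(rate(cdn_requests_total{status=~"5.."}[5m]))
  /
  sum(rate(cdn_requests_total[5m]))
)
/ (1 - 0.999) > 14.4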

Cache Efficiency Queries

Cache entry count

cdn_cache_entries

Cache size in MB

cdn_cache_size_bytes / 1e6

Cache miss rate (requests going to origin)

sum(rate(cdn_origin_requests_total[5m]))
/
sum(rate(cdn_requests_total[5m]))

Median compression ratio

histogram_quantile(0.50,
  sum(rate(cdn_compression_ratio_bucket[5m])) by (le)
)

Alerting Rules

These are example Prometheus alerting rules. Save them in a rules file and reference it under rule_files in prometheus.yml:

groups:
  - name: cdn_alerts
    rules:

      # Hit ratio dropped below 80% — caching problem
      - alert: CDNHitRatioLow
        expr: |
          sum(rate(cdn_requests_total{cache="hit"}[5m]))
          /
          sum(rate(cdn_requests_total[5m]))
          < 0.80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CDN hit ratio below 80% for 10 minutes"
          description: "Current hit ratio: {{ $value | humanizePercentage }}"

      # p99 latency above 1 second
      - alert: CDNHighLatency
        expr: |
          histogram_quantile(0.99,
            sum(rate(cdn_request_duration_seconds_bucket[5m])) by (le)
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CDN p99 latency above 1s"

      # Error rate above 1% (a 10× burn rate against a 99.9% SLO)
      - alert: CDNHighErrorRate
        expr: |
          sum(rate(cdn_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(cdn_requests_total[5m]))
          > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "CDN error rate above 1%"
          description: "Current error rate: {{ $value | humanizePercentage }}"

      # Fast burn: a 14.4× burn rate consumes 2% of the 30-day error budget per hour
      - alert: CDNFastBurn
        expr: |
          (
            sum(rate(cdn_requests_total{status=~"5.."}[1h]))
            /
            sum(rate(cdn_requests_total[1h]))
          )
          / (1 - 0.999) > 14.4
        for: 2m
        labels:
          severity: critical
          page: "true"
        annotations:
          summary: "CDN burning error budget 14.4× faster than sustainable rate"

Grafana Dashboard Layout

Recommended panel organization:

Row 1: Traffic Overview

  • Total RPS (stat)
  • Bandwidth Mbps (stat)
  • Error rate % (stat)

Row 2: Cache Performance

  • Hit ratio over time (graph)
  • Byte hit ratio over time (graph)
  • Cache size (graph)

Row 3: Latency

  • p50/p95/p99 latency (graph)
  • Latency heatmap (heatmap panel)
  • Origin latency p99 (graph)

Row 4: Errors

  • Error rate over time (graph)
  • Error rate by status code (graph)
  • SLO burn rate (graph with threshold annotation)

Row 5: Infrastructure

  • Cache entry count (graph)
  • Goroutine count (graph)
  • Memory usage (graph)

Appendix D · Mental Models & Decision Trees

A collection of decision frameworks and mental models for CDN engineering. These are the heuristics principal engineers use to reason about CDN behavior quickly, without needing to run simulations.


Should I Cache This?

Is the response identical for all users?
├── No  → Cache-Control: private (browser only) or no-store
│         Examples: shopping cart, account dashboard, personalized feed
│
└── Yes → Can it be revalidated cheaply (ETag / Last-Modified)?
          ├── Yes → Cache with short max-age + conditional request support
          │         Examples: API responses with version IDs, database records
          │
          └── No  → How often does the content change?
                    ├── Never       → max-age=31536000, immutable
                    │                 Examples: hashed assets, video segments
                    ├── Rarely      → max-age=86400 (1 day)
                    │                 Examples: fonts, PDFs, images
                    ├── Daily       → max-age=3600 (1 hour) + stale-while-revalidate
                    │                 Examples: product listings, blog articles
                    ├── Frequently  → max-age=60 + stale-while-revalidate=300
                    │                 Examples: inventory counts, prices
                    └── Real-time   → max-age=5 (live streams) or no-store
                                      Examples: live sports scores, financial ticks

What TTL Should I Use?

The TTL tradeoff is always: freshness vs. CDN efficiency.

| Content update frequency | Recommended TTL | Pattern |
| --- | --- | --- |
| Never (hashed assets) | 1 year | immutable |
| Months (legal pages) | 1–7 days | Long TTL + surrogate keys for emergency purge |
| Days (blog posts) | 1 hour + SWR 24h | High hit ratio + fresh within a day |
| Hours (product pages) | 5 min + SWR 1h | SWR bridges the gap |
| Minutes (inventory) | 60s + SWR 300s | Accept slight staleness |
| Seconds (live playlist) | 5s | No SWR (viewers need current segment list) |
| Never cache (auth, cart) | no-store | Keep off CDN entirely |

SWR = stale-while-revalidate. Serve stale content instantly while fetching fresh copy in the background.


Thundering Herd Decision Tree

Multiple concurrent requests for the same uncached resource?
│
└── Is this a shared cache (CDN, origin shield)?
    ├── Yes → Use singleflight.Group to collapse concurrent misses
    │         → Only ONE upstream request, all waiters receive the result
    │
    └── No  → Multiple browser tabs? Multiple microservices?
              └── Client-side: stagger requests with jitter
                  Service-side: use a distributed lock (Redis SETNX)

TTL expiry causes a burst of misses at the same moment?
│
└── Use stale-while-revalidate:
    → Expired content is still served instantly for a grace window (e.g. 30s)
    → Background fetch rehydrates the cache
    → Eliminates the expiry spike entirely
    
OR use XFetch (probabilistic early refresh):
    → Randomly start refreshing BEFORE expiry
    → No single expiry moment → no thundering herd

Cache Key Design Checklist

When adding a new query parameter or header to your application, ask:

  1. Does it change the response body?
    Yes → Include it in cache key
    No → Exclude it (reduces cache fragmentation)

  2. Is it unbounded (e.g., user ID, session token)?
    Yes → Never include in cache key (creates cache pollution)
    Strip it, or bypass CDN caching entirely

  3. Is it a tracking parameter (utm_source, fbclid, gclid)?
    Strip it from the cache key (never affects content)
    Configure this at the CDN: Cloudflare custom cache keys or transform rules, Fastly custom VCL

  4. Does the response vary by Accept-Encoding?
    Yes → Add Vary: Accept-Encoding
    CDN creates separate cache entries for gzip, brotli, identity

  5. Does the response vary by Accept-Language?
    Yes → Vary: Accept-Language or use URL path per language
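Items 1 to 3 of the checklist can be sketched as a key-normalization function. This is an illustration under assumptions: the tracking-parameter list is a sample, the host is left out of the key for brevity, and normalizeCacheKey is a hypothetical name. It uses only net/url from the standard library.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// trackingParams lists query parameters assumed never to change the body.
var trackingParams = map[string]bool{"fbclid": true, "gclid": true}

// normalizeCacheKey strips tracking parameters and sorts the rest so that
// equivalent URLs map to one cache entry.
func normalizeCacheKey(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	for name := range q {
		if trackingParams[name] || strings.HasPrefix(name, "utm_") {
			q.Del(name)
		}
	}
	key := u.Path
	// Encode sorts parameters by name, so ?a=1&b=2 and ?b=2&a=1 share a key.
	if encoded := q.Encode(); encoded != "" {
		key += "?" + encoded
	}
	return key, nil
}

func main() {
	key, err := normalizeCacheKey("https://example.com/p?utm_source=x&size=2&color=red&fbclid=abc")
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```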


Cache Tier Selection

Traffic volume per origin?
├── < 100 RPS    → Single CDN layer is sufficient
│
├── 100–10k RPS  → Add CDN with origin shield
│                   Edge (30s TTL) → Shield (300s TTL) → Origin
│
└── > 10k RPS   → Multi-tier CDN + distributed origin shield
                  + in-process cache at origin for hot items
                  + singleflight at every tier

Memory cache sizing rule of thumb:

Working set size (bytes) = mean_item_size × unique_items_per_hour × revalidation_window_hours

If you have 10k unique products, each averaging 10 KB, refreshed every 5 minutes:

10,000 items × 10 KB = 100 MB for the full working set

If you can only cache 50 MB, you’ll have a ~50% miss ratio in steady state (assuming uniform access). Add an LRU layer with hot-item bias.


Compression Algorithm Selection

Content type?
├── Already compressed (JPEG, PNG, WOFF2, video)
│   → Skip compression (little or no gain; output can even grow)
│
└── Compressible (HTML, CSS, JS, JSON, SVG, WASM, plain text)
    │
    └── Client supports brotli (Accept-Encoding: br)?
        ├── Yes → Use brotli (20–30% better ratio than gzip; similar CPU at mid quality levels, much higher at maximum quality)
        │
        └── No  → Client supports gzip?
                  ├── Yes → Use gzip (universal support, fast)
                  └── No  → Serve identity (uncompressed)

Response size?
├── < 1 KB → Skip compression (overhead > savings for tiny responses)
└── > 1 KB → Always compress compressible content
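Both trees combine into a single selection function. A minimal sketch: the compressible media-type list is an assumed sample, the 1 KB threshold comes from the tree above, and q-values in Accept-Encoding are ignored for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

const minCompressBytes = 1024 // the 1 KB threshold from the tree above

// compressible is an assumed sample of text-like media types.
var compressible = map[string]bool{
	"text/html": true, "text/css": true, "text/plain": true,
	"application/javascript": true, "application/json": true, "image/svg+xml": true,
}

// chooseEncoding returns "br", "gzip", or "identity" for a response.
func chooseEncoding(contentType, acceptEncoding string, size int) string {
	mediaType, _, _ := strings.Cut(contentType, ";") // drop "; charset=..."
	mediaType = strings.ToLower(strings.TrimSpace(mediaType))
	if !compressible[mediaType] || size < minCompressBytes {
		return "identity"
	}
	gzipOK := false
	for _, part := range strings.Split(acceptEncoding, ",") {
		token, _, _ := strings.Cut(part, ";") // q-values are ignored in this sketch
		switch strings.ToLower(strings.TrimSpace(token)) {
		case "br":
			return "br"
		case "gzip":
			gzipOK = true
		}
	}
	if gzipOK {
		return "gzip"
	}
	return "identity"
}

func main() {
	fmt.Println(chooseEncoding("text/html; charset=utf-8", "gzip, br", 5000))
	fmt.Println(chooseEncoding("image/jpeg", "gzip, br", 500000))
	fmt.Println(chooseEncoding("application/json", "gzip", 200))
}
```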

Origin Shield Sizing

Number of shield nodes:

If traffic_to_origin_without_shield = T requests/s
And shield_TTL = N seconds
And unique_items_requested_per_N_seconds = U

Then shield reduces origin to: U / N requests/s (one miss per item per TTL window)

Shield nodes needed = max(1, ceil(T / node_capacity))

For a single-node shield serving 10,000 edge requests/s with 300s TTL:

  • If working set has 1,000 unique items: ~3.3 origin requests/s
  • If working set has 100,000 unique items: ~333 origin requests/s

The shield’s cache size must comfortably hold the working set.
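The sizing formulas translate directly into code, which makes it easy to plug in your own numbers. The function names and the 4,000 requests/s node capacity below are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"math"
)

// originRPS estimates steady-state origin load behind a shield:
// one miss per unique item per TTL window, i.e. U / N.
func originRPS(uniqueItems, ttlSeconds float64) float64 {
	return uniqueItems / ttlSeconds
}

// shieldNodes returns max(1, ceil(T / nodeCapacity)).
func shieldNodes(trafficRPS, nodeCapacityRPS float64) int {
	return int(math.Max(1, math.Ceil(trafficRPS/nodeCapacityRPS)))
}

func main() {
	const ttl = 300.0 // shield TTL in seconds
	fmt.Printf("1,000 items:   %.1f origin req/s\n", originRPS(1_000, ttl))
	fmt.Printf("100,000 items: %.1f origin req/s\n", originRPS(100_000, ttl))
	// Assumed capacity of 4,000 req/s per shield node.
	fmt.Println("shield nodes for 10,000 req/s:", shieldNodes(10_000, 4_000))
}
```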


HTTP/3 Adoption Decision

Is your user base on mobile or high-latency networks?
├── Yes → Prioritize HTTP/3; measurable 15–30% improvement on lossy links
│
└── No  → HTTP/2 is sufficient for datacenter-quality paths

Does your CDN vendor support HTTP/3?
├── No  → Wait for vendor support before implementing
│
└── Yes → Enable HTTP/3 with Alt-Svc fallback to HTTP/2
          (no downside — clients that don't support H3 use H2 automatically)

Can you deploy QUIC? (UDP 443 not blocked at your edge)
├── No  → QUIC blocked by firewall; HTTP/3 won't work
│         Consider: if customers are on corporate networks, H3 gains are limited
│
└── Yes → Enable H3; measure adoption rate in 30 days

SLO Tier Selection

| Traffic level | Appropriate SLO | Reasoning |
| --- | --- | --- |
| < 10k users | 99.0% (87.6h/year downtime) | Small teams can’t sustain higher |
| 10k–1M users | 99.9% (8.76h/year downtime) | Standard web service |
| 1M+ users | 99.95% (4.38h/year downtime) | Revenue-critical |
| Enterprise SLA | 99.99% (52.6 min/year downtime) | Very expensive to maintain |

The rule: SLO should be set below what you can actually achieve. If your system is at 99.97% reliability, set SLO at 99.9%. The gap between actual and SLO is your buffer for incidents.

Setting SLO at 99.99% when you achieve 99.97% means you’re always burning error budget, which prevents the team from shipping new features.


Purge Strategy Selection

How frequently do you need to invalidate cached content?
│
├── Rarely (< 1/day)
│   → URL-based purge is sufficient (specify exact URLs to purge)
│
├── Regularly (dozens per day)
│   → Use surrogate keys / cache tags
│   → Tag responses: "article-123", "author-456"
│   → Purge by tag on content update: purge("article-123")
│
└── Constantly (100s per day, CMS-driven)
    → Surrogate keys + webhook from CMS on publish event
    → CDN purge API called by publish webhook
    → Consider short TTLs (60s) + stale-while-revalidate as alternative
    
How many URLs does a content update affect?
├── 1 URL       → URL purge
├── 10–100 URLs → Surrogate key covering those URLs
└── 1000+ URLs  → Short TTL is better than purging 1000 URLs
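Purge-by-tag comes down to a reverse index from tag to cached URLs. The sketch below is an in-memory illustration with hypothetical names, not how any particular CDN implements it. As a simplification, purging a tag does not remove the evicted URLs from other tags' sets.

```go
package main

import (
	"fmt"
	"sort"
)

// tagIndex maps a surrogate key to the set of cached URLs carrying it.
type tagIndex map[string]map[string]struct{}

// add records that url was cached with the given surrogate keys.
func (idx tagIndex) add(url string, tags ...string) {
	for _, tag := range tags {
		if idx[tag] == nil {
			idx[tag] = make(map[string]struct{})
		}
		idx[tag][url] = struct{}{}
	}
}

// purge drops the tag and returns the URLs to evict, sorted for stable output.
func (idx tagIndex) purge(tag string) []string {
	urls := make([]string, 0, len(idx[tag]))
	for u := range idx[tag] {
		urls = append(urls, u)
	}
	delete(idx, tag)
	sort.Strings(urls)
	return urls
}

func main() {
	idx := tagIndex{}
	idx.add("/articles/123", "article-123", "author-456")
	idx.add("/authors/456", "author-456")
	idx.add("/sports", "category-sports")

	// The author edits their bio: one purge call evicts every page that shows it.
	fmt.Println(idx.purge("author-456"))
}
```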

Security Checklist for CDN Engineers

Before any CDN deployment to production:

  • Origin not reachable from internet — IP allowlist or mTLS
  • Signed URLs for private content — HMAC-SHA256 with constant-time comparison
  • WAF enabled — OWASP CRS at minimum
  • TLS 1.2 minimum — no TLS 1.0/1.1, no SSLv3
  • HSTS set — with preload for critical domains
  • no-store on sensitive paths — auth callbacks, session tokens, API keys
  • Purge API authenticated — not publicly accessible
  • Metrics endpoint restricted: /metrics should not be public
  • Timing-safe signature comparison: hmac.Equal, not ==
  • Key rotation procedure documented — tested at least once
  • Rate limiting on public endpoints — prevents DoS via cache miss amplification