Lab 21 · The Full System

Run it: make lab-21
Source: labs/lab-21-full-system/main.go
Compose: labs/lab-21-full-system/docker-compose.yml

The Architecture

This final lab wires together everything from Labs 1–20 into a production-representative CDN system. It is a microcosm of how real CDNs like Cloudflare, Fastly, and AWS CloudFront are structured.

                 ┌──────────────────────────────────────────────────────────────┐
                 │                          CDN System                          │
                 │                                                              │
  Internet  ──>  │  Edge NYC (:8080)  ──┐                                       │
                 │  (singleflight,      │                                       │
                 │   signed URL verify, ├──> Shield (:8082) ──> Origin (:9001)  │
                 │   30s TTL, metrics)  │    (singleflight,                     │
                 │                      │     300s TTL, metrics)                │
                 │  Edge LHR (:8081)  ──┘                                       │
                 │  (same config)                                               │
                 │                                                              │
                 │  Prometheus (:9090)   Grafana (:3000)                        │
                 └──────────────────────────────────────────────────────────────┘

Component Responsibilities

Component	Port	Role
Origin	:9001	Source of truth. Serves all content. Simulates 50ms processing delay.
Shield	:8082	Aggregation layer. Many edge misses for the same object become one origin request. 300s TTL.
Edge NYC	:8080	User-facing edge in New York. Validates signed URLs. 30s TTL.
Edge LHR	:8081	User-facing edge in London. Same config as NYC. 30s TTL.
Prometheus	:9090	Scrapes metrics from all nodes.
Grafana	:3000	Dashboards over Prometheus.

Multi-Tier TTL Design

The TTL cascade is intentional and critical:

User ── Edge (30s TTL) ── Shield (300s TTL) ── Origin

Why Edge TTL < Shield TTL?

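The edge serves users directly, and its short 30s TTL bounds how long any one edge copy can live. Each edge still collapses requests from many users into one request to the shield. Because an edge refills from the shield rather than from origin, though, a change published at origin can take up to the shield TTL plus the edge TTL (roughly 330 seconds in the worst case) to reach every user, unless the content is purged.

The shield’s 300s TTL means that, for any given object, the shield makes at most one request to origin every 5 minutes (given singleflight at the shield and no early eviction). A popular item might be requested by 10,000 users per minute across both edges, yet origin sees only one request every 5 minutes for that item.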

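Without shield (one popular item, singleflight at each edge):
  each edge misses at most once per 30s TTL = 2 misses/min
  2 edges × 2 misses/min = 4 requests/min to origin
  (every edge miss → origin request; the 10,000 user requests/min
   only guarantee that every 30s window sees traffic)

With shield (300s TTL):
  the same 4 edge misses/min all go to the shield
  → shield misses at most once per 300s = 0.2 requests/min to origin
  → shield hit ratio ≈ 1 - 0.2/4 = 95%

For this item, that is a 20× reduction in origin load. In general the factor is (number of edges) × (shield TTL ÷ edge TTL), here 2 × 10 = 20, so it grows with every PoP added: with 100 edges it would be 1,000×.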
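A per-object version of this arithmetic can be sketched in Go. originLoad is a hypothetical helper; it assumes one hot object that sees traffic in every TTL window, singleflight at every tier, and no early eviction.

```go
package main

import "fmt"

// originLoad returns requests/min reaching origin for ONE hot object that is
// requested in every TTL window, assuming singleflight at each tier and no
// early eviction.
func originLoad(edges int, edgeTTLSec, shieldTTLSec float64, withShield bool) float64 {
	// Each edge refills at most once per edge TTL.
	edgeMissesPerMin := float64(edges) * 60 / edgeTTLSec
	if !withShield {
		return edgeMissesPerMin // every edge miss goes straight to origin
	}
	// The shield refills at most once per shield TTL, however many edges miss,
	// and it can never miss more often than it is asked.
	shieldMissesPerMin := 60 / shieldTTLSec
	if shieldMissesPerMin > edgeMissesPerMin {
		shieldMissesPerMin = edgeMissesPerMin
	}
	return shieldMissesPerMin
}

func main() {
	for _, edges := range []int{2, 100} {
		without := originLoad(edges, 30, 300, false)
		with := originLoad(edges, 30, 300, true)
		fmt.Printf("%3d edges: %6.1f req/min without shield, %.1f with shield (%.0fx)\n",
			edges, without, with, without/with)
	}
}
```

With 2 edges this gives 4 versus 0.2 requests/min; with 100 edges, 200 versus 0.2. The shield's benefit scales with the number of PoPs because it turns N independent edge refills into one.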

Singleflight at Two Layers

Both edge and shield run singleflight.Group:

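import (
    "net/http"

    "golang.org/x/sync/singleflight"
)

type CachingProxy struct {
    cache  *Cache
    origin string
    group  singleflight.Group // deduplicates concurrent misses per key
}

func (p *CachingProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    key := cacheKey(r)

    if item, ok := p.cache.Get(key); ok {
        serveFromCache(w, item)
        return
    }

    // Multiple concurrent requests for the same key?
    // singleflight collapses them into ONE upstream request,
    // and every waiting caller receives the same result.
    result, err, _ := p.group.Do(key, func() (interface{}, error) {
        item, err := p.fetchFromUpstream(r)
        if err != nil {
            return nil, err
        }
        p.cache.Set(key, item) // store once, in the leader only
        return item, nil
    })
    if err != nil {
        // result is nil here; asserting it to *CacheItem would panic.
        http.Error(w, "upstream fetch failed", http.StatusBadGateway)
        return
    }

    serveFromCache(w, result.(*CacheItem))
}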

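The thundering herd cascade: suppose a popular item expires and 1,000 users request it at the same moment from one edge. Without singleflight, that edge sends 1,000 concurrent requests to the shield, and if the shield’s copy has also expired, the shield forwards all 1,000 to origin. Singleflight at the edge collapses those 1,000 requests into 1 per edge node. Singleflight at the shield then collapses the concurrent misses for that item arriving from different edge nodes (2 in this lab, one per PoP, or 1,000 in a large CDN) into 1 request to origin.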

Signed URL Verification

The edge validates HMAC-signed URLs before serving any content:

import (
    "crypto/hmac"
    "fmt"
    "net/http"
    "strconv"
    "time"
)

func (e *Edge) verifySignedURL(r *http.Request) bool {
    q := r.URL.Query()

    sig := q.Get("sig")
    if sig == "" {
        return false // or true, if this path serves public content
    }

    expires, err := strconv.ParseInt(q.Get("expires"), 10, 64)
    if err != nil || time.Now().Unix() > expires {
        return false // malformed or expired
    }

    // keyver selects the signing key, which allows key rotation.
    key, ok := e.signingKeys[q.Get("keyver")]
    if !ok {
        return false // unknown key version
    }

    canonical := fmt.Sprintf("GET\n%s\n%d\n", r.URL.Path, expires)
    expected := computeHMAC(key, canonical)

    // Constant-time comparison avoids leaking how many bytes matched.
    return hmac.Equal([]byte(sig), []byte(expected))
}

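The shield and origin do not re-verify; they trust the edge. This is a common trust-boundary design: validation happens once, at the first trust boundary, rather than repeatedly at every tier. It is only safe if the shield and origin accept traffic from the edge tier alone (for example, over a private network); otherwise a client could skip the signature check by calling them directly.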

Docker Compose

# labs/lab-21-full-system/docker-compose.yml
services:
  origin:
    build: .
    command: ["./cdn-lab21", "-role=origin", "-addr=:9001"]
    ports: ["9001:9001"]

  shield:
    build: .
    command: ["./cdn-lab21", "-role=shield", "-addr=:8082", "-upstream=http://origin:9001"]
    ports: ["8082:8082"]
    depends_on: [origin]

  edge-nyc:
    build: .
    command: ["./cdn-lab21", "-role=edge", "-addr=:8080", "-upstream=http://shield:8082", "-pop=NYC"]
    ports: ["8080:8080"]
    depends_on: [shield]

  edge-lhr:
    build: .
    command: ["./cdn-lab21", "-role=edge", "-addr=:8081", "-upstream=http://shield:8082", "-pop=LHR"]
    ports: ["8081:8081"]
    depends_on: [shield]

  prometheus:
    image: prom/prometheus:latest
    volumes: ["./prometheus.yml:/etc/prometheus/prometheus.yml"]
    ports: ["9090:9090"]

  grafana:
    image: grafana/grafana:latest
    ports: ["3000:3000"]
    depends_on: [prometheus]

Prometheus Configuration

# labs/lab-21-full-system/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'cdn-edge'
    static_configs:
      - targets: ['edge-nyc:8080', 'edge-lhr:8081']

  - job_name: 'cdn-shield'
    static_configs:
      - targets: ['shield:8082']

  - job_name: 'cdn-origin'
    static_configs:
      - targets: ['origin:9001']

Observing the System Under Load

With the system running, generate load and observe the cascade:

# Generate 1000 requests across 50 unique URLs
for i in $(seq 1 1000); do
  curl -s "http://localhost:8080/item/$((RANDOM % 50))" -o /dev/null
done

# Check metrics at each tier
# Edge NYC hit ratio
curl -s http://localhost:8080/metrics | grep cdn_requests_total

# Shield hit ratio  
curl -s http://localhost:8082/metrics | grep cdn_requests_total

# Origin request count (should be tiny compared to edge total)
curl -s http://localhost:9001/metrics | grep cdn_requests_total

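You should see roughly:

Edge hit ratio: ~80–90% (after warmup)
Shield hit ratio: ~95–99% (once the shield is warm; on a cold first run, almost every edge miss is also a first-time shield miss)
Origin requests: ~1–5% of edge total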

Failure Modes & Resilience

Origin failure

Origin down → Shield gets 502/503 (or a connection error) from origin
            → Shield serves its expired copy, as allowed by stale-if-error in Cache-Control
            → Edge returns that stale content to users

This is the “stale-if-error” pattern from Lab 7, applied system-wide. Users see slightly stale content rather than errors.

Shield failure

Shield down → Edge cannot reach upstream
           → Edge serves stale (if available) or 503

In production, the shield tier runs multiple nodes behind a load balancer, so when one shield node fails, edge requests are routed to a healthy one.

Edge failure

Edge-NYC down → Geo routing redirects NYC users to Edge-LHR
             → Higher latency but service continues

This is the health-check failover from Lab 15. Each edge registers with the geo-routing layer and is removed from rotation when health checks fail.
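Health-aware geo routing reduces to "nearest healthy PoP". In this Go sketch, route, the pop type, and the latency table are hypothetical, and the latency numbers are illustrative only; real geo routing typically relies on DNS with health checks, anycast, or GeoIP data rather than a static table.

```go
package main

import "fmt"

// pop is an edge location and its current health-check result.
type pop struct {
	name    string
	healthy bool
}

// route picks the lowest-latency healthy PoP for a client region.
// latencyMs[region][pop] is an assumed, illustrative RTT table.
func route(region string, pops []pop, latencyMs map[string]map[string]int) (string, bool) {
	best, bestMs := "", -1
	for _, p := range pops {
		if !p.healthy {
			continue // failed health checks: out of rotation
		}
		ms, ok := latencyMs[region][p.name]
		if !ok {
			continue // no latency data for this region/PoP pair
		}
		if bestMs < 0 || ms < bestMs {
			best, bestMs = p.name, ms
		}
	}
	return best, bestMs >= 0
}

func main() {
	latency := map[string]map[string]int{"new-york": {"NYC": 5, "LHR": 75}}
	pops := []pop{{"NYC", true}, {"LHR", true}}
	fmt.Println(route("new-york", pops, latency)) // NYC true

	pops[0].healthy = false // Edge-NYC fails its health check
	fmt.Println(route("new-york", pops, latency)) // LHR true
}
```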

Path to Production

To harden this system for real traffic:

Replace in-memory cache with Redis: enables shared cache state across edge instances and survives restarts
Add TLS termination: an ACME client (for example, against Let’s Encrypt) for automatic certificate provisioning
Add rate limiting: token bucket per IP/user with Redis-backed counters
Add WAF rules: block common attack patterns (SQLi, XSS, path traversal)
Add CDN purge API: authenticated endpoint to purge cache keys by tag
Add distributed tracing: OpenTelemetry spans across edge → shield → origin
Add chaos testing: kill origin/shield randomly to validate resilience

Try It

# Start the full system with Docker Compose
cd labs/lab-21-full-system
docker compose up --build

# In another terminal: generate signed URL and fetch content
TOKEN=$(curl -s "http://localhost:8080/sign?path=/article/1&ttl=300")
curl -s "$TOKEN" -v

# View Prometheus metrics (open is macOS-only; use xdg-open on Linux)
open http://localhost:9090

# View Grafana (default credentials: admin/admin)
open http://localhost:3000

# Generate load test
for i in $(seq 1 5000); do
  curl -s "http://localhost:8080/item/$((RANDOM % 100))" -o /dev/null &
done
wait

# Observe the request waterfall through the tiers
curl -s http://localhost:8080/metrics | grep cdn_requests_total | head -5
curl -s http://localhost:8082/metrics | grep cdn_requests_total | head -5  
curl -s http://localhost:9001/metrics | grep cdn_requests_total | head -5
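The load-test loop above backgrounds all 5000 curl processes at once, which can hit per-user process or open-file limits on some machines. A bounded-concurrency load generator is one alternative; this Go sketch assumes the edge is on localhost:8080 as in the Compose file and picks 50 workers arbitrarily. runLoad and httpFetch are hypothetical helpers, not part of the lab.

```go
package main

import (
	"fmt"
	"io"
	"math/rand"
	"net/http"
	"sync"
	"time"
)

// runLoad sends n requests spread over paths /item/0 through /item/(unique-1),
// with at most `workers` in flight, and returns per-path counts plus the
// number of failed requests. fetch is injected so the logic can be tested
// without a running edge.
func runLoad(n, unique, workers int, fetch func(path string) error) (map[string]int, int) {
	if workers < 1 {
		workers = 1
	}
	if unique < 1 {
		unique = 1
	}
	jobs := make(chan string)
	counts := make(map[string]int)
	failures := 0
	var mu sync.Mutex
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				err := fetch(path)
				mu.Lock()
				counts[path]++
				if err != nil {
					failures++
				}
				mu.Unlock()
			}
		}()
	}
	for i := 0; i < n; i++ {
		jobs <- fmt.Sprintf("/item/%d", rand.Intn(unique))
	}
	close(jobs)
	wg.Wait()
	return counts, failures
}

// httpFetch returns a fetch function that GETs base+path, drains the body so
// keep-alive connections are reused, and treats 5xx responses as failures.
func httpFetch(base string, workers int) func(string) error {
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.MaxIdleConnsPerHost = workers // the default of 2 would churn connections
	client := &http.Client{Transport: t, Timeout: 10 * time.Second}
	return func(path string) error {
		resp, err := client.Get(base + path)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if _, err := io.Copy(io.Discard, resp.Body); err != nil {
			return err
		}
		if resp.StatusCode >= 500 {
			return fmt.Errorf("%s: %s", path, resp.Status)
		}
		return nil
	}
}

func main() {
	const workers = 50 // arbitrary cap on concurrent requests
	counts, failures := runLoad(5000, 100, workers, httpFetch("http://localhost:8080", workers))
	fmt.Printf("distinct paths: %d, failed requests: %d\n", len(counts), failures)
}
```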


The Hitchhiker's Guide to CDNs