Lab 19 · HLS Streaming & Segment Caching

Run it: make lab-19
Source: labs/lab-19-hls-streaming/main.go

The Problem

Video delivery is the dominant use case for CDN infrastructure — in 2024, video represents ~65% of all internet traffic. Unlike web pages (one-shot request-response), video streaming is:

High-bandwidth: a 4K stream is 25 Mbps; 1 million concurrent viewers require 25 Tbps of aggregate bandwidth
Time-sensitive: a 2-second buffer stall causes viewer abandonment rates to jump 20%
Long-duration: sessions last 30–120 minutes; cache TTLs matter differently for live vs. VOD

The CDN must cache aggressively to serve millions of concurrent viewers from edge rather than hammering the origin’s encoder/packager.

HLS: HTTP Live Streaming

HLS (RFC 8216) is the dominant streaming protocol for CDNs. It works by slicing video into short segments and serving them over plain HTTP:

Client                      CDN                     Origin Encoder
  │                          │                          │
  │── GET master.m3u8 ──────>│── (cache miss) ─────────>│
  │<──── master playlist ────│<──── master playlist ─────│
  │                          │ (cache TTL: 60s)          │
  │── GET 720p/playlist.m3u8>│── (cache miss) ─────────>│
  │<──── variant playlist ───│<──── variant playlist ────│
  │                          │ (cache TTL: 5s for live)  │
  │── GET seg001.ts ────────>│── (cache miss) ─────────>│
  │<──── segment ────────────│<──── segment ─────────────│
  │                          │ (cache TTL: 24h immutable)│
  │── GET seg002.ts ────────>│── (cache HIT) ────────────│  ← no origin hit: an earlier viewer (or prefetch) already filled the cache

Three Types of Content, Three TTLs

HLS has three distinct content types with fundamentally different caching characteristics:

1. Media Segments (`.ts`, `.fmp4`) — TTL: 24 hours, immutable

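Segments are immutable by naming convention: once seg001.ts is generated and named, it never changes, so the name uniquely identifies the content. This is weaker than true content addressing (where the name is a hash of the bytes), but it gives caches the same guarantee.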

// Immutable segment: cache for 24 hours (max-age=86400) without revalidation
w.Header().Set("Cache-Control", "public, max-age=86400, immutable")
w.Header().Set("ETag", `"seg001-v1"`)

This is identical to the approach used for hashed static assets (main.abc123.js). CDN hit ratios for segment requests should be ~99% once the initial viewers warm the cache.

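Thundering herd implication: When a new segment is published, the first request for it misses the cache and goes to origin; once the response is stored, every later viewer gets a hit. On popular streams (100k+ viewers) many requests arrive while that first fetch is still in flight, so the origin sees a single request only if the CDN coalesces concurrent misses. With coalescing in place, this is excellent.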

2. Variant Playlist (`.m3u8` per quality level) — TTL: 5 seconds (live), longer for VOD

The variant playlist (e.g., 720p/playlist.m3u8) lists the available segments:

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:42

#EXTINF:6.006,
seg042.ts
#EXTINF:6.006,
seg043.ts
#EXTINF:6.006,
seg044.ts
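A live packager produces playlists like the one above by sliding a fixed-size window forward as new segments appear. The following is a minimal sketch of that idea, not the lab's packager: the function name `renderLivePlaylist`, the fixed 6.006-second duration, and the `segNNN.ts` naming are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// renderLivePlaylist builds a live media playlist that exposes `window`
// segments starting at media sequence number firstSeq.
// Hypothetical helper: the duration and file naming are illustrative.
func renderLivePlaylist(firstSeq, window int) string {
	var b strings.Builder
	b.WriteString("#EXTM3U\n")
	b.WriteString("#EXT-X-VERSION:3\n") // version 3 allows fractional EXTINF durations
	b.WriteString("#EXT-X-TARGETDURATION:6\n")
	fmt.Fprintf(&b, "#EXT-X-MEDIA-SEQUENCE:%d\n", firstSeq)
	for seq := firstSeq; seq < firstSeq+window; seq++ {
		b.WriteString("#EXTINF:6.006,\n")
		fmt.Fprintf(&b, "seg%03d.ts\n", seq)
	}
	// No EXT-X-ENDLIST tag: its absence tells clients the stream is still live.
	return b.String()
}

func main() {
	// Each time a segment is published, firstSeq advances by one and the
	// oldest entry drops out of the window.
	fmt.Print(renderLivePlaylist(42, 3))
	fmt.Print(renderLivePlaylist(43, 3))
}
```

Because the body changes on every publish, any cached copy older than one segment duration hides the newest segment from viewers.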

For live streams, the playlist changes every segment duration (typically 2–6 seconds). It must not be cached too long or viewers fall behind the live edge.

// Short TTL for live variant playlist
w.Header().Set("Cache-Control", "public, max-age=5")
w.Header().Set("ETag", etag)  // keep the ETag so caches can revalidate with If-None-Match
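The ETag pays off when a cached playlist expires but has not changed yet: the cache revalidates with If-None-Match and receives a bodiless 304. Below is a minimal sketch of such a handler. `servePlaylist` and `probe` are hypothetical names, the ETag is assumed to be a truncated SHA-256 of the body, and the exact-match comparison ignores weak validators and comma-separated lists, which a production handler would need to handle.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// servePlaylist writes a live variant playlist with a short TTL and an ETag
// derived from the body. It answers 304 Not Modified when the request's
// If-None-Match header equals that ETag (simplified exact-match comparison).
func servePlaylist(w http.ResponseWriter, r *http.Request, body []byte) {
	sum := sha256.Sum256(body)
	etag := `"` + hex.EncodeToString(sum[:8]) + `"`

	w.Header().Set("Cache-Control", "public, max-age=5")
	w.Header().Set("ETag", etag)

	if r.Header.Get("If-None-Match") == etag {
		w.WriteHeader(http.StatusNotModified) // no body: the cached copy is still valid
		return
	}
	w.Header().Set("Content-Type", "application/vnd.apple.mpegurl")
	w.Write(body)
}

// probe runs servePlaylist against an in-memory recorder and returns the
// response status code and ETag header.
func probe(body []byte, ifNoneMatch string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/stream/720p/playlist.m3u8", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	servePlaylist(rec, req, body)
	return rec.Code, rec.Header().Get("ETag")
}

func main() {
	body := []byte("#EXTM3U\n#EXT-X-MEDIA-SEQUENCE:42\n")
	status, etag := probe(body, "")
	fmt.Println(status, etag) // full response with the playlist body
	status, _ = probe(body, etag)
	fmt.Println(status) // revalidation of an unchanged playlist
}
```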

The thundering herd problem here: Every viewer polls the variant playlist every ~5 seconds. With 100k viewers, that’s 20k requests/second to the CDN for a single stream’s variant playlist — all simultaneously (viewers synchronize on segment boundaries).

Singleflight at the CDN level is essential here. The lab’s populateCache function uses singleflight.Group to collapse concurrent playlist requests.
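To show what request collapsing does, here is a standard-library-only sketch of the idea. It is a hand-rolled stand-in, not the `singleflight.Group` from golang.org/x/sync that the lab uses, and the names `coalescer` and `herd` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call holds the result of one in-flight fetch that other callers can wait on.
type call struct {
	wg  sync.WaitGroup
	val []byte
	err error
}

// coalescer collapses concurrent fetches for the same key into one.
// It does not cache: once a fetch finishes, the next caller fetches again.
type coalescer struct {
	mu       sync.Mutex
	inflight map[string]*call
}

func (c *coalescer) Do(key string, fetch func() ([]byte, error)) ([]byte, error) {
	c.mu.Lock()
	if c.inflight == nil {
		c.inflight = make(map[string]*call)
	}
	if existing, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		existing.wg.Wait() // a fetch is already running: share its result
		return existing.val, existing.err
	}
	cl := &call{}
	cl.wg.Add(1)
	c.inflight[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = fetch() // only this caller goes to origin
	cl.wg.Done()

	c.mu.Lock()
	delete(c.inflight, key)
	c.mu.Unlock()
	return cl.val, cl.err
}

// herd fires n simultaneous requests for one playlist and reports how many
// of them reached the simulated origin.
func herd(n int) int64 {
	var originHits int64
	var c coalescer
	var wg sync.WaitGroup
	start := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // release all requests at once
			c.Do("/stream/720p/playlist.m3u8", func() ([]byte, error) {
				atomic.AddInt64(&originHits, 1)
				time.Sleep(100 * time.Millisecond) // simulated origin latency
				return []byte("#EXTM3U\n"), nil
			})
		}()
	}
	close(start)
	wg.Wait()
	return atomic.LoadInt64(&originHits)
}

func main() {
	fmt.Println("origin requests for 20 viewers:", herd(20))
}
```

Combined with the 5-second TTL, coalescing means each edge cache sends the origin roughly one playlist request per TTL window, no matter how many viewers are polling.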

3. Master Playlist (`.m3u8` top-level) — TTL: 60 seconds

The master playlist lists the variant streams:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360
360p/playlist.m3u8
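The BANDWIDTH attributes are what let a player choose a variant. As an illustration only, the sketch below parses a master playlist like the one above and picks the highest variant that fits a measured throughput. `parseMaster`, `pickVariant`, and `splitAttrs` are hypothetical helpers, and real players use more elaborate adaptation logic than this single comparison.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type variant struct {
	Bandwidth int    // peak bits per second from the BANDWIDTH attribute
	URI       string // playlist URI as written in the master playlist
}

// splitAttrs splits an attribute list on commas that sit outside quotes,
// so values such as CODECS="avc1.4d401f,mp4a.40.2" stay in one piece.
func splitAttrs(s string) []string {
	var parts []string
	inQuotes, start := false, 0
	for i, r := range s {
		switch {
		case r == '"':
			inQuotes = !inQuotes
		case r == ',' && !inQuotes:
			parts = append(parts, s[start:i])
			start = i + 1
		}
	}
	return append(parts, s[start:])
}

// parseMaster pairs each EXT-X-STREAM-INF tag with the URI line after it.
func parseMaster(playlist string) []variant {
	const tag = "#EXT-X-STREAM-INF:"
	var out []variant
	pending := -1 // bandwidth of a tag still waiting for its URI line
	for _, raw := range strings.Split(playlist, "\n") {
		line := strings.TrimSpace(raw)
		switch {
		case strings.HasPrefix(line, tag):
			pending = 0
			for _, kv := range splitAttrs(strings.TrimPrefix(line, tag)) {
				if strings.HasPrefix(kv, "BANDWIDTH=") {
					pending, _ = strconv.Atoi(strings.TrimPrefix(kv, "BANDWIDTH="))
				}
			}
		case line == "" || strings.HasPrefix(line, "#"):
			// blank lines and other tags carry no URI
		case pending >= 0:
			out = append(out, variant{Bandwidth: pending, URI: line})
			pending = -1
		}
	}
	return out
}

// pickVariant returns the highest-bandwidth variant not exceeding
// throughputBps, falling back to the lowest variant if none fits.
func pickVariant(vs []variant, throughputBps int) (variant, bool) {
	if len(vs) == 0 {
		return variant{}, false
	}
	best, lowest, found := variant{}, vs[0], false
	for _, v := range vs {
		if v.Bandwidth < lowest.Bandwidth {
			lowest = v
		}
		if v.Bandwidth <= throughputBps && (!found || v.Bandwidth > best.Bandwidth) {
			best, found = v, true
		}
	}
	if !found {
		return lowest, true
	}
	return best, true
}

func main() {
	master := "#EXTM3U\n" +
		"#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080\n1080p/playlist.m3u8\n" +
		"#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720\n720p/playlist.m3u8\n" +
		"#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360\n360p/playlist.m3u8\n"
	v, _ := pickVariant(parseMaster(master), 3000000)
	fmt.Println(v.URI)
}
```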

This changes rarely (new quality levels, DRM changes). A 60-second TTL allows clients to adapt to changes within a minute.

// Medium TTL for master playlist
w.Header().Set("Cache-Control", "public, max-age=60")
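The three policies can be collected into one lookup keyed on the request path. This is a sketch under assumptions: the function name `cacheControlFor` is hypothetical, the master playlist is assumed to be named master.m3u8 as in this lab's URLs, the extra segment extensions (.m4s, .mp4, .cmfv, .cmfa) are common fMP4/CMAF choices rather than something the lab specifies, and `no-store` for unknown paths is an arbitrary conservative default.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cacheControlFor returns the Cache-Control value for a live HLS URL path,
// using the TTLs from this section: 24 h for segments, 60 s for the master
// playlist, 5 s for variant playlists.
func cacheControlFor(urlPath string) string {
	switch strings.ToLower(path.Ext(urlPath)) {
	case ".ts", ".fmp4", ".m4s", ".mp4", ".cmfv", ".cmfa":
		return "public, max-age=86400, immutable"
	case ".m3u8":
		if path.Base(urlPath) == "master.m3u8" { // assumed naming convention
			return "public, max-age=60"
		}
		return "public, max-age=5"
	default:
		return "no-store" // assumption: do not cache anything unrecognized
	}
}

func main() {
	for _, p := range []string{
		"/stream/master.m3u8",
		"/stream/720p/playlist.m3u8",
		"/stream/720p/seg001.ts",
	} {
		fmt.Printf("%-28s %s\n", p, cacheControlFor(p))
	}
}
```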

Segment Prefetch

After parsing a variant playlist, the CDN can proactively fetch the next segments from origin before any client requests them:

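// prefetchSegments warms the cache with every segment listed in a freshly
// fetched variant playlist that is not cached yet.
func (c *Cache) prefetchSegments(playlistURL string, playlist []byte) {
    segURLs := parseSegmentURLs(playlist) // extract segment URLs from the M3U8
    for _, segURL := range segURLs {      // named segURL to avoid shadowing package net/url
        if !c.Has(segURL) {
            go c.warmSegment(segURL) // fetch in background, one goroutine per missing segment
        }
    }
}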

This converts cache misses on segment requests to cache hits:

Without prefetch:
  Viewer arrives → GET seg042.ts (miss) → wait for origin → play
  Next viewer → GET seg042.ts (hit) → instant play

With prefetch:
  New playlist published → CDN prefetches seg042.ts
  Viewer arrives → GET seg042.ts (hit) → instant play
  All viewers get cache hits

Prefetch is standard on CDNs like Cloudflare Stream and Fastly.
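Prefetch depends on turning playlist lines into fetchable URLs. The sketch below shows one way to do that with the standard library; it is not the lab's `parseSegmentURLs`. The name `segmentURLs` and its signature are assumptions, and it ignores LL-HLS parts, which are carried in tags rather than on URI lines.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/url"
	"strings"
)

// segmentURLs returns the absolute URL of every media segment in a media
// playlist. In a media playlist, each line that is neither blank nor a
// tag/comment (starting with '#') is a segment URI, possibly relative to
// the playlist's own URL.
func segmentURLs(playlistURL string, playlist []byte) ([]string, error) {
	base, err := url.Parse(playlistURL)
	if err != nil {
		return nil, err
	}
	var out []string
	sc := bufio.NewScanner(bytes.NewReader(playlist))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		ref, err := url.Parse(line)
		if err != nil {
			return nil, err
		}
		out = append(out, base.ResolveReference(ref).String())
	}
	return out, sc.Err()
}

func main() {
	playlist := []byte("#EXTM3U\n#EXT-X-TARGETDURATION:6\n#EXT-X-MEDIA-SEQUENCE:42\n\n" +
		"#EXTINF:6.006,\nseg042.ts\n#EXTINF:6.006,\nseg043.ts\n#EXTINF:6.006,\nseg044.ts\n")
	urls, err := segmentURLs("http://localhost:8080/stream/720p/playlist.m3u8", playlist)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Resolving against the playlist URL matters because segment names in the playlist are usually relative, while cache keys are usually full URLs.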

LL-HLS: Low-Latency HLS

Standard HLS typically runs 3–5 segments behind the live edge (about 18–30 seconds with 6-second segments). This is acceptable for broadcast TV but too high for sports, gaming streams, or live auctions.

LL-HLS (Low-Latency HLS), which is specified in the revised HLS draft (draft-pantos-hls-rfc8216bis) rather than in RFC 8216 itself, reduces latency to 2–5 seconds:

Partial Segments: each segment is published in small parts (for example, 200 ms each) while it is still being encoded
Playlist Delta Updates: the client requests the playlist with _HLS_skip=YES, and the server replaces older entries the client already has with an EXT-X-SKIP tag
Blocking Playlist Request: the client sends the _HLS_msn=44 parameter; the CDN holds the request until segment 44 is available (HTTP long poll)

Client: GET /playlist.m3u8?_HLS_msn=44&_HLS_part=0
CDN:    [holds request until segment 44 part 0 is available]
CDN:    → 200 OK with updated playlist  ← instant delivery at segment publish

LL-HLS requires CDN support. As of 2024, Cloudflare, Fastly, and AWS CloudFront all support LL-HLS.

CMAF: Common Media Application Format

Traditional HLS uses MPEG-2 TS (.ts) container. MPEG-DASH uses fMP4. These are incompatible, requiring separate encoder pipelines.

CMAF (ISO/IEC 23000-19) standardizes on fMP4 as the container for both HLS and DASH:

CMAF Encoder:
  Input → fMP4 chunks → HLS playlist (.m3u8 + .cmfv/.cmfa)
                      → DASH manifest (.mpd + .cmfv/.cmfa)

One encode, two protocol manifests. Netflix, Apple, and major CDNs use CMAF. The .ts format is legacy at this point; new deployments should use fMP4/CMAF.

VOD vs. Live Caching Strategy

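| Aspect | VOD | Live |
|---|---|---|
| Segment TTL | Long-lived, immutable (24 hours in this lab) | Long-lived, immutable (same as VOD) |
| Variant playlist TTL | Minutes to hours | 2–10 seconds |
| Master playlist TTL | Hours | 30–60 seconds |
| Cache fill | Can prefetch everything | Must chase live edge |
| Thundering herd | Only at launch | Every 5 seconds, always |
| Typical Cache-Control | `immutable` on segments, long `max-age` on playlists | `immutable` on segments, short `max-age` on playlists |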

Try It

make lab-19

# Fetch master playlist
curl http://localhost:8080/stream/master.m3u8 -v

# Fetch 720p variant playlist
curl http://localhost:8080/stream/720p/playlist.m3u8 -v
# Note Cache-Control: max-age=5

# Fetch a segment (first one is a cache miss, note timing)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null

# Fetch the same segment again (cache hit, much faster)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null

# Hit the cache stats endpoint
curl http://localhost:8080/metrics/cache -s | python3 -m json.tool

# Simulate thundering herd on playlist
for i in $(seq 1 20); do
  curl -s http://localhost:8080/stream/720p/playlist.m3u8 -o /dev/null &
done
wait
# Check singleflight collapsed these into one origin request
curl http://localhost:8080/metrics/cache

The Hitchhiker's Guide to CDNs