Lab 19 · HLS Streaming & Segment Caching

Run it: make lab-19
Source: labs/lab-19-hls-streaming/main.go


The Problem

Video delivery is the dominant use case for CDN infrastructure — in 2024, video represents ~65% of all internet traffic. Unlike web pages (one-shot request-response), video streaming is:

  • High-bandwidth: a 4K stream is 25 Mbps; 1 million concurrent viewers require 25 Tbps of aggregate bandwidth
  • Time-sensitive: a 2-second buffer stall causes viewer abandonment rates to jump 20%
  • Long-duration: sessions last 30–120 minutes; cache TTLs matter differently for live vs. VOD

The CDN must cache aggressively to serve millions of concurrent viewers from edge rather than hammering the origin’s encoder/packager.
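
The bandwidth figure above is simple multiplication, and the same arithmetic shows why the edge has to absorb nearly all of it. The sketch below is illustrative only: the function names are made up for this example, and the 99% byte hit ratio is an assumed input, not a measurement.

```go
package main

import "fmt"

// aggregateMbps returns the total delivery bandwidth in Mbps for a number of
// concurrent viewers all watching at the same per-stream bitrate.
func aggregateMbps(viewers, perStreamMbps int) int {
	return viewers * perStreamMbps
}

// originMbps returns the share of that bandwidth the origin must serve when
// hitPercent of the bytes are answered from edge cache.
func originMbps(totalMbps, hitPercent int) int {
	return totalMbps * (100 - hitPercent) / 100
}

func main() {
	total := aggregateMbps(1_000_000, 25) // 1M viewers at 25 Mbps (4K)
	fmt.Printf("edge delivery: %d Tbps\n", total/1_000_000)
	// hitPercent = 99 is an assumed byte hit ratio, not a measured value.
	fmt.Printf("origin egress at 99%% hit ratio: %d Gbps\n", originMbps(total, 99)/1_000)
}
```

Under these assumptions the edge delivers 25 Tbps while the origin still has to sustain 250 Gbps, which is why every additional point of hit ratio matters.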


HLS: HTTP Live Streaming

HLS (RFC 8216) is the dominant streaming protocol for CDNs. It works by slicing video into short segments and serving them over plain HTTP:

Client                      CDN                     Origin Encoder
  │                          │                          │
  │── GET master.m3u8 ──────>│── (cache miss) ─────────>│
  │<──── master playlist ────│<──── master playlist ─────│
  │                          │ (cache TTL: 60s)          │
  │── GET 720p/playlist.m3u8>│── (cache miss) ─────────>│
  │<──── variant playlist ───│<──── variant playlist ────│
  │                          │ (cache TTL: 5s for live)  │
  │── GET seg001.ts ────────>│── (cache miss) ─────────>│
  │<──── segment ────────────│<──── segment ─────────────│
  │                          │ (cache TTL: 24h immutable)│
  │── GET seg001.ts ────────>│   (cache HIT)            │  ← repeat request (any viewer), no origin hit

Three Types of Content, Three TTLs

HLS has three distinct content types with fundamentally different caching characteristics:


1. Media Segments (.ts, .fmp4) — TTL: 24 hours, immutable

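Segments are immutable once published: after seg001.ts is generated and named, its bytes never change, so the URL always identifies the same content. They are named by sequence number rather than by content hash, but the caching consequence is the same.

// Immutable segment: cache for 24 hours without revalidation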
w.Header().Set("Cache-Control", "public, max-age=86400, immutable")
w.Header().Set("ETag", `"seg001-v1"`)

This is identical to the approach used for hashed static assets (main.abc123.js). CDN hit ratios for segment requests should be ~99% once the initial viewers warm the cache.

Thundering herd implication: When a new segment is published, the first request for it misses and goes to origin; later requests hit the cache. Requests that arrive while that first fetch is still in flight also miss unless the CDN coalesces them, so with request coalescing even a popular stream (100k+ viewers) costs the origin a single request per segment per edge cache. This is excellent.


2. Variant Playlist (.m3u8 per quality level) — TTL: 5 seconds (live), longer for VOD

The variant playlist (e.g., 720p/playlist.m3u8) lists the available segments:

#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:42

#EXTINF:6.006,
seg042.ts
#EXTINF:6.006,
seg043.ts
#EXTINF:6.006,
seg044.ts

For live streams, the playlist changes every segment duration (typically 2–6 seconds). It must not be cached too long or viewers fall behind the live edge.
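
To make the playlist structure concrete, here is a minimal parsing sketch. `MediaPlaylist` and `parseMediaPlaylist` are hypothetical names for this example, not the lab's API; the parser reads only the tags shown in the sample above and resolves relative segment URIs against the playlist URL.

```go
package main

import (
	"bufio"
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// MediaPlaylist holds the fields of a variant playlist that matter for caching.
type MediaPlaylist struct {
	MediaSequence int      // sequence number of the first listed segment
	SegmentURLs   []string // absolute URLs, in playlist order
}

// parseMediaPlaylist is a hypothetical helper: it handles only the tags in
// the sample playlist and ignores everything else.
func parseMediaPlaylist(playlistURL, body string) (MediaPlaylist, error) {
	var p MediaPlaylist
	base, err := url.Parse(playlistURL)
	if err != nil {
		return p, err
	}
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case line == "":
			continue
		case strings.HasPrefix(line, "#EXT-X-MEDIA-SEQUENCE:"):
			n, err := strconv.Atoi(strings.TrimPrefix(line, "#EXT-X-MEDIA-SEQUENCE:"))
			if err != nil {
				return p, err
			}
			p.MediaSequence = n
		case strings.HasPrefix(line, "#"):
			continue // other tags and comments are ignored in this sketch
		default:
			ref, err := url.Parse(line)
			if err != nil {
				return p, err
			}
			p.SegmentURLs = append(p.SegmentURLs, base.ResolveReference(ref).String())
		}
	}
	return p, sc.Err()
}

const sample = `#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:42

#EXTINF:6.006,
seg042.ts
#EXTINF:6.006,
seg043.ts
#EXTINF:6.006,
seg044.ts
`

func main() {
	p, err := parseMediaPlaylist("http://localhost:8080/stream/720p/playlist.m3u8", sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("media sequence:", p.MediaSequence)
	for _, u := range p.SegmentURLs {
		fmt.Println(u)
	}
}
```

Resolving to absolute URLs matters because the cache key is the full segment URL, not the relative name that appears in the playlist.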

// Short TTL for live variant playlist
w.Header().Set("Cache-Control", "public, max-age=5")
w.Header().Set("ETag", etag)  // lets clients revalidate with If-None-Match
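
The ETag pays off when a client or downstream cache revalidates an unchanged playlist: the response is a bodiless 304 instead of a full copy. The following is a rough sketch of that exchange using net/http/httptest. `playlistHandler` and `revalidate` are names invented for this example, and deriving the ETag from a SHA-256 prefix is an assumption, not necessarily what the lab does.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// playlistHandler serves body with a short TTL and answers matching
// If-None-Match requests with 304 Not Modified and no body.
func playlistHandler(body []byte) http.HandlerFunc {
	sum := sha256.Sum256(body)
	etag := `"` + hex.EncodeToString(sum[:8]) + `"` // strong ETag derived from content
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "public, max-age=5")
		w.Header().Set("ETag", etag)
		// Simplification: exact match only; no weak validators or ETag lists.
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("Content-Type", "application/vnd.apple.mpegurl")
		w.Write(body)
	}
}

// revalidate fetches the playlist once, then again with the returned ETag,
// and reports both status codes and the size of the second response body.
func revalidate(h http.Handler) (first, second, secondBytes int) {
	rec1 := httptest.NewRecorder()
	h.ServeHTTP(rec1, httptest.NewRequest(http.MethodGet, "/stream/720p/playlist.m3u8", nil))

	req2 := httptest.NewRequest(http.MethodGet, "/stream/720p/playlist.m3u8", nil)
	req2.Header.Set("If-None-Match", rec1.Header().Get("ETag"))
	rec2 := httptest.NewRecorder()
	h.ServeHTTP(rec2, req2)
	return rec1.Code, rec2.Code, rec2.Body.Len()
}

func main() {
	h := playlistHandler([]byte("#EXTM3U\n#EXT-X-TARGETDURATION:6\n"))
	first, second, n := revalidate(h)
	fmt.Println(first, second, n) // prints: 200 304 0
}
```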

The thundering herd problem here: Every viewer polls the variant playlist every ~5 seconds. With 100k viewers, that’s 20k requests/second to the CDN for a single stream’s variant playlist — all simultaneously (viewers synchronize on segment boundaries).

Singleflight at the CDN level is essential here. The lab’s populateCache function uses singleflight.Group to collapse concurrent playlist requests.
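
To show what request collapsing does, here is a standard-library-only sketch of the idea. It is a simplified stand-in for singleflight.Group, not the lab's populateCache code: `Group`, `Waiters`, and `originFetches` are names chosen for this example, and `Waiters` exists only so the demonstration can wait deterministically until every request has joined.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call is one in-flight origin fetch plus the result its waiters will share.
type call struct {
	done    chan struct{}
	waiters int
	val     []byte
	err     error
}

// Group collapses concurrent fetches for the same key into a single call.
type Group struct {
	mu       sync.Mutex
	inflight map[string]*call
}

// Do runs fetch at most once per key at a time; callers that arrive while a
// fetch is in flight block and receive the same result with shared == true.
func (g *Group) Do(key string, fetch func() ([]byte, error)) (val []byte, shared bool, err error) {
	g.mu.Lock()
	if c, ok := g.inflight[key]; ok {
		c.waiters++
		g.mu.Unlock()
		<-c.done
		return c.val, true, c.err
	}
	if g.inflight == nil {
		g.inflight = make(map[string]*call)
	}
	c := &call{done: make(chan struct{})}
	g.inflight[key] = c
	g.mu.Unlock()

	c.val, c.err = fetch()

	g.mu.Lock()
	delete(g.inflight, key)
	g.mu.Unlock()
	close(c.done)
	return c.val, false, c.err
}

// Waiters reports how many callers are currently blocked on key.
func (g *Group) Waiters(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.inflight[key]; ok {
		return c.waiters
	}
	return 0
}

// originFetches sends n concurrent requests for one playlist and returns how
// many of them actually reached the simulated origin.
func originFetches(n int) int64 {
	const key = "720p/playlist.m3u8"
	var g Group
	var hits int64
	release := make(chan struct{})
	fetch := func() ([]byte, error) {
		atomic.AddInt64(&hits, 1)
		<-release // simulate a slow origin
		return []byte("#EXTM3U\n"), nil
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do(key, fetch)
		}()
	}
	// Let the origin answer only after every other request has joined.
	for g.Waiters(key) < n-1 {
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return atomic.LoadInt64(&hits)
}

func main() {
	fmt.Println("origin fetches:", originFetches(20))
}
```

Twenty concurrent requests produce one origin fetch. The entry is removed as soon as the fetch returns, so coalescing only covers requests that overlap in time; the short TTL covers the rest.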


3. Master Playlist (.m3u8 top-level) — TTL: 60 seconds

The master playlist lists the variant streams:

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360
360p/playlist.m3u8

This changes rarely (new quality levels, DRM changes). A 60-second TTL allows clients to adapt to changes within a minute.

// Medium TTL for master playlist
w.Header().Set("Cache-Control", "public, max-age=60")
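
The three policies above can be folded into a single helper that picks a header from the request path. This is a sketch under stated assumptions: `cacheControlFor` is a hypothetical function, the master playlist is assumed to be named master.m3u8 as in this lab's URLs, and .m4s/.mp4 are assumed as the extensions for fMP4 segments.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// cacheControlFor returns the Cache-Control value for a live-stream URL path,
// using the three TTLs described above.
func cacheControlFor(urlPath string) string {
	ext := strings.ToLower(path.Ext(urlPath))
	switch {
	case ext == ".ts" || ext == ".m4s" || ext == ".mp4":
		return "public, max-age=86400, immutable" // media segment
	case ext == ".m3u8" && path.Base(urlPath) == "master.m3u8":
		return "public, max-age=60" // master playlist
	case ext == ".m3u8":
		return "public, max-age=5" // live variant playlist
	default:
		return "no-store" // unknown content: do not cache by default
	}
}

func main() {
	for _, p := range []string{
		"/stream/master.m3u8",
		"/stream/720p/playlist.m3u8",
		"/stream/720p/seg001.ts",
	} {
		fmt.Printf("%-28s %s\n", p, cacheControlFor(p))
	}
}
```

Keeping the policy in one place makes it harder for a handler to ship a playlist with a segment-length TTL by accident, which would pin viewers behind the live edge.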

Segment Prefetch

After parsing a variant playlist, the CDN can proactively fetch the next segments from origin before any client requests them:

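// prefetchSegments warms the cache with every segment listed in the playlist.
func (c *Cache) prefetchSegments(playlistURL string, playlist []byte) {
    segURLs := parseSegmentURLs(playlist)  // extract segment URLs from the M3U8
    for _, segURL := range segURLs {
        if !c.Has(segURL) {
            go c.warmSegment(segURL)  // fetch in the background, one goroutine per missing segment
        }
    }
}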

This converts cache misses on segment requests to cache hits:

Without prefetch:
  Viewer arrives → GET seg042.ts (miss) → wait for origin → play
  Next viewer → GET seg042.ts (hit) → instant play

With prefetch:
  New playlist published → CDN prefetches seg042.ts
  Viewer arrives → GET seg042.ts (hit) → instant play
  All viewers get cache hits
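
One goroutine per missing segment is fine for a lab, but a burst of prefetches can itself overload the origin. The sketch below caps the number of concurrent warm-up fetches with a buffered-channel semaphore. The `Cache` type, `Put`, and `Prefetch` here are hypothetical and separate from the lab's implementation, and unlike the background version above, this `Prefetch` waits for completion so the result can be inspected.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a hypothetical in-memory segment cache used only for this sketch.
type Cache struct {
	mu    sync.Mutex
	items map[string][]byte
}

// Has reports whether segURL is already cached.
func (c *Cache) Has(segURL string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.items[segURL]
	return ok
}

// Put stores a segment body under segURL.
func (c *Cache) Put(segURL string, body []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.items == nil {
		c.items = make(map[string][]byte)
	}
	c.items[segURL] = body
}

// Prefetch fetches every missing URL with at most limit fetches in flight,
// waits for them to finish, and returns how many origin fetches it issued.
func (c *Cache) Prefetch(segURLs []string, limit int, fetch func(string) ([]byte, error)) int {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	issued := 0
	for _, u := range segURLs {
		if c.Has(u) {
			continue // already warm, no origin traffic needed
		}
		issued++
		wg.Add(1)
		sem <- struct{}{} // blocks while limit fetches are already running
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }()
			if body, err := fetch(u); err == nil {
				c.Put(u, body)
			}
		}(u)
	}
	wg.Wait()
	return issued
}

func main() {
	c := &Cache{}
	c.Put("seg042.ts", []byte("cached earlier"))
	origin := func(u string) ([]byte, error) { return []byte("bytes of " + u), nil }

	n := c.Prefetch([]string{"seg042.ts", "seg043.ts", "seg044.ts"}, 2, origin)
	fmt.Println("origin fetches:", n)
	fmt.Println("seg044.ts cached:", c.Has("seg044.ts"))
}
```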

Prefetch is standard on CDNs like Cloudflare Stream and Fastly.


LL-HLS: Low-Latency HLS

Standard HLS has a live latency of 3–5 segments (~15–30 seconds). This is acceptable for broadcast TV but too high for sports, gaming streams, or live auctions.

LL-HLS (Low-Latency HLS, specified in the draft-pantos-hls-rfc8216bis revision of the HLS specification rather than in RFC 8216 itself) reduces latency to 2–5 seconds:

  1. Partial Segments: each segment is published in short parts (about 200 ms each) while it is still being encoded
  2. Playlist Delta Updates: the client asks the server to skip the older portion of the playlist it already has, so only the recent entries are sent
  3. Blocking Playlist Reload: the client sends the _HLS_msn=44 query parameter; the CDN holds the request until segment 44 is available (HTTP long poll)

Client: GET /playlist.m3u8?_HLS_msn=44&_HLS_part=0
CDN:    [holds request until segment 44 part 0 is available]
CDN:    → 200 OK with updated playlist  ← instant delivery at segment publish
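
The hold-until-published behaviour can be sketched with a channel that is closed on every publish. This is an illustration under simplifying assumptions: `LiveEdge`, `requestedMSN`, and `demo` are hypothetical names, only _HLS_msn is honoured (the _HLS_part parameter is ignored), and the 20 ms timeout is arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"net/url"
	"strconv"
	"sync"
	"time"
)

// LiveEdge tracks the newest published media sequence number (MSN).
type LiveEdge struct {
	mu      sync.Mutex
	lastMSN int
	changed chan struct{} // closed and replaced on every Publish
}

func NewLiveEdge(msn int) *LiveEdge {
	return &LiveEdge{lastMSN: msn, changed: make(chan struct{})}
}

// Publish records a new segment and wakes every blocked request.
func (e *LiveEdge) Publish(msn int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.lastMSN = msn
	close(e.changed)
	e.changed = make(chan struct{})
}

// WaitFor blocks until segment msn has been published or ctx is done.
func (e *LiveEdge) WaitFor(ctx context.Context, msn int) (int, error) {
	for {
		e.mu.Lock()
		cur, ch := e.lastMSN, e.changed
		e.mu.Unlock()
		if cur >= msn {
			return cur, nil
		}
		select {
		case <-ch: // something was published; re-check
		case <-ctx.Done():
			return cur, ctx.Err()
		}
	}
}

// requestedMSN extracts _HLS_msn from a blocking playlist request URL.
func requestedMSN(rawURL string) (int, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(u.Query().Get("_HLS_msn"))
}

// demo holds a request for segment 44 while the edge is at 43, lets it time
// out, then repeats the request while segment 44 is being published.
func demo() (timedOut bool, served int) {
	edge := NewLiveEdge(43)
	msn, _ := requestedMSN("/playlist.m3u8?_HLS_msn=44&_HLS_part=0")

	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	_, err := edge.WaitFor(ctx, msn)
	cancel()
	timedOut = err != nil

	go edge.Publish(44)
	served, _ = edge.WaitFor(context.Background(), msn)
	return timedOut, served
}

func main() {
	timedOut, served := demo()
	fmt.Println("timed out before publish:", timedOut)
	fmt.Println("served at MSN:", served)
}
```

Because every held request wakes at the same instant, the playlist fetch that follows is exactly the synchronized burst that request coalescing is meant to absorb.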

LL-HLS requires CDN support. As of 2024, Cloudflare, Fastly, and AWS CloudFront all support LL-HLS.


CMAF: Common Media Application Format

Traditional HLS uses MPEG-2 TS (.ts) container. MPEG-DASH uses fMP4. These are incompatible, requiring separate encoder pipelines.

CMAF (ISO/IEC 23000-19) standardizes on fMP4 as the container for both HLS and DASH:

CMAF Encoder:
  Input → fMP4 chunks → HLS playlist (.m3u8 + .cmfv/.cmfa)
                      → DASH manifest (.mpd + .cmfv/.cmfa)

One encode, two protocol manifests. Netflix, Apple, and major CDNs use CMAF. The .ts format is legacy at this point; new deployments should use fMP4/CMAF.


VOD vs. Live Caching Strategy

Aspect                 VOD                       Live
Segment TTL            Forever (immutable)       Forever (immutable, same as VOD)
Variant playlist TTL   Minutes to hours          2–10 seconds
Master playlist TTL    Hours                     30–60 seconds
Cache fill             Can prefetch everything   Must chase live edge
Thundering herd        Only at launch            Every 5 seconds, always
Cache-Control header   immutable                 Short max-age

Try It

make lab-19

# Fetch master playlist
curl http://localhost:8080/stream/master.m3u8 -v

# Fetch 720p variant playlist
curl http://localhost:8080/stream/720p/playlist.m3u8 -v
# Note Cache-Control: max-age=5

# Fetch a segment (first one is a cache miss, note timing)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null

# Fetch the same segment again (cache hit, much faster)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null

# Hit the cache stats endpoint
curl http://localhost:8080/metrics/cache -s | python3 -m json.tool

# Simulate thundering herd on playlist
for i in $(seq 1 20); do
  curl -s http://localhost:8080/stream/720p/playlist.m3u8 -o /dev/null &
done
wait
# Check singleflight collapsed these into one origin request
curl http://localhost:8080/metrics/cache