Lab 19 · HLS Streaming & Segment Caching
Run it:
make lab-19
Source: labs/lab-19-hls-streaming/main.go
The Problem
Video delivery is the dominant use case for CDN infrastructure — in 2024, video represents ~65% of all internet traffic. Unlike web pages (one-shot request-response), video streaming is:
- High-bandwidth: a 4K stream is 25 Mbps; 1 million concurrent viewers require 25 Tbps of aggregate bandwidth
- Time-sensitive: a 2-second buffer stall causes viewer abandonment rates to jump 20%
- Long-duration: sessions last 30–120 minutes; cache TTLs matter differently for live vs. VOD
The CDN must cache aggressively to serve millions of concurrent viewers from edge rather than hammering the origin’s encoder/packager.
HLS: HTTP Live Streaming
HLS (RFC 8216) is the dominant streaming protocol for CDNs. It works by slicing video into short segments and serving them over plain HTTP:
Client                      CDN                    Origin Encoder
│                            │                            │
│── GET master.m3u8 ────────>│── (cache miss) ───────────>│
│<──── master playlist ──────│<──── master playlist ──────│
│                            │ (cache TTL: 60s)           │
│── GET 720p/playlist.m3u8 ─>│── (cache miss) ───────────>│
│<──── variant playlist ─────│<──── variant playlist ─────│
│                            │ (cache TTL: 5s for live)   │
│── GET seg001.ts ──────────>│── (cache miss) ───────────>│
│<──── segment ──────────────│<──── segment ──────────────│
│                            │ (cache TTL: 24h immutable) │
│── GET seg002.ts ──────────>│ (cache HIT)                │ ← no origin hit
Three Types of Content, Three TTLs
HLS has three distinct content types with fundamentally different caching characteristics:
1. Media Segments (.ts, .fmp4) — TTL: 24 hours, immutable
Segments are immutable by convention: once seg001.ts is generated and
named, it never changes, so the name uniquely identifies the content.
// Immutable segment: cache for 24 hours, no revalidation needed
w.Header().Set("Cache-Control", "public, max-age=86400, immutable")
w.Header().Set("ETag", `"seg001-v1"`)
This is identical to the approach used for hashed static assets (main.abc123.js).
CDN hit ratios for segment requests should be ~99% once the initial viewers
warm the cache.
Thundering herd implication: When a new segment is published, the first viewer to request it causes a cache miss to origin, and all later viewers hit the cache. If the CDN also coalesces concurrent misses for the same segment, even a popular stream (100k+ viewers) costs the origin a single request per segment. This is excellent.
2. Variant Playlist (.m3u8 per quality level) — TTL: 5 seconds (live), longer for VOD
The variant playlist (e.g., 720p/playlist.m3u8) lists the available
segments:
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:42
#EXTINF:6.006,
seg042.ts
#EXTINF:6.006,
seg043.ts
#EXTINF:6.006,
seg044.ts
For live streams, the playlist changes every segment duration (typically 2–6 seconds). It must not be cached too long or viewers fall behind the live edge.
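To make the sliding window concrete, here is a minimal sketch of how a packager might render the live variant playlist. The function name renderLivePlaylist and the segNNN.ts naming are assumptions for illustration, not code from the lab.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// renderLivePlaylist (hypothetical helper) builds a live variant playlist
// whose window starts at media sequence firstSeq and lists `window`
// segments of segDur seconds each.
func renderLivePlaylist(firstSeq, window int, segDur float64) string {
	var b strings.Builder
	b.WriteString("#EXTM3U\n")
	// The target duration is an integer; each EXTINF duration, rounded to
	// the nearest integer, must not exceed it.
	fmt.Fprintf(&b, "#EXT-X-TARGETDURATION:%d\n", int(math.Round(segDur)))
	fmt.Fprintf(&b, "#EXT-X-MEDIA-SEQUENCE:%d\n", firstSeq)
	for seq := firstSeq; seq < firstSeq+window; seq++ {
		fmt.Fprintf(&b, "#EXTINF:%.3f,\nseg%03d.ts\n", segDur, seq)
	}
	// No #EXT-X-ENDLIST tag: its absence tells players the stream is live.
	return b.String()
}
```

Calling renderLivePlaylist(42, 3, 6.006) reproduces the sample above. One segment duration later the packager would call it with firstSeq 43, which drops seg042.ts and adds seg045.ts, so any cached copy older than a segment duration is stale.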
// Short TTL for live variant playlist
w.Header().Set("Cache-Control", "public, max-age=5")
w.Header().Set("ETag", etag) // still ETag for conditional requests
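The ETag lets a client or downstream cache revalidate cheaply once the 5-second TTL expires. A minimal sketch of how that could look follows; playlistETag and servePlaylist are hypothetical names, and the exact-match comparison is a simplification (a full implementation would also handle comma-separated lists and weak validators in If-None-Match).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"net/http"
)

// playlistETag (hypothetical helper) derives a quoted ETag from the
// playlist bytes, so the tag changes exactly when the playlist changes.
func playlistETag(body []byte) string {
	sum := sha256.Sum256(body)
	return `"` + hex.EncodeToString(sum[:8]) + `"`
}

// servePlaylist (hypothetical helper) sends the playlist with a short TTL,
// or an empty 304 when the caller already holds the current version.
func servePlaylist(w http.ResponseWriter, r *http.Request, body []byte) {
	etag := playlistETag(body)
	w.Header().Set("Cache-Control", "public, max-age=5")
	w.Header().Set("ETag", etag)
	// Simplification: exact string match only.
	if r.Header.Get("If-None-Match") == etag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	w.Header().Set("Content-Type", "application/vnd.apple.mpegurl")
	w.Write(body)
}
```

A 304 carries no body, so revalidating an unchanged playlist costs a round trip but almost no bandwidth.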
The thundering herd problem here: Every viewer polls the variant playlist every ~5 seconds. With 100k viewers, that’s 20k requests/second to the CDN for a single stream’s variant playlist — all simultaneously (viewers synchronize on segment boundaries).
Singleflight at the CDN level is essential here. The lab’s populateCache
function uses singleflight.Group to collapse concurrent playlist requests.
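The collapsing behaviour can be sketched with the standard library alone. The Coalescer type below is a simplified stand-in for singleflight.Group (from golang.org/x/sync/singleflight), written only to show the mechanism; it is not the lab's populateCache code, and the Waiting method exists purely so the behaviour can be observed.

```go
package main

import "sync"

// call is one in-flight origin fetch plus the callers waiting on it.
type call struct {
	done    chan struct{} // closed when the fetch finishes
	val     []byte
	err     error
	waiters int
}

// Coalescer collapses concurrent fetches for the same key into one.
type Coalescer struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fetch once per key at a time. Callers that arrive while a fetch
// is in flight wait for it and share its result (shared == true).
func (c *Coalescer) Do(key string, fetch func() ([]byte, error)) (val []byte, err error, shared bool) {
	c.mu.Lock()
	if c.calls == nil {
		c.calls = make(map[string]*call)
	}
	if cl, ok := c.calls[key]; ok {
		cl.waiters++
		c.mu.Unlock()
		<-cl.done // follower: wait for the leader's result
		return cl.val, cl.err, true
	}
	cl := &call{done: make(chan struct{})}
	c.calls[key] = cl
	c.mu.Unlock()

	cl.val, cl.err = fetch() // leader: the only request that reaches origin

	c.mu.Lock()
	delete(c.calls, key)
	c.mu.Unlock()
	close(cl.done)
	return cl.val, cl.err, false
}

// Waiting reports how many followers are currently blocked on key.
func (c *Coalescer) Waiting(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.calls[key]; ok {
		return cl.waiters
	}
	return 0
}
```

If 20 requests for 720p/playlist.m3u8 arrive while one origin fetch is in flight, one of them runs fetch and the other 19 return its result with shared set to true. Once the fetch completes, the next request starts a new one, so the short TTL still decides how often the origin is contacted.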
3. Master Playlist (.m3u8 top-level) — TTL: 60 seconds
The master playlist lists the variant streams:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=640x360
360p/playlist.m3u8
This changes rarely (new quality levels, DRM changes). A 60-second TTL allows clients to adapt to changes within a minute.
// Medium TTL for master playlist
w.Header().Set("Cache-Control", "public, max-age=60")
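The three policies can be gathered into one lookup keyed on the request path. This is a sketch under assumptions: the function name cacheControlFor, the master.m3u8 file name (taken from the lab's URLs), the extra .m4s and .mp4 extensions, and the no-store default are all illustrative choices rather than the lab's actual routing.

```go
package main

import (
	"path"
	"strings"
)

// cacheControlFor (hypothetical helper) picks a Cache-Control value from
// the URL path, following the three live-stream tiers described above.
func cacheControlFor(urlPath string) string {
	switch strings.ToLower(path.Ext(urlPath)) {
	case ".ts", ".m4s", ".mp4": // media segments never change once named
		return "public, max-age=86400, immutable"
	case ".m3u8":
		// Assumption: the master playlist is always called master.m3u8.
		if path.Base(urlPath) == "master.m3u8" {
			return "public, max-age=60"
		}
		return "public, max-age=5" // live variant playlist
	default:
		// Assumption: anything unrecognized is not cached at all.
		return "no-store"
	}
}
```

A handler would call w.Header().Set("Cache-Control", cacheControlFor(r.URL.Path)) before writing the body. For VOD the variant playlist value would be longer, as the table later in this lab shows.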
Segment Prefetch
After parsing a variant playlist, the CDN can proactively fetch the next segments from origin before any client requests them:
func (c *Cache) prefetchSegments(playlistURL string, playlist []byte) {
	urls := parseSegmentURLs(playlist) // extract segment URLs from the M3U8
	for _, url := range urls {
		if !c.Has(url) {
			go c.warmSegment(url) // fetch in background
		}
	}
}
This converts cache misses on segment requests to cache hits:
Without prefetch:
Viewer arrives → GET seg042.ts (miss) → wait for origin → play
Next viewer → GET seg042.ts (hit) → instant play
With prefetch:
New playlist published → CDN prefetches seg042.ts
Viewer arrives → GET seg042.ts (hit) → instant play
All viewers get cache hits
Prefetch is standard on CDNs like Cloudflare Stream and Fastly.
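The prefetch code relies on parseSegmentURLs, which is not shown in this section. One plausible shape for it is sketched below under the hypothetical name segmentURLs; unlike the lab's signature, it also takes the playlist URL so that relative segment names resolve to absolute URLs.

```go
package main

import (
	"bufio"
	"bytes"
	"net/url"
	"strings"
)

// segmentURLs (hypothetical stand-in for parseSegmentURLs) returns the
// absolute URL of every segment listed in a media playlist.
func segmentURLs(playlistURL string, playlist []byte) ([]string, error) {
	base, err := url.Parse(playlistURL)
	if err != nil {
		return nil, err
	}
	var out []string
	sc := bufio.NewScanner(bytes.NewReader(playlist))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// In a media playlist, tags and comments start with '#'.
		// Every other non-blank line is a segment URI.
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		ref, err := url.Parse(line)
		if err != nil {
			return nil, err
		}
		out = append(out, base.ResolveReference(ref).String())
	}
	return out, sc.Err()
}
```

For a playlist at http://localhost:8080/stream/720p/playlist.m3u8, the line seg042.ts resolves to http://localhost:8080/stream/720p/seg042.ts, which can be used directly as a cache key. A production prefetcher would probably also cap the number of concurrent warm-up fetches rather than start one goroutine per segment.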
LL-HLS: Low-Latency HLS
Standard HLS has a live latency of 3–5 segments (~15–30 seconds). This is acceptable for broadcast TV but too high for sports, gaming streams, or live auctions.
LL-HLS (Low-Latency HLS, specified in draft-pantos-hls-rfc8216bis, the in-progress revision of RFC 8216) reduces latency to 2–5 seconds:
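- Partial Segments: each segment is published in small parts (around 200 ms each) while it is still being encoded
- Playlist Delta Updates: the client can ask the server to skip older playlist entries it already has (EXT-X-SKIP), so playlist refreshes stay small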
- Blocking Playlist Request: the client sends an _HLS_msn=44 parameter; the CDN holds the request until segment 44 is available (HTTP long poll)
Client: GET /playlist.m3u8?_HLS_msn=44&_HLS_part=0
CDN: [holds request until segment 44 part 0 is available]
CDN: → 200 OK with updated playlist ← instant delivery at segment publish
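The hold-until-published behaviour can be sketched as a small wait primitive plus a handler. All names here (liveEdge, Publish, WaitFor, blockingPlaylist) are hypothetical, the sketch ignores _HLS_part, and a real server would also put an upper bound on how long a request may be held.

```go
package main

import (
	"net/http"
	"strconv"
	"sync"
)

// liveEdge tracks the newest published media sequence number (MSN) and
// lets requests block until a given MSN exists.
type liveEdge struct {
	mu      sync.Mutex
	lastMSN int
	changed chan struct{} // closed and replaced on every publish
}

func newLiveEdge(lastMSN int) *liveEdge {
	return &liveEdge{lastMSN: lastMSN, changed: make(chan struct{})}
}

// Publish records that segment msn is available and wakes all waiters.
func (e *liveEdge) Publish(msn int) {
	e.mu.Lock()
	defer e.mu.Unlock()
	if msn > e.lastMSN {
		e.lastMSN = msn
		close(e.changed)
		e.changed = make(chan struct{})
	}
}

// WaitFor blocks until segment msn is published or cancel is closed.
// It reports whether the segment is available.
func (e *liveEdge) WaitFor(msn int, cancel <-chan struct{}) bool {
	for {
		e.mu.Lock()
		ready, changed := e.lastMSN >= msn, e.changed
		e.mu.Unlock()
		if ready {
			return true
		}
		select {
		case <-changed: // something was published; re-check
		case <-cancel:
			return false
		}
	}
}

// blockingPlaylist holds the request until the MSN named in _HLS_msn
// exists, then responds with the freshly rendered playlist.
func blockingPlaylist(e *liveEdge, render func() []byte) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if q := r.URL.Query().Get("_HLS_msn"); q != "" {
			msn, err := strconv.Atoi(q)
			if err != nil {
				http.Error(w, "bad _HLS_msn", http.StatusBadRequest)
				return
			}
			if !e.WaitFor(msn, r.Context().Done()) {
				return // client disconnected while waiting
			}
		}
		w.Header().Set("Content-Type", "application/vnd.apple.mpegurl")
		w.Write(render())
	}
}
```

The packager would call Publish(44) when segment 44 lands, and every request parked on _HLS_msn=44 is released at that moment. Requests without the parameter are answered immediately, as in standard HLS.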
LL-HLS requires CDN support. As of 2024, Cloudflare, Fastly, and AWS CloudFront all support LL-HLS.
CMAF: Common Media Application Format
Traditional HLS uses the MPEG-2 TS (.ts) container, while MPEG-DASH uses fMP4.
The two containers are incompatible, so supporting both protocols has meant running separate encoder pipelines.
CMAF (ISO/IEC 23000-19) standardizes on fMP4 as the container for both HLS and DASH:
CMAF Encoder:
Input → fMP4 chunks → HLS playlist (.m3u8 + .cmfv/.cmfa)
→ DASH manifest (.mpd + .cmfv/.cmfa)
One encode, two protocol manifests. Netflix, Apple, and major CDNs use
CMAF. The .ts format is legacy at this point; new deployments should
use fMP4/CMAF.
VOD vs. Live Caching Strategy
| Aspect | VOD | Live |
|---|---|---|
| Segment TTL | Long, immutable (24h in this lab) | Long, immutable (same as VOD) |
| Variant playlist TTL | Minutes to hours | 2–10 seconds |
| Master playlist TTL | Hours | 30–60 seconds |
| Cache fill | Can prefetch everything | Must chase live edge |
| Thundering herd | Only at launch | Every 5 seconds, always |
| Playlist Cache-Control | Long max-age | Short max-age |
Try It
make lab-19
# Fetch master playlist
curl http://localhost:8080/stream/master.m3u8 -v
# Fetch 720p variant playlist
curl http://localhost:8080/stream/720p/playlist.m3u8 -v
# Note Cache-Control: max-age=5
# Fetch a segment (first one is a cache miss, note timing)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null
# Fetch the same segment again (cache hit, much faster)
time curl http://localhost:8080/stream/720p/seg001.ts -o /dev/null
# Hit the cache stats endpoint
curl http://localhost:8080/metrics/cache -s | python3 -m json.tool
# Simulate thundering herd on playlist
for i in $(seq 1 20); do
curl -s http://localhost:8080/stream/720p/playlist.m3u8 -o /dev/null &
done
wait
# Check singleflight collapsed these into one origin request
curl http://localhost:8080/metrics/cache