Lab 10 · Compression
Run it:
make lab-10
Source: labs/lab-10-compression/main.go
The Problem
Network bandwidth is neither free nor unlimited. Compressing HTTP responses before delivery:
- Reduces latency: smaller payload = faster transfer, especially on mobile networks (LTE: ~20 Mbps, high latency)
- Saves egress cost: CDN egress is typically priced at $0.01–0.09/GB, and compression typically cuts text payloads by 60–80%
- Improves user experience: a 500 KB page compressed to 120 KB transfers about 4× faster on a 1 Mbps mobile connection (roughly 1 s instead of 4 s)
The CDN is uniquely positioned to apply compression because:
- It has fast CPUs dedicated to edge functions
- Pre-compressing on cache store amortizes CPU cost over many serves
- Origin doesn’t need to compress repeatedly for each request
HTTP Content Negotiation
The client advertises its supported encodings:
Accept-Encoding: br, gzip, deflate, zstd;q=0.9
The server selects from the client’s list and responds with:
Content-Encoding: br
Content-Length: 12340
Vary: Accept-Encoding
Quality Values (q-values)
The ;q=N suffix is a preference weight from 0 to 1. The CDN should
select the encoding with the highest q-value that it supports:
Accept-Encoding: br;q=1.0, gzip;q=0.9, *;q=0.5
→ Prefer br, then gzip, then any other encoding
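The selection rule above can be sketched in Go as follows. This is a minimal illustration, not the lab's implementation: the function name negotiate and the supported list are assumptions, and server preference order is used only to break q-value ties.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// negotiate returns the supported encoding with the highest q-value in the
// Accept-Encoding header, or "identity" if none is acceptable.
// supported is in server preference order, which breaks q-value ties.
func negotiate(header string, supported []string) string {
	weights := map[string]float64{} // explicit q-values by encoding name
	wildcard, hasWildcard := 0.0, false

	for _, part := range strings.Split(header, ",") {
		fields := strings.Split(part, ";")
		name := strings.ToLower(strings.TrimSpace(fields[0]))
		if name == "" {
			continue
		}
		q := 1.0 // a missing q parameter means q=1
		for _, p := range fields[1:] {
			p = strings.ToLower(strings.TrimSpace(p))
			if strings.HasPrefix(p, "q=") {
				if f, err := strconv.ParseFloat(p[2:], 64); err == nil {
					q = f
				}
			}
		}
		if name == "*" {
			wildcard, hasWildcard = q, true
		} else {
			weights[name] = q
		}
	}

	best, bestQ := "identity", 0.0
	for _, enc := range supported {
		q, ok := weights[enc]
		if !ok && hasWildcard {
			q, ok = wildcard, true // "*" covers encodings not listed by name
		}
		if ok && q > bestQ { // q=0 means "not acceptable"
			best, bestQ = enc, q
		}
	}
	return best
}

func main() {
	supported := []string{"br", "gzip"}
	fmt.Println(negotiate("br;q=1.0, gzip;q=0.9, *;q=0.5", supported)) // br
	fmt.Println(negotiate("gzip, deflate", supported))                 // gzip
	fmt.Println(negotiate("br;q=0, gzip;q=0.5", supported))            // gzip
}
```

Note that q=0 explicitly rejects an encoding, so the third call skips br even though the server supports it.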
The Three Encodings
gzip (RFC 1952 + deflate)
The universal standard. Virtually every HTTP client built since 1997 supports gzip. It wraps a DEFLATE stream with a small header and a CRC-32 checksum.
compression ratio: ~67% (text) 3 KB HTML → ~1 KB
throughput: ~400 MB/s (klauspost/compress implementation)
gzip is based on LZ77 sliding-window compression plus Huffman coding. The sliding window (up to 32 KB in DEFLATE) trades compression ratio against memory: a larger window finds more repeated strings but needs more memory.
Brotli (RFC 7932)
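Developed by Google, released 2015. Designed specifically for HTTP text compression. It combines a built-in static dictionary of common web text (HTML, CSS, and JavaScript fragments, among others) with LZ77 matching and Huffman coding, the same building blocks as DEFLATE, plus context modeling and a much larger window.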
compression ratio: ~82% (text) 3 KB HTML → ~540 bytes (≈15 percentage points better than gzip)
throughput: ~300 MB/s
browser support: all modern browsers (IE 11 and below: no)
Brotli at quality level 11 (max) achieves the best ratio but is very slow to compress (~10 MB/s). CDNs typically use quality 4–6 for on-the-fly compression and quality 11 for pre-compressed static assets.
Zstd (RFC 8878, which obsoletes RFC 8478)
Facebook’s Zstandard, released 2016. Extremely fast decompression.
compression ratio: ~70–80% (text)
throughput: ~2 GB/s compression, ~5 GB/s decompression
use case: origin-to-CDN links, inter-datacenter transfers
Zstd is not yet universally supported in browsers (Chrome only, 2023). Its main CDN use case is origin-to-edge compression: Cloudflare uses zstd between their edge nodes and origin servers where both ends are controlled.
Storage Strategies
1. Store compressed, serve compressed
Store one compressed version per encoding. On request, check
Accept-Encoding and serve the matching stored version:
type cacheEntry struct {
    rawBody    []byte // uncompressed
    gzipBody   []byte // gzip compressed
    brotliBody []byte // brotli compressed
}
- Pros: zero per-request CPU for compression
- Cons: 2–3 stored variants per object (each encoding stored separately); at the ratios above, keeping raw, gzip, and brotli bodies costs about 1.5× the raw size
- Best for: static assets with long TTL, high request volume
2. Compress on-the-fly
Store uncompressed. Compress each response at serve time:
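// compressResponse writes body to w using the requested content coding.
// Needs compress/gzip, io, and a brotli package that provides NewWriterLevel
// (github.com/andybalholm/brotli is assumed here).
func compressResponse(w io.Writer, body []byte, encoding string) error {
    switch encoding {
    case "br":
        bw := brotli.NewWriterLevel(w, brotli.DefaultCompression)
        if _, err := bw.Write(body); err != nil {
            return err
        }
        // Close flushes the final block, so its error must not be dropped.
        return bw.Close()
    case "gzip":
        gw, err := gzip.NewWriterLevel(w, gzip.BestSpeed)
        if err != nil {
            return err
        }
        if _, err := gw.Write(body); err != nil {
            return err
        }
        return gw.Close()
    }
    // Unknown or identity encoding: send the body unmodified.
    _, err := w.Write(body)
    return err
}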
- Pros: 1× storage, always fresh compression
- Cons: CPU cost per request (roughly 2.5 µs/KB for gzip and 3.3 µs/KB for brotli at the throughputs quoted above)
- Best for: dynamic content with short TTL, low repetition
3. Pre-compressed at origin
Origin stores pre-compressed files:
/assets/app.js → not pre-compressed
/assets/app.js.gz → gzip pre-compressed
/assets/app.js.br → brotli pre-compressed
CDN serves app.js.gz or app.js.br based on Accept-Encoding.
No CPU overhead at CDN. Common for static site CDNs (S3 + CloudFront).
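The file selection can be sketched as below. resolvePrecompressed and its exists callback are hypothetical names; the callback stands in for whatever lookup the CDN uses (a stat call, an object-store HEAD request, and so on). If the pre-compressed variant is missing, the sketch falls back to the uncompressed file.

```go
package main

import "fmt"

// resolvePrecompressed picks the origin file for a request path and the
// negotiated encoding, using the .gz / .br suffix convention shown above.
// It returns the file to serve and the Content-Encoding value ("" for none).
func resolvePrecompressed(path, encoding string, exists func(string) bool) (file, contentEncoding string) {
	suffix := map[string]string{"br": ".br", "gzip": ".gz"}[encoding]
	if suffix != "" && exists(path+suffix) {
		return path + suffix, encoding
	}
	return path, "" // fall back to the uncompressed file
}

func main() {
	// Hypothetical origin listing: only the gzip variant was pre-built.
	files := map[string]bool{"/assets/app.js": true, "/assets/app.js.gz": true}
	exists := func(p string) bool { return files[p] }

	for _, enc := range []string{"br", "gzip", "identity"} {
		f, ce := resolvePrecompressed("/assets/app.js", enc, exists)
		fmt.Printf("%-8s -> %s (Content-Encoding: %q)\n", enc, f, ce)
	}
}
```

The response should still carry the Content-Type of the original file (here, JavaScript), not a type derived from the .gz or .br suffix.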
The Vary: Accept-Encoding Requirement
When the CDN stores multiple encodings of the same URL, it must include
Vary: Accept-Encoding in responses. This tells downstream caches (browsers,
ISP proxies) that the response differs by encoding.
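Without Vary, a shared cache (for example, an ISP proxy) may store the gzip
version and later serve it to a client that did not advertise gzip support.
That client receives bytes it cannot decode, which looks like a corrupted response.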
Also: the CDN must maintain separate cache entries keyed by encoding.
See Lab 05 for how the cache key is expanded using Vary headers.
What Not to Compress
| Content type | Compress? | Reason |
|---|---|---|
| HTML, CSS, JS | ✓ always | Highly redundant text (low entropy); 60–80% savings |
| JSON APIs | ✓ always | Often compresses 5–10× |
| SVG, XML | ✓ | XML is verbose |
| JPEG, PNG, WebP | ✗ | Already compressed; gzip adds overhead |
| MP4, WebM | ✗ | Already compressed |
|  | ✗ | Usually already compressed internally |
| Already gzipped | ✗ | Double-compressing = larger output |
| < 1 KB | Optional | Overhead can exceed savings |
The CDN should check Content-Type before compressing and skip binary
formats. Most CDNs have a built-in list of compressible MIME types.
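A check along these lines might look like the sketch below. The MIME list, the 1 KB threshold, and the name shouldCompress are illustrative assumptions drawn from the table, not a real CDN's configuration. The table marks small bodies as optional, and this sketch chooses to skip them.

```go
package main

import (
	"fmt"
	"mime"
)

// minCompressSize is the "< 1 KB" threshold from the table above.
const minCompressSize = 1024

// compressibleTypes is an illustrative allowlist of text-like media types.
var compressibleTypes = map[string]bool{
	"text/html":              true,
	"text/css":               true,
	"text/plain":             true,
	"text/javascript":        true,
	"application/javascript": true,
	"application/json":       true,
	"application/xml":        true,
	"image/svg+xml":          true,
}

// shouldCompress reports whether a response is worth compressing, given its
// Content-Type, any existing Content-Encoding, and its body size in bytes.
func shouldCompress(contentType, contentEncoding string, size int) bool {
	if contentEncoding != "" && contentEncoding != "identity" {
		return false // already encoded; double-compressing makes it larger
	}
	if size < minCompressSize {
		return false // framing overhead can exceed the savings
	}
	// ParseMediaType strips parameters such as "; charset=utf-8".
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false // missing or malformed Content-Type: leave the body alone
	}
	return compressibleTypes[mediaType]
}

func main() {
	fmt.Println(shouldCompress("text/html; charset=utf-8", "", 50_000)) // true
	fmt.Println(shouldCompress("image/jpeg", "", 50_000))               // false
	fmt.Println(shouldCompress("application/json", "gzip", 50_000))     // false
	fmt.Println(shouldCompress("text/css", "", 300))                    // false
}
```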
Compression Savings Calculator
For a site serving 1 TB/month with 70% text responses (700 GB):
gzip saves 67%: 700 GB × 0.67 = 469 GB saved
At $0.05/GB egress: 469 GB × $0.05 = $23.45/month saved
brotli saves 82%: 700 GB × 0.82 = 574 GB saved
At $0.05/GB egress: 574 GB × $0.05 = $28.70/month saved
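Savings scale linearly with traffic. At 1 PB/month with the same 70% text share, brotli would save about 574 TB, or roughly $28,700/month at $0.05/GB. Video-heavy services benefit less, because video is already compressed.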
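The arithmetic above can be wrapped in a small helper to try other traffic mixes. The function name and parameters are assumptions for this sketch; the inputs in main are the figures from this section.

```go
package main

import "fmt"

// monthlySavings returns the egress volume saved (GB) and the cost saved (USD)
// for a month of traffic.
//   totalGB:    total egress before compression
//   textShare:  fraction of traffic that is compressible text (0..1)
//   ratio:      fraction of bytes removed by compression (0..1)
//   pricePerGB: egress price in USD
func monthlySavings(totalGB, textShare, ratio, pricePerGB float64) (savedGB, savedUSD float64) {
	savedGB = totalGB * textShare * ratio
	return savedGB, savedGB * pricePerGB
}

func main() {
	for _, c := range []struct {
		name  string
		ratio float64
	}{{"gzip", 0.67}, {"brotli", 0.82}} {
		gb, usd := monthlySavings(1000, 0.70, c.ratio, 0.05)
		fmt.Printf("%-6s saves %.0f GB, $%.2f/month\n", c.name, gb, usd)
	}
}
```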
Try It
make lab-10
# Request with brotli (best compression)
curl http://localhost:8080/article/1 -H "Accept-Encoding: br" \
-v --output /dev/null 2>&1 | grep -i "content-encoding"
# Request with gzip
curl http://localhost:8080/article/1 -H "Accept-Encoding: gzip" \
--compressed -v
# No compression (compare sizes)
curl http://localhost:8080/article/1 -H "Accept-Encoding: identity" -v
# Compare content lengths:
for enc in br gzip identity; do
echo -n "$enc: "
curl -s http://localhost:8080/article/1 -H "Accept-Encoding: $enc" \
-o /tmp/response -w "%{size_download} bytes\n"
done