
Lab 10 · Compression

Run it: make lab-10
Source: labs/lab-10-compression/main.go


The Problem

Network bandwidth is neither free nor unlimited. Compressing HTTP responses before delivery:

  1. Reduces latency: a smaller payload transfers faster, especially on mobile networks (LTE: ~20 Mbps, with high latency)
  2. Saves egress cost: CDN egress is priced at roughly $0.01–0.09/GB, and compression typically shrinks text responses by 60–80%
  3. Improves user experience: a 500 KB page compressed to 120 KB transfers roughly 4× faster on a 1 Mbps mobile connection (about 1 s instead of 4 s)

The CDN is uniquely positioned to apply compression because:

  • It has fast CPUs dedicated to edge functions
  • Pre-compressing on cache store amortizes CPU cost over many serves
  • Origin doesn’t need to compress repeatedly for each request

HTTP Content Negotiation

The client advertises its supported encodings:

Accept-Encoding: br, gzip, deflate, zstd;q=0.9

The server selects from the client’s list and responds with:

Content-Encoding: br
Content-Length: 12340
Vary: Accept-Encoding

Quality Values (q-values)

The ;q=N suffix is a preference weight from 0 to 1. An omitted q-value means q=1, and q=0 means "not acceptable". The CDN should select the supported encoding with the highest q-value:

Accept-Encoding: br;q=1.0, gzip;q=0.9, *;q=0.5
→ Prefer br, then gzip, then any other encoding
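The selection rule above can be sketched in Go. negotiateEncoding is a hypothetical helper and is not taken from the lab's main.go. The sketch makes three assumptions: the server supports only br and gzip, ties are broken by the server's own preference order, and edge cases such as identity;q=0 are ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// negotiateEncoding picks the supported content coding with the highest
// q-value. supported is in server preference order, which breaks ties.
// Returns "identity" when nothing acceptable matches.
func negotiateEncoding(acceptEncoding string, supported []string) string {
	q := map[string]float64{} // explicit q-values from the header
	wildcard := -1.0          // q-value for "*", or -1 if absent
	for _, part := range strings.Split(acceptEncoding, ",") {
		fields := strings.Split(strings.TrimSpace(part), ";")
		name := strings.ToLower(strings.TrimSpace(fields[0]))
		if name == "" {
			continue
		}
		weight := 1.0 // an omitted q-value means q=1
		for _, param := range fields[1:] {
			param = strings.TrimSpace(param)
			if strings.HasPrefix(param, "q=") {
				if v, err := strconv.ParseFloat(param[2:], 64); err == nil {
					weight = v
				}
			}
		}
		if name == "*" {
			wildcard = weight
		} else {
			q[name] = weight
		}
	}

	best, bestQ := "identity", 0.0
	for _, enc := range supported {
		w, ok := q[enc]
		if !ok {
			if wildcard < 0 {
				continue
			}
			w = wildcard // "*" covers codings not listed explicitly
		}
		if w > bestQ { // q=0 means "not acceptable", so it never wins
			best, bestQ = enc, w
		}
	}
	return best
}

func main() {
	supported := []string{"br", "gzip"}
	fmt.Println(negotiateEncoding("br;q=1.0, gzip;q=0.9, *;q=0.5", supported)) // br
	fmt.Println(negotiateEncoding("gzip, deflate", supported))                 // gzip
	fmt.Println(negotiateEncoding("br;q=0, gzip;q=0.8", supported))            // gzip
	fmt.Println(negotiateEncoding("deflate", supported))                       // identity
}
```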

The Three Encodings

gzip (RFC 1952, wrapping DEFLATE from RFC 1951)

The universal standard: virtually every HTTP client since HTTP/1.1 (1997) supports gzip. The format wraps a DEFLATE-compressed stream with a small header and a CRC-32 checksum of the uncompressed data.

compression ratio: ~67% (text)    3 KB HTML → ~1 KB
throughput:        ~400 MB/s (klauspost/compress implementation)

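DEFLATE combines LZ77 sliding-window matching with Huffman coding. The window is at most 32 KB, so a match can only refer back to data within that distance. A larger window and a more thorough match search (higher compression levels) improve the ratio at the cost of more memory and CPU.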

Brotli (RFC 7932)

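Developed by Google and released in 2015, Brotli was designed specifically for HTTP text compression. Like DEFLATE, it is built on LZ77 matching and Huffman coding. On top of that it adds context modeling, a larger sliding window, and a built-in static dictionary of common HTML/CSS/JS tokens.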

compression ratio: ~82% (text)    3 KB HTML → ~540 bytes  (≈15 percentage points better than gzip)
throughput:        ~300 MB/s
browser support:   all modern browsers (IE 11 and below: no)

Brotli at quality level 11 (max) achieves the best ratio but is very slow to compress (~10 MB/s). CDNs typically use quality 4–6 for on-the-fly compression and quality 11 for pre-compressed static assets.

Zstd (RFC 8878, which obsoletes RFC 8478)

Facebook’s Zstandard, released 2016, is designed for extremely fast decompression.

compression ratio: ~70–80% (text)
throughput:        much faster than gzip at similar ratios; decompression typically >1 GB/s per core
use case:          origin-to-CDN links, inter-datacenter transfers

Browser support for zstd arrived much later than for gzip and brotli (Chrome and Firefox enabled it in 2024) and is still not universal. Its main CDN use case is therefore origin-to-edge compression: Cloudflare uses zstd between its edge nodes and origin servers, where both ends are controlled.


Storage Strategies

1. Store compressed, serve compressed

Store one compressed version per encoding. On request, check Accept-Encoding and serve the matching stored version:

type cacheEntry struct {
    rawBody    []byte   // uncompressed
    gzipBody   []byte   // gzip compressed
    brotliBody []byte   // brotli compressed
}
  • Pros: zero per-request CPU for compression
  • Cons: one stored copy per encoding; with the ratios above, raw + gzip + brotli take roughly 1.5× the raw size (1 + 0.33 + 0.18)
  • Best for: static assets with long TTL, high request volume

2. Compress on-the-fly

Store uncompressed. Compress each response at serve time:

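import (
    "compress/gzip"
    "io"

    "github.com/andybalholm/brotli" // assumed brotli package: NewWriterLevel(w, level)
)

// compressResponse writes body to w in the given content coding.
// Unknown encodings fall back to identity (uncompressed).
func compressResponse(w io.Writer, body []byte, encoding string) error {
    var cw io.WriteCloser
    switch encoding {
    case "br":
        cw = brotli.NewWriterLevel(w, brotli.DefaultCompression) // quality 6
    case "gzip":
        gw, err := gzip.NewWriterLevel(w, gzip.BestSpeed)
        if err != nil {
            return err
        }
        cw = gw
    default:
        _, err := w.Write(body)
        return err
    }
    if _, err := cw.Write(body); err != nil {
        cw.Close()
        return err
    }
    // Close flushes the final block and trailer; its error must not be dropped.
    return cw.Close()
}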
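  • Pros: 1× storage; no extra compressed copies to store or invalidate
  • Cons: CPU cost on every request; at the throughputs quoted above (~400 MB/s gzip, ~300 MB/s brotli) that is roughly 2.5 µs/KB for gzip and 3.3 µs/KB for brotli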
  • Best for: dynamic content with short TTL, low repetition

3. Pre-compressed at origin

Origin stores pre-compressed files:

/assets/app.js       → not pre-compressed
/assets/app.js.gz    → gzip pre-compressed
/assets/app.js.br    → brotli pre-compressed

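For a request to /assets/app.js, the CDN serves app.js.br or app.js.gz based on Accept-Encoding. It sets Content-Encoding accordingly and keeps the JavaScript Content-Type. No compression CPU is spent at the CDN. This setup is common for static-site CDNs (S3 + CloudFront).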
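The path mapping can be sketched as follows, assuming the ".br"/".gz" suffix layout shown above. precompressedPath is a hypothetical helper. Its exists callback stands in for a filesystem or object-store lookup, and it expects encodings that have already been negotiated, in preference order. The response still needs Content-Encoding, the original Content-Type, and Vary: Accept-Encoding.

```go
package main

import "fmt"

// precompressedPath returns the best pre-compressed file for path and the
// content coding to advertise, falling back to the uncompressed original.
func precompressedPath(path string, accepted []string, exists func(string) bool) (file, encoding string) {
	suffix := map[string]string{"br": ".br", "gzip": ".gz"}
	for _, enc := range accepted {
		if s, ok := suffix[enc]; ok && exists(path+s) {
			return path + s, enc
		}
	}
	return path, "identity"
}

func main() {
	files := map[string]bool{
		"/assets/app.js":      true,
		"/assets/app.js.gz":   true,
		"/assets/app.js.br":   true,
		"/assets/logo.svg":    true,
		"/assets/logo.svg.gz": true,
	}
	exists := func(p string) bool { return files[p] }

	fmt.Println(precompressedPath("/assets/app.js", []string{"br", "gzip"}, exists))   // /assets/app.js.br br
	fmt.Println(precompressedPath("/assets/logo.svg", []string{"br", "gzip"}, exists)) // /assets/logo.svg.gz gzip
	fmt.Println(precompressedPath("/assets/app.js", nil, exists))                      // /assets/app.js identity
}
```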


The Vary: Accept-Encoding Requirement

Whenever the body of a URL depends on Accept-Encoding, the CDN must include Vary: Accept-Encoding in responses. This applies whether the CDN stores several encodings or compresses on the fly. The header tells downstream caches (browsers, ISP proxies) that the response differs by encoding.

Without Vary, a shared cache such as an ISP proxy may store the gzip version and later serve it to a client that does not support gzip. That client then receives a corrupted response.

Also: the CDN must maintain separate cache entries keyed by encoding. See Lab 05 for how the cache key is expanded using Vary headers.
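Keying the cache on the raw Accept-Encoding string would fragment it, because browsers send many slightly different strings. The sketch below illustrates the common remedy of normalizing the header to the few variants actually stored. normalizeEncoding, cacheKey, and the "|ae=" key format are all hypothetical. The substring checks ignore q-values, which a full implementation would honor.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeEncoding collapses Accept-Encoding into one of the stored variants.
func normalizeEncoding(acceptEncoding string) string {
	ae := strings.ToLower(acceptEncoding)
	switch {
	case strings.Contains(ae, "br"):
		return "br"
	case strings.Contains(ae, "gzip"):
		return "gzip"
	default:
		return "identity"
	}
}

// cacheKey combines the URL with the normalized encoding.
func cacheKey(url, acceptEncoding string) string {
	return url + "|ae=" + normalizeEncoding(acceptEncoding)
}

func main() {
	for _, ae := range []string{
		"gzip, deflate, br, zstd", // typical modern browser
		"br, gzip",                // same variant, different header string
		"gzip",
		"",
	} {
		fmt.Printf("%-26q → %s\n", ae, cacheKey("/article/1", ae))
	}
}
```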


What Not to Compress

Content type       Compress?   Reason
HTML, CSS, JS      ✓ always    Highly redundant text; 60–80% savings
JSON APIs          ✓ always    Often compresses 5–10×
SVG, XML           ✓           XML is verbose
JPEG, PNG, WebP    ✗           Already compressed; gzip adds overhead
MP4, WebM          ✗           Already compressed
PDF                ✗           Usually already compressed internally
Already gzipped    ✗           Double-compressing produces larger output
< 1 KB             Optional    Overhead can exceed savings

The CDN should check Content-Type before compressing and skip binary formats. Most CDNs have a built-in list of compressible MIME types.
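A minimal sketch of that check follows. The allow-list mirrors the table above and is illustrative, not any particular CDN's list. The 1 KB threshold is an assumed value. shouldCompress is a hypothetical helper.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// compressibleTypes is an illustrative allow-list based on the table above.
var compressibleTypes = map[string]bool{
	"text/html":              true,
	"text/css":               true,
	"text/javascript":        true,
	"application/javascript": true,
	"application/json":       true,
	"application/xml":        true,
	"text/xml":               true,
	"image/svg+xml":          true,
}

// minCompressSize is an assumed threshold below which compression is skipped.
const minCompressSize = 1024

// shouldCompress decides whether to compress a response at the edge.
func shouldCompress(contentType, contentEncoding string, size int) bool {
	// Never compress a body that already has a content coding (e.g. gzip).
	if contentEncoding != "" && !strings.EqualFold(contentEncoding, "identity") {
		return false
	}
	if size < minCompressSize {
		return false
	}
	// Strip parameters such as "; charset=utf-8" (and lowercase) before lookup.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return compressibleTypes[mediaType]
}

func main() {
	fmt.Println(shouldCompress("text/html; charset=utf-8", "", 50_000)) // true
	fmt.Println(shouldCompress("image/jpeg", "", 50_000))               // false
	fmt.Println(shouldCompress("application/json", "gzip", 50_000))     // false
	fmt.Println(shouldCompress("text/css", "", 200))                    // false
}
```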


Compression Savings Calculator

For a site serving 1 TB/month with 70% text responses (700 GB):

gzip saves 67%:     700 GB × 0.67 = 469 GB saved
At $0.05/GB egress: 469 GB × $0.05 = $23.45/month saved

brotli saves 82%:   700 GB × 0.82 = 574 GB saved
At $0.05/GB egress: 574 GB × $0.05 = $28.70/month saved

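Savings scale linearly with traffic. At the same rates, each petabyte of monthly text traffic compressed with brotli saves about $41,000/month (1,000,000 GB × 0.82 × $0.05). Video-heavy services gain far less, because MP4/WebM are already compressed.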
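The calculator arithmetic is easy to script. monthlySavings is a hypothetical helper. It reproduces the example figures above and assumes that reduction is the fraction of bytes removed.

```go
package main

import "fmt"

// monthlySavings returns egress dollars saved per month when textGB of text
// traffic shrinks by reduction (0.67 = 67%) at pricePerGB dollars.
func monthlySavings(textGB, reduction, pricePerGB float64) float64 {
	savedGB := textGB * reduction
	return savedGB * pricePerGB
}

func main() {
	const textGB = 700 // 70% of 1 TB/month, as in the example above
	const price = 0.05 // $/GB egress
	fmt.Printf("gzip:   $%.2f/month\n", monthlySavings(textGB, 0.67, price)) // $23.45/month
	fmt.Printf("brotli: $%.2f/month\n", monthlySavings(textGB, 0.82, price)) // $28.70/month
}
```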


Try It

make lab-10

# Request with brotli (best compression)
curl http://localhost:8080/article/1 -H "Accept-Encoding: br" \
  -v --output /dev/null 2>&1 | grep -i "content-encoding"

# Request with gzip
curl http://localhost:8080/article/1 -H "Accept-Encoding: gzip" \
  --compressed -v

# No compression (compare sizes)
curl http://localhost:8080/article/1 -H "Accept-Encoding: identity" -v

# Compare content lengths:
for enc in br gzip identity; do
  echo -n "$enc: "
  curl -s http://localhost:8080/article/1 -H "Accept-Encoding: $enc" \
    -o /tmp/response -w "%{size_download} bytes\n"
done