Lab 05 · Cache Key Design

Run it: make lab-05
Source: labs/lab-05-cache-key-design/main.go


The Problem

A cache key is the identifier under which a response is stored and looked up. If the key is wrong, everything downstream breaks:

  • Too narrow (same key for different content): the cache serves the wrong content to the wrong user, or collapses distinct variations into one response.
  • Too wide (irrelevant query params included): artificially low hit ratio, wasted storage, and repeated origin fetches for identical content.

In production, cache key design is one of the highest-leverage activities a CDN engineer performs. A short key design review can move hit ratio from, say, 70% to 95%.


The Basic Key: URL Normalization

The naive key is req.URL.String(). This breaks immediately when:

/article/1?a=1&b=2         # different key from
/article/1?b=2&a=1         # semantically the same request

For most applications, query parameter order carries no meaning. Two requests for the same resource with the same parameters in a different order are equivalent, but a naive cache sees them as different URLs.

Normalization steps (applied by Lab 05):

func normalizeKey(u *url.URL) string {
    // 1. Lowercase the path
    u.Path = strings.ToLower(u.Path)

    // 2. Remove tracking parameters
    q := u.Query()
    for _, p := range trackingParams {
        q.Del(p)
    }

    // 3. Sort remaining parameters deterministically
    for _, v := range q {
        sort.Strings(v) // sort repeated values of the same key in place
    }
    keys := make([]string, 0, len(q))
    for k := range q { keys = append(keys, k) }
    sort.Strings(keys)
    // Rebuild in sorted order
    ...

    // 4. Strip fragment (#anchor — never sent to server but can appear in
    //    reconstructed URLs)
    u.Fragment = ""
}
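
The four steps above can be sketched end to end as follows. This is a minimal illustration and not the lab's actual source: the string-in, string-out signature, the short trackingParams list, and the choice to leave the host out of the key are all assumptions made for brevity. It leans on url.Values.Encode, which emits parameters sorted by key.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// trackingParams is an illustrative list (assumption), not the lab's real one.
var trackingParams = []string{"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

// normalizeKey returns a canonical cache key (path plus sorted query) for rawURL.
func normalizeKey(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}

	// 2. Remove tracking parameters.
	q := u.Query()
	for _, p := range trackingParams {
		q.Del(p)
	}

	// 3. Sort repeated values; Encode below sorts by parameter name.
	for _, v := range q {
		sort.Strings(v)
	}

	// 1. Lowercase the path. 4. The fragment is simply never copied into the key.
	key := strings.ToLower(u.Path)
	if enc := q.Encode(); enc != "" {
		key += "?" + enc
	}
	return key, nil
}

func main() {
	for _, raw := range []string{
		"/Article/1?b=2&a=1&utm_source=google#comments",
		"/article/1?a=1&b=2",
	} {
		key, err := normalizeKey(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(key) // both inputs print /article/1?a=1&b=2
	}
}
```

Returning a new string instead of mutating the *url.URL keeps the original request URL intact for the origin fetch, which still needs the unmodified query string.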

Tracking Parameter Pollution

Marketing teams append tracking parameters to every URL. These are meaningless to the origin but fragment your cache:

| Parameter | Source |
|---|---|
| utm_source, utm_medium, utm_campaign, utm_term, utm_content | Google Analytics |
| fbclid | Facebook click ID |
| gclid, gad_source | Google Ads |
| mc_eid | Mailchimp email ID |
| _ga | Google Analytics cross-domain |
| msclkid | Microsoft Ads |
| twclid | Twitter click ID |
| ref, referral | Generic referral parameters |

Without stripping these, each user who clicks a Facebook ad link (?fbclid=XYZ) generates a unique cache key even though they want the same article. A single popular article shared on Facebook could generate millions of unique cache keys — all for identical content.

Cloudflare, Fastly, and Akamai all let you exclude parameters like these from the cache key, but the defaults differ by vendor and plan, so confirm what is actually stripped rather than assuming it.


The Vary Header

Vary tells the cache: “this response may differ based on these request headers”. Example:

Vary: Accept-Encoding

This means there are potentially multiple stored versions of the same URL: one for clients that accept gzip, one for brotli, one for uncompressed.

Key expansion for Vary

cache_key = normalize(url) + "|" + canonicalize(vary_headers)

GET /article/1
Accept-Encoding: br          → key: "/article/1|br"
Accept-Encoding: gzip        → key: "/article/1|gzip"
Accept-Encoding: (absent)    → key: "/article/1|"
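
The key expansion above depends on canonicalizing the header first, because real clients send values like "gzip, deflate, br" rather than a single token. A rough sketch, assuming a preference order of br over gzip and ignoring q-values entirely (the function names canonicalEncoding and varyKey are made up for this example):

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalEncoding collapses an Accept-Encoding header into one bucket:
// "br", "gzip", or "" (no supported encoding). Assumption: q-values are
// ignored, so "br;q=0" would still be treated as accepting br.
func canonicalEncoding(acceptEncoding string) string {
	seen := map[string]bool{}
	for _, part := range strings.Split(acceptEncoding, ",") {
		name, _, _ := strings.Cut(part, ";") // drop any ";q=..." weight
		seen[strings.ToLower(strings.TrimSpace(name))] = true
	}
	switch {
	case seen["br"]:
		return "br"
	case seen["gzip"]:
		return "gzip"
	default:
		return ""
	}
}

// varyKey appends the canonical encoding bucket to an already normalized URL.
func varyKey(normalizedURL, acceptEncoding string) string {
	return normalizedURL + "|" + canonicalEncoding(acceptEncoding)
}

func main() {
	fmt.Println(varyKey("/article/1", "gzip, deflate, br")) // /article/1|br
	fmt.Println(varyKey("/article/1", "gzip"))              // /article/1|gzip
	fmt.Println(varyKey("/article/1", ""))                  // /article/1|
}
```

Bucketing keeps the number of stored variants bounded at three per URL, no matter how many distinct Accept-Encoding strings clients send.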

Common Vary values and their implications:

| Vary value | Cache behavior | Risk |
|---|---|---|
| Accept-Encoding | Store one copy per encoding | Fine; well-enumerated set |
| Accept-Language | Store per language | Can explode: 50+ languages |
| User-Agent | Store per UA string | Catastrophic; millions of unique strings |
| Cookie | Store per cookie | Never do this on shared cache |
| Authorization | Per auth token | Response must also be private |

Vary: * means the response is unique per request and must never be cached in a shared cache. CDNs treat it as uncacheable.
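
A cache therefore has to inspect the Vary response header before building a key at all. The following is one possible way to do that, with a hypothetical helper varyHeaders that returns the header names to fold into the key and reports Vary: * as uncacheable:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// varyHeaders parses a Vary response header value. It returns the listed
// header names (lowercased and sorted so the key is order-independent) and
// cacheable=false when the value contains "*".
func varyHeaders(vary string) (names []string, cacheable bool) {
	for _, part := range strings.Split(vary, ",") {
		name := strings.ToLower(strings.TrimSpace(part))
		if name == "" {
			continue
		}
		if name == "*" {
			return nil, false
		}
		names = append(names, name)
	}
	sort.Strings(names)
	return names, true
}

func main() {
	for _, v := range []string{"Accept-Language, Accept-Encoding", "*", ""} {
		names, ok := varyHeaders(v)
		fmt.Printf("Vary=%q names=%v cacheable=%v\n", v, names, ok)
	}
}
```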

Vary: User-Agent is one of the most destructive caching mistakes. Nginx docs used to recommend it for mobile detection, causing cache hit ratios to collapse to near 0% because User-Agent strings are effectively unique per browser build. The fix: perform device detection at the origin and emit Vary: User-Agent only when necessary, or better, classify the client into a normalized X-Device-Type: mobile|desktop request header and vary on that header instead.
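
The normalized-header approach can be illustrated with a deliberately crude classifier. The substring markers below are assumptions chosen for the example; production device detection uses far more thorough rules. What matters is that millions of User-Agent strings collapse into two cache buckets.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceType maps a raw User-Agent to "mobile" or "desktop".
// The marker list is illustrative only (assumption), not a complete ruleset.
func deviceType(userAgent string) string {
	ua := strings.ToLower(userAgent)
	for _, marker := range []string{"mobi", "android", "iphone"} {
		if strings.Contains(ua, marker) {
			return "mobile"
		}
	}
	return "desktop"
}

func main() {
	agents := []string{
		"Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148",
		"Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0",
	}
	for _, ua := range agents {
		// The edge would set this value as X-Device-Type before the cache lookup.
		fmt.Println(deviceType(ua))
	}
}
```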


Cache Keying in Production CDNs

Cloudflare Cache Rules

Cloudflare provides a Cache Rules UI and API to configure:

Cache Rule: "Strip marketing params"
  When: hostname matches "example.com"
  Cache Key: exclude query strings "utm_*", "fbclid", "gclid"
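
To show what a wildcard exclusion like "utm_*" means for individual parameter names, here is a small sketch using Go's path.Match for the glob. It illustrates the matching idea only and makes no claim about how Cloudflare evaluates its rules; excludePatterns and isExcluded are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
)

// excludePatterns mirrors the rule above (assumption: glob-style matching).
var excludePatterns = []string{"utm_*", "fbclid", "gclid"}

// isExcluded reports whether a query parameter name matches any pattern.
// Note: path.Match's "*" does not match "/", which is fine for typical names.
func isExcluded(param string) bool {
	for _, p := range excludePatterns {
		if ok, err := path.Match(p, param); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{"utm_source", "fbclid", "page"} {
		fmt.Printf("%s excluded=%v\n", name, isExcluded(name))
	}
}
```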

Fastly VCL

Fastly’s VCL (Varnish Configuration Language) gives full control:

sub vcl_hash {
    # Normalize host (lowercase)
    set req.hash += std.tolower(req.http.host);

    # Normalize path (lowercase)
    set req.hash += std.tolower(req.url.path);

    # Strip tracking params from hash
    declare local var.qs STRING;
    set var.qs = regsuball(req.url.qs,
        "(?:^|&)(?:utm_[^=]*|fbclid|gclid)[^&]*", "");
    set req.hash += regsub(var.qs, "^&", "");

    return(hash);
}

Akamai

Akamai uses “cache ID” rules configured via Property Manager or the APIs. Key parameters can be included/excluded per URL pattern.


Naive vs. Smart Key: The Hit Ratio Impact

The lab demonstrates this directly. Given traffic:

/article/1?utm_source=google&utm_medium=cpc
/article/1?utm_source=facebook&utm_medium=social
/article/1?utm_source=twitter
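/article/1        ← direct visit

| Key strategy | Cache hits | Hit ratio |
|---|---|---|
| Naive (full URL) | 0/4 | 0% |
| Strip utm_* | 3/4 | 75% |
| Strip utm_* + normalize | 4/4 | 100% |

Note that from a cold cache the first request for any key is a miss, so the 4/4 row is only reachable if the normalized key is already cached when this sample begins.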

In production with millions of requests, this gap can be the difference between a $200/month origin bill and a $20,000/month one.
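
The first two rows of the table can be reproduced by replaying the four requests against an empty in-memory cache. This is a simplified simulation and not the lab's code; hits, naiveKey, and stripUTM are names invented for the sketch.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hits replays requests against an initially empty cache and counts cache hits.
func hits(requests []string, keyFn func(string) string) int {
	cache := map[string]bool{}
	n := 0
	for _, r := range requests {
		k := keyFn(r)
		if cache[k] {
			n++
		} else {
			cache[k] = true // miss: fetch from origin and store
		}
	}
	return n
}

// naiveKey uses the full URL as the key.
func naiveKey(raw string) string { return raw }

// stripUTM drops every utm_* parameter before building the key.
func stripUTM(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	q := u.Query()
	for name := range q {
		if strings.HasPrefix(name, "utm_") {
			q.Del(name)
		}
	}
	if enc := q.Encode(); enc != "" {
		return u.Path + "?" + enc
	}
	return u.Path
}

func main() {
	requests := []string{
		"/article/1?utm_source=google&utm_medium=cpc",
		"/article/1?utm_source=facebook&utm_medium=social",
		"/article/1?utm_source=twitter",
		"/article/1",
	}
	fmt.Printf("naive:       %d/%d\n", hits(requests, naiveKey), len(requests)) // 0/4
	fmt.Printf("strip utm_*: %d/%d\n", hits(requests, stripUTM), len(requests)) // 3/4
}
```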


Production Checklist: Cache Key Design

  • Strip all known tracking parameters
  • Sort query string parameters alphabetically
  • Lowercase path components
  • Handle Vary explicitly per resource type
  • Never vary on User-Agent or Cookie in a shared cache
  • Use Surrogate-Control to set CDN-specific TTLs
  • Test key normalization with realistic traffic samples

Try It

make lab-05

# Same content, different tracking params — should be HIT on second request
curl "http://localhost:8080/article/1?utm_source=google"
curl "http://localhost:8080/article/1?utm_source=facebook"
# Second request should be a cache HIT

# Different Accept-Encoding → different Vary bucket
curl "http://localhost:8080/article/1" -H "Accept-Encoding: gzip"
curl "http://localhost:8080/article/1" -H "Accept-Encoding: br"
# Both should be stored separately