Lab 05 · Cache Key Design
Run it:
```sh
make lab-05
```
Source: labs/lab-05-cache-key-design/main.go
The Problem
A cache key is the identifier under which a response is stored and looked up. If the key is wrong, everything downstream breaks:
- Too narrow (the same key for different content): the cache serves one user's content to another, or collapses distinct variants into a single response.
- Too wide (the key includes irrelevant query parameters): the hit ratio drops artificially, storage is wasted, and the origin is fetched repeatedly for identical content.
In production, cache key design is one of the highest-leverage activities a CDN engineer performs. A short key design review can lift hit ratio from 70% to 95% when, for example, tracking parameters were fragmenting the key.
The Basic Key: URL Normalization
The naive key is req.URL.String(). It breaks as soon as the same parameters arrive in a different order:
```text
/article/1?a=1&b=2   # different key from
/article/1?b=2&a=1   # the same request, semantically
```
URLs impose no required order on query parameters, and most origins ignore the order. Two requests for the same resource with parameters in a different order therefore return identical content, but a naive cache sees them as different URLs.
Normalization steps (applied by Lab 05):
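```go
import (
	"net/url"
	"sort"
	"strings"
)

// Excerpt: trackingParams is not shown and the rebuild step is elided,
// so this does not compile on its own.
func normalizeKey(u *url.URL) string {
	// 1. Lowercase the path (safe only if the origin treats paths
	//    case-insensitively; URL paths are case-sensitive by default)
	u.Path = strings.ToLower(u.Path)
	// 2. Remove tracking parameters
	q := u.Query()
	for _, p := range trackingParams {
		q.Del(p)
	}
	// 3. Sort remaining parameters deterministically
	for _, v := range q {
		sort.Strings(v) // sorts each parameter's values in place
	}
	keys := make([]string, 0, len(q))
	for k := range q {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	// Rebuild in sorted order
	...
	// 4. Strip fragment (#anchor: never sent to the server, but it can
	//    appear in reconstructed URLs)
	u.Fragment = ""
}
```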
Tracking Parameter Pollution
Marketing teams append tracking parameters to every URL. These are meaningless to the origin but fragment your cache:
| Parameter | Source |
|---|---|
| utm_source, utm_medium, utm_campaign, utm_term, utm_content | Google Analytics |
| fbclid | Facebook click ID |
| gclid, gad_source | Google Ads |
| mc_eid | Mailchimp email ID |
| _ga | Google Analytics cross-domain |
| msclkid | Microsoft Ads |
| twclid | Twitter click ID |
| ref, referral | Generic referral parameters |
Without stripping these, each user who clicks a link on Facebook (?fbclid=XYZ) generates a unique cache key even though they want the same article. A single popular article shared on Facebook could generate millions of unique cache keys, all for identical content.
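Cloudflare, Fastly, and Akamai all let you exclude these parameters from the cache key, but their default keys generally include the full query string, so stripping tracking parameters is something you configure rather than get for free.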
The Vary Header
Vary tells the cache: “this response may differ based on these request headers.” Example:
```text
Vary: Accept-Encoding
```
This means there are potentially multiple stored versions of the same URL: one for clients that accept gzip, one for brotli, one for uncompressed.
Key expansion for Vary
```text
cache_key = normalize(url) + "|" + canonicalize(vary_headers)

GET /article/1
Accept-Encoding: br       → key: "/article/1|br"
Accept-Encoding: gzip     → key: "/article/1|gzip"
Accept-Encoding: (absent) → key: "/article/1|"
```
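Browsers rarely send a bare `br` or `gzip`; a typical value is a list such as `gzip, deflate, br`, and keying on the raw string would create one variant per distinct list. The sketch below canonicalizes the header into the three buckets used in the example keys. The function names, the preference order (brotli over gzip), and the decision to ignore `q` weights are simplifying assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalEncoding collapses a raw Accept-Encoding value into "br",
// "gzip", or "" (identity). Simplification: q-values are ignored, so
// "br;q=0" is still treated as accepting brotli.
func canonicalEncoding(acceptEncoding string) string {
	has := map[string]bool{}
	for _, part := range strings.Split(acceptEncoding, ",") {
		token := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
		has[strings.ToLower(token)] = true
	}
	switch {
	case has["br"]:
		return "br"
	case has["gzip"]:
		return "gzip"
	default:
		return ""
	}
}

// varyKey appends the canonical encoding bucket to a normalized URL.
func varyKey(normalizedURL, acceptEncoding string) string {
	return normalizedURL + "|" + canonicalEncoding(acceptEncoding)
}

func main() {
	fmt.Println(varyKey("/article/1", "gzip, deflate, br")) // /article/1|br
	fmt.Println(varyKey("/article/1", "gzip"))              // /article/1|gzip
	fmt.Println(varyKey("/article/1", ""))                  // /article/1|
}
```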
Common Vary values and their implications:
| Vary value | Cache behavior | Risk |
|---|---|---|
| Accept-Encoding | Store one copy per encoding | Fine; well-enumerated set |
| Accept-Language | Store per language | Can explode: 50+ languages |
| User-Agent | Store per UA string | Catastrophic; millions of unique strings |
| Cookie | Store per cookie | Never do this on a shared cache |
| Authorization | Store per auth token | Response must also be private |
Vary: * means the response depends on more than the request headers. Under RFC 9111 a stored response with Vary: * never matches a later request, so a shared cache cannot reuse it, and CDNs treat it as uncacheable.
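The table can be turned into a storage policy. The following sketch refuses to store a response in a shared cache when `Vary` names `*`, `Cookie`, or `User-Agent`. The function name and the deny list are assumptions reflecting the table's recommendations; only the `*` case follows directly from the HTTP caching specification.

```go
package main

import (
	"fmt"
	"strings"
)

// sharedCacheable reports whether a response carrying the given Vary
// value should be stored in a shared cache. "*" can never be reused;
// Cookie and User-Agent are denied as a policy choice (assumption)
// because they fragment the key space.
func sharedCacheable(vary string) bool {
	for _, field := range strings.Split(vary, ",") {
		switch strings.ToLower(strings.TrimSpace(field)) {
		case "*", "cookie", "user-agent":
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sharedCacheable("Accept-Encoding"))             // true
	fmt.Println(sharedCacheable("Accept-Encoding, User-Agent")) // false
	fmt.Println(sharedCacheable("*"))                           // false
}
```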
Vary: User-Agent is one of the most destructive mistakes in CDN configuration. Nginx docs used to recommend it for mobile detection, causing cache hit ratios to collapse toward 0%, because the number of distinct UA strings (every combination of browser, version, OS, and device) is enormous. The fix: emit Vary: User-Agent only on responses that genuinely differ by device, or better, classify the device at the edge into a small set of buckets, pass the result in a normalized header such as X-Device-Type: mobile|desktop, and use that custom header in Vary instead.
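A minimal sketch of the bucketing idea, assuming a two-value `X-Device-Type` scheme and the common heuristic of treating any User-Agent that contains `Mobi` as mobile. Real device detection is more nuanced, and `deviceType`/`deviceKey` are hypothetical names; the point is that the key gains one of two values instead of one per UA string.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceType buckets a User-Agent into "mobile" or "desktop" using the
// "Mobi" substring heuristic. Deliberately coarse.
func deviceType(userAgent string) string {
	if strings.Contains(userAgent, "Mobi") {
		return "mobile"
	}
	return "desktop"
}

// deviceKey adds the bucket (not the raw UA) to the cache key, so the
// key space doubles instead of growing with every distinct UA string.
func deviceKey(normalizedURL, userAgent string) string {
	return normalizedURL + "|" + deviceType(userAgent)
}

func main() {
	iphone := "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148 Safari/604.1"
	desktop := "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
	fmt.Println(deviceKey("/article/1", iphone))  // /article/1|mobile
	fmt.Println(deviceKey("/article/1", desktop)) // /article/1|desktop
}
```

In an edge proxy, the same bucket value would be what gets forwarded as `X-Device-Type`.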
Cache Keying in Production CDNs
Cloudflare Cache Rules
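Cloudflare provides a Cache Rules UI and API for configuring the cache key. A rule along these lines (shown schematically, not in Cloudflare's exact syntax):
```text
Cache Rule: "Strip marketing params"
  When: hostname matches "example.com"
  Cache Key: exclude query strings "utm_*", "fbclid", "gclid"
```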
Fastly VCL
Fastly’s VCL (Varnish Configuration Language) gives full control:
```vcl
sub vcl_hash {
  # Normalize host (lowercase)
  set req.hash += std.tolower(req.http.host);
  # Normalize path (lowercase)
  set req.hash += std.tolower(req.url.path);
  # Strip tracking params from hash
  declare local var.qs STRING;
  set var.qs = regsuball(req.url.qs,
    "(?:^|&)(?:utm_[^=]*|fbclid|gclid)[^&]*", "");
  set req.hash += regsub(var.qs, "^&", "");
  return(hash);
}
```
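Regex-based query string stripping is easy to get subtly wrong, so it can help to replay the pattern offline before deploying. The sketch below ports the two query-string steps of the VCL above to Go's `regexp` package; `stripQS` is a hypothetical name. Go uses RE2 rather than Fastly's PCRE-style engine, but this pattern relies only on features both support. The third case shows a side effect worth knowing: because `fbclid` is not anchored to the `=`, a parameter such as `fbclid_debug` is stripped too.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Same pattern as the vcl_hash example.
var trackingRE = regexp.MustCompile(`(?:^|&)(?:utm_[^=]*|fbclid|gclid)[^&]*`)

// stripQS mirrors the VCL: regsuball with the pattern, then remove a
// single leading "&" left behind when the first parameter was stripped.
func stripQS(qs string) string {
	return strings.TrimPrefix(trackingRE.ReplaceAllString(qs, ""), "&")
}

func main() {
	fmt.Println(stripQS("utm_source=g&a=1&fbclid=x")) // a=1
	fmt.Println(stripQS("a=1&gclid=y&b=2"))           // a=1&b=2
	fmt.Println(stripQS("fbclid_debug=1&a=1"))        // a=1 (over-match)
}
```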
Akamai
Akamai configures the cache key (“cache ID”) through rules in Property Manager or its APIs. Query parameters can be included in or excluded from the key per URL pattern.
Naive vs. Smart Key: The Hit Ratio Impact
The lab demonstrates this directly. Given traffic:
```text
/article/1?utm_source=google&utm_medium=cpc
/article/1?utm_source=facebook&utm_medium=social
/article/1?utm_source=twitter
/article/1                       ← direct visit
```
| Key strategy | Cache hits | Hit ratio |
|---|---|---|
| Naive (full URL) | 0/4 | 0% |
| Strip utm_* | 3/4 | 75% |
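| Strip utm_* + normalize | 3/4 | 75% |

On a cold cache the first request is always a miss, so 3/4 is the best any key can achieve for these four requests. Lowercasing and parameter sorting add nothing for this particular sample; they pay off when traffic also contains variants such as /Article/1 or reordered parameters.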
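The arithmetic behind the table can be reproduced with a tiny replay: start with an empty cache, count a hit whenever a key has been seen before, and store every miss. This is an illustration, not the lab's implementation; `hits`, `naiveKey`, and `stripUTMKey` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hits replays requests through an initially empty cache that stores
// every miss, and returns how many requests were cache hits.
func hits(requests []string, key func(*url.URL) string) int {
	seen := map[string]bool{}
	n := 0
	for _, r := range requests {
		u, err := url.Parse(r)
		if err != nil {
			continue
		}
		k := key(u)
		if seen[k] {
			n++
		} else {
			seen[k] = true
		}
	}
	return n
}

// naiveKey uses the full URL string.
func naiveKey(u *url.URL) string { return u.String() }

// stripUTMKey drops every utm_* parameter; Encode sorts the rest by name.
func stripUTMKey(u *url.URL) string {
	q := u.Query()
	for k := range q {
		if strings.HasPrefix(k, "utm_") {
			q.Del(k) // deleting during range is safe for Go maps
		}
	}
	if enc := q.Encode(); enc != "" {
		return u.Path + "?" + enc
	}
	return u.Path
}

func main() {
	traffic := []string{
		"/article/1?utm_source=google&utm_medium=cpc",
		"/article/1?utm_source=facebook&utm_medium=social",
		"/article/1?utm_source=twitter",
		"/article/1",
	}
	fmt.Printf("naive: %d/%d\n", hits(traffic, naiveKey), len(traffic))          // naive: 0/4
	fmt.Printf("strip utm_*: %d/%d\n", hits(traffic, stripUTMKey), len(traffic)) // strip utm_*: 3/4
}
```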
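In production, origin load scales with the miss ratio: going from 0% to 75% hits cuts origin fetches by a factor of four, and going from 90% to 99% cuts them by another factor of ten. At millions of requests, that is how key design can separate a $200/month origin bill from a $20,000/month one.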
Production Checklist: Cache Key Design
- Strip all known tracking parameters
- Sort query string parameters alphabetically
- Lowercase path components (only where the origin treats paths case-insensitively)
- Handle Vary explicitly per resource type
- Never vary on User-Agent or Cookie in a shared cache
- Use Surrogate-Control to set CDN-specific TTLs
- Test key normalization with realistic traffic samples
Try It
```sh
make lab-05

# Same content, different tracking params: the second request should be a cache HIT
curl "http://localhost:8080/article/1?utm_source=google"
curl "http://localhost:8080/article/1?utm_source=facebook"

# Different Accept-Encoding → different Vary bucket; both should be stored separately
curl "http://localhost:8080/article/1" -H "Accept-Encoding: gzip"
curl "http://localhost:8080/article/1" -H "Accept-Encoding: br"
```