Lab 09 · Cache Tags & Bulk Purge
Run it:
make lab-09
Source: labs/lab-09-cache-tags/main.go
The Problem
When content changes, you need to invalidate the cached copies. The naive approach is to invalidate by URL:
DELETE /cache/article/42
But the content of article 42 can appear under many URLs:
/article/42
/article/42?format=mobile
/api/v1/articles/42
/api/v2/articles/42
/feed/latest ← includes article 42's content
/user/123/posts ← includes article 42 if user 123 wrote it
/search?q=keyword ← search results containing article 42
You can’t enumerate every affected URL. Content relationships are
graph-shaped, not path-shaped. You need a way to say: “invalidate
everything tagged with article-42.”
Surrogate Keys / Cache Tags
A cache tag (also called a surrogate key) is a label you apply to one or more cache entries. When content changes, you purge by tag, and every entry carrying that tag is invalidated in a single operation.
The origin sets tags via a response header:
Surrogate-Key: article-42 author-123 category-tech
or the equivalent headers used by different vendors:
| Vendor | Header |
|---|---|
| Fastly | Surrogate-Key |
| Cloudflare | Cache-Tag |
| Akamai | Edge-Cache-Tag |
| Varnish | xkey (with the xkey VMOD) |
| AWS CloudFront | (custom Lambda@Edge) |
The CDN strips this header from responses sent to browsers (it’s a CDN-internal directive) and maintains a tag → URL mapping internally.
Data Structure: Tag → URL Mapping
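import "sync"

// TagStore keeps a two-way index between cache tags and URLs.
type TagStore struct {
	mu        sync.RWMutex
	tagToURLs map[string]map[string]struct{} // tag → set of URLs
	urlToTags map[string][]string            // URL → list of tags
}

// Tag records the tags for url, replacing any tags stored earlier.
func (s *TagStore) Tag(url string, tags []string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.untagLocked(url) // drop stale mappings when a URL is re-tagged
	s.urlToTags[url] = append([]string(nil), tags...) // copy the caller's slice
	for _, tag := range tags {
		if s.tagToURLs[tag] == nil {
			s.tagToURLs[tag] = make(map[string]struct{})
		}
		s.tagToURLs[tag][url] = struct{}{}
	}
}

// untagLocked removes url from both indexes. The caller must hold s.mu.
func (s *TagStore) untagLocked(url string) {
	for _, t := range s.urlToTags[url] {
		delete(s.tagToURLs[t], url)
		if len(s.tagToURLs[t]) == 0 {
			delete(s.tagToURLs, t) // do not keep empty tag sets around
		}
	}
	delete(s.urlToTags, url)
}

// PurgeByTag removes every URL carrying tag and returns the purged URLs.
func (s *TagStore) PurgeByTag(tag string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	urls := s.tagToURLs[tag]
	purged := make([]string, 0, len(urls))
	for url := range urls {
		purged = append(purged, url)
		s.untagLocked(url) // deleting map entries during range is safe in Go
	}
	delete(s.tagToURLs, tag)
	return purged
}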
The Purge API
POST /cache/purge
Content-Type: application/json
{"tags": ["article-42", "author-123"]}
Response:
{
"purged_urls": [
"/article/42",
"/article/42?format=mobile",
"/api/v1/articles/42",
"/feed/latest"
],
"count": 4
}
The CDN removes those entries from L1 and L2 storage. New requests will trigger origin fetches to repopulate.
Consistency Challenge: Distributed Purge
In a single-node setup, purge is a local operation. In a multi-PoP CDN, a purge request must propagate to every node that may have cached the tagged content.
Approaches:
1. Central purge broadcast
Application → Purge API → Central coordinator
                                  │
                        ┌─────────┼─────────┐
                        ▼         ▼         ▼
                     NYC-01    LHR-01    NRT-01
Simple, but the coordinator is a single point of failure. Latency from coordinator to distant PoPs can be 100–200 ms, meaning stale content is served during the propagation window.
2. Gossip-based purge (Lab 14)
Purge messages propagate using an epidemic (gossip) protocol. Each node tells a random subset of peers about the purge. With high probability, all nodes are notified within O(log N) rounds. At scale (100+ nodes), gossip is more robust than central broadcast because no single node has to reach every other node.
3. Versioned cache keys
Embed a content version in the cache key:
cache key = normalize(url) + "|v=" + content_version
When content changes, increment content_version at the application
layer. Requests then map to a new key, so old entries are never read
again and age out through normal eviction. No explicit purge is needed
for versioned content.
The same idea underlies the usual static-asset setup on CDNs, including
Google Cloud CDN: immutable assets served from fingerprinted URLs (main.abc123.js).
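The key formula above can be written out as follows. This is a sketch under assumptions: normalize here is a simplified stand-in (path plus query parameters sorted by key), not the lab's normalization, and content_version is modeled as an integer supplied by the application.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// normalize is a simplified stand-in for URL normalization: it keeps the
// path and re-encodes the query with parameters sorted by key.
func normalize(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	key := u.Path
	if q := u.Query().Encode(); q != "" { // Encode sorts parameters by key
		key += "?" + q
	}
	return key, nil
}

// cacheKey builds normalize(url) + "|v=" + content_version.
func cacheKey(rawURL string, contentVersion int) (string, error) {
	n, err := normalize(rawURL)
	if err != nil {
		return "", err
	}
	return n + "|v=" + strconv.Itoa(contentVersion), nil
}

func main() {
	before, _ := cacheKey("/article/42?format=mobile", 7)
	after, _ := cacheKey("/article/42?format=mobile", 8) // content changed
	fmt.Println(before) // /article/42?format=mobile|v=7
	fmt.Println(after)  // /article/42?format=mobile|v=8
}
```

Because the two keys differ, the entry stored under version 7 is simply never looked up again once the application moves to version 8.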
Real-World Usage: CMS Integration
WordPress → publishes article update
→ WP plugin fires: POST /cdn/purge {"tags": ["post-42", "category-8", "tag-php"]}
→ CDN removes:
- /2025/01/article-about-php
- /category/php/
- /tag/php/
- / ← homepage (if it shows recent posts)
- /sitemap.xml
- RSS feed entries
Drupal, WordPress, and most CMSs have plugins for Fastly, Cloudflare, and Akamai that fire these purges on content save/publish events.
At The New York Times, Fastly Surrogate-Key purge is used to invalidate all representations of an article simultaneously — the canonical URL, AMP version, app API response, and share preview — with a single purge call containing the article’s surrogate key.
Tag Design Best Practices
| Pattern | Example | Notes |
|---|---|---|
| Entity ID | article-42 | Always tag with entity type + ID |
| Entity type | articles | Purge all articles in one call |
| Author | author-123 | Purge when an author's profile changes |
| Category | cat-tech | Purge the category page and all articles in it |
| Layout template | template-homepage | Purge when the homepage template changes |
| API version | api-v1 | Purge all responses of an API version, e.g. when deprecating it |
Avoid very high-cardinality tags that never group more than one entry. A tag per user session, for example, is meaningless in a shared CDN cache.
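As an illustration of the patterns in the table, an origin might derive its tags from the entity it is rendering. The Article type and both function names below are hypothetical, and the tag formats simply follow the examples in the table.

```go
package main

import (
	"fmt"
	"strings"
)

// Article is a hypothetical content entity used only for this sketch.
type Article struct {
	ID       int
	AuthorID int
	Category string
}

// tagsForArticle returns entity-type, entity-ID, author, and category tags,
// following the naming patterns in the table above.
func tagsForArticle(a Article) []string {
	return []string{
		"articles",
		fmt.Sprintf("article-%d", a.ID),
		fmt.Sprintf("author-%d", a.AuthorID),
		"cat-" + a.Category,
	}
}

// surrogateKeyValue joins tags into a space-separated header value.
func surrogateKeyValue(tags []string) string {
	return strings.Join(tags, " ")
}

func main() {
	a := Article{ID: 42, AuthorID: 123, Category: "tech"}
	fmt.Println("Surrogate-Key: " + surrogateKeyValue(tagsForArticle(a)))
}
```

Generating tags in one place keeps the naming consistent between the code that sets them and the code that later purges them.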
Try It
make lab-09
# Tags are set automatically on origin responses
curl http://localhost:8080/article/42 -v
# Look for Surrogate-Key in origin response (stripped from CDN response to client)
# Article served from cache (HIT)
curl http://localhost:8080/article/42
# Purge by tag → invalidates all tagged entries
curl -X POST http://localhost:8080/cache/purge \
-H "Content-Type: application/json" \
-d '{"tags": ["article-42"]}'
# Next request should be a MISS (content re-fetched from origin)
curl http://localhost:8080/article/42