Lab 09 · Cache Tags & Bulk Purge

Run it: make lab-09
Source: labs/lab-09-cache-tags/main.go


The Problem

When content changes, you need to invalidate the cached copies. The naive approach is to invalidate by URL:

DELETE /cache/article/42

But article 42's content might appear under many URLs:

/article/42
/article/42?format=mobile
/api/v1/articles/42
/api/v2/articles/42
/feed/latest      ← includes article 42's content
/user/123/posts   ← includes article 42 if user 123 wrote it
/search?q=keyword ← search results containing article 42

You can’t enumerate every affected URL. Content relationships are graph-shaped, not path-shaped. You need a way to say: “invalidate everything tagged with article-42.”


Surrogate Keys / Cache Tags

A cache tag (also called a surrogate key) is a label you apply to one or more cache entries. When content changes, you purge by tag, and every entry carrying that tag is invalidated in one operation.

The origin sets tags via a response header:

Surrogate-Key: article-42 author-123 category-tech

or the equivalent headers used by different vendors:

Vendor            Header
Fastly            Surrogate-Key
Cloudflare        Cache-Tag
Akamai            Edge-Cache-Tag
Varnish           xkey (via the xkey vmod)
AWS CloudFront    (custom Lambda@Edge)

The CDN strips this header from responses sent to browsers (it’s a CDN-internal directive) and maintains a tag → URL mapping internally.
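
As a minimal sketch of that step, the Go snippet below reads the tags from an origin response and removes the header before the response is forwarded. It assumes space-separated tags in Surrogate-Key, as in the example above; the function names parseTags and extractTags are illustrative and not taken from the lab source.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// tagHeader is the header the origin uses to declare tags. Other vendors
// use different names (see the table above).
const tagHeader = "Surrogate-Key"

// parseTags splits one header value into individual tags.
// Assumption: tags are separated by whitespace.
func parseTags(value string) []string {
	return strings.Fields(value)
}

// extractTags collects the tags from the origin response headers and deletes
// the header so it is not forwarded to the client.
func extractTags(h http.Header) []string {
	var tags []string
	for _, v := range h.Values(tagHeader) {
		tags = append(tags, parseTags(v)...)
	}
	h.Del(tagHeader)
	return tags
}

func main() {
	origin := http.Header{}
	origin.Set("Content-Type", "text/html")
	origin.Set(tagHeader, "article-42 author-123 category-tech")

	tags := extractTags(origin)
	fmt.Println("tags to index:", tags)
	fmt.Println("header still present:", origin.Get(tagHeader) != "")
}
```

The returned tags are what the cache would record in its tag → URL mapping for the entry being stored.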


Data Structure: Tag → URL Mapping

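import "sync"

// TagStore maintains a bidirectional index between cache tags and URLs.
type TagStore struct {
    mu        sync.RWMutex
    tagToURLs map[string]map[string]struct{} // tag → set of URLs
    urlToTags map[string][]string            // URL → list of tags
}

// NewTagStore returns a TagStore with both maps initialized.
// (Writing to the nil maps of a zero-value TagStore would panic.)
func NewTagStore() *TagStore {
    return &TagStore{
        tagToURLs: make(map[string]map[string]struct{}),
        urlToTags: make(map[string][]string),
    }
}

// Tag records that url carries tags, replacing any tags recorded earlier.
func (s *TagStore) Tag(url string, tags []string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.untagLocked(url) // drop stale mappings if the URL was tagged before
    s.urlToTags[url] = append([]string(nil), tags...) // copy: the caller may reuse its slice
    for _, tag := range tags {
        if s.tagToURLs[tag] == nil {
            s.tagToURLs[tag] = make(map[string]struct{})
        }
        s.tagToURLs[tag][url] = struct{}{}
    }
}

// PurgeByTag removes every URL carrying tag from the index and returns those URLs.
func (s *TagStore) PurgeByTag(tag string) []string {
    s.mu.Lock()
    defer s.mu.Unlock()
    urls := s.tagToURLs[tag]
    purged := make([]string, 0, len(urls))
    for url := range urls {
        purged = append(purged, url)
        s.untagLocked(url) // remove the URL from all of its tags, not only this one
    }
    delete(s.tagToURLs, tag)
    return purged
}

// untagLocked removes url from both maps. The caller must hold s.mu.
func (s *TagStore) untagLocked(url string) {
    for _, t := range s.urlToTags[url] {
        delete(s.tagToURLs[t], url)
        if len(s.tagToURLs[t]) == 0 {
            delete(s.tagToURLs, t) // do not keep empty sets around
        }
    }
    delete(s.urlToTags, url)
}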

The Purge API

POST /cache/purge
Content-Type: application/json

{"tags": ["article-42", "author-123"]}

Response:

{
  "purged_urls": [
    "/article/42",
    "/article/42?format=mobile",
    "/api/v1/articles/42",
    "/feed/latest"
  ],
  "count": 4
}

The CDN removes those entries from L1 and L2 storage. New requests will trigger origin fetches to repopulate.
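
One way the endpoint could be wired up is sketched below. The request and response shapes follow the JSON above; miniStore, purgeHandler, and purgeOnce are hypothetical names, and the store is a simplified stand-in rather than the lab's implementation. The handler is exercised in-process with httptest, so no server needs to be running.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"strings"
	"sync"
)

// Request and response shapes mirroring the JSON bodies in this section.
type purgeRequest struct {
	Tags []string `json:"tags"`
}

type purgeResponse struct {
	PurgedURLs []string `json:"purged_urls"`
	Count      int      `json:"count"`
}

// miniStore is a hypothetical stand-in for the tag index: tag -> set of URLs.
type miniStore struct {
	mu   sync.Mutex
	tags map[string]map[string]struct{}
}

// purge removes every URL carrying any of the given tags and returns the
// removed URLs, de-duplicated and sorted.
func (s *miniStore) purge(tags []string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	hit := map[string]struct{}{}
	for _, t := range tags {
		for u := range s.tags[t] {
			hit[u] = struct{}{}
		}
	}
	// A purged URL must disappear from all of its tags, not only the requested ones.
	for t, urls := range s.tags {
		for u := range hit {
			delete(urls, u)
		}
		if len(urls) == 0 {
			delete(s.tags, t)
		}
	}
	out := make([]string, 0, len(hit))
	for u := range hit {
		out = append(out, u)
	}
	sort.Strings(out)
	return out
}

// purgeHandler serves POST /cache/purge.
func purgeHandler(s *miniStore) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var req purgeRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		urls := s.purge(req.Tags)
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(purgeResponse{PurgedURLs: urls, Count: len(urls)})
	}
}

// purgeOnce sends one request through the handler in-process (no network).
func purgeOnce(s *miniStore, body string) (int, purgeResponse) {
	req := httptest.NewRequest(http.MethodPost, "/cache/purge", strings.NewReader(body))
	rec := httptest.NewRecorder()
	purgeHandler(s)(rec, req)
	var resp purgeResponse
	_ = json.NewDecoder(rec.Body).Decode(&resp) // stays zero-valued on error responses
	return rec.Code, resp
}

func main() {
	s := &miniStore{tags: map[string]map[string]struct{}{
		"article-42": {"/article/42": {}, "/feed/latest": {}},
		"author-123": {"/article/42": {}, "/user/123/posts": {}},
	}}
	code, resp := purgeOnce(s, `{"tags": ["article-42"]}`)
	fmt.Println(code, resp.Count, resp.PurgedURLs)
}
```

Returning the purged URLs in sorted order is a choice made here to keep responses stable; Go map iteration order is otherwise random.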


Consistency Challenge: Distributed Purge

In a single-node setup, purge is a local operation. In a multi-PoP CDN, a purge request must propagate to every node that may have cached the tagged content.

Approaches:

1. Central purge broadcast

Application → Purge API → Central coordinator
                              │
                    ┌─────────┼─────────┐
                    ▼         ▼         ▼
                 NYC-01    LHR-01    NRT-01

Simple, but the coordinator is a single point of failure. Latency from coordinator to distant PoPs can be 100–200 ms, meaning stale content is served during the propagation window.
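
The coordinator's fan-out might look roughly like the sketch below. It assumes the per-PoP delivery is injected as a function (purgeFunc, a name invented here), so latency can be simulated locally; a real coordinator would make an RPC or HTTP call instead. PoPs that do not acknowledge within the timeout are returned so they can be retried.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"sync"
	"time"
)

// purgeFunc delivers a purge for one tag to one PoP (hypothetical signature).
type purgeFunc func(ctx context.Context, pop, tag string) error

// broadcast sends the purge to every PoP concurrently and returns the PoPs
// that failed or timed out, sorted by name.
func broadcast(ctx context.Context, pops []string, tag string, send purgeFunc, timeout time.Duration) []string {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	var (
		mu     sync.Mutex
		failed []string
		wg     sync.WaitGroup
	)
	for _, pop := range pops {
		wg.Add(1)
		go func(pop string) {
			defer wg.Done()
			if err := send(ctx, pop, tag); err != nil {
				mu.Lock()
				failed = append(failed, pop)
				mu.Unlock()
			}
		}(pop)
	}
	wg.Wait()
	sort.Strings(failed)
	return failed
}

// fakeSend simulates per-PoP latency and honors context cancellation.
func fakeSend(latency map[string]time.Duration) purgeFunc {
	return func(ctx context.Context, pop, tag string) error {
		select {
		case <-time.After(latency[pop]):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demo runs one broadcast with made-up latencies; NRT-01 is deliberately
// slower than the 100 ms timeout.
func demo() []string {
	latency := map[string]time.Duration{
		"NYC-01": 5 * time.Millisecond,
		"LHR-01": 20 * time.Millisecond,
		"NRT-01": 500 * time.Millisecond,
	}
	pops := []string{"NYC-01", "LHR-01", "NRT-01"}
	return broadcast(context.Background(), pops, "article-42", fakeSend(latency), 100*time.Millisecond)
}

func main() {
	fmt.Println("needs retry:", demo())
}
```

Until a failed PoP is retried successfully, it keeps serving the stale entry, which is the propagation window described above.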

2. Gossip-based purge (Lab 14)

Purge messages propagate using an epidemic (gossip) protocol. Each node tells a random subset of peers about the purge. With high probability, all nodes are notified within O(log N) rounds. At scale (100+ nodes), gossip is more robust than central broadcast because no single node has to reach every other node.
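
The logarithmic growth can be observed with a small simulation. The sketch below models simple push gossip only: node 0 starts with the purge, and each round every informed node forwards it to a fixed number of uniformly random peers. The fanout of 3 and the function names are assumptions for illustration; Lab 14's protocol may differ.

```go
package main

import (
	"fmt"
	"math/rand"
)

// gossipRounds returns the number of rounds until all n nodes have received
// the purge, or -1 if maxRounds is reached first.
func gossipRounds(n, fanout, maxRounds int, rng *rand.Rand) int {
	if n <= 1 {
		return 0
	}
	informed := make([]bool, n)
	informed[0] = true
	count := 1
	for round := 1; round <= maxRounds; round++ {
		senders := count // nodes informed during this round start sending next round
		for s := 0; s < senders; s++ {
			for k := 0; k < fanout; k++ {
				peer := rng.Intn(n) // may hit itself or an informed node: a wasted message
				if !informed[peer] {
					informed[peer] = true
					count++
				}
			}
		}
		if count == n {
			return round
		}
	}
	return -1
}

// simulate runs one seeded simulation with a generous round limit.
func simulate(n, fanout int, seed int64) int {
	return gossipRounds(n, fanout, 1000, rand.New(rand.NewSource(seed)))
}

func main() {
	for _, n := range []int{16, 128, 1024} {
		fmt.Printf("nodes=%d rounds=%d\n", n, simulate(n, 3, 1))
	}
}
```

With a fanout of 3 the informed set can at most quadruple per round, so the round count grows slowly as the cluster grows by orders of magnitude.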

3. Versioned cache keys

Embed a content version in the cache key:

cache key = normalize(url) + "|v=" + content_version

When content changes, increment content_version at the application layer. Requests now map to a new cache key, so the old entries are never read again and age out through normal eviction. No explicit purge is needed for versioned content.

This is how most “CDN for static assets” setups work: immutable assets with fingerprinted URLs (main.abc123.js). Google Cloud CDN's documentation, for example, recommends versioned URLs over routine invalidation.
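
A minimal sketch of the key construction follows. The text does not define normalize, so the version here is an assumption: it lowercases the host and sorts query parameters so that equivalent URLs share one key. An integer version is also assumed; a content hash would work the same way.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalize returns a canonical form of rawURL: lowercase host and query
// parameters sorted by key (url.Values.Encode sorts by key).
func normalize(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	u.Host = strings.ToLower(u.Host)
	u.RawQuery = u.Query().Encode()
	return u.String(), nil
}

// cacheKey builds normalize(url) + "|v=" + content_version.
func cacheKey(rawURL string, contentVersion int) (string, error) {
	n, err := normalize(rawURL)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s|v=%d", n, contentVersion), nil
}

func main() {
	before, _ := cacheKey("/article/42?format=mobile&b=1", 7)
	after, _ := cacheKey("/article/42?format=mobile&b=1", 8) // content changed
	fmt.Println(before)
	fmt.Println(after)
}
```

Bumping the version from 7 to 8 yields a different key, so the next request misses and is fetched from the origin while the v=7 entry is simply never looked up again.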


Real-World Usage: CMS Integration

WordPress → publishes article update
   → WP plugin fires: POST /cdn/purge {"tags": ["post-42", "category-8", "tag-php"]}
   → CDN removes:
      - /2025/01/article-about-php
      - /category/php/
      - /tag/php/
      - /                          ← homepage (if it shows recent posts)
      - /sitemap.xml
      - RSS feed entries

Drupal, WordPress, and most CMSs have plugins for Fastly, Cloudflare, and Akamai that fire these purges on content save/publish events.

At The New York Times, Fastly Surrogate-Key purge is used to invalidate all representations of an article simultaneously — the canonical URL, AMP version, app API response, and share preview — with a single purge call containing the article’s surrogate key.
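
The publish hook in the flow above boils down to one HTTP call. The sketch below shows what such a hook could send, using the /cdn/purge path and JSON body from the diagram; purgeTags and demoPublish are hypothetical names, and the CDN is replaced by a local test server so the example runs on its own. Real vendor APIs differ in URL, authentication, and body format.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// purgeTags POSTs {"tags": [...]} to the purge endpoint and treats any
// non-2xx status as a failure so the caller can retry or alert.
func purgeTags(client *http.Client, endpoint string, tags []string) error {
	body, err := json.Marshal(map[string][]string{"tags": tags})
	if err != nil {
		return err
	}
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("purge failed: %s", resp.Status)
	}
	return nil
}

// demoPublish simulates a publish event against a local fake CDN and returns
// the request body the fake CDN received.
func demoPublish(tags []string) (string, error) {
	var received string
	cdn := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		received = string(b)
		w.WriteHeader(http.StatusOK)
	}))
	defer cdn.Close()

	err := purgeTags(cdn.Client(), cdn.URL+"/cdn/purge", tags)
	return received, err
}

func main() {
	body, err := demoPublish([]string{"post-42", "category-8", "tag-php"})
	fmt.Println(body, err)
}
```

Checking the status code matters here: a purge that fails silently leaves stale pages in the cache until their TTL expires.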


Tag Design Best Practices

Pattern           Example             Notes
Entity ID         article-42          Always tag with entity type + ID
Entity type       articles            Purge all articles in one call
Author            author-123          Invalidate author profile changes
Category          cat-tech            Category page + all articles in it
Layout template   template-homepage   If homepage template changes
API version       api-v1              Deprecating an API endpoint

Avoid high-cardinality tags that each match only a single entry. For example, a tag per user session is meaningless for a shared CDN cache: nothing else shares it, so purging by it is no better than purging by URL.
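
An origin could derive its tags from the entity being rendered, following the patterns in the table. The sketch below uses a hypothetical article struct and builds the space-separated Surrogate-Key value; the field names are assumptions, not the lab's data model.

```go
package main

import (
	"fmt"
	"strings"
)

// article is a hypothetical content record.
type article struct {
	ID       int
	AuthorID int
	Category string
}

// surrogateKeys returns the tags for a page that renders the given article.
func surrogateKeys(a article) []string {
	return []string{
		fmt.Sprintf("article-%d", a.ID),      // entity ID
		"articles",                           // entity type
		fmt.Sprintf("author-%d", a.AuthorID), // author
		"cat-" + a.Category,                  // category
	}
}

// headerValue joins tags into a single space-separated header value.
func headerValue(tags []string) string {
	return strings.Join(tags, " ")
}

func main() {
	a := article{ID: 42, AuthorID: 123, Category: "tech"}
	fmt.Println("Surrogate-Key:", headerValue(surrogateKeys(a)))
}
```

Every tag here is shared by at least one other page or can be purged as a group, which is what makes it useful in a shared cache.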


Try It

make lab-09

# First request is a MISS; the origin sets Surrogate-Key on its response automatically
curl http://localhost:8080/article/42 -v
# The CDN strips Surrogate-Key before replying, so expect it to be absent from the headers shown here

# Article served from cache (HIT)
curl http://localhost:8080/article/42

# Purge by tag → invalidates all tagged entries
curl -X POST http://localhost:8080/cache/purge \
  -H "Content-Type: application/json" \
  -d '{"tags": ["article-42"]}'

# Next request should be a MISS (content re-fetched from origin)
curl http://localhost:8080/article/42