Lab 07 · Stale Content & RFC 5861

Run it: make lab-07
Source: labs/lab-07-stale-content/main.go


The Problem

When a cache entry expires, the CDN must go to the origin to get a fresh copy. During that fetch (typically 80–500 ms), what should the CDN do with incoming requests for that resource?

Two bad options:

  1. Block: Hold all requests until the origin responds. Adds 80–500 ms latency to the first request after every expiry. Under load, hundreds of requests pile up.
  2. Return 503: Refuse to serve. This is almost never acceptable.

The right option: serve the stale response while revalidating in the background. Users get a response immediately; the cache updates asynchronously.

This is the core idea of RFC 5861: HTTP Cache-Control Extensions for Stale Content.


RFC 5861: stale-while-revalidate

Cache-Control: max-age=60, stale-while-revalidate=30

Semantics:

  • Fresh (first 60s): serve from cache without consulting origin
  • Stale-while-revalidate (seconds 61–90): serve stale immediately, and trigger a background revalidation
  • Hard stale (after 90s): the request must wait for a fresh copy from the origin

Time (seconds)
0         60        90        ∞
|─ fresh ──|─ SWR  ──|─ stale ─|

t=0:  First fetch. Cache stores response.
t=30: Request → cache HIT (fresh)
t=65: Request → cache STALE (expired, within SWR window)
       → Serve stale immediately (no added latency)
       → Background goroutine fires origin request
       → When origin responds, update cache entry
t=95: Request → beyond SWR window → must fetch fresh before responding

The user at t=65 receives a response that is 65 seconds old, only 5 seconds past its expiry, which is an invisible difference for most content. The user at t=95 waits for the origin round trip (~80 ms at best).
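The three windows can be computed directly from the header. The following is a minimal sketch, not the lab's implementation: `policy`, `parsePolicy`, and `state` are hypothetical names, ages are whole seconds since the response was stored, and malformed directives are simply ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// policy holds the durations (in seconds) taken from a Cache-Control header.
type policy struct {
	maxAge, swr, sie int
}

// parsePolicy extracts max-age, stale-while-revalidate and stale-if-error.
// Directives without a valid non-negative integer value are left at 0.
func parsePolicy(header string) policy {
	var p policy
	for _, part := range strings.Split(header, ",") {
		name, value, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(strings.Trim(value, `"`))
		if err != nil || n < 0 {
			continue
		}
		switch strings.ToLower(name) {
		case "max-age":
			p.maxAge = n
		case "stale-while-revalidate":
			p.swr = n
		case "stale-if-error":
			p.sie = n
		}
	}
	return p
}

// state classifies a cached response by its age in seconds.
func (p policy) state(age int) string {
	switch {
	case age < p.maxAge:
		return "fresh"
	case age < p.maxAge+p.swr:
		return "stale-while-revalidate"
	default:
		return "hard-stale"
	}
}

func main() {
	p := parsePolicy("max-age=60, stale-while-revalidate=30")
	for _, age := range []int{30, 65, 95} {
		fmt.Printf("t=%d: %s\n", age, p.state(age))
	}
}
```

Run against the header above, this classifies t=30 as fresh, t=65 as stale-while-revalidate, and t=95 as hard-stale, matching the timeline.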


RFC 5861: stale-if-error

Cache-Control: max-age=60, stale-if-error=86400

If the origin returns a 5xx error (RFC 5861 names 500, 502, 503, and 504) or is unreachable (connection refused, timeout), serve the stale cached copy for up to 86400 seconds (24 hours) beyond the original expiry.

This is the “graceful degradation” directive. Your CDN continues serving content even when the origin is completely down, for up to the specified duration.

t=0:   Origin healthy. Cache populated.
t=100: Cache entry expired (max-age=60 passed)
t=101: Origin request → 500 Internal Server Error
       → stale-if-error: serve cached copy from t=0
       → Client sees their article, not a 500 error

t=86460: stale-if-error window expired
          → If origin still down → return 502 Bad Gateway
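The decision in this timeline fits in one small function. This is a sketch under stated assumptions rather than the lab's code: `originFailed` and `decide` are hypothetical names, a status of 0 stands in for an unreachable origin, and failure means one of the statuses RFC 5861 lists (500, 502, 503, 504).

```go
package main

import "fmt"

// originFailed reports whether a fetch counts as an error for stale-if-error.
// Status 0 is this sketch's stand-in for "unreachable" (refused, timeout).
func originFailed(status int) bool {
	switch status {
	case 0, 500, 502, 503, 504:
		return true
	}
	return false
}

// decide picks the outcome for a request that found an expired entry and
// went to the origin. age is seconds since the entry was stored.
func decide(age, maxAge, sie, status int) string {
	if !originFailed(status) {
		return "ORIGIN" // pass the origin's response through and re-cache it
	}
	if age < maxAge+sie {
		return "STALE-ERROR" // still inside the stale-if-error window
	}
	return "502" // window exhausted: surface the failure to the client
}

func main() {
	fmt.Println(decide(101, 60, 86400, 500)) // STALE-ERROR
	fmt.Println(decide(86460, 60, 86400, 0)) // 502
	fmt.Println(decide(101, 60, 86400, 200)) // ORIGIN
}
```

The window is measured from the original expiry, so it closes at max-age + stale-if-error = 86460 seconds after the entry was stored.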

The Background Revalidation Pattern

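import (
    "net/http"
    "sync"
    "sync/atomic"
    "time"
)

type cacheEntry struct {
    response     *http.Response
    body         []byte
    expiry       time.Time
    swrDeadline  time.Time   // expiry + stale-while-revalidate
    sieDeadline  time.Time   // expiry + stale-if-error
    revalidating atomic.Bool // true while a background refresh is in flight
}

type Cache struct {
    mu    sync.RWMutex // guards store: get and revalidate run on different goroutines
    store map[string]*cacheEntry
}

func (c *Cache) get(key string) (*cacheEntry, string) {
    c.mu.RLock()
    entry := c.store[key]
    c.mu.RUnlock()
    if entry == nil {
        return nil, "MISS" // never cached
    }
    now := time.Now()

    if now.Before(entry.expiry) {
        return entry, "HIT"
    }

    if now.Before(entry.swrDeadline) {
        // Serve stale, kick off background refresh (at most one per entry)
        if entry.revalidating.CompareAndSwap(false, true) {
            go c.revalidate(key, entry)
        }
        return entry, "STALE-REVALIDATING"
    }

    return nil, "EXPIRED" // must wait for fresh copy
}

// fetch and newEntry are not shown here.
func (c *Cache) revalidate(key string, old *cacheEntry) {
    defer old.revalidating.Store(false)

    resp, err := fetch(key)
    if err != nil {
        // Background refresh failed; the stale entry stays in place
        // so that stale-if-error logic can still serve it.
        return
    }
    fresh := newEntry(resp)
    c.mu.Lock()
    c.store[key] = fresh
    c.mu.Unlock()
}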

The atomic.Bool on revalidating prevents multiple background goroutines for the same key — the first one wins, subsequent SWR requests see revalidating=true and skip launching another goroutine.
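To see the deduplication in isolation, the sketch below fires many concurrent requests at one stale entry and counts how many would have launched a refresh. `countRevalidations` is a hypothetical helper, and the flag is deliberately never reset, which models a refresh that is still in flight for the whole burst. It needs Go 1.19 or later for the `atomic.Bool` and `atomic.Int32` types.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countRevalidations simulates n concurrent requests hitting the same stale
// entry and returns how many of them started a background refresh.
func countRevalidations(n int) int {
	var revalidating atomic.Bool // the per-entry flag
	var started atomic.Int32     // refreshes that would have been launched
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Only the goroutine that flips false -> true wins.
			if revalidating.CompareAndSwap(false, true) {
				started.Add(1) // stand-in for: go c.revalidate(key, entry)
			}
		}()
	}
	wg.Wait()
	return int(started.Load())
}

func main() {
	fmt.Printf("100 requests, %d revalidation started\n", countRevalidations(100))
}
```

However the goroutines are scheduled, exactly one CompareAndSwap succeeds, so the origin sees a single refresh request for the burst.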


Stale-if-Error in Practice

The lab simulates origin failure with a configurable error rate flag. With stale-if-error, the sequence is:

1. Normal operation: cache populated, served fresh
2. Origin starts returning 503 (simulated)
3. CDN: entry is expired + origin erroring
   → Is there a stale-if-error window?
   → Yes → serve stale, log "STALE-ERROR"
4. Origin recovers
   → Next background revalidation succeeds
   → Cache entry updated
5. Normal operation resumes, no user-visible error

This is origin availability decoupled from user experience. For most content (articles, product pages, media), brief staleness is far preferable to a visible error.


When Not to Use stale-while-revalidate

Some content must always be fresh:

Content type             Use SWR?             Reason
News articles            ✓ (short window)     Mild staleness acceptable
Product pages            ✓                    Price/stock staleness OK for seconds
Authentication state     ✗                    Must be current
Payment/checkout         ✗                    Cannot serve stale price
Medical information      ✗                    Accuracy is a legal requirement
Real-time scores/feeds   ✗ (or very short)    Value proposition is freshness

Use Cache-Control: no-cache combined with conditional requests (Lab 04) instead of SWR for content where freshness is the product.
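As a reminder of what that looks like on the wire, here is a minimal sketch of a no-cache revalidation using an ETag. Everything in it is illustrative: the `origin` handler, the `fetchStatus` and `demo` helpers, the `"v1"` validator, and the response body are assumptions, and the origin is an in-process test server from `net/http/httptest`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const etag = `"v1"` // hypothetical validator for the current representation

// origin answers 304 Not Modified when the client's validator still matches.
func origin(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("ETag", etag)
	if r.Header.Get("If-None-Match") == etag {
		w.WriteHeader(http.StatusNotModified)
		return
	}
	fmt.Fprint(w, "price: 42")
}

// fetchStatus sends a GET, optionally conditional, and returns the status code.
func fetchStatus(url, validator string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	if validator != "" {
		req.Header.Set("If-None-Match", validator)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// demo performs a first fetch and then a revalidation with the stored ETag.
func demo() (first, second int, err error) {
	srv := httptest.NewServer(http.HandlerFunc(origin))
	defer srv.Close()

	if first, err = fetchStatus(srv.URL, ""); err != nil {
		return 0, 0, err
	}
	second, err = fetchStatus(srv.URL, etag)
	return first, second, err
}

func main() {
	first, second, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(first, second) // 200 304
}
```

Every request still reaches the origin, so nothing stale is ever served, but an unchanged resource costs only a bodiless 304 response.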


Production Detail: Cloudflare’s Default SWR

Cloudflare's Always Online feature (a form of stale-if-error) serves a cached copy for up to 10 minutes if the origin returns a 5xx error. It is enabled by default and can be disabled per zone. Exact triggers and durations depend on plan and zone settings.

Cloudflare has also honored the stale-while-revalidate directive on origin responses since 2022.

Fastly’s equivalent is Shielding + Grace period: cached objects can be served from shield for up to a configurable grace period while background revalidation happens.


Timeline Annotation (What the Lab Prints)

t=0s:   /article/1 → MISS → fetch → store (TTL=30s, SWR=15s, SIE=300s)
t=5s:   /article/1 → HIT (25s remaining)
t=30s:  /article/1 → HIT (0s remaining, within SWR window)
         → background revalidation started
t=31s:  /article/1 → HIT (background revalidation complete, reset TTL)
[origin down]
t=50s:  /article/1 → expires → origin 503
         → STALE-IF-ERROR (250s remaining in SIE window)
[origin up]
t=60s:  background revalidation succeeds → back to normal HIT

Try It

make lab-07

# Normal behavior: articles served stale during background revalidation
curl http://localhost:8080/article/1
sleep 35
curl http://localhost:8080/article/1     # served stale, triggers background refresh
curl http://localhost:8080/article/1     # now fresh again

# Simulate origin down:
# Restart lab with --error-rate 1.0 to see stale-if-error in action