Lab 07 · Stale Content & RFC 5861
Run it:
make lab-07
Source: labs/lab-07-stale-content/main.go
The Problem
When a cache entry expires, the CDN must go to the origin to get a fresh copy. During that fetch (typically 80–500 ms), what should the CDN do with incoming requests for that resource?
Two bad options:
- Block: Hold all requests until the origin responds. Adds 80–500 ms latency to the first request after every expiry. Under load, hundreds of requests pile up.
- Return 503: Refuse to serve. This is almost never acceptable.
The right option: serve the stale response while revalidating in the background. Users get a response immediately; the cache updates asynchronously.
This is the core idea of RFC 5861: HTTP Cache-Control Extensions for Stale Content.
RFC 5861: stale-while-revalidate
Cache-Control: max-age=60, stale-while-revalidate=30
Semantics:
- Fresh (first 60s): serve from cache without consulting origin
- Stale-while-revalidate (seconds 61–90): serve stale immediately, and trigger a background revalidation
- Hard stale (after 90s): must wait for fresh copy
Time (seconds)
0          60        90          ∞
|─ fresh ──|─ SWR ───|─ stale ───|
t=0:  First fetch. Cache stores response.
t=30: Request → cache HIT (fresh)
t=65: Request → cache entry is STALE (expired, but within SWR window)
      → Serve stale immediately (no added latency)
      → Background goroutine fires origin request
      → When origin responds, update cache entry
t=95: Request → beyond SWR window → must fetch fresh before responding
The user at t=65 receives a response that is 65 seconds old, only 5 seconds past its freshness lifetime, which is rarely a visible difference. The user at t=95 waits for the origin fetch (~80 ms at best) before getting a response.
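The three states above can be sketched as a pure function of the response's age. This is a minimal illustration, not the lab's code: `parseDirectives`, `classify`, and `stateAt` are hypothetical names, and the parser only handles the two directives used in this section.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// directives holds the lifetimes parsed from a Cache-Control header.
type directives struct {
	maxAge time.Duration
	swr    time.Duration
}

// parseDirectives extracts max-age and stale-while-revalidate (both in
// seconds). Unknown or malformed directives are ignored.
func parseDirectives(header string) directives {
	var d directives
	for _, part := range strings.Split(header, ",") {
		name, value, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue
		}
		secs, err := strconv.Atoi(strings.Trim(value, `"`))
		if err != nil || secs < 0 {
			continue
		}
		switch strings.ToLower(name) {
		case "max-age":
			d.maxAge = time.Duration(secs) * time.Second
		case "stale-while-revalidate":
			d.swr = time.Duration(secs) * time.Second
		}
	}
	return d
}

// classify maps the age of a cached response to one of the three states.
func classify(age time.Duration, d directives) string {
	switch {
	case age < d.maxAge:
		return "fresh"
	case age < d.maxAge+d.swr:
		return "stale-while-revalidate"
	default:
		return "hard-stale"
	}
}

// stateAt is a convenience wrapper: header and age in seconds in, state out.
func stateAt(header string, ageSeconds int) string {
	return classify(time.Duration(ageSeconds)*time.Second, parseDirectives(header))
}

func main() {
	const header = "max-age=60, stale-while-revalidate=30"
	for _, secs := range []int{30, 65, 95} {
		fmt.Printf("t=%ds: %s\n", secs, stateAt(header, secs))
	}
}
```

Running it prints `fresh`, `stale-while-revalidate`, and `hard-stale` for t=30, t=65, and t=95, matching the timeline.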
RFC 5861: stale-if-error
Cache-Control: max-age=60, stale-if-error=86400
If the origin returns a 5xx error (RFC 5861 specifically names 500, 502, 503, and 504), or is unreachable (connection refused, timeout), serve the stale cached copy for up to 86400 seconds (24 hours) beyond the original expiry.
This is the “graceful degradation” directive. Your CDN continues serving content even when the origin is completely down, for up to the specified duration.
t=0:     Origin healthy. Cache populated.
t=100:   Cache entry expired (max-age=60 passed)
t=101:   Origin request → 500 Internal Server Error
         → stale-if-error: serve cached copy from t=0
         → Client sees their article, not a 500 error
t=86460: stale-if-error window expired
         → If origin still down → return 502 Bad Gateway
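The decision in that timeline can be sketched as a small predicate. This is an illustration under assumptions: `isOriginFailure`, `respond`, and `errRefused` are hypothetical names, ages are plain seconds, and the 502 fallback simply mirrors the timeline above.

```go
package main

import (
	"errors"
	"fmt"
)

// errRefused stands in for any transport-level failure (hypothetical).
var errRefused = errors.New("connection refused")

// isOriginFailure reports whether a fetch outcome counts as an error for
// stale-if-error: a transport failure, or one of the status codes that
// RFC 5861 names (500, 502, 503, 504).
func isOriginFailure(status int, err error) bool {
	if err != nil {
		return true
	}
	switch status {
	case 500, 502, 503, 504:
		return true
	}
	return false
}

// respond decides what the client gets after a fetch for an expired entry.
// All durations are in seconds; age is measured from when the entry was stored.
func respond(age, maxAge, staleIfError, status int, err error) string {
	if !isOriginFailure(status, err) {
		return "serve origin response"
	}
	if age < maxAge+staleIfError {
		return "serve stale"
	}
	return "502 Bad Gateway"
}

func main() {
	const maxAge, sie = 60, 86400
	fmt.Println("t=101, origin 500:     ", respond(101, maxAge, sie, 500, nil))
	fmt.Println("t=101, refused:        ", respond(101, maxAge, sie, 0, errRefused))
	fmt.Println("t=86460, origin 500:   ", respond(86460, maxAge, sie, 500, nil))
	fmt.Println("t=101, origin 200:     ", respond(101, maxAge, sie, 200, nil))
}
```

Note that a 404 from the origin is not a failure in this sense, so it would replace the cached copy rather than trigger the stale fallback.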
The Background Revalidation Pattern
type cacheEntry struct {
    response     *http.Response
    body         []byte
    expiry       time.Time
    swrDeadline  time.Time // expiry + stale-while-revalidate
    sieDeadline  time.Time // expiry + stale-if-error
    revalidating atomic.Bool
}

func (c *Cache) get(key string) (*cacheEntry, string) {
    // c.mu is assumed to be a sync.RWMutex on Cache guarding c.store;
    // revalidate writes the map from another goroutine.
    c.mu.RLock()
    entry, ok := c.store[key]
    c.mu.RUnlock()
    if !ok {
        return nil, "MISS" // nothing cached yet
    }

    now := time.Now()
    if now.Before(entry.expiry) {
        return entry, "HIT"
    }
    if now.Before(entry.swrDeadline) {
        // Serve stale, kick off at most one background refresh
        if entry.revalidating.CompareAndSwap(false, true) {
            go c.revalidate(key, entry)
        }
        return entry, "STALE-REVALIDATING"
    }
    return nil, "EXPIRED" // must wait for fresh copy
}

func (c *Cache) revalidate(key string, old *cacheEntry) {
    defer old.revalidating.Store(false)
    resp, err := fetch(key)
    if err != nil {
        // Background refresh failed; stale-if-error logic handles it
        return
    }
    fresh := newEntry(resp)
    c.mu.Lock()
    c.store[key] = fresh
    c.mu.Unlock()
}
The atomic.Bool on revalidating prevents multiple background
goroutines for the same key — the first one wins, subsequent SWR requests
see revalidating=true and skip launching another goroutine.
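To see the "first one wins" behavior in isolation, here is a small sketch that races 100 simulated requests against one flag. The `entry` type, `tryStartRevalidation`, and `countWinners` are stand-ins invented for this example; only `atomic.Bool.CompareAndSwap` comes from the pattern above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// entry is a stripped-down stand-in for cacheEntry.
type entry struct {
	revalidating atomic.Bool
}

// tryStartRevalidation returns true for exactly one caller until the flag
// is cleared again.
func (e *entry) tryStartRevalidation() bool {
	return e.revalidating.CompareAndSwap(false, true)
}

// countWinners fires the given number of concurrent requests at e and
// reports how many of them were allowed to start a revalidation.
func countWinners(e *entry, requests int) int64 {
	var winners atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if e.tryStartRevalidation() {
				winners.Add(1)
			}
		}()
	}
	wg.Wait()
	return winners.Load()
}

func main() {
	e := &entry{}
	fmt.Println("revalidations started:", countWinners(e, 100))

	// Once the refresh finishes and clears the flag, the next stale
	// request may start a new one.
	e.revalidating.Store(false)
	fmt.Println("after reset:", countWinners(e, 100))
}
```

Both lines report a count of 1: however many requests arrive while the flag is set, only one swap from false to true can succeed.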
Stale-if-Error in Practice
The lab simulates origin failure with a configurable error rate flag.
With stale-if-error, the sequence is:
1. Normal operation: cache populated, served fresh
2. Origin starts returning 503 (simulated)
3. CDN: entry is expired + origin erroring
   → Is there a stale-if-error window?
   → Yes → serve stale, log "STALE-ERROR"
4. Origin recovers
   → Next background revalidation succeeds
   → Cache entry updated
5. Normal operation resumes, no user-visible error
This is origin availability decoupled from user experience. For most content (articles, product pages, media), brief staleness is far preferable to a visible error.
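The sequence can be replayed end to end with a toy single-object cache. This is a simplified sketch, not the lab's implementation: it fetches synchronously (no background revalidation), uses an injected `origin` function and explicit timestamps instead of a real server and clock, and every name in it is hypothetical. The TTL of 30s and stale-if-error window of 300s are illustrative values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// entry is a cached body plus the two deadlines this sketch needs.
type entry struct {
	body        string
	expiry      time.Time
	sieDeadline time.Time // expiry + stale-if-error
}

// cache holds a single object; origin is a hypothetical fetch function.
type cache struct {
	ttl, sie time.Duration
	origin   func() (string, error)
	stored   *entry
}

// get answers one request at time now, returning the body and a status label.
func (c *cache) get(now time.Time) (string, string) {
	if c.stored != nil && now.Before(c.stored.expiry) {
		return c.stored.body, "HIT"
	}
	body, err := c.origin()
	if err == nil {
		expiry := now.Add(c.ttl)
		c.stored = &entry{body: body, expiry: expiry, sieDeadline: expiry.Add(c.sie)}
		return body, "MISS"
	}
	if c.stored != nil && now.Before(c.stored.sieDeadline) {
		return c.stored.body, "STALE-ERROR" // origin failed, stale copy still allowed
	}
	return "", "502"
}

// runScenario replays the numbered steps and returns one log line per request.
func runScenario() []string {
	healthy, version := true, 0
	c := &cache{ttl: 30 * time.Second, sie: 300 * time.Second}
	c.origin = func() (string, error) {
		if !healthy {
			return "", errors.New("origin returned 503")
		}
		version++
		return fmt.Sprintf("article v%d", version), nil
	}
	steps := []struct {
		second   int
		originUp bool
	}{{0, true}, {5, true}, {50, false}, {60, true}, {65, true}}

	start := time.Unix(0, 0)
	var lines []string
	for _, s := range steps {
		healthy = s.originUp
		body, status := c.get(start.Add(time.Duration(s.second) * time.Second))
		lines = append(lines, fmt.Sprintf("t=%ds %s %s", s.second, status, body))
	}
	return lines
}

func main() {
	for _, line := range runScenario() {
		fmt.Println(line)
	}
}
```

The request at t=50 still receives `article v1` with the `STALE-ERROR` label, and the first request after recovery picks up `article v2` without the client ever seeing an error.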
When Not to Use stale-while-revalidate
Some content must always be fresh:
| Content type | Use SWR? | Reason |
|---|---|---|
| News articles | ✓ (short window) | Mild staleness acceptable |
| Product pages | ✓ | Price/stock staleness OK for seconds |
| Authentication state | ✗ | Must be current |
| Payment/checkout | ✗ | Cannot serve stale price |
| Medical information | ✗ | Accuracy is a legal requirement |
| Real-time scores/feeds | ✗ (or very short) | Value proposition is freshness |
Use Cache-Control: no-cache combined with conditional requests (Lab 04)
instead of SWR for content where freshness is the product.
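On the origin side, one way to apply the table is a small middleware that picks the header per content class. This is a sketch under assumptions: the class names, the lifetimes, and the helpers `withCachePolicy` and `headerFor` are all made up for illustration and are not recommendations.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// policies maps hypothetical content classes to Cache-Control values.
// The lifetimes are illustrative only.
var policies = map[string]string{
	"article":  "max-age=60, stale-while-revalidate=30, stale-if-error=86400",
	"product":  "max-age=30, stale-while-revalidate=10",
	"checkout": "no-cache",
	"scores":   "no-cache",
}

// withCachePolicy sets the Cache-Control header for the given class before
// calling next. Unknown classes fall back to no-cache, so nothing is served
// stale by accident.
func withCachePolicy(class string, next http.Handler) http.Handler {
	value, ok := policies[class]
	if !ok {
		value = "no-cache"
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", value)
		next.ServeHTTP(w, r)
	})
}

// headerFor runs one request through the middleware and returns the
// Cache-Control header the client would see.
func headerFor(class string) string {
	page := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	rec := httptest.NewRecorder()
	withCachePolicy(class, page).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Result().Header.Get("Cache-Control")
}

func main() {
	for _, class := range []string{"article", "product", "checkout", "unknown"} {
		fmt.Printf("%-9s %s\n", class, headerFor(class))
	}
}
```

Defaulting unknown classes to `no-cache` keeps the failure mode conservative: a forgotten route revalidates on every request instead of silently serving stale data.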
Production Detail: Cloudflare’s Default SWR
Cloudflare applies Always Online (a form of stale-if-error) to
cached content, serving a cached copy for up to 10 minutes if the
origin returns a 5xx error. This is enabled by default and can be
disabled per-zone.
Cloudflare also exposes stale-while-revalidate support since 2022,
honoring the directive from origin responses.
Fastly’s equivalent is Shielding + Grace period: cached objects can be served from shield for up to a configurable grace period while background revalidation happens.
Timeline Annotation (What the Lab Prints)
t=0s:  /article/1 → MISS → fetch → store (TTL=30s, SWR=15s, SIE=300s)
t=5s:  /article/1 → HIT (25s remaining)
t=30s: /article/1 → HIT (0s remaining, within SWR window)
       → background revalidation started
t=31s: /article/1 → HIT (background revalidation complete, reset TTL)
[origin down]
t=50s: /article/1 → expires → origin 503
       → STALE-IF-ERROR (250s remaining in SIE window)
[origin up]
t=60s: background revalidation succeeds → back to normal HIT
Try It
make lab-07
# Normal behavior: articles served stale during background revalidation
curl http://localhost:8080/article/1
sleep 35
curl http://localhost:8080/article/1 # served stale, triggers background refresh
curl http://localhost:8080/article/1 # now fresh again
# Simulate origin down:
# Restart lab with --error-rate 1.0 to see stale-if-error in action