Lab 15 · Geographic Routing & PoP Failover
Run it:
```bash
make lab-15
```
Source: labs/lab-15-geo-routing/main.go
The Problem
A CDN node in Singapore is of little use to a user in Berlin: the round-trip time (RTT) on a Singapore → Berlin path is ~160 ms, while a Frankfurt PoP would serve Berlin with an RTT of ~5 ms.
Geographic routing, directing each user to the nearest CDN PoP, is one of the most impactful optimizations in CDN infrastructure. A new HTTPS connection needs several round trips before the first byte arrives, so the RTT gap is multiplied in time to first byte (TTFB). That gap can be the difference between a bounced visitor and a retained one.
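A rough model of that multiplication, assuming a fresh connection over TCP and TLS 1.3 with no session resumption: one round trip for the TCP handshake, one for the TLS handshake, and one for the HTTP request and response, so three RTTs before the first byte. The sketch treats ~160 ms and ~5 ms as round-trip times and ignores DNS lookup, server processing, and packet loss.

```go
package main

import "fmt"

// Round trips before the first response byte on a new HTTPS connection,
// assuming TCP + TLS 1.3 without 0-RTT resumption:
// 1 (TCP handshake) + 1 (TLS handshake) + 1 (HTTP request/response).
const roundTripsToFirstByte = 3

// ttfbMillis estimates time to first byte from the RTT alone,
// ignoring DNS, server think time, and packet loss.
func ttfbMillis(rttMillis float64) float64 {
	return roundTripsToFirstByte * rttMillis
}

func main() {
	for _, p := range []struct {
		path string
		rtt  float64
	}{
		{"Singapore -> Berlin", 160},
		{"Frankfurt -> Berlin", 5},
	} {
		fmt.Printf("%-20s RTT %4.0f ms  ~TTFB %4.0f ms\n", p.path, p.rtt, ttfbMillis(p.rtt))
	}
}
```

Under these assumptions the distant PoP needs about 480 ms before the first byte and the nearby one about 15 ms; TLS 1.2 adds one more round trip and widens the gap further.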
Routing Mechanisms
1. Anycast BGP (used by Cloudflare, Fastly)
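The same IP prefix is announced via BGP from every PoP. Internet routing then delivers each packet to the PoP that is nearest in BGP terms (usually the shortest AS path, subject to each network's routing policy), which is often, but not always, the geographically nearest one: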
```text
209.91.64.22 announced from:
- Frankfurt PoP → European users reach Frankfurt
- Tokyo PoP → Asian users reach Tokyo
- Chicago PoP → US Midwest users reach Chicago
```
BGP anycast routing is handled entirely by the internet's routing infrastructure. The CDN operator's job is to configure BGP announcements correctly and monitor AS path lengths.
Advantage: Zero application-level routing logic. Failover is automatic: when a PoP stops announcing the prefix (deliberately, or because its BGP sessions drop), routers shift its traffic to the remaining PoPs.
Disadvantage: BGP convergence is slow (~30–180 seconds for a prefix withdrawal to propagate globally), so a PoP that goes down may keep receiving traffic for up to a few minutes.
DNS-level failover can react faster (~30 seconds with a low TTL), but it needs its own health-checking and DNS-update path, and resolvers that ignore short TTLs stretch the window.
2. GeoDNS (used by many second-tier CDNs)
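DNS returns different IP addresses depending on the geographic region of the querying IP. Unless the recursive resolver forwards the client's subnet via EDNS Client Subnet (ECS, RFC 7871), that IP belongs to the resolver rather than the client, so users of a distant resolver can be mapped to the wrong PoP: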
```text
User from Germany resolves cdn.example.com:
→ DNS returns 203.0.113.10 (Frankfurt PoP)
User from Japan resolves cdn.example.com:
→ DNS returns 203.0.113.20 (Tokyo PoP)
```
Advantage: Simple to implement; works with any CDN infrastructure.
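Disadvantage: DNS caching (typical TTLs of 60–300 s) makes failover slow. During failover, clients whose resolvers still cache the old IP keep getting routed to the dead PoP and see connection timeouts or refused connections (not NXDOMAIN, since the name still resolves) until the TTL expires.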
3. Application-Layer Routing (HTTP Redirect)
```text
User → cdn.example.com → Routing server
→ 302 Redirect to "ams01.cdn.example.com"
```
This lab implements application-layer routing. A routing server receives all requests, picks the nearest healthy PoP, and either redirects or proxies the request to it.
Haversine Distance Calculation
The lab computes geographic distance using the haversine formula, which gives the great-circle distance between two points on a sphere:
```go
func haversine(lat1, lon1, lat2, lon2 float64) float64 {
	const R = 6371 // Earth radius in km
	φ1 := lat1 * math.Pi / 180
	φ2 := lat2 * math.Pi / 180
	Δφ := (lat2 - lat1) * math.Pi / 180
	Δλ := (lon2 - lon1) * math.Pi / 180
	a := math.Sin(Δφ/2)*math.Sin(Δφ/2) +
		math.Cos(φ1)*math.Cos(φ2)*
			math.Sin(Δλ/2)*math.Sin(Δλ/2)
	c := 2 * math.Atan2(math.Sqrt(a), math.Sqrt(1-a))
	return R * c // distance in km
}
```
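A quick sanity check of the formula, with the function copied so the program stands alone. Inputs are decimal degrees, with south latitudes and west longitudes negative; the New York and London PoP coordinates from the table below should come out roughly 5,570 km apart, in line with the commonly quoted great-circle distance between the two cities.

```go
package main

import (
	"fmt"
	"math"
)

// haversine returns the great-circle distance in km between two points
// given in decimal degrees (south and west are negative).
func haversine(lat1, lon1, lat2, lon2 float64) float64 {
	const R = 6371 // mean Earth radius in km
	φ1 := lat1 * math.Pi / 180
	φ2 := lat2 * math.Pi / 180
	Δφ := (lat2 - lat1) * math.Pi / 180
	Δλ := (lon2 - lon1) * math.Pi / 180
	a := math.Sin(Δφ/2)*math.Sin(Δφ/2) +
		math.Cos(φ1)*math.Cos(φ2)*math.Sin(Δλ/2)*math.Sin(Δλ/2)
	return R * 2 * math.Atan2(math.Sqrt(a), math.Sqrt(1-a))
}

func main() {
	d := haversine(40.71, -74.00, 51.51, -0.13) // NYC PoP to LHR PoP
	fmt.Printf("NYC-LHR: %.0f km\n", d)
}
```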
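Given the client's location, find the closest healthy PoP. The function returns nil when no PoP is healthy, so callers must handle that case:
```go
// nearestPoP returns the closest healthy PoP, or nil if none is healthy.
func nearestPoP(clientLat, clientLon float64, pops []PoP) *PoP {
	var nearest *PoP
	minDist := math.MaxFloat64
	for i := range pops {
		pop := &pops[i] // index instead of copying: PoP contains an atomic.Bool
		if !pop.healthy.Load() {
			continue // skip unhealthy PoPs
		}
		d := haversine(clientLat, clientLon, pop.Lat, pop.Lon)
		if d < minDist {
			minDist = d
			nearest = pop
		}
	}
	return nearest
}
```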
The 5 PoPs
The lab simulates 5 geographically distributed PoPs:
| PoP | City | Coords | Port |
|---|---|---|---|
| NYC | New York | 40.71°N, 74.00°W | :9010 |
| LHR | London | 51.51°N, 0.13°W | :9011 |
| NRT | Tokyo | 35.65°N, 139.76°E | :9012 |
| SYD | Sydney | 33.87°S, 151.21°E | :9013 |
| GRU | São Paulo | 23.43°S, 46.47°W | :9014 |
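The table translates directly into a routing table. The sketch below copies its coordinates (south and west as negative numbers) and routes a few sample client locations; the sample cities and their coordinates are illustrative additions, and this version ignores health state.

```go
package main

import (
	"fmt"
	"math"
)

type site struct {
	Name     string
	Lat, Lon float64
}

// pops mirrors the lab's PoP table (ports omitted).
var pops = []site{
	{"NYC", 40.71, -74.00},
	{"LHR", 51.51, -0.13},
	{"NRT", 35.65, 139.76},
	{"SYD", -33.87, 151.21},
	{"GRU", -23.43, -46.47},
}

// haversine returns the great-circle distance in km between two points in degrees.
func haversine(lat1, lon1, lat2, lon2 float64) float64 {
	const R = 6371
	φ1, φ2 := lat1*math.Pi/180, lat2*math.Pi/180
	Δφ, Δλ := (lat2-lat1)*math.Pi/180, (lon2-lon1)*math.Pi/180
	a := math.Sin(Δφ/2)*math.Sin(Δφ/2) +
		math.Cos(φ1)*math.Cos(φ2)*math.Sin(Δλ/2)*math.Sin(Δλ/2)
	return R * 2 * math.Atan2(math.Sqrt(a), math.Sqrt(1-a))
}

// closest returns the name of the PoP with the smallest great-circle distance.
func closest(lat, lon float64) string {
	best, bestDist := "", math.MaxFloat64
	for _, p := range pops {
		if d := haversine(lat, lon, p.Lat, p.Lon); d < bestDist {
			best, bestDist = p.Name, d
		}
	}
	return best
}

func main() {
	for _, c := range []site{
		{"Berlin", 52.52, 13.40},
		{"Chicago", 41.88, -87.63},
		{"Seoul", 37.57, 126.98},
		{"Auckland", -36.85, 174.76},
		{"Buenos Aires", -34.60, -58.38},
	} {
		fmt.Printf("%-12s -> %s\n", c.Name, closest(c.Lat, c.Lon))
	}
}
```

Berlin lands on LHR, Chicago on NYC, Seoul on NRT, Auckland on SYD, and Buenos Aires on GRU.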
Health Checking & Failover
Each PoP exposes a /health endpoint. The router runs periodic health checks:
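```go
// PoP is one point of presence. healthy is written by the health checker
// and read on every request, so it is an atomic.Bool (never copy a PoP).
type PoP struct {
	Name    string
	Addr    string // base URL of the PoP's HTTP server
	Lat     float64
	Lon     float64
	healthy atomic.Bool
}

// healthCheckLoop probes every PoP's /health endpoint every 5 seconds.
func (r *Router) healthCheckLoop() {
	// Without a timeout, a hung PoP would block its checker goroutine forever
	// (2 s is an illustrative value).
	client := &http.Client{Timeout: 2 * time.Second}
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		for i := range r.pops {
			pop := &r.pops[i] // pointer into the slice, not a copy
			go func() {
				resp, err := client.Get(pop.Addr + "/health")
				if err != nil {
					pop.healthy.Store(false)
					return
				}
				resp.Body.Close() // always close the body, or the connection leaks
				pop.healthy.Store(resp.StatusCode == http.StatusOK)
			}()
		}
	}
}
```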
Storing health in an atomic.Bool means the routing hot path can read it without a lock. Health checks run concurrently with request handling, so a change in a PoP's health becomes visible to routing within one health-check interval (5 s), plus the time the check itself takes.
Because nearestPoP skips unhealthy PoPs, a request whose nearest PoP is down falls back to the next-nearest healthy PoP with no extra logic.
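A self-contained sketch of that fallback: it reproduces a PoP type with an atomic.Bool, a pointer-returning nearest-PoP search (nil when nothing is healthy), marks LHR unhealthy, and routes a London client again. The nearest and newPoPs names and the nil convention are assumptions of this sketch, not necessarily what the lab's main.go does.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

type PoP struct {
	Name     string
	Lat, Lon float64
	healthy  atomic.Bool
}

// haversine returns the great-circle distance in km between two points in degrees.
func haversine(lat1, lon1, lat2, lon2 float64) float64 {
	const R = 6371
	φ1, φ2 := lat1*math.Pi/180, lat2*math.Pi/180
	Δφ, Δλ := (lat2-lat1)*math.Pi/180, (lon2-lon1)*math.Pi/180
	a := math.Sin(Δφ/2)*math.Sin(Δφ/2) +
		math.Cos(φ1)*math.Cos(φ2)*math.Sin(Δλ/2)*math.Sin(Δλ/2)
	return R * 2 * math.Atan2(math.Sqrt(a), math.Sqrt(1-a))
}

// nearest returns the closest healthy PoP, or nil if none is healthy.
// It indexes the slice rather than copying elements (PoP holds an atomic.Bool).
func nearest(lat, lon float64, pops []PoP) *PoP {
	var best *PoP
	bestDist := math.MaxFloat64
	for i := range pops {
		p := &pops[i]
		if !p.healthy.Load() {
			continue
		}
		if d := haversine(lat, lon, p.Lat, p.Lon); d < bestDist {
			best, bestDist = p, d
		}
	}
	return best
}

// newPoPs builds the lab's five PoPs, all marked healthy.
func newPoPs() []PoP {
	pops := []PoP{
		{Name: "NYC", Lat: 40.71, Lon: -74.00},
		{Name: "LHR", Lat: 51.51, Lon: -0.13},
		{Name: "NRT", Lat: 35.65, Lon: 139.76},
		{Name: "SYD", Lat: -33.87, Lon: 151.21},
		{Name: "GRU", Lat: -23.43, Lon: -46.47},
	}
	for i := range pops {
		pops[i].healthy.Store(true)
	}
	return pops
}

func main() {
	pops := newPoPs()
	const lat, lon = 51.51, -0.13 // London client
	fmt.Println("before:", nearest(lat, lon, pops).Name) // LHR
	pops[1].healthy.Store(false)                         // index 1 is LHR: simulate failure
	fmt.Println("after: ", nearest(lat, lon, pops).Name) // NYC
}
```

NYC wins the fallback because it is about 5,570 km from London, while GRU and NRT are both more than 9,000 km away.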
Real-World PoP Selection
Geographic distance is a proxy for network latency, but not a perfect one. BGP path length, network peering relationships, and inter-AS latency can cause a geographically farther PoP to have lower latency.
Production CDNs use active latency measurements:
- Cloudflare Argo: routes traffic based on real-time network telemetry measured across the actual internet paths between PoPs
- Fastly: uses Anycast BGP (network handles routing) plus performance-based override for known poor paths
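- AWS CloudFront: uses DNS to route each viewer to the edge location with the lowest measured latency (Route 53 offers a comparable latency-based routing policy for customers' own endpoints)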
The haversine approach in this lab is a reasonable first approximation that costs only a few floating-point operations per PoP at request time, but it can misjudge cases where network paths detour far from the great circle.
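One way to see why distance is only a proxy: it gives a physical lower bound on latency, not an estimate. Light in standard optical fiber travels at roughly two-thirds of its vacuum speed, about 200 km per millisecond, so the sketch converts a great-circle distance into a minimum round-trip time. Real paths follow cable routes rather than great circles and add queuing and router delays, so measured RTTs sit above this floor.

```go
package main

import "fmt"

// fiberKmPerMs approximates propagation speed in optical fiber
// (refractive index ~1.5, so about 2/3 of the speed of light in vacuum).
const fiberKmPerMs = 200.0

// minRTTMillis is the RTT a perfect great-circle fiber path would give.
func minRTTMillis(distanceKm float64) float64 {
	return 2 * distanceKm / fiberKmPerMs
}

func main() {
	// Great-circle distance between the lab's NYC and LHR PoPs, about 5,570 km.
	fmt.Printf("NYC-LHR RTT floor: %.1f ms\n", minRTTMillis(5570))
}
```

For NYC to London the floor is about 56 ms; any measured RTT above it reflects routing, not geography.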
Client Location Detection
In production, client location comes from:
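- IP geolocation: a database such as MaxMind GeoLite2, or a lookup service such as IP-API, maps an IP address to country/city/coordinates
- CDN headers: Cloudflare adds CF-IPCountry when IP Geolocation is enabled, and CF-IPCity, CF-IPLatitude, and CF-IPLongitude when the "Add visitor location headers" Managed Transform is enabled
- GPS/browser API: the browser's Geolocation API can provide a precise location (user permission required)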
- CDN PoP metadata: the PoP itself knows its geographic location; route users to the PoP they connected to
The lab accepts lat/lon as query parameters for testability.
PoP Infrastructure Design
When selecting where to locate PoPs, the key criteria are:
- Internet Exchange Points (IXPs): co-locate at major IXPs (DE-CIX Frankfurt, AMS-IX Amsterdam, LINX London) for direct peering with hundreds of ISPs, reducing latency and cost
- Traffic density: PoPs near large populations (NYC, London, Tokyo, São Paulo, Mumbai) serve the most users
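- Data center tier: Uptime Institute Tier III or higher (redundant, concurrently maintainable power and cooling; commonly cited availability is ~99.98% for Tier III and ~99.995% for Tier IV)
- Network diversity: multiple transit providers per PoP keep a single provider's outage from taking down the PoP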
Try It
```bash
make lab-15

# Route a request from NYC (40.71, -74.00): should go to the NYC PoP
curl "http://localhost:8080/?lat=40.71&lon=-74.00" -v

# Route from London (51.51, -0.13): should go to the LHR PoP
curl "http://localhost:8080/?lat=51.51&lon=-0.13" -v

# Route from Tokyo: should go to the NRT PoP
curl "http://localhost:8080/?lat=35.65&lon=139.76" -v

# Simulate an LHR failure: the London user should be rerouted to the nearest healthy PoP
curl -X DELETE "http://localhost:8080/pops/LHR"
curl "http://localhost:8080/?lat=51.51&lon=-0.13" -v
# Should now route to NYC, the next-closest PoP (~5,570 km; GRU and NRT are both over 9,000 km away)
```