Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Lab 14 · Gossip Cluster & Distributed Purge

Run it: make lab-14
Source: labs/lab-14-gossip-cluster/main.go


The Problem

You have 100 CDN edge nodes. Content changes. You need every node to know about the invalidation within seconds.

Why Not a Central Coordinator?

A single “purge coordinator” that notifies all nodes:

Application → Coordinator → [Node1, Node2, ..., Node100]

Problems:

  1. Single point of failure: coordinator down = no purges propagate
  2. O(N) work per purge: coordinator sends 100 messages
  3. N connection overhead: coordinator maintains 100 persistent connections
  4. Thundering herd on coordinator: during deployments, thousands of purges
  5. Partitioned PoPs: nodes behind a network partition miss purges silently

The solution used by Cassandra, CockroachDB, and Cloudflare’s edge network: gossip protocol (epidemic dissemination).


Gossip Protocol: Epidemic Dissemination

Named by analogy to biological epidemics: one infected node tells a few others, who each tell a few more. Within O(log N) rounds, all nodes are informed.

Algorithm:

Round 1: Node A has new information
  → A tells: B, E

Round 2: B and E spread:
  → B tells: C, F
  → E tells: G, D

Round 3: C, F, G, D spread:
  → C tells: H, I
  → F tells: J, K
  → G tells: L, M
  → D tells: N, A (A already knows)

With 100 nodes and fanout=3: propagates to all in log₃(100) ≈ 4.2 rounds

Each round is a small fixed-cost message. Total network messages per gossip cycle: O(N log N). Compare to broadcast: O(N). Gossip is slightly more expensive per event but infinitely more resilient.


hashicorp/memberlist

The memberlist library implements the SWIM protocol (Scalable Weakly-consistent Infection-style Membership protocol) with enhancements from “Lifeguard”:

  • Member discovery: nodes find each other via gossip
  • Failure detection: probe + indirect probe to detect crashes
  • Broadcast: attach arbitrary data to membership messages (e.g., cache purge events)
import "github.com/hashicorp/memberlist"

// Configure the local node
config := memberlist.DefaultLocalConfig()
config.Name     = "edge-node-1"
config.BindAddr = "0.0.0.0"
config.BindPort = 7946
config.Delegate = &myDelegate{}  // receives user data
config.Events   = &myEventDelegate{}  // membership change callbacks

list, err := memberlist.Create(config)

// Join an existing cluster
list.Join([]string{"edge-node-2:7946", "edge-node-3:7946"})

// Broadcast a message to all nodes
list.LocalNode().Meta = []byte("hello")  // meta is broadcast with membership
list.UpdateNode(5 * time.Second)

// Or use the TransmitLimitedQueue for arbitrary messages
queue := &memberlist.TransmitLimitedQueue{
    NumNodes:       func() int { return list.NumMembers() },
    RetransmitMult: 3,
}
queue.QueueBroadcast(&purgeMessage{tag: "article-42"})

Implementing Distributed Cache Purge

1. Purge message format

type PurgeMessage struct {
    ID        string    `json:"id"`        // UUID for deduplication
    Tags      []string  `json:"tags"`
    URLs      []string  `json:"urls"`
    Origin    string    `json:"origin"`    // which node originated the purge
    Timestamp time.Time `json:"ts"`
}

func (m *PurgeMessage) Invalidates() bool { return true }
func (m *PurgeMessage) Message() []byte   { return mustMarshal(m) }
func (m *PurgeMessage) Finished()         {}

2. The Delegate

The memberlist.Delegate interface is how you plug in custom logic:

type cacheDelegate struct {
    queue *memberlist.TransmitLimitedQueue
    seen  sync.Map  // deduplication: message ID → struct{}
}

func (d *cacheDelegate) NotifyMsg(b []byte) {
    var msg PurgeMessage
    json.Unmarshal(b, &msg)
    
    // Deduplication: skip messages we've already processed
    if _, loaded := d.seen.LoadOrStore(msg.ID, struct{}{}); loaded {
        return
    }
    
    // Apply the purge locally
    for _, tag := range msg.Tags { localCache.PurgeByTag(tag) }
    for _, url := range msg.URLs { localCache.Delete(url) }
}

func (d *cacheDelegate) GetBroadcasts(overhead, limit int) [][]byte {
    return d.queue.GetBroadcasts(overhead, limit)
}

3. Gossip anti-entropy

Beyond event-driven purge, gossip implements anti-entropy: nodes periodically compare state with a random peer and reconcile differences. This catches missed messages (due to network partitions, node restarts, message drops under load).

Every 30s:
  → Node A picks random peer B
  → A sends a digest of its cache state (Bloom filter or version vectors)
  → B responds with any items A is missing
  → A applies the delta

This ensures eventual consistency: even if a purge message is dropped, the anti-entropy scan will catch the discrepancy within 30 seconds.


SWIM Protocol: Failure Detection

SWIM’s failure detection is probabilistic but fast:

1. Every T_probe seconds: node A probes random node B with a ping
2. If B doesn't respond within T_timeout:
   → A asks K other random nodes to probe B indirectly (indirect probe)
3. If no indirect probe succeeds:
   → A marks B as SUSPECT, gossips the suspicion
4. If B doesn't refute (send alive message) within T_suspect:
   → B is declared DEAD, gossipped as such
5. Dead members are removed from the ring

This gives O(1) probe messages per node and detects failures in ~3–5 seconds with default settings. Compare to a central heartbeat system: O(N) messages per probe cycle.


Push-Pull Gossip for State Synchronization

memberlist also implements push-pull gossip:

Node A pushes its full local state to random node B
Node B pushes its full local state back to A
Both reconcile differences

This is more expensive (full state exchange) but faster convergence for new nodes joining the cluster. Frequency: once per 30–60 seconds.

For cache invalidation: push-pull can sync the full set of currently-valid cache tags, ensuring a node that was offline for 5 minutes catches up on all purges it missed.


Real-World: Cloudflare’s Cache Purge

Cloudflare’s cache purge propagates across 300+ PoPs using a gossip-adjacent system. Their 2022 blog post describes how a purge request:

  1. Hits Cloudflare’s API endpoint
  2. Is distributed via their internal notification system (similar to gossip)
  3. Reaches all PoPs within 150ms for 95th percentile

At Cloudflare scale, this requires highly optimized serialization (Protocol Buffers), binary gossip protocols, and infrastructure tuned for low-latency small-message delivery.


Try It

make lab-14

# Three nodes form a gossip cluster automatically
# Look for "Cluster formed: 3 members" in the output

# Issue a purge on node 1
curl -X POST http://localhost:8080/cache/purge \
  -H "Content-Type: application/json" \
  -d '{"tags": ["article-42"]}'

# Within ~100ms, the purge propagates to nodes 2 and 3
# Verify by checking their cache state:
curl http://localhost:8081/cache/stats
curl http://localhost:8082/cache/stats
# Both should show article-42 as purged