A pre-production review of my Notion Image CDN uncovered a concurrency bug that would OOM-kill the service under real traffic. Here's the deep dive into what happened, why it's invisible in dev, and the fix I borrowed from Go's standard library.
I built Notion Image CDN — a self-hosted image proxy sitting between Notion's expiring S3 URLs and your frontend. It fetches an image from Notion, runs it through Sharp (resize, WebP/AVIF conversion), caches the result, and serves it through a clean /img/:workspaceId/:blockId/:filename URL.
The architecture on paper is a tidy three-tier cache: an L2 edge cache in front, L3 persistent storage (S3/disk) behind it, and an origin fetch from Notion as the last resort.
During a pre-production hardening pass, I asked one question: "What happens when 100 users request the same uncached image at the same time?"
The answer wasn't great.
Here's the original pipeline code (simplified):
```ts
export async function runImagePipeline(
  cacheBaseUrl,
  request,
  reply,
  config,
  storage,
  edgeCache
) {
  // url and transform are derived from the request (elided here)
  const cacheKey = generateCacheKey(cacheBaseUrl, transform);

  // Check L2 edge cache
  const l2Hit = await edgeCache.get(cacheKey);
  if (l2Hit) {
    sendResponse(reply, l2Hit);
    return;
  }

  // Check L3 persistent storage
  const l3Hit = await storage.get(cacheKey);
  if (l3Hit) {
    sendResponse(reply, l3Hit);
    return;
  }

  // 🔴 Cache MISS — fetch from Notion
  const image = await fetchUpstreamImage(url, config);
  const optimized = await optimizeImage(image.data, transform); // Sharp: CPU-heavy
  await storage.put(cacheKey, optimized);                       // Write to S3/disk
  sendResponse(reply, optimized);
}
```
The bug is invisible when you're the only user. But when a newsletter drops and 500 readers hit the same blog post within one second, every single request misses both caches, fetches the source image from Notion, runs its own Sharp pipeline, and writes its own copy to storage.
With N=100 and a 5MB source image: 500MB of raw buffers in memory, plus Sharp's internal allocations. On a $5/month 512MB container, that's an instant OOM kill.
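You can reproduce the shape of the bug without any infrastructure. This sketch (`simulateHerd`, `expensiveFetch`, and the 20ms delay are all illustrative, not repo code) fires N concurrent requests at a naive check-then-fetch miss path and counts how often the expensive work actually runs:

```typescript
// Stand-in for the real fetch + Sharp pipeline: count how many times
// the expensive work runs when n requests arrive concurrently.
async function simulateHerd(n: number): Promise<number> {
  let fetches = 0;
  const cache = new Map<string, string>();

  const expensiveFetch = async (key: string): Promise<string> => {
    fetches++; // every caller that reaches this line does the full work
    await new Promise((resolve) => setTimeout(resolve, 20)); // upstream latency
    return `optimized-bytes-for-${key}`;
  };

  // Naive miss path: check cache, fetch on miss. No coalescing.
  const handleRequest = async (key: string): Promise<string> => {
    const hit = cache.get(key);
    if (hit) return hit;
    const value = await expensiveFetch(key); // all concurrent misses land here
    cache.set(key, value);
    return value;
  };

  await Promise.all(Array.from({ length: n }, () => handleRequest('hero.png')));
  return fetches;
}

simulateHerd(100).then((fetches) => console.log(fetches)); // 100
```

Every request sees an empty cache before the first fetch completes, so all of them do the work.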
The Thundering Herd (also called "Cache Stampede" or "Dog Pile") happens when a large number of processes simultaneously try to access a shared resource that isn't available yet, causing a resource spike that overwhelms the system.
The term originates from the Unix kernel. When a lock was released, all sleeping processes would wake up — the herd thunders awake — even though only one could acquire the lock. The rest burned CPU and context-switch time just to go back to sleep.
In web services, the pattern shows up around caches: any window where a hot key is missing or expired invites the whole herd through at once.
Where this bites you in the wild:
| System | The "Herd" | The "Thunder" |
|---|---|---|
| CDN (Cloudflare, Fastly) | Cache key expiration | All edge nodes fetch from origin |
| Database (PostgreSQL) | Lock release | All blocking transactions wake up |
| DNS | TTL expiration | All resolvers query authoritative server |
| Image CDN (this case) | First request for new asset | All concurrent requests bypass cache |
| Load Balancer | New backend becomes healthy | All queued connections route to it |
The common thread: a brief window where a shared resource is unavailable, and everyone independently decides to go fetch it themselves.
The pattern comes from Go's standard library: singleflight. The idea: if work for a key is already in-flight, don't start new work — wait for the leader and share the result.
Before: N requests → N fetches, N optimizations, N storage writes.
After: N requests → 1 fetch, 1 optimization, 1 storage write, plus N response sends.
The actual Singleflight class:
```ts
// packages/service/src/lib/singleflight.ts

export interface FlightResult<T> {
  value: T;           // The resolved value from the leader
  coalesced: boolean; // true if this caller joined an existing flight
}

export class Singleflight<T> {
  private readonly flights = new Map<string, Promise<T>>();

  async do(key: string, fn: () => Promise<T>): Promise<FlightResult<T>> {
    const existing = this.flights.get(key);
    if (existing) {
      // Follower: just wait for the leader's promise
      const value = await existing;
      return { value, coalesced: true };
    }

    // Leader: execute and register the promise
    const promise = fn();
    this.flights.set(key, promise);
    try {
      const value = await promise;
      return { value, coalesced: false };
    } finally {
      this.flights.delete(key); // Always clean up
    }
  }
}
```
The trick: a JavaScript Promise can be awaited by multiple callers. Store the leader's promise in a Map, and every follower for the same key awaits that same promise. When the leader resolves, all followers resolve with it. Zero extra work.
The finally block matters — if the leader throws, the key still gets cleaned up. The next request after a failure starts a fresh flight. No permanent poisoning of the map.
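A quick way to see the fan-out in action. This snippet repeats a minimal version of the class so it runs standalone; `demo` and `slowWork` are illustrative names:

```typescript
// Minimal Singleflight, same shape as the class above, kept inline
// so this snippet runs on its own.
class Singleflight<T> {
  private readonly flights = new Map<string, Promise<T>>();

  async do(key: string, fn: () => Promise<T>): Promise<{ value: T; coalesced: boolean }> {
    const existing = this.flights.get(key);
    if (existing) return { value: await existing, coalesced: true };
    const promise = fn();
    this.flights.set(key, promise);
    try {
      return { value: await promise, coalesced: false };
    } finally {
      this.flights.delete(key); // clean up on success AND failure
    }
  }
}

async function demo(): Promise<{ calls: number; flags: boolean[] }> {
  const sf = new Singleflight<number>();
  let calls = 0;

  const slowWork = async (): Promise<number> => {
    calls++; // only the leader should get here
    await new Promise((resolve) => setTimeout(resolve, 25));
    return 42;
  };

  // Three concurrent callers for the same key.
  const results = await Promise.all([
    sf.do('img:hero', slowWork),
    sf.do('img:hero', slowWork),
    sf.do('img:hero', slowWork),
  ]);

  return { calls, flags: results.map((r) => r.coalesced) };
}

demo().then(({ calls, flags }) => console.log(calls, flags)); // 1 [ false, true, true ]
```

One leader, two coalesced followers, all three resolving with the same value.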
The pipeline wraps the expensive fetch+optimize+store path inside the singleflight gate:
```ts
// image-pipeline.ts (simplified)
const originFlight = new Singleflight<PipelineOutcome>();

async function runImagePipeline(...) {
  // L2 and L3 cache checks happen BEFORE the gate
  // (cache hits are already fast — no coalescing needed)

  const { value: outcome, coalesced } = await originFlight.do(cacheKey, async () => {
    const image = await fetchUpstreamImage(url);  // ← Only ONE request
    const optimized = await optimizeImage(image); // ← Only ONE Sharp instance
    await storage.put(cacheKey, optimized);       // ← Only ONE write
    return { data: optimized.data, contentType: optimized.contentType };
  });

  if (coalesced) {
    request.log.info({ cacheKey }, 'Request coalesced — served from in-flight leader');
  }

  // Each request sends its OWN HTTP response using the shared result
  sendImageResponse(reply, outcome.data, outcome.contentType, ...);
}
```
Cache checks live outside the gate — they're already fast. Only the expensive origin fetch path gets coalesced.
In development, you're the only user: one request, one miss, one fetch, and everything looks fine. In production, a traffic spike means hundreds of concurrent requests arrive inside the same cache-miss window.
This is the classic gap between "works on my machine" and "works under real load." Load testing bridges it. Hardening reviews catch the rest.
| Metric | Before (N=100) | After (N=100) | Change |
|---|---|---|---|
| Upstream fetches | 100 | 1 | 99% fewer |
| Sharp instances | 100 | 1 | 99% fewer |
| Storage writes | 100 | 1 | 99% fewer |
| Peak memory | ~500MB | ~10MB | 98% reduction |
| Upstream bandwidth | 500MB | 5MB | 99% reduction |
| Response time (P99) | timeout | <200ms | orders of magnitude |
The cost of the singleflight itself is negligible — one Map.get() + one Map.set() per flight, plus the promise resolution fan-out. The overhead lives entirely in memory for the promise reference, which is a few bytes per follower.
Leader failure propagation. The finally block deletes the key, so the next request starts a fresh flight. Errors propagate to all followers — they all get the same rejection. This is correct behavior: if upstream is down, everyone should know.
Different transform parameters. The cacheKey includes query params like w=800&fmt=webp, so ?w=800 and ?w=400 produce different keys with independent flights. No cross-contamination.
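For illustration only (the repo's `generateCacheKey` may be built differently), a key derivation along these lines keeps flights for different transforms independent; `makeCacheKey` is a hypothetical name:

```typescript
// Sketch: build a cache key that embeds the transform parameters, sorted
// so ?w=800&fmt=webp and ?fmt=webp&w=800 map to the same key.
function makeCacheKey(basePath: string, params: Record<string, string | number>): string {
  const normalized = Object.keys(params)
    .sort() // canonical order: parameter order in the URL must not matter
    .map((k) => `${k}=${params[k]}`)
    .join('&');
  return `${basePath}?${normalized}`;
}

console.log(makeCacheKey('/img/ws/blk/hero.png', { w: 800, fmt: 'webp' }));
// /img/ws/blk/hero.png?fmt=webp&w=800
```

Same path plus `w=400` yields a different key, so its flight never shares a leader with `w=800`.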
Memory leaks. The Map only holds entries for active flights. Once a flight resolves or rejects, the entry is removed. Map size equals the count of unique in-progress cache keys — typically single digits in normal operation.
Multi-instance deployments. Singleflight is per-process. Three instances behind a load balancer means up to three flights for the same key. For global deduplication, you'd need distributed locking (Redis SET NX with TTL), which adds latency and failure modes. Three is a lot better than three hundred, so the per-process approach is often good enough.
Long leader processing time. If the leader takes 30 seconds (huge image, slow upstream), every follower is blocked for 30 seconds too. This is still better than 100 independent 30-second requests, but you should pair singleflight with upstream timeouts. If the leader doesn't finish within a budget, followers should fail fast rather than stacking up indefinitely.
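One way to give the leader that budget, sketched with the standard fetch and AbortController; `fetchWithBudget` is a hypothetical helper, not repo code:

```typescript
// Hypothetical helper: abort the upstream fetch if it exceeds a time budget,
// so followers fail fast instead of stacking up behind a stalled leader.
async function fetchWithBudget(url: string, budgetMs: number): Promise<ArrayBuffer> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), budgetMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`upstream returned ${res.status}`);
    return await res.arrayBuffer();
  } finally {
    clearTimeout(timer); // don't leak the timer on the success path
  }
}
```

Because singleflight propagates the leader's rejection, one aborted fetch fails every waiting follower at once, and the next request starts a fresh flight.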
Singleflight isn't limited to image CDNs. Anywhere you have an expensive, cacheable operation keyed by an identifier, plus bursts of concurrent demand for the same key, you can apply this. DNS resolvers use it. The Kubernetes API server uses it. Cloudflare's edge cache uses a variant of it.
The pattern also naturally pairs with:

- Stale-while-revalidate: serve the expired entry immediately while a single flight refreshes it in the background.
- Probabilistic early expiration: refresh entries slightly before their TTL so recomputation doesn't cluster at one instant.
- Distributed locking: Redis SET NX with a short TTL to elect a single leader across distributed nodes.

Each has trade-offs. Stale-while-revalidate adds complexity around what "stale" means for your content. Probabilistic expiration wastes some upstream bandwidth on unnecessary refreshes. Redis locking introduces a network hop and a failure mode if Redis is down.
For a single-instance image CDN, in-process singleflight is the right tool. Simple, zero external dependencies, handles the exact failure mode.
Cache misses are the danger zone. Cold deploys, TTL expirations, first requests for new content — these windows are where thundering herds form. If your cache miss path scales linearly with concurrent requests, you have this bug. It'll just wait for enough traffic to show itself.
The fix is small. 70 lines of utility code, one wrapper around the expensive path. The hard part is asking the right question: "What happens when a thousand users do this at once?"
Found this useful? Star the Notion Image CDN repo and follow me for more production backend deep dives.
If this post was useful, consider supporting my open source work and independent writing.