A pre-production review of my Notion Image CDN uncovered a concurrency bug that would OOM-kill the service under real traffic. Here's the deep dive into what happened, why it's invisible in dev, and the fix I borrowed from Go's standard library.
I built Notion Image CDN — a self-hosted image proxy sitting between Notion's expiring S3 URLs and your frontend. It fetches an image from Notion, runs it through Sharp (resize, WebP/AVIF conversion), caches the result, and serves it through a clean /img/:workspaceId/:blockId/:filename URL.
The architecture on paper is a tidy three-tier cache: an L2 edge cache in front, L3 persistent storage (S3/disk) behind it, and an origin fetch from Notion as the last resort.
During a pre-production hardening pass, I asked one question: "What happens when 100 users request the same uncached image at the same time?"
The answer wasn't great.
Here's the original pipeline code (simplified):
```ts
export async function runImagePipeline(
  cacheBaseUrl,
  request,
  reply,
  config,
  storage,
  edgeCache
) {
  // url and transform are derived from the request (elided here)
  const cacheKey = generateCacheKey(cacheBaseUrl, transform);

  // Check L2 edge cache
  const l2Hit = await edgeCache.get(cacheKey);
  if (l2Hit) {
    sendResponse(reply, l2Hit);
    return;
  }

  // Check L3 persistent storage
  const l3Hit = await storage.get(cacheKey);
  if (l3Hit) {
    sendResponse(reply, l3Hit);
    return;
  }

  // 🔴 Cache MISS — fetch from Notion
  const image = await fetchUpstreamImage(url, config);
  const optimized = await optimizeImage(image.data, transform); // Sharp: CPU-heavy
  await storage.put(cacheKey, optimized);                       // Write to S3/disk
  sendResponse(reply, optimized);
}
```
The bug is invisible when you're the only user. But when a newsletter drops and 500 readers hit the same blog post within one second, every single request misses both caches, fetches the source image from Notion, runs its own Sharp pipeline, and writes its own copy to storage.
With N=100 and a 5MB source image: 500MB of raw buffers in memory, plus Sharp's internal allocations. On a $5/month 512MB container, that's an instant OOM kill.
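You can reproduce the shape of the bug without any infrastructure. This sketch (`simulateHerd`, `expensiveFetch`, and the 20ms delay are all illustrative, not repo code) fires N concurrent requests at a naive check-then-fetch miss path and counts how often the expensive work actually runs:

```typescript
// Stand-in for the real fetch + Sharp pipeline: count how many times
// the expensive work runs when n requests arrive concurrently.
async function simulateHerd(n: number): Promise<number> {
  let fetches = 0;
  const cache = new Map<string, string>();

  const expensiveFetch = async (key: string): Promise<string> => {
    fetches++; // every caller that reaches this line does the full work
    await new Promise((resolve) => setTimeout(resolve, 20)); // upstream latency
    return `optimized-bytes-for-${key}`;
  };

  // Naive miss path: check cache, fetch on miss. No coalescing.
  const handleRequest = async (key: string): Promise<string> => {
    const hit = cache.get(key);
    if (hit) return hit;
    const value = await expensiveFetch(key); // all concurrent misses land here
    cache.set(key, value);
    return value;
  };

  await Promise.all(Array.from({ length: n }, () => handleRequest('hero.png')));
  return fetches;
}

simulateHerd(100).then((fetches) => console.log(fetches)); // 100
```

Every request sees an empty cache before the first fetch completes, so all of them do the work.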
The Thundering Herd (also called "Cache Stampede" or "Dog Pile") happens when a large number of processes simultaneously try to access a shared resource that isn't available yet, causing a resource spike that overwhelms the system.
The term originates from the Unix kernel. When a lock was released, all sleeping processes would wake up — the herd thunders awake — even though only one could acquire the lock. The rest burned CPU and context-switch time just to go back to sleep.
In web services, the pattern shows up around caches: any window where a hot key is missing or expired invites the whole herd through at once.
Where this bites you in the wild:
| System | The "Herd" | The "Thunder" |
|---|---|---|
| CDN (Cloudflare, Fastly) | Cache key expiration | All edge nodes fetch from origin |
| Database (PostgreSQL) | Lock release | All blocking transactions wake up |
| DNS | TTL expiration | All resolvers query authoritative server |
| Image CDN (this case) | First request for new asset | All concurrent requests bypass cache |
| Load Balancer | New backend becomes healthy | All queued connections route to it |
The common thread: a brief window where a shared resource is unavailable, and everyone independently decides to go fetch it themselves.
The pattern comes from Go's standard library: singleflight. The idea: if work for a key is already in-flight, don't start new work — wait for the leader and share the result.
Before: N requests → N fetches, N optimizations, N storage writes.
After: N requests → 1 fetch, 1 optimization, 1 storage write, plus N response sends.
The actual Singleflight class:
```ts
// packages/service/src/lib/singleflight.ts

export interface FlightResult<T> {
  value: T;           // The resolved value from the leader
  coalesced: boolean; // true if this caller joined an existing flight
}

export class Singleflight<T> {
  private readonly flights = new Map<string, Promise<T>>();

  async do(key: string, fn: () => Promise<T>): Promise<FlightResult<T>> {
    const existing = this.flights.get(key);
    if (existing) {
      // Follower: just wait for the leader's promise
      const value = await existing;
      return { value, coalesced: true };
    }

    // Leader: execute and register the promise
    const promise = fn();
    this.flights.set(key, promise);
    try {
      const value = await promise;
      return { value, coalesced: false };
    } finally {
      this.flights.delete(key); // Always clean up
    }
  }
}
```
The trick: a JavaScript Promise can be awaited by multiple callers. Store the leader's promise in a Map, and every follower for the same key awaits that same promise. When the leader resolves, all followers resolve with it. Zero extra work.
The finally block matters — if the leader throws, the key still gets cleaned up. The next request after a failure starts a fresh flight. No permanent poisoning of the map.
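A quick way to see the fan-out in action. This snippet repeats a minimal version of the class so it runs standalone; `demo` and `slowWork` are illustrative names:

```typescript
// Minimal Singleflight, same shape as the class above, kept inline
// so this snippet runs on its own.
class Singleflight<T> {
  private readonly flights = new Map<string, Promise<T>>();

  async do(key: string, fn: () => Promise<T>): Promise<{ value: T; coalesced: boolean }> {
    const existing = this.flights.get(key);
    if (existing) return { value: await existing, coalesced: true };
    const promise = fn();
    this.flights.set(key, promise);
    try {
      return { value: await promise, coalesced: false };
    } finally {
      this.flights.delete(key); // clean up on success AND failure
    }
  }
}

async function demo(): Promise<{ calls: number; flags: boolean[] }> {
  const sf = new Singleflight<number>();
  let calls = 0;

  const slowWork = async (): Promise<number> => {
    calls++; // only the leader should get here
    await new Promise((resolve) => setTimeout(resolve, 25));
    return 42;
  };

  // Three concurrent callers for the same key.
  const results = await Promise.all([
    sf.do('img:hero', slowWork),
    sf.do('img:hero', slowWork),
    sf.do('img:hero', slowWork),
  ]);

  return { calls, flags: results.map((r) => r.coalesced) };
}

demo().then(({ calls, flags }) => console.log(calls, flags)); // 1 [ false, true, true ]
```

One leader, two coalesced followers, all three resolving with the same value.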
The pipeline wraps the expensive fetch+optimize+store path inside the singleflight gate:
```ts
// image-pipeline.ts (simplified)
const originFlight = new Singleflight<PipelineOutcome>();

async function runImagePipeline(...) {
  // L2 and L3 cache checks happen BEFORE the gate
  // (cache hits are already fast — no coalescing needed)

  const { value: outcome, coalesced } = await originFlight.do(cacheKey, async () => {
    const image = await fetchUpstreamImage(url);  // ← Only ONE request
    const optimized = await optimizeImage(image); // ← Only ONE Sharp instance
    await storage.put(cacheKey, optimized);       // ← Only ONE write
    return { data: optimized.data, contentType: optimized.contentType };
  });

  if (coalesced) {
    request.log.info({ cacheKey }, 'Request coalesced — served from in-flight leader');
  }

  // Each request sends its OWN HTTP response using the shared result
  sendImageResponse(reply, outcome.data, outcome.contentType, ...);
}
```
Cache checks live outside the gate — they're already fast. Only the expensive origin fetch path gets coalesced.
In development, you're the only user: one request, one miss, one fetch, and everything looks fine. In production, a traffic spike means hundreds of concurrent requests arrive inside the same cache-miss window.
This is the classic gap between "works on my machine" and "works under real load." Load testing bridges it. Hardening reviews catch the rest.
| Metric | Before (N=100) | After (N=100) | Change |
|---|---|---|---|
| Upstream fetches | 100 | 1 | 99% fewer |
| Sharp instances | 100 | 1 | 99% fewer |
| Storage writes | 100 | 1 | 99% fewer |
| Peak memory | ~500MB | ~10MB | 98% reduction |
| Upstream bandwidth | 500MB | 5MB | 99% reduction |
| Response time (P99) | timeout | <200ms | orders of magnitude |
The cost of the singleflight itself is negligible — one Map.get() + one Map.set() per flight, plus the promise resolution fan-out. The overhead lives entirely in memory for the promise reference, which is a few bytes per follower.
Leader failure propagation. The finally block deletes the key, so the next request starts a fresh flight. Errors propagate to all followers — they all get the same rejection. This is correct behavior: if upstream is down, everyone should know.
Different transform parameters. The cacheKey includes query params like w=800&fmt=webp, so ?w=800 and ?w=400 produce different keys with independent flights. No cross-contamination.
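For illustration only (the repo's `generateCacheKey` may be built differently), a key derivation along these lines keeps flights for different transforms independent; `makeCacheKey` is a hypothetical name:

```typescript
// Sketch: build a cache key that embeds the transform parameters, sorted
// so ?w=800&fmt=webp and ?fmt=webp&w=800 map to the same key.
function makeCacheKey(basePath: string, params: Record<string, string | number>): string {
  const normalized = Object.keys(params)
    .sort() // canonical order: parameter order in the URL must not matter
    .map((k) => `${k}=${params[k]}`)
    .join('&');
  return `${basePath}?${normalized}`;
}

console.log(makeCacheKey('/img/ws/blk/hero.png', { w: 800, fmt: 'webp' }));
// /img/ws/blk/hero.png?fmt=webp&w=800
```

Same path plus `w=400` yields a different key, so its flight never shares a leader with `w=800`.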
Memory leaks. The Map only holds entries for active flights. Once a flight resolves or rejects, the entry is removed. Map size equals the count of unique in-progress cache keys — typically single digits in normal operation.
Multi-instance deployments. Singleflight is per-process. Three instances behind a load balancer means up to three flights for the same key. For global deduplication, you'd need distributed locking (Redis SET NX with TTL), which adds latency and failure modes. Three is a lot better than three hundred, so the per-process approach is often good enough.
Long leader processing time. If the leader takes 30 seconds (huge image, slow upstream), every follower is blocked for 30 seconds too. This is still better than 100 independent 30-second requests, but you should pair singleflight with upstream timeouts. If the leader doesn't finish within a budget, followers should fail fast rather than stacking up indefinitely.
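One way to give the leader that budget, sketched with the standard fetch and AbortController; `fetchWithBudget` is a hypothetical helper, not repo code:

```typescript
// Hypothetical helper: abort the upstream fetch if it exceeds a time budget,
// so followers fail fast instead of stacking up behind a stalled leader.
async function fetchWithBudget(url: string, budgetMs: number): Promise<ArrayBuffer> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), budgetMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`upstream returned ${res.status}`);
    return await res.arrayBuffer();
  } finally {
    clearTimeout(timer); // don't leak the timer on the success path
  }
}
```

Because singleflight propagates the leader's rejection, one aborted fetch fails every waiting follower at once, and the next request starts a fresh flight.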
Singleflight isn't limited to image CDNs. Anywhere you have an expensive, cacheable operation keyed by an identifier, plus bursts of concurrent demand for the same key, you can apply this. DNS resolvers use it. The Kubernetes API server uses it. Cloudflare's edge cache uses a variant of it.
The pattern also naturally pairs with:

- Stale-while-revalidate: serve the expired entry immediately while a single flight refreshes it in the background.
- Probabilistic early expiration: refresh entries slightly before their TTL so recomputation doesn't cluster at one instant.
- Distributed locking: Redis SET NX with a short TTL to elect a single leader across distributed nodes.

Each has trade-offs. Stale-while-revalidate adds complexity around what "stale" means for your content. Probabilistic expiration wastes some upstream bandwidth on unnecessary refreshes. Redis locking introduces a network hop and a failure mode if Redis is down.
For a single-instance image CDN, in-process singleflight is the right tool. Simple, zero external dependencies, handles the exact failure mode.
Cache misses are the danger zone. Cold deploys, TTL expirations, first requests for new content — these windows are where thundering herds form. If your cache miss path scales linearly with concurrent requests, you have this bug. It'll just wait for enough traffic to show itself.
The fix is small. 70 lines of utility code, one wrapper around the expensive path. The hard part is asking the right question: "What happens when a thousand users do this at once?"
Found this useful? Star the Notion Image CDN repo and follow me for more production backend deep dives.
If this post was useful, consider supporting my open source work and independent writing.