Description
Under sustained write load (> 2k ops/sec) the Redis-backed caching layer returns stale values for up to 400 ms during the eviction window. This is reproducible in staging with the k6 load profile attached below and affects the checkout flow in ~0.3% of sessions.
Steps to reproduce:
- Deploy
[email protected]to staging with default TTL. - Run
k6 run scripts/checkout-burst.jsfor 10 minutes. - Observe 4xx spikes in the
/api/v2/cartendpoint.
Log snippet from the failing host:
// staging-cache-03 · 2026-04-21T23:14:02Z ERROR cache.evict: race during MSET, key="cart:u_8832" WARN cache.read: returning stale, age=412ms, ttl=300ms ERROR checkout.validate: cart mismatch for u_8832, rolling back
Child issues
Activity · Comments (3)
HT
Hana Takeda
today at 09:15
Shadow traffic on staging-cache-03 reproduced the issue within 7 minutes. I'll attach the k6 run + pprof dump to this ticket.
Worth noting: the issue does not reproduce on staging-cache-01/02 — those use the old [email protected]. So the regression is in 1.17 or 1.18.
MK
Confirmed in staging. The race is in
evictAndReload()— the lock is released before the new value is committed. I'm drafting a patch that usesSET NX+ a short token-based lease.Pushing a draft PR tonight, happy for early eyes.