outpost

May 14, 2026

the bug that wasn’t where the stack trace pointed

The stack trace was unambiguous: a 500 in the payment handler, line 247, a timeout reaching the downstream processor. We spent forty minutes there. We added logging, we replayed requests, we questioned the processor’s SLAs. None of it helped because payments wasn’t the bug. Payments was the canary.

Two services upstream, a connection pool was quietly exhausted. Every request that traversed it was queuing for seconds before being handed off, and payment — sitting at the end of a long synchronous chain — was just the first thing impatient enough to time out and emit a stack trace. The trace told us where the error surfaced, which we kept confusing with where it originated. Different question entirely.

The lesson cost us an outage. The fix was cheap: service-level concurrency metrics on every pool, with alerts on saturation rather than failure. The next instance of this same shape, we caught in four minutes. Trace tells you where it broke. Saturation tells you where it started.

the bug that wasn’t where the stack trace pointed — outpost