Digression: Backpressure — When Systems Learn to Say No

There have been traffic lights on German motorway on-ramps for some years now, regulating the flow of traffic onto the motorway. When the motorway is full, the lights turn red and only allow one car through every few seconds. Cars on the slip road wait. Meanwhile, the cars on the motorway keep moving.

It’s not elegant. It is also irritating for people who haven’t experienced this before. But it is effective.

The principle behind it is backpressure: an overloaded system signals to the previous stage that it cannot accept any more. Rather than accepting all requests and collapsing under the load, it rejects some, meaning the rest can continue at speed.

Three flavours

In software systems, there are three ways to implement this principle:

Reject. The service returns an HTTP 503 error. ‘Service unavailable, try again later.’ Brutal, but honest. The client immediately knows where they stand and can retry after a short pause. This is better than waiting 30 seconds for a timeout and still getting an error message.
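The reject variant can be sketched as a simple admission check. Everything here is illustrative — the function name, the capacity limit, and the `Retry-After` value are assumptions, not part of any real framework:

```python
# Minimal sketch of load rejection: if too many requests are already in
# flight, answer 503 immediately instead of queueing. All names and the
# limit are illustrative.
MAX_IN_FLIGHT = 100  # capacity limit, chosen for illustration

def admit(in_flight: int):
    """Return (status, headers) for an incoming request."""
    if in_flight >= MAX_IN_FLIGHT:
        # Honest rejection: the client learns immediately and can back off.
        return 503, {"Retry-After": "1"}
    return 200, {}
```

The `Retry-After` header gives the client a hint about when to try again, which matters for the retry questions discussed further down.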

Delay. Requests land in a queue with limited capacity. As long as the queue isn’t full, requests are buffered and processed in order. Once the queue overflows, the first variant, reject, kicks in. The advantage is that short load spikes are smoothed out. The disadvantage is that the queue itself consumes resources, and the waiting time adds to the response time.
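The combination of delay and reject can be sketched with a bounded buffer that falls back to rejection when full. The queue size and names are illustrative:

```python
import queue

# Sketch of "delay, then reject": buffer requests up to a fixed capacity,
# and shed anything beyond that. Capacity of 2 is for illustration only.
buffer = queue.Queue(maxsize=2)

def enqueue(request):
    try:
        buffer.put_nowait(request)   # buffer short spikes
        return "queued"
    except queue.Full:
        return "rejected"            # the queue is full: reject instead
```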

Throttle. Rate limiting caps the number of requests per unit of time, typically per client or API key. A token bucket algorithm allows short bursts but enforces a fixed limit on average. This protects the service from individual clients generating disproportionate load — whether due to misconfiguration, aggressive crawlers, or a bug in retry logic.
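A token bucket, as mentioned above, can be implemented in a few lines. This is a minimal single-threaded sketch (no locking), with illustrative parameter names:

```python
import time

class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, refills at `rate` tokens/sec."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket with `capacity=5` lets a client fire five requests back to back, but on average no more than `rate` requests per second get through — exactly the "bursts allowed, average capped" behaviour described above.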

When do you need it?

At the saturation point. Little’s Law shows what happens when a system reaches its capacity limit: waiting times increase disproportionately. Without backpressure, the service continues to accept requests that it can no longer process in time. Threads block on database connections, the queue grows and response times rise for all requests — including those that the service could still have handled.
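The disproportionate growth can be made concrete with a worked example. The numbers are invented for illustration, and the waiting-time formula W = 1 / (capacity − arrival rate) is the standard M/M/1 queueing result, combined here with Little’s Law (L = λ · W):

```python
# Little's Law: L = lambda * W (items in system = arrival rate x time in system).
# Illustrative numbers, not measurements.
arrival_rate = 90.0       # requests per second (lambda)
service_capacity = 100.0  # requests per second the service can complete

# M/M/1 mean time in system: W = 1 / (capacity - arrival_rate)
time_in_system = 1.0 / (service_capacity - arrival_rate)   # 0.1 s at 90% load
requests_in_system = arrival_rate * time_in_system         # L = 90 * 0.1 = 9

# Push arrivals from 90/s to 99/s -- a 10% increase -- and the
# time in system explodes tenfold:
time_at_99 = 1.0 / (service_capacity - 99.0)               # 1.0 s
```

A 10% increase in load produces a 10× increase in waiting time: that is the saturation point at which backpressure earns its keep.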

With backpressure, some requests are rejected. However, the remaining ones receive a response in a normal amount of time. This is not a compromise — it is triage.

Connection pools: backpressure by accident

The connection pool is a textbook example of backpressure that was never planned as such. When all connections to the database are in use, the next thread either waits for a free connection — or receives an exception after a timeout. The pool limits how much load is passed on to the database.

This is backpressure. It is just not explicitly designed, but arises as a side effect of resource limitation. This is exactly why it often performs poorly: the timeout is set too high (30-second default in some frameworks), there is no useful error message for the client, and the blocked threads occupy their place in the thread pool until it is also exhausted.

Anyone who deliberately configures connection pool timeouts — short enough not to block the thread pool and long enough to buffer brief spikes — is essentially practising backpressure design. Most teams do this without realising.
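A deliberately configured pool might look like the following sketch: wait briefly for a free connection, then fail fast instead of blocking the thread pool. The class, the timeout value, and the string stand-ins for connections are all illustrative:

```python
import queue

class ConnectionPool:
    """Sketch of a pool that applies backpressure deliberately."""

    def __init__(self, size: int, timeout_s: float):
        self.free = queue.Queue()
        for i in range(size):
            self.free.put(f"conn-{i}")   # stand-ins for real connections
        self.timeout_s = timeout_s

    def acquire(self):
        try:
            # Short timeout: long enough to buffer a brief spike,
            # short enough not to pile up blocked threads.
            return self.free.get(timeout=self.timeout_s)
        except queue.Empty:
            raise TimeoutError("no free connection; shed load upstream")

    def release(self, conn):
        self.free.put(conn)
```

The crucial design choice is the timeout: a 30-second default turns the pool into a thread-pool exhaustion machine, while a sub-second timeout turns it into a backpressure mechanism.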

The uncomfortable questions

In theory, backpressure sounds clean. In practice, however, it raises uncomfortable questions.

Who gets rejected? An HTTP 503 error affects all requests equally — including those from users who are about to pay, and bots that are crawling the sitemap for the third time. Load shedding — selectively discarding low-priority requests to preserve capacity for critical functions — is the answer, but it is also considerably more complex. What constitutes an ‘important’ request? Who decides, and how quickly must the decision be made?

What happens afterwards? The rejected request doesn’t simply disappear. Either the client retries — generating additional traffic at precisely the moment when the system is overloaded — or the user gives up. Retries without backoff and jitter can turn a brief overload into a prolonged one.
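Backoff with jitter, as mentioned above, is a small amount of code. This is the "full jitter" variant: each retry waits a random amount of time up to an exponentially growing cap. Parameter names and defaults are illustrative:

```python
import random

def backoff_with_jitter(attempt: int, base_s: float = 0.1, cap_s: float = 10.0) -> float:
    """Return the wait time in seconds before retry number `attempt` (0-based)."""
    upper = min(cap_s, base_s * (2 ** attempt))
    # Full jitter: pick uniformly in [0, upper] so that retrying clients
    # do not all hit the recovering service at the same instant.
    return random.uniform(0.0, upper)
```

The jitter is the point: without it, all clients that were rejected at the same moment retry at the same moment, and the overload simply repeats on a schedule.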

Is the request repeatable? Retries only work for idempotent operations. A search query can be safely repeated. An order? That depends on whether the system has duplicate detection. Without idempotency, every retry poses a risk.

All three questions have one thing in common: backpressure shifts complexity from the server to the client. The service becomes simpler — it just says ‘no’ and that’s it. However, the client must now decide how long to wait. How often should it retry? What backoff should be used? What information should be shown to the user? Retry logic, idempotency, timeout handling and fallbacks all land on the caller. In a microservice architecture, where every service is a client of another, this complexity multiplies across the entire call chain. Backpressure does not make that complexity disappear; anyone who introduces it without considering the client side is merely shifting the problem, not solving it.

