Fifty Users
Around 2005, a portal that I had helped to build went live. It was based on Java EE, a relational database and a portal server — enterprise technology for enterprise requirements. However, shortly afterwards, the system crashed with just 50 concurrent users. Fifty. Not five hundred, not five thousand — just fifty.
Everything had gone well in the tests. Clean response times and fast page loading. The problem was that nobody had tested the load limits. A colleague managed to get the problem under some control with caffeine, larger caches and a few database optimisations. We eventually found better solutions, too, but one question kept nagging at me afterwards: what do companies serving millions of users know that I don’t?
I have been thinking about this topic for over twenty years. In this series, I’d like to document what I’ve learnt along the way: from the mathematical fundamentals for estimating the performance of a single instance, to thread-pool configuration, to organisational structures and how they scale.
Factor 2 is optimisation, factor 10 is architecture
If you can tweak a system that can handle 100 requests per second to handle 200 requests per second, you haven’t solved a scalability problem. The system still had plenty of headroom. Creating indexes, optimising queries, configuring the garbage collector, or renting a larger instance — these are necessary tasks, but none of them push the boundaries of the architecture.
Factor 10 is a different category. And no, caching is not the solution here; that deserves a digression of its own.
From 100 to 1,000 requests per second—you can’t achieve that through tuning alone. What works at 100 requests—sessions on the server, joins across five tables, a single database server—breaks down at 1,000. The connection pool fills up, locks on the database become a bottleneck, and response times skyrocket. Not because anything was poorly built, but because the architecture was designed for a different scale.
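The connection-pool arithmetic is easy to sketch with a back-of-envelope calculation. The numbers below (a pool of 50 connections, 50 ms of database time per request) are illustrative assumptions, not figures from the portal:

```java
// Back-of-envelope: what throughput can a fixed connection pool sustain?
// All numbers are illustrative assumptions.
public class PoolCapacity {

    // Ceiling of a pool: connections divided by the time each request holds one.
    static double maxRequestsPerSecond(int poolSize, double holdTimeSeconds) {
        return poolSize / holdTimeSeconds;
    }

    public static void main(String[] args) {
        int poolSize = 50;        // assumed pool size
        double holdTime = 0.050;  // assumed 50 ms on a connection per request

        // 50 connections / 0.05 s = 1,000 requests/s — a hard ceiling.
        System.out.printf("Ceiling: %.0f requests/s%n",
                maxRequestsPerSecond(poolSize, holdTime));

        // At 100 req/s the pool is 10% utilised; at 1,000 req/s every
        // connection is permanently busy and new requests start to queue.
        System.out.printf("Utilisation at 100 req/s:  %.0f%%%n",
                100 * holdTime / poolSize * 100);
        System.out.printf("Utilisation at 1000 req/s: %.0f%%%n",
                1000 * holdTime / poolSize * 100);
    }
}
```

The point of the sketch: the ceiling is a property of the architecture (pool size, time per request), not of any individual slow query, which is why tuning alone cannot move it by a factor of ten.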
Every step beyond that brings new qualitative problems. It is then no longer enough to speed up the same system—you need a different system.
The mistake we made in 2005 was not our choice of technology. Rather, we underestimated the limits of the architecture’s scalability and how quickly we would reach them.
Two ways to fail
However, there is another way to reach limits.
When I joined OTTO in 2009, its e-commerce platform was based on a commercial shop solution which required 200 instances to handle 200 page views per second. Technically, it worked. However, the costs were absurd and adding more instances would not have solved the problem; it would only have increased the expense.
This is the second type of scaling limit: a system can handle the load but is not scalable because the cost is grossly disproportionate to the result.
Therefore, the real question is not ‘Is my system scalable?’ but rather: ‘By what factor does it need to grow? What will the next doubling cost relative to the current one? And how much lead time do I need to act in time?’
“More servers” is not an answer
The first response to scaling problems is to use a larger instance. The second is to use more instances. Both reactions are understandable, but neither is sufficient.
Simply adding instances only scales along a single dimension. This approach only works if the instances are independent and have no shared database, state or deployment pipeline. In practice, however, at least one of these conditions is usually violated. In that case, it is not the system that is being scaled, but the bottleneck.
A system can be scaled horizontally along three dimensions:
- run identical copies,
- split functionally into independent parts, or
- partition by data, customer segment, region, or other criteria.
Most systems that undergo serious scaling combine at least two of these dimensions. ‘Design for at least two axes of scale’ is the recommendation from Abbott and Fisher in The Art of Scalability. It’s one of the most useful pieces of advice on this topic that I know of.
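The third dimension, partitioning, can be sketched as a simple router that assigns each customer deterministically to one of several independent shards. The shard names and the hashing scheme below are illustrative assumptions, not a recommendation from Abbott and Fisher:

```java
import java.util.List;

// Sketch of partitioning by customer: each customer is routed
// deterministically to one of N independent shards.
public class ShardRouter {

    private final List<String> shards;

    public ShardRouter(List<String> shards) {
        this.shards = shards;
    }

    // The same customer always lands on the same shard, so each shard
    // can own its slice of the data independently of the others.
    public String shardFor(String customerId) {
        int index = Math.floorMod(customerId.hashCode(), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(
                List.of("db-eu-1", "db-eu-2", "db-us-1", "db-us-2"));
        System.out.println(router.shardFor("customer-42"));
    }
}
```

Note that with plain modulo hashing, adding a shard changes the mapping for most keys; in practice, consistent hashing or a lookup table is used to avoid that mass migration. The sketch only illustrates the principle: independence between partitions is what makes the dimension scale.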
The same laws, a different level
What I find most fascinating about scalability is that the principles can be applied to all kinds of situations. The laws that explain why a service collapses under load also explain why 50 developers working on a shared codebase achieve less per person than 20 developers.
Shared resources, serial bottlenecks and quadratically growing communication overhead are not just metaphors; they are real phenomena, the same mathematical relationships applied to a different system. The solution is the same, too, but we’ll get to that.
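The quadratic overhead is easy to make concrete: a team of n people has n(n−1)/2 possible pairwise communication channels. This is a standard illustration, not a calculation from the source:

```java
// Pairwise communication channels in a team of n people: n * (n - 1) / 2.
public class CommLinks {

    static int links(int n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        // 20 developers: 190 channels. 50 developers: 1,225 channels —
        // 2.5 times the people, more than 6 times the coordination paths.
        System.out.println(links(20));
        System.out.println(links(50));
    }
}
```

That ratio is why per-person output drops as the team grows: the coordination load rises much faster than the headcount.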
First, though, it is important to understand what a single instance is capable of.
What a single instance reveals
This series doesn’t start with microservices, Kubernetes or cloud autoscaling; it starts with a single instance of a simple service. How many requests can it handle? What is its limit? Why does this limit often come as a surprise when it could easily be calculated?
There is a tool for that. Dating from 1961, it fits on a single line and would have saved us a great deal of trouble in 2005.
Sources
- Abbott, M. L. & Fisher, M. T. (2015) — The Art of Scalability. 2nd ed. Addison-Wesley.