Glossary
A reference of key terms used across the scalability series.
A
ACID
Acronym for Atomicity, Consistency, Isolation, Durability — the four guarantees of relational database transactions. Isolation enforces the serialisation of concurrent write accesses and is thus one of the main causes of Lock Contention under load.
See also:
- Härder & Reuter (1983) — Principles of Transaction-Oriented Database Recovery. ACM Computing Surveys, 15(4).
Amdahl’s Law
Describes the theoretical upper limit of speedup through parallelisation. The serial fraction of a process limits the maximum possible speedup — with 10% serial fraction, the maximum is 10x, regardless of how many parallel units are added. Applies equally to hardware, distributed systems, and organisations.
See also:
- Amdahl (1967) — Validity of the single processor approach. AFIPS Spring Joint Computer Conference Proceedings, Vol. 30.
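The limit is easy to verify numerically; a minimal sketch (the 10% serial fraction is the example from the definition above):

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Maximum speedup with n parallel units: 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# With a 10% serial fraction, the speedup approaches 10x but never reaches it:
for n in (10, 100, 10_000):
    print(f"{n:>6} units -> {amdahl_speedup(0.10, n):.2f}x")
```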
Autoscaling
Automatic addition or removal of instances based on current load. Bridges the gap between “Implement” and “Deploy” in the DID Principle. Without upper limits, costs can grow uncontrollably.
See also:
- Abbott & Fisher (2015b) — Scalability Rules. 2nd ed. Addison-Wesley.
B
Backpressure
A mechanism by which an overloaded component signals the caller to send more slowly. The operational response to exponential queue growth near saturation.
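A bounded buffer is the simplest backpressure mechanism; a minimal sketch (buffer size and burst length are illustrative):

```python
import queue

# Bounded buffer: when the consumer falls behind, put_nowait() raises
# queue.Full instead of letting the backlog grow without limit.
buffer = queue.Queue(maxsize=3)

accepted, rejected = 0, 0
for request in range(10):   # burst of 10 requests, nothing draining the queue
    try:
        buffer.put_nowait(request)
        accepted += 1
    except queue.Full:      # the "send more slowly" signal to the caller
        rejected += 1

print(accepted, rejected)   # 3 accepted, 7 rejected
```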
Bottleneck
The component that limits the throughput of the overall system. In most web applications, this is the database. The series consistently uses "bottleneck" as the standard term.
Brooks’ Law
“Adding manpower to a late software project makes it later.” The communication overhead between n people grows with n×(n-1)/2 — quadratically. This is the organisational equivalent of the USL: more people generate more coordination overhead, which consumes the productivity gain.
See also:
- Brooks (1975) — The Mythical Man-Month. Addison-Wesley.
Bulkhead Pattern
Isolates resources (thread pools, connections) per consumer or service, so that an overloaded component does not bring down all the others. Named after the bulkheads of a ship.
C
CAP Theorem
During a network partition, a distributed system must choose between Consistency and Availability; it cannot guarantee both. Partition tolerance itself is not optional, because networks will partition. Extended by PACELC.
See also:
- Brewer (2000) — Towards Robust Distributed Systems. ACM PODC Keynote.
Circuit Breaker
A protection mechanism for remote calls: after several consecutive failures, the circuit breaker opens and lets calls fail immediately, instead of waiting for timeouts. Prevents cascading failures in microservice architectures.
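The state machine fits in a few lines; a minimal sketch, where `threshold` and `reset_timeout` are illustrative parameters:

```python
import time

class CircuitBreaker:
    """Sketch: opens after `threshold` consecutive failures, then fails
    fast for `reset_timeout` seconds before allowing a trial call."""

    def __init__(self, threshold: int = 3, reset_timeout: float = 30.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0           # any success closes the circuit again
        return result
```

The key property: while open, callers get an immediate error instead of tying up a thread waiting for a timeout.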
Compound Failure Probability
Each service in a synchronous call chain reduces overall availability multiplicatively: 5 services with 99.9% availability each result in 99.5% — not 99.9%. The quantitative argument for asynchronous integration and coarser granularity (SCS rather than fine-grained Microservices).
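The arithmetic behind the 99.5% figure is a one-liner:

```python
def chain_availability(per_service: float, n: int) -> float:
    """Availability of a synchronous chain of n services: multiplicative."""
    return per_service ** n

# 5 services at 99.9% each: the chain reaches only ~99.5%
print(f"{chain_availability(0.999, 5):.4f}")
```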
Concurrency
The average number of requests simultaneously in the system — denoted as L in formulas. When this value exceeds the pool size, requests begin to wait.
Connection Pool
A pool of pre-established database connections shared by threads. The pool size is often the first bottleneck under load — a pool that is too small (like the framework default of 10) can bring a service to its knees under moderate traffic.
Connection Pool Exhaustion
A state in which all database connections in the pool are occupied. Threads block waiting for free connections, occupying their thread pool slot in the process, and the system collapses in a cascade. Little’s Law calculates the exact limit.
Conway’s Law
“Any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.” Team structure and system structure are isomorphic. The “Inverse Conway Maneuver” reverses the causality: optimise the team structure first, then the architecture follows.
See also:
- Loose Coupling, Y-Axis
- Conway (1968) — How Do Committees Invent? Datamation, 14(5).
CQRS
Command Query Responsibility Segregation — separation of write and read models. The query model is optimised for fast access (e.g. Elasticsearch, materialised views), while the command model ensures consistency. Frequently combined with Event Sourcing and asynchronous integration.
D
DID Principle
Design–Implement–Deploy: Design for 20x current demand, Implement for 3–20x, Deploy for 1.5–3x. The separation reflects the asymmetric costs of change — a retrospective architectural change is orders of magnitude more expensive than a deployment adjustment. The economic anchor of the entire series.
See also:
- Scale Cube, Autoscaling
- Abbott & Fisher (2015b) — Scalability Rules. 2nd ed. Addison-Wesley. Rule 2.
Diseconomies of Scale
Software has negative economies of scale: larger teams produce less output per person. The opposite of manufacturing, where doubling capacity costs less than double. The economic core of the argument for small, autonomous teams.
See also:
- Brooks’ Law, Metcalfe’s Law
- Scholtes et al. (2016) — From Aristotle to Ringelmann. Empirical Software Engineering, 21(2). Kelly: Diseconomies of Scale.
DORA (DevOps Research and Assessment)
A research programme that empirically measures software delivery performance. Four core metrics: Deployment Frequency, Lead Time for Changes, Mean Time to Recovery, Change Failure Rate. Elite teams deploy multiple times daily and recover in under an hour. The research shows that industry and company size do not predict performance; architecture, team structure, and culture do. The 2024 finding on AI coding assistants (-7.2% stability through larger batches) empirically confirms Reinertsen's batch-size theory.
See also:
- Lead Time, Kanban
- Forsgren et al. (2018) — Accelerate. IT Revolution. DORA (2024) — State of DevOps Report 2024. Google Cloud.
E
Eventual Consistency
A consistency model in which read operations do not immediately see the most recent write value — but the system converges within a short time window. The unavoidable price for data partitioning and asynchronous integration.
Event Sourcing
Stores state changes as an append-only sequence of immutable events instead of mutable records. Eliminates write contention, because appending to a log requires no locks.
F
Functional Decomposition
Breaking down a system along business boundaries — vertical slices by domain, not horizontal layers by technology. Each vertical represents a complete business capability. The core idea behind the Y-Axis of the Scale Cube.
G
Goodhart’s Law
“When a measure becomes a target, it ceases to be a good measure.” Directly relevant for governance in scaled organisations: when metrics (deployment frequency, velocity) become a steering instrument, teams optimise for the metric instead of the outcome.
See also:
- Team-of-Teams
- Goodhart (1975) / Strathern (1997) — Goodhart’s Law. European Review, 5(3).
Graceful Degradation
Instead of complete failure, the system reduces functionality — cached data, disabled recommendations, static fallbacks. The safety net when Autoscaling or Circuit Breaker are not enough.
Gustafson’s Law
Counter-perspective to Amdahl's Law: with growing problem size, near-linear speedup is possible, as long as the serial fraction stays constant in absolute terms. Holds for HPC workloads, but breaks down for organisations, because communication overhead grows quadratically.
See also:
- Brooks’ Law
- Gustafson (1988) — Reevaluating Amdahl’s Law. Communications of the ACM, 31(5).
H
High Cohesion
Everything that belongs together is together — no responsibility spread across multiple modules or teams. At the team level: one team, one business domain, full product ownership. The counterpart to Loose Coupling.
Hockey-Stick Curve
The characteristic shape of the response time curve under increasing load: flat for a long time, then steeply rising. Caused by the non-linear utilisation factor ρ/(1-ρ) in Kingman’s Formula. The variability of service times determines how steep the curve becomes.
K
Kanban
A method for managing work processes based on explicit WIP limits. David Anderson showed in 2010 how Little’s Law provides the foundation: Lead Time = WIP / Throughput. From this it directly follows: to reduce Lead Time, you must limit WIP — not increase throughput. Kanban boards make WIP visible; the limits enforce that new work is only started when current work is completed.
See also:
- WIP (Work in Progress), Slack Time, Lead Time
- Anderson (2010) — Kanban. Blue Hole Press.
Kingman’s Formula
An approximation for waiting time in a queue with variable service times: Waiting time ≈ V × ρ/(1-ρ) × S. Three factors — Variability (V), utilisation factor, and Service Time (S) — explain why systems break down well before the theoretical capacity limit.
See also:
- Saturation Point
- Hockey-Stick Curve
- Kingman (1961) — The single server queue in heavy traffic. Mathematical Proceedings of the Cambridge Philosophical Society, 57(4).
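The non-linearity is easy to see numerically; a sketch with illustrative values (V = 1, S = 50 ms):

```python
def kingman_wait(variability: float, utilisation: float, service_time_ms: float) -> float:
    """Kingman's approximation: waiting time ≈ V × ρ/(1-ρ) × S."""
    return variability * (utilisation / (1.0 - utilisation)) * service_time_ms

# Same service, same variability, only utilisation changes:
for rho in (0.5, 0.8, 0.95):
    print(f"utilisation {rho:.2f} -> {kingman_wait(1.0, rho, 50.0):7.1f} ms waiting time")
```

Between 50% and 95% utilisation, nothing about the service itself changed, yet waiting time grows from 50 ms to 950 ms.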
L
Lead Time
The total time from when a task is received to its delivery — the organisational counterpart to response time (W) in Little’s Law. Includes waiting time and processing time. Little’s Law applies directly: Lead Time = WIP / Throughput. Kingman explains why Lead Time increases disproportionately at high utilisation — not because processing slows down, but because waiting time explodes.
See also:
- WIP (Work in Progress), Kanban, Slack Time
- Reinertsen (2009) — Principles of Product Development Flow. Celeritas Publishing.
Lehman’s Laws
Empirically established laws of software evolution. Particularly relevant: Law II (Increasing Complexity) — the complexity of a system grows disproportionately unless actively countered. Gives the argument for functional decomposition formal backing.
See also:
- Y-Axis
- Lehman & Belady (1985) — Program Evolution. Academic Press.
Little’s Law
L = λ × W — the number of concurrent requests in the system (L) equals the throughput (λ) multiplied by the response time (W). Holds for any stable queuing system, regardless of the arrival distribution. The foundation of capacity planning in this series.
See also:
- Concurrency
- Little (1961) — A Proof for the Queuing Formula: L = λW. Operations Research, 9(3).
Load Balancer
Distributes incoming requests across multiple instances of a service. Strategies: Round-Robin, Least-Connection, Sticky Sessions (though the latter undermines Stateless Design). A prerequisite for X-Axis scaling.
Load Shedding
Deliberately discarding lower-priority requests under overload, in order to preserve capacity for critical functions. Unlike Rate Limiting, Load Shedding triages by priority.
Load Test
Empirical verification of a system’s capacity limits under simulated load. Little’s Law tells you where to start — the load test shows what happens in practice. Discrepancies between calculation and measurement are a signal for hidden dependencies.
Lock Contention
Concurrent write accesses to the same records must be serialised — the resulting waits for locks grow non-linearly with load. Lock Contention is the prototypical “serial fraction” in the sense of Amdahl’s Law.
Long Tail
The large number of rare queries that individually hardly matter, but in aggregate account for a significant share of traffic. Zipf’s Law explains the pattern: a few items dominate, the rest is spread across a huge variety. Long-tail queries cannot be cached and generate high variability — with the consequences that Kingman’s Formula predicts.
See also:
- → digression: Zipf’s Law and the Limits of Caching
Loose Coupling
Minimal dependencies between modules, services, or teams. A module can be changed without breaking others. The technical answer to Amdahl’s diagnosis: whoever drives the serial fraction towards zero achieves near-linear scalability.
M
Metcalfe’s Law
The number of possible communication channels in a network with N nodes grows with N×(N-1)/2 — quadratically. The formula behind Brooks’ Law and the κ-term of the USL. With 5 teams there are 10 channels, with 10 teams 45, with 50 teams 1,225.
Microservices
Small, independently deployable services with their own data storage and clearly defined APIs. More finely grained than SCS, without their own frontend. The operational overhead is higher — premature decomposition is expensive.
N
N+1 Query Problem
A code anti-pattern where a list of N objects triggers N+1 database accesses instead of a single JOIN or IN clause. For 50 orders: 51 queries instead of 2. Under load, this overhead multiplies with the number of concurrent requests.
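The pattern and its fix, sketched with an in-memory SQLite database (table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE items  (order_id INTEGER, name TEXT);
""")
con.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(50)])
con.executemany("INSERT INTO items VALUES (?, ?)",
                [(i, f"item-{i}") for i in range(50)])

# Anti-pattern: 1 query for the list, then N queries for details -> 51 round trips
orders = con.execute("SELECT id FROM orders").fetchall()
details = [con.execute("SELECT name FROM items WHERE order_id = ?",
                       (oid,)).fetchone() for (oid,) in orders]

# Fix: a single JOIN -> 2 queries in total, independent of N
rows = con.execute(
    "SELECT o.id, i.name FROM orders o JOIN items i ON i.order_id = o.id"
).fetchall()
print(len(orders) + 1, "queries vs 2, for the same", len(rows), "results")
```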
NoSQL
Database systems that trade ACID guarantees and JOIN capability for natural partitionability. At their core, Z-Axis databases: Cassandra, DynamoDB, MongoDB use partition keys as a sharding mechanism. The price is almost always Eventual Consistency.
P
PACELC
Extension of the CAP Theorem: even without a network partition, a choice must be made between Latency and Consistency. This trade-off applies at all times, not just during failures, which makes it the more fundamental one. Example: Cassandra chooses availability and low latency (PA/EL), HBase chooses consistency (PC/EC).
See also:
- Eventual Consistency
- Abadi (2012) — Consistency Tradeoffs in Modern Distributed Database System Design. IEEE Computer, 45(2).
Percentile
Describes the threshold below which a certain proportion of all measured values fall. p95 = 250 ms means: 95% of requests are faster, but every twentieth takes longer. The higher percentiles are the more important monitoring values, because — unlike the average — they reveal what the slowest users experience.
See also:
- Response Time, Variability
- → digression: Zipf’s Law and the Limits of Caching
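A nearest-rank sketch shows how the mean hides the tail (the latency values are illustrative):

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ranked = sorted(samples)
    k = max(0, -(-len(ranked) * p // 100) - 1)  # ceil(n*p/100) - 1
    return ranked[k]

# 94 fast requests, 6 slow ones:
times = [100] * 94 + [900] * 6
print("mean:", sum(times) / len(times), "ms")  # 148.0, looks harmless
print("p95: ", percentile(times, 95), "ms")    # 900, what the slowest users see
```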
Q
Queue
A waiting line where requests sit until a resource (thread, connection, CPU) becomes available. Queue length grows non-linearly with utilisation — the central theme of queuing theory in this series.
Queuing Theory
The mathematical discipline that describes the behaviour of queuing systems. Little’s Law and Kingman’s Formula are the two central tools of this series. The core message: systems near saturation behave fundamentally differently from systems under moderate load.
R
Rate Limiting
Limiting the request frequency per client or API. Token Bucket allows bursts, Leaky Bucket enforces steady throughput. Protects the service from overload, but — unlike Load Shedding — does not differentiate by priority.
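A minimal Token Bucket sketch (rate and capacity are illustrative):

```python
import time

class TokenBucket:
    """Sketch: refills at `rate` tokens/second up to `capacity`,
    so short bursts of up to `capacity` requests pass."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=10, capacity=5)
burst = [bucket.allow() for _ in range(8)]   # burst of 8: 5 pass, 3 rejected
print(burst.count(True), "passed,", burst.count(False), "rejected")
```

A Leaky Bucket would instead smooth the burst into a steady drain rate.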
Read Replica
A replicated database copy that serves read requests while the primary node processes writes. The simplest form of read offloading — at the cost of replication lag (Eventual Consistency).
Response Time
The total time from sending a request to receiving the response — denoted as W in formulas. Includes service time, waiting time in the pool, database accesses, and external calls. The series consistently uses “response time” rather than “latency”.
S
Saturation Point
The point at which a system’s response times begin to increase disproportionately — typically at 70–80% utilisation, not at 100%. The series consistently uses “saturation point” rather than “knee of the curve”, because the term conveys the cause.
Scalability
The ability of a system to increase performance proportionally to the resources deployed. Not synonymous with performance — a system can be fast and still not scale, if additional resources bring no proportional gain.
See also:
- Amdahl’s Law, Scale Cube
- → Fifty Users
- Bondi (2000) — Characteristics of scalability and their impact on performance. WOSP ‘00.
Scale Cube
A model by Abbott & Fisher with three dimensions of scaling: X-Axis (duplication), Y-Axis (functional decomposition), and Z-Axis (data partitioning). Most systems that scale seriously combine at least two axes.
See also:
- → Fifty Users
- Abbott & Fisher (2015a) — The Art of Scalability. 2nd ed. Addison-Wesley. Ch. 22.
Scale-Up
Vertical scaling through a larger machine — more CPU, more RAM, faster storage. Works in the short term, but costs grow disproportionately and there is a hard physical upper limit. For databases, often the pragmatic first step.
SCS (Self-Contained System)
A self-contained web application with its own frontend, its own database, and its own deployment cycle. More coarsely grained than Microservices and with fewer synchronous dependencies. Each SCS can fulfil its primary use cases without requiring other systems to be available.
See also:
- Y-Axis, Functional Decomposition
- SCS Architecture — scs-architecture.org
Serial Fraction
- In Amdahl’s Law: the portion of a process that cannot be executed in parallel.
- In distributed systems: any shared resource (database, message broker, deployment pipeline, human). Determines the theoretical upper limit of scalability.
Service Time
The pure processing time of a request — without waiting time in the queue. Denoted as S in Kingman’s Formula. Every millisecond saved in service time has a double effect: once directly and once multiplicatively through the reduced waiting time.
Sharding
Horizontal partitioning of data by a sharding key (e.g. customer ID, region). The Z-Axis of the Scale Cube at the data layer. The choice of key determines the quality of the distribution — poor keys create hot spots.
Shared-Nothing
An architectural principle in which instances, services, or teams share no resources — no shared database, no shared cache, no shared code. A prerequisite for linear scaling. Applies at the technical level (service architecture) as well as at the organisational level (team structure).
Slack Time
Deliberately reserved capacity — the organisational equivalent of capacity buffers in thread pools or connection pools. Kingman provides the mathematical justification: the difference between 80% and 95% utilisation is not 15 percentage points, but a factor of 3–4 in waiting time. Slack time is not idle time — it is the investment in stable lead times and the ability to respond to the unexpected.
See also:
- WIP (Work in Progress), Saturation Point
- DeMarco (2001) — Slack. Broadway Books. Reinertsen (2009) — Principles of Product Development Flow. Celeritas Publishing.
Stateless Design
An architectural principle in which application instances hold no state — all state lives externally (database, cache, cookies). A prerequisite for X-Axis scaling, because every instance becomes interchangeable.
T
Team Topologies
A model by Skelton & Pais with four team types: Stream-Aligned Teams, Platform Teams, Enabling Teams, and Complicated-Subsystem Teams. Platform Teams provide tools and infrastructure; Enabling Teams transfer know-how — both without creating dependencies.
See also:
- Team-of-Teams, Loose Coupling
- Skelton & Pais (2019) — Team Topologies. IT Revolution.
Team-of-Teams
An organisational model from McChrystal: loosely coupled groups of teams with high internal cohesion and defined interfaces to the outside. Breaks the quadratic communication topology into a hierarchical one — direct communication within a group, between groups only through defined interfaces.
See also:
- Metcalfe’s Law, Loose Coupling
- McChrystal (2015) — Team of Teams. Portfolio/Penguin.
Theory of Constraints
Every system has exactly one bottleneck that limits throughput. Optimising anything else is waste. The Five Focusing Steps — Identify, Exploit, Subordinate, Elevate, Repeat — provide a method, not just an observation.
See also:
- Lock Contention
- Goldratt (1984) — The Goal. North River Press.
Thread Pool
A pool of pre-created threads that process incoming requests. Most of the time, threads are blocked waiting for I/O — “thread waiting room” would be the more honest name. Little’s Law sizes the pool: maximum throughput = pool size / response time.
Throughput
The average number of requests completed per second — denoted as λ (Lambda) in formulas. Describes what actually gets through, not what is requested. The series consistently uses “throughput” as the standard term.
U
USL (Universal Scalability Law)
Extension of Amdahl’s Law with a coherence parameter κ (crosstalk): C(n) = n / (1 + σ(n-1) + κ×n×(n-1)). The κ-term grows quadratically — past a certain point, throughput decreases rather than merely stagnating. Explains retrograde scaling in database lock contention and in organisations (Brooks’ Law).
See also:
- Gunther (1993) — Practical Performance Analyst. McGraw-Hill.
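The retrograde region can be made visible numerically; the values of σ and κ below are illustrative:

```python
def usl_throughput(n: int, sigma: float, kappa: float) -> float:
    """USL: C(n) = n / (1 + σ(n-1) + κ·n·(n-1))."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# σ = 0.05 (contention), κ = 0.001 (crosstalk): throughput peaks near n = 31
# and then declines - adding more units makes the system slower.
for n in (1, 10, 31, 100):
    print(f"n = {n:>3}: C(n) = {usl_throughput(n, 0.05, 0.001):.2f}")
```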
Utilisation
The proportion of a system’s capacity in use — denoted as ρ (Rho) in formulas. At 70% utilisation, a system typically runs stably; from 80–90%, waiting times explode non-linearly.
V
Variability
A measure of how much service times vary around their mean — denoted as V in Kingman’s Formula. V = 0 means constant times (no queue), V = 1 exponentially distributed times (worst case). The mix of cache hits and expensive queries produces high variability — and thus phantom traffic jams.
See also:
- Hockey-Stick Curve, Percentile
- → digression: Zipf’s Law and the Limits of Caching
Virtual Threads
Available since Java 21: lightweight threads that decouple processing capacity from the number of blocked OS threads. Changes the optimisation approach for thread pools, but not the fundamental message of Little’s Law — and nothing about the connection pool as bottleneck.
W
WIP (Work in Progress)
The number of simultaneously started but uncompleted tasks — the organisational counterpart to Concurrency (L) in Little’s Law. Teams fill waiting time with new projects, WIP rises, utilisation rises, waiting times explode — a vicious cycle. “Stop Starting, Start Finishing” breaks the cycle.
See also:
- Saturation Point
- Reinertsen (2009) — Principles of Product Development Flow. Celeritas Publishing. Anderson (2010) — Kanban. Blue Hole Press.
X
X-Axis
First dimension of the Scale Cube: duplication. Multiple identical instances behind a Load Balancer. Requires Stateless Design and scales linearly — as long as no shared resource becomes the bottleneck.
See also:
- Abbott & Fisher (2015a) — The Art of Scalability. 2nd ed. Addison-Wesley.
Y
Y-Axis
Second dimension of the Scale Cube: Functional Decomposition. Breaking down a system into independent services along business boundaries — each with its own data storage, its own deployment, its own team ownership. The cleanest transfer to the organisational level (Conway’s Law).
See also:
- SCS, Microservices
- Abbott & Fisher (2015a) — The Art of Scalability. 2nd ed. Addison-Wesley.
You Build It, You Run It
The principle that the team which develops a service also operates it. Eliminates the dependency between Dev and Ops and shortens feedback loops. Requires organisational investment in tooling, on-call support, and cultural change.
See also:
- Loose Coupling, Team Topologies
- Vogels (2006) — A Conversation with Werner Vogels. ACM Queue, 4(4).
Z
Z-Axis
Third dimension of the Scale Cube: data partitioning (Sharding). Requests or data are distributed across different partitions by a key (customer ID, region). Well-established at the data layer (NoSQL), rare at the application layer — because Z-Axis requires stateful routing, which contradicts Stateless Design (a prerequisite for X-Axis).
See also:
- Abbott & Fisher (2015a) — The Art of Scalability. 2nd ed. Addison-Wesley.
Zipf’s Law
The frequency of the kth most popular item is proportional to 1/k^α — a few items dominate, most are rare. Explains why caching works so spectacularly well (the top 1% generate 80% of traffic) and why it eventually isn’t enough: the Long Tail grows with traffic.
See also:
- Variability, Percentile
- → digression: Zipf’s Law and the Limits of Caching
- Breslau et al. (1999) — Web Caching and Zipf-like Distributions. IEEE INFOCOM 1999.
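The concentration can be checked with a small calculation; α = 1 and the catalogue size are illustrative (with α = 1 the top 1% capture roughly 60% of traffic, heavier skew pushes this towards the 80% cited above):

```python
# Zipf weights over a catalogue of 100,000 distinct items, α = 1:
N, alpha = 100_000, 1.0
weights = [1 / k**alpha for k in range(1, N + 1)]
total = sum(weights)

top_share = sum(weights[: N // 100]) / total   # traffic share of the top 1%
tail_share = sum(weights[N // 2:]) / total     # share of the rarest 50% of items
print(f"top 1% of items: {top_share:.0%} of traffic")
print(f"rarest 50% of items: {tail_share:.0%} of traffic")
```

The second number is the Long Tail: half of all distinct items together serve only a few percent of requests, and none of them cache well.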