Glossary
A reference of key terms used across the scalability series.
A
ACID
Acronym for Atomicity, Consistency, Isolation, Durability — the four guarantees of relational database transactions. Isolation enforces the serialisation of concurrent write accesses and is thus one of the main causes of Lock Contention under load.
See also:
- Härder & Reuter (1983) — Principles of Transaction-Oriented Database Recovery. ACM Computing Surveys, 15(4).
Amdahl’s Law
Describes the theoretical upper limit of speedup through parallelisation. The serial fraction of a process limits the maximum possible speedup — with 10% serial fraction, the maximum is 10x, regardless of how many parallel units are added. Applies equally to hardware, distributed systems, and organisations.
See also:
- Amdahl (1967) — Validity of the single processor approach. AFIPS Spring Joint Computer Conference Proceedings, Vol. 30.
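The limit is easy to verify numerically; a minimal sketch (the 10% serial fraction is the example from the definition above):

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Maximum speedup with n parallel units: 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# With a 10% serial fraction, the speedup approaches 10x but never reaches it:
for n in (10, 100, 10_000):
    print(f"{n:>6} units -> {amdahl_speedup(0.10, n):.2f}x")
```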
Autoscaling
Automatic addition or removal of instances based on current load. Bridges the gap between “Implement” and “Deploy” in the DID Principle. Without upper limits, costs can grow uncontrollably.
See also:
- Abbott & Fisher (2015b) — Scalability Rules. 2nd ed. Addison-Wesley.
B
Backpressure
A mechanism by which an overloaded component signals the caller to send more slowly. The operational response to exponential queue growth near saturation.
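A bounded buffer is the simplest backpressure mechanism; a minimal sketch (buffer size and burst length are illustrative):

```python
import queue

# Bounded buffer: when the consumer falls behind, put_nowait() raises
# queue.Full instead of letting the backlog grow without limit.
buffer = queue.Queue(maxsize=3)

accepted, rejected = 0, 0
for request in range(10):   # burst of 10 requests, nothing draining the queue
    try:
        buffer.put_nowait(request)
        accepted += 1
    except queue.Full:      # the "send more slowly" signal to the caller
        rejected += 1

print(accepted, rejected)   # 3 accepted, 7 rejected
```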
Bottleneck
The component that limits the throughput of the overall system. In most web applications, this is the database. The series consistently uses "bottleneck" as the standard term.
Brooks’ Law
“Adding manpower to a late software project makes it later.” The communication overhead between n people grows with n×(n-1)/2 — quadratically. This is the organisational equivalent of the USL: more people generate more coordination overhead, which consumes the productivity gain.
See also:
- Brooks (1975) — The Mythical Man-Month. Addison-Wesley.
Bulkhead Pattern
Isolates resources (thread pools, connections) per consumer or service, so that an overloaded component does not bring down all the others. Named after the bulkheads of a ship.
C
CAP Theorem
During a network partition, a distributed system must choose between Consistency and Availability; it cannot guarantee both. Partition tolerance itself is not optional, because networks will partition. Extended by PACELC.
See also:
- Brewer (2000) — Towards Robust Distributed Systems. ACM PODC Keynote.
Circuit Breaker
A protection mechanism for remote calls: after several consecutive failures, the circuit breaker opens and lets calls fail immediately, instead of waiting for timeouts. Prevents cascading failures in microservice architectures.
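The state machine fits in a few lines; a minimal sketch, where `threshold` and `reset_timeout` are illustrative parameters:

```python
import time

class CircuitBreaker:
    """Sketch: opens after `threshold` consecutive failures, then fails
    fast for `reset_timeout` seconds before allowing a trial call."""

    def __init__(self, threshold: int = 3, reset_timeout: float = 30.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0           # any success closes the circuit again
        return result
```

The key property: while open, callers get an immediate error instead of tying up a thread waiting for a timeout.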
Compound Failure Probability
Each service in a synchronous call chain reduces overall availability multiplicatively: 5 services with 99.9% availability each result in 99.5% — not 99.9%. The quantitative argument for asynchronous integration and coarser granularity (SCS rather than fine-grained Microservices).
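The arithmetic behind the 99.5% figure is a one-liner:

```python
def chain_availability(per_service: float, n: int) -> float:
    """Availability of a synchronous chain of n services: multiplicative."""
    return per_service ** n

# 5 services at 99.9% each: the chain reaches only ~99.5%
print(f"{chain_availability(0.999, 5):.4f}")
```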
Concurrency
The average number of requests simultaneously in the system — denoted as L in formulas. When this value exceeds the pool size, requests begin to wait.
Connection Pool
A pool of pre-established database connections shared by threads. The pool size is often the first bottleneck under load — a pool that is too small (like the framework default of 10) can bring a service to its knees under moderate traffic.
Connection Pool Exhaustion
A state in which all database connections in the pool are occupied. Threads block waiting for free connections, occupying their thread pool slot in the process, and the system collapses in a cascade. Little’s Law calculates the exact limit.
Conway’s Law
“Any organization that designs a system will produce a design whose structure is a copy of the organization’s communication structure.” Team structure and system structure are isomorphic. The “Inverse Conway Maneuver” reverses the causality: optimise the team structure first, then the architecture follows.
See also:
- Loose Coupling, Y-Axis
- Conway (1968) — How Do Committees Invent? Datamation, 14(5).
CQRS
Command Query Responsibility Segregation — separation of write and read models. The query model is optimised for fast access (e.g. Elasticsearch, materialised views), while the command model ensures consistency. Frequently combined with Event Sourcing and asynchronous integration.
D
DID Principle
Design–Implement–Deploy: Design for 20x current demand, Implement for 3–20x, Deploy for 1.5–3x. The separation reflects the asymmetric costs of change — a retrospective architectural change is orders of magnitude more expensive than a deployment adjustment. The economic anchor of the entire series.
See also:
- Scale Cube, Autoscaling
- Abbott & Fisher (2015b) — Scalability Rules. 2nd ed. Addison-Wesley. Rule 2.
Diseconomies of Scale
Software has negative economies of scale: larger teams produce less output per person. The opposite of manufacturing, where doubling capacity costs less than double. The economic core of the argument for small, autonomous teams.
See also:
- Brooks’ Law, Metcalfe’s Law
- Scholtes et al. (2016) — From Aristotle to Ringelmann. Empirical Software Engineering, 21(2). Kelly: Diseconomies of Scale.
DORA (DevOps Research and Assessment)
A research programme that empirically measures software delivery performance. Four core metrics: Deployment Frequency, Lead Time for Changes, Mean Time to Recovery, Change Failure Rate. Elite teams deploy multiple times daily and recover in under an hour. The research shows that industry and company size do not predict performance; architecture, team structure, and culture do. The 2024 finding on AI coding assistants (-7.2% stability through larger batches) empirically confirms Reinertsen's batch-size theory.
See also:
- Lead Time, Kanban
- Forsgren et al. (2018) — Accelerate. IT Revolution. DORA (2024) — State of DevOps Report 2024. Google Cloud.
E
Eventual Consistency
A consistency model in which read operations do not immediately see the most recent write value — but the system converges within a short time window. The unavoidable price for data partitioning and asynchronous integration.
Event Sourcing
Stores state changes as an append-only sequence of immutable events instead of mutable records. Eliminates write contention, because appending to a log requires no locks.
F
Functional Decomposition
Breaking down a system along business boundaries — vertical slices by domain, not horizontal layers by technology. Each vertical represents a complete business capability. The core idea behind the Y-Axis of the Scale Cube.
G
Goodhart’s Law
“When a measure becomes a target, it ceases to be a good measure.” Directly relevant for governance in scaled organisations: when metrics (deployment frequency, velocity) become a steering instrument, teams optimise for the metric instead of the outcome.
See also:
- Team-of-Teams
- Goodhart (1975) / Strathern (1997) — Goodhart’s Law. European Review, 5(3).
Graceful Degradation
Instead of complete failure, the system reduces functionality — cached data, disabled recommendations, static fallbacks. The safety net when Autoscaling or Circuit Breaker are not enough.
Gustafson’s Law
Counter-perspective to Amdahl's Law: with growing problem size, near-linear speedup is possible, as long as the serial fraction stays constant in absolute terms. Holds for HPC workloads, but breaks down for organisations, because communication overhead grows quadratically.
See also:
- Brooks’ Law
- Gustafson (1988) — Reevaluating Amdahl’s Law. Communications of the ACM, 31(5).
H
High Cohesion
Everything that belongs together is together — no responsibility spread across multiple modules or teams. At the team level: one team, one business domain, full product ownership. The counterpart to Loose Coupling.
Hockey-Stick Curve
The characteristic shape of the response time curve under increasing load: flat for a long time, then steeply rising. Caused by the non-linear utilisation factor ρ/(1-ρ) in Kingman’s Formula. The variability of service times determines how steep the curve becomes.
K
Kanban
A method for managing work processes based on explicit WIP limits. David Anderson showed in 2010 how Little’s Law provides the foundation: Lead Time = WIP / Throughput. From this it directly follows: to reduce Lead Time, you must limit WIP — not increase throughput. Kanban boards make WIP visible; the limits enforce that new work is only started when current work is completed.
See also:
- WIP (Work in Progress), Slack Time, Lead Time
- Anderson (2010) — Kanban. Blue Hole Press.
Kingman’s Formula
An approximation for waiting time in a queue with variable service times: Waiting time ≈ V × ρ/(1-ρ) × S. Three factors — Variability (V), utilisation factor, and Service Time (S) — explain why systems break down well before the theoretical capacity limit.
See also:
- Saturation Point
- Hockey-Stick Curve
- Kingman (1961) — The single server queue in heavy traffic. Mathematical Proceedings of the Cambridge Philosophical Society, 57(4).
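The non-linearity is easy to see numerically; a sketch with illustrative values (V = 1, S = 50 ms):

```python
def kingman_wait(variability: float, utilisation: float, service_time_ms: float) -> float:
    """Kingman's approximation: waiting time ≈ V × ρ/(1-ρ) × S."""
    return variability * (utilisation / (1.0 - utilisation)) * service_time_ms

# Same service, same variability, only utilisation changes:
for rho in (0.5, 0.8, 0.95):
    print(f"utilisation {rho:.2f} -> {kingman_wait(1.0, rho, 50.0):7.1f} ms waiting time")
```

Between 50% and 95% utilisation, nothing about the service itself changed, yet waiting time grows from 50 ms to 950 ms.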
L
Lead Time
The total time from when a task is received to its delivery — the organisational counterpart to response time (W) in Little’s Law. Includes waiting time and processing time. Little’s Law applies directly: Lead Time = WIP / Throughput. Kingman explains why Lead Time increases disproportionately at high utilisation — not because processing slows down, but because waiting time explodes.
See also:
- WIP (Work in Progress), Kanban, Slack Time
- Reinertsen (2009) — Principles of Product Development Flow. Celeritas Publishing.
Lehman’s Laws
Empirically established laws of software evolution. Particularly relevant: Law II (Increasing Complexity) — the complexity of a system grows disproportionately unless actively countered. Gives the argument for functional decomposition formal backing.
See also:
- Y-Axis
- Lehman & Belady (1985) — Program Evolution. Academic Press.
Little’s Law
L = λ × W — the number of concurrent requests in the system (L) equals the throughput (λ) multiplied by the response time (W). Holds for any stable queuing system, regardless of the arrival distribution. The foundation of capacity planning in this series.
See also:
- Concurrency
- Little (1961) — A Proof for the Queuing Formula: L = λW. Operations Research, 9(3).
Load Balancer
Distributes incoming requests across multiple instances of a service. Strategies: Round-Robin, Least-Connection, Sticky Sessions (though the latter undermines Stateless Design). A prerequisite for X-Axis scaling.
Load Shedding
Deliberately discarding lower-priority requests under overload, in order to preserve capacity for critical functions. Unlike Rate Limiting, Load Shedding triages by priority.
Load Test
Empirical verification of a system’s capacity limits under simulated load. Little’s Law tells you where to start — the load test shows what happens in practice. Discrepancies between calculation and measurement are a signal for hidden dependencies.
Lock Contention
Concurrent write accesses to the same records must be serialised — the resulting waits for locks grow non-linearly with load. Lock Contention is the prototypical “serial fraction” in the sense of Amdahl’s Law.
Long Tail
The large number of rare queries that individually hardly matter, but in aggregate account for a significant share of traffic. Zipf’s Law explains the pattern: a few items dominate, the rest is spread across a huge variety. Long-tail queries cannot be cached and generate high variability — with the consequences that Kingman’s Formula predicts.
See also:
- → digression: Zipf’s Law and the Limits of Caching
Loose Coupling
Minimal dependencies between modules, services, or teams. A module can be changed without breaking others. The technical answer to Amdahl’s diagnosis: whoever drives the serial fraction towards zero achieves near-linear scalability.
M
Metcalfe’s Law
The number of possible communication channels in a network with N nodes grows with N×(N-1)/2 — quadratically. The formula behind Brooks’ Law and the κ-term of the USL. With 5 teams there are 10 channels, with 10 teams 45, with 50 teams 1,225.
Microservices
Small, independently deployable services with their own data storage and clearly defined APIs. More finely grained than SCS, without their own frontend. The operational overhead is higher — premature decomposition is expensive.
N
N+1 Query Problem
A code anti-pattern where a list of N objects triggers N+1 database accesses instead of a single JOIN or IN clause. For 50 orders: 51 queries instead of 2. Under load, this overhead multiplies with the number of concurrent requests.
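The pattern and its fix, sketched with an in-memory SQLite database (table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE items  (order_id INTEGER, name TEXT);
""")
con.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(50)])
con.executemany("INSERT INTO items VALUES (?, ?)",
                [(i, f"item-{i}") for i in range(50)])

# Anti-pattern: 1 query for the list, then N queries for details -> 51 round trips
orders = con.execute("SELECT id FROM orders").fetchall()
details = [con.execute("SELECT name FROM items WHERE order_id = ?",
                       (oid,)).fetchone() for (oid,) in orders]

# Fix: a single JOIN -> 2 queries in total, independent of N
rows = con.execute(
    "SELECT o.id, i.name FROM orders o JOIN items i ON i.order_id = o.id"
).fetchall()
print(len(orders) + 1, "queries vs 2, for the same", len(rows), "results")
```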
NoSQL
Database systems that trade ACID guarantees and JOIN capability for natural partitionability. At their core, Z-Axis databases: Cassandra, DynamoDB, MongoDB use partition keys as a sharding mechanism. The price is almost always Eventual Consistency.
P
PACELC
Extension of the CAP Theorem: even without a network partition, a choice must be made between Latency and Consistency. This trade-off applies at all times, not just during failures, which makes it the more fundamental one. Example: Cassandra chooses availability and low latency (PA/EL), HBase chooses consistency (PC/EC).
See also:
- Eventual Consistency
- Abadi (2012) — Consistency Tradeoffs in Modern Distributed Database System Design. IEEE Computer, 45(2).
Percentile
Describes the threshold below which a certain proportion of all measured values fall. p95 = 250 ms means: 95% of requests are faster, but every twentieth takes longer. The higher percentiles are the more important monitoring values, because — unlike the average — they reveal what the slowest users experience.
See also:
- Response Time, Variability
- → digression: Zipf’s Law and the Limits of Caching
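A nearest-rank sketch shows how the mean hides the tail (the latency values are illustrative):

```python
def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ranked = sorted(samples)
    k = max(0, -(-len(ranked) * p // 100) - 1)  # ceil(n*p/100) - 1
    return ranked[k]

# 94 fast requests, 6 slow ones:
times = [100] * 94 + [900] * 6
print("mean:", sum(times) / len(times), "ms")  # 148.0, looks harmless
print("p95: ", percentile(times, 95), "ms")    # 900, what the slowest users see
```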
Q
Queue
A waiting line where requests sit until a resource (thread, connection, CPU) becomes available. Queue length grows non-linearly with utilisation — the central theme of queuing theory in this series.
Queuing Theory
The mathematical discipline that describes the behaviour of queuing systems. Little’s Law and Kingman’s Formula are the two central tools of this series. The core message: systems near saturation behave fundamentally differently from systems under moderate load.
R
Rate Limiting
Limiting the request frequency per client or API. Token Bucket allows bursts, Leaky Bucket enforces steady throughput. Protects the service from overload, but — unlike Load Shedding — does not differentiate by priority.
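A minimal Token Bucket sketch (rate and capacity are illustrative):

```python
import time

class TokenBucket:
    """Sketch: refills at `rate` tokens/second up to `capacity`,
    so short bursts of up to `capacity` requests pass."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=10, capacity=5)
burst = [bucket.allow() for _ in range(8)]   # burst of 8: 5 pass, 3 rejected
print(burst.count(True), "passed,", burst.count(False), "rejected")
```

A Leaky Bucket would instead smooth the burst into a steady drain rate.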
Read Replica
A replicated database copy that serves read requests while the primary node processes writes. The simplest form of read offloading — at the cost of replication lag (Eventual Consistency).
Response Time
The total time from sending a request to receiving the response — denoted as W in formulas. Includes service time, waiting time in the pool, database accesses, and external calls. The series consistently uses “response time” rather than “latency”.
S
Saturation Point
The point at which a system’s response times begin to increase disproportionately — typically at 70–80% utilisation, not at 100%. The series consistently uses “saturation point” rather than “knee of the curve”, because the term conveys the cause.
Scalability
The ability of a system to increase performance proportionally to the resources deployed. Not synonymous with performance — a system can be fast and still not scale, if additional resources bring no proportional gain.
See also:
- Amdahl’s Law, Scale Cube
- → Fifty Users
- Bondi (2000) — Characteristics of scalability and their impact on performance. WOSP ‘00.
Scale Cube
A model by Abbott & Fisher with three dimensions of scaling: X-Axis (duplication), Y-Axis (functional decomposition), and Z-Axis (data partitioning). Most systems that scale seriously combine at least two axes.
See also:
- → Fifty Users
- Abbott & Fisher (2015a) — The Art of Scalability. 2nd ed. Addison-Wesley. Ch. 22.
Scale-Up
Vertical scaling through a larger machine — more CPU, more RAM, faster storage. Works in the short term, but costs grow disproportionately and there is a hard physical upper limit. For databases, often the pragmatic first step.
SCS (Self-Contained System)
A self-contained web application with its own frontend, its own database, and its own deployment cycle. More coarsely grained than Microservices and with fewer synchronous dependencies. Each SCS can fulfil its primary use cases without requiring other systems to be available.
See also:
- Y-Axis, Functional Decomposition
- SCS Architecture — scs-architecture.org
Serial Fraction
- In Amdahl’s Law: the portion of a process that cannot be executed in parallel.
- In distributed systems: any shared resource (database, message broker, deployment pipeline, human). Determines the theoretical upper limit of scalability.
Service Time
The pure processing time of a request — without waiting time in the queue. Denoted as S in Kingman’s Formula. Every millisecond saved in service time has a double effect: once directly and once multiplicatively through the reduced waiting time.
Sharding
Horizontal partitioning of data by a sharding key (e.g. customer ID, region). The Z-Axis of the Scale Cube at the data layer. The choice of key determines the quality of the distribution — poor keys create hot spots.
Shared-Nothing
An architectural principle in which instances, services, or teams share no resources — no shared database, no shared cache, no shared code. A prerequisite for linear scaling. Applies at the technical level (service architecture) as well as at the organisational level (team structure).
Slack Time
Deliberately reserved capacity — the organisational equivalent of capacity buffers in thread pools or connection pools. Kingman provides the mathematical justification: the difference between 80% and 95% utilisation is not 15 percentage points, but a factor of 3–4 in waiting time. Slack time is not idle time — it is the investment in stable lead times and the ability to respond to the unexpected.
See also:
- WIP (Work in Progress), Saturation Point
- DeMarco (2001) — Slack. Broadway Books. Reinertsen (2009) — Principles of Product Development Flow. Celeritas Publishing.
Stateless Design
An architectural principle in which application instances hold no state — all state lives externally (database, cache, cookies). A prerequisite for X-Axis scaling, because every instance becomes interchangeable.
T
Team Topologies
A model by Skelton & Pais with four team types: Stream-Aligned Teams, Platform Teams, Enabling Teams, and Complicated-Subsystem Teams. Platform Teams provide tools and infrastructure; Enabling Teams transfer know-how — both without creating dependencies.
See also:
- Team-of-Teams, Loose Coupling
- Skelton & Pais (2019) — Team Topologies. IT Revolution.
Team-of-Teams
An organisational model from McChrystal: loosely coupled groups of teams with high internal cohesion and defined interfaces to the outside. Breaks the quadratic communication topology into a hierarchical one — direct communication within a group, between groups only through defined interfaces.
See also:
- Metcalfe’s Law, Loose Coupling
- McChrystal (2015) — Team of Teams. Portfolio/Penguin.
Theory of Constraints
Every system has exactly one bottleneck that limits throughput. Optimising anything else is waste. The Five Focusing Steps — Identify, Exploit, Subordinate, Elevate, Repeat — provide a method, not just an observation.
See also:
- Lock Contention
- Goldratt (1984) — The Goal. North River Press.
Thread Pool
A pool of pre-created threads that process incoming requests. Most of the time, threads are blocked waiting for I/O — “thread waiting room” would be the more honest name. Little’s Law sizes the pool: maximum throughput = pool size / response time.
Throughput
The average number of requests completed per second — denoted as λ (Lambda) in formulas. Describes what actually gets through, not what is requested. The series consistently uses “throughput” as the standard term.
U
USL (Universal Scalability Law)
Extension of Amdahl’s Law with a coherence parameter κ (crosstalk): C(n) = n / (1 + σ(n-1) + κ×n×(n-1)). The κ-term grows quadratically — past a certain point, throughput decreases rather than merely stagnating. Explains retrograde scaling in database lock contention and in organisations (Brooks’ Law).
See also:
- Gunther (1993) — Practical Performance Analyst. McGraw-Hill.
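The retrograde region can be made visible numerically; the values of σ and κ below are illustrative:

```python
def usl_throughput(n: int, sigma: float, kappa: float) -> float:
    """USL: C(n) = n / (1 + σ(n-1) + κ·n·(n-1))."""
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# σ = 0.05 (contention), κ = 0.001 (crosstalk): throughput peaks near n = 31
# and then declines - adding more units makes the system slower.
for n in (1, 10, 31, 100):
    print(f"n = {n:>3}: C(n) = {usl_throughput(n, 0.05, 0.001):.2f}")
```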
Utilisation
The proportion of a system’s capacity in use — denoted as ρ (Rho) in formulas. At 70% utilisation, a system typically runs stably; from 80–90%, waiting times explode non-linearly.
V
Variability
A measure of how much service times vary around their mean — denoted as V in Kingman’s Formula. V = 0 means constant times (no queue), V = 1 exponentially distributed times (worst case). The mix of cache hits and expensive queries produces high variability — and thus phantom traffic jams.
See also:
- Hockey-Stick Curve, Percentile
- → digression: Zipf’s Law and the Limits of Caching
Virtual Threads
Available since Java 21: lightweight threads that decouple processing capacity from the number of blocked OS threads. Changes the optimisation approach for thread pools, but not the fundamental message of Little’s Law — and nothing about the connection pool as bottleneck.
W
WIP (Work in Progress)
The number of simultaneously started but uncompleted tasks — the organisational counterpart to Concurrency (L) in Little’s Law. Teams fill waiting time with new projects, WIP rises, utilisation rises, waiting times explode — a vicious cycle. “Stop Starting, Start Finishing” breaks the cycle.
See also:
- Saturation Point
- Reinertsen (2009) — Principles of Product Development Flow. Celeritas Publishing. Anderson (2010) — Kanban. Blue Hole Press.
X
X-Axis
First dimension of the Scale Cube: duplication. Multiple identical instances behind a Load Balancer. Requires Stateless Design and scales linearly — as long as no shared resource becomes the bottleneck.
See also:
- Abbott & Fisher (2015a) — The Art of Scalability. 2nd ed. Addison-Wesley.
Y
Y-Axis
Second dimension of the Scale Cube: Functional Decomposition. Breaking down a system into independent services along business boundaries — each with its own data storage, its own deployment, its own team ownership. The cleanest transfer to the organisational level (Conway’s Law).
See also:
- SCS, Microservices
- Abbott & Fisher (2015a) — The Art of Scalability. 2nd ed. Addison-Wesley.
You Build It, You Run It
The principle that the team which develops a service also operates it. Eliminates the dependency between Dev and Ops and shortens feedback loops. Requires organisational investment in tooling, on-call support, and cultural change.
See also:
- Loose Coupling, Team Topologies
- Vogels (2006) — A Conversation with Werner Vogels. ACM Queue, 4(4).
Z
Z-Axis
Third dimension of the Scale Cube: data partitioning (Sharding). Requests or data are distributed across different partitions by a key (customer ID, region). Well-established at the data layer (NoSQL), rare at the application layer — because Z-Axis requires stateful routing, which contradicts Stateless Design (a prerequisite for X-Axis).
See also:
- Abbott & Fisher (2015a) — The Art of Scalability. 2nd ed. Addison-Wesley.
Zipf’s Law
The frequency of the kth most popular item is proportional to 1/k^α — a few items dominate, most are rare. Explains why caching works so spectacularly well (the top 1% generate 80% of traffic) and why it eventually isn’t enough: the Long Tail grows with traffic.
See also:
- Variability, Percentile
- → digression: Zipf’s Law and the Limits of Caching
- Breslau et al. (1999) — Web Caching and Zipf-like Distributions. IEEE INFOCOM 1999.
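The concentration can be checked with a small calculation; α = 1 and the catalogue size are illustrative (with α = 1 the top 1% capture roughly 60% of traffic, heavier skew pushes this towards the 80% cited above):

```python
# Zipf weights over a catalogue of 100,000 distinct items, α = 1:
N, alpha = 100_000, 1.0
weights = [1 / k**alpha for k in range(1, N + 1)]
total = sum(weights)

top_share = sum(weights[: N // 100]) / total   # traffic share of the top 1%
tail_share = sum(weights[N // 2:]) / total     # share of the rarest 50% of items
print(f"top 1% of items: {top_share:.0%} of traffic")
print(f"rarest 50% of items: {tail_share:.0%} of traffic")
```

The second number is the Long Tail: half of all distinct items together serve only a few percent of requests, and none of them cache well.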