Digression: G/G/1 vs. G/G/c — Kingman and Multiple Servers
Kingman’s formula describes the waiting time in a queue with a single server — formally, a G/G/1 system. However, a connection pool with 50 connections is not a single server. Neither is a thread pool with 200 threads. So why do we use the G/G/1 formula?
Why the overestimation doesn’t matter
Kingman’s predictions for systems with multiple servers are not just slightly off; they overestimate waiting times dramatically. With 50 parallel connections and utilisation at 80%, the actual waiting time is far below Kingman’s single-server prediction.
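How large the gap is can be sketched with a common multi-server correction: Sakasegawa’s G/G/c approximation, the form used in the VUT equation in Factory Physics, which replaces Kingman’s $\frac{\rho}{1-\rho}$ with $\frac{\rho^{\sqrt{2(c+1)}-1}}{c(1-\rho)}$. The concrete numbers below (c = 50, ρ = 0.8) are illustrative assumptions, not measurements:

```python
import math

def wait_factor(rho: float, c: int) -> float:
    """Utilisation factor of the mean queueing delay.

    For c = 1 this reduces to Kingman's rho / (1 - rho); for c > 1 it is
    Sakasegawa's approximation rho^(sqrt(2(c+1)) - 1) / (c * (1 - rho)).
    """
    return rho ** (math.sqrt(2 * (c + 1)) - 1) / (c * (1 - rho))

rho = 0.8
single = wait_factor(rho, c=1)    # 4.0 -- Kingman's rho / (1 - rho)
pooled = wait_factor(rho, c=50)   # roughly 0.013

print(f"G/G/1 factor:   {single:.3f}")
print(f"G/G/50 factor:  {pooled:.4f}")
print(f"overestimation: {single / pooled:.0f}x")
```

At 80% utilisation the naive single-server factor overshoots the 50-server factor by a couple of orders of magnitude, which is exactly why Kingman must not be read as a millisecond forecast for a pool.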
However, two factors put this into perspective.
The bottleneck is almost always G/G/1. A connection pool with 50 connections is a parallel system, but the database behind it has a single writer thread, a single lock on the hot row and a single buffer pool. While the system has many parallel servers around the bottleneck, the bottleneck itself is a single shared resource. This is the core insight of the Theory of Constraints: The throughput of the system is determined by its narrowest point, which is typically G/G/1.
In Part 1, the service doesn’t collapse because the thread pool is too small (c = 200), but because the connection pool with its ten connections is exhausted — as is the database with its shared resources behind it.
The levers remain the same. Whether G/G/1 or G/G/c: reduce variability, limit utilisation and optimise service time. The hockey-stick curve has the same shape for G/G/c, just shifted to the right — the steep rise begins at higher utilisation. Adding more parallel servers increases capacity, but does not change the fundamental behaviour. Halving variability halves the waiting time for both G/G/1 and G/G/c.
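The variability lever can be checked directly against Kingman’s G/G/1 approximation $W_q \approx \frac{c_a^2 + c_s^2}{2} \cdot \frac{\rho}{1-\rho} \cdot \tau$. The service time of 70 ms matches the Little’s Law example below; the utilisation and the squared coefficients of variation are illustrative assumptions:

```python
def kingman_wait(rho: float, ca2: float, cs2: float, tau: float) -> float:
    """Kingman's G/G/1 approximation for the mean queueing delay."""
    return (ca2 + cs2) / 2 * rho / (1 - rho) * tau

base = kingman_wait(0.8, ca2=1.0, cs2=1.0, tau=0.070)
halved = kingman_wait(0.8, ca2=0.5, cs2=0.5, tau=0.070)

# The variability term enters linearly: halving both squared
# coefficients of variation halves the predicted waiting time.
print(f"base:   {base * 1000:.0f} ms")
print(f"halved: {halved * 1000:.0f} ms")
```

Because waiting time is linear in $\frac{c_a^2 + c_s^2}{2}$, this halving works the same way regardless of the server count, only the utilisation factor in the middle changes between G/G/1 and G/G/c.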
Two tools, two roles
Kingman doesn’t provide an exact millisecond estimate for a thread pool with 200 threads. But that is not what we need it for.
Little’s Law calculates. $L = \lambda \times W$ holds for every stable system, whether it has one server or fifty. There is no approximation or restriction. If you want to know how many database connections a service needs, perform the calculation: 500 requests per second × 70 ms service time = 35 concurrent connections. Exact.
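The connection-pool calculation from the paragraph above, written out as a two-line sketch:

```python
arrival_rate = 500       # requests per second (lambda)
service_time_ms = 70     # time each request holds a connection (W)

# Little's Law: L = lambda * W (with W converted to seconds)
in_flight = arrival_rate * service_time_ms / 1000
print(in_flight)  # 35.0 -> the pool needs ~35 concurrent connections
```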
Kingman explains. Why do response times increase so much at 85% utilisation? Because the factor, which is given by the formula $\frac{\rho}{1-\rho}$, grows non-linearly: at 50% utilisation, the factor is 1; at 85% utilisation, it’s 5.7; and at 95% utilisation, it’s 19. What does halving variability achieve? It reduces waiting time by exactly half. Kingman demonstrates which lever has the stronger effect and why.
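The non-linearity of the utilisation factor is easy to reproduce; the three utilisation levels are the ones quoted above:

```python
def utilisation_factor(rho: float) -> float:
    """The rho / (1 - rho) term from Kingman's formula."""
    return rho / (1 - rho)

for rho in (0.50, 0.85, 0.95):
    print(f"{rho:.0%} utilisation -> factor {utilisation_factor(rho):.1f}")
```

Between 85% and 95% utilisation, a 10-point increase more than triples the factor, which is the steep end of the hockey stick.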
Little provides the arithmetic: exactly how much. Kingman provides the mechanics: why, in which direction and in what proportions. Together, they provide everything needed for capacity decisions: Little the dimensioning, Kingman the understanding.
Sources
- Kingman (1961) — The single server queue in heavy traffic. Mathematical Proceedings of the Cambridge Philosophical Society, 57(4).
- Erlang (1917) — Solution of some problems in the theory of probabilities of significance in automatic telephone exchanges. Elektroteknikeren, 13.
- Hopp & Spearman (2011) — Factory Physics. 3rd ed. Waveland Press. Ch. 8.