Digression: G/G/1 vs. G/G/c — Kingman and Multiple Servers

Kingman’s formula describes the waiting time in a queue with a single server — formally, a G/G/1 system. However, a connection pool with 50 connections is not a single server. Neither is a thread pool with 200 threads. So why do we use the G/G/1 formula?

Why the overestimation doesn’t matter

For systems with multiple servers, Kingman’s single-server formula is not just slightly off; it grossly overestimates the waiting time. With 50 parallel connections and utilisation at 80%, the actual waiting time is far below Kingman’s prediction.
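How far off? A rough sketch using the multi-server form of the equation from Hopp & Spearman’s Factory Physics (which reduces to Kingman’s G/G/1 formula for c = 1). The concrete numbers here — 70 ms service time, Ca² = Cs² = 1 — are illustrative assumptions, not measurements:

```python
import math

def ggc_wait(rho, c, ca2, cs2, service_time):
    """Waiting-time approximation for G/G/c in the VUT form
    (Hopp & Spearman, Factory Physics, ch. 8).
    For c = 1 this is exactly Kingman's G/G/1 formula."""
    util_term = rho ** (math.sqrt(2 * (c + 1)) - 1) / (c * (1 - rho))
    return ((ca2 + cs2) / 2) * util_term * service_time

# Illustrative numbers: 80% utilisation, Ca^2 = Cs^2 = 1, 70 ms service time.
w_single = ggc_wait(0.80, 1, 1.0, 1.0, 70.0)   # Kingman: pool as one server
w_pool   = ggc_wait(0.80, 50, 1.0, 1.0, 70.0)  # 50 parallel connections
print(f"G/G/1 prediction: {w_single:.0f} ms")
print(f"G/G/50 estimate:  {w_pool:.2f} ms")
```

With 50 servers, the estimated wait drops from hundreds of milliseconds to well under one — the G/G/1 prediction is off by more than two orders of magnitude.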

However, two factors put this into perspective.

The bottleneck is almost always G/G/1. A connection pool with 50 connections is a parallel system, but the database behind it has a single writer thread, a single lock on the hot row and a single buffer pool. While the system has many parallel servers around the bottleneck, the bottleneck itself is a single shared resource. This is the core insight of the Theory of Constraints: the throughput of the system is determined by its narrowest point, and that point is typically G/G/1.

In Part 1, the service doesn’t collapse because the thread pool is too small (c = 200), but because the connection pool with its ten connections is exhausted — as is the database with its shared resources behind it.

The levers remain the same. Whether G/G/1 or G/G/c: reduce variability, limit utilisation and optimise service time. The hockey-stick curve has the same shape for G/G/c, just shifted to the right — the steep rise begins at higher utilisation. Adding more parallel servers increases capacity, but does not change the fundamental behaviour. Halving variability halves the waiting time for both G/G/1 and G/G/c.
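The variability claim can be checked directly: in both Kingman’s formula and its G/G/c extensions, the term $(C_a^2 + C_s^2)/2$ enters as a plain multiplicative factor, so halving it halves the waiting time regardless of the server count. A minimal sketch using the Factory Physics G/G/c form (the 80% utilisation and 70 ms service time are illustrative assumptions):

```python
import math

def ggc_wait(rho, c, ca2, cs2, service_time):
    """G/G/c waiting-time approximation in the Factory Physics VUT form;
    c = 1 gives Kingman's G/G/1 formula."""
    util_term = rho ** (math.sqrt(2 * (c + 1)) - 1) / (c * (1 - rho))
    return ((ca2 + cs2) / 2) * util_term * service_time

# Halving the variability term halves the wait, for one server
# and for fifty alike (illustrative numbers).
for c in (1, 50):
    full = ggc_wait(0.8, c, 1.0, 1.0, 70.0)
    half = ggc_wait(0.8, c, 0.5, 0.5, 70.0)
    print(c, full / half)  # ratio is 2.0 in both cases
```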

Two tools, two roles

Kingman doesn’t provide an exact millisecond estimate for a thread pool with 200 threads. However, an exact estimate is not what we use the formula for.

Little’s Law calculates. $L = \lambda \times W$ holds for every stable system, whether it has one server or fifty. There is no approximation or restriction. If you want to know how many database connections a service needs, perform the calculation: 500 requests per second × 70 ms service time = 35 concurrent connections. Exact.
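The calculation above, as a sketch (units chosen so the arithmetic stays exact):

```python
def littles_law(arrival_rate, time_in_system):
    """L = lambda * W: mean number in the system.
    Holds for any stable system, independent of the number of servers."""
    return arrival_rate * time_in_system

# 500 requests/s = 0.5 requests/ms; 70 ms mean service time.
connections = littles_law(0.5, 70)
print(connections)  # 35.0 concurrent connections
```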

Kingman explains. Why do response times increase so much at 85% utilisation? Because the factor, which is given by the formula $\frac{\rho}{1-\rho}$, grows non-linearly: at 50% utilisation, the factor is 1; at 85% utilisation, it’s 5.7; and at 95% utilisation, it’s 19. What does halving variability achieve? It reduces waiting time by exactly half. Kingman demonstrates which lever has the stronger effect and why.
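The non-linearity is easy to tabulate; this sketch just evaluates the factor $\frac{\rho}{1-\rho}$ at the utilisations mentioned above:

```python
def kingman_utilisation_factor(rho):
    """The rho / (1 - rho) term from Kingman's formula."""
    return rho / (1 - rho)

for rho in (0.50, 0.85, 0.95):
    factor = kingman_utilisation_factor(rho)
    print(f"{rho:.0%} utilisation -> factor {factor:.1f}")
```

Between 50% and 95% utilisation the factor grows nineteen-fold — that is the steep part of the hockey stick.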

Little provides the arithmetic: exactly how much. Kingman provides the mechanics: why, in which direction and in what proportion. Together, they provide all the information needed for capacity decisions: Little provides the dimensioning and Kingman provides the understanding.


Sources

  • Kingman (1961) — The single server queue in heavy traffic. Mathematical Proceedings of the Cambridge Philosophical Society, 57(4).
  • Erlang (1917) — Solution of some problems in the theory of probabilities of significance in automatic telephone exchanges. Elektroteknikeren, 13.
  • Hopp & Spearman (2011) — Factory Physics. 3rd ed. Waveland Press. Ch. 8.
