Gustafson's Law – When the Problem Grows With the Resources

A deep dive accompanying the Scalability series.

Amdahl’s Law is common knowledge in computer science: parallelisation has a hard ceiling. Its more optimistic counterpart is Gustafson’s Law, which interprets the same calculation differently. But first things first.

The obvious hope is that more processors will lead to proportionally more speed. Anyone distributing a computation from one core to ten would want it back ten times faster. In practice, however, the speedup almost always falls short of this, and beyond a certain point, each additional core brings hardly any benefit. The reason for this lies in the serial part of the work, the part that simply cannot be divided up.

This is precisely what Amdahl’s Law captures, delivering an uncomfortable message: the serial fraction caps the maximum acceleration. With a 10% serial fraction, the maximum is 10, no matter how many processors are added (covered in detail in Amdahl’s Law and the Limits to Growth). This may sound depressing, but in 1988, John Gustafson argued that, while the calculation is correct, it is misleading.

The core critique

Gustafson worked at the Sandia National Laboratories on massively parallel machines with over 1,000 processors, and he often achieved greater acceleration than Amdahl’s law would allow.

Amdahl’s model applies to a fixed problem: a developer has existing code and the problem size is fixed; more processors are intended to reduce execution time. It is precisely at this point that the cap imposed by the serial fraction takes hold.

However, Gustafson had a different problem in mind. Those running weather simulations do not want to solve the same problem faster with more processors, but rather a larger problem: a finer grid, more data points and higher resolution in the same computation time. The problem size is not fixed; it grows with the resources available. Under this assumption, scaling looks entirely different.

The formula

Gustafson formulated this observation as a measure of scaled speedup, which is defined as the amount of work managed by a parallel system compared to a single processor:

\[S(n) = n - s \times (n - 1) \quad\text{or}\quad S(n) = s + n \times (1 - s)\]
  • $n$ is the number of processors.
  • $s$ is the time fraction the serial part takes up on the parallel system – not the share of the overall problem, but the share of the actual execution time.
  • $S(n)$ is the scaled speedup.

The crucial detail lies in the definition of $s$. For Amdahl, $s$ is the proportion of the problem that must remain serial, which is an inherent property of the algorithm independent of the number of processors. In contrast, for Gustafson, $s$ is the proportion of the measured runtime taken up by the serial code. As the problem and the parallel part grow proportionally, $s$ shrinks as $n$ increases.

A climate simulation, for example, requires an initialisation period of one hour (serial), followed by the actual computation (parallel). With 100 processors, the computation also takes one hour, meaning the serial time fraction is $s = 0.5$, and the scaled speedup is $S(100) = 100 - 0.5 \times 99 = 50.5$. Increasing the number of processors tenfold to 1,000 still results in a one-hour computation, but now on a grid with ten times as many points. $s$ remains 0.5 and $S(1000) = 1000 - 0.5 \times 999 = 500.5$. The speedup grows almost linearly with the number of processors.

For the same $s = 0.5$, Amdahl would have stopped at a factor of $1/s = 2$. This contradiction is resolved through the assumption: Amdahl’s factor of two applies when the initialisation time of one hour is a fixed proportion of the overall problem. However, as soon as the parallel part grows with the number of processors, that same hour becomes relatively smaller and smaller – and the scaled speedup pulls away.

Where the calculation works out

The clearest examples of this come from high-performance computing. For example, a weather simulation uses more processors to achieve a finer spatial grid: the number of grid points increases linearly with the number of processors, while the time taken for initialisation (loading data and setting boundary conditions) remains almost constant. This results in almost linear speedup, as predicted by Gustafson.

The same pattern repeats with rendering. A render farm does not distribute one frame across all nodes; rather, each frame is sent to its own node. More nodes mean that the same film can be produced in a higher resolution or with more detail in the same time, rather than a single frame in a fraction of the time. In genome sequencing, more compute nodes do not accelerate the sequencing of a single genome; rather, they process more genomes in parallel. Each node works independently on a different dataset, without coordinating with the others. Such problems, which require no coordination between units, are called ‘embarrassingly parallel’.

The pattern is the same in all three cases: the parallel part grows proportionally to the number of processors, while the serial part – initialisation, coordination and merging the results – either remains absolutely constant or grows much more slowly.

Where it breaks down

The formula hinges on an easily overlooked assumption: the serial part must remain absolutely constant as the number of processors increases. In HPC examples, this is plausible – one hour of initialisation remains one hour, regardless of whether 100 or 10,000 processors are computing.

However, this assumption breaks down as soon as the coordination between units grows with $n$. If each new processor brings additional communication overhead as well as its parallel part, then $s$ is no longer constant – it rises with $n$, and the optimistic forecast collapses.

This is precisely what occurs in distributed systems that do not employ the shared-nothing pattern. Cache invalidation, distributed locks and consensus algorithms all create coordination costs that grow with the number of nodes. Gustafson’s approach is therefore of no further help here; Gunther’s Universal Scalability Law (see Amdahl’s Law and the Limits to Growth) models this case, with its quadratically growing coherency term κ capturing precisely the costs that Gustafson ignores.

Amdahl or Gustafson – when does which apply?

The two laws are not competitors; they answer different questions. Amdahl’s law applies when the problem size is fixed, such as a search service designed to handle 1,000 requests per second with a response time of 50 ms. In this case, the key question is how much parallelism can help. Gustafson applies when the problem size grows alongside the available resources, such as in a data analysis on a cluster where more nodes mean more data rather than faster processing.

For most web systems, Amdahl is clearly the better option. The load comes from outside and cannot be freely chosen. No one says, ‘We now have ten servers, so let’s double the number of pages.’ Users either come or they don’t – the question is always whether capacity is sufficient. This scaling under an externally imposed load is precisely what my series on the scalability of web systems is about.

However, Gustafson’s optimistic forecast hinges on a constant serial fraction, which does not stay constant under load. More concurrent users mean more coordination around shared resources, such as a shared database or locks on the same records, and this serial fraction grows with the load instead of shrinking as more servers are added. Therefore, Gustafson is not a counterargument to Amdahl, but rather a reminder of its limits and of conditions that are rarely met in practice on their own.


References

  • Gustafson (1988) – Reevaluating Amdahl’s Law. Communications of the ACM, 31(5), 532–533.
  • Amdahl (1967) – Validity of the single processor approach to achieving large scale computing capabilities. AFIPS Spring Joint Computer Conference Proceedings, Vol. 30, 483–485.
  • Gunther (2007)Guerrilla Capacity Planning. Springer.

Comments