Modeling Moore's Law:
two models of faster than exponential growth

Hermann Brunner

01/2005

Historical computing speed data, covering the time period from 1891 to 2004, were fit to models of (1) hyperbolic and (2) double-exponential growth. Model (1) predicts a mathematical singularity around the year 2041 (90 % confidence range: 2032-2052), with computing-speed doubling times of currently approximately one year, decreasing in proportion to the time left until the singularity (Figs. 1, 3, 5). Model (2) predicts exponentially decreasing doubling times with an e-folding time of about 68 years (Figs. 2, 4). Both models fit the data about equally well. An exponential model with constant doubling times, the classical Moore's Law, can be excluded with very high statistical significance. As additional computing speed data become available over the next several years, it should become possible to exclude either model (1) or model (2) on statistical grounds.

Introduction

Moore's Law, in its original 1965 incarnation, referred to the doubling time of the number of transistors that can be put onto one square inch of silicon chip. Over the years, it has taken on a variety of different meanings, with the emphasis shifting from transistors to computing speed. The following discussion is based on a generalized version of Moore's Law which studies the increase of the computing speed (expressed in MIPS) that can be bought at a fixed price (1000 inflation-corrected year-2000 US Dollars). This is the definition used by Hans Moravec in his 1999 book Robot as well as by Ray Kurzweil in his book The Age of Spiritual Machines, published the same year, and I make use of the compilation of computing speed and price data from Hans Moravec's web site (post-2003 data added by me). The data were inflation-corrected using consumer price index data from the Department of Labor, Bureau of Labor Statistics. Hans Moravec's compilation covers a wide variety of mechanical and electro-mechanical computing devices, as well as vacuum-tube, discrete-electronics, and modern silicon-chip based computers.

While most versions of Moore's Law claim a time-constant doubling time of either transistors per chip or computing speed, there appears to be evidence of a shortening of the doubling time of computing speed through most of the twentieth century, as technology proceeded, through a sequence of steps, from mechanical computing devices to modern state-of-the-art computers (see plot from Hans Moravec's book). The classical Moore's Law with constant doubling times can be expressed as an exponential law, P(t) ~ exp(t), resulting in a straight line on a plot of logarithmic computing speed over linear time. Different mathematical descriptions are needed to describe the observed faster than exponential growth. I will discuss two models, (1) hyperbolic and (2) double-exponential growth:
(1) P(t) ~ (t0-t)^-n 
and
(2) P(t) ~ exp(exp((t-t0)/n))
Function (1) is a time-honored model of faster than exponential growth; see the seminal 1960 Science paper by Heinz von Foerster, Patricia Mora, and Larry Amiot (vol. 132, pp. 1291-1295) in which they apply their so-called coalition growth model to world population growth. Double-exponential growth was proposed by Ray Kurzweil in his book and on his web site (The Law of Accelerating Returns). The main difference between hyperbolic and double-exponential growth is that, as t approaches t0, P(t) of function (1) grows beyond all bounds and a mathematical singularity is reached at t=t0. Function (2) also steepens faster than an exponential, with decreasing doubling times, but continues to do so forever.

The von Foerster et al. model is based on the assumption that the population growth rate, dP/Pdt, is a function of the already achieved population size, dP/Pdt ~ f(P), and one can think of a number of reasons why this might be the case. Similarly, applying this argument to Moore's Law, the assumption is made that the growth rate of computing power is a function of the already available computing power. Different growth models arise, depending on the choice of function f: a power-law dependence, dP/Pdt ~ P^m, has the solution P(t) ~ (t0-t)^-n with n=1/m, i.e., function (1) above. In the special case m=0, dP/Pdt ~ 1 leads to exponential growth, P(t) ~ exp(t). Assuming a logarithmic dependence, dP/Pdt ~ log(P), instead of a power law, the resulting time dependence is given by function (2), P(t) ~ exp(exp((t-t0)/n)). An important distinction between model (1) and model (2) is that the growth law of model (1) is scale-invariant or self-similar, i.e., it has no inherent time scale. Note that parameter n of model (1) is dimensionless, whereas n of model (2) is an e-folding time; model (2) is therefore not scale-invariant.
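As a numerical sanity check of these derivations, the following short Python sketch (not part of the original analysis; the parameter values a, b, m, and P0 are arbitrary illustrative choices) integrates the two growth-rate assumptions and compares the results with the closed forms (1) and (2):

  # integrate dP/Pdt = a*P^m and dP/Pdt = b*log(P) numerically and compare
  # with the closed-form solutions; all parameter values are illustrative
  import numpy as np
  from scipy.integrate import solve_ivp

  m, a = 0.5, 0.05                # model (1): dP/Pdt = a * P**m
  b = 0.2                         # model (2): dP/Pdt = b * log(P)
  P0 = 2.0                        # computing power at t = 0 (arbitrary units)
  t = np.linspace(0.0, 10.0, 200)

  sol1 = solve_ivp(lambda t, P: a * P**(m + 1), (0, 10), [P0], t_eval=t, rtol=1e-8)
  sol2 = solve_ivp(lambda t, P: b * P * np.log(P), (0, 10), [P0], t_eval=t, rtol=1e-8)

  n1 = 1.0 / m                    # exponent n of model (1)
  t0_1 = n1 / (a * P0**m)         # singularity time of model (1)
  P1 = P0 * (1.0 - t / t0_1)**(-n1)        # C * (t0-t)^-n, matched to P(0) = P0

  t0_2 = -np.log(np.log(P0)) / b  # time offset of model (2), with n = 1/b
  P2 = np.exp(np.exp(b * (t - t0_2)))      # exp(exp((t-t0)/n)), matched to P(0) = P0

  print(np.max(np.abs(np.log(sol1.y[0]) - np.log(P1))))   # both differences ~ 0
  print(np.max(np.abs(np.log(sol2.y[0]) - np.log(P2))))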

The doubling times, T2, can be calculated from dlog(P)/dt = log(2)/T2. In the case of model (1), T2 is proportional to the time left until the singularity is reached, while for model (2) the doubling time decreases exponentially with e-folding time n, resulting in a straight line in a plot of logarithmic doubling time over linear time. As n of model (2) approaches infinity, the limiting case of exponential growth with constant doubling times is recovered. See Table 1, below, for an overview of the different functions.

Table 1

model                 differential equation  time dependence            doubling time
-----------------------------------------------------------------------------------------------------
original Moore's Law  dP/Pdt = constant      P(t) ~ exp(t)              T2 = constant
(1)                   dP/Pdt ~ P^m           P(t) ~ (t0-t)^-n; n=1/m    T2 = log(2)/n * (t0-t)
(2)                   dP/Pdt ~ log(P)        P(t) ~ exp(exp((t-t0)/n))  T2 = n*log(2) * exp((t0-t)/n)
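For illustration, the doubling-time expressions of Table 1 can be written as small Python helpers (hypothetical code, not from the original analysis). Evaluated with the best-fit parameters listed in Table 2 below, both models give a present-day doubling time of roughly one year:

  import numpy as np

  def t2_hyperbolic(t, t0, n):
      # model (1): doubling time shrinks linearly with the time left until t0
      return np.log(2) / n * (t0 - t)

  def t2_double_exp(t, t0, n):
      # model (2): doubling time shrinks exponentially with e-folding time n
      return n * np.log(2) * np.exp((t0 - t) / n)

  # with the best-fit values of Table 2:
  #   t2_hyperbolic(2005, 2041, 26.5)  -> about 0.9 years
  #   t2_double_exp(2005, 1747, 68.2)  -> about 1.1 years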

Chi-square fitting

With the above definitions out of the way, let us now discuss whether function (1) or (2) actually fits the observed historical increase of computing speed. For this purpose, I performed chi-square fits of both functions to the data compiled by Hans Moravec. Statistical errors were assigned to the data based on the scatter between neighboring points: for each data point, the error was set to the standard deviation, within a group of 15 neighboring data points, of the residuals of an initial fit performed with constant weights. This procedure slightly overestimates the statistical errors and thus underestimates the best-fit chi-square values; the resulting probabilities may therefore be regarded as conservative when considering the exclusion of particular models based on the quality of the fit. Since P(t) covers about 15 orders of magnitude, both the error determination and the fitting were performed on a logarithmic scale (base-e logarithm) in order to avoid numerical problems in the chi-square minimization code. A standard Levenberg-Marquardt algorithm was used to minimize the chi-square values for the following fitting functions:
(1) P(t) = C * (t0-t)^-n
(2) P(t) = C * exp(exp((t-t0)/n))
with the free parameters C, t0, and n. Results are summarized in Table 2 (best-fit values of scaling parameter C are omitted).
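The following Python sketch outlines how such a fit could be reproduced with standard tools; it assumes arrays year and log_mips holding the calendar year and the base-e logarithm of MIPS per 1000 year-2000 dollars from Moravec's compilation (hypothetical variable names and starting values; this is not the code actually used for the fits, only an illustration of the procedure described above). Model (2) is fit analogously.

  import numpy as np
  from scipy.optimize import curve_fit   # Levenberg-Marquardt for unbounded problems

  def log_model1(t, logC, t0, n):        # log of C * (t0-t)^-n
      return logC - n * np.log(t0 - t)

  # 1) initial fit with constant weights (rough, illustrative starting guesses)
  p1, _ = curve_fit(log_model1, year, log_mips, p0=[100.0, 2040.0, 25.0])

  # 2) errors from the scatter of the residuals in groups of 15 neighboring points
  resid = log_mips - log_model1(year, *p1)
  sigma = np.empty_like(resid)
  for i in range(len(resid)):
      lo, hi = max(0, i - 7), min(len(resid), i + 8)
      sigma[i] = np.std(resid[lo:hi])

  # 3) weighted fit and chi-square
  p1, _ = curve_fit(log_model1, year, log_mips, p0=p1, sigma=sigma)
  chi2 = np.sum(((log_mips - log_model1(year, *p1)) / sigma) ** 2)
  print(p1, chi2, len(year) - 3)         # best-fit parameters, chi-square, dof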

Table 2

model  best-fit    degrees of  probability  best-fit parameters    doubling time
       chi-square  freedom     (a)          t0 [yrs]  n            [yrs] (b)
-------------------------------------------------------------------------------------
(1)    143.82      133         0.246        2041      26.5         (2041-t)/38.2 
(2)    143.21      133         0.257        1747      68.2 [yrs]   exp((2010-t)/68.2)

(a) probability of the model fitting the data
(b) calculated from best-fit parameters
Since most of the statistical weight is provided by the data points after 1950, qualitatively similar results are obtained if one restricts the analysis to electronic computers (vacuum-tube, discrete-transistor, and silicon-chip based), neglecting the early data points of mechanical and electro-mechanical computing devices.

An attempt to fit the data with a simple exponential, i.e., the classical Moore's Law with constant doubling times, resulted in a best-fit chi-square of 231.21 with 134 degrees of freedom, corresponding to a probability of 3.7e-7, thus excluding this model with very high statistical significance. Limiting the analysis to post-1950 data, constant doubling times are excluded with a statistical significance of 99.92 % (chi-square=161.62 with 109 degrees of freedom).
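For reference, the probabilities quoted above follow directly from the chi-square survival function; a short check with scipy (assumed available) for each case:

  from scipy.stats import chi2

  print(chi2.sf(143.82, 133))   # model (1): about 0.25
  print(chi2.sf(143.21, 133))   # model (2): about 0.26
  print(chi2.sf(231.21, 134))   # simple exponential, all data: of order 4e-7
  print(chi2.sf(161.62, 109))   # simple exponential, post-1950 data: about 8e-4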

Statistical errors for the best-fit parameters t0 and n were determined by calculating the chi-square values for a grid of t0 and n values (fitting parameter C at each grid point) and taking the contours at the minimum chi-square + 4.61 and minimum chi-square + 9.21 (the delta chi-square values for 2 degrees of freedom, corresponding to two free parameters of interest) as delimiting the 90 % and 99 % confidence areas. The 90 % and 99 % confidence ranges for t0 and n are listed in Table 3.
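A minimal sketch of this grid scan for model (1), assuming the same hypothetical year, log_mips, and sigma arrays as in the fitting sketch above (the grid ranges are illustrative), could look as follows:

  import numpy as np

  t0_grid = np.linspace(2025.0, 2070.0, 200)
  n_grid = np.linspace(20.0, 36.0, 200)
  w = 1.0 / sigma**2

  chi2_grid = np.empty((len(t0_grid), len(n_grid)))
  for i, t0 in enumerate(t0_grid):
      for j, n in enumerate(n_grid):
          shape = -n * np.log(t0 - year)                       # model (1) without log(C)
          logC = np.sum(w * (log_mips - shape)) / np.sum(w)    # best-fit C at this grid point
          chi2_grid[i, j] = np.sum(w * (log_mips - shape - logC) ** 2)

  # 90 % / 99 % regions for two parameters of interest:
  region_90 = chi2_grid <= chi2_grid.min() + 4.61
  region_99 = chi2_grid <= chi2_grid.min() + 9.21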

Table 3

model  t0 confidence range  n confidence range
----------------------------------------------
(1)    90%: 2032-2052       90%: 22.9-31.3
       99%: 2030-2058       99%: 21.8-34.0
(2)    90%: 1704-1782       90%: 60-78
       99%: 1680-1794       99%: 57-83

Fig. 1 (model 1) and Fig. 2 (model 2) display the range of model curves corresponding to the 99 % confidence area of the model parameters, plotted on top of the data points and error bars. The corresponding doubling times are shown in Fig. 3 (model 1) and Fig. 4 (model 2). Fig. 5 shows the 90 % and 99 % error contours for model (1). Since parameters t0 and n are strongly correlated, resulting in elongated (banana-shaped) error contours, n is expressed in terms of the largely model-independent 1980 doubling time, T2_1980 = log(2)/n * (t0-1980), instead.
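For example, with the best-fit values of model (1) from Table 2, this re-parameterization gives (a derived, illustrative number, not quoted in the tables above):

  import numpy as np
  T2_1980 = np.log(2) / 26.5 * (2041 - 1980)   # about 1.6 years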

Discussion

Based on the best-fit chi-square values, the fits of both function (1) and (2) are statistically acceptable, with function (2) fitting the data slightly better. However, the difference is not large enough to favor either function on purely statistical grounds. Exponential growth with constant doubling times can be excluded with very high statistical significance. According to model (1), computing power reaches a singularity somewhere between 2030 and 2058, with a best-fit value of 2041. Parameter t0 of model (2) roughly coincides with the beginning of the industrial age, with most of the faster than exponential growth occurring after that time. The e-folding time, n, of the doubling time is about 68 years - one human lifetime. Model (2) thus seems to capture some of the expected behavior of technological growth.

Are computing speeds really on track towards a singularity in the near future? Since one can construct arbitrarily steep functions that do not have a singularity (e.g. by going from exp(exp(t)) to exp(exp(exp(t))) and so on, adding an arbitrary number of additional exponential functions), it will not be possible to decide this question through model fitting alone. However, it may still turn out that, as time proceeds, function (1) remains the only reasonably simple function fitting the data, thus favoring model (1) by Occam's Razor. The self-similar growth of model (1), as well as scale-invariant power laws in general, arises under a wide range of conditions in nature, thus perhaps also favoring model (1) over model (2). World population closely followed the von Foerster et al. model for millennia. It stopped doing so around 1975, presumably because population density had reached levels where the simple model assumptions no longer applied. Computing speeds may repeat this pattern, closely following model (1) until final physical limits or economic constraints force a break in the trend. Close inspection of the data shows a slight flattening of the curve after about 1998. It remains to be seen whether this is a temporary deviation from the long-term trend or a first indication of the end of faster than exponential growth.

If, due to less steeply rising computing speeds, model (1) ceases to be an acceptable fit to the data, with the data perhaps following model (2) instead, we would be saved from the embarrassment of near-infinite computing speeds within mere decades. Figs. 1, 2, 3, and 4 suggest that it should be possible to distinguish between models (1) and (2) within the next several years, probably not later than 2010. I intend to update the above model fits as additional data become available in coming years. So, come back to this page to see how the story unfolds ...