Compute & Moore's Law

From 2,300 transistors in the 1971 Intel 4004 to 100 billion+ in a modern AI accelerator, transistor counts have doubled roughly every two years. FLOPS per dollar of supercomputing have improved ~100× per decade for 70 years. AI training compute has been growing ~4–5× per year since 2010, far faster than Moore's Law alone can deliver.
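To compare these rates on a common footing, it helps to convert annual growth factors to doubling times and back. A minimal sketch, using only the figures quoted above:

```python
import math

def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double at a growth of `annual_factor` per year."""
    return math.log(2) / math.log(annual_factor)

def annual_factor(doubling_years: float) -> float:
    """Annual growth factor implied by a given doubling time."""
    return 2 ** (1 / doubling_years)

# Moore's Law at a two-year doubling is ~1.41x per year.
print(f"2-year doubling -> {annual_factor(2):.2f}x per year")
# AI training compute at ~4.5x per year doubles roughly every 5-6 months.
print(f"4.5x per year   -> {doubling_time_years(4.5):.2f} years per doubling")
```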

2,300 – transistors in the Intel 4004 (1971)
100B+ – transistors in a modern AI accelerator
~50 – years of Moore's-Law-style scaling
4–5×/yr – AI training-compute growth (post-2010)

Key insights

📐 Moore's Law is slowing, not stopping

From 1965 to ~2010, transistor density doubled every ~24 months (Moore's Law). Density growth has slowed to ~3 years per doubling since 2015 as feature sizes approach atomic limits. But chip performance hasn't slowed proportionally — gains now come from architectural changes (chiplets, 3D stacking, specialised accelerators), not pure scaling. Cost per transistor stopped falling around 2014; performance per dollar continues to improve but more slowly.
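The cumulative effect of that slowdown compounds quickly. A small sketch of the decade-scale arithmetic (assuming clean exponentials, which real process roadmaps are not):

```python
def density_gain(years: float, doubling_years: float) -> float:
    """Density multiple accumulated over `years` at a fixed doubling time."""
    return 2 ** (years / doubling_years)

# Over one decade: ~32x at the classic 2-year cadence...
print(f"2-year doubling over 10 years: {density_gain(10, 2):.0f}x")
# ...but only ~10x at the post-2015 ~3-year cadence.
print(f"3-year doubling over 10 years: {density_gain(10, 3):.0f}x")
```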

💡 AI compute is the new exponent

From 2010 to 2024, the compute used to train frontier AI models grew ~4–5× per year — far faster than Moore's Law (~1.4×/year). The gains came from parallel scaling (more GPUs in clusters) and architectural improvements (Transformer, mixed-precision training). GPT-4 training is estimated at ~2.1×10²⁵ FLOPs; Llama 3.1 405B at ~3.8×10²⁵; rumoured GPT-5/Llama-4-scale training at 10²⁶ and above.
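Estimates like these typically come from the common rule of thumb for dense transformers: total training compute ≈ 6 × parameters × training tokens (forward plus backward pass). A sketch that reproduces the Llama 3.1 405B figure; the ~15.6 trillion token count is from Meta's published training details, but treat the exact inputs as assumptions:

```python
def training_flops(params: float, tokens: float) -> float:
    """Common dense-transformer estimate: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens

# Llama 3.1 405B: 405e9 parameters, ~15.6e12 training tokens.
flops = training_flops(405e9, 15.6e12)
print(f"{flops:.2e} FLOPs")  # ~3.79e+25, matching the ~3.8e25 estimate above
```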

🏭 Capex and energy are the new constraints

Top-end fabs (TSMC 3nm/2nm, Samsung 3nm) cost $20–30 billion each. EUV machines from ASML cost $200M per unit. Data-centre electricity demand from AI is doubling every 2 years. The constraint on compute scaling is shifting from chip-density physics to fab construction, machine availability, and grid interconnection — all of which scale slowly even with capital.

Transistor count per chip — landmark processors 1971–2024

(Chart: transistors per chip, log scale)

Key Finding: Roughly 40,000,000× increase over 53 years (about 25 doublings). Recent flagship AI accelerators reach 100+ billion transistors.

Training compute of notable AI models 2012–2024

(Chart: estimated training FLOPs per model, log scale)

Key Finding: Frontier training compute has grown roughly 4–5× per year for 12 years — vastly outpacing Moore's Law alone.

Methodology & caveats

Moore's Law in its three meanings

(1) The original 1965 statement: transistor counts double every year. (2) The 1975 revision: every two years. (3) The marketing meaning: 'computers get exponentially better'. (1) and (2) refer to a specific count on a specific cost basis; (3) bundles density, frequency, parallelism and architecture. The first two have slowed; the third remains broadly true but at a slower rate.
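As a sanity check, meanings (1) and (2) can be run forward from the 4004 (illustrative arithmetic only): annual doubling overshoots reality by roughly eight orders of magnitude, while the 1975 two-year cadence lands close to today's chips.

```python
START_TRANSISTORS = 2_300   # Intel 4004, 1971
YEARS = 53                  # 1971 -> 2024

# Meaning (1): doubling every year -- wildly overshoots reality.
annual = START_TRANSISTORS * 2 ** YEARS
# Meaning (2): doubling every two years -- lands near 100B-transistor chips.
biennial = START_TRANSISTORS * 2 ** (YEARS / 2)

print(f"annual doubling:   {annual:.1e} transistors")    # ~2e19
print(f"biennial doubling: {biennial:.1e} transistors")  # ~2e11
```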

FLOPS — peak vs sustained

Theoretical peak FLOPS = number of execution units × frequency × ops/cycle. Sustained FLOPS on real workloads is typically 30–70% of peak. AI training uses mixed precision (FP16/BF16/FP8) — quoted FLOPS often refer to the lowest-precision rate, which can be 4–8× higher than FP32 peak. Compare AI compute numbers carefully — apples and oranges are common.
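A sketch of that arithmetic for a hypothetical accelerator; every spec below is invented for illustration and matches no real part:

```python
# Hypothetical accelerator -- illustrative numbers only.
fp32_units    = 10_000   # FP32 execution lanes
clock_ghz     = 1.5      # sustained clock, GHz
ops_per_cycle = 2        # a fused multiply-add counts as 2 FLOPs
fp16_speedup  = 8        # tensor-core-style low-precision multiplier

fp32_peak = fp32_units * clock_ghz * 1e9 * ops_per_cycle   # FLOPS
fp16_peak = fp32_peak * fp16_speedup

utilisation = 0.4  # real training workloads often sustain 30-70% of peak
print(f"FP32 peak:      {fp32_peak:.1e} FLOPS")  # 3.0e13 (30 TFLOPS)
print(f"FP16 peak:      {fp16_peak:.1e} FLOPS")  # 2.4e14 (240 TFLOPS)
print(f"FP16 sustained: {fp16_peak * utilisation:.1e} FLOPS")
```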

Cost per FLOPS

FLOPS per dollar has improved roughly 100× per decade since 1950, though the rate has slowed in recent decades as fixed costs (packaging, system integration, cooling) take a growing share of system price. For AI-relevant FP16/BF16 compute, recent per-dollar gains have been faster than for general-purpose FP32/FP64, helped by reduced precision and specialised tensor units. But total spend on AI compute is rising ~3× per year, far outpacing per-unit cost decline.
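Combining the two trends explains the headline growth rate: effective compute acquired per year is roughly spend growth times per-dollar improvement. A rough sketch using the figures above:

```python
SPEND_GROWTH_PER_YEAR = 3.0       # total AI-compute spend, from above
COST_DECLINE_PER_DECADE = 100.0   # FLOPS-per-dollar improvement, from above
cost_decline_per_year = COST_DECLINE_PER_DECADE ** 0.1   # ~1.58x per year

effective_growth = SPEND_GROWTH_PER_YEAR * cost_decline_per_year
print(f"{effective_growth:.1f}x per year")  # ~4.8x -- consistent with the
# ~4-5x/yr frontier training-compute growth quoted earlier in this section.
```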