The Room to Run
Energy limits speed. Entropy limits memory.
Physics permits computation far more efficient than anything we have built. The Landauer floor sits roughly six orders of magnitude below current practice. Seth Lloyd's ultimate physical limits lie thirty orders of magnitude beyond (Lloyd 2000). But these numbers belong to physics. The relevant ceiling is set by engineering constraints that bind long before thermodynamics does, and by economic constraints that bind before engineering. The question is whether the headroom translates into cheaper computation or merely into more expensive capability.
The more useful benchmark is biological. The human brain performs on the order of 10¹⁴ to 10¹⁵ operations per second while consuming roughly 20 watts, an energy cost per operation on the order of 10⁻¹⁴ to 10⁻¹³ joules (Attwell and Laughlin 2001; Carlsmith 2020). A modern GPU performs 10¹⁴ to 10¹⁵ floating-point operations per second while consuming 300 to 700 watts, an energy cost per operation on the order of 10⁻¹² joules. The mapping from synaptic events to floating-point multiplies is not well-defined, so the comparison is imprecise. But the order-of-magnitude difference in energy scales is real: evolution has achieved efficiencies that silicon has not approached, and the brain demonstrates that useful computation can occur at energy scales on the order of 10⁻¹⁴ joules per operation.
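The comparison reduces to dividing power by throughput. A back-of-envelope sketch, treating the operation rates as loose order-of-magnitude assumptions rather than measurements:

```python
# Back-of-envelope energy per operation: watts divided by operations per second.
# The throughput figures are rough order-of-magnitude assumptions, not measurements.

def joules_per_op(watts: float, ops_per_second: float) -> float:
    """Energy cost of a single operation in joules."""
    return watts / ops_per_second

# Human brain: ~20 W, ~1e14 to 1e15 "operations" per second (a loose notion).
brain_low = joules_per_op(20, 1e15)    # ~2e-14 J per operation
brain_high = joules_per_op(20, 1e14)   # ~2e-13 J per operation

# Modern GPU: 300-700 W, ~1e14 to 1e15 floating-point operations per second.
gpu_low = joules_per_op(300, 1e15)     # ~3e-13 J per FLOP
gpu_high = joules_per_op(700, 1e14)    # ~7e-12 J per FLOP

print(f"brain: {brain_low:.0e} to {brain_high:.0e} J/op")
print(f"gpu:   {gpu_low:.0e} to {gpu_high:.0e} J/op")
print(f"gap:   roughly {gpu_high / brain_low:.0f}x at the extremes")
```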
A millionfold gap sounds impossible to close, but it represents only about twenty doublings of efficiency. The historical trajectory delivered a doubling every 1.57 years through 2011, a pace that has since slowed to roughly every 2.29 years as gains shift from process scaling to architectural innovation (Koomey 2011). The slowdown reflects transistors approaching atomic scales and dominant energy costs shifting from switching logic to moving data: shuttling bits between memory and processor, keeping caches coherent, driving signals across chips and boards. But substantial headroom remains. The path is constrained less by thermodynamics than by interconnect, memory movement, fabrication, and the economics of deployment.
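Framing the gap as doublings makes the timescale concrete. A small sketch of that arithmetic, using the doubling times quoted above:

```python
import math

# How many efficiency doublings close a millionfold gap, and how long
# would they take at the historical and the current doubling pace?
gap = 1e6
doublings = math.log2(gap)  # ~19.9 doublings

for label, years_per_doubling in [("pre-2011 pace", 1.57), ("recent pace", 2.29)]:
    years = doublings * years_per_doubling
    print(f"{label}: {doublings:.1f} doublings x {years_per_doubling} yr = {years:.0f} years")
# At the slower pace, closing the full gap to the Landauer floor would take
# roughly 45 years of sustained improvement -- if engineering limits allowed it.
```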
The data-movement wall
The six-order-of-magnitude gap between current silicon and the Landauer floor is misleading as a measure of headroom. Most of the energy in modern computation is not dissipated in logic gates. It is dissipated in moving bits. Mark Horowitz's 2014 analysis quantified the asymmetry: in a 45nm process, a 32-bit floating-point multiply costs approximately 3.7 picojoules, but fetching the operand from off-chip DRAM costs approximately 640 picojoules (Horowitz 2014). The ratio is roughly 170 to one. The wall is data movement, not arithmetic.
Intel Labs' 2021 analysis of large DNN inference workloads found that data movement accounts for 62 percent of total energy. The figure has not improved with process-node advances. NVIDIA's H100 and B200 architectures address the bottleneck through HBM3e stacking, which shortens the path between memory and compute. The improvement is real but bounded. DRAM access remains orders of magnitude more expensive than logic, and stacking does not eliminate the cost of moving data across the memory hierarchy. Closing the logic gap without closing the memory gap yields diminishing returns. A processor that approaches Landauer efficiency in its arithmetic units still dissipates most of its energy in interconnect and memory access. The headroom implied by the Landauer comparison overstates what architectural improvements can deliver.
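A toy energy budget makes the asymmetry concrete. The sketch below applies the 45nm per-operation costs quoted above to a dot product whose operands stream in from DRAM; it is illustrative, not a model of any particular chip, and ignores caching and operand reuse:

```python
# Toy energy budget for a dot product of length n, using the 45nm figures
# quoted above (Horowitz 2014). Illustrative only: real chips amortize
# fetches through caches, batching, and reuse.
FP32_MULT_PJ = 3.7     # 32-bit floating-point multiply
DRAM_READ_PJ = 640.0   # 32-bit operand fetched from off-chip DRAM

def dot_product_energy_pj(n: int, operands_from_dram: int = 2) -> dict:
    compute = n * FP32_MULT_PJ                        # multiplies (adds ignored)
    movement = n * operands_from_dram * DRAM_READ_PJ  # operand fetches
    return {"compute_pj": compute, "movement_pj": movement,
            "movement_share": movement / (compute + movement)}

budget = dot_product_energy_pj(n=1_000_000)
print(budget)  # data movement is ~99.7% of the total when nothing is reused
```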
Engineering the gap
Architectural responses to the data-movement wall are emerging. IBM's NorthPole chip, announced in 2023, achieves roughly 12 times the energy efficiency of comparable 12nm architectures for neural inference by eliminating off-chip memory access entirely. Weights are stored on-chip. Intel's Loihi 2 and other neuromorphic approaches use event-driven computation that fires only when inputs change, reducing idle power. IBM's analog AI research uses phase-change memory devices to perform matrix multiplications in place, avoiding the von Neumann bottleneck. The work remains at the research stage but demonstrates the principle. Lightmatter's photonic interconnects move data between chiplets using light instead of electrons. If photonic I/O reaches production scale, it would address the data-movement wall directly.
The brain-to-silicon efficiency gap has narrowed over two decades. The factor-of-10 to factor-of-100 difference between biological and silicon efficiency today represents a substantial reduction from the gap of several orders of magnitude observed in the early 2000s. The improvement has come primarily from architectural rather than process-node advances. Koomey's Law has bent as well. The original doubling time of approximately 1.57 years through 2011 has slowed to roughly 2.29 years in more recent measurements. Gains continue, but the trajectory has shifted. The room to run remains real. The rate of improvement has declined.

None of these architectural innovations has yet reached the scale required to reshape aggregate datacenter energy consumption. NorthPole and Loihi serve niche applications. Analog AI and photonic interconnects remain in development. The dominant inference workload still runs on conventional GPUs with conventional memory hierarchies. The direction of travel is clear. The pace of adoption is uncertain.
The economic question is whether headroom becomes abundance. It does not, at least not automatically. Headroom describes the distance between current practice and physical possibility. It does not describe the relationship between efficiency gains and demand growth. If efficiency improves by a factor of two and demand grows by a factor of ten, total energy consumption rises fivefold even as cost per operation falls.
Headroom is not abundance. Efficiency gains will be absorbed by scale until infrastructure, not logic, binds.
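The interaction between efficiency and demand is a one-line calculation. A minimal sketch, using the illustrative factors from the preceding paragraph (efficiency doubles, demand grows tenfold):

```python
# Total energy = demand (operations) x energy per operation.
# Efficiency gains cut the second factor; demand growth multiplies the first.
def total_energy(baseline_energy: float, demand_growth: float, efficiency_gain: float) -> float:
    return baseline_energy * demand_growth / efficiency_gain

# Illustrative figures from the text: efficiency doubles, demand grows tenfold.
print(total_energy(baseline_energy=1.0, demand_growth=10.0, efficiency_gain=2.0))
# -> 5.0: consumption rises fivefold even as cost per operation halves.
```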
The scaling laws that have driven capability improvement in foundation models are empirical regularities. So far they have shown no sign of saturating. Larger models, trained on more data with more compute, continue to outperform smaller ones across a wide range of tasks. The capability threshold for competitive relevance keeps rising. Last year's frontier model is this year's commodity. Each generation of models consumes the efficiency gains of the previous generation and demands more. GPT-3 required approximately 1,300 MWh to train (Patterson et al. 2021). Subsequent frontier runs are widely reported to be at least an order of magnitude larger, and the next generation will require more still. The cost per token falls; the compute budget required to remain at the frontier rises faster.
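A rough sketch of that race, with the per-generation compute growth and the generation cadence stated as explicit assumptions rather than reported figures:

```python
# If each frontier generation needs ~10x the training compute of the last
# (an assumption for illustration; the text reports "at least an order of
# magnitude" for recent runs) while hardware efficiency doubles roughly
# every 2.29 years, the energy bill of staying at the frontier still grows.
GPT3_TRAINING_MWH = 1_300  # reported estimate (Patterson et al. 2021)

def frontier_energy_mwh(generations: int, compute_growth: float = 10.0,
                        years_per_generation: float = 2.0,
                        efficiency_doubling_years: float = 2.29) -> float:
    compute = compute_growth ** generations
    efficiency = 2 ** (generations * years_per_generation / efficiency_doubling_years)
    return GPT3_TRAINING_MWH * compute / efficiency

for g in range(4):
    print(f"generation +{g}: ~{frontier_energy_mwh(g):,.0f} MWh")
# Compute demand grows ~10x per generation; efficiency recovers only ~1.8x,
# so the net energy requirement climbs roughly 5-6x each generation.
```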
Economists have a name for the pattern: Jevons paradox. When efficiency improvements reduce the cost of a resource, consumption often increases enough to offset the savings. Coal consumption did not fall when steam engines became more efficient; it rose, because more applications became economical. The same logic applies to computation. As inference becomes cheaper, more tasks become worth automating; as training becomes cheaper, more experiments become worth running; as models become more capable, more ambitious applications become feasible. In an installation phase, efficiency is rarely banked; it is converted into scope.
The physical evidence is already visible. The International Energy Agency's April 2025 special report on energy and AI estimates that data centers consumed approximately 415 terawatt-hours of electricity in 2024, roughly 1.5 percent of global electricity demand, and projects growth to 945 terawatt-hours by 2030 (IEA 2025). The United States alone consumed 183 terawatt-hours in 2024 and is projected to reach 426 terawatt-hours by 2030. The range across scenarios extends from 700 to 1,700 terawatt-hours by 2035, depending on deployment pace and efficiency gains. The upper bound approaches Japan's annual electricity consumption. Hyperscaler capital expenditure committed for 2025 alone exceeds $320 billion.
ERCOT's large-load interconnection queue illustrates the infrastructure bottleneck. By December 2025, the queue held 238.6 gigawatts of pending requests, a roughly 300 percent increase over the prior year, with more than 70 percent attributable to data centers (ERCOT 2025). Only 7,502 megawatts had been approved to energize since January 2022. The gap between requested and delivered capacity is more than a factor of thirty. Inference has overtaken training as the dominant AI energy consumer, now accounting for 60 to 70 percent of total AI energy expenditure, a reversal from the 70 to 80 percent training share observed in 2020–2022.
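Two of the figures above reduce to simple ratios; the sketch below works them out from the numbers as reported:

```python
# ERCOT: pending large-load requests versus capacity actually energized.
requested_mw = 238_600  # pending queue, December 2025
energized_mw = 7_502    # approved to energize since January 2022
print(f"requested / energized ~= {requested_mw / energized_mw:.0f}x")  # ~32x

# IEA projection: implied compound annual growth in data-center electricity.
twh_2024, twh_2030 = 415, 945
cagr = (twh_2030 / twh_2024) ** (1 / 6) - 1
print(f"implied growth: ~{cagr:.1%} per year")                         # ~14.7%
```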
The physical constraints detailed in Part II (interconnection queues, transformer lead times, permitting battles) determine who can build capacity and how fast. The gap between announced capacity and deliverable capacity is measured in years. This is the constraint that determines who can scale and who cannot. Algorithms improve, chips shrink, and cost per operation falls, yet total energy consumption rises because the frontier absorbs the gains faster than they accumulate. Running requires power, and power requires physical infrastructure that cannot be conjured by software.
Computation will likely become cheaper per operation for decades to come, as the historical trajectory suggests. The relationship between efficiency and abundance is what matters, and that relationship is not automatic. Information may be copied cheaply. The bits that encode a trained model can be duplicated at negligible cost. But today's digital computation is purchased at the price of irreversibility, and the price, while falling, is not zero. The copy is downstream of an original expenditure that cannot be avoided. Every query to a language model, every frame rendered by a video generator, every decision made by an autonomous system requires joules, and those joules must come from somewhere.
This creates a tension that will shape the economics of the emerging regime. On one hand, the declining cost of computation makes intelligence more abundant and more accessible. Tasks that once required human expertise can be performed by machines at a fraction of the cost, and the range of such tasks is expanding. On the other hand, the physical basis of that intelligence—the energy, the chips, the cooling, the infrastructure—remains scarce and subject to constraints that do not yield to Moore's Law. The cloud is grounded in concrete and copper and silicon. The companies that control that ground will have advantages that do not reduce to software.