Who Closes the Loop
Loss generally occurs when a player overrates his advantage.
Factor Prime exhibits a recursive property: computation that improves the production of computation. But recursion at a steady rate is not the same as recursion at an accelerating rate. The question that matters most for the trajectory of the transition is how fast the loop can tighten, and what determines its limits.
The recursive property—the capacity of computation to reduce the cost of producing additional computation—is not categorically new. Previous key inputs also participated in feedback loops. Steam power improved coal extraction via pumps, rail, and mechanized hauling. Electricity improved its own generation and delivery via control systems that optimized grids. Oil improved oil production via petrochemicals and better drilling rigs. The loops existed. Humans closed them.
What differs is who closes the loop.
The Pacing Function
Recursion matters only insofar as it changes the pacing function. In prior paradigms, the binding constraint was human: observation, hypothesis, design, prototyping, coordination. Factor Prime shifts a growing portion of those steps into computation. If the share becomes large enough, progress becomes compute-paced rather than human-paced. That is the regime change this section tests.
Call recursion strong when a rising share of the innovation pipeline is executed by computation, so that capability gain per calendar month rises with deployed inference throughput. Throughout, "capability gain" means improvement on a fixed, deployment-relevant evaluation basket at fixed inference constraints—latency, cost, reliability—so that the metric cannot be gamed by shifting goalposts. A useful proxy is the recursive share (ρ): the fraction of AI R&D work—code generation, evaluation, data synthesis, architecture search, debugging—performed by models and tools rather than humans. When ρ is low, progress is human-paced. When ρ is high, progress becomes compute-paced. The empirical question is where we are on that spectrum and how fast we are moving.
ρ is not directly observable. Artifacts are proxies for work: we infer "share of work" from verified outputs weighted by the human effort they displace. ρ can be approximated by the share of model-generated artifacts that survive downstream verification without human rewrite: accepted commits, retained evaluation suites, shipped workflow policies, experiment plans executed end-to-end, and PPA (power/performance/area) improvements validated in tape-out outcomes. ρ is a weighted share: an accepted commit counts more than a lint fix; a retained eval suite counts more than a prompt rewrite. The weight is verification cost avoided or time-to-ship avoided. For ρ to indicate recursion rather than fashion, it must predict out-of-sample iteration speed and validated improvement rates after controlling for team size and spend.
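As a minimal sketch, the weighting scheme above can be made concrete. The artifact categories and weight values here are hypothetical, chosen only to illustrate the estimator, not measured figures:

```python
# Illustrative estimator for the recursive share (rho): a
# verification-weighted fraction of R&D artifacts produced by models.
# Weights stand in for verification cost avoided (e.g., reviewer-hours);
# all values below are hypothetical.

def recursive_share(artifacts):
    """artifacts: list of (ai_generated: bool, weight: float)."""
    total = sum(w for _, w in artifacts)
    ai = sum(w for is_ai, w in artifacts if is_ai)
    return ai / total if total else 0.0

work = [
    (True, 4.0),   # accepted commit surviving review without rewrite
    (True, 0.2),   # lint fix (low weight)
    (False, 6.0),  # human-authored evaluation suite
    (True, 3.0),   # retained model-generated eval suite
    (False, 5.0),  # human experiment plan
]
print(round(recursive_share(work), 3))  # → 0.396
```

The point of the weighting is visible in the example: the lint fix barely moves ρ, while the retained eval suite does.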
Ideas as Inputs to Idea Production
Paul Romer formalized the insight that knowledge differs from ordinary goods (Paul M. Romer, "Endogenous Technological Change," Journal of Political Economy 98, no. 5 (1990): S71–S102). Physical goods are rivalrous: if one firm uses a machine, another cannot use it at the same time. Knowledge is non-rivalrous: if one firm uses a formula, another can use it too, without diminishing the first firm's use. This non-rivalry means returns to knowledge can be increasing rather than diminishing.
Romer's framework explains why economies that invest in research can grow faster than economies that do not. But it treats the production of knowledge as separate from the production of goods: an R&D sector produces ideas, a goods sector uses them. The two are linked, but they remain distinct, and human researchers occupy both.
Factor Prime compresses this separation. The tools that produce knowledge are themselves computational and increasingly automated. A neural network trained to generate code can write code that trains neural networks. A chip design tool can design the next generation of chip design tools. The output of the process is an input to the process.
The key question is not whether ideas are non-rival, but whether the production function for ideas becomes compute-intensive rather than labor-intensive. Rising ρ is the mechanism. Scale increases the size of training runs; recursion changes the rate at which the process discovers better ways to train, evaluate, and deploy.
Growth Regimes
Charles Jones introduced a framework for distinguishing growth regimes (Charles I. Jones, "Growth and Ideas," in Handbook of Economic Growth, ed. Philippe Aghion and Steven N. Durlauf (Elsevier, 2005), 1063–1111). In semi-endogenous models, the rate of idea production depends on the number of researchers and the existing stock of knowledge, but each additional idea is harder to find than the last. Growth converges to a steady rate determined by population growth or other exogenous factors. The feedback loop exists, but it is damped; the system is stable. In fully endogenous models, ideas beget ideas without diminishing returns; the production of knowledge can accelerate until some other constraint binds. The feedback loop is undamped; the system can exhibit superlinear dynamics over a range.
The empirical question is which regime we are in. If Factor Prime production is semi-endogenous, then the current acceleration will eventually level off. The recursion provides a tailwind, but human researchers remain the pacing constraint, and human population growth is slow. If Factor Prime production is closer to fully endogenous over the relevant range, then the acceleration can continue until physical limits—energy, materials, heat dissipation, coordination costs—become binding.
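The contrast between the two regimes can be expressed with the standard idea-production function from this literature, Ȧ = δ·L^λ·A^φ: when φ < 1 the loop is damped (semi-endogenous), and when φ > 1 it accelerates until an external constraint binds. A minimal simulation with illustrative parameter values makes the difference visible:

```python
# Sketch of the Jones idea-production function A_dot = delta * L**lam * A**phi.
# phi < 1: each idea is harder to find, so the growth rate of A falls.
# phi > 1: ideas beget ideas faster than they get harder to find.
# Parameter values are illustrative, not calibrated.

def growth_path(phi, steps=50, A=1.0, delta=0.01, L=100.0, lam=1.0, dt=0.1):
    rates = []
    for _ in range(steps):
        A_dot = delta * L**lam * A**phi
        rates.append(A_dot / A)   # instantaneous growth rate of the stock
        A += A_dot * dt           # Euler step
    return rates

damped = growth_path(phi=0.5)
explosive = growth_path(phi=1.1)
print(damped[-1] < damped[0])        # → True (growth rate falls)
print(explosive[-1] > explosive[0])  # → True (growth rate rises)
```

The damped path's growth rate declines as the knowledge stock grows; the φ > 1 path's growth rate itself increases, which is the signature of fully endogenous dynamics over the simulated range.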
Why History May Stop Being a Guide
William Nordhaus addressed this question directly in a 2021 paper titled "Are We Approaching an Economic Singularity?" (William D. Nordhaus, "Are We Approaching an Economic Singularity? Information Technology and the Future of Economic Growth," American Economic Journal: Macroeconomics 13, no. 1 (2021): 299–332). He examined the conditions under which technological progress could accelerate without limit. His conclusion: the historical record shows no evidence of such acceleration. Growth rates have fluctuated but have not systematically increased over the past century. The returns to R&D investment have not risen. A regime shift would likely appear first in micro-metrics of discovery and iteration, long before it appears in aggregate GDP growth.
Nordhaus did not rule out the possibility. The key variable, in his analysis, is whether the automation of innovation itself becomes possible. If machines can contribute to invention, and if the machines that invent can be improved by the machines they help create, then the feedback loop can tighten beyond any historical precedent. The historical record would cease to be a reliable guide.
This is the question Factor Prime raises. The recursion is not new. What may be new is the degree to which computation, rather than human cognition, closes the loop.
Previous paradigms preserved a human pacing function. This one may not. The rate of improvement could depend on machine effort, and machine effort can scale in ways that human effort cannot.
If the pacing function shifts, governance lag becomes a first-order variable: institutions trained on human-cycle times will tend to underreact.
Most technological revolutions preserve a human pacing function. The rate of improvement in steam engines depended on how fast human engineers could design, test, and build better engines. The rate of improvement in automobiles depended on how fast human designers and factory workers could iterate on the product. Humans set the tempo, and the tempo was limited by human cognition and coordination.
Factor Prime tightens the computational portions of these loops. Consider a concrete closed loop: a model proposes an architecture change, data augmentation, or evaluation refinement; automated training and evaluation runs execute the proposal; the model ingests the results and updates its proposal distribution toward changes that improve the fixed basket under the constraint envelope. The loop executes end-to-end without a human choosing each experiment. Cycle times compress from months to days or hours where computation substitutes for cognition. Parallelism increases: thousands of experiments can run simultaneously. The feedback bandwidth can rise dramatically, even if key decision points still require human oversight and even if physical portions—fab construction, power plant permitting, grid interconnection—remain human-paced.
Cognitive loops may become machine-paced while physical loops remain human-paced. The observed regime will be a hybrid until infrastructure catches up.
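A toy version of the closed loop described above, with entirely synthetic uplift values, illustrates the mechanism: proposals that survive automated verification both improve the baseline and tilt future proposals toward themselves. No human chooses the experiments.

```python
# Hypothetical propose-evaluate-update loop. Candidate types, their
# "true" uplifts, and the noise model are all synthetic illustrations.
import random

random.seed(0)

CANDIDATES = ["arch_change", "data_augmentation", "eval_refinement"]
TRUE_UPLIFT = {"arch_change": 0.02, "data_augmentation": 0.05,
               "eval_refinement": 0.01}  # synthetic expected gains

weights = {c: 1.0 for c in CANDIDATES}   # proposal distribution (unnormalized)
baseline = 0.50                          # score on the fixed evaluation basket

for _ in range(200):
    total = sum(weights.values())
    c = random.choices(CANDIDATES, [weights[k] / total for k in CANDIDATES])[0]
    measured = TRUE_UPLIFT[c] + random.gauss(0, 0.01)  # noisy automated eval run
    if measured > 0:                     # verified improvement: ship it
        baseline += measured * 0.01
        weights[c] *= 1.1                # tilt proposals toward what worked

best = max(weights, key=weights.get)
print(best)
```

The rich-get-richer update is the "tightening" in miniature: cycle time here is one iteration, and nothing in the loop waits on a human decision.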
Separating Three Productivity Claims
Three distinct claims about productivity must be separated to avoid confusion:
Compute efficiency: capability per training compute dollar. This measures hardware and algorithmic progress. It has improved rapidly, but it does not directly measure whether recursion is tightening.
Research productivity: capability gain per researcher-hour or per dollar of R&D spending. This measures whether the human bottleneck is loosening. The historical pattern shows declining research productivity across most fields. If AI reverses this trend in AI research itself, that would be evidence for regime shift.
Growth rate dynamics: whether the rate of improvement is itself increasing, stable, or decreasing. This is Nordhaus's ultimate test. Semi-endogenous dynamics produce high but stable growth rates; fully endogenous dynamics produce accelerating growth rates.
The recursion hypothesis predicts a specific pattern: output should shift from being headcount-limited to being compute-limited. If the hypothesis is correct, research productivity per researcher-hour should rise as ρ increases, because computation is substituting for human cognitive bottlenecks.
Empirical Tests
This framing yields specific tests. The claim that Factor Prime is shifting the growth regime implies observable consequences, sequenced from mechanism to macro outcome:
Recursive share (ρ). If the recursion is tightening, ρ should rise over time in frontier labs and should correlate with iteration speed. Organizations with higher ρ should produce more capability gain per calendar month. The claim fails if ρ stays flat, and more compute merely enables bigger training runs rather than faster discovery—in that case, the recursion is not automating the relevant bottleneck.
Decision latency. If computation is substituting for human cognition in the improvement loop, the time from anomaly detection to hypothesis to patch to verified improvement should compress. This is the cycle time of research decisions, not just model generations. If the median time-to-verified-fix does not fall, the human-paced portions of the loop remain dominant.
Compute-to-discovery elasticity. If the recursion is computational rather than human-limited, validated improvement events per unit time (a statistically significant uplift on the fixed basket at constant constraints, surviving an OOD slice) should scale with compute provisioned rather than with senior researcher count. If validated improvements remain proportional to headcount regardless of compute investment, the recursion has not shifted the pacing constraint.
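This test can be stated as a regression: estimate the elasticities of validated improvements with respect to compute and to headcount across lab-years, and ask which dominates. A sketch on fabricated data follows; the 0.8 and 0.1 elasticities are assumptions of the synthetic world, not measurements:

```python
# Log-log regression of validated improvement counts on compute and
# headcount. Data are fabricated to illustrate the estimator only.
import numpy as np

rng = np.random.default_rng(42)
n = 60
log_compute = rng.uniform(0, 5, n)
log_headcount = rng.uniform(0, 3, n)
# Synthetic world: compute elasticity 0.8, headcount elasticity 0.1.
log_discoveries = (0.8 * log_compute + 0.1 * log_headcount
                   + rng.normal(0, 0.1, n))

X = np.column_stack([np.ones(n), log_compute, log_headcount])
beta, *_ = np.linalg.lstsq(X, log_discoveries, rcond=None)
print(beta[1] > beta[2])  # → True: a compute-paced regime in this toy world
```

In the real test, a compute elasticity that dominates headcount elasticity (after controls) would indicate the pacing constraint has shifted; proportionality to headcount regardless of compute would indicate it has not.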
Physical constraint binding. If the recursion is strong, physical constraints—energy, chips, cooling, interconnect—should bind before cognitive constraints. Investment should flow toward infrastructure rather than toward hiring researchers. If talent remains the binding constraint and infrastructure is abundant, the recursion has not shifted the bottleneck.
Growth rate acceleration. Nordhaus's ultimate test. If the recursion is producing fully endogenous dynamics, the growth rate of relevant metrics—capability, efficiency, deployment—should itself be increasing, not merely high. If growth rates are high but stable, the dynamics remain semi-endogenous.
Current Evidence
The evidence points in the predicted direction, but the measurement window is short and confidence is bounded.
Compute efficiency has improved rapidly. Organizations are exploring far more of the design space per unit time. Whether this has already reversed researcher-hour productivity is not yet settled. Cycle times between major model generations have compressed. Parallelism has scaled dramatically; frontier labs run thousands of experiments simultaneously. Physical constraints are increasingly cited as binding; infrastructure investment is outpacing researcher hiring at the margin.
But the growth rate of capability has not obviously accelerated. Scaling laws suggest predictable improvement with compute, not accelerating improvement. The laws could steepen or saturate at any point; they are empirical regularities, not physical necessities. The recursion creates the possibility of acceleration. The evidence does not yet confirm it.
The uncertainty is genuine, and the framework does not claim to resolve it. What can be said is this: Factor Prime introduces a structural possibility that previous paradigms did not. If computation can contribute to its own improvement, then the pacing function that has governed all previous technological transitions may shift. The rate of progress would depend not only on human effort but on machine effort, and machine effort can scale in ways that human effort cannot.
Whether it will scale, and how far, remains an open question. Energy must be generated, transmitted, and dissipated. Chips must be fabricated, packaged, and connected. Data centers must be sited, powered, and cooled. Coordination costs rise with scale. Regulatory and social constraints may bind before physical ones do. The recursive property creates the possibility of acceleration; physics, economics, and institutions will determine the outcome.
Calibrating ρ
The recursive share can now be estimated from primary sources: earnings calls, controlled studies, developer surveys, and frontier lab disclosures. The evidence suggests ρ has grown roughly 5–7× from 2022 to 2025, triangulated across three independent measurement channels: code generation acceptance rates from telemetry (GitHub, "Copilot Metrics and Research" (2025)), developer adoption surveys (Stack Overflow, "Stack Overflow Annual Developer Survey 2025" (2025); JetBrains, "The State of Developer Ecosystem 2024" (2024)), and autonomous task-completion benchmarks (METR, "Measuring AI Agent Task Horizons" (2025)). The channels track different facets of ρ, so consistency across them is more informative than any single signal in isolation.
Capability benchmarks set the ceiling on what ρ can become: tasks that models cannot complete cannot contribute to the recursive share. On SWE-bench, which measures end-to-end resolution of real GitHub issues, success rates rose from under 2% for unscaffolded GPT-4 (2023) to approximately 49% for Claude 3.5 Sonnet with agent scaffolding (late 2024) to above 60% for frontier agentic systems in early 2025 (Jimenez et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?," ICLR 2024). On HumanEval, a function-level code generation benchmark, pass@1 rates climbed from roughly 48% (GPT-3.5, 2023) to 67% (GPT-4, 2023) to above 90% for late-2024 frontier models (Chen et al., "Evaluating Large Language Models Trained on Code," arXiv:2107.03374 (2021)). The trajectory is steep, and some previously demanding benchmarks now appear to be nearing saturation for top systems.
Code generation dominates the measurable signal. GitHub Copilot generates 46% of code in enabled files (61% for Java), with an 88% retention rate for accepted suggestions (GitHub, "Copilot Metrics and Research" (2025)). Frontier labs report dramatically higher internal figures. Anthropic's CEO stated at Dreamforce in September 2025 that "90% of code is written by AI models" at Anthropic and partner companies, though engineers remain essential for supervision. Google confirmed over 25% of new code is AI-generated (Pichai, Q3 2024 earnings). OpenAI reports 95% weekly usage among engineers with 70% more pull requests merged.
METR's research suggests task horizons are roughly doubling every seven months. Their data (METR, "Measuring AI Agent Task Horizons" (2025)) shows the time horizon at which models complete 50% of tasks has grown from approximately 30 seconds (GPT-2, 2019) to roughly one hour (Claude 3.7 Sonnet, early 2025). The doubling rate has accelerated to approximately four months in 2024–2025. Extrapolation suggests day-long task horizons by 2027 and week-long horizons by 2028, though extrapolation from steep curves is hazardous.
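The extrapolation is simple arithmetic once a starting horizon and doubling period are fixed. The sketch below takes the roughly one-hour early-2025 horizon and the four-month doubling period from the text at face value; it inherits all the hazards of extrapolating a steep curve:

```python
# Back-of-envelope task-horizon extrapolation under a fixed doubling
# period. Starting point and doubling rate are taken from the text.

def horizon_hours(years_ahead, start_hours=1.0, doubling_months=4.0):
    doublings = years_ahead * 12 / doubling_months
    return start_hours * 2**doublings

# From ~1 hour in early 2025:
print(horizon_hours(2))  # → 64.0 hours by early 2027 (multi-day tasks)
print(horizon_hours(3))  # → 512.0 hours by early 2028 (multi-week tasks)
```

At a four-month doubling period, two years is six doublings and three years is nine, which is where the day-long and week-long figures in the text come from.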
Developer adoption has reached majority penetration. Stack Overflow surveys (Stack Overflow, "Stack Overflow Annual Developer Survey 2025" (2025)) show developers using or planning to use AI tools rose from 70% (2023) to 84% (2025). Daily usage among professional developers hit 51% in 2025. Enterprise adoption trails but accelerates: Gartner projects 90% of enterprise engineers will use AI code assistants by 2028, up from under 14% in early 2024.
Observed ρ trajectory:
| Year | ρ (Lower) | ρ (Central) | ρ (Upper) | Key Evidence |
|---|---|---|---|---|
| 2022 | 2% | 3–5% | 8% | Copilot launch, 26% acceptance, limited enterprise |
| 2023 | 5% | 8–10% | 15% | 44% developer adoption, early enterprise pilots |
| 2024 | 8% | 12–15% | 22% | 62% adoption, 25%+ Google code AI-generated |
| 2025 | 12% | 18–22% | 35% | 80% adoption, 90% at frontier labs, 51% daily use |
Projected ρ trajectory (if trends continue):
| Year | ρ (Lower) | ρ (Central) | ρ (Upper) | Key Evidence |
|---|---|---|---|---|
| 2026 | 20% | 28–35% | 45% | Day-long task horizons emerging |
| 2027 | 30% | 40–50% | 60% | Week-long task horizons |
Central estimates weight code generation share (30–46% × 88% retention × 60% of R&D time per Epoch AI, 2024) plus contributions from architecture search, debugging, and evaluation. Lower bounds assume significant verification overhead; upper bounds include AI assistance for search, documentation, and ideation.
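The central-estimate arithmetic can be reproduced directly. The small additive allowance for non-coding contributions (architecture search, debugging, evaluation) is a hypothetical placeholder, since the text does not quantify it separately:

```python
# Reproducing the central-estimate arithmetic: code generation share
# x retention x fraction of R&D time spent coding, plus a hypothetical
# allowance ("other") for non-coding contributions.

def rho_central(code_share, retention=0.88, coding_time=0.60, other=0.02):
    return code_share * retention * coding_time + other

low = rho_central(0.30)   # 0.30 * 0.88 * 0.60 + 0.02 ≈ 0.178
high = rho_central(0.46)  # 0.46 * 0.88 * 0.60 + 0.02 ≈ 0.263
print(round(low, 3), round(high, 3))  # → 0.178 0.263
```

The resulting 18–26% band brackets the 2025 central estimate in the table; the lower and upper bounds in the table come from varying the verification-overhead and scope assumptions rather than this core product.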
Verification overhead persists. The 88% retention rate implies 12% rejection. Experienced developers work 19% slower with current AI tools on their own repositories (METR, "Measuring AI Agent Task Horizons" (2025)), cautioning against conflating benchmark capability with effective automation. The slowdown is not anomalous: experienced developers have deep mental models of their codebases that AI tools cannot access, so the tools add friction rather than removing it in familiar territory. Trust in AI accuracy has declined from 43% (2024) to 29% (2025) even as adoption rises—developers use the tools despite recognizing their limitations. The top frustration: "AI solutions almost right but not quite" (66% of respondents).
Recursive self-improvement is observable but unquantified. Anthropic has stated that Claude is used "to help build products on top of Claude and to help train the next Claude." This represents evidence of ρ feeding into capability improvement, though no quantitative metrics are publicly available and the causal contribution of model-generated work to capability gains has not been isolated from concurrent human-directed improvements. OpenAI has set internal targets: an "automated AI research intern by September 2026" and "true automated AI researcher by March 2028."
The trajectory fits a logistic curve in its steepening phase. Inflection points may arrive when day-long task horizons materialize, and again if automated AI researchers become operational. The uncertainty remains wide—but ρ is rising, and the rate of rise is itself increasing.
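The logistic shape referenced above can be sketched with illustrative parameters; the midpoint year and rate below are assumptions for exposition, not fitted estimates:

```python
# Illustrative logistic curve: near-exponential increments in the
# steepening phase, then saturation. Parameters are assumptions.
import math

def logistic(t, ceiling=1.0, midpoint=2027.0, rate=0.9):
    return ceiling / (1 + math.exp(-rate * (t - midpoint)))

years = [2022, 2025, 2027, 2030]
print([round(logistic(y), 3) for y in years])  # → [0.011, 0.142, 0.5, 0.937]
```

On a curve like this, an observer in the steepening phase sees ρ rising and the rate of rise increasing, exactly as the current data show, yet the same observations are consistent with eventual saturation. Only the location of the ceiling distinguishes the two readings.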
If you believe these numbers are wrong, specify which ones and why. That is more useful than dismissing the framework.
A strong confirmation would look like: ρ rising, time-to-verified-improvement falling, output shifting from headcount scaling to compute scaling, binding constraints moving to power, chips, and interconnect, and growth rates in these micro-metrics increasing rather than merely staying high. That is the pattern that would distinguish a regime shift from an unusually productive phase of normal science. The existence proof follows.