Who Closes the Loop
Loss generally occurs when a player overrates his advantage.
Factor Prime exhibits a recursive property: computation that improves the production of computation. But recursion at a steady rate is not the same as recursion at an accelerating rate. The question that matters most for the trajectory of the transition is how fast the loop can tighten, and what determines its limits.
The recursive property—the capacity of computation to reduce the cost of producing additional computation—is not categorically new. Previous key inputs also participated in feedback loops. Steam power improved coal extraction via pumps, rail, and mechanized hauling. Electricity improved its own generation and delivery via control systems that optimized grids. Oil improved oil production via petrochemicals and better drilling rigs. The loops existed. Humans closed them.
What differs is who closes the loop.
The Pacing Function
Recursion matters only insofar as it changes the pacing function. In prior paradigms, the binding constraint was human: observation, hypothesis, design, prototyping, coordination. Factor Prime shifts a growing portion of those steps into computation. If the share becomes large enough, progress becomes compute-paced rather than human-paced. That is the regime change this section tests.
Call recursion strong when a rising share of the innovation pipeline is executed by computation, so that capability gain per calendar month rises with deployed inference throughput. Throughout, "capability gain" means improvement on a fixed, deployment-relevant evaluation basket at fixed inference constraints—latency, cost, reliability—so that the metric cannot be gamed by shifting goalposts. A useful proxy is the recursive share (ρ): the fraction of AI R&D work—code generation, evaluation, data synthesis, architecture search, debugging—performed by models and tools rather than humans. When ρ is low, progress is human-paced. When ρ is high, progress becomes compute-paced. The empirical question is where we are on that spectrum and how fast we are moving.
ρ is not directly observable. Artifacts are proxies for work: we infer "share of work" from verified outputs weighted by the human effort they displace. ρ can be approximated by the share of model-generated artifacts that survive downstream verification without human rewrite: accepted commits, retained evaluation suites, shipped workflow policies, experiment plans executed end-to-end, and PPA (power/performance/area) improvements validated in tape-out outcomes. ρ is a weighted share: an accepted commit counts more than a lint fix; a retained eval suite counts more than a prompt rewrite. The weight is verification cost avoided or time-to-ship avoided. For ρ to indicate recursion rather than fashion, it must predict out-of-sample iteration speed and validated improvement rates after controlling for team size and spend.
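As a minimal sketch, the weighting scheme above can be made concrete. The artifact categories and weight values here are hypothetical, chosen only to illustrate the estimator, not measured figures:

```python
# Illustrative estimator for the recursive share (rho): a
# verification-weighted fraction of R&D artifacts produced by models.
# Weights stand in for verification cost avoided (e.g., reviewer-hours);
# all values below are hypothetical.

def recursive_share(artifacts):
    """artifacts: list of (ai_generated: bool, weight: float)."""
    total = sum(w for _, w in artifacts)
    ai = sum(w for is_ai, w in artifacts if is_ai)
    return ai / total if total else 0.0

work = [
    (True, 4.0),   # accepted commit surviving review without rewrite
    (True, 0.2),   # lint fix (low weight)
    (False, 6.0),  # human-authored evaluation suite
    (True, 3.0),   # retained model-generated eval suite
    (False, 5.0),  # human experiment plan
]
print(round(recursive_share(work), 3))  # → 0.396
```

The point of the weighting is visible in the example: the lint fix barely moves ρ, while the retained eval suite does.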
Ideas as Inputs to Idea Production
Paul Romer formalized the insight that knowledge differs from ordinary goods (Paul M. Romer, "Endogenous Technological Change," Journal of Political Economy 98, no. 5 (1990): S71–S102). Physical goods are rivalrous: if one firm uses a machine, another cannot use it at the same time. Knowledge is non-rivalrous: if one firm uses a formula, another can use it too, without diminishing the first firm's use. This non-rivalry means returns to knowledge can be increasing rather than diminishing.
Romer's framework explains why economies that invest in research can grow faster than economies that do not. But it treats the production of knowledge as separate from the production of goods: an R&D sector produces ideas, a goods sector uses them. The two are linked, but they remain distinct, and human researchers occupy both.
Factor Prime compresses this separation. The tools that produce knowledge are themselves computational and increasingly automated. A neural network trained to generate code can write code that trains neural networks. A chip design tool can design the next generation of chip design tools. The output of the process is an input to the process.
The key question is not whether ideas are non-rival, but whether the production function for ideas becomes compute-intensive rather than labor-intensive. Rising ρ is the mechanism. Scale increases the size of training runs; recursion changes the rate at which the process discovers better ways to train, evaluate, and deploy.
Growth Regimes
Charles Jones introduced a framework for distinguishing growth regimes (Charles I. Jones, "Growth and Ideas," in Handbook of Economic Growth, ed. Philippe Aghion and Steven N. Durlauf (Elsevier, 2005), 1063–1111). In semi-endogenous models, the rate of idea production depends on the number of researchers and the existing stock of knowledge, but each additional idea is harder to find than the last. Growth converges to a steady rate determined by population growth or other exogenous factors. The feedback loop exists, but it is damped; the system is stable. In fully endogenous models, ideas beget ideas without diminishing returns; the production of knowledge can accelerate until some other constraint binds. The feedback loop is undamped; the system can exhibit superlinear dynamics over a range.
The empirical question is which regime we are in. If Factor Prime production is semi-endogenous, then the current acceleration will eventually level off. The recursion provides a tailwind, but human researchers remain the pacing constraint, and human population growth is slow. If Factor Prime production is closer to fully endogenous over the relevant range, then the acceleration can continue until physical limits—energy, materials, heat dissipation, coordination costs—become binding.
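The contrast between the two regimes can be expressed with the standard idea-production function from this literature, Ȧ = δ·L^λ·A^φ: when φ < 1 the loop is damped (semi-endogenous), and when φ > 1 it accelerates until an external constraint binds. A minimal simulation with illustrative parameter values makes the difference visible:

```python
# Sketch of the Jones idea-production function A_dot = delta * L**lam * A**phi.
# phi < 1: each idea is harder to find, so the growth rate of A falls.
# phi > 1: ideas beget ideas faster than they get harder to find.
# Parameter values are illustrative, not calibrated.

def growth_path(phi, steps=50, A=1.0, delta=0.01, L=100.0, lam=1.0, dt=0.1):
    rates = []
    for _ in range(steps):
        A_dot = delta * L**lam * A**phi
        rates.append(A_dot / A)   # instantaneous growth rate of the stock
        A += A_dot * dt           # Euler step
    return rates

damped = growth_path(phi=0.5)
explosive = growth_path(phi=1.1)
print(damped[-1] < damped[0])        # → True (growth rate falls)
print(explosive[-1] > explosive[0])  # → True (growth rate rises)
```

The damped path's growth rate declines as the knowledge stock grows; the φ > 1 path's growth rate itself increases, which is the signature of fully endogenous dynamics over the simulated range.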
Why History May Stop Being a Guide
William Nordhaus addressed this question directly in a 2021 paper titled "Are We Approaching an Economic Singularity?" (William D. Nordhaus, "Are We Approaching an Economic Singularity? Information Technology and the Future of Economic Growth," American Economic Journal: Macroeconomics 13, no. 1 (2021): 299–332). He examined the conditions under which technological progress could accelerate without limit. His conclusion: the historical record shows no evidence of such acceleration. Growth rates have fluctuated but have not systematically increased over the past century. The returns to R&D investment have not risen. A regime shift would likely appear first in micro-metrics of discovery and iteration, long before it appears in aggregate GDP growth.
Nordhaus did not rule out the possibility. The key variable, in his analysis, is whether the automation of innovation itself becomes possible. If machines can contribute to invention, and if the machines that invent can be improved by the machines they help create, then the feedback loop can tighten beyond any historical precedent. The historical record would cease to be a reliable guide.
This is the question Factor Prime raises. The recursion is not new. What may be new is the degree to which computation, rather than human cognition, closes the loop.
Previous paradigms preserved a human pacing function. This one may not. The rate of improvement could depend on machine effort, and machine effort can scale in ways that human effort cannot.
If the pacing function shifts, governance lag becomes a first-order variable: institutions trained on human-cycle times will tend to underreact.
Most technological revolutions preserve a human pacing function. The rate of improvement in steam engines depended on how fast human engineers could design, test, and build better engines. The rate of improvement in automobiles depended on how fast human designers and factory workers could iterate on the product. Humans set the tempo, and the tempo was limited by human cognition and coordination.
Factor Prime tightens the computational portions of these loops. Consider a concrete closed loop: a model proposes an architecture change, data augmentation, or evaluation refinement; automated training and evaluation runs execute the proposal; the model ingests the results and updates its proposal distribution toward changes that improve the fixed basket under the constraint envelope. The loop executes end-to-end without a human choosing each experiment. Cycle times compress from months to days or hours where computation substitutes for cognition. Parallelism increases: thousands of experiments can run simultaneously. The feedback bandwidth can rise dramatically, even if key decision points still require human oversight and even if physical portions—fab construction, power plant permitting, grid interconnection—remain human-paced.
Cognitive loops may become machine-paced while physical loops remain human-paced. The observed regime will be a hybrid until infrastructure catches up.
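A toy version of the closed loop described above, with entirely synthetic uplift values, illustrates the mechanism: proposals that survive automated verification both improve the baseline and tilt future proposals toward themselves. No human chooses the experiments.

```python
# Hypothetical propose-evaluate-update loop. Candidate types, their
# "true" uplifts, and the noise model are all synthetic illustrations.
import random

random.seed(0)

CANDIDATES = ["arch_change", "data_augmentation", "eval_refinement"]
TRUE_UPLIFT = {"arch_change": 0.02, "data_augmentation": 0.05,
               "eval_refinement": 0.01}  # synthetic expected gains

weights = {c: 1.0 for c in CANDIDATES}   # proposal distribution (unnormalized)
baseline = 0.50                          # score on the fixed evaluation basket

for _ in range(200):
    total = sum(weights.values())
    c = random.choices(CANDIDATES, [weights[k] / total for k in CANDIDATES])[0]
    measured = TRUE_UPLIFT[c] + random.gauss(0, 0.01)  # noisy automated eval run
    if measured > 0:                     # verified improvement: ship it
        baseline += measured * 0.01
        weights[c] *= 1.1                # tilt proposals toward what worked

best = max(weights, key=weights.get)
print(best)
```

The rich-get-richer update is the "tightening" in miniature: cycle time here is one iteration, and nothing in the loop waits on a human decision.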
Separating Three Productivity Claims
Three distinct claims about productivity must be separated to avoid confusion:
Compute efficiency: capability per training compute dollar. This measures hardware and algorithmic progress. It has improved rapidly, but it does not directly measure whether recursion is tightening.
Research productivity: capability gain per researcher-hour or per dollar of R&D spending. This measures whether the human bottleneck is loosening. The historical pattern shows declining research productivity across most fields. If AI reverses this trend in AI research itself, that would be evidence for regime shift.
Growth rate dynamics: whether the rate of improvement is itself increasing, stable, or decreasing. This is Nordhaus's ultimate test. Semi-endogenous dynamics produce high but stable growth rates; fully endogenous dynamics produce accelerating growth rates.
The recursion hypothesis predicts a specific pattern: output should shift from being headcount-limited to being compute-limited. If the hypothesis is correct, research productivity per researcher-hour should rise as ρ increases, because computation is substituting for human cognitive bottlenecks.
Empirical Tests
This framing yields specific tests. The claim that Factor Prime is shifting the growth regime implies observable consequences, sequenced from mechanism to macro outcome:
Recursive share (ρ). If the recursion is tightening, ρ should rise over time in frontier labs and should correlate with iteration speed. Organizations with higher ρ should produce more capability gain per calendar month. The claim fails if ρ stays flat, and more compute merely enables bigger training runs rather than faster discovery—in that case, the recursion is not automating the relevant bottleneck.
Decision latency. If computation is substituting for human cognition in the improvement loop, the time from anomaly detection to hypothesis to patch to verified improvement should compress. This is the cycle time of research decisions, not just model generations. If the median time-to-verified-fix does not fall, the human-paced portions of the loop remain dominant.
Compute-to-discovery elasticity. If the recursion is computational rather than human-limited, validated improvement events per unit time (a statistically significant uplift on the fixed basket at constant constraints, surviving an OOD slice) should scale with compute provisioned rather than with senior researcher count. If validated improvements remain proportional to headcount regardless of compute investment, the recursion has not shifted the pacing constraint.
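This test can be stated as a regression: estimate the elasticities of validated improvements with respect to compute and to headcount across lab-years, and ask which dominates. A sketch on fabricated data follows; the 0.8 and 0.1 elasticities are assumptions of the synthetic world, not measurements:

```python
# Log-log regression of validated improvement counts on compute and
# headcount. Data are fabricated to illustrate the estimator only.
import numpy as np

rng = np.random.default_rng(42)
n = 60
log_compute = rng.uniform(0, 5, n)
log_headcount = rng.uniform(0, 3, n)
# Synthetic world: compute elasticity 0.8, headcount elasticity 0.1.
log_discoveries = (0.8 * log_compute + 0.1 * log_headcount
                   + rng.normal(0, 0.1, n))

X = np.column_stack([np.ones(n), log_compute, log_headcount])
beta, *_ = np.linalg.lstsq(X, log_discoveries, rcond=None)
print(beta[1] > beta[2])  # → True: a compute-paced regime in this toy world
```

In the real test, a compute elasticity that dominates headcount elasticity (after controls) would indicate the pacing constraint has shifted; proportionality to headcount regardless of compute would indicate it has not.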
Physical constraint binding. If the recursion is strong, physical constraints—energy, chips, cooling, interconnect—should bind before cognitive constraints. Investment should flow toward infrastructure rather than toward hiring researchers. If talent remains the binding constraint and infrastructure is abundant, the recursion has not shifted the bottleneck.
Growth rate acceleration. Nordhaus's ultimate test. If the recursion is producing fully endogenous dynamics, the growth rate of relevant metrics—capability, efficiency, deployment—should itself be increasing, not merely high. If growth rates are high but stable, the dynamics remain semi-endogenous.
Current Evidence
The evidence points in the predicted direction, but the measurement window is short and confidence is bounded.
Compute efficiency has improved rapidly. Organizations are exploring far more of the design space per unit time. Whether this has already reversed researcher-hour productivity is not yet settled. Cycle times between major model generations have compressed. Parallelism has scaled dramatically; frontier labs run thousands of experiments simultaneously. Physical constraints are increasingly cited as binding; infrastructure investment is outpacing researcher hiring at the margin.
But the growth rate of capability has not obviously accelerated. Scaling laws suggest predictable improvement with compute, not accelerating improvement. The laws could steepen or saturate at any point; they are empirical regularities, not physical necessities. The recursion creates the possibility of acceleration. The evidence does not yet confirm it.
The uncertainty is genuine, and the framework does not claim to resolve it. What can be said is this: Factor Prime introduces a structural possibility that previous paradigms did not. If computation can contribute to its own improvement, then the pacing function that has governed all previous technological transitions may shift. The rate of progress would depend not only on human effort but on machine effort, and machine effort can scale in ways that human effort cannot.
Whether it will scale, and how far, remains an open question. Energy must be generated, transmitted, and dissipated. Chips must be fabricated, packaged, and connected. Data centers must be sited, powered, and cooled. Coordination costs rise with scale. Regulatory and social constraints may bind before physical ones do. The recursive property creates the possibility of acceleration; physics, economics, and institutions will determine the outcome.
Calibrating ρ
The recursive share can now be estimated from primary sources: earnings calls, controlled studies, developer surveys, and frontier lab disclosures. The evidence suggests ρ has grown roughly 5–7× from 2022 to 2025, triangulated across three independent measurement channels: code generation acceptance rates from telemetry (GitHub, "Copilot Metrics and Research" (2025)), developer adoption surveys (Stack Overflow, "Stack Overflow Annual Developer Survey 2025" (2025); JetBrains, "The State of Developer Ecosystem 2024" (2024)), and autonomous task-completion benchmarks (METR, "Measuring AI Agent Task Horizons" (2025)). The channels track different facets of ρ, so consistency across them is more informative than any single signal in isolation.
Capability benchmarks set the ceiling on what ρ can become: tasks that models cannot complete cannot contribute to the recursive share. On SWE-bench, which measures end-to-end resolution of real GitHub issues, success rates rose from under 2% for unscaffolded GPT-4 (2023) to approximately 49% for Claude 3.5 Sonnet with agent scaffolding (late 2024) to above 60% for frontier agentic systems in early 2025 (Jimenez et al., "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?," ICLR 2024). On HumanEval, a function-level code generation benchmark, pass@1 rates climbed from roughly 48% (GPT-3.5, 2023) to 67% (GPT-4, 2023) to above 90% for late-2024 frontier models (Chen et al., "Evaluating Large Language Models Trained on Code," arXiv:2107.03374 (2021)). The trajectory is steep, and some previously demanding benchmarks now appear to be nearing saturation for top systems.
Code generation dominates the measurable signal. GitHub Copilot generates 46% of code in enabled files (61% for Java), with an 88% retention rate for accepted suggestions (GitHub, "Copilot Metrics and Research" (2025)). Frontier labs report dramatically higher internal figures. Anthropic's CEO stated at Dreamforce in September 2025 that "90% of code is written by AI models" at Anthropic and partner companies, though engineers remain essential for supervision. Google confirmed over 25% of new code is AI-generated (Pichai, Q3 2024 earnings). OpenAI reports 95% weekly usage among engineers with 70% more pull requests merged.
METR's research suggests task horizons are roughly doubling every seven months. Their data (METR, "Measuring AI Agent Task Horizons" (2025)) shows the time horizon at which models complete 50% of tasks has grown from approximately 30 seconds (GPT-2, 2019) to roughly one hour (Claude 3.7 Sonnet, early 2025). The doubling rate has accelerated to approximately four months in 2024–2025. Extrapolation suggests day-long task horizons by 2027 and week-long horizons by 2028, though extrapolation from steep curves is hazardous.
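The extrapolation is simple arithmetic once a starting horizon and doubling period are fixed. The sketch below takes the roughly one-hour early-2025 horizon and the four-month doubling period from the text at face value; it inherits all the hazards of extrapolating a steep curve:

```python
# Back-of-envelope task-horizon extrapolation under a fixed doubling
# period. Starting point and doubling rate are taken from the text.

def horizon_hours(years_ahead, start_hours=1.0, doubling_months=4.0):
    doublings = years_ahead * 12 / doubling_months
    return start_hours * 2**doublings

# From ~1 hour in early 2025:
print(horizon_hours(2))  # → 64.0 hours by early 2027 (multi-day tasks)
print(horizon_hours(3))  # → 512.0 hours by early 2028 (multi-week tasks)
```

At a four-month doubling period, two years is six doublings and three years is nine, which is where the day-long and week-long figures in the text come from.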
Developer adoption has reached majority penetration. Stack Overflow surveys (Stack Overflow, "Stack Overflow Annual Developer Survey 2025" (2025)) show developers using or planning to use AI tools rose from 70% (2023) to 84% (2025). Daily usage among professional developers hit 51% in 2025. Enterprise adoption trails but accelerates: Gartner projects 90% of enterprise engineers will use AI code assistants by 2028, up from under 14% in early 2024.
Observed ρ trajectory:
| Year | ρ (Lower) | ρ (Central) | ρ (Upper) | Key Evidence |
|---|---|---|---|---|
| 2022 | 2% | 3–5% | 8% | Copilot launch, 26% acceptance, limited enterprise |
| 2023 | 5% | 8–10% | 15% | 44% developer adoption, early enterprise pilots |
| 2024 | 8% | 12–15% | 22% | 62% adoption, 25%+ Google code AI-generated |
| 2025 | 12% | 18–22% | 35% | 80% adoption, 90% at frontier labs, 51% daily use |
Projected ρ trajectory (if trends continue):
| Year | ρ (Lower) | ρ (Central) | ρ (Upper) | Key Evidence |
|---|---|---|---|---|
| 2026 | 20% | 28–35% | 45% | Day-long task horizons emerging |
| 2027 | 30% | 40–50% | 60% | Week-long task horizons |
Central estimates weight code generation share (30–46% × 88% retention × 60% of R&D time per Epoch AI, 2024) plus contributions from architecture search, debugging, and evaluation. Lower bounds assume significant verification overhead; upper bounds include AI assistance for search, documentation, and ideation.
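The central-estimate arithmetic can be reproduced directly. The small additive allowance for non-coding contributions (architecture search, debugging, evaluation) is a hypothetical placeholder, since the text does not quantify it separately:

```python
# Reproducing the central-estimate arithmetic: code generation share
# x retention x fraction of R&D time spent coding, plus a hypothetical
# allowance ("other") for non-coding contributions.

def rho_central(code_share, retention=0.88, coding_time=0.60, other=0.02):
    return code_share * retention * coding_time + other

low = rho_central(0.30)   # 0.30 * 0.88 * 0.60 + 0.02 ≈ 0.178
high = rho_central(0.46)  # 0.46 * 0.88 * 0.60 + 0.02 ≈ 0.263
print(round(low, 3), round(high, 3))  # → 0.178 0.263
```

The resulting 18–26% band brackets the 2025 central estimate in the table; the lower and upper bounds in the table come from varying the verification-overhead and scope assumptions rather than this core product.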
Verification overhead persists. The 88% retention rate implies 12% rejection. Experienced developers work 19% slower with current AI tools on their own repositories (METR, "Measuring AI Agent Task Horizons" (2025)), cautioning against conflating benchmark capability with effective automation. The slowdown is not anomalous: experienced developers have deep mental models of their codebases that AI tools cannot access, so the tools add friction rather than removing it in familiar territory. Trust in AI accuracy has declined from 43% (2024) to 29% (2025) even as adoption rises—developers use the tools despite recognizing their limitations. The top frustration: "AI solutions almost right but not quite" (66% of respondents).
Recursive self-improvement is observable but unquantified. Anthropic has stated that Claude is used "to help build products on top of Claude and to help train the next Claude." This represents evidence of ρ feeding into capability improvement, though no quantitative metrics are publicly available and the causal contribution of model-generated work to capability gains has not been isolated from concurrent human-directed improvements. OpenAI has set internal targets: an "automated AI research intern by September 2026" and "true automated AI researcher by March 2028."
The trajectory fits a logistic curve in its steepening phase. Inflection points may arrive when day-long task horizons materialize, and again if automated AI researchers become operational. The uncertainty remains wide—but ρ is rising, and the rate of rise is itself increasing.
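The logistic shape referenced above can be sketched with illustrative parameters; the midpoint year and rate below are assumptions for exposition, not fitted estimates:

```python
# Illustrative logistic curve: near-exponential increments in the
# steepening phase, then saturation. Parameters are assumptions.
import math

def logistic(t, ceiling=1.0, midpoint=2027.0, rate=0.9):
    return ceiling / (1 + math.exp(-rate * (t - midpoint)))

years = [2022, 2025, 2027, 2030]
print([round(logistic(y), 3) for y in years])  # → [0.011, 0.142, 0.5, 0.937]
```

On a curve like this, an observer in the steepening phase sees ρ rising and the rate of rise increasing, exactly as the current data show, yet the same observations are consistent with eventual saturation. Only the location of the ceiling distinguishes the two readings.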
If you believe these numbers are wrong, specify which ones and why. That is more useful than dismissing the framework.
A strong confirmation would look like: ρ rising, time-to-verified-improvement falling, output shifting from headcount scaling to compute scaling, binding constraints moving to power, chips, and interconnect, and growth rates in these micro-metrics increasing rather than merely staying high. That is the pattern that would distinguish a regime shift from an unusually productive phase of normal science. The existence proof follows.