Meeting the Perez Criteria

Blue comes from the indigo plant, yet it is bluer than indigo. Ice is made of water, yet it is colder than water.

Xunzi, 'Encouraging Learning' (3rd century BCE)

A definition is not a demonstration. If Factor Prime is to be taken seriously as the key input of a new techno-economic paradigm, it must satisfy the criteria that Carlota Perez established for recognizing such transitions.

Perez's claim is not that the input is "important." It is that the input becomes cheap and pervasive enough to reorganize production and institutions (Perez 2002). She identifies four characteristics that distinguish a genuine key input from an ordinary technological improvement. First, the input must exhibit a steep and sustained decline in relative cost. Second, it must have nearly unlimited supply at the new, lower price point. Third, it must be applicable across many sectors of the economy, not confined to a single sector. Fourth, its adoption must reshape organizational forms, business models, and institutional arrangements. A technology that meets only one or two of these criteria is an incremental improvement; a technology that meets all four is a candidate for paradigm status.

The Perez framework is not the only lens available. Schumpeter would direct attention to the clustering of innovations: whether the current wave of AI capabilities constitutes a genuine discontinuity or the tail end of the semiconductor paradigm that Perez herself mapped. Minsky would focus on the financial fragility accumulating around AI investment: the gap between announced capacity and operational returns, the leverage building in hyperscale commitments, the familiar pattern of credit-fueled installation preceding a crash that separates viable deployment from speculative excess (Minsky 1986). Both readings are plausible. Both would identify different risks than the four criteria isolate. The Perez lens is chosen here because it asks the right question for this volume's purpose: not whether the transition will be smooth or whether the current investment cycle will crash, but whether the underlying input has the structural characteristics to reorganize production across the economy. Perez's criteria test input identity. Minsky's framework tests financial stability. They are not competing answers to the same question; they are answers to different questions, and both matter.

Cost Trajectory

The relevant question is not whether compute has become cheaper (it has, across multiple imperfect measures), but whether the cost has declined at three distinct layers: substrate, capability, and deployment.

Substrate cost is the price of raw computation. Hardware cost per floating-point operation has declined by roughly an order of magnitude every four to five years since the 1970s, a trend documented by Nordhaus and updated by subsequent researchers tracking semiconductor price-performance (Nordhaus 2007). This measures the physical layer, not the cognitive output.

Capability cost is the compute required to reach a given performance threshold. Epoch AI research indicates that the compute required to achieve a fixed benchmark performance halves roughly every eight months, reflecting both algorithmic improvements and hardware gains (Sevilla 2024; Villalobos 2022). This measures training efficiency, not deployment value.
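Annualized, those two rates compound quickly. A back-of-the-envelope sketch, treating the four-to-five-year and eight-month figures as point estimates rather than precise constants:

```python
# Back-of-the-envelope annualization of the two decline rates cited above.
# The 4.5-year and 8-month figures are point estimates, not precise constants.

# Substrate: roughly 10x better hardware price-performance every 4-5 years.
years_per_10x = 4.5
substrate_annual = 10 ** (1 / years_per_10x)
print(f"Substrate: ~{substrate_annual:.2f}x per year "
      f"(~{1 - 1 / substrate_annual:.0%} annual cost decline)")

# Capability: compute needed for a fixed benchmark halves roughly every 8 months.
halving_months = 8
capability_annual = 2 ** (12 / halving_months)
print(f"Capability: ~{capability_annual:.2f}x per year "
      f"(~{1 - 1 / capability_annual:.0%} annual cost decline)")
```

At those rates, the substrate layer alone delivers roughly a 40 percent annual cost decline and the capability layer roughly 65 percent, before deployment-level effects are counted.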

Deployment cost is the sharpest test: inference cost per verified task — the canonical unit for Factor Prime. ("Verified" means accepted by a downstream gate, whether human or automated.) Here the data are more fragmented, but the direction is unambiguous and steeper than early projections suggested. GPT-4 launched in March 2023 at $30/$60 per million tokens (input/output). GPT-4o arrived in May 2024 at $2.50/$10. GPT-4o Mini followed in July 2024 at $0.15/$0.60. Input cost declined 99.5% in sixteen months. DeepSeek R1 entered at $0.55/$2.19 per million tokens — roughly 90% below incumbent frontier pricing — and open-weight alternatives continued to compress the floor. The initial "order of magnitude" has become two orders of magnitude in under two years. For tasks where model outputs directly substitute for human labor, the cost per successful completion has fallen correspondingly. Enterprise deployments report cost reductions of 40–70 percent on ticket resolution and first-draft generation in early pilots, with the caveat that these figures depend heavily on task definition, quality thresholds, and human-review requirements.
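The percentage claims follow directly from the listed prices. A minimal check, using only the input-token prices quoted above:

```python
# Input-token prices quoted above, in USD per million tokens.
prices = {
    "GPT-4 (Mar 2023)": 30.00,
    "GPT-4o (May 2024)": 2.50,
    "GPT-4o Mini (Jul 2024)": 0.15,
}

baseline = prices["GPT-4 (Mar 2023)"]
for model, price in prices.items():
    print(f"{model}: ${price:.2f}/M tokens, "
          f"{1 - price / baseline:.1%} below the GPT-4 launch price")

# 30.00 -> 0.15 is a factor of 200: two orders of magnitude in sixteen months.
```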

In mature deployments, the auditable unit will be cost per verified outcome (ticket resolved to SLA, claim processed with acceptable error rate, code merged passing tests), not tokens. The token price is a necessary input. The verified outcome is the economically relevant output.
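The distinction can be made explicit. A minimal sketch of the unit, with every parameter value hypothetical; the structure (inference cost per attempt, review cost, acceptance rate) follows the definition of "verified" given above:

```python
def cost_per_verified_task(tokens_per_attempt: float,
                           price_per_million_tokens: float,
                           acceptance_rate: float,
                           review_cost_per_attempt: float) -> float:
    """Cost per output accepted by the downstream gate (human or automated).

    Attempts that fail verification still incur inference and review cost,
    so the per-attempt cost is divided by the acceptance rate.
    """
    inference_cost = tokens_per_attempt * price_per_million_tokens / 1_000_000
    return (inference_cost + review_cost_per_attempt) / acceptance_rate

# Hypothetical numbers, for illustration only: 8k tokens per attempt at
# $10/M tokens, 70% of drafts accepted, $1.50 of human review per attempt.
print(f"${cost_per_verified_task(8_000, 10.0, 0.70, 1.50):.2f} per verified task")
```

At these illustrative numbers the review cost, not the token cost, dominates the result, which is precisely why the token price is only a necessary input rather than the economically relevant output.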

The cost criterion fails if inference costs per verified task plateau or rise over the next five years, whether due to capability saturation, regulatory burden, or supply constraints. The relevant measure is not compute price but task-relevant output price.

Perceived Unbounded Supply

The second criterion is more subtle and currently the most uncertain. A key input must appear effectively unlimited at its new price point: not literally infinite, but abundant enough that users do not treat it as a binding constraint during the installation phase.

For Perez, the relevant condition is perceived abundance: incremental demand can be met without chronic rationing and without rising marginal cost. Computation currently occupies an ambiguous position. Inference capacity for deployed models is expanding rapidly: cloud providers offer effectively unlimited API access on demand, and a developer can provision thousands of GPU-hours with a credit card. The constraint is budget, not availability. For routine inference tasks, supply appears unbounded.

Frontier training is a different matter. Physical constraints (interconnection queues, transformer lead times, advanced packaging capacity, water permits) create bottlenecks that money alone cannot clear in the near term. The largest training runs require access to clusters that only a handful of organizations possess. The infrastructure to support such runs takes years to build.

This pattern is consistent with Perez's framework: in the early phases of a paradigm, the key input is scarce precisely because the infrastructure to produce it at scale has not yet been built. Coal was scarce before the mines were dug. Oil was scarce before the refineries were constructed. Semiconductors were scarce before the fabs were erected. The perception of unlimited supply emerges as the infrastructure matures and the installation phase gives way to deployment.

But the current scarcity may not be merely transitional. Unlike coal or oil, compute depends on multiple bottlenecks simultaneously: chips and power and cooling and interconnect and grid access and permits. Carbon and water constraints are not early-phase frictions. They may be enduring. The question is whether infrastructure buildout will outpace demand growth, and the answer is not yet clear.

The supply criterion fails under two scenarios. On the market side: inference remains chronically rationed, pricing does not trend toward commodity behavior outside frontier niches, and scarcity premia rather than capital amortization dominate the price of compute. On the physical side: grid interconnection queues stay multi-year, power equipment lead times remain longer than a model-generation cycle, and data centers' share of generation capacity does not rise. This is the criterion most likely to bind, and resolution remains uncertain.

Broad Applicability

The evidence here is stronger, though the pattern of adoption is uneven. Because computation operates on patterns rather than matter, it can be directed toward any problem that admits a formal representation. The practical question is whether the cost is low enough to make the application economical, and whether the selection filter is tight enough to ensure that the outputs are useful.

Current AI systems have demonstrated meaningful competence in language-mediated tasks: customer support, code generation, document summarization, translation, search, and retrieval-augmented analysis. They have shown rapidly improving performance in structured domains with clear feedback: code review, test generation, data extraction, and classification tasks with well-defined categories. There is early but uneven traction in scientific workflows (literature synthesis, hypothesis generation, experiment design), with reliability that varies substantially by domain.

Some domains remain resistant: situations requiring physical manipulation, contexts where errors are catastrophic and cannot be caught by downstream review, problems with sparse data or ill-defined objectives, tasks requiring sustained multi-step reasoning over novel domains. The uneven pattern is typical of paradigm transitions. Steam power was not equally applicable to all industries. Electricity took decades to reshape manufacturing. The internet transformed some sectors immediately and others only after years of experimentation.

Domain | Current ROI Status | Autonomy Level | Binding Constraint
Customer support (text) | Positive, deployed at scale | Human-in-the-loop for escalation | Quality variance, edge cases
Code generation | Positive for first drafts | Human review required | Correctness verification
Document summarization | Positive for routine tasks | Human spot-check | Accuracy on specialized content
Scientific research | Early traction, high variance | Human-directed | Domain-specific reliability
Physical manipulation | Negative except narrow cases | Requires full autonomy | Embodiment, real-world feedback
High-stakes decision-making | Negative without oversight | Cannot be autonomous | Liability, error cost

The table distinguishes domains where human-in-the-loop deployment achieves positive ROI from domains that would require full autonomy to be economical. Most current successes are in the former category. The latter remains largely unrealized.
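The boundary between the two categories can be written as a breakeven condition. A sketch with illustrative parameters only, not measurements from any deployment:

```python
def hitl_task_cost(ai_cost: float, review_cost: float,
                   escalation_rate: float, human_cost: float) -> float:
    """Expected cost per task when AI output is reviewed by a human and a
    fraction of tasks escalate to full human handling."""
    return ai_cost + review_cost + escalation_rate * human_cost

# Illustrative parameters only.
human_only = 6.00            # fully human handling, per task
hitl = hitl_task_cost(ai_cost=0.10, review_cost=0.75,
                      escalation_rate=0.20, human_cost=6.00)
print(f"Human-only: ${human_only:.2f}  Human-in-the-loop: ${hitl:.2f}")
```

Human-in-the-loop deployment is economical while the review and expected escalation terms stay below the human-only cost; full autonomy becomes economical only when those terms can be driven near zero, which is the boundary the table marks.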

In Perez terms, this is what broad applicability looks like in the installation phase: early ROI appears first where feedback is fast, error is cheap, and verification is routine. The criterion holds so long as the domain list keeps expanding. It weakens if adoption remains confined to the current set of tasks five years from now, with no new categories becoming economical.

Reshaping Organizational Forms

The fourth criterion is the most difficult to assess because organizational change lags technological change. The factory system did not spring fully formed from the first steam engines. The multidivisional corporation did not emerge immediately from the telegraph and railroad. The organizational innovations that will define the Factor Prime era may not yet be visible.

The signal to watch is whether cognition becomes metered like electricity: a utility input budgeted in throughput rather than headcount.
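What metering would look like in practice is a budget line denominated in throughput. A hypothetical sketch, with every figure invented for illustration:

```python
# A hypothetical inference budget line, denominated in throughput rather than
# headcount. All figures are invented for illustration.
monthly_budget_usd = 50_000
price_per_million_tokens = 5.00            # blended input/output price
tokens_per_verified_task = 20_000          # including retries and review drafts

tokens_purchased = monthly_budget_usd / price_per_million_tokens * 1_000_000
verified_tasks = tokens_purchased / tokens_per_verified_task
print(f"{verified_tasks:,.0f} verified tasks/month for ${monthly_budget_usd:,}")
# The line item reads like a utility bill: a throughput entitlement,
# not a count of salaried coordinators.
```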

What can be observed is the beginning of experimentation. Three candidate organizational forms are emerging, though none has yet proven dominant:

AI-native micro-enterprises. Small teams achieving output levels previously requiring organizations an order of magnitude larger, by substituting inference for mid-level coordination and production tasks. Midjourney generated approximately $200 million in revenue in 2023 with roughly 40–50 employees — revenue per employee of $4–5 million, among the highest in SaaS globally. Zero external funding. Zero marketing spend. Profitable since August 2022. Cursor (Anysphere) crossed $100 million in annualized recurring revenue in January 2025 and exceeded $1 billion in annualized revenue by November 2025, with revenue doubling approximately every two months and roughly 300 employees by year-end. The distinctive feature is that the firm's marginal cost of cognitive work is determined by API pricing, not by headcount. These are not projections. They are operating results from firms whose economics are structurally different from their predecessors.

Agentic operations stacks. Existing firms embedding AI as an internal control layer across CRM, ERP, support, and workflow systems. Klarna's AI customer-service agent handled 2.3 million conversations in its first month of operation (February 2024) — two-thirds of all customer service interactions, performing work equivalent to 700 full-time agents. Resolution time fell from eleven minutes to under two. The estimated profit improvement: $40 million annually. Then the correction: by May 2025, CEO Sebastian Siemiatkowski acknowledged that the AI-only approach produced "lower quality" outcomes, and Klarna began rehiring human agents. The case is analytically valuable precisely because it illustrates the V/C boundary at work: where verification is cheap (routine billing inquiries, order status, standard refunds), automation succeeds; where it is not (complex disputes, contextual judgment, empathetic resolution), humans return. The distinctive feature of the agentic operations stack is that coordination overhead is reduced by inference — the AI handles routing, summarization, and first-draft generation that previously required human attention — but the human remains where verification cost is high.

Vertical compute-workflow integrators. Firms that own both infrastructure access (compute capacity, model weights, data pipelines) and the domain-specific workflow in which AI is deployed. The distinctive feature is control over the selection filter: owning both the model and the deployment context allows tighter alignment between training objectives and deployment value than firms relying on third-party APIs can achieve.

These forms are provisional rather than proven. The deeper organizational changes are likely still ahead. If the boundary between human and machine cognition continues to shift at the current pace, the organizational forms that exploit that shift have not yet been invented. It would be surprising if they had. The factory system took decades to emerge from the steam engine. Expecting a stable organizational paradigm three years into the installation phase would repeat the mistake of every previous transition's early observers. The markers worth watching are concrete: whether inference capacity appears as a budget line item rather than an IT expense, whether compensation structures shift from coordinators to workflow engineers, and whether compute procurement begins to resemble electricity contracts. If, five years from now, none of these changes are visible and dominant firm structures remain indistinguishable from 2020, the organizational criterion fails, and with it, the paradigm claim.

On Perez's scoreboard, two criteria are clearly met and two remain open. Cost trajectory is the strongest: the decline at all three layers is steep, sustained, and shows no sign of plateauing. Broad applicability is strong in installation-phase terms, with the expected uneven adoption pattern. Supply is the genuine uncertainty: unlike coal or oil, compute depends on multiple bottlenecks simultaneously, and whether infrastructure buildout outpaces demand growth is the question on which the paradigm's pace depends. Organizational reshaping lags by design: it is the confirming indicator, not the leading one.

Criterion Scoreboard

Criterion | Current Evidence | Falsifier
Cost trajectory | Strong declines at substrate and capability layers; early evidence of deployment-level declines in verified task cost | Cost per verified task plateaus or rises
Unbounded supply | Inference often feels abundant at the point of use; frontier training remains supply-constrained | Inference remains rationed; grid queues don't compress; scarcity rents dominate
Broad applicability | ROI-positive in text support, code, summarization (human-in-loop); uneven elsewhere | Domain list does not expand; adoption confined to current tasks
Organizational reshaping | Early experimentation; candidate forms emerging; no dominant new archetype | Firm structures indistinguishable from 2020 in five years

Perez's criteria do not require certainty. They require evidence of structural potential and explicit tests that would falsify the claim. By that standard, Factor Prime qualifies, not inevitably, not triumphally, but as a candidate whose cost trajectory is already decisive and whose supply constraint is the honest uncertainty that a Minsky-informed reader should watch most carefully. If the installation phase ends in a crash, Perez's own framework predicts that the crash separates speculative excess from viable deployment. It does not reverse the paradigm. What would reverse it is a supply bottleneck that proves structural rather than transitional: grid constraints that persist, fabrication capacity that does not scale, physical limits that bind before the deployment phase begins. That is the falsifiable claim on which this volume stakes its case.