Appendix C: Load-Bearing Empirical Claims

The framework rests on empirical claims about physics, infrastructure, markets, and institutions. These claims are load-bearing: if any proves materially wrong, the framework's predictions require revision. This appendix enumerates the claims, their sources, their magnitudes, and the conditions under which they would be falsified.

The purpose is intellectual honesty. A framework that cannot specify what would disprove it is not a framework; it is an ideology. The claims below are ordered by their proximity to the argument's core.


C.1 — Thermodynamic Foundations

These claims establish the physical basis for treating computation as energy conversion.

Claim 1.1: Landauer's limit sets a floor on irreversible computation.

Statement: Erasing one bit of information at temperature T requires dissipating at least k_B T ln 2 joules of energy, where k_B is Boltzmann's constant. At room temperature (300 K), this equals approximately 3 × 10⁻²¹ joules per bit.
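
A quick numerical check of the quoted figure, using only Boltzmann's constant and the stated temperature:

    import math

    k_B = 1.380649e-23            # Boltzmann's constant, J/K
    T = 300                       # room temperature, K
    print(k_B * T * math.log(2))  # ~2.87e-21 J per bit erased, i.e. roughly 3 × 10⁻²¹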

Source: Landauer, R. (1961). "Irreversibility and Heat Generation in the Computing Process." IBM Journal of Research and Development, 5(3), 183–191.

Status: This is a result in statistical mechanics, not an empirical claim subject to revision. It serves as the theoretical anchor for efficiency calculations.

Relevance to framework: Establishes that computation has an irreducible thermodynamic cost. The framework's energy-computation equivalence depends on this floor existing.


Claim 1.2: Current silicon operates approximately 10⁶ above the Landauer floor.

Statement: State-of-the-art processors dissipate roughly 10⁻¹⁵ joules per logic operation, approximately one million times the theoretical minimum.

Source: Koomey, J. et al. (2011). "Implications of Historical Trends in the Electrical Efficiency of Computing." IEEE Annals of the History of Computing, 33(3), 46–54. Updated estimates from semiconductor industry roadmaps (IRDS 2022). Note: The primary source is 14 years old. The IRDS update is 3 years old. No comprehensive newer survey exists as of this writing, though individual benchmarks (MLPerf, academic papers on specific architectures) provide partial updates.

Magnitude that matters: The gap represents engineering headroom. If the gap were smaller (say, 10² rather than 10⁶), efficiency gains would be nearly exhausted and the "room to run" thesis would be wrong.

Threshold rationale: The 10³ falsification threshold (rather than 10² or 10⁴) reflects approximately 10 doublings of efficiency—roughly 15-20 years at historical Koomey pace. A gap this small would indicate that fundamental limits are approaching within a single investment horizon.
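
A back-of-envelope restatement of that arithmetic, using the order-of-magnitude figures quoted in this section rather than measured values:

    import math

    gap = 1e6                  # stated order-of-magnitude gap above the Landauer floor
    years_per_doubling = 1.6   # historical Koomey pace (Claim 1.3)
    doublings_to_threshold = math.log2(gap / 1e3)  # headroom consumed reaching a 10³ gap
    doublings_to_floor = math.log2(gap)            # headroom consumed closing the gap entirely
    print(round(doublings_to_threshold * years_per_doubling),   # ~16 years
          round(doublings_to_floor * years_per_doubling))       # ~32 years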

Update frequency: Annual. Track IRDS publications, MLPerf benchmarks, and academic efficiency studies.

Falsifier: If credible measurement shows current silicon operates within 10³ of Landauer, the efficiency trajectory projections require revision.


Claim 1.3: Koomey's Law has historically held but is decelerating.

Statement: The number of computations per kilowatt-hour has roughly doubled every 1.6 years since the 1940s. This pace has slowed since approximately 2010 as Moore's Law decelerated.

Source: Koomey, J. (2011), updated in subsequent publications through 2020.

Magnitude that matters: If the historical pace continued, the 10⁶ gap would close in roughly 30 years (20 doublings at 1.6 years each). Deceleration extends this timeline but does not eliminate the trajectory.

Falsifier: If efficiency gains stall entirely (less than one doubling per decade for a sustained period), the "headroom translates to continued improvement" assumption fails.


C.2 — Infrastructure Constraints

These claims establish that physical bottlenecks, not software capability, currently bind the pace of AI deployment.

Claim 2.1: Interconnection queues exceed available grid capacity.

Statement: The aggregate capacity in U.S. interconnection queues approaches 2,600 GW nationally, with ERCOT alone exceeding 150 GW of requests. Processing timelines for utility-scale projects now average nearly five years from request to commercial operation (Rand et al. 2024).

Source: Rand, Joseph, et al. "Queued Up: 2024 Edition, Characteristics of Power Plants Seeking Transmission Interconnection As of the End of 2023." Lawrence Berkeley National Laboratory, April 2024. ERCOT and PJM public queue data.

Magnitude that matters: If queues cleared in under 18 months and total queue depth were below installed capacity, the bottleneck would not bind. Current depths at multiples of installed capacity indicate binding constraints.

Update frequency: Quarterly. Queue data is publicly available from ISOs and RTOs.

Falsifier: If interconnection timelines compress to under two years for utility-scale projects and queue-to-installed ratios fall below 1:1, the physical bottleneck loosens faster than the framework assumes.


Claim 2.2: Transformer lead times have extended to 2-3 years for large power transformers.

Statement: Lead times for large power transformers (230 kV+) now commonly reach 36 months, with maximum lead times reaching 60 months, up from 12-18 months pre-2020. The constraint reflects manufacturing capacity, not raw material availability (U.S. Department of Energy 2024).

Source: U.S. Department of Energy. "Large Power Transformer Resilience: Report to Congress." July 2024. Utility integrated resource plan filings from PJM, MISO, and CAISO (2024-2025).

Magnitude that matters: Transformers are long-pole items in grid expansion. If lead times returned to 12 months, datacenter capacity additions could accelerate significantly.

Falsifier: If new manufacturing capacity brings lead times below 18 months for standard orders, this specific bottleneck eases.


Claim 2.3: Hyperscale datacenter capacity additions exceed 15 GW through 2027.

Statement: Announced hyperscale capacity additions (AWS, Azure, GCP, Meta, Oracle) exceed 15 GW through 2027, concentrated in Northern Virginia, Texas, and the Pacific Northwest. "Announced" here means publicly disclosed in earnings calls, press releases, or regulatory filings. "Committed load" (a subset) refers to capacity with signed power purchase agreements or utility interconnection agreements.

Source: Company announcements, utility integrated resource plans, datacenter industry publications (DatacenterDynamics, Uptime Institute).

Magnitude that matters: 15 GW represents a little over 1% of current U.S. generating capacity (roughly 1,200 GW), concentrated in a handful of regions where it constitutes a far larger share of local capacity. Absorption at this scale strains transmission and generation infrastructure.

Update frequency: Semi-annual. Track utility filings and company earnings disclosures.

Threshold rationale: The 30% cancellation threshold reflects historical base rates for large infrastructure project completion. Energy infrastructure projects (pipelines, power plants, transmission lines) historically complete 75-85% of announced capacity within initial timelines. The 30% failure rate represents approximately 2× the historical norm, indicating demand softening rather than normal project friction. The 24-month window reflects the typical interval between announcement and committed load status for utility-scale projects.

Falsifier: If announced capacity is cancelled or delayed at scale (more than 30% of announcements fail to reach committed status within 24 months of announcement), demand projections require revision.


C.3 — Bitcoin and Mining Economics

These claims establish the Bitcoin hurdle rate mechanism and its geographic resilience.

Claim 3.1: Bitcoin mining profitability sets a floor on electricity returns.

Statement: At any given moment, electricity can be converted to Bitcoin through mining at a rate determined by hash rate, difficulty, block reward, and electricity price. This creates an opportunity cost that disciplines all other uses of compute attached to that electricity.

Derivation: This claim summarizes the formal derivation in Appendix A.1, which establishes the hurdle rate from first principles. The claim is structural given the protocol's properties, not an empirical conjecture about market behavior.
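
As an illustration of the mechanism (not a substitute for the formal derivation in Appendix A.1), a minimal sketch of the conversion from electricity to expected mining revenue; the rig efficiency, network hash rate, block reward, and price below are illustrative assumptions, and transaction fees and pool costs are ignored:

    SECONDS_PER_DAY = 86_400
    BLOCK_INTERVAL_S = 600        # Bitcoin's target block time
    BLOCK_REWARD_BTC = 3.125      # post-2024-halving subsidy, excluding fees

    def usd_per_kwh_from_mining(network_hashrate_ehs, rig_efficiency_j_per_th, btc_price_usd):
        """Gross expected mining revenue per kWh of electricity."""
        th_per_kwh = 3.6e6 / rig_efficiency_j_per_th   # 1 kWh = 3.6e6 J of hashing work
        network_th_per_day = network_hashrate_ehs * 1e6 * SECONDS_PER_DAY
        btc_per_day = (SECONDS_PER_DAY / BLOCK_INTERVAL_S) * BLOCK_REWARD_BTC
        return th_per_kwh * (btc_per_day / network_th_per_day) * btc_price_usd

    # Illustrative inputs: ~1,000 EH/s network, 20 J/TH rigs, $100,000 BTC.
    print(round(usd_per_kwh_from_mining(1_000, 20.0, 100_000), 3))   # ~0.094 USD per kWh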

Magnitude that matters: The floor binds when mining is profitable at marginal electricity costs. If mining becomes persistently unprofitable (hash rate declining for extended periods), the floor mechanism weakens.

Threshold rationale: The 50% decline threshold reflects the magnitude of the China ban disruption (which eliminated roughly half the network hash rate). That precedent showed that a 50% decline was recoverable within 18 months. Persistent decline beyond two years at this magnitude would indicate structural rather than transient suppression.

Falsifier: If Bitcoin hash rate declines persistently (more than 50% below peak for more than two years) and does not recover, the mining floor mechanism may not discipline electricity allocation as the framework assumes.


Claim 3.2: The China mining ban resulted in hash rate migration, not elimination.

Statement: China's June 2021 mining prohibition eliminated approximately 50 exahashes per second of capacity (~50% of network total at the time). Hash rate recovered to pre-ban levels within 18 months and continued rising. Mining relocated to the United States, Kazakhstan, Russia, Canada, and other jurisdictions.

Source: Cambridge Centre for Alternative Finance, Cambridge Bitcoin Electricity Consumption Index (CBECI) Mining Map and methodology documentation (2024). Hash rate data from blockchain explorers; mining pool geographic disclosures.

Magnitude that matters: This precedent establishes that coordinated prohibition in a major jurisdiction achieves relocation rather than elimination. If elimination had occurred (hash rate permanently depressed), the "geographic arbitrage" argument would be weaker.

Update frequency: This is a historical claim. Ongoing relevance depends on whether subsequent prohibitions follow the same pattern.

Falsifier: If a future coordinated prohibition (involving multiple major economies) successfully suppresses hash rate to under 50% of prior peak for more than two years, the geographic resilience assumption fails.


Claim 3.3: Hash rate is geographically distributed across non-coordinating jurisdictions.

Statement: As of 2024, no single jurisdiction hosts more than 40% of global hash rate. Mining occurs in the United States, Russia, Kazakhstan, Canada, Iceland, Paraguay, and other nations with divergent regulatory regimes.

Source: Cambridge Bitcoin Electricity Consumption Index; mining pool geographic estimates.

Magnitude that matters: Distribution matters because coordination requires agreement among parties with divergent interests. If hash rate re-concentrated (more than 70% in a single jurisdiction), coordinated suppression would become more feasible.

Threshold rationale: The 70% threshold reflects the coordination requirement for effective suppression. Historical precedent (OPEC production coordination, EU regulatory harmonization) suggests that coalitions controlling less than 70% of a distributed resource cannot sustain coordinated action against defection incentives. Below this threshold, the remaining 30%+ provides sufficient arbitrage opportunity that defection is profitable.

Falsifier: If hash rate concentrates above 70% in any single jurisdiction for a sustained period (more than 12 months), the coordination-resistance argument weakens.


Claim 3.4: The cost of a sustained 51% attack exceeds $10 billion in combined capital and operational expenditure.

Statement: At current network hash rate (~1 ZH/s as of late 2025), a successful 51% attack sustained for one to two years would require $10-20 billion in combined capital and operational expenditure—hardware acquisition, electricity at $5-10 billion annually at market rates, coordination across jurisdictions—before the attack becomes visible to every intelligence agency on Earth.

Source: Nuzzi, Waters, and Andrade, "Breaking BFT: Quantifying the Cost to Attack Bitcoin and Ethereum," Coin Metrics, SSRN 4727999 (February 2024). Professor Campbell Harvey ("The Economics of Bitcoin Security," Duke, October 2025) estimated a 1-week attack at approximately $6 billion total ($4.6B hardware, $1.34B infrastructure, $0.13B electricity). The calculation for sustained attacks derives from: (required additional hashrate for 51%) × (time duration) × (marginal electricity cost per exahash-hour) + (hardware capital costs).
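
The sustained-attack formula above can be made concrete with a short sketch; the hardware cost per terahash, rig efficiency, and electricity price used below are illustrative assumptions, not sourced estimates:

    def attack_cost_usd(network_hashrate_ehs, duration_years, hardware_usd_per_ths,
                        rig_efficiency_j_per_th, electricity_usd_per_kwh):
        """Capital plus operating cost of matching the honest network's hash rate."""
        attacker_ths = network_hashrate_ehs * 1e6              # hash rate the attacker must field
        capex = attacker_ths * hardware_usd_per_ths
        power_watts = attacker_ths * rig_efficiency_j_per_th   # J/TH times TH/s gives watts
        opex = power_watts / 1000 * 8_760 * duration_years * electricity_usd_per_kwh
        return {"capex_usd": capex, "opex_usd": opex, "total_usd": capex + opex}

    # Illustrative: 1,000 EH/s network, one year, $10/TH hardware, 20 J/TH rigs, $0.05/kWh.
    costs = attack_cost_usd(1_000, 1.0, 10.0, 20.0, 0.05)
    print({k: f"${v / 1e9:.1f}B" for k, v in costs.items()})   # roughly $10B capex plus $9B opex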

Magnitude that matters: The attack cost establishes a "security floor"—the minimum price of seizing agent-held assets through protocol manipulation rather than legal process. The framework claims this cost is prohibitive not merely for criminals but for sovereigns. If the cost were an order of magnitude lower (~$1B), the security architecture argument weakens significantly.

Update frequency: The attack cost scales with hash rate and electricity prices. As hash rate increases, security increases. As electricity prices rise, security increases. Track CBECI estimates quarterly.

Threshold rationale: The $1B threshold represents the approximate annual budget of a well-funded state intelligence operation. Below this threshold, the attack becomes feasible for nation-state actors with strategic motivation. Above this threshold (the current ~$10-20B estimate), even state actors face prohibitive costs relative to the assets at risk.

Falsifier: If credible attack cost estimates fall below $1 billion for a sustained attack, the security architecture argument requires qualification. The claim is not that attacks are impossible, but that they are prohibitively expensive relative to alternatives.


C.4 — AI Capability and Cost Trajectories

These claims establish the pace and direction of AI development relevant to economic substitution.

Claim 4.1: Training compute has increased by orders of magnitude across frontier model generations.

Statement: GPT-3 training required approximately 3.14 × 10²³ FLOP (Patterson et al. 2021; Sevilla et al. 2022). Subsequent frontier generations have required dramatically more compute: Epoch AI estimates training compute for frontier models has grown at approximately 4–5× per year since 2010, with GPT-4-class models requiring on the order of 10²⁵ FLOP—roughly two orders of magnitude above GPT-3. Next-generation systems (training on clusters exceeding 100,000 H100-class GPUs) are expected to push into the 10²⁶ FLOP range. The precise figures for any given model remain subject to revision as Epoch AI's methodology and data improve. The claim is order-of-magnitude, not precise.
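
A minimal consistency check of those orders of magnitude, taking the GPT-3 figure and a ~4.5×-per-year growth rate as given:

    import math

    gpt3_flop = 3.14e23          # GPT-3 training compute (2020)
    growth_per_year = 4.5        # approximate frontier growth rate per Epoch AI
    for years in (3, 5):
        projected = gpt3_flop * growth_per_year ** years
        print(years, f"{projected:.1e}", round(math.log10(projected), 1))
    # ~10^25.5 after three years (GPT-4 era), ~10^26.8 after five (next generation)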

Source: Patterson et al. (2021), "Carbon Emissions and Large Neural Network Training" (arXiv), for GPT-3 energy estimates. Epoch AI, "Notable AI Models" database for training compute reconstructions across model generations; Sevilla et al. (2022), "Compute Trends Across Three Eras of Machine Learning" (IJCNN). Sevilla and Roldán (2024), "Training Compute of Frontier AI," for frontier model compute growth rates. Company disclosures remain sparse. Epoch AI's estimates are the most comprehensive independent reconstruction available.

Verification note: Epoch AI has revised their estimates over time as methodology improves and new information surfaces. Any specific FLOP figure cited here should be verified against the current published dataset at time of reading. The load-bearing claim is the order-of-magnitude increase across generations, not any single model's precise training compute.

Magnitude that matters: If training costs were declining (or stable), the "efficiency absorbed by scale" thesis would be wrong. The claim is that frontier capability requires increasing absolute energy expenditure despite efficiency gains.

Verification standard: This claim would be considered verified if two or more independent sources (company disclosure, Epoch AI estimate, academic reconstruction) agree on order of magnitude for a given model generation.

Update frequency: Per model generation (roughly annual for frontier labs). Epoch AI updates their dataset continuously.

Falsifier: If a frontier-capability model is trained at materially lower energy cost than its predecessor (controlling for capability level), the trajectory reverses.


Claim 4.2: Inference API pricing has improved by roughly an order of magnitude since early 2023.

Statement: The cost per million tokens for frontier model inference has declined from approximately $30-60 (GPT-4, early 2023) to approximately $3-15 (GPT-4o, Claude 3.5, late 2024), depending on model and context length.
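
A rough calculation of the implied annualized decline, using midpoints of the quoted price ranges as stand-ins:

    price_early_2023 = 45.0      # midpoint of the $30-60 per million tokens range
    price_late_2024 = 9.0        # midpoint of the $3-15 range
    years = 1.75                 # roughly 21 months between the two observations
    annual_factor = (price_late_2024 / price_early_2023) ** (1 / years)
    print(f"{1 - annual_factor:.0%} decline per year")   # on the order of 60% per year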

Source: OpenAI pricing pages, Anthropic pricing pages, industry benchmarks.

Magnitude that matters: Order-of-magnitude improvements over 18-24 months indicate rapid commoditization of inference. This supports the "cognitive capability commoditizes" thesis.

Cross-reference: The efficiency headroom documented in Claim 1.2 (10⁶ above Landauer) provides physical space for continued improvement. If that headroom persists, order-of-magnitude inference price declines can continue for decades before fundamental limits bind.

Update frequency: Quarterly. Pricing changes are publicly announced.

Falsifier: If inference pricing stabilizes or increases for frontier capability (not merely for legacy models), commoditization may be slower than assumed.


Claim 4.3: Open-weight models approach frontier capability with 12-18 month lag.

Statement: Open-weight models (Llama, Mistral, Qwen) now achieve capability levels that frontier closed models demonstrated 12-18 months prior, as measured by standardized benchmarks.

Source: Benchmark results from HELM, MMLU, HumanEval; academic and industry publications.

Magnitude that matters: The lag duration determines how long frontier labs can extract premium pricing. If the lag compresses to under 6 months, margins erode faster. If the lag extends beyond 24 months, moats may prove more durable.

Update frequency: Per major open-weight release (roughly quarterly).

Falsifier: If the lag extends persistently (>24 months for two or more benchmark generations), the commoditization timeline requires revision.


C.5 — Labor Market and Automation Sequencing

These claims establish the V/C framework's empirical basis and the historical precedents for absorption.

Claim 5.1: Historical automation has been absorbed by reinstatement, Baumol sectors, and policy.

Statement: Over the past 120 years, approximately 60% of U.S. employment growth has occurred in occupations that did not exist at the start of the period. Agricultural employment declined from roughly 40% to under 2%; manufacturing from roughly 25% to under 8%. Aggregate employment has tracked population growth despite these displacements.

Source: Acemoglu, D., and Restrepo, P. (2019). "Automation and New Tasks: How Technology Displaces and Reinstates Labor." Journal of Economic Perspectives, 33(2), 3–30.

Magnitude that matters: The historical pattern establishes that absorption mechanisms have operated at the scale of prior transitions. The framework's claim is that these mechanisms may not scale to the current transition's scope.

Relevance to framework: Section V.E presents the adjustment case. This claim quantifies it.


Claim 5.2: Baumol sectors (healthcare, education, personal services) absorb an increasing share of employment.

Statement: Healthcare and education together account for approximately 20% of U.S. employment, up from approximately 10% in 1970. These sectors have grown as shares of GDP and employment despite (or because of) rising relative costs.

Source: Bureau of Labor Statistics employment statistics; Baumol, W. J., and Bowen, W. G. (1966). Performing Arts: The Economic Dilemma. New York: Twentieth Century Fund.

Magnitude that matters: The size of the Baumol reservoir determines how much labor displacement these sectors can absorb. If the sectors are saturated (growth slowing or reversing), absorption capacity diminishes.

Update frequency: Annual. BLS data is publicly available.

Falsifier: If Baumol sector employment growth slows to below overall employment growth for a sustained period, the reservoir mechanism weakens.


Claim 5.3: The V/C ratio predicts automation sequence, not raw model capability.

Statement: Tasks with high value-to-verification-cost ratios reach production deployment before tasks with lower ratios, controlling for raw model capability. Code completion and generation reached production deployment before medical diagnosis, despite models demonstrating comparable capability on standardized tests. The ordering reflects verification cost: code compiles or does not; diagnostic accuracy requires months of patient outcomes and carries liability exposure that compounds the effective verification cost.
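
A toy illustration of the ordering rule; the task names, dollar values, and verification costs below are hypothetical placeholders chosen to mirror the examples in this claim:

    tasks = {
        "code completion":    {"value": 10.0,  "verification_cost": 0.1},   # compiles or it doesn't
        "content moderation": {"value": 5.0,   "verification_cost": 0.5},
        "contract drafting":  {"value": 50.0,  "verification_cost": 25.0},
        "medical diagnosis":  {"value": 200.0, "verification_cost": 400.0}, # outcomes plus liability
    }

    def vc_ratio(task):
        return task["value"] / task["verification_cost"]

    # Predicted deployment sequence: highest V/C first, regardless of raw model capability.
    for name, t in sorted(tasks.items(), key=lambda kv: vc_ratio(kv[1]), reverse=True):
        print(f"{name:18s} V/C = {vc_ratio(t):6.1f}")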

Observable indicators: Code generation deployed at scale (GitHub Copilot, 2021; Cursor, 2023) before medical diagnosis despite comparable benchmark performance; content moderation automated before legal contract drafting; customer service chatbots deployed before surgical robotics. The pattern holds across independent organizations and jurisdictions.

Source: Deployment timelines from GitHub Copilot (2021), medical AI regulatory filings (ongoing), industry adoption studies. Section V.A develops the V/C framework. Section V.E extends it to the adjustment case.

Magnitude that matters: The V/C thesis predicts sequence, not pace. It is the framework's central empirical prediction regarding labor market effects. If deployments proceed in V/C order (high V/C domains first), the framework is supported. If deployments proceed by raw capability regardless of verification cost, the framework's labor market analysis is wrong.

Update frequency: Continuous. Track deployment announcements and regulatory approvals by domain.

Threshold rationale: "Systematic deployment" means three or more independent organizations achieving production deployment (not pilot or research) in a low-V/C domain while high-V/C domains in the same organization remain at pilot stage. "Saturation" in high-V/C domains means greater than 50% of addressable market penetration as measured by user adoption or transaction volume. The specific low-V/C domains listed (surgery, complex litigation, medical diagnosis) are chosen because their verification costs are structurally high—outcomes require months to years to observe, liability exposure is substantial, and ground truth is contested. A single counterexample could reflect idiosyncratic factors; three independent instances suggest the V/C ordering is wrong.

Falsifier: If systematic deployment occurs in low-V/C domains (surgery, complex litigation, medical diagnosis with liability exposure) before high-V/C domains (code completion, content generation, data analysis) reach saturation, the V/C sequencing thesis fails.


Claim 5.4: The reinstatement lag is compressing.

Statement: The time between task emergence and task automation is shortening. Robotic Process Automation (RPA) configuration—a role category that emerged around 2016–2018 as enterprises adopted tools like UiPath, Automation Anywhere, and Blue Prism—illustrates the compression. RPA specialists achieved significant employment by 2020, with dedicated consulting practices and certification programs at major systems integrators. By 2024–2025, LLM-based automation tools were absorbing the core RPA configuration task: translating business process descriptions into automated workflows. The lifecycle from emergence to automation pressure spans approximately five to seven years. Historical reinstatement categories (typist, telephone operator, elevator operator) persisted for decades before displacement.

Source: UiPath IPO filings (2021) for market sizing and employment growth; Gartner RPA market reports (2020–2024) for adoption trajectories; industry surveys documenting LLM-based process automation tools displacing traditional RPA configuration.

Magnitude that matters: Historical reinstatement operated over decades, providing adjustment windows for labor migration. If reinstatement windows compress to single-digit years, historical adjustment mechanisms may fail to keep pace. The RPA lifecycle is a cleaner example than prompt engineering (which arguably gained value as an embedded skill even as standalone roles declined) because the displacement mechanism is unambiguous: the same LLM capability that RPA specialists were configuring can now be directed through natural language rather than through specialized configuration interfaces.

Falsifier: If new task categories emerging post-2025 persist for more than 5 years before automation pressure appears, the compression claim is wrong and adjustment windows remain open.


Claim 5.5: Inference will dominate energy consumption as deployment scales.

Statement: Training costs are one-time per model generation. Inference costs are ongoing and scale with usage. As deployment expands, aggregate inference energy consumption will exceed aggregate training energy consumption.

Source: Patterson et al. (2021) estimated that inference already exceeded training energy for GPT-3-era systems; IEA projections for datacenter energy (2024) assume inference-dominated growth.

Magnitude that matters: If training remains the dominant energy cost (perhaps because model generations turn over faster than deployment scales), infrastructure constraints bind at the training phase rather than the deployment phase, altering where bottlenecks materialize.

Falsifier: If credible measurement shows training energy exceeds inference energy for a given model generation even after 24 months of deployment, the inference-dominance claim fails for that generation.


C.6 — Institutional and Market Behavior

These claims concern how economic actors respond to the transition.

Claim 6.1: Enterprise switching costs for AI services are lower than for traditional software.

Statement: Enterprises can switch between AI providers (model APIs, embedding services, inference endpoints) more readily than between traditional enterprise software, because the integration surface is thinner and data lock-in is weaker.

Source: Industry surveys; earnings call commentary from hyperscalers and AI labs; analyst reports.

Status: This claim is currently conjecture pending observation of enterprise switching behavior across multiple model generations. The framework's commoditization predictions are conditional on this claim proving accurate. Switching behavior at scale has not yet been observed. The first major test will occur when open-weight alternatives achieve parity with frontier closed models in enterprise-relevant benchmarks. Early enterprise evidence cuts against the claim: fine-tuning pipelines, evaluation infrastructure, safety configurations, and prompt libraries create switching costs that are more substantial than the thin API integration surface might suggest. An enterprise that has invested six months in red-teaming, fine-tuning, and evaluation for a specific model faces real cost in replicating that work for a competitor. The claim remains a hypothesis, not a finding, and the framework's margin compression predictions should be read with this caveat.

Magnitude that matters: If switching costs are low, commoditization proceeds rapidly and margins compress. If switching costs are high (due to fine-tuning, workflow integration, or data effects), moats may prove more durable. Note that even if this claim fails, the underlying energy-computation relationship remains valid. Only the pace of margin compression changes.

Update frequency: Track enterprise renewal rates and competitive displacement patterns.

Falsifier: If enterprise renewal rates for frontier AI providers exceed 95% despite competitive alternatives achieving benchmark parity, switching costs are higher than assumed.


Claim 6.2: Stablecoin issuers maintain blacklists and have frozen addresses.

Statement: Major stablecoin issuers (Circle, Tether) maintain address blacklists and have frozen funds at the request of law enforcement or under their own compliance policies.

Quantification: As of Q3 2024, Tether (USDT) has blacklisted approximately 1,800 addresses holding over $1.5 billion in frozen assets. Circle (USDC) has blacklisted approximately 200 addresses. These figures are verifiable via on-chain analysis (the blacklist function calls are recorded on-chain).

Source: On-chain data from Etherscan and other block explorers; issuer transparency reports; legal filings and press releases regarding specific freezing actions.

Magnitude that matters: The framework claims that stablecoin counterparty risk is existential for autonomous agents because freezing is possible and recovery requires human intervention. The scale of freezing (thousands of addresses, billions of dollars) establishes that this is an operational reality, not a theoretical concern.

Status: This is an established fact. The question is whether freezing extends to agent-held collateral at scale—a contingency that depends on how issuers classify and respond to non-human-controlled addresses.


Claim 6.3: Overcollateralization ratios will decline as reputation systems mature.

Statement: Initial agent-to-agent transactions will require overcollateralization ratios of approximately 150% or higher (posting $150 to secure $100 of commitment). As transaction histories accumulate and reputation systems emerge, ratios will decline toward 100% or below for established agent configurations.

Source: Section V.B's Efficiency Membrane discussion. Analogous dynamics observed in DeFi lending protocols (Aave, Compound), where collateral requirements decline for established positions and assets.

Status: This is a structural prediction, not a current observation. The mechanism (reputation substituting for collateral) is established in human commerce. Its extension to agent commerce is conjecture.

Magnitude that matters: The efficiency membrane's location depends on collateral ratios. High ratios (above 150%) restrict agents to high-frequency, low-stakes domains. Lower ratios (approaching 100%) enable expansion into larger commitments.
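
A small sketch of how the required ratio bounds the commitments a fixed stake of collateral can secure (the $1 million figure is illustrative):

    def max_commitment(collateral_usd, required_ratio):
        """Largest commitment a given collateral stake can secure at a required ratio."""
        return collateral_usd / required_ratio

    for ratio in (1.5, 1.2, 1.0):
        print(f"{ratio:.0%} requirement secures", round(max_commitment(1_000_000, ratio)))
    # 150% -> ~$667k, 120% -> ~$833k, 100% -> $1,000,000 of commitments per $1M posted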

Threshold rationale: The $1B volume threshold represents the boundary between experimental and economically significant agent commerce—roughly the annual transaction volume of a mid-sized financial intermediary. Below this threshold, high collateral ratios could reflect early-market friction rather than structural failure. The 5-year window reflects DeFi precedent: Aave and Compound achieved significant collateral ratio compression (from 150%+ to 110-130% for established assets) within 3-4 years of launch. If agent commerce cannot match this trajectory at scale, the reputation-substitution mechanism is weaker than analogous human-intermediated systems.

Falsifier: If agent-to-agent transactions at scale (more than $1B aggregate volume) persist with ratios above 150% for more than 5 years without decline, the reputation-substitution mechanism is not operating.


Claim 6.4: Bilateral credit negotiation scales quadratically; benchmark-based coordination scales linearly.

Statement: In the absence of a common benchmark rate, N agents must negotiate bilateral credit terms with every potential counterparty, requiring O(N²) pairwise negotiations. A common benchmark collapses this to O(N): each agent quotes spreads against the benchmark rather than negotiating bespoke curves.

Source: Appendix A.2 provides the formal statement. The mathematical claim is definitional. The empirical claim is that bilateral negotiation becomes "computationally intractable" at modest scale (10,000+ agents implies 50 million pairwise negotiations).
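
The scaling claim reduces to simple counting; a quick check of the figures cited above:

    def bilateral_pairs(n_agents):
        """Distinct counterparty pairs requiring bespoke negotiation: O(N²)."""
        return n_agents * (n_agents - 1) // 2

    def benchmark_quotes(n_agents):
        """With a common benchmark, each agent posts one spread: O(N)."""
        return n_agents

    for n in (100, 1_000, 10_000):
        print(n, bilateral_pairs(n), benchmark_quotes(n))
    # 10,000 agents -> 49,995,000 pairwise negotiations vs 10,000 benchmark quotes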

Magnitude that matters: The claim is that the coordination benefit of a common benchmark is quadratic, not merely linear. This justifies treating term structure emergence as a gating factor for agent-mediated commerce.

Falsifier: If large-scale agent coordination emerges without a common benchmark (through bilateral networks, hierarchical aggregation, or alternative coordination mechanisms), the O(N²) framing overstates the problem.


C.7 — Contingent Claims Requiring Monitoring

These claims are forward-looking and will be resolved by events.

Claim 7.1: A Bitcoin term structure will emerge within 5-10 years if the framework's premises hold.

Observable indicators:

  • Appearance of standardized instruments (options, futures, duration-hedged products) with liquid markets across multiple tenors
  • Publication of reference rates by credible benchmark administrators
  • Adoption of term structure rates in agent service pricing

Falsifier: If 10 years pass without emergence of liquid term structure instruments and credible benchmark publication, this infrastructure claim fails.


Claim 7.2: Agent-mediated commerce will achieve material scale within 10 years.

Observable indicators:

  • Transaction volume attributable to autonomous agents (measured by on-chain activity, API logs, or market disclosures)
  • Emergence of agent-native services with no human counterparty requirement
  • Collateral locked in agent-controlled smart contracts

Threshold rationale: The 1% threshold represents the boundary between "novelty" and "material" in financial markets. Asset classes, transaction types, and market segments below 1% are typically ignored in aggregate statistics and institutional allocation. Above 1%, phenomena attract dedicated coverage, specialized infrastructure, and institutional attention. The threshold is conservative. The framework's stronger predictions imply significantly higher penetration.

Falsifier: If 10 years pass without agent-mediated commerce exceeding 1% of relevant transaction categories (payments, API services, content licensing), the "species that buys itself" thesis fails at the scale predicted.


C.7 (continued) — Geopolitical Boundary Conditions

The framework assumes market allocation of compute resources. These claims establish the boundary conditions under which that assumption fails.

Claim 7.3: TSMC fabricates approximately 90% of leading-edge logic chips.

Statement: Taiwan Semiconductor Manufacturing Company fabricates over 90% of chips at the most advanced process nodes (currently 3nm and 5nm, previously 7nm). Samsung fabricates most of the remainder. No other foundry produces leading-edge logic at commercial scale.

Source: TSMC investor presentations and earnings calls; Semiconductor Industry Association data; IC Insights and TrendForce market share estimates.

Magnitude that matters: The concentration creates a single point of failure for frontier AI development. If TSMC's share were below 50%, geographic redundancy would provide resilience. At 90%+, a Taiwan disruption scenario directly impacts the framework's predictions.

Update frequency: Annual. Track quarterly earnings disclosures from TSMC, Samsung, and Intel Foundry Services.

Falsifier: If non-Taiwan foundry capacity at leading-edge nodes exceeds 30% of global production for more than 24 months, the geographic concentration claim requires revision.


Claim 7.4: U.S. export controls treat advanced compute as a strategic asset.

Statement: The October 2022 Bureau of Industry and Security rules restricted export of advanced semiconductors and manufacturing equipment to Chinese end-users. Subsequent rules (October 2023, January 2025) tightened definitions and closed transshipment loopholes. The policy treats leading-edge compute as a controlled technology comparable to weapons systems.

Source: Federal Register entries for BIS rules; Commerce Department public statements; Congressional testimony.

Status: This is an established fact. The question is whether controls remain, tighten, or relax over time.

Relevance to framework: Export controls demonstrate that states treat compute as strategic. If controls expand to include inference hardware, training clusters, or model weights, the market allocation assumption may fail for frontier capability.


Claim 7.5: State capture of apex compute is observable through specific signals.

Statement: If states move to capture apex compute infrastructure, the movement will manifest through observable signals before full capture occurs.

Observable signals:

  • Licensing requirements for training runs above specified compute thresholds (some jurisdictions have proposed this)
  • Mandatory government access to frontier model weights before deployment
  • Nationalization of fabrication or datacenter infrastructure
  • Compute allocation quotas or priority systems subordinating commercial use to state-designated objectives
  • Extension of export controls to inference hardware and model deployment

Status: Several signals are present in nascent form (training run reporting requirements, algorithm registration in some jurisdictions). Full capture has not occurred in any major economy.

Magnitude that matters: The framework treats states as friction, not principals. If states become principals—directly allocating compute rather than regulating its allocation—market mechanisms no longer govern deployment sequence or value capture.

Threshold rationale: The 10% threshold reflects the minimum share at which a jurisdiction's policy materially affects global compute markets. Below 10%, even aggressive state intervention affects only a regional market and deployers can route around it. Above 10%, the jurisdiction's decisions shape pricing, availability, and deployment norms for a substantial fraction of frontier capability. The "three signals" requirement filters noise—a single intervention may reflect idiosyncratic politics rather than systematic capture.

Falsifier: If three or more of the above signals manifest in a jurisdiction controlling more than 10% of global frontier compute capacity, the "states as friction" assumption requires revision for that jurisdiction.


Claim 7.6: The framework's predictions are jurisdiction-contingent under bifurcation.

Statement: If compute allocation bifurcates between market-governed and state-governed jurisdictions, the framework's predictions apply to the former but not the latter. The V/C ordering, energy floor mechanism, and settlement infrastructure predictions depend on price signals governing deployment decisions. Where administrative allocation replaces price signals, different mechanisms operate.

Relevance to framework: This is not a falsification condition but a scope limitation. Section VII.C (The Bifurcation Risk) develops this contingency in detail.

Update frequency: Monitor for state-action signals annually. Flag jurisdictions that transition from market governance to administrative allocation.


C.8 — Update Protocol

The claims in this appendix should be reviewed annually. For each claim:

  1. Verify the source remains current and the magnitude remains accurate.
  2. Check whether falsification conditions have been triggered.
  3. Update magnitudes, sources, and dates as new data becomes available.
  4. Add new load-bearing claims as the framework develops.
  5. Flag claims referencing prior-year data for priority refresh (claims citing "2024" data in a 2026 review require verification or update).

Claims whose falsification conditions trigger do not automatically invalidate the framework. They require analysis of whether the failure is local (affecting one prediction) or structural (requiring revision of core mechanisms).

Versioning: Each annual review should increment a version number and append a changelog noting which claims were updated, added, or flagged for attention. The revision history enables tracking of how the empirical foundation evolves alongside the framework.


This appendix is current as of January 2026 (Version 1.2). Updates should preserve the structure and add revision notes rather than overwriting without record.

Version 1.2 changelog: Revised Claim 3.4 methodology (specified combined capital/operational costs, added Coin Metrics and Campbell Harvey sources, clarified 1-2 year timeframe for $10-20B estimate, updated hash rate to ~1 ZH/s).

Version 1.1 changelog: Added Claim 3.4 (51% attack cost as security floor), Claims 7.3-7.6 (geopolitical boundary conditions including TSMC concentration, export controls, state capture signals, and jurisdiction-contingent predictions).

If you are reading this in 2027 and the thresholds have not been updated, treat the framework with suspicion. If these tests are passed and the framework survives, I will not claim vindication. I will update the thresholds and specify new tests. That is what distinguishes analysis from advocacy.