The Selection Gradient
Armen Alchian offered a reframing of the firm that still feels like a corrective. Writing in 1950, he did not require firms to be rational, or even particularly coherent. He required only that markets be competitive enough to prune. A clumsy firm that blunders into a superior method will outlast a careful firm that plans an inferior one. Outcomes survive. Intentions do not. Richard Nelson and Sidney Winter extended the insight into an evolutionary account of economic change: firms as carriers of routines, habitual patterns that once worked well enough to escape pruning, modified through a search process that mixes luck, error, imitation, and drift. Order emerges without omniscience, through the differential survival of routines that happen to fit the world as it is.
The previous chapter established that thermodynamic depth (the computational work required to produce a structure) is the entry fee to the economic contest. But the entry fee is not the prize. Most training runs, judged by what anyone will pay for the result, are losses whose thermodynamic signatures are indistinguishable from those of the successes. Between expenditure and reward sits selection, the set of filters that decide whether coherence ever becomes surplus. Training can manufacture coherence. Only selection converts coherence into surplus.
A training run that produces a frontier model passes three nested filters before it earns the name. Gradient descent, cycling through parameter space at a pace measured in milliseconds, retains configurations that reduce loss and discards those that raise it. Because loss curves can be plotted and benchmark scores compared, this first stage receives the largest share of public attention. A model that achieves state-of-the-art performance on an evaluation suite detached from deployment reality has cleared the first gate, and the curves look beautiful all the way to convergence. Then the model meets the world. Real users with real problems adopt it, abandon it, route around it, or quietly stop relying on it. A code model that is imperfect on benchmarks but useful in the frameworks engineers actually work with survives; a translation system flawless in a language pair nobody needs does not. Deployment is where the market enters, as an attrition mechanism indifferent to leaderboard rankings. And finally, investors, boards, and internal budget committees observe which approaches survived deployment and feed them more energy. Capital selection is the crudest filter and the most consequential, because it determines the composition of the ecosystem over time: which kinds of computational structure the world will produce more of and which will quietly disappear from the research agenda.
The filters do not add; they multiply. A structure that is internally coherent can still be rejected by deployment, and capital, arriving last and judging most roughly, decides which patterns the world will be forced to see again. The energy bill was paid either way. And the difference between a run that passes all three filters and a run that fails at any stage is the difference between a factor of production and an expensive puddle of waste heat — the diamond and the mud pie.
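The multiplicative structure is worth seeing in miniature. The following toy sketch uses pass rates invented purely for illustration (none of these numbers come from the text or from any dataset); it shows only why multiplication, not addition, governs the outcome:

```python
# Toy model of the three nested selection filters. Pass rates are
# invented for illustration, not empirical estimates.
gradient_pass = 0.30    # runs that converge to a low-loss configuration
deployment_pass = 0.20  # converged models that real users keep using
capital_pass = 0.50     # deployed models that attract further funding

# The filters multiply: a run must survive all three.
survival = gradient_pass * deployment_pass * capital_pass
print(f"overall survival rate: {survival:.3f}")

# The energy bill is paid either way, so expected waste
# scales with the runs that fail at any stage.
runs = 1_000
wasted_runs = runs * (1 - survival)
print(f"expected failed runs per {runs}: {wasted_runs:.0f}")
```

Even with individually generous pass rates, the product is small: most of the energy spent is spent on runs that fail at some stage.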
The V/C Ratio
In the revenue-cycle office of a regional hospital, a prior-authorization request arrives for a proposed knee replacement. An analyst (or increasingly a system) checks the patient's coverage against the payer's policy terms, verifies eligibility, compares the proposed procedure code to the coverage table, and issues a determination. Criteria are explicit. Inputs are structured: diagnostic codes, procedure codes, formulary lists, medical-necessity guidelines. The work is real, and it requires genuine interpretive skill at the margins, but the determination, once made, can be checked by anyone with access to the same documents. Did the diagnosis code satisfy the medical-necessity criteria? Did the eligibility window include the date of service? Industry estimates suggest that roughly three-quarters of commercial prior-authorization decisions now process straight-through, meaning a computational system evaluates the case, applies the criteria, and issues the authorization without a human reviewing the file.
Down the corridor in the claims-resolution office, a different analyst sits before a different kind of problem. A surgical complication has occurred. The insurer must determine whether it was foreseeable, whether the surgical team's response fell within the accepted standard of care, and how the resulting costs divide between coverage and liability. She pulls the operative notes, the nursing records, the anesthesiology logs. She may need to consult with a medical director. She may need to request an independent medical examination. Legal standards vary by state. The causation question (whether the surgeon's delay caused the deterioration or whether deterioration was inevitable) resists clean resolution because the counterfactual cannot be observed, only argued. In claims adjudication, straight-through processing rates remain in the single digits.
Prior authorization is not ten times simpler than claims adjudication. Both tasks require trained professionals. Both involve ambiguity. By most reasonable measures of intellectual complexity, the gap between them is modest. But the gap in automation rates is an order of magnitude, and the reason has nothing to do with how hard the tasks are to perform. It has to do with how hard they are to check.
Prior-authorization decisions can be verified against the policy terms by anyone with the relevant documents: the comparison is mechanical, the inputs structured, the review a matter of minutes. Claims adjudication can require chart review, expert consultation, legal analysis, and the reconstruction of causality in a space where causality is contested: verification that routinely costs more than having a human adjuster do the original work. When the cost of checking the machine exceeds the cost of the human, automation creates no surplus, however capable the machine may be.
Verification cost governs the automation sequence with a fidelity that cognitive difficulty cannot match. The ratio between the value a correct decision produces and the cost of verifying that the decision was correct (call it the V/C ratio) determines which tasks cross the substitution threshold first, regardless of their intellectual complexity. High-V/C tasks automate early. Low-V/C tasks resist automation regardless of the machine's competence, because no one can economically confirm that the competence was exercised correctly.
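The decision rule is simple enough to state in a few lines of illustrative Python. The task values and verification costs here are hypothetical stand-ins, not estimates of real hospital economics; the point is only the ordering the ratio induces:

```python
# Minimal sketch of the V/C decision rule. All values and
# verification costs below are hypothetical illustrations.
def vc_ratio(value_of_correct_decision, cost_to_verify):
    """Value a correct decision produces, divided by the cost
    of confirming that the decision was correct."""
    return value_of_correct_decision / cost_to_verify

tasks = {
    # task: (value of a correct decision, cost of checking it)
    "prior_authorization": (200.0, 5.0),    # mechanical document check
    "claims_adjudication": (200.0, 400.0),  # chart review, experts, law
}

# High-V/C tasks cross the substitution threshold first,
# regardless of how intellectually hard they are to perform.
ranked = sorted(tasks, key=lambda t: vc_ratio(*tasks[t]), reverse=True)
for name in ranked:
    print(name, round(vc_ratio(*tasks[name]), 2))
```

Note that both tasks carry the same value for a correct decision; the entire difference in rank comes from the denominator.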
Oliver Williamson's transaction cost economics identified the cost of writing and enforcing complete contracts as the constraint shaping firm boundaries. His framework has been productive for decades and remains valid for the cases it was designed to explain. But V/C identifies a different fulcrum. A contract for custom software development is easy to write and hard to verify: a page of requirements can specify the desired behavior, but whether the resulting code handles edge cases, scales under load, and resists adversaries is a verification problem that can cost more than the development itself. A contract for commodity inference (classify these images, return the labels, maintain accuracy above a defined threshold) is simple to write and equally simple to verify. Accuracy can be measured against a labeled set, latency tracked, uptime monitored, and providers switched if quality drops. The firm boundary dissolves where performance is inspectable, and it persists where verification is expensive, a prediction that agrees with Williamson's direction but locates the mechanism in checkability rather than asset specificity.
Chess illustrates the extreme. Grandmasters spend decades developing intuitions about position, sacrifice, and timing that no one can fully articulate. But the game has an explicit objective function, a win condition that is mechanically checkable. You play, you count wins. A chess engine replaced the grandmaster decades before systems could handle tasks most people would consider far easier; checking chess is cheap. Code generation followed the same logic: a test suite is a verification instrument, and the cost of running it is trivial relative to the value of correctly functioning software. In both cases, the output can be checked, and the cost of checking is low enough relative to the value that automation produces genuine surplus.
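A test suite as a verification instrument can be sketched in miniature. The function and cases below are invented; `add` stands in for any machine-generated code, and the sketch shows only that the check is mechanical and cheap:

```python
# A test suite as a verification instrument: running the checks
# is trivially cheap relative to the value of correct software.
def add(a, b):
    """Stand-in for a hypothetically machine-generated function."""
    return a + b

# Invented cases: ((arguments), expected result).
test_cases = [((2, 3), 5), ((-1, 1), 0), ((0, 0), 0)]

def verify(fn, cases):
    """Mechanical verification: run every case, demand every pass."""
    return all(fn(*args) == expected for args, expected in cases)

print(verify(add, test_cases))
```

The grandmaster's intuition cannot be articulated, but the win condition can be counted; likewise the generator's process is opaque, but the suite's verdict is not.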
Where verification is cheap, automation is competitive. You can switch providers because you can measure quality. You can refuse bad output because you can detect it. Where verification is expensive, automation becomes a different creature: a system whose output cannot be cheaply checked, whose errors cannot be cheaply detected, and whose users are therefore unable to discipline the technology through the normal mechanisms of market feedback. A legal memorandum generated by a model may be fluent, well-structured, and entirely wrong in its citations, and the cost of verifying the citations is the cost of a lawyer who reads the cases, which is the cost of having a lawyer write the memo from scratch.
And in the middle range, where verification is possible but burdensome, the most insidious pathology emerges.
The Competence Trap
A radiologist sits down each morning and opens the queue of images the system has flagged. In the first weeks she works as she has always worked, reading with care, comparing what she sees against what the model suggests, disagreeing when the model is wrong, catching what it missed. She is alert. She is practicing the skill that twenty years of training built. The system is usually right.
Over time the ratios work on her. If she agrees with the model ninety-four percent of the time, then in a day of sixty studies her disagreements become rare events, three or four per shift. In the second month the agreement rate climbs higher. By the sixth month she has a rhythm. A flag appears. She glances at the image. The flag is consistent with what she sees. She confirms. Seconds, not minutes. Throughput triples. The department chair is pleased.
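The arithmetic of her situation is short enough to write out, using the numbers from the scenario above:

```python
# How often is the radiologist's independent judgment still exercised?
# Numbers follow the scenario in the text.
studies_per_day = 60
agreement_rate = 0.94   # fraction of flags she would confirm anyway

disagreements_per_day = studies_per_day * (1 - agreement_rate)
print(f"expected disagreements per shift: {disagreements_per_day:.1f}")

# As agreement climbs, independent disagreement becomes rarer still.
for rate in (0.94, 0.97, 0.99):
    per_shift = studies_per_day * (1 - rate)
    print(f"agreement {rate:.0%}: {per_shift:.1f} disagreements per shift")
```

At ninety-nine percent agreement, the act the signature attests to occurs less than once per shift.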
Something has changed that does not show up on any dashboard. She is no longer performing independent diagnosis. She is performing confirmation of the model's diagnosis, and confirmation is a different cognitive act, requiring less sustained attention, less of the slow muscular pattern recognition that distinguishes a competent reader from a great one. The skill atrophies: not in a dramatic failure but in the quiet erosion that follows from disuse, the way a surgeon's hands lose their precision during a long sabbatical, the way a musician's intonation drifts when she stops practicing scales. She understands the image less thoroughly than the system that flagged it. But she is the one who signs the report. She is the one who will be deposed if the patient's family sues.
Her signature has become a ceremony, a ritual performance of oversight whose epistemic weight has thinned to nothing without anyone noticing, least of all herself. Delegation has hollowed out the competence required to oversee the delegation. The loop has emptied the human of the skill the loop was supposed to preserve. And the pathology is invisible from within: the hospital sees faster throughput, the insurer sees lower costs, the radiologist herself sees a manageable workload. The only evidence that anything has gone wrong will be the error she would once have caught, arriving one morning in a study she approves in three seconds and a keystroke, producing a consequence that unfolds over months of misdiagnosis while her signature assures the world that a qualified human was watching.
This is the Competence Trap: delegation without maintained understanding. It appears precisely where the V/C ratio sits in a middle range: high enough that automation is economically justified, low enough that a human signature is still demanded. Oversight becomes the justification for delegation and then becomes incapable of catching delegation's errors, because the oversight function has been converted into assent.
An institution that demands a human signature on automated output inherits an obligation it rarely acknowledges: the obligation to keep the signer competent to evaluate what is being signed. Periodic independent readings where the system's output is withheld and the radiologist must decide on her own. Audits that measure diagnostic accuracy, not throughput. Requalification that tests the skill the automation is displacing rather than the speed with which the human can approve its conclusions. Without these investments, the signature becomes an institutional fiction: an assurance to the patient, the insurer, and the court that a qualified human reviewed the image, when in truth a qualified human glanced at a screen for three seconds and pressed Confirm.
Attorneys sign AI-generated memoranda without independently verifying the cited cases. Financial auditors accept automated analyses without reperforming the calculations. In aviation, the automation paradox has been documented since the 1990s and has contributed to accidents in which the crew's first manual intervention in months was the one that occurred during an emergency. The pathology appears wherever verification is expensive enough to tempt the human into deference but not so expensive that the institution gives up on human oversight entirely. The middle range. The range where institutions pretend that someone is watching, and no one, in any meaningful sense, is. The trap operates not only on individuals but on the pipeline that produces them. When agents perform the junior work through which future verifiers are trained — the associate's first brief, the resident's first unassisted read, the apprentice auditor's first independent check — the mechanism by which verification capacity renews itself is severed. Human oversight becomes a depletable stock rather than a renewable flow.
What Verification Cost Predicts
Arithmetic automated first because a correct sum is immediately valuable and almost costless to verify: mechanical calculators made the point, electronic computers scaled it, spreadsheets domesticated it. Pattern recognition followed: optical character recognition, speech-to-text, image classification. These tasks are cognitively nontrivial, but their outputs can be compared cheaply against labeled ground truth that centuries of human practice left behind as an unintended subsidy to both training and verification. Filing clerks who typed index cards for library catalogs, court reporters who transcribed testimony, medical illustrators who drew anatomical plates: all produced labeled datasets as a byproduct of their work. The labels were never intended as training data. They became training data: examples of correct performance in standardized formats, mechanically comparable.
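The cheapness of label-based verification can be sketched directly; the labels and predictions below are invented:

```python
# Verification against labeled ground truth: a mechanical
# comparison, no judgment required. Data is invented.
labels      = ["cat", "dog", "cat", "bird", "dog"]
predictions = ["cat", "dog", "dog", "bird", "dog"]

matches = sum(p == l for p, l in zip(predictions, labels))
accuracy = matches / len(labels)
print(f"accuracy = {accuracy:.2f}")
```

The entire verification procedure is one pass over the data; the cost scales with the number of examples, not with the difficulty of the underlying task.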
In radiology alone, over a thousand AI devices had received FDA authorization by 2024, while in pathology (which requires physical tissue to be fixed, stained, sectioned, mounted on glass slides, and scanned before any computational analysis can begin) the number of FDA-cleared tools remained in the single digits. The ratio is approximately two hundred to one, and the explanation has nothing to do with cognitive difficulty. It has everything to do with the difference in verification cost.
Coordination is beginning to automate, and here the V/C ratio draws a clean line within the domain. Bilateral coordination with explicit terms is mechanically checkable: did the delivery arrive by the specified date? Was the quantity correct? Did the payment settle at the agreed price? Each question has a deterministic answer computable without judgment. Multilateral coordination (fourteen counterparties with conflicting objectives and incomplete mutual visibility) requires understanding those interests well enough to judge whether a reallocation was equitable, and that judgment resists reduction to a mechanical comparison. Bilateral cases will automate first. Multilateral cases will resist: harder to verify.
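The bilateral case reduces to mechanical comparisons. A sketch with hypothetical contract terms (dates, quantities, and prices are invented):

```python
# Bilateral coordination with explicit terms is mechanically
# checkable. All contract fields and values are hypothetical.
from datetime import date

contract = {"due": date(2025, 3, 1), "quantity": 500, "price": 12.50}
outcome  = {"delivered": date(2025, 2, 27), "quantity": 500, "paid": 12.50}

# Each question has a deterministic answer computable without judgment.
checks = {
    "on_time":  outcome["delivered"] <= contract["due"],
    "quantity": outcome["quantity"] == contract["quantity"],
    "price":    outcome["paid"] == contract["price"],
}
print(all(checks.values()))
```

No equivalent dictionary of boolean checks exists for the fourteen-counterparty reallocation; "was it equitable?" has no deterministic comparison to reduce to.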
Beyond coordination lies the setting of objectives, and here the framework encounters a structural boundary rather than a technical delay: the specification horizon, the point beyond which a principal's intent cannot be encoded in an agent's objective function. Not because the principal is inarticulate but because intent is constitutively richer than any finite instruction set. The gap between what a principal means and what an objective function captures is not reducible by better specification; it is inherent in the relationship between finite instructions and infinite consequence.
The V/C ratio illuminates why. Verifying whether an agent followed an instruction requires comparing the action against the instruction. When the instruction fully specifies the desired behavior, verification is cheap: did the agent deliver the cargo by Tuesday? When the instruction underspecifies ("act in the client's best interest"), verification requires reconstructing intent and evaluating the action against a standard never formally articulated. Verification cost approaches infinity as specification completeness approaches zero, and the specification horizon is the boundary where this ratio diverges. Beyond it, the agent is not disobedient; it is obedient to something that is not what you meant.
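One illustrative formalization, not derived from anything in the text: let s in (0, 1] measure specification completeness and suppose, as a toy assumption, that verification cost scales inversely with it. Then:

```latex
% Toy model: s \in (0,1] is specification completeness,
% c_0 the cost of verifying a fully specified instruction.
C(s) = \frac{c_0}{s},
\qquad
\frac{V}{C(s)} = \frac{V s}{c_0} \;\longrightarrow\; 0
\quad\text{and}\quad
C(s) \;\longrightarrow\; \infty
\quad\text{as } s \to 0^{+}.
```

The particular functional form is arbitrary; what the sketch preserves is the qualitative claim that the ratio collapses, and the cost diverges, as specification empties out.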
The structural parallel to the participation horizon is precise. One marks where governance cannot reach the governed: deliberation too slow for the decisions it must constrain. The other marks where delegation cannot reach the intent: specification too narrow for the purposes it must serve. Both are boundaries of the same constitutional problem: coordination outrunning the human capacities on which its legitimacy depends.
The hospital administrator from Act I (the one who overrides the resource-allocation agent optimizing system-wide outcomes) is operating at the specification horizon. The objective function specified maximizing patient outcomes. The specification did not, because it could not, encode the weight she assigns to the third-floor ward serving single parents who cannot travel across the city. That weight is a judgment emerging from encounter with the particular community the ward serves: knowledge that resists compression into numerical form because it is constituted by attention to particulars no enumeration can exhaust.
The specification horizon is the V/C ratio's structural ceiling. Below it, verification cost governs which tasks automate first. At the horizon, verification cost diverges because the standard against which to verify was never fully specified. Above it lies the domain of Homo Arbiter: purposes, mercy, judgment, the functions that persist not because machines are insufficiently capable but because the intentions machines serve exceed any specification.
A corporation that sets the wrong objective for its agent fleet (optimizing for engagement when it should have optimized for satisfaction, or for growth when it should have optimized for sustainability) will not discover the error until consequences have accumulated over months or years, by which time the objective has shaped a million decisions whose aggregate effect cannot be reversed. The quality of an objective reveals itself only through consequences, and consequences unfold on timescales that no verification procedure can compress. Machines can propose purposes. No one can economically confirm whether the purposes are right until long after adoption has become irrevocable.
Human judgment persists at the apex because the evidence for evaluating ends arrives slowly, through time, through consequence, through the lived experience of what a purpose actually produces when pursued at scale and over years. Time is part of the proof. Machines cannot compress it into certainty. The claim that the role of Homo Arbiter remains is, at bottom, a prediction about verification cost: one the framework either gets right or gets wrong.
Energy structured through computation and disciplined by selection is the emerging factor of production. Verification cost governs the sequence of automation. If a task with a low V/C ratio automates before tasks with higher V/C ratios in the same domain, the framework is wrong.