Reading Guide
Seventeen papers, one argument — from doctrinal characterization through impossibility theorem to Lean-verified spectral geometry
Papers
| Name | Role | Part |
|---|---|---|
| Composition Doctrine | Doctrinal spine: the unique numerical invariant respecting seam-independent disclosure (Lean-verified) | 0 |
| SCPI | Formal hinge: predicate invention under sheaf constraints; the impossibility theorem in topological form | I |
| Bridge | Empirical witness: bilateral validity coexists with compositional failure on real systems | II |
| Seam | Protocol consequence: manifests, semantic witnesses, fraud proofs | III |
| The Coherence Fee (Paper I) | Formal definition: fee as rank deficit of the coboundary (hierarchical decomposition) | IV-A |
| Column-Matroid Backbone (Paper II) | Structural identity: fee as matroid corank with closed-form pairwise formula | IV-B |
| The Witness Gram (Paper III) | Spectral geometry: Kron-reduced Laplacian and disclosure substitutability | IV-C |
| Signed-Incidence Structure | Field-independence: TU rows; partition-style typed repair structurally fails | IV-D |
| Witness Geometry Beyond Fee | Repair calculus: basis count β(G) = ∏|C_{d,j}|; sharp bounds; realizability | IV-E |
| The Stitching Defect | Gluing identity: fee(G₁ ∪_B G₂) = fee(G₁) + fee(G₂) + rank(δ_B); safe stitching | IV-F |
| SHEAF | Frontier extension: heterogeneous agents, enriched cohomology, scaling theory | V |
| Communication Bottleneck | Spectral lower bound: Haar-universality and triangle holonomy (harvested from SHEAF) | V |
| Coherence Cliff | Scaling evidence: regime change as composition grows | V |
| Local Validity Does Not Compose | Impossibility theorem: no bounded-radius local audit certifies coherence | V |
| BABEL | Public benchmark: 932 instances, 7 families, 3 tracks, oracle CoT | VI |
| Compositional Hiddenness | Representation-conditioned predicate access: 88% structured vs 46% flat accuracy | VI |
| Interpretability Frontier | Representational boundary: edge-local interpretability is not enough | VII |
#Reading Guide
This volume can be read in three different ways.
The first is doctrinal: start from the canonical statement, then descend to the constituent papers as instances.
- The Composition Doctrine (Part 0) for the canonical statement: witness rank is the unique numerical invariant respecting seam-independent disclosure as the primitive operation of compositional repair. Lean-verified.
SCPI(Part I) for the topological direction: predicate invention as a descent problem, with the obstruction sequence formalized.Bridge(Part II) for the empirical witness: the obstruction appears in concrete LLM-tool compositions.Seam(Part III) for the protocol consequence: which is the disclosure normal form of the doctrine paper, made operational.- The Bulla Formal Trilogy (Part IV: Papers I–III + Signed-Incidence + Witness Geometry Beyond Fee) for the structural identity, field-independence, spectral geometry, and repair calculus on real-schema MCP corpora.
SHEAF, the Communication Bottleneck, the Coherence Cliff, and Local Validity Does Not Compose (Part V) for the frontier extension and impossibility theorem.BABELand Compositional Hiddenness (Part VI) for the public benchmark and representation-conditioned LLM study.- The Interpretability Frontier (Part VII) for the representational boundary: even the best per-component tools cannot substitute for structural diagnosis.
The second is architectural: the logical structure of the argument, beginning from impossibility and ending at the boundary.
SCPIfor the formal hinge: bilateral checks cannot see cycles.- Local Validity Does Not Compose for the impossibility theorem: no bounded-radius local audit certifies global coherence in general.
Bridgefor the empirical witness: LLMs cannot see cycles either.Seamfor the protocol consequence: and you can prove they didn't.- The Bulla Formal Trilogy + Signed-Incidence + Witness Geometry Beyond Fee for the structural identity, field-independence, and repair geometry of the obstruction invariant.
SHEAFfor the frontier extension: heterogeneous-agent coordination under explicit impossibility conditions.- The Coherence Cliff and Communication Bottleneck: and the problem grows worse at scale.
BABEL,Bronze+,Silver, and Compositional Hiddenness: and here is how to measure all of it.- The Interpretability Frontier: and even the best per-component tools cannot substitute for structural diagnosis.
- The Composition Doctrine: and all of this is the unique invariant any reasonable theory of compositional obstruction must recover.
The third is evidence-first: start with what you can run, then ask why it works.
BABEL(Part VI) for the benchmark: 932 instances, 7 families, 3 tracks. Five frontier LLMs fail.- Compositional Hiddenness (Part VI) for the LLM study: 88% structured vs 46% flat accuracy on a hidden-convention task.
- The Coherence Cliff (Part V) for the scaling evidence: the regime change where bounded-depth testing collapses.
- The Interpretability Frontier (Part VII) for the representational boundary: mechanistic interpretability at its best still cannot diagnose cyclic compositional failure.
- The Bulla Formal Trilogy (Part IV) for the formal identities: 14 Lean-verified theorems anchoring the empirical results to a 703-composition real-schema MCP corpus.
SCPI(Part I) and the Composition Doctrine (Part 0) for the formal foundation: the obstruction is topological, the proof is machine-checked, and the invariant is uniquely forced.Seam(Part III) for the protocol response: what to build once the obstruction is granted.
Three distinctions apply throughout:
Doctrinal layer(Composition Doctrine) vsinstance layer(the rest).Settled core(Parts 0–IV) vsfrontier(Parts V–VII).Paper materialvsappendix burden.
Parts 0–IV constitute the hard center of the present program. Within Part IV, the Bulla Formal Trilogy (Papers I, II, III) is the structural-spectral spine, with Signed-Incidence Structure providing the field-independence theorem and Witness Geometry Beyond Fee developing the repair-entropy decision theory. Part V is the frontier: SHEAF reaches further than the settled core, the Communication Bottleneck Theorem is the first frontier result that has passed an autonomy test, the Coherence Cliff is the program's strongest large-scale empirical evidence (a regime change where the predictive gap between the best sheaf diagnostic and the best conventional baseline nearly triples from 5 to 50 nodes), and Local Validity Does Not Compose proves the impossibility theorem in the rank-1 signed model with field-independence to the Q-valued Bulla model via signed-incidence.
Part VI is the strongest section of the omnibus in terms of external artifact weight. It presents BABEL, the public benchmark (932 instances across 7 workflow families and 3 provenance tiers) that operationalizes the central claim into a public instrument with three evaluation tracks: failure prediction, failure localization, and budgeted repair. Compositional Hiddenness tests whether frontier LLMs can recover the non-local hidden-convention predicate that the structural diagnostic computes exactly: under structured relational prompts, accuracy is 88%; under flat prompts, 46%; the hiddenness predicate is graph-relative — the same tool has different hidden sets depending on its composition partner — and frontier LLMs revert to lexical heuristics without explicit relational framing. Bronze+ is a mixed-provenance MCP composition where an official reference server participates in a workflow that is protocol-green and semantics-red. Silver extends to invoice/settlement with two non-house servers (MarkItDown + Memory). The structural diagnostic achieves R² ≥ 0.86 across all seven families. Five frontier LLMs fail to rank compositions by severity (ρ near zero); an oracle CoT experiment across all five models isolates the bottleneck as arithmetic reasoning rather than information extraction. The live-pipeline validation (BABEL §6.6) provides the first fully non-circular result: structural holonomy correlates with measured dollar error from the actual MCP server pipeline (ρ = 0.795, p < 7.4 × 10⁻⁷).
Part VII is the capstone. The Interpretability Frontier paper tests whether mechanistic interpretability — the most powerful per-component diagnostic technology available — can substitute for structural diagnosis on cyclic compositional failure. Across 240+ compositions, three domains, four scales, two model architectures (GPT-2 Small and Gemma 2 2B), and six interpretability baseline families (including a cycle-oracle aggregator with explicit knowledge of graph topology), the answer is no. The structural diagnostic achieves ρ = 1.0 in every condition; the best interpretability baseline never exceeds ρ = 0.758. Probing classifiers achieve 99.8% accuracy at every edge and carry zero global signal, and cycle-oracle aggregation adds nothing over edge-local averaging. Cross-model replication on Gemma 2 2B (a 20× larger model from a different architecture family) shows the gap widens rather than narrows with model scale. The gap is representational, not aggregational. This completes the argument: the doctrine identifies the canonical invariant; the formal theory predicts local blindness; the benchmark measures it; and the interpretability paper shows that even the richest local tools inherit the same boundary — across architectures.