Technical Spine Omnibus

Reading Guide

Seventeen papers, one argument — from doctrinal characterization through impossibility theorem to Lean-verified spectral geometry

Papers

Name | Role | Part
Composition Doctrine | Doctrinal spine: the unique numerical invariant respecting seam-independent disclosure (Lean-verified) | 0
SCPI | Formal hinge: predicate invention under sheaf constraints; the impossibility theorem in topological form | I
Bridge | Empirical witness: bilateral validity coexists with compositional failure on real systems | II
Seam | Protocol consequence: manifests, semantic witnesses, fraud proofs | III
The Coherence Fee (Paper I) | Formal definition: fee as rank deficit of the coboundary (hierarchical decomposition) | IV-A
Column-Matroid Backbone (Paper II) | Structural identity: fee as matroid corank with closed-form pairwise formula | IV-B
The Witness Gram (Paper III) | Spectral geometry: Kron-reduced Laplacian and disclosure substitutability | IV-C
Signed-Incidence Structure | Field-independence: TU rows; partition-style typed repair structurally fails | IV-D
Witness Geometry Beyond Fee | Repair calculus: basis count β(G) = ∏|C_{d,j}|; sharp bounds; realizability | IV-E
The Stitching Defect | Gluing identity: fee(G₁ ∪_B G₂) = fee(G₁) + fee(G₂) + rank(δ_B); safe stitching | IV-F
SHEAF | Frontier extension: heterogeneous agents, enriched cohomology, scaling theory | V
Communication Bottleneck | Spectral lower bound: Haar-universality and triangle holonomy (harvested from SHEAF) | V
Coherence Cliff | Scaling evidence: regime change as composition grows | V
Local Validity Does Not Compose | Impossibility theorem: no bounded-radius local audit certifies coherence | V
BABEL | Public benchmark: 932 instances, 7 families, 3 tracks, oracle CoT | VI
Compositional Hiddenness | Representation-conditioned predicate access: 88% structured vs 46% flat accuracy | VI
Interpretability Frontier | Representational boundary: edge-local interpretability is not enough | VII

Reading Guide

This volume can be read in three different ways.

The first is doctrinal: start from the canonical statement, then descend to the constituent papers as instances.

  1. The Composition Doctrine (Part 0) for the canonical statement: witness rank is the unique numerical invariant respecting seam-independent disclosure as the primitive operation of compositional repair. Lean-verified.
  2. SCPI (Part I) for the topological direction: predicate invention as a descent problem, with the obstruction sequence formalized.
  3. Bridge (Part II) for the empirical witness: the obstruction appears in concrete LLM-tool compositions.
  4. Seam (Part III) for the protocol consequence: the disclosure normal form of the doctrine paper, made operational.
  5. The Bulla Formal Trilogy (Part IV: Papers I–III + Signed-Incidence + Witness Geometry Beyond Fee) for the structural identity, field-independence, spectral geometry, and repair calculus on real-schema MCP corpora.
  6. SHEAF, the Communication Bottleneck, the Coherence Cliff, and Local Validity Does Not Compose (Part V) for the frontier extension and impossibility theorem.
  7. BABEL and Compositional Hiddenness (Part VI) for the public benchmark and representation-conditioned LLM study.
  8. The Interpretability Frontier (Part VII) for the representational boundary: even the best per-component tools cannot substitute for structural diagnosis.

The second is architectural: the logical structure of the argument, beginning from impossibility and ending at the boundary.

  1. SCPI for the formal hinge: bilateral checks cannot see cycles.
  2. Local Validity Does Not Compose for the impossibility theorem: no bounded-radius local audit certifies global coherence in general.
  3. Bridge for the empirical witness: LLMs cannot see cycles either.
  4. Seam for the protocol consequence: and you can prove they didn't.
  5. The Bulla Formal Trilogy + Signed-Incidence + Witness Geometry Beyond Fee for the structural identity, field-independence, and repair geometry of the obstruction invariant.
  6. SHEAF for the frontier extension: heterogeneous-agent coordination under explicit impossibility conditions.
  7. The Coherence Cliff and Communication Bottleneck: and the problem grows worse at scale.
  8. BABEL, Bronze+, Silver, and Compositional Hiddenness: and here is how to measure all of it.
  9. The Interpretability Frontier: and even the best per-component tools cannot substitute for structural diagnosis.
  10. The Composition Doctrine: and all of this is the unique invariant any reasonable theory of compositional obstruction must recover.

The third is evidence-first: start with what you can run, then ask why it works.

  1. BABEL (Part VI) for the benchmark: 932 instances, 7 families, 3 tracks. Five frontier LLMs fail.
  2. Compositional Hiddenness (Part VI) for the LLM study: 88% structured vs 46% flat accuracy on a hidden-convention task.
  3. The Coherence Cliff (Part V) for the scaling evidence: the regime change where bounded-depth testing collapses.
  4. The Interpretability Frontier (Part VII) for the representational boundary: mechanistic interpretability at its best still cannot diagnose cyclic compositional failure.
  5. The Bulla Formal Trilogy (Part IV) for the formal identities: 14 Lean-verified theorems anchoring the empirical results to a 703-composition real-schema MCP corpus.
  6. SCPI (Part I) and the Composition Doctrine (Part 0) for the formal foundation: the obstruction is topological, the proof is machine-checked, and the invariant is uniquely forced.
  7. Seam (Part III) for the protocol response: what to build once the obstruction is granted.

Three distinctions apply throughout:

  • Doctrinal layer (Composition Doctrine) vs instance layer (the rest).
  • Settled core (Parts 0–IV) vs frontier (Parts V–VII).
  • Paper material vs appendix burden.

Parts 0–IV constitute the hard center of the present program. Within Part IV, the Bulla Formal Trilogy (Papers I, II, III) is the structural-spectral spine, with Signed-Incidence Structure providing the field-independence theorem and Witness Geometry Beyond Fee developing the repair-entropy decision theory. Part V is the frontier. SHEAF reaches further than the settled core. The Communication Bottleneck Theorem is the first frontier result that has passed an autonomy test. The Coherence Cliff is the program's strongest large-scale empirical evidence: a regime change in which the predictive gap between the best sheaf diagnostic and the best conventional baseline nearly triples as compositions grow from 5 to 50 nodes. Local Validity Does Not Compose proves the impossibility theorem in the rank-1 signed model and carries it over to the Q-valued Bulla model by field-independence, via signed-incidence.
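The impossibility claim can be illustrated in a toy version of the signed setting. The sketch below is not taken from the papers and rests on stated assumptions: a composition is a signed graph whose edges carry +1 (conventions agree) or -1 (conventions flip), it counts as coherent when labels in {+1, -1} can be assigned to nodes so that every edge is satisfied, and a local audit inspects only the edges inside a bounded-radius ball. The helper names (`is_coherent`, `cycle`, `ball_edges`) are hypothetical. On a cycle longer than the audit diameter, every ball is a path and therefore always passes, so a local audit cannot tell a frustrated cycle from a balanced one.

```python
from collections import deque

def is_coherent(n, edges):
    """True if labels in {+1, -1} satisfy label[u] * label[v] == sign on every edge."""
    adj = {i: [] for i in range(n)}
    for u, v, s in edges:
        adj[u].append((v, s))
        adj[v].append((u, s))
    label = {}
    for start in range(n):
        if start in label:
            continue
        label[start] = 1
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, s in adj[u]:
                want = label[u] * s  # label forced on v by this edge
                if v not in label:
                    label[v] = want
                    queue.append(v)
                elif label[v] != want:
                    return False  # some cycle has sign product -1
    return True

def cycle(n, negative_edges):
    """Cycle on n nodes; edge i joins i and (i + 1) % n, with sign -1 if i is in negative_edges."""
    return [(i, (i + 1) % n, -1 if i in negative_edges else 1) for i in range(n)]

def ball_edges(n, edges, center, radius):
    """Edges whose endpoints both lie within `radius` hops of `center`.

    Assumes the graph is the n-cycle built by `cycle`, so hop distance
    equals cyclic distance between node indices.
    """
    def dist(a):
        d = abs(a - center) % n
        return min(d, n - d)
    return [(u, v, s) for u, v, s in edges if dist(u) <= radius and dist(v) <= radius]

n, radius = 12, 3
frustrated = cycle(n, {0})   # one flipped edge: sign product around the cycle is -1
balanced = cycle(n, set())   # no flipped edges

global_ok = is_coherent(n, frustrated)
local_ok = all(is_coherent(n, ball_edges(n, frustrated, c, radius)) for c in range(n))
print(global_ok, local_ok)   # False True: every local audit passes, the composition fails
```

The design point is that the obstruction lives in the sign product around the whole cycle, which no proper sub-path carries. Raising `radius` only helps once a ball contains the entire cycle, and a longer cycle defeats any fixed radius.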

Part VI is the strongest section of the omnibus in terms of external artifact weight. It presents BABEL, the public benchmark (932 instances across 7 workflow families and 3 provenance tiers), which turns the central claim into a public instrument with three evaluation tracks: failure prediction, failure localization, and budgeted repair. Compositional Hiddenness tests whether frontier LLMs can recover the non-local hidden-convention predicate that the structural diagnostic computes exactly: accuracy is 88% under structured relational prompts and 46% under flat prompts. The hiddenness predicate is graph-relative (the same tool has different hidden sets depending on its composition partner), and frontier LLMs revert to lexical heuristics without explicit relational framing. Bronze+ is a mixed-provenance MCP composition in which an official reference server participates in a workflow that is protocol-green and semantics-red. Silver extends this to invoice/settlement with two non-house servers (MarkItDown + Memory). The structural diagnostic achieves R² ≥ 0.86 across all seven families. Five frontier LLMs fail to rank compositions by severity (ρ near zero); an oracle CoT experiment across all five models isolates the bottleneck as arithmetic reasoning rather than information extraction. The live-pipeline validation (BABEL §6.6) provides the first fully non-circular result: structural holonomy correlates with measured dollar error from the actual MCP server pipeline (ρ = 0.795, p < 7.4 × 10⁻⁷).
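For readers unfamiliar with the ranking metric, the following is a minimal sketch of a rank correlation. It assumes that ρ denotes Spearman's rank correlation, which this guide does not state explicitly. The function names and the toy numbers are illustrative and are not drawn from BABEL. The coefficient is computed as the Pearson correlation of average ranks, so it equals 1.0 whenever a score orders the items exactly as the ground truth does, regardless of scale.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation of the rank vectors; undefined if either input is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Toy data (made up): true severity of four compositions and two candidate scores.
severity = [0.0, 1.5, 4.0, 9.0]
monotone_score = [1, 2, 3, 4]      # orders the compositions exactly as severity does
reversed_score = [4, 3, 2, 1]      # orders them exactly backwards
print(spearman_rho(severity, monotone_score), spearman_rho(severity, reversed_score))
```

Because only the ordering matters, a diagnostic can reach ρ = 1.0 without predicting magnitudes, and a ρ near zero means the score carries essentially no ordering information about severity.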

Part VII is the capstone. The Interpretability Frontier paper tests whether mechanistic interpretability — the most powerful per-component diagnostic technology available — can substitute for structural diagnosis on cyclic compositional failure. Across 240+ compositions, three domains, four scales, two model architectures (GPT-2 Small and Gemma 2 2B), and six interpretability baseline families (including a cycle-oracle aggregator with explicit knowledge of graph topology), the answer is no. The structural diagnostic achieves ρ = 1.0 in every condition; the best interpretability baseline never exceeds ρ = 0.758. Probing classifiers achieve 99.8% accuracy at every edge and carry zero global signal, and cycle-oracle aggregation adds nothing over edge-local averaging. Cross-model replication on Gemma 2 2B (a 20× larger model from a different architecture family) shows the gap widens rather than narrows with model scale. The gap is representational, not aggregational. This completes the argument: the doctrine identifies the canonical invariant; the formal theory predicts local blindness; the benchmark measures it; and the interpretability paper shows that even the richest local tools inherit the same boundary — across architectures.