Technical Claim Ledger
Technical Claim Ledger
This ledger is the program-control surface for the /papers research layer.
Its purpose is not to repeat every theorem, experiment, or protocol detail. Its purpose is to keep the technical program honest about five different things that should not be conflated:
- current claim status
- dependency and scope
- strongest supporting artifact
- disconfirmation condition
- downgrade language if the claim weakens
Use this ledger before promoting any result into a paper abstract, omnibus appendix, gateway essay, or commercialization memo.
Status Classes
| Status | Meaning |
|---|---|
Settled | Strongly supported by proof, formalization, or repeated artifact evidence in the regime actually claimed. |
Conditional | Supported under explicit assumptions, restricted regimes, or theorem hypotheses that remain load-bearing. |
Empirical | Supported by experiment, benchmark, or observed artifact behavior, but not yet elevated to theorem-level generality. |
Artifact-backed | Supported by executable or inspectable implementation surfaces, but not yet by broad empirical closure. |
Conjectural | A real research target with nontrivial evidence, but not yet closed. |
Ledger
| Layer | Flagship claim | Current status | Dependency / scope | Strongest supporting artifact | Disconfirmation condition | Downgrade language if disconfirmed |
|---|---|---|---|---|---|---|
SCPI | Predicate invention under sheaf constraints decomposes into distinct topological, conservativity, and definability gates. | Settled | Finite-site / spine regime; stated hypotheses still matter at the perimeter. | scpi/paper/scpi.tex, scpi/lean/SCPI/Torsor.lean, scpi/lean/SCPI/Conservativity.lean | A counterexample in the stated spine regime where the gates collapse or fail to separate diagnostically. | Restrict the claim to the verified finite-site regime and remove broader separation language. |
SCPI | The obstruction to gluing is borne by an H^1-class under the paper's descent setup. | Settled | Depends on the explicit descent surface used in the paper and Lean formalization. | scpi/lean/SCPI/Torsor.lean, technical-spine/06-appendix-proof-status-ledger.md | A verified failure of the obstruction classification in the stated coefficient / finite-cover regime. | Restate as a regime-specific obstruction diagnostic rather than a general classification. |
SCPI | Schema discovery admits the stronger functorial / universality surface sketched in the paper and Lean statements. | Conditional | Depends on SchemaDiscovery.lean closure and stronger categorical proof burden. | scpi/lean/SCPI/SchemaDiscovery.lean | Failure to prove the stated adjunction or a counterexample to the proposed universal behavior. | Keep only the compatibility statement and treat universality as deferred frontier work. |
Bridge | Bilateral validity can coexist with cycle failure in real compositions. | Empirical | Current seam experiment and related demos; stronger beyond a single-cycle family still needed. | bridge/PLAN.md, bridge/demos/seam_experiment.py, technical-spine/08-appendix-empirical-methods.md | Expanded benchmarks show bilateral-pass / cycle-fail cases are rare, unstable, or reducible to ordinary bilateral errors. | Reframe as a narrow seam failure phenomenon rather than a broad compositional law. |
Bridge | The obstruction dimension predicts the minimum typed bridge burden required for repair in the tested regime. | Empirical | Presently strongest in the A05 minimality witness and nearby cases. | bridge/PLAN.md, technical-spine/08-appendix-empirical-methods.md | A topology-matched benchmark where the predicted minimal bridge set fails while a smaller non-predicted repair consistently succeeds. | Downgrade from minimum bridge burden to a useful repair heuristic in the current benchmark family. |
Bridge | The next flagship result should be interventionally law-like: failure tracks topology, predicted minimal repair removes it, and non-topological intervention does not. | Empirical | Two live non-isomorphic beta_1=2 families with selective repair, minimality witnesses, portability diagnosis, and wrong-bridge failure. Hidden-seam recovery shows 14/15 exact type matches across independent models under hidden vocabulary. | bridge/EMPIRICAL-STATUS.md, bridge/HIDDEN-REPAIR-STUDY.md, bridge/demos/hidden_repair/summary.md | Expanded experiments show the same failures disappear under prompt tuning or other non-topological interventions; or hidden-seam recovery collapses under paraphrase or provider variation. | Recast the work as a strong benchmark family rather than a law-like empirical finding. |
Seam | The coherence fee is the computable blind-spot burden of observable-only verification. | Conditional | Depends on the model distinction between F_obs and S, and on the current composition formalization. | seam/paper/seam.tex, seam/seam-lint/README.md, technical-spine/09-appendix-protocol-artifacts.md | Real compositions with declared blind spots repeatedly show zero or negative fee under the tool's own assumptions. | Restate as a useful diagnostic quantity for the present tool model, not a generally reliable invariant. |
Seam | Bridge annotations can reduce the fee to zero in inspected composition families. | Artifact-backed | Proven strongest in curated examples and inspectable YAML compositions. | seam/seam-lint/compositions/*.yaml, seam/example/seam_example.py | Production-scale compositions repeatedly resist fee elimination despite adding the predicted bridge fields. | Downgrade from eliminability claim to partial fee reduction and diagnostics guidance. |
Seam | Among functionally matched candidates that all pass local validation, fee-aware qualification reduces downstream procurement regret under externally anchored holdout cases, without using post-reveal outcomes as an input to selection. | Empirical | Externally anchored finance consequence artifact: six holdout cases across four consequence types (committee correction, regulatory restatement, enforcement action, convention-insensitive null) grounded in published findings from ISDA, Basel Committee, ARRC, and US Senate. Fee-aware qualification reduces regret from 57.23 to 33.96, outperforming both local baseline and naive checklist rival. Still synthetic in construction. | seam/study/procurement_tournament/finance_rfp/summary.md, seam/study/procurement_tournament/finance_rfp/holdout_cases.json, seam/study/procurement_tournament/finance_rfp/PROTOCOL.md | The fee-aware rule fails to improve or worsens regret against either the local baseline or the naive convention-checklist rival under a materially different externally anchored holdout regime, or the visible surface effectively leaks the hidden answer. | Keep only the claim that procurement-regret comparison is a tractable study design, not that fee has yet shown operational consequence in real procurement decisions.eal procurement. |
Seam | The near-term job of the protocol layer is measurement and deterrence, not proof of a full market equilibrium. | Settled | A scope rule for publication and commercialization discipline. | seam/seam-lint/README.md, technical-spine/09-appendix-protocol-artifacts.md | The layer is forced to carry unsupported full-market claims to make the papers legible. | Explicitly narrow the paper back to workflow and procurement relevance only. |
SHEAF | H^1-style obstruction diagnostics remain the right frontier extension for heterogeneous agent agreement. | Conditional | Depends on importing the settled core while keeping enriched and distributed claims visibly open. | sheaf/paper/sheaf.tex, TECHNICAL-SPINE.md, technical-spine/10-appendix-frontier-notes.md | The frontier examples fail to preserve the obstruction logic even in the implemented abelian or benchmarked surfaces. | Treat SHEAF as a speculative design program rather than a direct extension of the core spine. |
SHEAF | The communication bottleneck theorem is a paper-shaped second technical object with independent mathematical dignity. | Conditional | Self-contained paper draft exists (sheaf/paper/communication-bottleneck-paper.tex). Parts (a)-(b) proved; Part (c) assembled from two derived lemmas + BSS citation. Autonomy test passed. Remaining risk ~3% in holonomy lower-bound constant. | sheaf/paper/communication-bottleneck-paper.tex, sheaf/paper/AUTONOMY-TEST.md, sheaf/proofs/haar-universality-lemma.tex, sheaf/proofs/CITATION-SPINE.md, sheaf/COMMUNICATION-BOTTLENECK-HARVEST.md | The holonomy lower-bound constant cannot be made precise, or the paper still reads as program-dependent despite the autonomy test. | Keep as a near-complete paper draft with the constant gap explicitly named as the residual, rather than promoting as fully submission-grade. |
SHEAF | The enriched Laplacian-Cohomology Bridge and quantale-enriched H^1 are real frontier targets, but not the next-cycle public center. | Conjectural | Requires major new mathematics and likely outside collaboration. | sheaf/paper/sheaf.tex, technical-spine/10-appendix-frontier-notes.md | Progress stalls or the definitions fail to cohere into a usable object. | Preserve as research pulls and frontier notes, not as headline claims. |
| BABEL | Five frontier LLMs fail to rank compositions by failure severity under standard and guided-CoT prompting. | Demonstrated | 5 models (GPT-4o, Claude Sonnet 4, Opus 4, Codex 5.2, Gemini 3.1 Pro), 3 providers, N=18–50 per model. All achieve negative R² and near-zero Spearman ρ. | ../benchmark/coherence-gym/BABEL_PAPER.md, ../benchmark/coherence-gym/results_canonical/llm_eval_*.json | A frontier LLM achieves positive R² on BABEL under a reasonable prompting setup, or a tool-augmented LLM matches the structural diagnostic. | Reframe as a prompting-setup limitation rather than a capability limitation; narrow to specific prompt conditions tested. |
| BABEL | Oracle CoT decomposes LLM failure into informational and arithmetic components. | Demonstrated | Pre-computed restriction matrices + explicit arithmetic instructions at N=50 across all 5 models. Hierarchy: Claude ρ=0.80, GPT-4o ρ=0.39, Opus ρ=0.35, Codex ρ=0.26, Gemini ρ=0.04. R² deeply negative for all. | ../benchmark/coherence-gym/BABEL_PAPER.md, ../benchmark/coherence-gym/results_canonical/llm_eval_*_oracle.json | A model achieves both positive R² and high ρ under oracle conditions, closing the arithmetic gap. | Narrow the decomposition claim to the specific matrix-arithmetic task format tested. |
| BABEL | Structural repair prescriptions dominate sheaf-free alternatives at constrained budget across all 7 families. | Demonstrated | K=1: 11–29% vs 8–26% (rounded); K=3: structural leads on every family; K=8: methods converge. 932 instances, deterministic symbolic executor. | ../benchmark/coherence-gym/BABEL_PAPER.md, ../benchmark/coherence-gym/results_canonical/canonical_structural_sheaf_dev.json | A simpler method matches or exceeds structural repair at K=1–3 across families, eliminating the triage advantage. | Reframe as a benchmark-internal finding; narrow the dominance claim to the tested repair model. |
| BABEL | The benchmark is a public, reproducible instrument with frozen evaluation protocol. | Complete | 932 instances, 7 families, 3 provenance tiers, 3 tracks, 10 baselines, dev/test/hidden splits, replication pack, submission schema. | ../benchmark/coherence-gym/BABEL_PAPER.md, ../benchmark/coherence-gym/BENCHMARK_SPEC.md, ../benchmark/coherence-gym/replication_pack/ | External replication attempts reveal non-reproducibility or systematic errors in the evaluation harness. | Restate as a benchmark release with known internal consistency but pending external confirmation. |
| Interp-Frontier | Edge-local mechanistic interpretability (SAE, probing, CKA, attention) is provably incomplete for cyclic compositional failure prediction. | Conjectural | Paper design complete. Requires empirical execution: open-weight models, SAE/probing extraction, scale sweep across acyclic/cyclic topologies. No results yet. | interpretability-frontier/paper/interpretability-frontier.tex, interpretability-frontier/PLAN.md | Interpretability baselines remain competitive on large cyclic graphs, or the gap vanishes after giving pairwise methods mild graph aggregation. | Reframe as a regime-dependent finding rather than a general limitation of interpretability tools. |
| Interp-Frontier | Structural diagnostics outperform interpretability-informed methods using only composition metadata, without model internals. | Conjectural | Predicted by Edge-Local Blindness Lemma (Bridge). Requires empirical confirmation on real LLM compositions. | interpretability-frontier/paper/interpretability-frontier.tex | A combined interpretability baseline (e.g. graph-aggregated SAE features) matches or exceeds the structural method on cyclic graphs. | Narrow to specific interpretability feature sets and model families tested. |
Release Artifact Rule
Each major workstream should ship three things together:
| Workstream | Paper surface | Executable or inspectable artifact | One-page claim memo |
|---|---|---|---|
SCPI | scpi/paper/scpi.tex | scpi/lean/ | this ledger plus local README status |
Bridge | bridge/paper/bridge.tex | bridge/demos/ | benchmark memo for the next flagship result |
Seam | seam/paper/seam.tex | seam/seam-lint/ and seam/example/ | measurement-and-deterrence memo |
SHEAF | sheaf/paper/sheaf.tex or harvested theorem note | sheaf/simulations/, sheaf/proofs/ | theorem-harvest memo |
BABEL | ../benchmark/coherence-gym/BABEL_PAPER.md | ../benchmark/coherence-gym/ (evaluator, instances, results) | this ledger |
Interp-Frontier | interpretability-frontier/paper/interpretability-frontier.tex | interpretability-frontier/experiments/ (extraction pipeline, evaluation harness) | this ledger |
Usage Rule
Before expanding any paper or adding a new one:
- Confirm that the claim appears in this ledger with a status and a disconfirmation condition.
- Confirm that the strongest artifact named here actually exists and is inspectable.
- Confirm that the public wording matches the downgrade language if the stronger version is not yet earned.