Empirical

BABEL

Benchmark for Autonomous Bridge Assessment and Localization Evaluation

Frozen

Canonical artifact (v0.1.0, 2026-03-24). 932 instances, 7 families, 3 provenance tiers. Frozen evaluation protocol. Two live validations: 5-cluster invoice (N=75) and 4-server cyclic calendar (β₁≥3, N=192).

Key result

Live pipeline: holonomy and distance both reach ρ=0.423 on the invoice validation (N=75); the cyclic calendar validation shows 7.5× PI separation (N=192, β₁≥3). Topological awareness is the key ingredient; the algebraic apparatus adds narrow, dimension-specific value.
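As an illustration of the β₁≥3 condition, the sketch below computes the first Betti number of an undirected graph as β₁ = E − V + C (edges minus nodes plus connected components). It assumes β₁ here denotes the number of independent cycles in the composition graph; the function name and the 4-node example graph are hypothetical and are not taken from the BABEL artifact.

```python
def first_betti_number(num_nodes, edges):
    """Return beta_1 = E - V + C for an undirected graph.

    num_nodes: number of nodes, labelled 0..num_nodes-1.
    edges: iterable of (u, v) pairs.
    """
    edges = list(edges)
    parent = list(range(num_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_nodes
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    return len(edges) - num_nodes + components


# Hypothetical example: four fully connected servers (not the benchmark's actual graph).
fully_connected = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# A simple chain of four servers has no cycles.
chain = [(0, 1), (1, 2), (2, 3)]

print(first_betti_number(4, fully_connected))
print(first_betti_number(4, chain))
```

Under this reading, an acyclic chain of servers has β₁ = 0, so a β₁≥3 workflow requires at least three independent feedback loops among its servers.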

Falsification

The claim is falsified by any submission that achieves a positive R² on Track A without using topological features.
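The criterion can be checked mechanically. The sketch below assumes R² is the standard coefficient of determination, so a positive value means the submission beats a baseline that always predicts the mean. The function name and the sample numbers are hypothetical; the benchmark's own scoring code may differ.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions, for illustration only.
targets = [1.0, 2.0, 3.0]
mean_baseline = [2.0, 2.0, 2.0]   # always predicts the mean
anti_correlated = [3.0, 2.0, 1.0]

print(r_squared(targets, mean_baseline) > 0)    # mean predictor does not count as positive
print(r_squared(targets, anti_correlated) > 0)  # worse than the mean predictor
```

Under this definition, the falsification condition is met only if a topology-free submission scores strictly above zero on held-out Track A instances.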

Abstract

BABEL is a public benchmark that operationalizes the compositional-failure claim. It comprises 932 synthetic and real-MCP compositions across 7 workflow families and defines three evaluation tracks: failure prediction, failure localization, and budgeted repair.