BABEL
Benchmark for Autonomous Bridge Evaluation and Localization
Key result
Live pipeline: holonomy and distance both reach ρ = 0.423 on invoice (N = 75); cyclic calendar shows 7.5× PI separation (N = 192, β₁ ≥ 3). Topological awareness is the key ingredient; the algebraic apparatus adds narrow, dimension-specific value.
Falsification
The key result is falsified by any submission that achieves positive R² on Track A without using topological features.
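As a rough illustration of the bar this criterion sets, the sketch below computes R² in its standard form, 1 - SS_res / SS_tot, and checks whether it is positive. It assumes Track A is scored with this conventional definition, which the text does not spell out. The names `r_squared` and `beats_mean_baseline` are hypothetical and not part of any BABEL tooling. Whether a submission avoids topological features cannot be checked from predictions alone, so the sketch covers only the numeric half of the criterion.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Standard coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: Track A uses this conventional definition of R^2.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the submitted predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the target mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


def beats_mean_baseline(y_true: Sequence[float], y_pred: Sequence[float]) -> bool:
    """True when R^2 > 0, i.e. predictions beat the constant-mean predictor."""
    return r_squared(y_true, y_pred) > 0.0
```

Positive R² means only that the predictions have lower squared error than always predicting the mean target. The threshold is low in absolute terms, and negative values are possible for predictions worse than that baseline.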
Abstract
Compositional semantic failure, in which locally correct systems compose into globally wrong outcomes, has caused billions in documented losses across six domains. BABEL is the first public benchmark for this failure mode: it comprises 932 instances and three evaluation tracks, and none of the five frontier LLMs evaluated can solve it.