Interpretability Frontier
Can Mechanistic Interpretability Substitute for Structural Diagnosis?
Key result
Edge-local interpretability is provably incomplete for cyclic compositional failure: the structural diagnostic achieves ρ = 1.0 in every condition, while the best interpretability baseline never exceeds ρ = 0.758.
Falsification
The claim would be falsified by probing classifiers that recover compositional signal, not just edge-local accuracy.
Abstract
This work tests whether mechanistic interpretability methods (sparse autoencoders, probing classifiers, and circuit tracing) can substitute for structural diagnosis on cyclic compositional failure.
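
As a rough illustration of the distinction drawn above, the following toy sketch shows how a check that inspects one edge at a time can pass on every edge while the composed structure is still inconsistent. It is not the method evaluated here. It assumes, purely for illustration, that "cyclic compositional failure" can be modeled as a directed cycle in a set of pairwise "ranked above" relations; the names `edge_local_ok` and `has_cycle` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption for illustration: an edge (a, b) means "a is ranked above b".
Edge = Tuple[str, str]


def edge_local_ok(edge: Edge) -> bool:
    """Edge-local check: looks at one edge in isolation (two distinct items)."""
    a, b = edge
    return a != b


def has_cycle(edges: List[Edge]) -> bool:
    """Structural check: depth-first search for a directed cycle."""
    graph: Dict[str, List[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: the path returns to itself
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


# Every edge is individually valid, yet A > B > C > A admits no global ranking.
cyclic = [("A", "B"), ("B", "C"), ("C", "A")]
acyclic = [("A", "B"), ("B", "C"), ("A", "C")]

if __name__ == "__main__":
    print(all(edge_local_ok(e) for e in cyclic), has_cycle(cyclic))
    print(all(edge_local_ok(e) for e in acyclic), has_cycle(acyclic))
```

In this toy setting the per-edge check cannot separate the two edge lists, because the inconsistency exists only in how the edges compose around the loop. The structural check separates them because it examines the graph as a whole.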