Interpretability Frontier
Can Mechanistic Interpretability Substitute for Structural Diagnosis?
Falsification
Probing classifiers recover a compositional signal, not just edge-local accuracy.
Abstract
Tests whether mechanistic interpretability methods (sparse autoencoders, probing, circuit tracing) can substitute for structural diagnosis on cyclic compositional failure.
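
The contrast between edge-local accuracy and a compositional signal can be illustrated with a toy linear probe. The sketch below is an illustration under stated assumptions, not the protocol used here: it assumes a three-node graph whose "activations" are noisy edge directions, reads "edge-local" as the direction of a single edge, and reads "compositional" as whether the three edges close a directed cycle. The data and the helper names (`make_dataset`, `train_probe`, `accuracy`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_dataset(n, rng, noise=0.1):
    """Toy 'activations' (assumption): three noisy edge directions in {-1, +1}."""
    rows = []
    for _ in range(n):
        edges = [rng.choice((-1, 1)) for _ in range(3)]
        acts = [e + rng.gauss(0.0, noise) for e in edges]
        edge_label = 1 if edges[0] > 0 else 0           # property of one edge
        cycle_label = 1 if len(set(edges)) == 1 else 0  # all edges point the same way round
        rows.append((acts, edge_label, cycle_label))
    return rows


def train_probe(xs, ys, epochs=50, lr=0.1):
    """Fit a linear (logistic-regression) probe with plain SGD."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def accuracy(probe, xs, ys):
    """Fraction of samples where the probe's sign matches the label."""
    w, b = probe
    hits = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1)
        for x, y in zip(xs, ys)
    )
    return hits / len(xs)


rng = random.Random(0)
train, test = make_dataset(400, rng), make_dataset(400, rng)
x_tr = [r[0] for r in train]
x_te = [r[0] for r in test]

edge_probe = train_probe(x_tr, [r[1] for r in train])
cycle_probe = train_probe(x_tr, [r[2] for r in train])

edge_acc = accuracy(edge_probe, x_te, [r[1] for r in test])
cycle_acc = accuracy(cycle_probe, x_te, [r[2] for r in test])

# Accuracy of always predicting the more common cycle label.
n_cycles = sum(r[2] for r in test)
majority_baseline = max(n_cycles, len(test) - n_cycles) / len(test)

print(f"edge-local probe accuracy:    {edge_acc:.3f}")
print(f"compositional probe accuracy: {cycle_acc:.3f}")
print(f"majority-class baseline:      {majority_baseline:.3f}")
```

In this toy setting a linear probe can read a single edge direction directly from the activations, whereas the cycle label is not linearly separable over the three edge directions. The comparison against the majority-class baseline, rather than raw accuracy, is what indicates how much compositional signal the probe recovers.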