Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms • Paper 2403.17806 • Published Mar 26
🔍 Daily Picks in Interpretability & Analysis of LMs • Collection • Outstanding research in interpretability and evaluation of language models, summarized • 82 items