The Hydra Effect: Emergent Self-repair in Language Model Computations • Paper • arXiv:2307.15771 • Published Jul 28, 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla • Paper • arXiv:2307.09458 • Published Jul 18, 2023