What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper • 2404.07129
Modifying activations during training with proper gradient flow