What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Paper: arXiv:2404.07129
See Figure 6, columns B and C (classes vs. labels): subcircuit B delays the phase change as the number of classes grows, whereas subcircuit C delays it, dramatically, as the number of labels grows.