🔥 Today in Interpretability & Analysis of LMs: Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking by
@nikhil07prakash
@tamarott
@TalHaklay
@belinkov
@davidbau
Fine-tuning is commonly used to improve LMs' capabilities, but its impact on model-internal mechanisms remains poorly understood.
This work evaluates the impact of fine-tuning from a mechanistic perspective, using entity tracking in fine-tuned LLaMA 7B variants as a test bench.
The authors use path patching to show that fine-tuned models largely employ the same circuits as their pre-trained counterparts to solve entity tracking. Desiderata-based Component Masking (DCM) is then used to discern the function of circuit components, finding that even the functionality of these components remains consistent after fine-tuning.
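For intuition, activation patching (of which path patching is a refinement) can be sketched with PyTorch forward hooks. This is a minimal illustration on a toy module, not the paper's code: the LLaMA-7B components and entity-tracking prompts are replaced by a tiny stand-in network and random inputs.

```python
import torch
import torch.nn as nn

# Toy stand-in for a model; the study patches components of LLaMA-7B variants.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))

def get_activation(module, x):
    """Run x through the model, caching the output of `module`."""
    cache = {}
    handle = module.register_forward_hook(lambda m, i, o: cache.setdefault("act", o))
    model(x)
    handle.remove()
    return cache["act"]

def run_with_patch(module, x, patched_act):
    """Run x, overwriting `module`'s output with `patched_act` (the patch)."""
    handle = module.register_forward_hook(lambda m, i, o: patched_act)
    out = model(x)
    handle.remove()
    return out

clean, corrupt = torch.randn(1, 4), torch.randn(1, 4)
target = model[0]  # component whose causal role we probe
clean_act = get_activation(target, clean)
patched_out = run_with_patch(target, corrupt, clean_act)
# If patching a component's clean activation into a corrupted run restores the
# clean behavior, that component is causally implicated in the task.
print(torch.allclose(patched_out, model(clean)))  # → True (layer 0 fully determines the rest here)
```

Path patching refines this by restoring an activation only along specific downstream paths, isolating which component-to-component connections matter.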
Where do the gains stem from, then? Using Cross-Model Activation Patching (CMAP), the authors show that the benefits of fine-tuning derive largely from an improved ability of circuit components to encode important task-relevant information, rather than from an overall functional rearrangement. Interestingly, fine-tuned activations remain compatible with the base model, despite no explicit constraint enforcing this during fine-tuning.
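The cross-model variant follows the same hook mechanics, except the donor activation comes from a different model. A minimal sketch under the same toy assumptions (two randomly initialized stand-ins in place of the base and fine-tuned LLaMA-7B):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_model():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Stand-ins for the base and fine-tuned models (same architecture, different weights).
base, finetuned = make_model(), make_model()

x = torch.randn(1, 4)

# 1. Capture a component's activation in the fine-tuned model.
acts = {}
h = finetuned[0].register_forward_hook(lambda m, i, o: acts.setdefault("ft", o))
finetuned(x)
h.remove()

# 2. Splice it into the base model's forward pass (the cross-model patch).
h = base[0].register_forward_hook(lambda m, i, o: acts["ft"])
patched_logits = base(x)
h.remove()
# If base-model task performance improves under the patch, the fine-tuned
# component encodes better task information in a representation the base
# model can still read, which is what CMAP finds for entity tracking.
print(patched_logits.shape)  # → torch.Size([1, 2])
```

That the patch is even well-formed (the base model's downstream layers accept fine-tuned activations) reflects the representational compatibility noted above.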
🌐 Website: http://finetuning.baulab.info/
🤗 Model: nikhil07prakash/float-7b
📄 Paper: Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking (2402.14811)
🔍 All daily picks: gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9