🔥 Today in Interpretability & Analysis of LMs: Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking by
@nikhil07prakash
@tamarott
@TalHaklay
@belinkov
@davidbau
Fine-tuning is commonly used to improve LMs' capabilities, but its impact on model-internal mechanisms remains poorly understood.
This work evaluates the impact of fine-tuning from a mechanistic perspective, using entity tracking in fine-tuned LLaMA 7B variants as a test bench.
The authors use path patching to show that fine-tuned models largely employ the same circuits as their pre-trained counterparts to solve entity tracking. Desiderata-based Component Masking (DCM) is then used to discern the function of circuit components, finding that even the functionality of these components remains consistent after fine-tuning.
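For intuition, activation patching (of which path patching is a refinement) can be sketched with PyTorch forward hooks. This is a minimal illustration on a toy module, not the paper's code: the LLaMA-7B components and entity-tracking prompts are replaced by a tiny stand-in network and random inputs.

```python
import torch
import torch.nn as nn

# Toy stand-in for a model; the study patches components of LLaMA-7B variants.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))

def get_activation(module, x):
    """Run x through the model, caching the output of `module`."""
    cache = {}
    handle = module.register_forward_hook(lambda m, i, o: cache.setdefault("act", o))
    model(x)
    handle.remove()
    return cache["act"]

def run_with_patch(module, x, patched_act):
    """Run x, overwriting `module`'s output with `patched_act` (the patch)."""
    handle = module.register_forward_hook(lambda m, i, o: patched_act)
    out = model(x)
    handle.remove()
    return out

clean, corrupt = torch.randn(1, 4), torch.randn(1, 4)
target = model[0]  # component whose causal role we probe
clean_act = get_activation(target, clean)
patched_out = run_with_patch(target, corrupt, clean_act)
# If patching a component's clean activation into a corrupted run restores the
# clean behavior, that component is causally implicated in the task.
print(torch.allclose(patched_out, model(clean)))  # → True (layer 0 fully determines the rest here)
```

Path patching refines this by restoring an activation only along specific downstream paths, isolating which component-to-component connections matter.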
Where do the gains stem from, then? Using Cross-Model Activation Patching (CMAP), the authors show that the benefits of fine-tuning derive largely from an improved ability of circuit components to encode important task-relevant information, rather than from an overall functional rearrangement. Interestingly, fine-tuned activations remain compatible with the base model, despite no explicit constraint enforcing this during fine-tuning.
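The cross-model variant follows the same hook mechanics, except the donor activation comes from a different model. A minimal sketch under the same toy assumptions (two randomly initialized stand-ins in place of the base and fine-tuned LLaMA-7B):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_model():
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Stand-ins for the base and fine-tuned models (same architecture, different weights).
base, finetuned = make_model(), make_model()

x = torch.randn(1, 4)

# 1. Capture a component's activation in the fine-tuned model.
acts = {}
h = finetuned[0].register_forward_hook(lambda m, i, o: acts.setdefault("ft", o))
finetuned(x)
h.remove()

# 2. Splice it into the base model's forward pass (the cross-model patch).
h = base[0].register_forward_hook(lambda m, i, o: acts["ft"])
patched_logits = base(x)
h.remove()
# If base-model task performance improves under the patch, the fine-tuned
# component encodes better task information in a representation the base
# model can still read, which is what CMAP finds for entity tracking.
print(patched_logits.shape)  # → torch.Size([1, 2])
```

That the patch is even well-formed (the base model's downstream layers accept fine-tuned activations) reflects the representational compatibility noted above.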
🌐 Website: http://finetuning.baulab.info/
🤗 Model: nikhil07prakash/float-7b
📄 Paper: Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking (2402.14811)
🔍 All daily picks: gsarti/daily-picks-in-interpretability-and-analysis-of-lms-65ae3339949c5675d25de2f9