🔥 Today's pick in Interpretability & Analysis of LMs: 🩺 Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models by
@asmadotgh, @codevan, @1wheel, @iislucas & @mega
Patchscopes is a generalized framework for verbalizing information contained in LM representations. It works via a mid-forward patching operation that inserts a hidden representation into an ad-hoc prompt designed to elicit model knowledge. Patchscope instances for vocabulary projection, feature extraction, and entity resolution in model representations are shown to outperform popular interpretability approaches, often yielding more robust and expressive information.
Website: https://pair-code.github.io/interpretability/patchscopes/
Paper: Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models (2401.06102)
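The core mechanic can be illustrated with a toy sketch: run a source input partway through a model, grab the hidden state at some layer and position, then inject it into a second ("inspection") forward pass at a chosen layer and position. This is a minimal stand-in, not the paper's implementation: the "model" here is just a stack of random linear maps, and all names (`ToyLM`-style `forward`, `patch`) are hypothetical; a real Patchscope would patch transformer hidden states, e.g. via PyTorch forward hooks.

```python
# Toy sketch of the Patchscopes patching operation (assumed/simplified:
# linear "layers" stand in for transformer blocks; no attention, so
# positions are independent).
import random

random.seed(0)
D, N_LAYERS, SEQ = 4, 3, 3


def matvec(W, v):
    # Apply one "layer" (a D x D matrix) to one position's hidden vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def forward(layers, states, patch=None):
    # states: one D-dim hidden vector per "token" position.
    # patch: optional (layer_idx, position, vector); before running
    # layer_idx, overwrite the hidden state at that position.
    for idx, W in enumerate(layers):
        if patch is not None and idx == patch[0]:
            states = list(states)
            states[patch[1]] = patch[2]
        states = [matvec(W, s) for s in states]
    return states


layers = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
          for _ in range(N_LAYERS)]
source = [[random.gauss(0, 1) for _ in range(D)] for _ in range(SEQ)]
target = [[random.gauss(0, 1) for _ in range(D)] for _ in range(SEQ)]

# 1) Source pass: record the hidden state at layer l, position i.
l, i = 1, 2
h = source
for W in layers[:l]:
    h = [matvec(W, s) for s in h]
h_li = h[i]

# 2) Target pass: inject h_li at layer l, position 0, then compare
#    against a clean run of the same target input.
patched = forward(layers, target, patch=(l, 0, h_li))
clean = forward(layers, target)

print(patched[0] != clean[0])  # → True: the patched position changed
print(patched[1] == clean[1])  # → True: other positions are untouched
```

The point of the sketch is the separation of the two passes: the source pass only produces the representation to inspect, while the target prompt determines how that representation is decoded back into tokens.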