🔥 Today's pick in Interpretability & Analysis of LMs: 🩺 Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models by
@asmadotgh, @codevan, @1wheel, @iislucas & @mega
Patchscopes is a generalized framework for verbalizing information contained in LM representations. It works via a mid-forward patching operation that inserts a hidden representation into an ad-hoc prompt designed to elicit model knowledge. Patchscope instances for vocabulary projection, feature extraction, and entity resolution in model representations are shown to outperform popular interpretability approaches, often yielding more robust and expressive information.
Website: https://pair-code.github.io/interpretability/patchscopes/
Paper: Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models (2401.06102)
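The core mechanic can be illustrated with a toy sketch: run a source input partway through a model, grab the hidden state at some layer and position, then inject it into a second ("inspection") forward pass at a chosen layer and position. This is a minimal stand-in, not the paper's implementation: the "model" here is just a stack of random linear maps, and all names (`ToyLM`-style `forward`, `patch`) are hypothetical; a real Patchscope would patch transformer hidden states, e.g. via PyTorch forward hooks.

```python
# Toy sketch of the Patchscopes patching operation (assumed/simplified:
# linear "layers" stand in for transformer blocks; no attention, so
# positions are independent).
import random

random.seed(0)
D, N_LAYERS, SEQ = 4, 3, 3


def matvec(W, v):
    # Apply one "layer" (a D x D matrix) to one position's hidden vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def forward(layers, states, patch=None):
    # states: one D-dim hidden vector per "token" position.
    # patch: optional (layer_idx, position, vector); before running
    # layer_idx, overwrite the hidden state at that position.
    for idx, W in enumerate(layers):
        if patch is not None and idx == patch[0]:
            states = list(states)
            states[patch[1]] = patch[2]
        states = [matvec(W, s) for s in states]
    return states


layers = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(D)]
          for _ in range(N_LAYERS)]
source = [[random.gauss(0, 1) for _ in range(D)] for _ in range(SEQ)]
target = [[random.gauss(0, 1) for _ in range(D)] for _ in range(SEQ)]

# 1) Source pass: record the hidden state at layer l, position i.
l, i = 1, 2
h = source
for W in layers[:l]:
    h = [matvec(W, s) for s in h]
h_li = h[i]

# 2) Target pass: inject h_li at layer l, position 0, then compare
#    against a clean run of the same target input.
patched = forward(layers, target, patch=(l, 0, h_li))
clean = forward(layers, target)

print(patched[0] != clean[0])  # → True: the patched position changed
print(patched[1] == clean[1])  # → True: other positions are untouched
```

The point of the sketch is the separation of the two passes: the source pass only produces the representation to inspect, while the target prompt determines how that representation is decoded back into tokens.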