Llama Scope

Use with the OpenMOSS lm_sae GitHub repo

[Use with SAELens (In progress)]

[Explore in Neuronpedia (In progress)]

Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for extracting sparse representations from language models, yet scalable training remains a significant challenge. We introduce a suite of 256 improved TopK SAEs, trained on each layer and sublayer of the Llama-3.1-8B-Base model, with 32K and 128K features.

This is the front page for all Llama Scope SAEs. The checkpoints themselves are listed in the Checkpoints section below.

Naming Convention

L[Layer][Position]-[Expansion]x

For instance, an SAE with 8x the hidden size of Llama-3.1-8B (i.e. 32K features), trained on the post-MLP residual stream of layer 15, is called L15R-8x.
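The convention can be decoded mechanically. The following is a minimal sketch, not part of the lm_sae codebase: `SAEName` and `parse_sae_name` are hypothetical helper names, the position codes (R, A, M, TC) are the ones listed in the overview table, and the feature count assumes the Llama-3.1-8B hidden size of 4096.

```python
import re
from typing import NamedTuple

# Hidden size of Llama-3.1-8B; 8x -> 32,768 (32K), 32x -> 131,072 (128K).
HIDDEN_SIZE = 4096


class SAEName(NamedTuple):
    """Parsed form of a name such as 'L15R-8x' (hypothetical helper type)."""
    layer: int
    position: str   # one of 'R', 'A', 'M', 'TC'
    expansion: int  # multiple of the model hidden size

    @property
    def n_features(self) -> int:
        # Number of SAE features implied by the expansion factor.
        return self.expansion * HIDDEN_SIZE


# 'TC' is listed first so that it is tried before the single-letter codes.
_NAME_PATTERN = re.compile(r"^L(\d+)(TC|R|A|M)-(\d+)x$")


def parse_sae_name(name: str) -> SAEName:
    """Split 'L[Layer][Position]-[Expansion]x' into its three fields."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a Llama Scope SAE name: {name!r}")
    layer, position, expansion = match.groups()
    return SAEName(int(layer), position, int(expansion))


if __name__ == "__main__":
    sae = parse_sae_name("L15R-8x")
    print(sae, sae.n_features)
```

For `L15R-8x` this gives layer 15, position `R`, expansion 8, and 8 × 4096 = 32,768 features, which is the "32K" quoted above.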

Checkpoints

Llama-3.1-8B-LXR-8x

Llama-3.1-8B-LXA-8x

Llama-3.1-8B-LXM-8x

Llama-3.1-8B-LXTC-8x

Llama-3.1-8B-LXR-32x

Llama-3.1-8B-LXA-32x

Llama-3.1-8B-LXM-32x

Llama-3.1-8B-LXTC-32x
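As a sanity check on the count of 256 SAEs, the sketch below enumerates every name implied by the naming convention and maps each one to the checkpoint group it would fall under. It assumes that the `X` in the group names is a placeholder for the layer index and that layers are indexed 0 to 31; both are readings of the listing above rather than confirmed details. `checkpoint_group` and `all_sae_names` are hypothetical helper names.

```python
from itertools import product

N_LAYERS = 32                      # Llama-3.1-8B has 32 transformer layers
POSITIONS = ("R", "A", "M", "TC")  # sites listed in the overview table
EXPANSIONS = (8, 32)               # 32K and 128K features


def checkpoint_group(position: str, expansion: int) -> str:
    """Group name as listed under Checkpoints, e.g. 'Llama-3.1-8B-LXR-8x'."""
    return f"Llama-3.1-8B-LX{position}-{expansion}x"


def all_sae_names(first_layer: int = 0) -> list[str]:
    """Every SAE name of the form L[Layer][Position]-[Expansion]x.

    first_layer=0 assumes zero-based layer indexing (an assumption).
    """
    layers = range(first_layer, first_layer + N_LAYERS)
    return [
        f"L{layer}{position}-{expansion}x"
        for expansion, position, layer in product(EXPANSIONS, POSITIONS, layers)
    ]


if __name__ == "__main__":
    names = all_sae_names()
    groups = {checkpoint_group(p, e) for p, e in product(POSITIONS, EXPANSIONS)}
    # 32 layers x 4 positions x 2 widths = 256 SAEs across 8 groups.
    print(len(names), len(groups))
```

Under these assumptions each of the 8 checkpoint groups holds one SAE per layer, which accounts for the full suite.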

Llama Scope SAE Overview

| | Llama Scope | Scaling Monosemanticity | GPT-4 SAE | Gemma Scope |
|---|---|---|---|---|
| Models | Llama-3.1 8B (Open Source) | Claude-3.0 Sonnet (Proprietary) | GPT-4 (Proprietary) | Gemma-2 2B & 9B (Open Source) |
| SAE Training Data | SlimPajama | Proprietary | Proprietary | Proprietary, Sampled from Mesnard et al. (2024) |
| SAE Position (Layer) | Every Layer | The Middle Layer | 5/6 Late Layer | Every Layer |
| SAE Position (Site) | R, A, M, TC | R | R | R, A, M, TC |
| SAE Width (# Features) | 32K, 128K | 1M, 4M, 34M | 128K, 1M, 16M | 16K, 64K, 128K, 256K - 1M (Partial) |
| SAE Width (Expansion Factor) | 8x, 32x | Proprietary | Proprietary | 4.6x, 7.1x, 28.5x, 36.6x |
| Activation Function | TopK-ReLU | ReLU | TopK-ReLU | JumpReLU |

Citation

Please cite as:

@article{he2024llamascope,
  title={Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders},
  author={He, Zhengfu and Shu, Wentao and Ge, Xuyang and Chen, Lingjie and Wang, Junxuan and Zhou, Yunhua and Liu, Frances and Guo, Qipeng and Huang, Xuanjing and Wu, Zuxuan and others},
  journal={arXiv preprint arXiv:2410.20526},
  year={2024}
}
