Multi-token prediction models and baselines
Models accompanying the research paper "Better & Faster Large Language Models via Multi-token Prediction" (https://arxiv.org/abs/2404.19737).
Included are the following four 7B parameter models trained on code:
- baseline model (`n=1`) trained on 200B tokens of code: `7B_200B_1/`
- multi-token prediction model (`n=4`) trained on 200B tokens of code: `7B_200B_4/`
- baseline model (`n=1`) trained on 1T tokens of code: `7B_1T_1/`
- multi-token prediction model (`n=4`) trained on 1T tokens of code: `7B_1T_4/`
Tokenizer: standard Llama 2 SentencePiece tokenizer in tokenizer.model.
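For reference, the tokenizer can be loaded and used with the standard `sentencepiece` Python API (a minimal sketch; the example prompt is arbitrary):

```
import sentencepiece as spm

# Load the Llama 2 SentencePiece model shipped alongside the checkpoints.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

# Encode an arbitrary code snippet to token ids and decode it back.
ids = sp.encode("def fibonacci(n):", out_type=int)
print(ids)
print(sp.decode(ids))
```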
Quickstart
Install `torch`, `fairscale`, `fire`, and `sentencepiece`, then run

```
torchrun --nproc_per_node 1 example_completion.py --ckpt_dir 7B_200B_4/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 2
```

replacing `7B_200B_4/` with the respective checkpoint directory.
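For reference, the dependencies can be installed with pip (versions unpinned here; pin as needed for your setup):

```
pip install torch fairscale fire sentencepiece
```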
Format
The PyTorch `state_dict`s are compatible with the Llama format: the layers of the shared trunk and the next-token prediction head are numbered contiguously. The additional prediction heads for tokens further in the future are named `extra_heads` and can be ignored for standard autoregressive inference.
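For example, to load a checkpoint into a standard Llama codebase, the extra heads can simply be dropped from the state dict first. A minimal sketch, assuming the usual consolidated checkpoint layout (the filename below is illustrative):

```
import torch

# Load one of the released checkpoints (path and filename are illustrative).
state_dict = torch.load("7B_200B_4/consolidated.00.pth", map_location="cpu")

# Keep only the shared trunk and the next-token prediction head; the
# future-token heads live under keys containing "extra_heads".
filtered = {k: v for k, v in state_dict.items() if "extra_heads" not in k}

# `filtered` can now be loaded into a standard Llama model definition.
```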
The implementation of `forward()` in `llama/model.py` provides an additional argument `return_all_heads`. If set, the additional prediction heads are called and the logits are returned with shape `(batch_size, seq_len, n_future_tokens, vocab_size)`. Otherwise, the logits have shape `(batch_size, seq_len, 1, vocab_size)`.
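As an illustration, the two call modes might look as follows. This is a minimal sketch: `model` is assumed to be a `llama.model.Transformer` loaded from one of the checkpoints above, and the `start_pos` argument follows the usual Llama reference code.

```
import torch

# Dummy batch of token ids; 32000 is the Llama 2 vocabulary size.
tokens = torch.randint(0, 32000, (2, 16))  # (batch_size, seq_len)

# With the extra heads: one logit slice per predicted future token.
all_logits = model.forward(tokens, start_pos=0, return_all_heads=True)
print(all_logits.shape)  # (2, 16, n_future_tokens, 32000)

# Without: only the standard next-token prediction head is evaluated.
next_logits = model.forward(tokens, start_pos=0, return_all_heads=False)
print(next_logits.shape)  # (2, 16, 1, 32000)
```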
Citation
Gloeckle, F., Idrissi, B. Y., Rozière, B., Lopez-Paz, D., & Synnaeve, G. (2024). Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737.
BibTeX entry:
```
@article{gloeckle2024better,
  title={Better \& faster large language models via multi-token prediction},
  author={Gloeckle, Fabian and Idrissi, Badr Youbi and Rozi{\`e}re, Baptiste and Lopez-Paz, David and Synnaeve, Gabriel},
  journal={arXiv preprint arXiv:2404.19737},
  year={2024}
}
```
Feedback and comments
Please report risks as indicated in the Acceptable Use Policy, and direct bug reports and any other comments to the corresponding authors listed in the research paper.