JunxiongWang committed
Commit 8360e47
Parent(s): 10e558d
Update README.md

README.md CHANGED
@@ -60,3 +60,15 @@ The following hyperparameters were used during training:
 - Pytorch 2.1.0+cu118
 - Datasets 2.20.0
 - Tokenizers 0.19.1
+
+[MambaInLlama](arxiv.org/abs/2408.15237)
+
+```
+@article{junxiongdaniele2024mambainllama,
+  title = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
+  author = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
+  journal = {arXiv preprint arXiv:2408.15237},
+  year = {2024}
+}
+```