rccmsu committed on
Commit
7f84342
1 Parent(s): eb8cece

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -11,6 +11,8 @@ WARNING! Load tokenizer as AutoTokenizer.from_pretrained(model_path, use_fast=Tr
 
 Up to 60% faster generation and 35% training (on identical russian text sequences!) with HF because of different tokenizer.
 
+Paper: Tikhomirov M., Chernyshev D. Impact of Tokenization on LLaMa Russian Adaptation //arXiv preprint arXiv:2312.02598. – 2023.
+
 ## Training procedure
 
 ruadapt mistral trained on saiga corpuses.