fix(readme): add information about placeholder tokens
- README.md +4 -0
- added_tokens.json +1 -1
README.md
CHANGED
@@ -53,6 +53,10 @@ Phi-3 Mini-128K-Instruct has been integrated in the development version (4.40.0)
 
 The current `transformers` version can be verified with: `pip list | grep transformers`.
 
+### Tokenizer
+
+Phi-3 Mini-128K-Instruct supports a vocabulary size of up to `32064` tokens. The [tokenizer files](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/added_tokens.json) already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size.
+
 ### Chat Format
 
 Given the nature of the training data, the Phi-3 Mini-128K-Instruct model is best suited for prompts using the chat format as follows.
added_tokens.json
CHANGED
@@ -9,5 +9,5 @@
   "<|end|>": 32007,
   "<|placeholder5|>": 32008,
   "<|placeholder6|>": 32009,
-  "<|user|>":
+  "<|user|>": 32010
 }
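The fix above gives `<|user|>` its missing id. A self-contained sanity check on the repaired mapping (a sketch: `valid_added_tokens` is a hypothetical helper, and the bound `32064` comes from the README's new Tokenizer section):

```python
import json

# Mirrors the tail of added_tokens.json after this commit's fix.
added_tokens = {
    "<|end|>": 32007,
    "<|placeholder5|>": 32008,
    "<|placeholder6|>": 32009,
    "<|user|>": 32010,
}
VOCAB_SIZE = 32064  # vocabulary bound stated in the README

def valid_added_tokens(tokens: dict, vocab_size: int) -> bool:
    """Every added-token id must be a unique integer below the vocab size."""
    ids = list(tokens.values())
    return (
        all(isinstance(i, int) and 0 <= i < vocab_size for i in ids)
        and len(ids) == len(set(ids))
    )

assert valid_added_tokens(added_tokens, VOCAB_SIZE)
# Round-trips as JSON; before the fix "<|user|>" had no value, so the
# file was not even parseable.
assert json.loads(json.dumps(added_tokens)) == added_tokens
```

A check like this catches both the pre-fix breakage (a key with no id) and ids that would overflow the model's embedding table.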