fix(readme): Add information about placeholder tokens.
- README.md +4 -0
- added_tokens.json +1 -1
README.md CHANGED
@@ -56,6 +56,10 @@ The current `transformers` version can be verified with: `pip list | grep transf
 
 Phi-3 Mini-4K-Instruct is also available in [HuggingChat](https://aka.ms/try-phi3-hf-chat).
 
+### Tokenizer
+
+Phi-3 Mini-4K-Instruct supports a vocabulary size of up to `32064` tokens. The [tokenizer files](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct/blob/main/added_tokens.json) already provide placeholder tokens that can be used for downstream fine-tuning, but they can also be extended up to the model's vocabulary size.
+
 ### Chat Format
 
 Given the nature of the training data, the Phi-3 Mini-4K-Instruct model is best suited for prompts using the chat format as follows.
added_tokens.json CHANGED
@@ -9,5 +9,5 @@
   "<|end|>": 32007,
   "<|placeholder5|>": 32008,
   "<|placeholder6|>": 32009,
-  "<|user|>":
+  "<|user|>": 32010
 }
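The placeholder scheme the README addition describes can be sketched in plain Python using the token ids visible in the diff above. The mapping below copies only the entries shown here; the filtering logic is illustrative and not part of the repository:

```python
# Sketch of the placeholder-token idea: added_tokens.json assigns ids above
# the base vocabulary to special tokens, and reserves "placeholder" ids that
# a downstream fine-tune can repurpose for new special tokens without
# growing the embedding matrix, since every id stays below the model's
# vocabulary size of 32064. Entries are copied from the diff above.
added_tokens = {
    "<|end|>": 32007,
    "<|placeholder5|>": 32008,
    "<|placeholder6|>": 32009,
    "<|user|>": 32010,
}

VOCAB_SIZE = 32064  # model vocabulary size stated in the README addition

# Ids still free for downstream fine-tuning.
placeholders = {
    tok: idx for tok, idx in added_tokens.items()
    if tok.startswith("<|placeholder")
}

# All added ids must fit inside the model's embedding table.
assert all(idx < VOCAB_SIZE for idx in added_tokens.values())

print(sorted(placeholders.values()))  # prints [32008, 32009]
```

In practice a fine-tuning script would rename one of these placeholder entries to its new special token rather than appending a fresh id, keeping the checkpoint's embedding shape unchanged.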