Updated https://huggingface.co/blog/nroggendorff/train-with-llama-architecture so you can "train" your own tokenizer from your dataset.
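For anyone curious, a minimal sketch of what "training your own tokenizer from your dataset" can look like with the Hugging Face `tokenizers` library (the tiny in-memory corpus and vocab size here are just placeholders, not from the blog post):

```python
# Sketch: train a small BPE tokenizer from an in-memory dataset.
# Assumes the `tokenizers` library is installed (pip install tokenizers).
from tokenizers import Tokenizer, models, trainers, pre_tokenizers

# Placeholder corpus -- in practice this would be an iterator over your dataset.
corpus = [
    "hello world",
    "training a tokenizer from a dataset",
    "byte pair encoding merges frequent symbol pairs",
]

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=200,  # placeholder; real runs use much larger vocabularies
    special_tokens=["<unk>", "<s>", "</s>"],
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

print(tokenizer.encode("hello tokenizer").tokens)
```

Swap the corpus for your own dataset iterator and save with `tokenizer.save("tokenizer.json")`.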
Very good!
Maybe a Colab?
Could this be used to extend a tokenizer model with training? I would like to update my Mistral tokenizer to include foreign characters, such as Hebrew, Amharic, and Hindi.
I'm pretty sure you can add additional tokens and special tokens, so I suppose so.
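Something like the sketch below, I think. It uses a tiny locally trained tokenizer as a stand-in for a pretrained one (to keep the example self-contained); with a `transformers` tokenizer the equivalent would be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. The example words are just placeholder greetings:

```python
# Sketch: extend an existing tokenizer's vocabulary with new tokens
# instead of retraining it. Assumes the `tokenizers` library is installed.
from tokenizers import Tokenizer, models, trainers, pre_tokenizers

# Stand-in for a pretrained tokenizer; in practice you would load yours,
# e.g. Tokenizer.from_file("tokenizer.json").
tok = Tokenizer(models.BPE(unk_token="<unk>"))
tok.pre_tokenizer = pre_tokenizers.Whitespace()
tok.train_from_iterator(
    ["hello world"],
    trainer=trainers.BpeTrainer(special_tokens=["<unk>"]),
)

before = tok.get_vocab_size()

# Add new whole-word tokens (Hebrew, Amharic, Hindi greetings as examples);
# add_tokens returns how many were actually new to the vocabulary.
added = tok.add_tokens(["שלום", "ሰላם", "नमस्ते"])

print(before, tok.get_vocab_size(), added)
```

Note that after adding tokens you still need to resize the model's embedding matrix before fine-tuning, or the new token IDs will be out of range.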