Updated https://huggingface.co/blog/nroggendorff/train-with-llama-architecture so you can "train" your own tokenizer from your dataset.
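For anyone curious, a minimal sketch of what "training your own tokenizer from your dataset" can look like with the Hugging Face `tokenizers` library (the tiny in-memory corpus and vocab size here are just placeholders, not from the blog post):

```python
# Sketch: train a small BPE tokenizer from an in-memory dataset.
# Assumes the `tokenizers` library is installed (pip install tokenizers).
from tokenizers import Tokenizer, models, trainers, pre_tokenizers

# Placeholder corpus -- in practice this would be an iterator over your dataset.
corpus = [
    "hello world",
    "training a tokenizer from a dataset",
    "byte pair encoding merges frequent symbol pairs",
]

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=200,  # placeholder; real runs use much larger vocabularies
    special_tokens=["<unk>", "<s>", "</s>"],
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

print(tokenizer.encode("hello tokenizer").tokens)
```

Swap the corpus for your own dataset iterator and save with `tokenizer.save("tokenizer.json")`.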
Very good!
Maybe a Colab?
Could this be used to extend a tokenizer model with training? I would like to update my Mistral tokenizer to include foreign characters, such as Hebrew, Amharic, and Hindi.
I'm pretty sure you can add additional tokens and special tokens, so I suppose so.
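Something like the sketch below, I think. It uses a tiny locally trained tokenizer as a stand-in for a pretrained one (to keep the example self-contained); with a `transformers` tokenizer the equivalent would be `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. The example words are just placeholder greetings:

```python
# Sketch: extend an existing tokenizer's vocabulary with new tokens
# instead of retraining it. Assumes the `tokenizers` library is installed.
from tokenizers import Tokenizer, models, trainers, pre_tokenizers

# Stand-in for a pretrained tokenizer; in practice you would load yours,
# e.g. Tokenizer.from_file("tokenizer.json").
tok = Tokenizer(models.BPE(unk_token="<unk>"))
tok.pre_tokenizer = pre_tokenizers.Whitespace()
tok.train_from_iterator(
    ["hello world"],
    trainer=trainers.BpeTrainer(special_tokens=["<unk>"]),
)

before = tok.get_vocab_size()

# Add new whole-word tokens (Hebrew, Amharic, Hindi greetings as examples);
# add_tokens returns how many were actually new to the vocabulary.
added = tok.add_tokens(["שלום", "ሰላם", "नमस्ते"])

print(before, tok.get_vocab_size(), added)
```

Note that after adding tokens you still need to resize the model's embedding matrix before fine-tuning, or the new token IDs will be out of range.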