Multilingual capabilities?

#8
by joujiboi - opened

I have tried the model in Japanese, and in my limited testing it understands the language well. That said, it would be useful to know how much priority was given to multilingual capabilities when creating the dataset and designing the tokenisation. For example, what proportion of the dataset is non-English? And how much of it is Japanese specifically?
