HuggingFaceTB/cosmo2-tokenizer
Library: Transformers
Dataset: HuggingFaceTB/cosmo2_training_data_subset_1M
cosmo2-tokenizer
Tokenizer used to train cosmo2. It was trained on 1M samples drawn from the following mix:
FineWeb-Edu 70%
Cosmopedia v2 15%
StarCoderData 8%
OpenWebMath 5%
StackOverflow 2%
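The card gives the mixture only as percentages. Assuming they describe each source's share of the 1M training samples (the card does not say whether they are sample or token proportions), the per-source counts can be worked out as follows. This is a minimal sketch: `MIXTURE`, `TOTAL_SAMPLES`, and `samples_per_source` are hypothetical names chosen for illustration, not part of any released training code.

```python
# Mixture weights from the model card, in percent of the 1M training samples.
# Assumption: the percentages refer to sample counts, not token counts.
MIXTURE = {
    "FineWeb-Edu": 70,
    "Cosmopedia v2": 15,
    "StarCoderData": 8,
    "OpenWebMath": 5,
    "StackOverflow": 2,
}

TOTAL_SAMPLES = 1_000_000


def samples_per_source(mixture: dict[str, int], total: int) -> dict[str, int]:
    """Split `total` samples across sources in proportion to integer percentages."""
    if sum(mixture.values()) != 100:
        raise ValueError("mixture percentages must sum to 100")
    # Integer arithmetic avoids floating-point rounding; the split is exact
    # whenever `total` is divisible by 100, as it is for 1M samples.
    return {name: total * pct // 100 for name, pct in mixture.items()}


if __name__ == "__main__":
    for name, count in samples_per_source(MIXTURE, TOTAL_SAMPLES).items():
        print(f"{name:>14}: {count:>9,}")
```

Under that assumption the split is 700,000 FineWeb-Edu, 150,000 Cosmopedia v2, 80,000 StarCoderData, 50,000 OpenWebMath, and 20,000 StackOverflow samples, which sum back to 1,000,000.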