Spaces: yhavinga/dutch-tokenizer-arena
dutch-tokenizer-arena/utils (branch: main)
3 contributors
History: 20 commits
Latest commit: c78da21 by yhavinga, 7 months ago: Add Llama tokenizer creation for Dutch, English, Code, Markdown and TeX.
File                    Scan  Size       Updated           Last commit
byte_util.py            -     0 Bytes    about 1 year ago  update
character_util.py       Safe  6.92 kB    7 months ago      add compression leaderboard
compression_util.py     Safe  7.26 kB    7 months ago      Add Llama tokenizer creation for Dutch, English, Code, Markdown and TeX.
convert_sp_to_json.py   Safe  54 Bytes   about 1 year ago  update
fn_util.py              -     0 Bytes    12 months ago     add more tokenizers
lang_util.py            Safe  3.45 kB    7 months ago      add compression leaderboard
lang_util_2.py          Safe  3.05 kB    7 months ago      update
log_util.py             Safe  285 Bytes  about 1 year ago  update
oov_util.py             Safe  265 Bytes  about 1 year ago  update
speed_util.py           Safe  77 Bytes   10 months ago     update
symbol.py               Safe  1.28 kB    about 1 year ago  update
text_util.py            Safe  671 Bytes  7 months ago      add compression leaderboard
vocab.jd.txt.v2         Safe  47.7 kB    12 months ago     update