tangled-llama-v-128k-base-v0.1 / scripts / prepare_pretrain_dataset.py

Commit History

general pretrain data generation
c721106

mtasic85 committed on

general pretrain data generation
716dba4

mtasic85 committed on

pretrain dataset
049be21

mtasic85 committed on

pretrain dataset
012c999

mtasic85 committed on

pretrain dataset
d8a12fc

mtasic85 committed on

pretrain dataset
55dc3c2

mtasic85 committed on

pretrain dataset
9255616

mtasic85 committed on

pretrain dataset
41211a9

mtasic85 committed on

pretrain dataset
0abae5c

mtasic85 committed on

new tokenizer 38400
54c27fe

mtasic85 committed on

fixed smaller pretrain dataset
80f3ec1

mtasic85 committed on

smaller pretrain dataset
854297e

mtasic85 committed on

pretrain model
7854677

mtasic85 committed on

pretrain model
cbbac33

mtasic85 committed on

pretrain model
1816ac6

mtasic85 committed on

pretrain model
abd5982

mtasic85 committed on

new tokenizer 38400
c62a845

mtasic85 committed on

trained new 128k tokenizer
fd468b1

mtasic85 committed on