Trained models are in models/.
Checkpoint names follow the pattern model_dimension-n_layers (e.g. 768-8; that one is not fully trained, but its loss is pretty flat).
Inside models/old/ are models trained on the non-cleaned dataset (with a tokenizer trained on that dataset). I think all of them are fully trained, but some are missing from my wandb.
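A small helper can recover the hyperparameters from a checkpoint directory name under the naming scheme above. This is a sketch: the helper name and the assumption that the separator is always a single hyphen (as in "768-8") are mine, not part of the repo.

```python
# Hypothetical helper: parse a checkpoint directory name like "768-8"
# into (model_dimension, n_layers), per the naming scheme in these notes.
def parse_model_name(name: str) -> tuple[int, int]:
    d_model_str, n_layers_str = name.split("-")
    return int(d_model_str), int(n_layers_str)

print(parse_model_name("768-8"))  # -> (768, 8)
```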
tok4096.model was trained on the cleaned dataset; tok4096_old.model on the non-cleaned one.
train_snakes.py is the training script (you need to change outdir, d_model, and n_layer). It initializes the Mamba model using the MambaLMHeadModel class.
model.py is where the MambaLMHeadModel class is defined.
Context length is 256.
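Since the context length is 256, any prompt fed to the model should be clipped to its most recent 256 tokens. A minimal sketch, assuming tokens are held as a plain list of ids (the function name and representation are assumptions for illustration):

```python
# Context window from the notes above.
CONTEXT_LEN = 256

def clip_to_context(token_ids: list[int], context_len: int = CONTEXT_LEN) -> list[int]:
    # Keep only the last `context_len` tokens; shorter inputs pass through.
    return token_ids[-context_len:]

print(len(clip_to_context(list(range(300)))))  # -> 256
```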