---
license: mit
datasets:
  - qwedsacf/story-generation
language:
  - en
---

LLamaStory-70M is a LLaMA model pre-trained on a story-generation dataset.
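
A minimal inference sketch with Hugging Face `transformers` is shown below. The repository id `erfanzar/LLamaStory-70M` is an assumption based on this card's title and may differ from the actual hosted name.

```python
# Minimal text-generation sketch; the repo id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "erfanzar/LLamaStory-70M"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")

# Keep generation within the model's 512-token position-embedding limit.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```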

About Training:

- Trained with the EasyDel platform
- TPU-v4
- Batch size: 2048
- Maximum position embeddings: 512
- 12 epochs (so far)

This model will be used to debug 4-bit and 8-bit training and inference in JAX and Rust with EasyDel.
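
As a rough illustration of the kind of quantized inference this model is meant to exercise, the sketch below loads it in 8-bit with the `transformers` + `bitsandbytes` PyTorch stack. This is not the EasyDel/JAX or Rust workflow mentioned above, and the repository id is again an assumption.

```python
# Illustrative 8-bit loading sketch (PyTorch + bitsandbytes), not the
# EasyDel/JAX path; requires accelerate and a CUDA-capable GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_id = "erfanzar/LLamaStory-70M"  # assumed repository id

quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```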