
(!) Don't forget to preprocess unknown tokens: substitute `<unk>` with `<|endoftext|>` before tokenization. Otherwise each `<unk>` marker in the dataset will be split into the '<', 'unk', and '>' tokens.
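A minimal preprocessing sketch (the `replace_unk` helper is illustrative; it assumes the standard GPT-2 tokenizer, whose `eos_token` is `<|endoftext|>`):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def replace_unk(text: str) -> str:
    # WikiText-103 marks out-of-vocabulary words with the literal string "<unk>";
    # map each one to the single <|endoftext|> token instead of letting the BPE
    # tokenizer split it into '<', 'unk', and '>'.
    return text.replace("<unk>", tokenizer.eos_token)  # eos_token == "<|endoftext|>"

input_ids = tokenizer(replace_unk("an <unk> example"))["input_ids"]
```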

  • Full-context (1024 tokens) perplexity on the test set: 13.68
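A sketch of how the full-context perplexity could be reproduced (the evaluation loop below is an assumption, not necessarily the exact script behind this number):

```python
import math
import torch
from transformers import GPT2LMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("irodkin/gpt2-wiki103").to(device).eval()

@torch.no_grad()
def perplexity(input_ids: torch.Tensor, ctx: int = 1024) -> float:
    # input_ids: (1, seq_len) pre-tokenized test text (with <unk> already replaced)
    nll, n_tokens = 0.0, 0
    # score non-overlapping windows of ctx+1 tokens (ctx predictions each)
    for i in range(0, input_ids.size(1) - 1, ctx):
        window = input_ids[:, i : i + ctx + 1].to(device)
        out = model(window, labels=window)  # out.loss = mean NLL over the window
        n = window.size(1) - 1              # number of predicted tokens
        nll += out.loss.item() * n
        n_tokens += n
    return math.exp(nll / n_tokens)
```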

Dependence of the cross-entropy loss on the context length used for prediction (a measurement sketch follows the axis notes):

  • x-axis × 128 = context length (in tokens)
  • y-axis = cross-entropy loss
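A hypothetical sketch of how such a curve can be measured: the per-token cross entropy is averaged within buckets of 128 context positions, so bucket k corresponds to roughly k × 128 tokens of context (the function name is illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def loss_by_context_bucket(model, input_ids: torch.Tensor, bucket: int = 128):
    # input_ids: (1, seq_len) with seq_len <= 1024
    logits = model(input_ids).logits[:, :-1]  # predictions for positions 1..n-1
    targets = input_ids[:, 1:]
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    )
    # average the per-token loss within each bucket of `bucket` context positions
    return [chunk.mean().item() for chunk in per_token.split(bucket)]
```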

[figure: cross-entropy loss vs. context length]

  • Model size: 124M params
  • Tensor type: F32 (Safetensors)

Dataset used to train irodkin/gpt2-wiki103: WikiText-103
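The test split can be loaded with the `datasets` library (assuming the `wikitext-103-v1` configuration, which contains the literal `<unk>` markers mentioned above):

```python
from datasets import load_dataset

test_set = load_dataset("wikitext", "wikitext-103-v1", split="test")
```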