
(!) Don't forget to preprocess unknown tokens: substitute `<unk>` with `<|endoftext|>` before tokenization. Otherwise each `<unk>` marker in the dataset will be split into the '<', 'unk', and '>' tokens.
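A minimal preprocessing sketch (the `replace_unk` helper is illustrative; it assumes the standard GPT-2 tokenizer, whose `eos_token` is `<|endoftext|>`):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

def replace_unk(text: str) -> str:
    # WikiText-103 marks out-of-vocabulary words with the literal string "<unk>";
    # map each one to the single <|endoftext|> token instead of letting the BPE
    # tokenizer split it into '<', 'unk', and '>'.
    return text.replace("<unk>", tokenizer.eos_token)  # eos_token == "<|endoftext|>"

input_ids = tokenizer(replace_unk("an <unk> example"))["input_ids"]
```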

  • Full-context (1024 tokens) perplexity on the test set: 13.68
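A sketch of how the full-context perplexity could be reproduced (the evaluation loop below is an assumption, not necessarily the exact script behind this number):

```python
import math
import torch
from transformers import GPT2LMHeadModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("irodkin/gpt2-wiki103").to(device).eval()

@torch.no_grad()
def perplexity(input_ids: torch.Tensor, ctx: int = 1024) -> float:
    # input_ids: (1, seq_len) pre-tokenized test text (with <unk> already replaced)
    nll, n_tokens = 0.0, 0
    # score non-overlapping windows of ctx+1 tokens (ctx predictions each)
    for i in range(0, input_ids.size(1) - 1, ctx):
        window = input_ids[:, i : i + ctx + 1].to(device)
        out = model(window, labels=window)  # out.loss = mean NLL over the window
        n = window.size(1) - 1              # number of predicted tokens
        nll += out.loss.item() * n
        n_tokens += n
    return math.exp(nll / n_tokens)
```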

Dependence of the cross-entropy loss on the context length used for prediction (a measurement sketch follows the axis notes):

  • x-axis × 128 = context length (in tokens)
  • y-axis = cross-entropy loss
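A hypothetical sketch of how such a curve can be measured: the per-token cross entropy is averaged within buckets of 128 context positions, so bucket k corresponds to roughly k × 128 tokens of context (the function name is illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def loss_by_context_bucket(model, input_ids: torch.Tensor, bucket: int = 128):
    # input_ids: (1, seq_len) with seq_len <= 1024
    logits = model(input_ids).logits[:, :-1]  # predictions for positions 1..n-1
    targets = input_ids[:, 1:]
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    )
    # average the per-token loss within each bucket of `bucket` context positions
    return [chunk.mean().item() for chunk in per_token.split(bucket)]
```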

[figure: cross-entropy loss vs. context length]

  • Model size: 124M params
  • Tensor type: F32 (Safetensors)

Dataset used to train irodkin/gpt2-wiki103: WikiText-103
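The test split can be loaded with the `datasets` library (assuming the `wikitext-103-v1` configuration, which contains the literal `<unk>` markers mentioned above):

```python
from datasets import load_dataset

test_set = load_dataset("wikitext", "wikitext-103-v1", split="test")
```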