(!) Don't forget to preprocess unknown tokens in the dataset, substituting them with <|endoftext|>. Otherwise each <unk> will be split into the three tokens '<', 'unk', and '>'.
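A minimal preprocessing sketch of this substitution, assuming the raw dataset marks unknown tokens with the literal string `<unk>` and that the model uses a GPT-2-style BPE tokenizer (the model id below is a placeholder, not necessarily this repository):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model id

def preprocess(text: str) -> str:
    # Replace every literal <unk> with the tokenizer's end-of-text token,
    # so it is encoded as a single special token instead of '<', 'unk', '>'.
    return text.replace("<unk>", tokenizer.eos_token)  # eos_token == "<|endoftext|>"

ids = tokenizer(preprocess("hello <unk> world"))["input_ids"]
```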
- Full-context (1024 tokens) perplexity on the test set: 13.68 (see the evaluation sketch after the figure below)
Figure: dependence of the cross-entropy loss on the context length used for prediction (x-axis × 128 = context length in tokens; y-axis = cross-entropy).
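A minimal sketch of how the perplexity and the curve above could be reproduced, assuming a causal LM loadable via transformers; the model id and the tokenized test set `test_token_ids` are placeholders, not the exact setup behind the reported numbers. Each position in a 1024-token window is predicted from all preceding tokens in that window, so per-position loss is loss as a function of context length:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # placeholder model id

def per_position_loss(token_ids: torch.Tensor, window: int = 1024) -> torch.Tensor:
    """Average cross-entropy at each position of non-overlapping windows.

    Position i is predicted from i preceding tokens, so entry i of the
    result is the loss when the context length equals i.
    """
    n_windows = token_ids.numel() // window
    losses = torch.zeros(window - 1)
    with torch.no_grad():
        for w in range(n_windows):
            chunk = token_ids[w * window:(w + 1) * window].unsqueeze(0)
            logits = model(chunk).logits[0, :-1]      # predictions for positions 1..window-1
            losses += F.cross_entropy(logits, chunk[0, 1:], reduction="none")
    return losses / n_windows

# losses = per_position_loss(test_token_ids)       # test_token_ids: 1-D LongTensor (placeholder)
# print(losses.mean().exp())                       # full-context perplexity
# bins = [losses[i * 128:(i + 1) * 128].mean()     # binned curve: x-axis * 128 = context length
#         for i in range(1024 // 128)]
```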