---
library_name: transformers
license: apache-2.0
datasets:
- Fece228/latin-literature-dataset-170M
language:
- la
---
Pretrained from scratch using the GPT-2 architecture on a dataset of Latin texts ([Corpus Corporum](https://huggingface.co/datasets/Fece228/latin-literature-dataset-170M)). The model was trained for one epoch over 492 million tokens with a 64-token context window, reaching a final loss of 4.5.
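
A minimal usage sketch with the `transformers` library is shown below. The repo id `your-username/latin-gpt2` is a placeholder, not this model's actual Hub id, and the sampling parameters are illustrative; generation is capped at the 64-token context window mentioned above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/latin-gpt2"  # placeholder: replace with the real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Gallia est omnis divisa in partes tres"
inputs = tokenizer(prompt, return_tensors="pt")

# Keep total length within the model's 64-token context window.
outputs = model.generate(
    **inputs,
    max_length=64,
    do_sample=True,
    top_k=50,
    temperature=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```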
The tokenizer is a GPT-2-style byte-level BPE tokenizer trained on the same corpus with a `min_frequency` of 2000.
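
For reference, a sketch of how such a tokenizer could be trained with the `tokenizers` library; the corpus file path and the choice of library are assumptions, only the `min_frequency=2000` setting comes from this card.

```python
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["latin_corpus.txt"],        # assumed path to the raw Latin text
    min_frequency=2000,                # only merge pairs seen at least 2000 times
    special_tokens=["<|endoftext|>"],  # GPT-2's single special token
)
tokenizer.save_model("tokenizer")
```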

Due to the small model size and limited training data, generations tend to be repetitive and not very coherent.