gaodrew committed on
Commit
5ac7f22
1 Parent(s): a1b1220

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -6,4 +6,8 @@ datasets:
 language:
 - la
 ---
-Pretrained from scratch using GPT-2 architecture and a dataset of Latin texts ([Corpus Corporum](https://huggingface.co/datasets/Fece228/latin-literature-dataset-170M))
+Pretrained from scratch using GPT-2 architecture and a dataset of Latin texts ([Corpus Corporum](https://huggingface.co/datasets/Fece228/latin-literature-dataset-170M))
+64 token context, loss 4.5, trained on 1 epoch of 492 million tokens
+GPT2 style tokenizer trained with min_frequency of 2000
+
+Tends to get repetitive and is not very coherent, due to size and limited data.
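
Review note: the figures in the added README lines can be put in context with a little arithmetic. The sketch below is illustrative only and rests on two assumptions that the commit does not state: that the reported loss of 4.5 is a mean per-token cross-entropy in nats, and that the 492 million tokens were packed into non-overlapping 64-token windows. The variable names are hypothetical.

```python
import math

# Figures quoted in the README lines added by this commit.
CONTEXT_TOKENS = 64
REPORTED_LOSS = 4.5
TRAINING_TOKENS = 492_000_000

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(REPORTED_LOSS)

# Assumption: the corpus is split into non-overlapping, full-length windows.
sequences_per_epoch = TRAINING_TOKENS // CONTEXT_TOKENS

print(f"perplexity ~ {perplexity:.1f}")
print(f"sequences per epoch ~ {sequences_per_epoch:,}")
```

Under those assumptions the model has a perplexity of roughly 90 and saw about 7.7 million training sequences in its single epoch, which fits the README's own remark that output tends to be repetitive and not very coherent.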