gpt2_distily

This model is a fine-tuned version of gpt2 on an unknown dataset. On the evaluation set, the last logged validation loss (step 60000) is 1280.4192.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0	0	13088.9219
2126.0	0.0323	2000	1936.6752
1836.0	0.0646	4000	1773.7856
1808.0	0.0970	6000	1547.3760
1546.0	0.1293	8000	1487.7600
1576.0	0.1616	10000	1449.2288
1548.0	0.1939	12000	1413.3920
1246.0	0.2263	14000	1387.0656
1516.0	0.2586	16000	1364.6768
1330.0	0.2909	18000	1346.6160
1284.0	0.3232	20000	1332.1040
1290.0	0.3556	22000	1320.7792
1390.0	0.3879	24000	1310.2496
1568.0	0.4202	26000	1302.9937
1404.0	0.4525	28000	1299.0112
1528.0	0.4848	30000	1293.5887
1263.0	0.5172	32000	1290.0032
1294.0	0.5495	34000	1287.8672
1355.0	0.5818	36000	1285.7808
1300.0	0.6141	38000	1283.5009
1368.0	0.6465	40000	1282.9136
1496.0	0.6788	42000	1281.6096
1502.0	0.7111	44000	1281.7408
1352.0	0.7434	46000	1280.9344
1418.0	0.7758	48000	1280.6288
1158.0	0.8081	50000	1280.5760
1534.0	0.8404	52000	1280.4000
1276.0	0.8727	54000	1280.4032
1184.0	0.9051	56000	1280.4160
1370.0	0.9374	58000	1280.4320
1210.0	0.9697	60000	1280.4192
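The table can also be read programmatically. The sketch below assumes the step, epoch, and validation-loss columns have been transcribed by hand into a Python list; the helper names (`best_checkpoint`, `steps_per_epoch`, `plateau_start`) and the 1.0 plateau tolerance are hypothetical, illustrative choices and not part of the training setup. Because the epoch column is rounded to four decimals, the steps-per-epoch figure is only an estimate.

```python
"""Summarize the validation-loss curve from the training results table."""

# (step, epoch, validation loss) transcribed from the training results table.
# The epoch column is rounded to four decimals, so values derived from it
# are approximate.
EVAL_LOG = [
    (0, 0.0, 13088.9219),
    (2000, 0.0323, 1936.6752),
    (4000, 0.0646, 1773.7856),
    (6000, 0.0970, 1547.3760),
    (8000, 0.1293, 1487.7600),
    (10000, 0.1616, 1449.2288),
    (12000, 0.1939, 1413.3920),
    (14000, 0.2263, 1387.0656),
    (16000, 0.2586, 1364.6768),
    (18000, 0.2909, 1346.6160),
    (20000, 0.3232, 1332.1040),
    (22000, 0.3556, 1320.7792),
    (24000, 0.3879, 1310.2496),
    (26000, 0.4202, 1302.9937),
    (28000, 0.4525, 1299.0112),
    (30000, 0.4848, 1293.5887),
    (32000, 0.5172, 1290.0032),
    (34000, 0.5495, 1287.8672),
    (36000, 0.5818, 1285.7808),
    (38000, 0.6141, 1283.5009),
    (40000, 0.6465, 1282.9136),
    (42000, 0.6788, 1281.6096),
    (44000, 0.7111, 1281.7408),
    (46000, 0.7434, 1280.9344),
    (48000, 0.7758, 1280.6288),
    (50000, 0.8081, 1280.5760),
    (52000, 0.8404, 1280.4000),
    (54000, 0.8727, 1280.4032),
    (56000, 0.9051, 1280.4160),
    (58000, 0.9374, 1280.4320),
    (60000, 0.9697, 1280.4192),
]


def best_checkpoint(log):
    """Return (step, validation loss) for the row with the lowest loss."""
    step, _, loss = min(log, key=lambda row: row[2])
    return step, loss


def steps_per_epoch(log):
    """Estimate optimizer steps per epoch from the last logged row."""
    step, epoch, _ = log[-1]
    return step / epoch


def plateau_start(log, tol=1.0):
    """Return the first logged step after which every validation loss stays
    within `tol` of the final value (a hypothetical plateau criterion)."""
    final = log[-1][2]
    for i, (step, _, _) in enumerate(log):
        if all(abs(loss - final) <= tol for _, _, loss in log[i:]):
            return step
    return log[-1][0]  # unreachable for a non-empty log; kept as a fallback


if __name__ == "__main__":
    step, loss = best_checkpoint(EVAL_LOG)
    print(f"lowest validation loss: {loss} at step {step}")
    print(f"approx. steps per epoch: {steps_per_epoch(EVAL_LOG):.0f}")
    print(f"plateau (within 1.0 of final) from step: {plateau_start(EVAL_LOG)}")
```

Against this table, the helpers place the lowest validation loss at step 52000 (1280.4000), estimate roughly 61,875 steps per epoch, and mark a plateau from step 46000 onward. The final loss at step 60000 is only 0.0192 above the minimum, so the run had effectively converged well before it ended.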