Update README.md
README.md
CHANGED
@@ -17,7 +17,11 @@ datasets:
 
 # Dolphin 2.9.4 Gemma2 2b 🐬
 
-Curated and trained by Eric Hartford and Cognitive Computations
+Curated and trained by Eric Hartford and Cognitive Computations.
+
+This one is special because I used [GrokAdamW](https://github.com/cognitivecomputations/grokadamw) and [Liger Kernel](https://github.com/linkedin/Liger-Kernel).
+
+GrokAdamW is intended to enable fast grokking, to increase generalization. (I am not certain this occurred, because this checkpoint is 4 epochs and it probably takes more epochs to achieve grokking.)
 
 [![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/h3K4XGj2RH)
 Discord: https://discord.gg/h3K4XGj2RH