3/27 update
README.md (changed)
@@ -11,7 +11,7 @@ pipeline_tag: text-generation
 ---
 
 # Cerebras-GPT 111M
-
+Check out our [Blog Post](https://www.cerebras.net/cerebras-gpt). Our arXiv paper is coming soon!
 
 ## Model Description
 
@@ -175,25 +175,8 @@ Cerebras-GPT models have not been tuned for human-facing dialog applications lik
 * **Risks and harms**: There can be distributional bias in the Pile dataset that can manifest in various forms in the downstream model deployment. There are other risks associated with large language models such as amplifying stereotypes, memorizing training data, or revealing private or secure information.
 * **Mitigations**: Only mitigations in standard Pile dataset pre-processing were employed when pre-training Cerebras-GPT.
 
-
-
 <br><br>
 
-## Citation and Related Information
-
-### BibTeX entry
-
-To cite this model:
-
-```bibtex
-@misc{Cerebras-GPT,
-  author = {Nolan Dey and Gurpreet Gosal and Charles Chen and Hemant Khachane and Ribhu Pathria and William Marshall and Marvin Tom and Joel Hestness},
-  title = {GPT-3 Scaling Laws for the PILE Dataset, Trained on the Cerebras Wafer-Scale Engine},
-  year = {2023},
-  month = {March},
-  howpublished = {\url{TODO: arXiv link}}
-}
-```
-
 ## Acknowledgements
 
 We are thankful to all Cerebras engineers, past and present, that made this work possible.