Update README.md
Add "Training Configuration" details.
README.md
CHANGED
@@ -145,6 +145,11 @@ For more details on the pretraining process, see [MPT-7B](https://huggingface.co
 
 The data was tokenized using the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer.
 
+### Training Configuration
+
+This model was trained on 8 A100-80GBs for about 2 days using the [MosaicML Platform](https://www.mosaicml.com/platform).
+The model was trained with sharded data parallelism using [FSDP](https://pytorch.org/docs/stable/fsdp.html) and used the [LION](https://arxiv.org/abs/2302.06675) optimizer.
+
 ## Limitations and Biases
 
 _The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)_
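
The Training Configuration text added in this change names the LION optimizer without describing it. As a rough illustration only, the sketch below implements the update rule from the linked paper in plain Python. The function name `lion_step`, the list-of-floats parameter layout, and all hyperparameter values are assumptions made for this example; the change does not state the learning rate, betas, or weight decay used in training, and this is not the training code itself.

```python
from typing import List, Tuple


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def lion_step(
    params: List[float],
    grads: List[float],
    momentum: List[float],
    lr: float = 1e-4,           # illustrative value, not taken from the README
    beta1: float = 0.9,         # interpolation factor for the update direction
    beta2: float = 0.99,        # decay factor for the stored momentum
    weight_decay: float = 0.0,  # decoupled weight decay coefficient
) -> Tuple[List[float], List[float]]:
    """Apply one Lion update and return (new_params, new_momentum)."""
    new_params: List[float] = []
    new_momentum: List[float] = []
    for p, g, m in zip(params, grads, momentum):
        # The update direction is the sign of an interpolation between the
        # stored momentum and the current gradient.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Step against that direction, with decoupled weight decay.
        new_params.append(p - lr * (update + weight_decay * p))
        # The momentum is refreshed with a separate decay factor.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum


if __name__ == "__main__":
    # Hypothetical toy values, chosen only to exercise the function.
    params, momentum = lion_step([1.0, -2.0], [0.5, -0.25], [0.0, 0.0], lr=0.1)
    print(params, momentum)
```

Because only the sign of the interpolated direction is used, each coordinate with a nonzero update moves by exactly `lr` when `weight_decay` is zero, and the optimizer keeps a single momentum buffer per parameter.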