Intermediate Checkpoints
Warning: the checkpoints in this repo are not fully trained models.
For the final model checkpoint, please see: https://huggingface.co/cerebras/Cerebras-GPT-13B
Usage of muP checkpoints
Note: Transformers does not support muP for all models, so we need a custom model class (BTLM-3B-8k-base). As a result, users must either (1) pass trust_remote_code=True when loading the model or (2) acknowledge the warning about code execution upon loading the model.
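The following is a minimal sketch of option (1), assuming the checkpoints are loaded through the Transformers `Auto*` classes. The helper names `build_load_kwargs` and `load_mup_checkpoint` are illustrative and not part of this repo. The repository id and the optional `revision` value are left for you to fill in, since this card does not state how intermediate checkpoints are named.

```python
from typing import Any, Dict, Optional


def build_load_kwargs(revision: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for from_pretrained (hypothetical helper).

    trust_remote_code=True lets Transformers run the custom model class
    shipped with the checkpoint instead of prompting about code execution.
    """
    kwargs: Dict[str, Any] = {"trust_remote_code": True}
    if revision is not None:
        # Assumption: a checkpoint may be selected by branch, tag, or commit.
        kwargs["revision"] = revision
    return kwargs


def load_mup_checkpoint(repo_id: str, revision: Optional[str] = None):
    """Load the tokenizer and model for a muP checkpoint (hypothetical helper)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = build_load_kwargs(revision)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(repo_id, **kwargs)
    return tokenizer, model


# Example call; replace the placeholder with this repository's id:
# tokenizer, model = load_mup_checkpoint("<this-repo-id>")
```

Because `trust_remote_code=True` executes Python code downloaded with the checkpoint, it is worth reviewing the repository's modeling files before enabling it.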
Uses and Limitations
Intended Use
The primary intended use is to further research into large language models. These models can be used as a foundation model for NLP, applications, ethics, and alignment research. Our primary intended users are researchers who are working to improve LLMs and practitioners seeking reference implementations, training setups, hyperparameters, or pre-trained models. We release these models with a fully permissive Apache license for the community to use freely.
You may fine-tune and adapt Cerebras-GPT models for deployment via either Cerebras Model Studio or third-party libraries. Further safety-related testing and mitigations should be applied before using the Cerebras-GPT model family in production downstream applications.
Due to financial and compute budgets, Cerebras-GPT models were only trained and evaluated following the approaches described in the paper.
Out of Scope Use
Cerebras-GPT models are trained on the Pile, in English only, and are not suitable for machine translation tasks.
Cerebras-GPT models have not been tuned for human-facing dialog applications like chatbots and will not respond to prompts in a similar way to models that have received instruction tuning or reinforcement learning from human feedback (RLHF) like Flan-T5 or ChatGPT. Cerebras-GPT models can be tuned using those methods.
Risk, Bias, Ethical Considerations
- Data: The Pile dataset has been thoroughly analyzed from various ethical standpoints, such as toxicity, gender bias, pejorative content, and racially sensitive content. Please refer to the Pile dataset references.
- Human life: The outputs from this model may or may not align with human values. The risk needs to be thoroughly investigated before deploying this model in a production environment where it can directly impact human life.
- Risks and harms: There can be distributional bias in the Pile dataset that can manifest in various forms in the downstream model deployment. There are other risks associated with large language models such as amplifying stereotypes, memorizing training data, or revealing private or secure information.
- Mitigations: Only mitigations in standard Pile dataset pre-processing were employed when pre-training Cerebras-GPT.
Acknowledgements
We are thankful to all Cerebras engineers, past and present, who made this work possible.