ncoop57 committed on
Commit
9005453
1 Parent(s): 46a34ee

Update README.md

Files changed (1):
  1. README.md +5 -5
README.md CHANGED
@@ -20,7 +20,7 @@ extra_gated_fields:
 
 ## Model Description
 
-`Stable Zephyr 3B` is a 3 billion parameter instruction tuned inspired by [HugginFaceH4's Zephyr 7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) training pipeline this model was trained on a mix of publicly available datasets, synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290), evaluation for this model based on
+`StableLM Zephyr 3B` is a 3 billion parameter instruction tuned inspired by [HugginFaceH4's Zephyr 7B](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) training pipeline this model was trained on a mix of publicly available datasets, synthetic datasets using [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290), evaluation for this model based on
 [MT Bench](https://tatsu-lab.github.io/alpaca_eval/) and [Alpaca Benchmark](https://tatsu-lab.github.io/alpaca_eval/)
 
 ## Usage
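For orientation, the [DPO](https://arxiv.org/abs/2305.18290) objective cited in the description above can be sketched as below. This is an illustrative summary of the loss from the cited paper, not code from this commit or from the model's actual training pipeline.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sketch of the Direct Preference Optimization loss (Rafailov et al., 2023).

    All inputs are per-sequence summed token log-probabilities (tensors).
    """
    # How much more the policy prefers each completion than the frozen
    # reference model does.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # -log(sigmoid(beta * margin)): widen the gap between chosen and
    # rejected completions relative to the reference model.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```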
@@ -29,7 +29,7 @@ Get started generating text with `Stable Zephyr 3B` by using the following code
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
-tokenizer = AutoTokenizer.from_pretrained("stabilityai/stable-zephyr-3b-dpo")
+tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-zephyr-3b-dpo")
 model = AutoModelForCausalLM.from_pretrained(
 "stable-zephyr-3b",
 trust_remote_code=True,
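The hunk above cuts the README's snippet off mid-call. For reference, a minimal self-contained sketch of the full usage pattern follows; the device placement, prompt, and generation settings are assumptions for illustration, not taken from this diff.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id follows the rename in this commit; generation settings below
# are illustrative assumptions.
tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-zephyr-3b-dpo")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-zephyr-3b-dpo",
    trust_remote_code=True,
    device_map="auto",
)

# Chat-style prompt; the chat template ships with the tokenizer.
prompt = [{"role": "user", "content": "List three uses of a small language model."}]
inputs = tokenizer.apply_chat_template(
    prompt, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

tokens = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```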
@@ -51,7 +51,7 @@ print(tokenizer.decode(tokens[0], skip_special_tokens=True))
 ## Model Details
 
 * **Developed by**: [Stability AI](https://stability.ai/)
-* **Model type**: `Stable Zephyr 3B` models are auto-regressive language models based on the transformer decoder architecture.
+* **Model type**: `StableLM Zephyr 3B` models are auto-regressive language models based on the transformer decoder architecture.
 * **Language(s)**: English
 * **Library**: [Alignment Handbook](https://github.com/huggingface/alignment-handbook.git)
 * **Finetuned from model**: [stabilityai/stablelm-3b-4e1t](https://huggingface.co/stabilityai/stablelm-3b-4e1t)
@@ -81,7 +81,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 
 | Model | Size | Alignment | MT-Bench (score) | AlpacaEval (win rate %) |
 |-------------|-----|----|---------------|--------------|
-| **Stable Zephyr 3B** 🪁 | 3B | DPO | 6.64 | 76.00 |
+| **StableLM Zephyr 3B** 🪁 | 3B | DPO | 6.64 | 76.00 |
 | Stable Zephyr (SFT only) | 3B | SFT | 6.04 | 71.15 |
 | MPT-Chat | 7B |dSFT |5.42| -|
 | Xwin-LMv0.1 | 7B| dPPO| 6.19| 87.83|
@@ -181,7 +181,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 
 ### Training Infrastructure
 
-* **Hardware**: `Stable Zephyr 3B` was trained on the Stability AI cluster across 8 nodes with 8 A100 80GBs GPUs for each nodes.
+* **Hardware**: `StableLM Zephyr 3B` was trained on the Stability AI cluster across 8 nodes with 8 A100 80GBs GPUs for each nodes.
 * **Code Base**: We use our internal script for SFT steps and used [HuggingFace Alignment Handbook script](https://github.com/huggingface/alignment-handbook) for DPO training.
 ## Use and Limitations
 