update readme
README.md
CHANGED
@@ -46,11 +46,19 @@ This repository provides a Japanese GPT-NeoX model of 3.6 billion parameters. Th
 The RL data is a subset of the following dataset and has been translated into Japanese.
 * [Anthropic HH RLHF data](https://huggingface.co/datasets/Anthropic/hh-rlhf)
 
+* **Model Series**
+
+| Variant | Link |
+| :-- | :-- |
+| 3.6B PPO | https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-ppo |
+| 3.6B SFT-v2 | https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft-v2 |
+| 3.6B SFT | https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-sft |
+| 3.6B pretrained | https://huggingface.co/rinna/japanese-gpt-neox-3.6b |
+
 * **Authors**
 
 [Tianyu Zhao](https://huggingface.co/tianyuz) and [Kei Sawada](https://huggingface.co/keisawada)
 
-
 # Limitations
 * We found this version of the PPO model tends to generate repeated text more often than its SFT counterpart, so we set `repetition_penalty=1.1` for better generation performance. (*The same generation hyper-parameters are applied to the SFT model in the aforementioned evaluation experiments.*) You can also explore other hyper-parameter combinations that yield higher generation randomness/diversity, e.g. `temperature=0.9, repetition_penalty=1.0`.
 
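For context on the `repetition_penalty=1.1` recommendation in the Limitations section, a minimal generation sketch follows. It is not part of this commit and assumes only the standard `transformers` text-generation API; the prompt string is a placeholder, not the model's documented conversation format.

```python
# Hedged sketch (not from the diff): decoding with the repetition_penalty=1.1
# setting recommended in the Limitations section, via the standard transformers API.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "rinna/japanese-gpt-neox-3.6b-instruction-ppo"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "..."  # placeholder; build the instruction-formatted prompt per the full README

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=128,
        do_sample=True,
        repetition_penalty=1.1,  # recommended above to curb repeated text
        # alternative combination mentioned above: temperature=0.9, repetition_penalty=1.0
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```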