Update README.md
README.md
CHANGED
@@ -1,6 +1,65 @@
GPT-R [Ronin]

This is an experimental model containing a parameter-wise 60/40 blend (weighted average) of the weights of ppo_hh_gpt-j and GPT-JT-6B-v1.

-Intended Merge Value-
As with fine-tuning, merging weights does not add information but transforms it, so it is important to consider the trade-offs.
GPT-Ronin combines ppo_hh_gpt-j and GPT-JT; the two are blended with the intent of elevating the strengths of both.
The datasets of both models are linked below to assist exploratory speculation on which datasets, in what quantity and configuration, have the largest impact on the usefulness of a model without the expense of fine-tuning.
The blend was computed in FP32 and saved in FP16.
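
The 60/40 parameter-wise blend described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the script used to produce GPT-R: the helper names (`to_fp32`, `to_fp16`, `blend`) and the toy dictionaries are hypothetical stand-ins for real checkpoints, and FP32 arithmetic is only approximated by rounding Python floats to FP32 before and after the average.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 (half precision) value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def blend(weights_a, weights_b, ratio_a=0.6):
    """Parameter-wise weighted average of two checkpoints with the same layout.

    Hypothetical helper: each checkpoint is a dict mapping a parameter name
    to a flat list of floats.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("checkpoints must contain the same parameter names")
    merged = {}
    for name, a in weights_a.items():
        b = weights_b[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            # Average the FP32-rounded inputs, round the result to FP32,
            # then store it as FP16.
            to_fp16(to_fp32(ratio_a * to_fp32(x) + (1.0 - ratio_a) * to_fp32(y)))
            for x, y in zip(a, b)
        ]
    return merged


# Toy stand-ins for the two checkpoints (values are made up).
model_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
model_b = {"layer.weight": [0.0, 2.0], "layer.bias": [-0.5]}
merged = blend(model_a, model_b, ratio_a=0.6)
print(merged)
```

With real checkpoints the same arithmetic would be applied tensor by tensor in a framework such as PyTorch; averaging in the wider type and casting down only at the end avoids accumulating FP16 rounding error in the average itself.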

-Intended Use-

Research purposes only, intended for responsible use.
Express a task in natural language, and GPT-R will attempt to carry it out.
Try telling it "Write an article about X but put Y spin on it.",
"Write a five step numbered guide on how to do X.", or any other
basic instruction. It does its best.

It can also be used as a base to merge with conversational,
story writing, or adventure themed models of the same class
(GPT-J & 6B NeoX) and parameter size (6B) to experiment with
the morphology of model weights based on the value added
by instruction tuning.

The merge was tested in KoboldAI with Nucleus Sampling (Top-P) set to 0.7, Temperature at 0.5, and Repetition Penalty at 1.14; all other samplers were disabled.
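
To make the three test settings concrete, here is a small sketch of what each sampler does to a toy logit vector. It is an illustration under assumptions, not KoboldAI's implementation: the function name `filter_logits` is hypothetical, the repetition penalty follows the common divide-or-multiply rule, and the order in which the samplers are applied is chosen here for simplicity and may differ from KoboldAI's.

```python
import math


def filter_logits(logits, previous_tokens, temperature=0.5, top_p=0.7,
                  repetition_penalty=1.14):
    """Return {token_id: probability} after the three samplers are applied."""
    adjusted = list(logits)

    # Repetition penalty: make already-generated tokens less likely.
    for tok in set(previous_tokens):
        if adjusted[tok] > 0:
            adjusted[tok] /= repetition_penalty
        else:
            adjusted[tok] *= repetition_penalty

    # Temperature: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in adjusted]

    # Softmax, shifted by the maximum for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus (Top-P): keep the smallest set of most likely tokens whose
    # cumulative probability reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


# Toy example: four candidate tokens, token 0 was already generated.
distribution = filter_logits([2.0, 1.0, 0.5, -1.0], previous_tokens=[0])
print(distribution)
```

A low temperature combined with a Top-P of 0.7 keeps sampling concentrated on the few most likely tokens, which suits instruction following more than open-ended story generation.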

-Credits to-

Core Model:
https://huggingface.co/EleutherAI/gpt-j-6B
Author:
https://www.eleuther.ai/

Model1; 60% ppo_hh_gpt-j:
https://huggingface.co/reciprocate/ppo_hh_gpt-j
Author Repo:
https://huggingface.co/reciprocate
Related; CarperAI:
https://huggingface.co/CarperAI
Training used a variant of the Helpful Harmless (HH) assistant-themed
dataset with Proximal Policy Optimization (PPO). The specific datasets
used are unknown; datasets listed in the author's repo include:
https://huggingface.co/datasets/reciprocate/summarize_eval_ilql
https://huggingface.co/datasets/reciprocate/hh_eval_ilql

PPO explained:
https://paperswithcode.com/method/ppo
Potential HH-type datasets utilized:
https://huggingface.co/HuggingFaceH4
https://huggingface.co/datasets/Anthropic/hh-rlhf

Model2; 40% GPT-JT-6B-v1:
https://huggingface.co/togethercomputer/GPT-JT-6B-v1
Author Repo:
https://huggingface.co/togethercomputer
Related; BigScience:
https://huggingface.co/bigscience
Datasets:
https://huggingface.co/datasets/the_pile
https://huggingface.co/datasets/bigscience/P3
https://github.com/allenai/natural-instructions
https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html