digitous committed on
Commit 5945354
Parent: 038f5b5

Update README.md

Files changed (1): README.md +65 -6
README.md CHANGED
@@ -1,6 +1,65 @@
- ---
- license: creativeml-openrail-m
- language:
- - en
- pipeline_tag: text-generation
- ---
+ GPT-R [Ronin]
+
+ This is an experimental model containing a parameter-wise 60/40 blend (weighted average) of the weights of ppo_hh_gpt-j and GPT-JT-6B-v1.
+
+ - Intended Merge Value -
+ As with fine-tuning, merging weights does not add information but transforms it; it is therefore important to consider trade-offs.
+ GPT-Ronin combines ppo_hh_gpt-j and GPT-JT, blending the two
+ technical achievements with the intent of elevating the strengths
+ of both. The datasets of both are linked below to assist exploratory
+ speculation on which datasets, in what quantity and configuration,
+ have the largest impact on a model's usefulness without the expense
+ of fine-tuning. The blend was done in FP32 and the output saved in FP16.
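+
+ For illustration, a minimal sketch of such a parameter-wise blend in
+ PyTorch (a sketch only: the checkpoint file names and output path are
+ hypothetical, and it assumes both state dicts share identical keys and
+ shapes):
+
+ ```python
+ import torch
+
+ BLEND = 0.60  # 60% ppo_hh_gpt-j, 40% GPT-JT-6B-v1
+
+ # Load both checkpoints on CPU (file names are placeholders).
+ sd_a = torch.load("ppo_hh_gpt-j.pt", map_location="cpu")
+ sd_b = torch.load("gpt-jt-6b-v1.pt", map_location="cpu")
+
+ merged = {}
+ for key, a in sd_a.items():
+     b = sd_b[key]
+     # Average in FP32 for precision, then store in FP16 as described above.
+     merged[key] = (BLEND * a.float() + (1.0 - BLEND) * b.float()).half()
+
+ torch.save(merged, "gpt-r-ronin-fp16.pt")
+ ```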
+
+ -Intended Use-
+
+ Research purposes only; intended for responsible use.
+ Express a task in natural language, and GPT-R will do the thing.
+ Try telling it "Write an article about X but put Y spin on it.",
+ "Write a five-step numbered guide on how to do X.", or any other
+ basic instruction. It does its best.
+
+ It can also be used as a base to merge with conversational,
+ story-writing, or adventure-themed models of the same class
+ (GPT-J & 6B NeoX) and parameter size (6B) to experiment with
+ the morphology of model weights relative to the value added
+ by instruct tuning.
+
+ The merge was tested using KoboldAI with Nucleus Sampling (Top-P) set
+ to 0.7, Temperature at 0.5, and Repetition Penalty at 1.14; extra
+ samplers disabled.
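+
+ As a usage sketch, the following loads the model with the Hugging Face
+ transformers library and generates with the sampling settings above
+ (the repo id shown is hypothetical; substitute the actual model path):
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_ID = "digitous/GPT-R"  # hypothetical id; use the real repo or a local path
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+ model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
+
+ prompt = "Write a five-step numbered guide on how to plant a garden."
+ inputs = tokenizer(prompt, return_tensors="pt")
+
+ # Sampling settings matching the KoboldAI test configuration above.
+ output = model.generate(
+     **inputs,
+     do_sample=True,
+     top_p=0.7,
+     temperature=0.5,
+     repetition_penalty=1.14,
+     max_new_tokens=200,
+ )
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```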
+
+ -Credits to-
+
+ Core Model:
+ https://huggingface.co/EleutherAI/gpt-j-6B
+ Author:
+ https://www.eleuther.ai/
+
+ Model 1; 60% ppo_hh_gpt-j:
+ https://huggingface.co/reciprocate/ppo_hh_gpt-j
+ Author Repo:
+ https://huggingface.co/reciprocate
+ Related; CarperAI:
+ https://huggingface.co/CarperAI
+ The dataset is a variant of the Helpful and Harmless (HH) assistant-themed
+ dataset trained with Proximal Policy Optimization; the specific datasets
+ used are unknown. The listed repo datasets include:
+ https://huggingface.co/datasets/reciprocate/summarize_eval_ilql
+ https://huggingface.co/datasets/reciprocate/hh_eval_ilql
+
+ PPO explained:
+ https://paperswithcode.com/method/ppo
+ Potential HH-type datasets utilized:
+ https://huggingface.co/HuggingFaceH4
+ https://huggingface.co/datasets/Anthropic/hh-rlhf
+
+ Model 2; 40% GPT-JT-6B-v1:
+ https://huggingface.co/togethercomputer/GPT-JT-6B-v1
+ Author Repo:
+ https://huggingface.co/togethercomputer
+ Related; BigScience:
+ https://huggingface.co/bigscience
+ Datasets:
+ https://huggingface.co/datasets/the_pile
+ https://huggingface.co/datasets/bigscience/P3
+ https://github.com/allenai/natural-instructions
+ https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html