hanyinwang committed
Commit 0bf6223 · 1 Parent(s): fe7753f
Update README.md

README.md CHANGED
@@ -5,39 +5,48 @@ tags:
 - ppo
 - transformers
 - reinforcement-learning
+language:
+- en
 ---
 
 # TRL Model
 
 This is a [TRL language model](https://github.com/huggingface/trl) that has been fine-tuned with reinforcement learning to
-guide the model outputs according to a
+guide the model outputs according to simulated human feedback. The model was fine-tuned to classify cancer / diabetes from clinical notes.
+
 
 ## Usage
 
-To use this model for inference, first install the TRL library:
-
-```bash
-python -m pip install trl
-```
-
-You can then generate text as follows:
-
-```python
-from transformers import pipeline
-
-generator = pipeline("text-generation", model="hanyinwang//tmp/tmpgsc6yhsr/hanyinwang/layer-project-rlhf-mistral")
-outputs = generator("Hello, my llama is cute")
-```
-
-If you want to use the model for training or to obtain the outputs from the value head, load the model as follows:
-
 ```python
 from transformers import AutoTokenizer
 from trl import AutoModelForCausalLMWithValueHead
 
-
-
-
-
-
-```
+tokenizer_kwargs = {
+    "padding": "max_length",
+    "truncation": True,
+    "return_tensors": "pt"
+}
+
+# Load the tokenizer first; left padding is required for decoder-only generation.
+tokenizer = AutoTokenizer.from_pretrained("hanyinwang/layer-project-rlhf-mistral", padding_side="left")
+tokenizer.pad_token = tokenizer.eos_token
+
+generation_kwargs = {
+    "min_length": -1,
+    "top_k": 40,
+    "top_p": 0.95,
+    "do_sample": True,
+    "pad_token_id": tokenizer.eos_token_id,
+    "max_new_tokens": 11,
+    "temperature": 0.1,
+    "repetition_penalty": 1.2
+}
+
+model = AutoModelForCausalLMWithValueHead.from_pretrained("hanyinwang/layer-project-rlhf-mistral")
+
+# The tokenizer kwargs apply at encode time; replace <prompt> with your input text.
+query_tensors = tokenizer.encode(<prompt>, **tokenizer_kwargs)
+prompt_length = query_tensors.shape[1]
+
+outputs = model.generate(query_tensors, **generation_kwargs)
+response = tokenizer.decode(outputs[0][prompt_length:])
+```
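Since the card describes a cancer / diabetes classifier but the snippet decodes free text, a minimal post-processing sketch may help. It assumes the prompt instructs the model to answer with one of the two condition names; the `parse_label` helper and the `"unknown"` fallback are illustrative assumptions, not part of the committed card:

```python
# Hypothetical sketch: assumes the generation names one of the two conditions.
def parse_label(response: str) -> str:
    text = response.strip().lower()
    if "cancer" in text:
        return "cancer"
    if "diabetes" in text:
        return "diabetes"
    return "unknown"  # the generation named neither condition

label = parse_label(response)  # `response` comes from the snippet above
```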