hanyinwang committed on
Commit 0bf6223
1 Parent(s): fe7753f

Update README.md

Files changed (1): README.md (+33 -24)
README.md CHANGED
@@ -5,39 +5,48 @@ tags:
  - ppo
  - transformers
  - reinforcement-learning
  ---

  # TRL Model

  This is a [TRL language model](https://github.com/huggingface/trl) that has been fine-tuned with reinforcement learning to
- guide the model outputs according to a value, function, or human feedback. The model can be used for text generation.

  ## Usage

- To use this model for inference, first install the TRL library:
-
- ```bash
- python -m pip install trl
- ```
-
- You can then generate text as follows:
-
- ```python
- from transformers import pipeline
-
- generator = pipeline("text-generation", model="hanyinwang//tmp/tmpgsc6yhsr/hanyinwang/layer-project-rlhf-mistral")
- outputs = generator("Hello, my llama is cute")
- ```
-
- If you want to use the model for training or to obtain the outputs from the value head, load the model as follows:
-
  ```python
  from transformers import AutoTokenizer
  from trl import AutoModelForCausalLMWithValueHead

- tokenizer = AutoTokenizer.from_pretrained("hanyinwang//tmp/tmpgsc6yhsr/hanyinwang/layer-project-rlhf-mistral")
- model = AutoModelForCausalLMWithValueHead.from_pretrained("hanyinwang//tmp/tmpgsc6yhsr/hanyinwang/layer-project-rlhf-mistral")
-
- inputs = tokenizer("Hello, my llama is cute", return_tensors="pt")
- outputs = model(**inputs, labels=inputs["input_ids"])
- ```
  - ppo
  - transformers
  - reinforcement-learning
+ language:
+ - en
  ---

  # TRL Model

  This is a [TRL language model](https://github.com/huggingface/trl) that has been fine-tuned with reinforcement learning to
+ guide the model outputs according to simulated human feedback. The model was fine-tuned to classify cancer / diabetes from clinical notes.
+

  ## Usage
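The snippet below tokenizes a `<prompt>` placeholder. The actual prompt template used during fine-tuning is not documented on this card; for a cancer / diabetes classification task over clinical notes, a prompt might look something like the following sketch (the wording and the example note are illustrative assumptions, not the trained template):

```python
# Hypothetical prompt construction -- the real template used in training is not
# shown on this card, so treat this purely as an illustration.
note = "65F with polyuria, polydipsia, and HbA1c of 9.2%."
prompt = (
    "Classify the condition described in the clinical note as cancer or diabetes.\n"
    f"Clinical note: {note}\n"
    "Answer:"
)
print(prompt)
```

The small `max_new_tokens` budget in the generation kwargs below suggests the model is expected to emit only a short label after the final line of such a prompt.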
  ```python
  from transformers import AutoTokenizer
  from trl import AutoModelForCausalLMWithValueHead

+ # Call-time kwargs for tokenizing the prompt (padding/truncation belong to
+ # the encode call, not to from_pretrained).
+ tokenizer_kwargs = {
+     "padding": "max_length",
+     "truncation": True,
+     "return_tensors": "pt"
+ }
+
+ # Decoder-only models should be left-padded for generation.
+ tokenizer = AutoTokenizer.from_pretrained("hanyinwang/layer-project-rlhf-mistral", padding_side="left")
+ tokenizer.pad_token = tokenizer.eos_token
+
+ model = AutoModelForCausalLMWithValueHead.from_pretrained("hanyinwang/layer-project-rlhf-mistral")
+
+ generation_kwargs = {
+     "min_length": -1,
+     "top_k": 40,
+     "top_p": 0.95,
+     "do_sample": True,
+     "pad_token_id": tokenizer.eos_token_id,
+     "max_new_tokens": 11,
+     "temperature": 0.1,
+     "repetition_penalty": 1.2
+ }
+
+ query_tensors = tokenizer.encode(<prompt>, **tokenizer_kwargs)  # replace <prompt> with your prompt string
+ prompt_length = query_tensors.shape[1]
+
+ # generate() returns the prompt followed by the new tokens; decode only the response.
+ outputs = model.generate(query_tensors, **generation_kwargs)
+ response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
+ ```
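The final decode step works because `generate()` returns the prompt tokens followed by the newly generated tokens, so the response is recovered by slicing off the first `prompt_length` ids. A minimal self-contained sketch of that logic, with made-up token ids standing in for real tokenizer output:

```python
# generate() conceptually returns prompt ids followed by newly generated ids,
# so the response is recovered by dropping the first prompt_length ids.
prompt_ids = [11, 42, 7]          # stand-in ids for the tokenized prompt
generated_ids = [99, 13, 5, 2]    # stand-in ids for the model's answer
full_output = prompt_ids + generated_ids

prompt_length = len(prompt_ids)
response_ids = full_output[prompt_length:]
print(response_ids)  # → [99, 13, 5, 2]
```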