hanyinwang committed
Commit 0bf6223 · 1 Parent(s): fe7753f
Update README.md

README.md CHANGED
@@ -5,39 +5,48 @@ tags:
 - ppo
 - transformers
 - reinforcement-learning
+language:
+- en
 ---
 
 # TRL Model
 
 This is a [TRL language model](https://github.com/huggingface/trl) that has been fine-tuned with reinforcement learning to
-guide the model outputs according to a
+guide the model outputs according to simulated human feedback. The model was fine-tuned to classify cancer / diabetes from clinical notes.
+
 
 ## Usage
 
-To use this model for inference, first install the TRL library:
-
-```bash
-python -m pip install trl
-```
-
-You can then generate text as follows:
-
-```python
-from transformers import pipeline
-
-generator = pipeline("text-generation", model="hanyinwang//tmp/tmpgsc6yhsr/hanyinwang/layer-project-rlhf-mistral")
-outputs = generator("Hello, my llama is cute")
-```
-
-If you want to use the model for training or to obtain the outputs from the value head, load the model as follows:
-
 ```python
 from transformers import AutoTokenizer
 from trl import AutoModelForCausalLMWithValueHead
 
-
-
-
-
-
-```
+tokenizer_kwargs = {
+    "padding": "max_length",
+    "truncation": True,
+    "return_tensors": "pt"
+}
+
+# Load the tokenizer first; left padding is required for decoder-only generation.
+tokenizer = AutoTokenizer.from_pretrained("hanyinwang/layer-project-rlhf-mistral", padding_side="left")
+tokenizer.pad_token = tokenizer.eos_token
+
+generation_kwargs = {
+    "min_length": -1,
+    "top_k": 40,
+    "top_p": 0.95,
+    "do_sample": True,
+    "pad_token_id": tokenizer.eos_token_id,
+    "max_new_tokens": 11,
+    "temperature": 0.1,
+    "repetition_penalty": 1.2
+}
+
+model = AutoModelForCausalLMWithValueHead.from_pretrained("hanyinwang/layer-project-rlhf-mistral")
+
+# The tokenizer kwargs apply at encode time; replace <prompt> with your input text.
+query_tensors = tokenizer.encode(<prompt>, **tokenizer_kwargs)
+prompt_length = query_tensors.shape[1]
+
+outputs = model.generate(query_tensors, **generation_kwargs)
+response = tokenizer.decode(outputs[0][prompt_length:])
+```
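Since the card describes a cancer / diabetes classifier but the snippet decodes free text, a minimal post-processing sketch may help. It assumes the prompt instructs the model to answer with one of the two condition names; the `parse_label` helper and the `"unknown"` fallback are illustrative assumptions, not part of the committed card:

```python
# Hypothetical sketch: assumes the generation names one of the two conditions.
def parse_label(response: str) -> str:
    text = response.strip().lower()
    if "cancer" in text:
        return "cancer"
    if "diabetes" in text:
        return "diabetes"
    return "unknown"  # the generation named neither condition

label = parse_label(response)  # `response` comes from the snippet above
```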