--- license: apache-2.0 tags: - trl - ppo - transformers - reinforcement-learning language: - en --- # TRL Model This is a [TRL language model](https://github.com/huggingface/trl) that has been fine-tuned with reinforcement learning to guide the model outputs according to a simulated human feedback. The model was fine-tuned for classification of cancer / diabetes based on clinical notes. ```bash pip install torch transformers trl peft ``` ## Usage ```python from transformers import AutoTokenizer from trl import AutoModelForCausalLMWithValueHead from peft import LoraConfig tokenizer_kwargs = { "padding": "max_length", "truncation": True, "return_tensors": "pt", "padding_side": "left" } tokenizer = AutoTokenizer.from_pretrained("hanyinwang/layer-project-diagnostic-mistral", **tokenizer_kwargs) tokenizer.pad_token = tokenizer.eos_token generation_kwargs = { "min_length": -1, "top_k": 40, "top_p": 0.95, "do_sample": True, "pad_token_id": tokenizer.eos_token_id, "max_new_tokens":11, "temperature":0.1, "repetition_penalty":1.2 } model = AutoModelForCausalLMWithValueHead.from_pretrained("hanyinwang/layer-project-diagnostic-mistral").cuda() def format_prompt_mistral(text, condition): prompt = """[INST]You are a medical doctor specialized in %s diagnosis. From the provided document, assert if the patient historically and currently has %s. For each condition, only pick from "YES", "NO", or "MAYBE". And you must follow format without anything further. The results have to be directly parseable with python json.loads(). Sample output: {"%s": "MAYBE"} Never output anything beyond the format.[/INST] Provided document: %s"""%(condition, condition, condition, text) return prompt query_tensors = tokenizer.encode(format_prompt_mistral(, ), return_tensors="pt") # : clinical note # : "cancer" or "diabetes" prompt_length = query_tensors.shape[1] outputs = model.generate(query_tensors.cuda(), **generation_kwargs) response = tokenizer.decode(outputs[0][prompt_length:]) ```