---
language:
- en
library_name: transformers
tags:
- orpo
- Mistral
- Mistral-7B-v0.3
- sft
datasets:
- mlabonne/orpo-dpo-mix-40k
---

# Model description

This model is an ORPO fine-tuned version of [mistralai/Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3), trained on a 2.5k-sample subset of the [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) dataset.

Thanks to [Maxime Labonne](https://huggingface.co/mlabonne) for providing this [amazing guide](https://huggingface.co/blog/mlabonne/orpo-llama-3) on Odds Ratio Preference Optimization (ORPO). ORPO combines the traditional supervised fine-tuning and preference alignment stages into a single process.

This model follows the ChatML chat template!

## How to use

````
import torch
from transformers import AutoTokenizer, pipeline

model_id = "MuntasirHossain/Orpo-Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model in half precision and let Accelerate place it on the available device(s)
llm = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

def generate(input_text):
    # Wrap the input as a single user turn and render it with the model's chat template
    messages = [{"role": "user", "content": input_text}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    outputs = llm(prompt, max_new_tokens=512)
    # The pipeline returns prompt + completion, so slice off the prompt
    return outputs[0]["generated_text"][len(prompt):]

print(generate("Explain quantum tunneling in simple terms."))
````
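
Since the model follows the ChatML chat template, it can help to see roughly what the rendered prompt looks like before it reaches the model. The sketch below is only an illustration: `render_chatml` is a hypothetical helper, and the `<|im_start|>` / `<|im_end|>` delimiters are the generic ChatML markers, assumed here rather than read from this model's tokenizer. The template stored with the tokenizer (used by `tokenizer.apply_chat_template`) remains authoritative and may differ in details such as a leading BOS token.

```python
# Generic ChatML delimiters (an assumption; the tokenizer's own template is authoritative).
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=True):
    """Hypothetical helper: render a list of chat messages in the generic ChatML layout."""
    # Each turn is: <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Explain quantum tunneling in simple terms."}]
prompt = render_chatml(messages)
print(prompt)
```

Comparing this string with the output of `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is a quick way to check which special tokens the model actually expects.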