---
library_name: transformers
tags:
- orpo
- llama 3
license: other
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
---

# OrpoLlama-3-8B

![](https://i.imgur.com/ZHwzQvI.png)

This is a quick fine-tune of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on 1k samples of [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) created for [this article](https://huggingface.co/blog/mlabonne/orpo-llama-3).

It's not very good at the moment (it's the sassiest model ever), but I'm currently training a version on the entire dataset.
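For reference, below is a minimal sketch of what an ORPO fine-tune like this can look like with TRL's `ORPOTrainer`. The hyperparameters, the 1k-sample selection, and the data handling are illustrative assumptions rather than the exact recipe; see the linked article for the actual training setup.

```python
# Illustrative sketch only: hyperparameters and data handling are assumptions,
# not the exact recipe used to train this model.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Subsample ~1k preference pairs from the mix (assumed sampling; the chosen/rejected
# columns may need chat-template formatting depending on your TRL version).
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
dataset = dataset.shuffle(seed=42).select(range(1000))

config = ORPOConfig(
    output_dir="OrpoLlama-3-8B",
    beta=0.1,                       # weight of the odds-ratio preference term
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=8e-6,
    num_train_epochs=1,
    logging_steps=10,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,  # newer TRL releases use processing_class= instead
)
trainer.train()
trainer.save_model("OrpoLlama-3-8B")
```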

**Try the demo**: https://huggingface.co/spaces/mlabonne/OrpoLlama-3-8B

## πŸ† Evaluation

### Nous

Evaluation performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval), see the entire leaderboard [here](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

| Model                                                                                                                                                                     |   Average |   AGIEval |   GPT4All | TruthfulQA |  Bigbench |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------: | --------: | --------: | ---------: | --------: |
| [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) [πŸ“„](https://gist.github.com/mlabonne/88b21dd9698ffed75d6163ebdc2f6cc8)     |     52.42 |     42.75 |     72.99 |      52.99 |     40.94 |
| [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) [πŸ“„](https://gist.github.com/mlabonne/8329284d86035e6019edb11eb0933628) |     51.34 |     41.22 |     69.86 |      51.65 |     42.64 |
| [mistralai/Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) [πŸ“„](https://gist.github.com/mlabonne/7a0446c3d30dfce72834ef780491c4b2)   |     49.15 |     33.36 |     67.87 |      55.89 |     39.48 |
| [**mlabonne/OrpoLlama-3-8B**](https://huggingface.co/mlabonne/OrpoLlama-3-8B) [πŸ“„](https://gist.github.com/mlabonne/f41dad371d1781d0434a4672fd6f0b82)                     | **46.76** | **31.56** | **70.19** |  **48.11** | **37.17** |
| [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) [πŸ“„](https://gist.github.com/mlabonne/616b6245137a9cfc4ea80e4c6e55d847)                   |     45.42 |      31.1 |     69.95 |      43.91 |      36.7 |

## πŸ“ˆ Training curves

![](https://i.imgur.com/r78hGrl.png)

## πŸ’» Usage

```python
# In a notebook, install dependencies first: pip install -qU transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch

model = "mlabonne/OrpoLlama-3-8B"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample up to 256 new tokens with temperature/top-k/top-p sampling
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```