|
--- |
|
license: afl-3.0 |
|
library_name: transformers |
|
tags: |
|
- UNA |
|
- juanako |
|
datasets: |
|
- jondurbin/py-dpo-v0.1 |
|
- Replete-AI/code_bagel_hermes-2.5 |
|
- mlabonne/orpo-dpo-mix-40k |
|
model-index: |
|
- name: UNA-ThePitbull-21.4B-v2 |
|
results: |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: AI2 Reasoning Challenge (25-Shot) |
|
type: ai2_arc |
|
config: ARC-Challenge |
|
split: test |
|
args: |
|
num_few_shot: 25 |
|
metrics: |
|
- type: acc_norm |
|
value: 77.73 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: HellaSwag (10-Shot) |
|
type: hellaswag |
|
split: validation |
|
args: |
|
num_few_shot: 10 |
|
metrics: |
|
- type: acc_norm |
|
value: 91.79 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MMLU (5-Shot) |
|
type: cais/mmlu |
|
config: all |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 68.25 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: TruthfulQA (0-shot) |
|
type: truthful_qa |
|
config: multiple_choice |
|
split: validation |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: mc2 |
|
value: 78.24 |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: Winogrande (5-shot) |
|
type: winogrande |
|
config: winogrande_xl |
|
split: validation |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 87.37 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: GSM8k (5-shot) |
|
type: gsm8k |
|
config: main |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 63.53 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: IFEval (0-Shot) |
|
type: HuggingFaceH4/ifeval |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: inst_level_strict_acc and prompt_level_strict_acc |
|
value: 37.9 |
|
name: strict accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: BBH (3-Shot) |
|
type: BBH |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc_norm |
|
value: 46.79 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MATH Lvl 5 (4-Shot) |
|
type: hendrycks/competition_math |
|
args: |
|
num_few_shot: 4 |
|
metrics: |
|
- type: exact_match |
|
value: 9.59 |
|
name: exact match |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: GPQA (0-shot) |
|
type: Idavidrein/gpqa |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 6.94 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MuSR (0-shot) |
|
type: TAUR-Lab/MuSR |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 6.42 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MMLU-PRO (5-shot) |
|
type: TIGER-Lab/MMLU-Pro |
|
config: main |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 27.95 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2 |
|
name: Open LLM Leaderboard |
|
--- |
|
|
|
# UNA-ThePitbull 21.4B v2 |
|
|
|
Introducing one of the strongest LLMs of its size: nearly as good as a 70B while being just 21.4B parameters, based on `saltlux/luxia-21.4b-alignment-v1.0`.
|
![UNA - ThePitbull 21.4B v2](https://huggingface.co/fblgit/UNA-ThePitbull-21.4B-v2/resolve/main/DE-UNA-ThePitbull-21.4B-v2.png) |
|
|
|
This model has not been poisoned to score high on benchmarks at the cost of being useless. We release it because it is the real deal: EQ and IQ together in a remarkably powerful, smart, and conversational model.
|
|
|
Quantized versions are available at [bartowski/UNA-ThePitbull-21.4B-v2-GGUF](https://huggingface.co/bartowski/UNA-ThePitbull-21.4B-v2-GGUF).
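
For full-weight inference, here is a minimal `transformers` sketch. The chat-style prompt and generation settings are illustrative assumptions, not a prescribed format; check the repository's tokenizer for the actual chat template.

```python
# Minimal usage sketch for UNA-ThePitbull-21.4B-v2 with transformers.
# The chat-style prompt below is an illustrative assumption; verify the
# tokenizer's chat template in the repo before relying on it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/UNA-ThePitbull-21.4B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~43 GB of weights at bf16 for 21.4B params
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain DPO in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For CPU or low-VRAM setups, the GGUF quants linked above run on llama.cpp-compatible runtimes.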
|
|
|
## Difference V1 vs V2 |
|
|
|
For V2 we implemented a different UNA strategy, partially covering the MLPs and the attention layers.

We also performed further SFT and further DPO over V1, and we'll release some of those checkpoints soon as well.
|
|
|
### Changes |
|
|
|
1. SFT over V1 with `Replete-AI/code_bagel_hermes-2.5`, learning rate decayed from 1.0e-4 to 5.0e-5, for 1 epoch

2. DPO for 1 epoch, learning rate 1.0e-4 decayed to a min_lr of 5.0e-5 (both stages are sketched below), with:

* `mlabonne/orpo-dpo-mix-40k`

* `jondurbin/py-dpo-v0.1`
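
As a rough illustration, here is a hedged sketch of this two-stage recipe using TRL's `SFTTrainer` and `DPOTrainer`. This is not the released training code: only the datasets, the 1.0e-4 to 5.0e-5 schedule, and 1 epoch per stage come from this card; the `cosine_with_min_lr` scheduler choice, the V1 checkpoint name, and all other arguments are assumptions.

```python
# Hedged sketch of the two-stage V2 recipe with TRL -- NOT the released
# training code. Only the datasets, the 1.0e-4 -> 5.0e-5 schedule, and
# 1 epoch per stage come from the card; everything else is assumed.
from datasets import concatenate_datasets, load_dataset
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

BASE = "fblgit/UNA-ThePitbull-21.4B-v1"  # assumed V1 starting checkpoint
tokenizer = AutoTokenizer.from_pretrained(BASE)

def schedule(output_dir, cfg_cls):
    # Assumed: cosine decay from the 1.0e-4 peak to a 5.0e-5 floor.
    return cfg_cls(
        output_dir=output_dir,
        num_train_epochs=1,
        learning_rate=1.0e-4,
        lr_scheduler_type="cosine_with_min_lr",
        lr_scheduler_kwargs={"min_lr": 5.0e-5},
        bf16=True,
    )

# Stage 1: SFT over V1 on the code_bagel/hermes mix.
sft = SFTTrainer(
    model=BASE,
    args=schedule("pitbull-sft", SFTConfig),
    train_dataset=load_dataset("Replete-AI/code_bagel_hermes-2.5", split="train"),
    processing_class=tokenizer,  # `tokenizer=` on older TRL versions
)
sft.train()

# Stage 2: DPO on the two preference sets. Naive concatenation assumes a
# shared prompt/chosen/rejected schema; normalize the columns first if the
# two datasets' formats differ.
prefs = concatenate_datasets([
    load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
        .select_columns(["prompt", "chosen", "rejected"]),
    load_dataset("jondurbin/py-dpo-v0.1", split="train")
        .select_columns(["prompt", "chosen", "rejected"]),
])
dpo = DPOTrainer(
    model=sft.model,
    args=schedule("pitbull-dpo", DPOConfig),
    train_dataset=prefs,
    processing_class=tokenizer,
)
dpo.train()
```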
|
|
|
# Evaluations |
|
## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__UNA-ThePitbull-21.4B-v2) |
|
|
|
| Metric |Value| |
|
|---------------------------------|----:| |
|
|Avg. |77.82| |
|
|AI2 Reasoning Challenge (25-Shot)|77.73| |
|
|HellaSwag (10-Shot) |91.79| |
|
|MMLU (5-Shot) |68.25| |
|
|TruthfulQA (0-shot) |78.24| |
|
|Winogrande (5-shot) |87.37| |
|
|GSM8k (5-shot) |63.53| |
|
|
|
These scores should only be compared against its non-UNA base model, the original luxia-21.4b, and against ThePitbull v1.
|
|
|
## UNA v2 (VLLM) Evaluations
|
``` |
|
vllm (pretrained=/data/tools/mergekit/una-thepitbull-v5,dtype=bfloat16,gpu_memory_utilization=0.8,max_model_len=2048,data_parallel_size=2,tensor_parallel_size=4), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8 |
|
| Tasks |Version| Filter |n-shot| Metric |Value | |Stderr| |
|
|--------------|------:|----------------|-----:|-----------|-----:|---|-----:| |
|
|gsm8k | 3|strict-match | 5|exact_match|0.7695|± |0.0116|+ |
|
| | |flexible-extract| 5|exact_match|0.7695|± |0.0116|+ |
|
|hellaswag | 1|none | 10|acc |0.8110|± |0.0039| |
|
| | |none | 10|acc_norm |0.9169|± |0.0028|+ |
|
|winogrande | 1|none | 5|acc |0.8777|± |0.0092|+ |
|
|mmlu |N/A |none | 0|acc |0.6427|± |0.0038|- |
|
|arc_challenge | 1|none | 25|acc |0.7713|± |0.0123| |
|
| | |none | 25|acc_norm |0.7875|± |0.0120|+ |
|
|truthfulqa_mc2| 2|none | 0|acc |0.7824|± |0.0135|- |
|
|mathqa | 1|none | 0|acc |0.4037|± | 0.009| |
|
| | |none | 0|acc_norm |0.4034|± | 0.009|+ |
|
|pubmedqa | 1|none | 0|acc |0.7260|± | 0.020|+ |
|
|boolq | 2|none | 0|acc |0.8602|± |0.0061|+ |
|
``` |
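
The tables in this section are raw output from EleutherAI's lm-evaluation-harness with the vLLM backend. Below is a hedged reproduction sketch via the harness's Python API; the task list mirrors the header line above, while the model path, `tensor_parallel_size`, and batch size are assumptions to adapt to your own hardware.

```python
# Hedged sketch: reproduce the v2 table above with lm-evaluation-harness
# on the vLLM backend. Model path, parallelism, and batch size are assumed.
from lm_eval import simple_evaluate
from lm_eval.utils import make_table

results = simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=fblgit/UNA-ThePitbull-21.4B-v2,dtype=bfloat16,"
        "gpu_memory_utilization=0.8,max_model_len=2048,tensor_parallel_size=4"
    ),
    tasks=[
        "gsm8k", "hellaswag", "winogrande", "mmlu", "arc_challenge",
        "truthfulqa_mc2", "mathqa", "pubmedqa", "boolq",
    ],
    batch_size=8,  # per the log header above
)
print(make_table(results))
```

Few-shot counts are left at each task's harness default, matching the `num_fewshot: None` in the log header.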
|
|
|
## UNA v1 (VLLM) Evaluations |
|
``` |
|
| Tasks |Version| Filter |n-shot| Metric |Value | |Stderr| |
|
|--------------|------:|----------------|-----:|-----------|-----:|---|-----:| |
|
|gsm8k | 3|strict-match | 5|exact_match|0.7566|± |0.0118| |
|
| | |flexible-extract| 5|exact_match|0.7582|± |0.0118| |
|
|hellaswag | 1|none | 10|acc |0.8168|± |0.0039| |
|
| | |none | 10|acc_norm |0.9188|± |0.0027| |
|
|winogrande | 1|none | 5|acc |0.8635|± |0.0097| |
|
|mmlu | N/A|none | 0|acc |0.6444|± |0.0038| |
|
|arc_challenge | 1|none | 25|acc |0.7747|± |0.0122| |
|
| | |none | 25|acc_norm |0.7850|± |0.0120| |
|
|truthfulqa_mc2| 2|none | 0|acc |0.7902|± |0.0134| |
|
|mathqa | 1|none | 0|acc |0.4030|± | 0.009| |
|
| | |none | 0|acc_norm |0.4034|± | 0.009| |
|
|pubmedqa | 1|none | 0|acc |0.6860|± |0.0208| |
|
|boolq | 2|none | 0|acc |0.8401|± |0.0064| |
|
``` |
|
|
|
## Original (VLLM) Evaluations |
|
``` |
|
| Tasks |Version| Filter |n-shot| Metric |Value | |Stderr| |
|
|--------------|------:|----------------|-----:|-----------|-----:|---|-----:| |
|
|gsm8k | 3|strict-match | 5|exact_match|0.7528|± |0.0119| |
|
| | |flexible-extract| 5|exact_match|0.7521|± |0.0119| |
|
|hellaswag | 1|none | 10|acc |0.8117|± |0.0039| |
|
| | |none | 10|acc_norm |0.9167|± |0.0028| |
|
|winogrande | 1|none | 5|acc |0.8682|± |0.0095| |
|
|mmlu | N/A|none | 0|acc |0.6448|± |0.0038| |
|
|arc_challenge | 1|none | 25|acc |0.7688|± |0.0123| |
|
| | |none | 25|acc_norm |0.7730|± |0.0122| |
|
|truthfulqa_mc2| 2|none | 0|acc |0.7895|± |0.0133| |
|
|mathqa | 1|none | 0|acc |0.4000|± | 0.009| |
|
| | |none | 0|acc_norm |0.4003|± | 0.009| |
|
|pubmedqa | 1|none | 0|acc |0.6680|± |0.0211| |
|
|boolq | 2|none | 0|acc |0.8346|± |0.0065| |
|
``` |
|
|
|
## Citations |
|
* mlabonne |
|
* jondurbin & Replete-AI |
|
* bartowski |
|
* saltlux |
|
|
|
If you use UNA models, don't forget to cite:
|
``` |
|
@misc{unathepitbull21b, |
|
title={ThePitbull: Uniform Neural Alignment}, |
|
author={Xavier Murias}, |
|
year={2024}, |
|
publisher = {Juanako.AI}, |
|
journal = {HuggingFace repository}, |
|
howpublished = {\url{https://huggingface.co/fblgit/UNA-ThePitbull-21.4-v1}}, |
|
} |
|
``` |
|
|
|
# [Open LLM Leaderboard v2 Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__UNA-ThePitbull-21.4B-v2) |
|
|
|
| Metric |Value| |
|
|-------------------|----:| |
|
|Avg. |22.60| |
|
|IFEval (0-Shot) |37.90| |
|
|BBH (3-Shot) |46.79| |
|
|MATH Lvl 5 (4-Shot)| 9.59| |
|
|GPQA (0-shot) | 6.94| |
|
|MuSR (0-shot) | 6.42| |
|
|MMLU-PRO (5-shot) |27.95| |
|
|
|
|