## Fresh Alpasta, done Al Dente!
It's da *logical* choice! Now with personality-emulation quality similar to [GPT4-X-Alpasta-30b](https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b)!
## Model Info:
ChanSung's [Alpaca-LoRA-30B-elina](https://huggingface.co/LLMs/Alpaca-LoRA-30B-elina) merged with [Open Assistant's second finetune](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor).
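Since both parents of this merge are Alpaca-style instruction-tuned models, prompts in the standard Alpaca template are a reasonable starting point. A minimal sketch (the helper name and template wording are illustrative, not something the model authors specify):

```python
def format_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build an Alpaca-style prompt string (illustrative template)."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

# The resulting string is what you would feed to the tokenizer/generate call.
prompt = format_alpaca_prompt("Summarize the plot of Hamlet in one sentence.")
```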
## Benchmarks:
Perplexity (lower is better):

| Dataset   | Full model         | [4bit](https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b/blob/main/4bit.safetensors) |
|-----------|--------------------|--------------------|
| Wikitext2 | 4.662261962890625  | 5.016242980957031  |
| PTB       | 24.547462463378906 | 25.576189041137695 |
| C4        | 7.05504846572876   | 7.332120418548584  |
~ Thanks to [askmyteapot](https://huggingface.co/askmyteapot) for performing these benchmarks!
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Aeala__GPT4-x-AlpacaDente2-30b).

| Metric | Value |
|-----------------------|---------------------------|
| Avg. | 57.05 |
| ARC (25-shot) | 60.58 |
| HellaSwag (10-shot) | 81.81 |
| MMLU (5-shot) | 56.63 |
| TruthfulQA (0-shot) | 48.38 |
| Winogrande (5-shot) | 78.14 |
| GSM8K (5-shot) | 26.76 |
| DROP (3-shot) | 47.06 |
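The reported average is simply the arithmetic mean of the seven task scores, which can be checked directly:

```python
# Leaderboard scores from the table above.
scores = {
    "ARC (25-shot)": 60.58,
    "HellaSwag (10-shot)": 81.81,
    "MMLU (5-shot)": 56.63,
    "TruthfulQA (0-shot)": 48.38,
    "Winogrande (5-shot)": 78.14,
    "GSM8K (5-shot)": 26.76,
    "DROP (3-shot)": 47.06,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 57.05
```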