Top performer on MT-Bench
xDAN-AI • GitHub Repo • Discord • Twitter • Huggingface
Model Description
Benchmark Results
xDAN-L1-Global-Chat, a 7B model, delivers results comparable to GPT-3.5-Turbo, scoring an impressive 8.09 on MT-Bench and surpassing every 70B model in the comparison below.
Top open-source performer on MT-Bench
- Performance in Writing, Extraction, STEM, and Humanities: The xDAN-L1-Global-Chat model exhibits strong performance in writing, extraction, STEM, and humanities, achieving scores competitive with larger models. Its capabilities in creative and analytical tasks are evident, particularly in writing with a score of 9.55 and in humanities with a score of 9.90, demonstrating its adeptness in linguistic and contextual understanding.
- Response Quality in Initial Conversation Turns: In terms of conversational abilities, the model shows impressive performance in the first and second turns of a conversation, with scores of 8.33 and 7.85 respectively. This highlights its effectiveness in maintaining context and coherence, a key aspect of natural language processing.
- Reasoning Capabilities: The xDAN-L1-Global-Chat model scores 7.75 in reasoning, indicating a strong capability in logical processing and problem-solving. This reflects the model's proficiency in tasks that require critical thinking and analysis.
| Model | # Params | Average | 1st turn | 2nd turn | Writing | Roleplay | Reasoning | Math | Coding | Extraction | STEM | Humanities |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4-0613 | - | 8.89 | 9.16 | 8.63 | 9.80 | 8.95 | 8.75 | 6.65 | 8.40 | 9.55 | 9.20 | 9.85 |
| GPT-3.5-Turbo-0613 | - | 8.22 | 8.53 | 7.90 | 9.65 | 9.45 | 5.40 | 6.10 | 6.55 | 9.15 | 9.47 | 9.95 |
| GPT-3.5-Turbo-0301 | - | 8.11 | 8.21 | 8.01 | 9.55 | 8.40 | 6.05 | 6.95 | 7.20 | 8.85 | 8.32 | 9.55 |
| xDAN-L1-Chat-v0.1 | 7B | 8.09 | 8.33 | 7.85 | 9.55 | 8.30 | 7.75 | 4.90 | 6.65 | 8.85 | 8.82 | 9.90 |
| WizardLM 70B V1.0 | 70B | 7.68 | 7.91 | 7.46 | 9.35 | 8.97 | 5.50 | 4.05 | 5.60 | 8.50 | 9.55 | 9.95 |
| Stable Beluga 2 | 70B | 7.42 | 7.66 | 7.17 | 9.30 | 8.00 | 5.50 | 4.55 | 5.00 | 8.05 | 9.05 | 9.90 |
| OpenAssistant Llama-2 70B V10 | 70B | 7.13 | 7.76 | 6.50 | 7.85 | 7.35 | 5.80 | 4.30 | 4.90 | 8.05 | 9.05 | 9.75 |
| LLaMA-2 70B Chat | 70B | 7.10 | 7.38 | 6.83 | 8.72 | 8.15 | 6.35 | 3.70 | 3.85 | 7.80 | 8.85 | 9.38 |
Note: MT-Bench measures conversational capabilities. It evaluates an LLM on a set of 80 conversations with two turns each; the 160 model outputs are then rated by GPT-4. The evaluation method should be the same as for the LMSys Leaderboard, so the results should be comparable. Note, however, that due to differences in model prompt templates and the low number of samples, the numbers may differ slightly.
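The aggregation described in the note can be sketched as follows (a simplified illustration, not the official FastChat implementation):

```python
# Hedged sketch of MT-Bench-style score aggregation: each of 80 two-turn
# conversations yields two GPT-4 judge ratings on a 1-10 scale, and the
# headline score is the mean over all 160 ratings.
from statistics import mean

def mt_bench_average(turn1_scores, turn2_scores):
    """Overall score = mean of all judge ratings across both turns."""
    assert len(turn1_scores) == len(turn2_scores)  # same questions per turn
    return mean(turn1_scores + turn2_scores)

# Because both turns cover the same number of questions, the overall average
# equals the mean of the per-turn averages, e.g. for the xDAN row above:
# (8.33 + 7.85) / 2 = 8.09.
```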
- Performance Comparison with Larger Models: The xDAN-L1-Global-Chat, with a 7B parameter size, demonstrates competitive performance when compared to models with 70B parameters. This is particularly evident in its performance on MT-Bench, where it scores 8.09, indicating efficiency and optimized functioning despite its smaller size.
- Benchmarking Against Industry Standards: The model shows comparable capabilities to established models such as GPT-4-0613 in various metrics. This positions xDAN-L1-Global-Chat as a significant player in the field of language models, holding its own in key areas of performance.
Evaluation by HuggingFaceH4 Open LLM Leaderboard
Here are the results on the metrics used by the HuggingFaceH4 Open LLM Leaderboard:
| Task | Metric | Value |
|---|---|---|
| Total Average | - | 0.6583 |
| arc_challenge | acc_norm | 0.6374 |
| hellaswag | acc_norm | 0.8453 |
| mmlu | acc_norm | 0.6290 |
| truthfulqa_mc | mc2 | 0.5213 |
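Under the Open LLM Leaderboard's usual convention, the Total Average is the unweighted mean of the four task scores; a quick check against the values in the table above:

```python
# Quick check that the Total Average row is the unweighted mean of the
# four task scores listed above (the Open LLM Leaderboard convention).
scores = {
    "arc_challenge": 0.6374,
    "hellaswag": 0.8453,
    "mmlu": 0.6290,
    "truthfulqa_mc": 0.5213,
}
average = sum(scores.values()) / len(scores)
print(f"{average:.4f}")  # rounds to the reported Total Average
```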
Usage
You are welcome to apply to download the model; it uses the Alpaca prompt template. For more details, visit https://www.xdan.ai.
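A minimal sketch of formatting a prompt in the standard Alpaca style. The exact template this checkpoint expects, and the repository id in the comments, are assumptions; check the model repository's documentation or tokenizer config for the authoritative format:

```python
# Standard Alpaca prompt format (assumed; verify against the model repo).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca template."""
    return ALPACA_TEMPLATE.format(instruction=instruction)

prompt = build_prompt("Summarize the MT-Bench results in one sentence.")

# The prompt can then be passed to the model, e.g. with Hugging Face
# transformers (repo id below is a placeholder, not confirmed by the card):
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("xDAN-AI/...")
#   model = AutoModelForCausalLM.from_pretrained("xDAN-AI/...")
#   out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=256)
```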
Disclaimer
We employ data compliance checking algorithms during the training of our language model to strive for the highest degree of compliance. However, given the intricate nature of data and the vast array of potential usage scenarios, we cannot guarantee that the model will always generate correct and reasonable outputs. Users should be aware of the risk of problematic outputs. Our organization will not bear responsibility for any risks or issues stemming from misuse, misguidance, or illegal use of the model, from any resulting misinformation, or from any consequent data security concerns.
About xDAN-AI
xDAN-AI is a leading developer of high-performance models. For detailed information and further insights into our cutting-edge technology and offerings, please visit our website: www.xdan.ai.