Top performer on MT-Bench 🏆

xDAN-AI • GitHub Repo • Discord • Twitter • Huggingface

Model Description

Benchmark Results

xDAN-L1-Global-Chat, a 7B model, delivers results comparable to the GPT series, securing a top rank on MT-Bench with a score of 8.09 and surpassing the performance of several 70B models.

Top performer on MT-Bench

  1. Performance in Writing, Extraction, STEM, and Humanities: The xDAN-L1-Global-Chat model exhibits strong performance in writing, extraction, STEM, and humanities, achieving scores competitive with larger models. Its capabilities in creative and analytical tasks are evident, particularly in writing with a score of 9.55 and in humanities with a score of 9.90, demonstrating its adeptness in linguistic and contextual understanding.
  2. Response Quality in Initial Conversation Turns: In terms of conversational abilities, the model shows impressive performance in the first and second turns of a conversation, with scores of 8.33 and 7.85 respectively. This highlights its effectiveness in maintaining context and coherence, a key aspect of natural language processing.
  3. Reasoning Capabilities: The xDAN-L1-Global-Chat model scores 7.75 in reasoning, indicating a strong capability in logical processing and problem-solving. This reflects the model's proficiency in tasks that require critical thinking and analysis.


| Model | # Params | Average | 1st Turn | 2nd Turn | Writing | Roleplay | Reasoning | Math | Coding | Extraction | STEM | Humanities |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4-0613 | - | 8.89 | 9.16 | 8.63 | 9.80 | 8.95 | 8.75 | 6.65 | 8.40 | 9.55 | 9.20 | 9.85 |
| GPT-3.5-Turbo-0613 | - | 8.22 | 8.53 | 7.90 | 9.65 | 9.45 | 5.40 | 6.10 | 6.55 | 9.15 | 9.47 | 9.95 |
| GPT-3.5-Turbo-0301 | - | 8.11 | 8.21 | 8.01 | 9.55 | 8.40 | 6.05 | 6.95 | 7.20 | 8.85 | 8.32 | 9.55 |
| xDAN-L1-Chat-v0.1 | 7B | 8.09 | 8.33 | 7.85 | 9.55 | 8.30 | 7.75 | 4.90 | 6.65 | 8.85 | 8.82 | 9.90 |
| WizardLM 70B V1.0 | 70B | 7.68 | 7.91 | 7.46 | 9.35 | 8.97 | 5.50 | 4.05 | 5.60 | 8.50 | 9.55 | 9.95 |
| Stable Beluga 2 | 70B | 7.42 | 7.66 | 7.17 | 9.30 | 8.00 | 5.50 | 4.55 | 5.00 | 8.05 | 9.05 | 9.90 |
| OpenAssistant Llama-2 70B V10 | 70B | 7.13 | 7.76 | 6.50 | 7.85 | 7.35 | 5.80 | 4.30 | 4.90 | 8.05 | 9.05 | 9.75 |
| LLaMA-2 70B Chat | 70B | 7.10 | 7.38 | 6.83 | 8.72 | 8.15 | 6.35 | 3.70 | 3.85 | 7.80 | 8.85 | 9.38 |

Note: MT-Bench measures conversational capabilities. It evaluates an LLM on a set of 80 conversations with two turns each; the 160 model outputs are then rated by GPT-4. The evaluation method should be the same as for the LMSys Leaderboard, so the results should be comparable. Note, however, that due to differences in prompt templates and the low number of samples, the numbers may differ slightly.
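As a quick sanity check on the table above, the overall MT-Bench score is simply the mean of the two per-turn averages (a sketch; the per-turn figures are taken from the table):

```python
# Per-turn averages for xDAN-L1-Chat-v0.1, as reported in the table above.
turn1_avg = 8.33
turn2_avg = 7.85

# The overall MT-Bench score is the mean of the two turn averages.
overall = (turn1_avg + turn2_avg) / 2
print(f"{overall:.2f}")  # prints 8.09, matching the reported score
```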

  1. Performance Comparison with Larger Models: The xDAN-L1-Global-Chat, with a 7B parameter size, demonstrates competitive performance when compared to models with 70B parameters. This is particularly evident in its performance on MT-Bench, where it scores 8.09, indicating efficiency and optimized functioning despite its smaller size.
  2. Benchmarking Against Industry Standards: The model shows comparable capabilities to established models such as GPT-4-0613 in various metrics. This positions xDAN-L1-Global-Chat as a significant player in the field of language models, holding its own in key areas of performance.

Evaluation by HuggingFaceH4 Open LLM Leaderboard

Here are the results on the metrics used by the HuggingFaceH4 Open LLM Leaderboard:

| Task | Metric | Value |
|---|---|---|
| Total Average | - | 0.6583 |
| arc_challenge | acc_norm | 0.6374 |
| hellaswag | acc_norm | 0.8453 |
| mmlu | acc_norm | 0.6290 |
| truthfulqa_mc | mc2 | 0.5213 |
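The total average is the unweighted mean of the four task scores, which can be verified directly (a sketch using the values from the table):

```python
# Per-task scores from the Open LLM Leaderboard table above.
scores = {
    "arc_challenge": 0.6374,
    "hellaswag": 0.8453,
    "mmlu": 0.6290,
    "truthfulqa_mc": 0.5213,
}

# The "Total Average" row is the unweighted mean of the four metrics.
avg = sum(scores.values()) / len(scores)
print(f"{avg:.4f}")  # close to the reported 0.6583
```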

Usage

Feel free to apply for access to download the model. It uses the Alpaca prompt template. For more details, visit https://www.xdan.ai
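Since the card does not spell out the template, here is a minimal sketch of the standard Alpaca prompt format; the exact system text xDAN trained with is an assumption, so verify against the repository before relying on it:

```python
# Sketch of the standard Alpaca prompt template (assumed; confirm the exact
# wording used for xDAN-L1-Chat-v0.1 against the official repository).
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Format a request using the common Alpaca template."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

prompt = alpaca_prompt("Summarize the benefits of smaller language models.")
```

The resulting string can then be passed to a standard `transformers` text-generation pipeline loaded with the `xDAN-AI/xDAN-L1-Chat-v0.1` checkpoint.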

Disclaimer

We employ data-compliance checking algorithms during the training of our language model to strive for the highest degree of compliance. However, given the intricate nature of data and the vast array of potential usage scenarios, we cannot guarantee that the model will always generate correct and reasonable outputs. Users should be aware of the risk of problematic outputs. Our organization will not bear responsibility for risks or issues arising from misuse, misguidance, or illegal use of the model, nor for any resulting misinformation or data security concerns.

About xDAN-AI

xDAN-AI is a leading developer of high-performance language models. For detailed information and further insights into our technology and offerings, please visit our website: www.xdan.ai.
