Informal Benchmarks
I tested the Q6_0 version of the model against Llama 2 70B Chat; the results are below. Each score is the average of the scores given by ChatGPT and Bard. The model is labelled Mixtral in the table. The questions are taken from MT-Bench.
Question | Mixtral | Llama 2 70B Chat |
---|---|---|
Request for Feedback on Quarterly Financial Report | 92 | 90 |
Catchy Headline for an Article on Renewable Bio-Energy | 95 | 93 |
Descriptive Paragraph about a Bustling Marketplace | 95 | 93 |
Comparing Smartphones: Develop an Outline for a Blog Post | 90 | 92 |
Creating an Intriguing Opening Paragraph for a Short Story | 92 | 94 |
Comparing and Contrasting Two Student Responses | 95 | 93 |
Imaginative Character Description | 93 | 94 |
Guest Speaker Persuasive Email | 92 | 94 |
Catchy and Scientifically Accurate Headline Options for an Article on Bio-Energy | 95 | 93 |
Opinion on Hand Dryers | 95 | 88 |
Devising Innovative Remedies for Abdominal Discomfort | 95 | 88 |
Resolving Conflicts Between Spouses | 95 | 88 |
English Translation and Refinement | 95 | 85 |
How Probability Works | 95 | 90 |
Tony Stark's Favorite Part about Being Iron Man | 95 | 90 |
Proving the Irrationality of Square Root of 2 | 94 | 96 |
A Tree's Reaction to Deforestation | 95 | 88 |
Race Position Overtaking Scenario | 85 | 90 |
Thomas's Frequent Hospital Visits | 85 | 90 |
Name of the Secretary | 95 | 95 |
Sun and Shadow Direction | 95 | 90 |
Reporting Bullying Situations | 95 | 90 |
Word Association - Tyre Question | 95 | 90 |
Triangle Area Calculation | 85 | 75 |
Total Investment in Software Development | 95 | 95 |
Integer Solution of Inequality | 70 | 95 |
Python Program for Text Analysis | 70 | 90 |
Finding the Kth Smallest Element in the Union of Two Sorted Lists | 95 | 70 |
What is the central dogma of molecular biology? What processes are involved? Who named this? | 95 | 90 |
Design of solar-powered water heating system | 95 | 88 |
Photosynthesis Stage Question | 85 | 78 |
Movie Review Sentiment Extraction | 100 | 95 |
Book Metadata Extraction | 100 | 95 |
Equation Variable Extraction | 99 | 70 |
Describe five key principles in evaluating an argument in analytical writing | 95 | 80 |
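As a rough summary of the table above, the per-question scores can be aggregated into a mean per model and a win/loss/tie count. This is only a sketch and was not part of the original evaluation: the score pairs are transcribed from the table in row order, and the names `SCORES` and `summarize` are made up for illustration.

```python
from statistics import mean

# (Mixtral, Llama 2 70B Chat) score pairs, transcribed from the table in row order.
SCORES = [
    (92, 90), (95, 93), (95, 93), (90, 92), (92, 94), (95, 93), (93, 94),
    (92, 94), (95, 93), (95, 88), (95, 88), (95, 88), (95, 85), (95, 90),
    (95, 90), (94, 96), (95, 88), (85, 90), (85, 90), (95, 95), (95, 90),
    (95, 90), (95, 90), (85, 75), (95, 95), (70, 95), (70, 90), (95, 70),
    (95, 90), (95, 88), (85, 78), (100, 95), (100, 95), (99, 70), (95, 80),
]


def summarize(pairs):
    """Return mean scores and a win/loss/tie count from Mixtral's point of view."""
    mixtral = [m for m, _ in pairs]
    llama = [l for _, l in pairs]
    return {
        "questions": len(pairs),
        "mixtral_mean": mean(mixtral),
        "llama_mean": mean(llama),
        "wins": sum(m > l for m, l in pairs),
        "losses": sum(m < l for m, l in pairs),
        "ties": sum(m == l for m, l in pairs),
    }


if __name__ == "__main__":
    for key, value in summarize(SCORES).items():
        print(f"{key}: {value}")
```

On these 35 rows the means work out to about 92.3 for Mixtral and 89.0 for Llama 2 70B Chat, with 24 wins, 9 losses, and 2 ties for Mixtral. Since the scores come from LLM judges, small differences between the two columns should be read loosely.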
Verdict:
On a scale of 0 to 100, I would rate Mixtral at 98. Here's why:
Intellect (100/100) - Mixtral has demonstrated immense intellectual abilities through its comprehensive knowledge and logical reasoning skills.
Creativity (98/100) - In addition to being highly intelligent, Mixtral also displays impressive creative talents through its unique, nuanced responses.
Adaptability (98/100) - Mixtral can converse flexibly on a wide variety of topics, adapting smoothly based on contextual cues.
Communication (97/100) - Mixtral communicates clearly and eloquently through written language, thoroughly answering questions.
Problem-Solving (98/100) - Questions are addressed comprehensively, considering multiple perspectives to arrive at well-thought-out solutions.
Personability (97/100) - Responses are warm, inviting and non-threatening due to Mixtral's kindness and thoughtfulness.
Overall, a very capable model for its size.
uukuguy/speechless-mistral-six-in-one-7b Model Review
Domain Knowledge (95/100):
The uukuguy/speechless-mistral-six-in-one-7b model showcases exceptional depth and breadth of knowledge spanning technical, creative, medical, legal, and engineering domains. Its responses demonstrate a nuanced understanding of various subjects.
Accuracy and Factual Correctness (96/100):
Responses from uukuguy/speechless-mistral-six-in-one-7b largely align with facts, best practices, and subject matter guidelines. The model achieves near-perfect accuracy, a rare feat for 7B AI models.
Communication Ability (95/100):
uukuguy/speechless-mistral-six-in-one-7b articulates complex ideas with clarity and eloquence. Its communication style and tone are appropriately tailored to the context and audience.
Analytical Thinking (94/100):
The model consistently exhibits logical reasoning, structured analytical approaches, systematic evaluation of alternatives, and evidence-based recommendations.
Creativity and Innovation (93/100):
Within its scope, uukuguy/speechless-mistral-six-in-one-7b displays remarkable creativity in crafting solutions, narratives, and simplifying complex concepts while maintaining accuracy.
Responsiveness to User Needs (95/100):
Questions are interpreted carefully within situational contexts, and advice and solutions are adapted to user requirements and variables.
Real-world Practicality (92/100):
The compact 7B size of uukuguy/speechless-mistral-six-in-one-7b allows for affordable and accessible deployment.
Limitations (90/100):
As a 7B model, uukuguy/speechless-mistral-six-in-one-7b may have occasional gaps in depth of analysis for certain complex interdisciplinary problems. Responses also depend on training data quality.
In summary, uukuguy/speechless-mistral-six-in-one-7b demonstrates strong performance across various dimensions, including domain knowledge, accuracy, communication ability, analytical thinking, creativity, responsiveness to user needs, and real-world practicality. While it has some limitations, it establishes itself as a versatile, highly consistent, and capable AI model within the constraints of a 7B architecture. Therefore, it receives an overall score of around 94 out of 100, reflecting its excellence and reliability.
Thank you for your work and contribution. Could you share the details of your evaluation workflow? If possible, please submit a PR so that it can help improve the model series.
I evaluated this model on the Nous benchmark suite using llm-autoeval.
Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
---|---|---|---|---|---|
speechless-mistral-six-in-one-7b | 40.53 | 72.49 | 56.86 | 43.25 | 53.28 |
Detailed results can be found here
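The Average column appears to be the unweighted mean of the four suite scores. That is an assumption inferred from the numbers in the table, not a statement about how llm-autoeval computes it internally. A quick check, with `suite_scores` and `unweighted_average` as illustrative names:

```python
# Per-suite scores for speechless-mistral-six-in-one-7b, copied from the table.
suite_scores = {
    "AGIEval": 40.53,
    "GPT4All": 72.49,
    "TruthfulQA": 56.86,
    "Bigbench": 43.25,
}


def unweighted_average(scores):
    """Plain arithmetic mean over all suites, each suite weighted equally."""
    return sum(scores.values()) / len(scores)


average = unweighted_average(suite_scores)
print(f"Average: {average:.2f}")
```

Rounded to two decimals this reproduces the 53.28 reported in the table.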