Independent evaluation results

#8
by yaronr - opened

Dear Qwen Coder team,

I'm pleased to share our independent evaluation of the model using our implementation of the MMLU-Pro benchmark.
We know that MMLU-Pro is probably not the best benchmark for coding, but it is our starting point for all models, so we decided to share the results with you anyway, in the hope that you find them useful.
