Eval results
It's interesting to know how this model performs compared to others in terms of CoT and world-knowledge use (mainly due to the expanded FF layers).
BBH:

| Model | BBH |
|---|---|
| chargoddard/llama2-22b | 37.48 |
| vicuna-13B v1.3 | 35.78 |
| WizardLM-13B-V1.1 | 39.59 |
| llama-v1-13b | 36.52 |
Still running MMLU, but the subtask scores so far do seem similar to llama-v2-13b.
Updated MMLU scores:
| Model | MMLU |
|---|---|
| WizardLM-13B-V1.1 | 49.95 |
| vicuna-13B v1.3 | 52.1 |
| llama-v1-13b | 46.2 |
| chargoddard/llama2-22b | 53.60 |
| llama-v2-13b | 55.75 |
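In case anyone wants to reproduce or extend these comparisons, here's a rough sketch using EleutherAI's lm-evaluation-harness Python API. The task name, few-shot count, batch size, and result-dict layout below are assumptions for illustration (they vary across harness versions) and aren't necessarily the exact settings behind the numbers above:

```python
# Rough sketch: scoring a model on MMLU with lm-evaluation-harness.
# Task name, few-shot count, batch size, and result-dict layout are
# illustrative assumptions; adjust for your harness version.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",  # Hugging Face causal-LM backend
    model_args="pretrained=chargoddard/llama2-22b,dtype=bfloat16",
    tasks=["mmlu"],  # MMLU task group; BBH lives under a separate task name
    num_fewshot=5,   # MMLU is conventionally reported 5-shot
    batch_size=8,
)

# Per-subtask scores and the aggregate land under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```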
Thanks for running these! It's great to have actual benchmark scores. I'd call this a win - the fact that the score is only slightly below llama-v2-13b's is very promising, given that the amount of rehabilitation training done to this model was fairly minimal. I'm hopeful that it will shine with some actual training.