Q-Bench-Leaderboard / qbench_a1_pair_test.csv
Model (variant),Yes-or-No,What,How,Distortion,Other,Compare,Joint,Overall
InfiMM (Zephyr-7B),54.21,43.38,45.32,49.57,45.67,48.32,48.88,48.44
Emu2-Chat (LLaMA-33B),51.94,29.78,53.84,42.01,55.71,46.26,49.09,47.08
Fuyu-8B (Persimmon-8B),70.36,28.13,35.98,44.08,57.43,47.02,51.11,47.94
BakLLava (Mistral-7B),60.09,45.42,50.86,53.09,58.82,54.52,55.55,52.75
mPLUG-Owl2 (Q-Instruct),60.24,47.46,48.78,52.81,53.97,51.42,59.11,53.15
mPLUG-Owl2 (LLaMA-7B),58.07,36.61,48.44,47.74,51.90,45.73,60.00,48.94
LLaVA-v1.5 (Vicuna-v1.5-7B),60.72,42.37,50.17,49.15,59.86,52.97,49.77,52.25
LLaVA-v1.5 (Vicuna-v1.5-13B),57.34,47.45,49.13,49.01,59.51,52.06,52.00,52.05
Qwen-VL-Plus (Close-Source),66.85,55.79,59.91,62.46,58.77,62.17,59.20,61.48
Qwen-VL-Max (Close-Source),67.65,67.56,65.35,69.09,61.18,68.65,61.29,66.99
BlueImage-GPT (Close-Source),88.43,80.33,79.58,84.64,80.62,84.62,79.55,83.48
Gemini-Pro (Close-Source),65.78,56.61,56.74,60.42,60.55,60.46,60.44,60.46
GPT-4V (Close-Source),79.75,69.49,84.42,77.32,79.93,81.00,68.00,78.07
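The table above can be parsed and ranked programmatically. A minimal sketch using Python's standard `csv` module; a few rows of the file are embedded inline so the example is self-contained, but in practice you would open `qbench_a1_pair_test.csv` from disk (the `load_leaderboard` helper is illustrative, not part of any released tooling):

```python
import csv
import io

# A few rows copied from qbench_a1_pair_test.csv, embedded inline
# so this example runs without the file on disk.
CSV_TEXT = """Model (variant),Yes-or-No,What,How,Distortion,Other,Compare,Joint,Overall
BlueImage-GPT (Close-Source),88.43,80.33,79.58,84.64,80.62,84.62,79.55,83.48
GPT-4V (Close-Source),79.75,69.49,84.42,77.32,79.93,81,68,78.07
Qwen-VL-Max (Close-Source),67.65,67.56,65.35,69.09,61.18,68.65,61.29,66.99
LLaVA-v1.5 (Vicuna-v1.5-7B),60.72,42.37,50.17,49.15,59.86,52.97,49.77,52.25
"""

def load_leaderboard(text):
    """Parse the leaderboard CSV and return rows sorted by Overall, descending."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        # Every column except the model name is a percentage stored as a string.
        for key in row:
            if key != "Model (variant)":
                row[key] = float(row[key])
    return sorted(rows, key=lambda r: r["Overall"], reverse=True)

if __name__ == "__main__":
    for rank, row in enumerate(load_leaderboard(CSV_TEXT), start=1):
        print(f"{rank}. {row['Model (variant)']}: {row['Overall']:.2f}")
```

`csv.DictReader` keys each row by the header line, so the per-question-type columns (Yes-or-No, What, How, Distortion, Other, Compare, Joint) stay accessible by name if you want to rank on a different axis than Overall.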