| Model (variant) | Yes-or-No | What | How | Distortion | Other | In-context Distortion | In-context Other | Overall |
|---|---|---|---|---|---|---|---|---|
| InfiMM (Zephyr-7B) | 57.45 | 57.96 | 44.62 | 47.27 | 57.17 | 49.67 | 64.08 | 53.37 |
| Emu2-Chat (LLaMA-33B) | 71.81 | 67.25 | 56.18 | 64.78 | 63.19 | 63.48 | 72.24 | 65.28 |
| Fuyu-8B (Persimmon-8B) | 53.33 | 43.70 | 38.00 | 40.81 | 47.40 | 45.45 | 49.23 | 45.05 |
| BakLLava (Mistral-7B) | 66.00 | 56.16 | 51.12 | 51.15 | 61.57 | 53.72 | 72.00 | 57.48 |
| SPHINX | 74.18 | 68.81 | 62.07 | 63.62 | 71.76 | 66.12 | 76.33 | 68.56 |
| mPLUG-Owl2 (LLaMA-7B) | 72.18 | 57.96 | 56.19 | 56.68 | 69.21 | 53.29 | 72.65 | 61.61 |
| LLaVA-v1.5 (Vicuna-v1.5-7B) | 66.36 | 58.19 | 50.51 | 49.42 | 65.74 | 54.61 | 70.61 | 58.66 |
| LLaVA-v1.5 (Vicuna-v1.5-13B) | 65.27 | 64.38 | 56.59 | 56.03 | 67.13 | 61.18 | 67.35 | 62.14 |
| InternLM-XComposer-VL (InternLM) | 69.45 | 65.27 | 60.85 | 61.67 | 70.14 | 56.91 | 75.10 | 65.35 |
| IDEFICS-Instruct (LLaMA-7B) | 56.18 | 44.69 | 44.02 | 42.80 | 54.17 | 44.74 | 56.33 | 48.70 |
| Qwen-VL (QwenLM) | 63.09 | 58.19 | 56.39 | 50.58 | 62.73 | 57.89 | 73.88 | 59.40 |
| Shikra (Vicuna-7B) | 65.64 | 47.35 | 49.09 | 48.83 | 59.49 | 50.00 | 64.08 | 54.65 |
| Otter-v1 (MPT-7B) | 57.09 | 40.71 | 39.55 | 42.22 | 49.31 | 44.08 | 52.65 | 46.35 |
| InstructBLIP (Flan-T5-XL) | 67.64 | 59.96 | 55.98 | 56.23 | 65.51 | 58.22 | 69.39 | 61.47 |
| InstructBLIP (Vicuna-7B) | 71.64 | 52.65 | 43.81 | 48.64 | 62.50 | 55.59 | 64.90 | 56.72 |
| VisualGLM-6B (GLM-6B) | 60.18 | 54.20 | 46.25 | 51.75 | 54.40 | 53.62 | 57.14 | 53.78 |
| mPLUG-Owl (LLaMA-7B) | 66.00 | 54.87 | 44.02 | 51.36 | 55.09 | 54.28 | 65.71 | 55.38 |
| LLaMA-Adapter-V2 | 66.18 | 59.29 | 52.13 | 57.39 | 56.25 | 63.16 | 64.90 | 59.46 |
| LLaVA-v1 (Vicuna-13B) | 54.00 | 53.10 | 55.38 | 48.64 | 54.63 | 55.59 | 63.27 | 54.18 |
| MiniGPT-4 (Vicuna-13B) | 55.82 | 50.22 | 40.37 | 42.02 | 48.38 | 51.97 | 61.22 | 49.03 |
| Qwen-VL-Plus (Close-Source) | 73.77 | 69.47 | 53.88 | 66.21 | 65.72 | 63.81 | 68.75 | 66.04 |
| Qwen-VL-Max (Close-Source) | 75.60 | 79.43 | 66.09 | 73.39 | 74.08 | 71.00 | 76.92 | 73.63 |
| Gemini-Pro (Close-Source) | 68.80 | 73.74 | 62.34 | 66.30 | 71.34 | 63.91 | 73.09 | 68.16 |
| GPT-4V (Close-Source) | 76.85 | 79.17 | 67.52 | 73.53 | 76.18 | 72.83 | 76.47 | 74.51 |