Model is even worse than InternVL-1.5 for multiple images

by pseudotensor - opened Jul 12

Jul 12

E.g. send 3 images: one of a birthday cake, one of a receipt, and one of Big Ben. If asked "What tower do you see?", one gets: "The image does not show any tower. It shows a receipt from a shopping store and a cake with a congratulatory message."

czczup

OpenGVLab org Jul 16

Thank you for your feedback. Recently, I have been working on a quantitative assessment of the model's multi-image performance.

czczup changed discussion status to closed Aug 9
