Model is even worse than InternVL-1.5 for multiple images
#6
Opened by pseudotensor
For example, send three images: one of a birthday cake, one of a receipt, and one of Big Ben. When asked "What tower do you see?", the model answers: "The image does not show any tower. It shows a receipt from a shopping store and a cake with a congratulatory message."
Thank you for your feedback. Recently, I have been working on a quantitative assessment of the model's multi-image performance.
czczup changed the discussion status to closed