🔥 New multimodal leaderboard on the hub: ConTextual!
Many situations require models to parse images containing text: maps, web pages, real-world pictures, memes, ... 🖼️
So how do you evaluate performance on this task?
The ConTextual team introduced a brand new dataset of instructions and images to test the reasoning capabilities of LMMs (large multimodal models), along with an associated leaderboard (backed by a private test set).
This is super exciting imo because it has the potential to be a good benchmark both for multimodal models and for assistants' vision capabilities, thanks to the instructions in the dataset.
Congrats to @rohan598 , @hbXNov , @kaiweichang and @violetpeng !!
Learn more in the blog: https://huggingface.co/blog/leaderboard-contextual
Leaderboard: ucla-contextual/contextual_leaderboard
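If you want to poke at the samples yourself, here's a minimal sketch using 🤗 datasets — the repo id, split, and field names below are my assumptions, so check the leaderboard page for the exact ones:

```python
# Minimal sketch: load the public ConTextual split and inspect one sample.
# NOTE: repo id, split name, and column names are assumptions — verify them
# on the leaderboard / dataset page before running.
from datasets import load_dataset

ds = load_dataset("ucla-contextual/contextual_val", split="validation")

sample = ds[0]
print(sample["instruction"])  # the text instruction paired with the image (field name assumed)
sample["image"].show()        # the image to reason over, as a PIL image (field name assumed)
```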