-
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Paper • 2407.07053 • Published • 41 -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 33 -
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Paper • 2407.11691 • Published • 13 -
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 60
Collections
Discover the best community collections!
Collections including paper arxiv:2407.12772
-
E5-V: Universal Embeddings with Multimodal Large Language Models
Paper • 2407.12580 • Published • 39 -
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models
Paper • 2407.12772 • Published • 33 -
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
Paper • 2408.02718 • Published • 60
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 12 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 85 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 30
-
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 124 -
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 50 -
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Paper • 2402.03766 • Published • 12 -
LLM Agent Operating System
Paper • 2403.16971 • Published • 65
-
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
Paper • 2401.12208 • Published • 21 -
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
Paper • 2402.05195 • Published • 18 -
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
Paper • 2402.10896 • Published • 14 -
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
Paper • 2402.11690 • Published • 7