MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models Paper • 2410.17637 • Published 29 days ago • 34
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published 30 days ago • 43
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published Oct 17 • 53
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published about 1 month ago • 58
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree Paper • 2410.16268 • Published about 1 month ago • 65
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs Paper • 2410.12405 • Published Oct 16 • 13
POINTS: Improving Your Vision-language Model with Affordable Strategies Paper • 2409.04828 • Published Sep 7 • 22
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Paper • 2408.03361 • Published Aug 6 • 85
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models Paper • 2407.11691 • Published Jul 16 • 13
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 92
InternVL 2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 16 items • Updated 1 day ago • 76
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770 • Published Jun 25 • 18
AI Paper of the Day Collection A collection of papers that I think are interesting, one added each day • 222 items • Updated 1 day ago • 28
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published Jun 20 • 32
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs Paper • 2406.14544 • Published Jun 20 • 34
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published Jun 17 • 61
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6 • 72