EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss Paper • 2402.05008 • Published Feb 7 • 19
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 124
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9 • 29
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published about 1 month ago • 23