HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning. Paper • arXiv:2407.15680 • Published Jul 22, 2024
Building and better understanding vision-language models: insights and future directions. Paper • arXiv:2408.12637 • Published Aug 22, 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models. Paper • arXiv:2407.07895 • Published Jul 10, 2024
🎭 Avatars Collection: The latest AI-powered technologies usher in a new era of realistic avatars! 🚀 • 69 items • Updated Oct 21
FeatUp: A Model-Agnostic Framework for Features at Any Resolution. Paper • arXiv:2403.10516 • Published Mar 15, 2024
Matryoshka Embedding Models Collection: https://huggingface.co/blog/matryoshka • 14 items • Updated Jun 4
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts. Paper • arXiv:2402.13220 • Published Feb 20, 2024
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models. Paper • arXiv:2402.01118 • Published Feb 2, 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction. Paper • arXiv:2401.01952 • Published Jan 3, 2024
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding. Paper • arXiv:2312.04461 • Published Dec 7, 2023
Describing Differences in Image Sets with Natural Language. Paper • arXiv:2312.02974 • Published Dec 5, 2023
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models. Paper • arXiv:2311.12092 • Published Nov 20, 2023