Collections
Collections including paper arxiv:2312.14233
- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 28
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 27
- SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
  Paper • 2401.00604 • Published • 4
- LARP: Language-Agent Role Play for Open-World Games
  Paper • 2312.17653 • Published • 30

- Generative Multimodal Models are In-Context Learners
  Paper • 2312.13286 • Published • 34
- StarVector: Generating Scalable Vector Graphics Code from Images
  Paper • 2312.11556 • Published • 27
- Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
  Paper • 2311.08046 • Published • 1
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models
  Paper • 2312.14233 • Published • 15

- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 39
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 14
- The Consensus Game: Language Model Generation via Equilibrium Search
  Paper • 2310.09139 • Published • 12
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 24