Collections
Collections including paper arxiv:2310.11441
-
ImageBind: One Embedding Space To Bind Them All
Paper • 2305.05665
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth
Paper • 2302.12288
HuggingFaceM4/howto100m
Dataset
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Paper • 2201.12086
-
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Paper • 2310.16045
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Paper • 2310.14566
SILC: Improving Vision Language Pretraining with Self-Distillation
Paper • 2310.13355
Conditional Diffusion Distillation
Paper • 2310.01407
-
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper • 2310.11511
In-Context Learning Creates Task Vectors
Paper • 2310.15916
Matryoshka Diffusion Models
Paper • 2310.15111
-
When can transformers reason with abstract symbols?
Paper • 2310.09753
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Paper • 2310.09520
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper • 2309.08532
-
DreamLLM: Synergistic Multimodal Comprehension and Creation
Paper • 2309.11499
FoleyGen: Visually-Guided Audio Generation
Paper • 2309.10537
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093
-
Compositional Foundation Models for Hierarchical Planning
Paper • 2309.08587
DreamLLM: Synergistic Multimodal Comprehension and Creation
Paper • 2309.11499
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
Paper • 2309.15091
Context-Aware Meta-Learning
Paper • 2310.10971
-
OmnimatteRF: Robust Omnimatte with 3D Background Modeling
Paper • 2309.07749
AudioSR: Versatile Audio Super-resolution at Scale
Paper • 2309.07314
Generative Image Dynamics
Paper • 2309.07906
MagiCapture: High-Resolution Multi-Concept Portrait Customization
Paper • 2309.06895