Collections
Discover the best community collections!
Collections including paper arxiv:2410.02073
-
EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis
Paper • 2410.01804 • Published • 5 -
DreamCinema: Cinematic Transfer with Free Camera and 3D Character
Paper • 2408.12601 • Published • 28 -
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
Paper • 2408.14819 • Published • 19 -
DressRecon: Freeform 4D Human Reconstruction from Monocular Video
Paper • 2409.20563 • Published • 7
-
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Paper • 2410.00201 • Published -
Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems
Paper • 2409.19804 • Published -
Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling
Paper • 2409.15156 • Published -
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue
Paper • 2409.04927 • Published
-
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Paper • 2405.20340 • Published • 19 -
Spectrally Pruned Gaussian Fields with Neural Compensation
Paper • 2405.00676 • Published • 8 -
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Paper • 2404.18212 • Published • 27 -
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 118
-
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Paper • 2311.17049 • Published -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 13 -
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Paper • 2303.17376 • Published -
Sigmoid Loss for Language Image Pre-Training
Paper • 2303.15343 • Published • 4
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 7 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 25 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 32 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 26
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 596 -
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 88 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 52 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 32
-
DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors
Paper • 2312.16837 • Published • 5 -
Learning the 3D Fauna of the Web
Paper • 2401.02400 • Published • 9 -
Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
Paper • 2310.15110 • Published • 2 -
Zero-1-to-3: Zero-shot One Image to 3D Object
Paper • 2303.11328 • Published • 5