- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 38
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19
Collections including paper arxiv:2405.10320
- Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era
  Paper • 2305.06131 • Published • 2
- Perpetual Humanoid Control for Real-time Simulated Avatars
  Paper • 2305.06456 • Published • 1
- Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold
  Paper • 2305.10973 • Published • 32
- LDM3D: Latent Diffusion Model for 3D
  Paper • 2305.10853 • Published • 10

- ReNoise: Real Image Inversion Through Iterative Noising
  Paper • 2403.14602 • Published • 19
- Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
  Paper • 2403.14148 • Published • 18
- Explorative Inbetweening of Time and Space
  Paper • 2403.14611 • Published • 11
- PointInfinity: Resolution-Invariant Point Diffusion Models
  Paper • 2404.03566 • Published • 13

- Seamless Human Motion Composition with Blended Positional Encodings
  Paper • 2402.15509 • Published • 14
- TripoSR: Fast 3D Object Reconstruction from a Single Image
  Paper • 2403.02151 • Published • 12
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 7
- Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
  Paper • 2403.09981 • Published • 6

- DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors
  Paper • 2312.16837 • Published • 5
- Learning the 3D Fauna of the Web
  Paper • 2401.02400 • Published • 9
- Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
  Paper • 2310.15110 • Published • 2
- Zero-1-to-3: Zero-shot One Image to 3D Object
  Paper • 2303.11328 • Published • 5

- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
  Paper • 2306.07967 • Published • 24
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
  Paper • 2306.07954 • Published • 113
- TryOnDiffusion: A Tale of Two UNets
  Paper • 2306.08276 • Published • 73
- Seeing the World through Your Eyes
  Paper • 2306.09348 • Published • 33