3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination Paper • 2406.05132 • Published Jun 7 • 27
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12 • 24
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11 • 32
MotionClone: Training-Free Motion Cloning for Controllable Video Generation Paper • 2406.05338 • Published Jun 8 • 39
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Paper • 2406.06523 • Published Jun 10 • 50
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion Paper • 2406.04338 • Published Jun 6 • 34