MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies Paper • 2403.01422 • Published Mar 3 • 26
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding Paper • 2403.09530 • Published Mar 14 • 8
VidToMe: Video Token Merging for Zero-Shot Video Editing Paper • 2312.10656 • Published Dec 17, 2023 • 10
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion Paper • 2403.17237 • Published Mar 25 • 9
Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians Paper • 2403.17898 • Published Mar 26 • 14
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens Paper • 2404.03413 • Published Apr 4 • 25
Phenaki: Variable Length Video Generation From Open Domain Textual Description Paper • 2210.02399 • Published Oct 5, 2022 • 3
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22 • 21