Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published May 30 • 29
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15 • 29
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data Paper • 2403.11207 • Published Mar 17 • 14
3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos Paper • 2403.01444 • Published Mar 3 • 4
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on Paper • 2403.01779 • Published Mar 4 • 28
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model Paper • 2403.05034 • Published Mar 8 • 20
Graph Mamba: Towards Learning on Graphs with State Space Models Paper • 2402.08678 • Published Feb 13 • 13
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation Paper • 2401.17053 • Published Jan 30 • 30
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models Paper • 2310.14566 • Published Oct 23, 2023 • 25
3D-GPT: Procedural 3D Modeling with Large Language Models Paper • 2310.12945 • Published Oct 19, 2023 • 57
Drag View: Generalizable Novel View Synthesis with Unposed Imagery Paper • 2310.03704 • Published Oct 5, 2023 • 7
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory Paper • 2308.08089 • Published Aug 16, 2023 • 21
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models Paper • 2308.06721 • Published Aug 13, 2023 • 29
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding Paper • 2307.16449 • Published Jul 31, 2023 • 15