LDM3D-VR: Latent Diffusion Model for 3D VR
Abstract
Latent diffusion models have proven to be state-of-the-art for generating and manipulating visual outputs. However, to the best of our knowledge, the joint generation of RGB and depth maps remains limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD from textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps, and captions. Both models are evaluated against existing related methods.
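As a concrete illustration of the text-to-panoramic-RGBD use case, here is a minimal sketch assuming the LDM3D-pano checkpoint is published on the Hugging Face Hub as Intel/ldm3d-pano and exposed through diffusers' StableDiffusionLDM3DPipeline; the checkpoint name, generation parameters, and output fields are assumptions to verify against the current model card.

```python
# Minimal sketch: text -> panoramic RGB + depth with LDM3D-pano.
# Assumes the "Intel/ldm3d-pano" checkpoint works with diffusers'
# StableDiffusionLDM3DPipeline; verify both against the model card.
import torch
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-pano", torch_dtype=torch.float16
).to("cuda")

prompt = "360 view of a forest clearing at sunrise"
# Equirectangular panoramas use a 2:1 width-to-height ratio.
output = pipe(prompt, width=1024, height=512)
rgb_image = output.rgb[0]      # PIL image: RGB panorama
depth_image = output.depth[0]  # PIL image: corresponding depth map
rgb_image.save("pano_rgb.jpg")
depth_image.save("pano_depth.png")
```

In this sketch, the 1024×512 resolution matches the 2:1 equirectangular format that panoramic viewers expect; per the abstract, a higher-resolution RGBD result could then be obtained by passing the low-resolution output through LDM3D-SR.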
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling (2023)
- DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model (2023)
- Light Field Diffusion for Single-View Novel View Synthesis (2023)
- HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation (2023)
- Customizing 360-Degree Panoramas through Text-to-Image Diffusion Models (2023)