SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Abstract
Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis. Recent research has introduced tuning methods that make subtle adjustments to the original models, yielding promising results in specific adaptations of foundational generative diffusion models. Rather than modifying the main backbone of the diffusion model, we delve into the role of skip connections in the U-Net and reveal that the hierarchical features aggregating long-distance information across the encoder and decoder have a significant impact on the content and quality of image generation. Based on this observation, we propose an efficient generative tuning framework, dubbed SCEdit, which integrates and edits Skip Connections using a lightweight tuning module named SC-Tuner. Furthermore, the proposed framework allows for straightforward extension to controllable image synthesis by injecting different conditions with the Controllable SC-Tuner, simplifying and unifying the network design for multi-condition inputs. Our SCEdit substantially reduces training parameters, memory usage, and computational expense thanks to its lightweight tuners, with backward propagation passing only through the decoder blocks. Extensive experiments on text-to-image generation and controllable image synthesis tasks demonstrate the superiority of our method in terms of both efficiency and performance. Project page: https://scedit.github.io/
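The abstract's core mechanism, editing skip-connection features with a lightweight tuner so that gradients flow only through the tuners and decoder while the encoder stays frozen, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the class names, the per-channel scale/bias parameterization, and the additive condition injection are all hypothetical simplifications, not the paper's actual SC-Tuner design.

```python
class SCTuner:
    """Lightweight tuner placed on a U-Net skip connection (illustrative sketch).

    Zero-initialized parameters make the edit an identity map at the start of
    tuning, so the pretrained model's behavior is preserved before any updates.
    """

    def __init__(self, channels):
        self.scale = [0.0] * channels  # learned per-channel scale (zero-init)
        self.bias = [0.0] * channels   # learned per-channel shift (zero-init)

    def __call__(self, skip):
        # skip: per-channel feature values produced by the frozen encoder.
        # The tuner edits them before they reach the decoder block.
        return [x + s * x + b for x, s, b in zip(skip, self.scale, self.bias)]


class ControllableSCTuner(SCTuner):
    """Hypothetical variant that injects an external condition signal
    (e.g. features from an edge map) into the skip feature before tuning,
    mirroring the extension to controllable synthesis."""

    def __call__(self, skip, cond):
        fused = [x + c for x, c in zip(skip, cond)]  # simple additive injection
        return super().__call__(fused)


tuner = SCTuner(3)
print(tuner([1.0, 2.0, 3.0]))  # identity at init: [1.0, 2.0, 3.0]
```

Because only the tuner (and decoder) parameters are trainable, backpropagation never needs to traverse the frozen encoder, which is the source of the memory and compute savings the abstract claims.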
Community
Awesome! When will the models and code be published?
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models (2023)
- X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model (2023)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors (2023)
- UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models (2023)
- SpeedUpNet: A Plug-and-Play Hyper-Network for Accelerating Text-to-Image Diffusion Models (2023)
SCEdit has been integrated into the tuning library SWIFT, and the SWIFT-based training code has been released! In the week after New Year's Day, we will be working on a standalone implementation of SCEdit.
If you'd rather not wait, the SWIFT-based training code is available in the official repo at https://github.com/ali-vilab/SCEdit.
For the complete implementation, refer to our framework SCEPTER at https://github.com/modelscope/scepter and the tuning library SWIFT at https://github.com/modelscope/swift.
Models citing this paper: 1
Datasets citing this paper: 0
Spaces citing this paper: 0