Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
Abstract
Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion on pre-trained diffusion models. The first, adopted from the image domain, allows text-based editing. The second is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples can be found on our examples page at https://hilamanor.github.io/AudioEditing/ and code can be found at https://github.com/hilamanor/AudioEditing/.
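
To make the text-based technique concrete, below is a minimal sketch of one common way to realize editing via DDPM inversion: the source signal's latent is inverted into per-timestep noise vectors that make the reverse diffusion reproduce it exactly, and the sampler is then rerun with a target text prompt while reusing those vectors. This is an illustrative assumption, not the authors' implementation; `eps_model` (a text-conditioned noise predictor from a pre-trained audio diffusion model), its signature, and the variable names are hypothetical. See the linked repository for the actual code.

```python
import torch

def invert_and_edit(x0, eps_model, src_prompt, tgt_prompt, betas):
    """Hedged sketch of DDPM-inversion-based text editing.
    Assumptions: x0 is the source audio's latent/spectrogram representation,
    eps_model(x, t, prompt) is a hypothetical pre-trained noise predictor,
    betas is the model's noise schedule (shape [T])."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    T = len(betas)

    # Sample x_1..x_T independently from q(x_t | x_0).
    xs = [x0]
    for t in range(T):
        xs.append(alpha_bars[t].sqrt() * x0
                  + (1 - alpha_bars[t]).sqrt() * torch.randn_like(x0))

    def posterior(x_t, t, prompt):
        """Reverse-step mean and std of the DDPM sampler at timestep t (1-indexed)."""
        a_t, ab_t = alphas[t - 1], alpha_bars[t - 1]
        ab_prev = alpha_bars[t - 2] if t > 1 else torch.tensor(1.0)
        eps = eps_model(x_t, t, prompt)
        x0_pred = (x_t - (1 - ab_t).sqrt() * eps) / ab_t.sqrt()
        mean = (ab_prev.sqrt() * betas[t - 1] / (1 - ab_t)) * x0_pred \
             + (a_t.sqrt() * (1 - ab_prev) / (1 - ab_t)) * x_t
        std = ((1 - ab_prev) / (1 - ab_t) * betas[t - 1]).sqrt()
        return mean, std

    # Inversion: recover noise vectors z_t that reproduce the sequence x_T..x_0
    # exactly under the source prompt (no noise is added at t = 1).
    zs = []
    for t in range(T, 1, -1):
        mean, std = posterior(xs[t], t, src_prompt)
        zs.append((xs[t - 1] - mean) / std)
    zs.append(torch.zeros_like(x0))

    # Editing: rerun the reverse process with the target prompt, injecting z_t.
    x = xs[T]
    for i, t in enumerate(range(T, 0, -1)):
        mean, std = posterior(x, t, tgt_prompt)
        x = mean + std * zs[i]
    return x  # edited latent; decode back to audio with the model's decoder
```

Because the recovered noise vectors pin down the structure of the source signal, changing only the prompt between inversion and resampling yields an edit that preserves the original's overall content while following the new text description.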
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models (2024)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation (2024)
- RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing (2023)
- ZONE: Zero-Shot Instruction-Guided Local Editing (2023)
- Tuning-Free Inversion-Enhanced Control for Consistent Image Editing (2023)