Commit 6b56fcd, committed by crypto-code
1 Parent(s): 9264590
Update README.md
README.md CHANGED
@@ -1,3 +1,34 @@
 ---
 license: mit
+datasets:
+- M2UGen/MUCaps
+- M2UGen/MUEdit
+- M2UGen/MUImage
+- M2UGen/MUVideo
 ---
+# M<sup>2</sup>UGen Model with MusicGen-medium
+
+The M<sup>2</sup>UGen model is a Music Understanding and Generation model capable of Music Question Answering, Music Generation
+from text, images, videos and audio, and Music Editing. The model uses encoders such as MERT for music understanding, ViT for image understanding
+and ViViT for video understanding, with the MusicGen/AudioLDM2 model as the music generation model (music decoder), coupled with adapters and the LLaMA 2 model
+to enable these multiple abilities.
+
+M<sup>2</sup>UGen was published in [M<sup>2</sup>UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models](https://arxiv.org/abs/2311.11255) by *Atin Sakkeer Hussain, Shansong Liu, Chenshuo Sun and Ying Shan*.
+
+The code repository for the model is published in [crypto-code/M2UGen](https://github.com/crypto-code/M2UGen). Clone the repository, download the checkpoint and run the following for a model demo:
+```bash
+python gradio_app.py --model ./ckpts/M2UGen-MusicGen-medium/checkpoint.pth --llama_dir ./ckpts/LLaMA-2 --music_decoder musicgen --music_decoder_path facebook/musicgen-medium
+```
+
+## Citation
+
+If you find this model useful, please consider citing:
+
+```bibtex
+@article{hussain2023m,
+  title={{M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models}},
+  author={Hussain, Atin Sakkeer and Liu, Shansong and Sun, Chenshuo and Shan, Ying},
+  journal={arXiv preprint arXiv:2311.11255},
+  year={2023}
+}
+```
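
The demo command added to the README in this commit depends on two local paths (the checkpoint file and the LLaMA 2 directory) being in place before `gradio_app.py` is launched. The following is a minimal sketch, not part of the M2UGen repository, that assembles the same command and reports which of those paths are still missing. The helper names `build_demo_command` and `missing_paths` are hypothetical; the flags and default values are copied from the README command, and the directory layout under `./ckpts` is assumed to match it.

```python
import shlex
from pathlib import Path


def build_demo_command(
    ckpt_root="./ckpts",
    model_name="M2UGen-MusicGen-medium",
    music_decoder="musicgen",
    music_decoder_path="facebook/musicgen-medium",
):
    """Return the README's demo command as an argument list.

    Hypothetical helper: the flags mirror the README command; the
    directory layout under ckpt_root is an assumption taken from it.
    """
    root = Path(ckpt_root)
    return [
        "python",
        "gradio_app.py",
        "--model", str(root / model_name / "checkpoint.pth"),
        "--llama_dir", str(root / "LLaMA-2"),
        "--music_decoder", music_decoder,
        "--music_decoder_path", music_decoder_path,
    ]


def missing_paths(command):
    """List the local --model / --llama_dir values that do not exist yet."""
    local_flags = {"--model", "--llama_dir"}
    return [
        value
        for flag, value in zip(command, command[1:])
        if flag in local_flags and not Path(value).exists()
    ]


if __name__ == "__main__":
    cmd = build_demo_command()
    # Print the command in copy-pasteable shell form.
    print(shlex.join(cmd))
    for path in missing_paths(cmd):
        print(f"missing: {path}")
```

Running the script from the root of the cloned repository prints the command and any path that still needs to be downloaded; it does not start the demo itself.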