|
--- |
|
title: README |
|
emoji: π |
|
colorFrom: green |
|
colorTo: yellow |
|
sdk: static |
|
pinned: false |
|
--- |
|
|
|
We are a group of people training music LLMs~ 🔥
|
|
|
# Introduction to our model series
|
|
|
The development log of our Music Audio Pre-training (m-a-p) model family: |
|
- 17/03/2023: we release two advanced music understanding models, [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) and [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M), trained with a new paradigm and dataset. They outperform the previous models and generalize better to more tasks.
|
- 14/03/2023: we retrained the MERT-v0 model on an open-source-only music dataset: [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public).
|
- 29/12/2022: a music understanding model, [MERT-v0](https://huggingface.co/m-a-p/MERT-v0), trained with the **MLM** paradigm, which performs better on downstream tasks.
|
- 29/10/2022: a pre-trained MIR model, [music2vec](https://huggingface.co/m-a-p/music2vec-v1), trained with the **BYOL** paradigm.
|
|
|
|
|
|
|
Here is a table for quick model selection:
|
|
|
| Name | Pre-train Paradigm | Training Data (hours) | Pre-train Context (seconds) | Model Size | Transformer Layer-Dimension | Feature Rate | Sample Rate | Release Date |
| ------------------------------------------------------------ | ------------------ | --------------------- | --------------------------- | ---------- | --------------------------- | ------------ | ----------- | ------------ |
| [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) | MLM | 160K | 5 | 330M | 24-1024 | 75 Hz | 24 kHz | 17/03/2023 |
| [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) | MLM | 20K | 5 | 95M | 12-768 | 75 Hz | 24 kHz | 17/03/2023 |
| [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public) | MLM | 900 | 5 | 95M | 12-768 | 50 Hz | 16 kHz | 14/03/2023 |
| [MERT-v0](https://huggingface.co/m-a-p/MERT-v0) | MLM | 1000 | 5 | 95M | 12-768 | 50 Hz | 16 kHz | 29/12/2022 |
| [music2vec-v1](https://huggingface.co/m-a-p/music2vec-v1) | BYOL | 1000 | 30 | 95M | 12-768 | 50 Hz | 16 kHz | 30/10/2022 |
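
As a minimal sketch (not an official recipe), loading a checkpoint picked from the table might look like the following with the `transformers` library; the MERT repositories ship custom modeling code, so `trust_remote_code=True` is assumed to be required:

```python
# Minimal loading sketch for a checkpoint picked from the table above.
# Assumes the `transformers` package is installed; MERT repositories
# ship custom modeling code, hence trust_remote_code=True.
from transformers import Wav2Vec2FeatureExtractor, AutoModel

model_id = "m-a-p/MERT-v1-95M"  # any entry from the Name column
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained(model_id, trust_remote_code=True)

# The processor records the Sample Rate the model expects (24 kHz here).
print(processor.sampling_rate)
```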
|
|
|
## Explanation |
|
|
|
The m-a-p models share a similar architecture; the most notable difference is the pre-training paradigm. Beyond that, there are several technical configurations to know before use:
|
|
|
- **Model Size**: the number of parameters loaded into memory. Please select a size appropriate for your hardware.
|
- **Transformer Layer-Dimension**: the number of transformer layers and the corresponding feature dimension the model can output. This is highlighted because features extracted from **different layers can perform differently depending on the task** (see the sketch after this list).
|
- **Feature Rate**: the number of feature frames the model outputs per second of audio input.
|
- **Sample Rate**: the sampling rate of the audio the model was trained on; resample your input to this rate before feature extraction.
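
To make these configurations concrete, below is a hedged sketch of layer-wise feature extraction under stated assumptions: `torch`, `torchaudio`, and `transformers` are installed, and `example.wav` is a hypothetical input file used only for illustration.

```python
# Hedged sketch of layer-wise feature extraction; not an official recipe.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, AutoModel

model_id = "m-a-p/MERT-v1-95M"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
processor = Wav2Vec2FeatureExtractor.from_pretrained(model_id, trust_remote_code=True)

waveform, sr = torchaudio.load("example.wav")  # hypothetical file; (channels, samples)
waveform = waveform.mean(dim=0)                # mix down to mono

# Resample to the model's training Sample Rate (24 kHz for MERT-v1).
if sr != processor.sampling_rate:
    waveform = torchaudio.functional.resample(waveform, sr, processor.sampling_rate)

inputs = processor(waveform.numpy(), sampling_rate=processor.sampling_rate,
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One hidden state per transformer layer plus the input embedding:
# 13 tensors of shape (batch, time, 768) for the 12-768 (95M) model.
# `time` follows the Feature Rate, e.g. a 5-second clip at 75 Hz gives
# roughly 5 * 75 = 375 frames.
hidden = torch.stack(outputs.hidden_states).squeeze(1)  # (layers+1, time, dim)

# Different layers suit different tasks, so a simple starting point is
# to probe each layer's time-averaged features separately.
layer_features = hidden.mean(dim=1)                     # (layers+1, dim)
print(hidden.shape, layer_features.shape)
```

A task-specific head can then be trained on the best-performing layer, or on a learnable weighted sum across layers.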
|
|
|
|