---
title: README
emoji: 👀
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
---

We are a group of people training music LLMs~ 🔥

# Introduction to our series work

The development log of our Music Audio Pre-training (m-a-p) model family:
- 17/03/2023: we release two advanced music understanding models, [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M) and [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M) , trained with new paradigm and dataset. They outperform the previous models and can better generalize to more tasks.
- 14/03/2023: we retrained the MERT-v0 model with open-source-only music dataset [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public)
- 29/12/2022: a music understanding model [MERT-v0](https://huggingface.co/m-a-p/MERT-v0) trained with **MLM** paradigm, which performs better at downstream tasks.
- 29/10/2022: a pre-trained MIR model [music2vec](https://huggingface.co/m-a-p/music2vec-v1) trained with **BYOL** paradigm.


Here is a table for quick model pick-up:

| Name                                                         | Pre-train Paradigm | Training Data (hour) | Pre-train Context   (second) | Model Size | Transformer Layer-Dimension | Feature Rate | Sample Rate | Release Date |
| ------------------------------------------------------------ | ------------------ | -------------------- | ---------------------------- | ---------- | --------------------------- | ------------ | ----------- | ------------ |
| [MERT-v1-330M](https://huggingface.co/m-a-p/MERT-v1-330M)    | MLM                | 160K                 | 5                            | 330M       | 24-1024                     | 75 Hz        | 24K Hz      | 17/03/2023   |
| [MERT-v1-95M](https://huggingface.co/m-a-p/MERT-v1-95M)      | MLM                | 20K                  | 5                            | 95M        | 12-768                      | 75 Hz        | 24K Hz      | 17/03/2023   |
| [MERT-v0-public](https://huggingface.co/m-a-p/MERT-v0-public) | MLM                | 900                  | 5                            | 95M        | 12-768                      | 50 Hz        | 16K Hz      | 14/03/2023   |
| [MERT-v0](https://huggingface.co/m-a-p/MERT-v0)              | MLM                | 1000                 | 5                            | 95 M       | 12-768                      | 50 Hz        | 16K Hz      | 29/12/2022   |
| [music2vec-v1](https://huggingface.co/m-a-p/music2vec-v1)    | BYOL               | 1000                 | 30                           | 95 M       | 12-768                      | 50 Hz        | 16K Hz      | 30/10/2022   |

## Explanation

The m-a-p models share the similar model architecture and the most distinguished difference is the paradigm in used pre-training. Other than that, there are several nuance technical configuration needs to know before using:

- **Model Size**: the number of parameters that would be loaded to memory. Please select the appropriate size fitting your hardware.
- **Transformer Layer-Dimension**: The number of transformer layers and the corresponding feature dimensions can be outputted from our model. This is marked out because features extracted by **different layers could have various performance depending on tasks**.
- **Feature Rate**: Given a 1-second audio input, the number of features output by the model.
- **Sample Rate**: The frequency of audio that the model is trained with.