---
license: bigscience-bloom-rail-1.0
---

# Model Card for udever-bloom

`udever-bloom-560m` is finetuned from [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m) via [BitFit](https://aclanthology.org/2022.acl-short.1/) on MS MARCO Passage Ranking, SNLI and MultiNLI data. It is a universal embedding model that works across tasks and across natural and programming languages. (From a technical view, `udever` is essentially `sgpt-bloom` with some minor improvements.)

## Model Details

### Model Description

- **Developed by:** Alibaba Group
- **Model type:** Transformer-based Language Model (decoder-only)
- **Language(s) (NLP):** Multiple; see [bloom training data](https://huggingface.co/bigscience/bloom-560m#training-data)
- **Finetuned from model:** [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m)

### Model Sources

- **Repository:** [github.com/izhx/uni-rep](https://github.com/izhx/uni-rep)
- **Paper:** [Language Models are Universal Embedders](https://arxiv.org/pdf/2310.08232.pdf)

## How to Get Started with the Model

Use the code below to get started with the model.
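The snippet below is a minimal sketch rather than the reference implementation: the repo id, the last-token pooling, and the left-padding choice are assumptions, and it does not reproduce any prompt or special-token handling used in the paper. See the [repository](https://github.com/izhx/uni-rep) for the exact encoding procedure.

```python
# Minimal sketch (assumptions noted): embeddings are read from the last
# hidden state of the final token; left padding keeps that token at the
# same position for every sequence in a batch.
# See https://github.com/izhx/uni-rep for the exact encoding procedure.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "izhx/udever-bloom-560m"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.padding_side = "left"
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()


@torch.inference_mode()
def encode(texts, max_length=512):
    inputs = tokenizer(
        texts, padding=True, truncation=True,
        max_length=max_length, return_tensors="pt",
    )
    hidden = model(**inputs).last_hidden_state  # (batch, seq_len, dim)
    return hidden[:, -1]  # assumption: last-token pooling


embeddings = encode(["how to bake bread", "def quicksort(arr): ..."])
print(torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0))
```

For retrieval-style use, encode queries and documents the same way and rank by cosine similarity.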
## Training Details

### Training Data

- MS MARCO Passage Ranking, prepared with [this script](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/ms_marco/train_bi-encoder_mnrl.py#L86)
- SNLI and MultiNLI ([AllNLI.tsv.gz](https://sbert.net/datasets/AllNLI.tsv.gz))

### Training Procedure

#### Preprocessing

MS MARCO hard negatives are those provided by [this script](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/ms_marco/train_bi-encoder_mnrl.py#L86). Negatives for SNLI and MultiNLI are randomly sampled.

#### Training Hyperparameters

- **Training regime:** tf32, BitFit
- **Batch size:** 1024
- **Epochs:** 3
- **Optimizer:** AdamW
- **Learning rate:** 1e-4
- **Scheduler:** constant with warmup
- **Warmup:** 0.25 epoch
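As a rough illustration of the recipe above (BitFit updates only bias terms; training uses a contrastive loss over in-batch positives plus hard negatives), here is a hedged sketch. It is not the training code from [github.com/izhx/uni-rep](https://github.com/izhx/uni-rep); the temperature and embedding normalization are illustrative placeholders.

```python
# Hedged sketch of the recipe above, NOT the project's training code:
# BitFit trains only bias terms, and the loss is an InfoNCE-style
# contrastive objective over in-batch positives plus hard negatives.
# Temperature and normalization below are illustrative placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModel

model = AutoModel.from_pretrained("bigscience/bloom-560m")

# BitFit: freeze all parameters except bias terms.
for name, param in model.named_parameters():
    param.requires_grad = "bias" in name


def contrastive_loss(q, d_pos, d_neg, temperature=0.05):
    """q, d_pos, d_neg: (batch, dim) embeddings of queries, positives, hard negatives."""
    q = F.normalize(q, dim=-1)
    docs = F.normalize(torch.cat([d_pos, d_neg]), dim=-1)  # (2 * batch, dim)
    logits = q @ docs.T / temperature                      # (batch, 2 * batch)
    labels = torch.arange(q.size(0))                       # i-th doc is the positive for query i
    return F.cross_entropy(logits, labels)
```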
## Evaluation

### Table 1: Massive Text Embedding Benchmark [MTEB](https://huggingface.co/spaces/mteb/leaderboard)

| MTEB | Avg. | Class. | Clust. | PairClass. | Rerank. | Retr. | STS | Summ. |
|---|---|---|---|---|---|---|---|---|
| #Datasets ➡️ | 56 | 12 | 11 | 3 | 4 | 15 | 10 | 1 |
||
| bge-large-en-v1.5 | **64.23** | **75.97** | 46.08 | **87.12** | **60.03** | **54.29** | 83.11 | 31.61 |
| bge-base-en-v1.5 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 |
| gte-large | 63.13 | 73.33 | **46.84** | 85 | 59.13 | 52.22 | **83.35** | 31.66 |
| gte-base | 62.39 | 73.01 | 46.2 | 84.57 | 58.61 | 51.14 | 82.3 | 31.17 |
| e5-large-v2 | 62.25 | 75.24 | 44.49 | 86.03 | 56.61 | 50.56 | 82.05 | 30.19 |
| instructor-xl | 61.79 | 73.12 | 44.74 | 86.62 | 57.29 | 49.26 | 83.06 | 32.32 |
| instructor-large | 61.59 | 73.86 | 45.29 | 85.89 | 57.54 | 47.57 | 83.15 | 31.84 |
| e5-base-v2 | 61.5 | 73.84 | 43.8 | 85.73 | 55.91 | 50.29 | 81.05 | 30.28 |
| e5-large | 61.42 | 73.14 | 43.33 | 85.94 | 56.53 | 49.99 | 82.06 | 30.97 |
| text-embedding-ada-002 (OpenAI API) | 60.99 | 70.93 | 45.9 | 84.89 | 56.32 | 49.25 | 80.97 | 30.8 |
| e5-base | 60.44 | 72.63 | 42.11 | 85.09 | 55.7 | 48.75 | 80.96 | 31.01 |
| SGPT-5.8B-msmarco | 58.93 | 68.13 | 40.34 | 82 | 56.56 | 50.25 | 78.1 | 31.46 |
| sgpt-bloom-7b1-msmarco | 57.59 | 66.19 | 38.93 | 81.9 | 55.65 | 48.22 | 77.74 | **33.6** |
||
| Udever-bloom-560m | 55.80 | 68.04 | 36.89 | 81.05 | 52.60 | 41.19 | 79.93 | 32.06 |
| Udever-bloom-1b1 | 58.28 | 70.18 | 39.11 | 83.11 | 54.28 | 45.27 | 81.52 | 31.10 |
| Udever-bloom-3b | 59.86 | 71.91 | 40.74 | 84.06 | 54.90 | 47.67 | 82.37 | 30.62 |
| Udever-bloom-7b1 | 60.63 | 72.13 | 40.81 | 85.40 | 55.91 | 49.34 | 83.01 | 30.97 |

### Table 2: [CodeSearchNet](https://github.com/github/CodeSearchNet)

| CodeSearchNet | Go | Ruby | Python | Java | JS | PHP | Avg. |
|---|---|---|---|---|---|---|---|
| CodeBERT | 69.3 | 70.6 | 84.0 | 86.8 | 74.8 | 70.6 | 76.0 |
| GraphCodeBERT | 84.1 | 73.2 | 87.9 | 75.7 | 71.1 | 72.5 | 77.4 |
| cpt-code S | **97.7** | **86.3** | 99.8 | 94.0 | 86.0 | 96.7 | 93.4 |
| cpt-code M | 97.5 | 85.5 | **99.9** | **94.4** | **86.5** | **97.2** | **93.5** |
| sgpt-bloom-7b1-msmarco | 76.79 | 69.25 | 95.68 | 77.93 | 70.35 | 73.45 | 77.24 |
||
| Udever-bloom-560m | 75.38 | 66.67 | 96.23 | 78.99 | 69.39 | 73.69 | 76.73 |
| Udever-bloom-1b1 | 78.76 | 72.85 | 97.67 | 82.77 | 74.38 | 78.97 | 80.90 |
| Udever-bloom-3b | 80.63 | 75.40 | 98.02 | 83.88 | 76.18 | 79.67 | 82.29 |
| Udever-bloom-7b1 | 79.37 | 76.59 | 98.38 | 84.68 | 77.49 | 80.03 | 82.76 |

### Table 3: Chinese multi-domain retrieval [Multi-cpr](https://dl.acm.org/doi/10.1145/3477495.3531736)

| Model | Train | Backbone | E-commerce MRR@10 | E-commerce Recall@1k | Entertainment video MRR@10 | Entertainment video Recall@1k | Medical MRR@10 | Medical Recall@1k |
|---|---|---|---|---|---|---|---|---|
| BM25 | - | - | 0.225 | 0.815 | 0.225 | 0.780 | 0.187 | 0.482 |
| Doc2Query | - | - | 0.239 | 0.826 | 0.238 | 0.794 | 0.210 | 0.505 |
| DPR-1 | In-Domain | BERT | 0.270 | 0.921 | 0.254 | 0.934 | 0.327 | 0.747 |
| DPR-2 | In-Domain | BERT-CT | 0.289 | **0.926** | 0.263 | **0.935** | 0.339 | **0.769** |
| text-embedding-ada-002 | General | GPT | 0.183 | 0.825 | 0.159 | 0.786 | 0.245 | 0.593 |
| sgpt-bloom-7b1-msmarco | General | BLOOM | 0.242 | 0.840 | 0.227 | 0.829 | 0.311 | 0.675 |
||
| Udever-bloom-560m | General | BLOOM | 0.156 | 0.802 | 0.149 | 0.749 | 0.245 | 0.571 |
| Udever-bloom-1b1 | General | BLOOM | 0.244 | 0.863 | 0.208 | 0.815 | 0.241 | 0.557 |
| Udever-bloom-3b | General | BLOOM | 0.267 | 0.871 | 0.228 | 0.836 | 0.288 | 0.619 |
| Udever-bloom-7b1 | General | BLOOM | **0.296** | 0.889 | **0.267** | 0.907 | **0.343** | 0.705 |

More results are reported in Section 3 of the [paper](https://arxiv.org/pdf/2310.08232.pdf).

## Technical Specifications

### Model Architecture and Objective

- Model: [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m).
- Objective: Contrastive loss with hard negatives (refer to [paper](https://arxiv.org/pdf/2310.08232.pdf) section 2.2).

### Compute Infrastructure

- Nvidia A100 SXM4 80GB.
- torch 2.0.0, transformers 4.29.2.

## Citation

**BibTeX:**

```BibTeX
@article{zhang2023language,
  title={Language Models are Universal Embedders},
  author={Zhang, Xin and Li, Zehan and Zhang, Yanzhao and Long, Dingkun and Xie, Pengjun and Zhang, Meishan and Zhang, Min},
  journal={arXiv preprint arXiv:2310.08232},
  year={2023}
}
```