Update README.md
README.md
CHANGED
@@ -1,5 +1,5 @@
|
|
1 |
---
|
2 |
-
library_name: transformers
|
3 |
tags:
|
4 |
- gte
|
5 |
- mteb
|
@@ -2627,7 +2627,7 @@ We also present the [`gte-Qwen1.5-7B-instruct`](https://huggingface.co/Alibaba-N
|
|
2627 |
| Models | Language | Model Size | Max Seq. Length | Dimension | MTEB-en | LoCo |
|
2628 |
|:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: |
|
2629 |
|[`gte-Qwen1.5-7B-instruct`](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct)| English | 7720 | 32768 | 4096 | 67.34 | 87.57 |
|
2630 |
-
|[`gte-large-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | English |
|
2631 |
|[`gte-base-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) | English | 137 | 8192 | 768 | 64.11 | 87.44 |
|
2632 |
|
2633 |
|
@@ -2665,7 +2665,7 @@ print(scores.tolist())
|
|
2665 |
**It is recommended to install xformers and enable unpadding for acceleration, refer to [enable-unpadding-and-xformers](https://huggingface.co/Alibaba-NLP/test-impl#recommendation-enable-unpadding-and-acceleration-with-xformers).**
|
2666 |
|
2667 |
|
2668 |
-
Use with sentence-transformers
|
2669 |
|
2670 |
```python
|
2671 |
from sentence_transformers import SentenceTransformer
|
@@ -2673,7 +2673,7 @@ from sentence_transformers.util import cos_sim
|
|
2673 |
|
2674 |
sentences = ['That is a happy person', 'That is a very happy person']
|
2675 |
|
2676 |
-
model = SentenceTransformer('Alibaba-NLP/gte-base-en-v1.5')
|
2677 |
embeddings = model.encode(sentences)
|
2678 |
print(cos_sim(embeddings[0], embeddings[1]))
|
2679 |
```
|
@@ -2686,8 +2686,13 @@ print(cos_sim(embeddings[0], embeddings[1]))
|
|
2686 |
- Weak-supervised contrastive (WSC) pre-training: GTE pre-training data
|
2687 |
- Supervised contrastive fine-tuning: GTE fine-tuning data
|
2688 |
|
2689 |
-
### Training Procedure
|
2690 |
|
2691 |
- MLM-2048: lr 5e-4, mlm_probability 0.3, batch_size 4096, num_steps 70000, rope_base 10000
|
2692 |
- MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 20000, rope_base 500000
|
2693 |
- WSC: max_len 512, lr 2e-4, batch_size 32768, num_steps 100000
|
@@ -2701,11 +2706,11 @@ print(cos_sim(embeddings[0], embeddings[1]))
|
|
2701 |
|
2702 |
The results of other models are retrieved from [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
|
2703 |
|
2704 |
-
The gte
|
2705 |
|
2706 |
| Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) |
|
2707 |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|
2708 |
-
| [**gte-large-en-v1.5**](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) |
|
2709 |
| [mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 |
|
2710 |
| [multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 |
|
2711 |
| [bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)| 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 |
|
|
|
1 |
---
|
2 |
+
library_name: sentence-transformers
|
3 |
tags:
|
4 |
- gte
|
5 |
- mteb
|
|
|
2627 |
| Models | Language | Model Size | Max Seq. Length | Dimension | MTEB-en | LoCo |
|
2628 |
|:-----: | :-----: |:-----: |:-----: |:-----: | :-----: | :-----: |
|
2629 |
|[`gte-Qwen1.5-7B-instruct`](https://huggingface.co/Alibaba-NLP/gte-Qwen1.5-7B-instruct)| English | 7720 | 32768 | 4096 | 67.34 | 87.57 |
|
2630 |
+
|[`gte-large-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | English | 434 | 8192 | 1024 | 65.39 | 86.71 |
|
2631 |
|[`gte-base-en-v1.5`](https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5) | English | 137 | 8192 | 768 | 64.11 | 87.44 |
|
2632 |
|
2633 |
|
|
|
2665 |
**It is recommended to install xformers and enable unpadding for acceleration, refer to [enable-unpadding-and-xformers](https://huggingface.co/Alibaba-NLP/test-impl#recommendation-enable-unpadding-and-acceleration-with-xformers).**
|
2666 |
|
2667 |
|
2668 |
+
Use with `sentence-transformers`:
|
2669 |
|
2670 |
```python
|
2671 |
from sentence_transformers import SentenceTransformer
|
|
|
2673 |
|
2674 |
sentences = ['That is a happy person', 'That is a very happy person']
|
2675 |
|
2676 |
+
model = SentenceTransformer('Alibaba-NLP/gte-base-en-v1.5', trust_remote_code=True)
|
2677 |
embeddings = model.encode(sentences)
|
2678 |
print(cos_sim(embeddings[0], embeddings[1]))
|
2679 |
```
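To make the printed score easier to interpret, here is a minimal pure-Python sketch of the cosine-similarity computation that `cos_sim` is understood to perform on a pair of vectors. The helper name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of the model or of `sentence-transformers`; real embeddings from `gte-base-en-v1.5` have 768 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings (assumed values).
emb_a = [1.0, 2.0, 3.0]
emb_b = [2.0, 4.0, 6.0]    # same direction as emb_a
emb_c = [-3.0, 0.0, 1.0]   # orthogonal to emb_a

print(cosine_similarity(emb_a, emb_b))  # close to 1: vectors point the same way
print(cosine_similarity(emb_a, emb_c))  # close to 0: vectors are orthogonal
```

Scores near 1 indicate that two sentences are embedded in nearly the same direction, which is what one would expect for the two near-paraphrases in the example above.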
|
|
|
2686 |
- Weak-supervised contrastive (WSC) pre-training: GTE pre-training data
|
2687 |
- Supervised contrastive fine-tuning: GTE fine-tuning data
|
2688 |
|
2689 |
+
### Training Procedure
|
2690 |
|
2691 |
+
To enable the backbone model to support a context length of 8192, we adopted a multi-stage training strategy.
|
2692 |
+
The model first undergoes preliminary MLM pre-training on shorter lengths.
|
2693 |
+
We then resample the data, reducing the proportion of short texts, and continue the MLM pre-training.
|
2694 |
+
|
2695 |
+
The entire training process is as follows:
|
2696 |
- MLM-2048: lr 5e-4, mlm_probability 0.3, batch_size 4096, num_steps 70000, rope_base 10000
|
2697 |
- MLM-8192: lr 5e-5, mlm_probability 0.3, batch_size 1024, num_steps 20000, rope_base 500000
|
2698 |
- WSC: max_len 512, lr 2e-4, batch_size 32768, num_steps 100000
|
|
|
2706 |
|
2707 |
The results of other models are retrieved from [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
|
2708 |
|
2709 |
+
The gte evaluation setting: `mteb==1.2.0, fp16 auto mix precision, max_length=8192`, with the NTK scaling factor set to 2 (equivalent to rope_base * 2).
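As a rough illustration of why `rope_base` matters for long contexts, the sketch below computes the standard RoPE rotation frequencies, `rope_base ** (-2i / head_dim)`, for the bases listed in the training procedure and for the evaluation setting. It takes the statement that a scaling factor of 2 corresponds to `rope_base * 2` at face value. The head dimension of 64 and the helper names are assumptions made for the example; the model's own implementation may differ.

```python
import math


def rope_inv_freq(head_dim, rope_base):
    """Standard RoPE inverse frequencies, one per rotated pair of dimensions."""
    return [rope_base ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim, rope_base):
    """Number of positions the slowest-rotating pair needs for one full turn."""
    return 2 * math.pi / rope_inv_freq(head_dim, rope_base)[-1]


HEAD_DIM = 64  # assumed for illustration; not stated in the README

settings = {
    "MLM-2048 (rope_base 10000)": 10000,
    "MLM-8192 (rope_base 500000)": 500000,
    "evaluation (rope_base * 2)": 500000 * 2,  # scaling factor 2, as described above
}

for name, base in settings.items():
    print(f"{name}: longest wavelength ~ {longest_wavelength(HEAD_DIM, base):,.0f} positions")
```

A larger base stretches the slowest rotations over more positions, so distant tokens remain distinguishable at sequence lengths beyond those seen in the early training stage.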
|
2710 |
|
2711 |
| Model Name | Param Size (M) | Dimension | Sequence Length | Average (56) | Class. (12) | Clust. (11) | Pair Class. (3) | Reran. (4) | Retr. (15) | STS (10) | Summ. (1) |
|
2712 |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|
2713 |
+
| [**gte-large-en-v1.5**](https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) | 434 | 1024 | 8192 | **65.39** | 77.75 | 47.95 | 84.63 | 58.50 | 57.91 | 81.43 | 30.91 |
|
2714 |
| [mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) | 335 | 1024 | 512 | 64.68 | 75.64 | 46.71 | 87.2 | 60.11 | 54.39 | 85 | 32.71 |
|
2715 |
| [multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) | 560 | 1024 | 514 | 64.41 | 77.56 | 47.1 | 86.19 | 58.58 | 52.47 | 84.78 | 30.39 |
|
2716 |
| [bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5)| 335 | 1024 | 512 | 64.23 | 75.97 | 46.08 | 87.12 | 60.03 | 54.29 | 83.11 | 31.61 |
|