Committed by Dang Phuong Nam
Commit 938bb7b
Parent(s): d2e9205

Update README.md

README.md (changed)
```diff
@@ -47,7 +47,7 @@ Get relevance scores (higher scores indicate more relevance):
 ```python
 from FlagEmbedding import FlagReranker
 
-reranker = FlagReranker('namdp/
+reranker = FlagReranker('namdp/ViRanker',
 use_fp16=True) # Setting use_fp16 to True speeds up computation with a slight performance degradation
 
 score = reranker.compute_score(['tỉnh nào có diện tích lớn nhất việt nam', 'nghệ an có diện tích lớn nhất việt nam'])
@@ -89,8 +89,8 @@ Get relevance scores (higher scores indicate more relevance):
 import torch
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 
-tokenizer = AutoTokenizer.from_pretrained('namdp/
-model = AutoModelForSequenceClassification.from_pretrained('namdp/
+tokenizer = AutoTokenizer.from_pretrained('namdp/ViRanker')
+model = AutoModelForSequenceClassification.from_pretrained('namdp/ViRanker')
 model.eval()
 
 pairs = [
@@ -115,4 +115,32 @@ Train data should be a json file, where each line is a dict like this:
 
 `query` is the query, and `pos` is a list of positive texts, `neg` is a list of negative texts, `prompt` indicates the
 relationship between query and texts. If you have no negative texts for a query, you can random sample some from the
-entire corpus as the negatives.
+entire corpus as the negatives.
+
+## Performance
+
+In the following table, we provide various pre-trained Cross-Encoders together with their performance on
+the [MS MMarco Passage Reranking - Vi - Dev](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset.
+
+| Model-Name | NDCG@3 | MRR@3 | NDCG@5 | MRR@5 | NDCG@10 | MRR@10 | Docs / Sec |
+|-----------------------------------------------------------------------------------------------------------------------------------------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|:-----------|
+| [namdp/ViRanker](https://huggingface.co/namdp/ViRanker) | **0.6685** | **0.6564** | 0.6842 | **0.6811** | 0.7278 | **0.6985** | 2.02
+| [itdainb/PhoRankere](https://huggingface.co/itdainb/PhoRanker) | 0.6625 | 0.6458 | **0.7147** | 0.6731 | **0.7422** | 0.6830 | **15**
+| [kien-vu-uet/finetuned-phobert-passage-rerank-best-eval](https://huggingface.co/kien-vu-uet/finetuned-phobert-passage-rerank-best-eval) | 0.0963 | 0.0883 | 0.1396 | 0.1131 | 0.1681 | 0.1246 | **15**
+| [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | 0.6087 | 0.5841 | 0.6513 | 0.6062 | 0.6872 | 0.62091 | 3.51
+| [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma) | 0.6088 | 0.5908 | 0.6446 | 0.6108 | 0.6785 | 0.6249 | 1.29
+
+## Citation
+
+Please cite as
+
+```Plaintext
+@misc{ViRanker,
+title={ViRanker: A Cross-encoder Model for Vietnamese Text Ranking},
+author={Nam Dang Phuong},
+year={2024},
+publisher={Huggingface},
+journal={huggingface repository},
+howpublished={\url{https://huggingface.co/namdp/ViRanker}},
+}
+```
```
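The training-data paragraph in the third hunk (line 115 onward) says that when a query has no negative texts, negatives can be randomly sampled from the entire corpus. Below is a minimal Python sketch of that step. It assumes each line of the training file is a dict with the `query`, `pos`, `neg` and `prompt` fields the README names (strings for `query` and `prompt`, lists of strings for `pos` and `neg`). The helpers `add_random_negatives` and `write_jsonl`, the `num_negatives` and `seed` parameters, and the sample corpus and prompt text are hypothetical illustrations, not part of FlagEmbedding or this repository.

```python
import json
import random
from typing import Dict, List


def add_random_negatives(
    record: Dict, corpus: List[str], num_negatives: int = 3, seed: int = 0
) -> Dict:
    """Hypothetical helper: fill an empty `neg` list by sampling the corpus.

    Passages already listed in `pos` are excluded, so a positive is never
    reused as a negative. Records that already have negatives are returned
    unchanged (as a shallow copy); the input dict is never mutated.
    """
    filled = dict(record)
    if filled.get("neg"):
        return filled
    positives = set(filled.get("pos", []))
    candidates = [text for text in corpus if text not in positives]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    filled["neg"] = rng.sample(candidates, min(num_negatives, len(candidates)))
    return filled


def write_jsonl(records: List[Dict], path: str) -> None:
    """Write one JSON dict per line, the layout the README asks for."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps Vietnamese text readable in the file
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


# Hypothetical corpus and record (placeholders, not real training data).
corpus = [
    "nghệ an có diện tích lớn nhất việt nam",
    "passage 1",
    "passage 2",
    "passage 3",
    "passage 4",
]
record = {
    "query": "tỉnh nào có diện tích lớn nhất việt nam",
    "pos": ["nghệ an có diện tích lớn nhất việt nam"],
    "neg": [],
    "prompt": "hypothetical prompt text",
}
train_record = add_random_negatives(record, corpus, num_negatives=3)
```

Excluding the positives before sampling is a design choice of this sketch; the README only says to sample "from the entire corpus", so a real pipeline may also want to filter near-duplicates of the positive passage.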
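The Performance table added in this commit reports NDCG@k and MRR@k at k = 3, 5 and 10. The README does not say which evaluation tool or candidate lists produced those numbers, so the following is only a sketch of the standard definitions, assuming binary relevance labels (1 = relevant, 0 = not relevant) for each query's reranked list, a log2 rank discount, and that every relevant passage appears somewhere in that list. The function names and the example labels are hypothetical, and this code is not expected to reproduce the table.

```python
import math
from typing import Callable, List, Sequence


def mrr_at_k(ranked_rels: Sequence[int], k: int) -> float:
    """Reciprocal rank of the first relevant item in the top k, else 0."""
    for rank, rel in enumerate(ranked_rels[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(ranked_rels: Sequence[int], k: int) -> float:
    """Discounted cumulative gain: sum of rel / log2(rank + 1) over the top k."""
    return sum(
        rel / math.log2(rank + 1) for rank, rel in enumerate(ranked_rels[:k], start=1)
    )


def ndcg_at_k(ranked_rels: Sequence[int], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (relevance-sorted) ordering.

    Assumes all relevant items for the query appear somewhere in ranked_rels.
    """
    ideal = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0


def mean_over_queries(
    queries: List[Sequence[int]], metric: Callable[[Sequence[int], int], float], k: int
) -> float:
    """Average a per-query metric over all queries, as one table cell would."""
    return sum(metric(rels, k) for rels in queries) / len(queries)


# Hypothetical relevance labels for two queries, in reranked order.
queries = [
    [0, 1, 0, 0, 0],  # relevant passage ranked 2nd
    [0, 0, 0, 0, 1],  # relevant passage ranked 5th
]
```

For the first query, MRR@3 = 1/2 and NDCG@3 = 1/log2(3) ≈ 0.631. The second query scores 0 on both metrics at k = 3 because its relevant passage falls outside the top 3, so the mean MRR@3 over the two queries is 0.25; at k = 5 it contributes 1/5 and the mean MRR@5 rises to 0.35.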