namdp-ptit
committed on
Commit
•
d0fa1c4
1
Parent(s):
ee6bccb
Update README.md
README.md
CHANGED
@@ -1,22 +1,22 @@
|
|
1 |
---
|
2 |
language:
|
3 |
-
- vi
|
4 |
license: apache-2.0
|
5 |
library_name: transformers
|
6 |
tags:
|
7 |
-
- transformers
|
8 |
-
- cross-encoder
|
9 |
-
- rerank
|
10 |
datasets:
|
11 |
-
- unicamp-dl/mmarco
|
12 |
pipeline_tag: text-classification
|
13 |
widget:
|
14 |
-
- text: tỉnh nào có diện tích lớn nhất việt nam
|
15 |
-
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
-
|
20 |
---
|
21 |
|
22 |
# Reranker
|
@@ -27,6 +27,8 @@ widget:
|
|
27 |
* [Fine tune](#fine-tune)
|
28 |
* [Data format](#data-format)
|
29 |
* [Performance](#performance)
|
30 |
* [Citation](#citation)
|
31 |
|
32 |
Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of
|
@@ -116,7 +118,8 @@ Train data should be a json file, where each line is a dict like this:
|
|
116 |
`query` is the query, `pos` is a list of positive texts, and `neg` is a list of negative texts. If you have no negative
|
117 |
texts for a query, you can randomly sample some from the entire corpus as negatives.
|
118 |
|
119 |
-
In addition, for each query in the training data, we used LLMs to generate hard negatives by asking them to create a
|
120 |
|
121 |
## Performance
|
122 |
|
@@ -131,6 +134,28 @@ the [MS MMarco Passage Reranking - Vi - Dev](https://huggingface.co/datasets/uni
|
|
131 |
| [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | 0.6087 | 0.5841 | 0.6513 | 0.6062 | 0.6872 | 0.62091 | 3.51
|
132 |
| [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma) | 0.6088 | 0.5908 | 0.6446 | 0.6108 | 0.6785 | 0.6249 | 1.29
|
133 |
|
134 |
## Citation
|
135 |
|
136 |
Please cite as
|
|
|
1 |
---
|
2 |
language:
|
3 |
+
- vi
|
4 |
license: apache-2.0
|
5 |
library_name: transformers
|
6 |
tags:
|
7 |
+
- transformers
|
8 |
+
- cross-encoder
|
9 |
+
- rerank
|
10 |
datasets:
|
11 |
+
- unicamp-dl/mmarco
|
12 |
pipeline_tag: text-classification
|
13 |
widget:
|
14 |
+
- text: tỉnh nào có diện tích lớn nhất việt nam
|
15 |
+
output:
|
16 |
+
- label: nghệ an có diện tích lớn nhất việt nam
|
17 |
+
score: 0.99999
|
18 |
+
- label: bắc ninh có diện tích nhỏ nhất việt nam
|
19 |
+
score: 0.0001
|
20 |
---
|
21 |
|
22 |
# Reranker
|
|
|
27 |
* [Fine tune](#fine-tune)
|
28 |
* [Data format](#data-format)
|
29 |
* [Performance](#performance)
|
30 |
+
* [Contact](#contact)
|
31 |
+
* [Support The Project](#support-the-project)
|
32 |
* [Citation](#citation)
|
33 |
|
34 |
Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of
|
|
|
118 |
`query` is the query, `pos` is a list of positive texts, and `neg` is a list of negative texts. If you have no negative
|
119 |
texts for a query, you can randomly sample some from the entire corpus as negatives.
|
120 |
|
121 |
+
In addition, for each query in the training data, we used LLMs to generate hard negatives by asking them to create a
|
122 |
+
document that says the opposite of one of the documents in `pos`.
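
As a rough illustration of the data format described above, the sketch below builds one training record and fills `neg` by random sampling from a corpus. It is not taken from the repository: the helper name `build_record`, its `num_neg` and `seed` parameters, and the sample texts are all assumptions made for the example. Only the `query` / `pos` / `neg` keys and the one-dict-per-line layout come from the README.

```python
import json
import random


def build_record(query, pos, corpus, num_neg=4, seed=0):
    """Return one training record whose negatives are sampled at random.

    Hypothetical helper: the name, parameters, and defaults are
    illustrative and are not part of the project.
    """
    rng = random.Random(seed)
    # Skip the positives so a known-relevant text is never used as a negative.
    candidates = [doc for doc in corpus if doc not in pos]
    # Never ask for more negatives than the corpus can supply.
    neg = rng.sample(candidates, k=min(num_neg, len(candidates)))
    return {"query": query, "pos": pos, "neg": neg}


if __name__ == "__main__":
    # Placeholder corpus; a real one would hold the passages being ranked.
    corpus = [
        "passage about province areas",
        "passage about river lengths",
        "passage about city populations",
        "passage about mountain heights",
    ]
    record = build_record(
        query="which province has the largest area",
        pos=["passage about province areas"],
        corpus=corpus,
        num_neg=2,
    )
    # One JSON object per line; ensure_ascii=False keeps Vietnamese text readable.
    print(json.dumps(record, ensure_ascii=False))
```

Random negatives are usually easy for the model to reject, which is why the README also describes LLM-generated hard negatives. Those would simply be appended to the same `neg` list.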
|
123 |
|
124 |
## Performance
|
125 |
|
|
|
134 |
| [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | 0.6087 | 0.5841 | 0.6513 | 0.6062 | 0.6872 | 0.62091 | 3.51
|
135 |
| [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma) | 0.6088 | 0.5908 | 0.6446 | 0.6108 | 0.6785 | 0.6249 | 1.29
|
136 |
|
137 |
+
## Contact
|
138 |
+
|
139 |
+
Email: [email protected]
|
140 |
+
|
141 |
+
LinkedIn: [Dang Phuong Nam](https://www.linkedin.com/in/dang-phuong-nam-157912288/)
|
142 |
+
|
143 |
+
Facebook: [Phương Nam](https://www.facebook.com/phuong.namdang.7146557)
|
144 |
+
|
145 |
+
## Support The Project
|
146 |
+
|
147 |
+
If you find this project helpful and wish to support its ongoing development, here are some ways you can contribute:
|
148 |
+
|
149 |
+
1. **Star the Repository**: Show your appreciation by starring the repository. Your support motivates further
|
150 |
+
development
|
151 |
+
and enhancements.
|
152 |
+
2. **Contribute**: We welcome your contributions! You can help by reporting bugs, submitting pull requests, or
|
153 |
+
suggesting new features.
|
154 |
+
3. **Donate**: If you’d like to support financially, consider making a donation. You can donate through:
|
155 |
+
- Vietcombank: 9912692172 - DANG PHUONG NAM
|
156 |
+
|
157 |
+
Thank you for your support!
|
158 |
+
|
159 |
## Citation
|
160 |
|
161 |
Please cite as
|