I've just stumbled upon some excellent work on (🇫🇷 French) retrieval models by @antoinelouis. Kudos to him!
- French Embedding Models: https://huggingface.co/collections/antoinelouis/dense-single-vector-bi-encoders-651523c0c75a3d4c44fc864d
- French Reranker Models: https://huggingface.co/collections/antoinelouis/cross-encoder-rerankers-651523f16efa656d1788a239
- French Multi-vector Models: https://huggingface.co/collections/antoinelouis/dense-multi-vector-bi-encoders-6589a8ee6b17c06872e9f075
- Multilingual Models: https://huggingface.co/collections/antoinelouis/modular-retrievers-65d53d0db64b1d644aea620c
A lot of these models use the MS MARCO Hard Negatives dataset, which I'm currently reformatting to be more easily usable. Notably, the reformatted datasets should work out of the box, without any pre-processing, for training embedding models in the upcoming Sentence Transformers v3.
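To give an idea of what "out of the box" could look like, here is a minimal sketch of plugging a triplet-style hard-negatives dataset into the Sentence Transformers v3 training API. The dataset identifier and column layout below are assumptions for illustration, not the final reformatted release.

```python
# Minimal sketch: training an embedding model in Sentence Transformers v3
# on a (query, positive, negative) hard-negatives dataset loaded directly
# from the Hub, with no extra pre-processing.
# NOTE: "username/msmarco-hard-negatives-triplets" is a hypothetical dataset
# id used only for illustration; swap in the actual reformatted dataset.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Base transformer to fine-tune into an embedding model
model = SentenceTransformer("distilbert-base-uncased")

# Dataset columns are assumed to map directly onto the loss inputs,
# e.g. "query", "positive", "negative" triplets.
train_dataset = load_dataset("username/msmarco-hard-negatives-triplets", split="train")

# In-batch negatives loss, which also benefits from the mined hard negatives
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```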