Keep stacking cool stuff and getting better results! After I swapped the standard vision encoder for SigLIP, NLLB-CLIP got a 10% average performance improvement. Now I've added Matryoshka layers (https://arxiv.org/abs/2205.13147) to enable smaller embeddings and got another 6% performance boost! Plus, thanks to Matryoshka Representation Learning (MRL), 4.5x smaller embeddings retain 90%+ of the quality.
The large model is finally SoTA for both image and text multilingual retrieval!
The models are available on the hub:
- visheratin/nllb-siglip-mrl-base
- visheratin/nllb-siglip-mrl-large
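For anyone curious how the smaller embeddings get used in practice, here is a minimal sketch of the general MRL idea: keep only the first few components of an embedding, re-normalize, and rank by cosine similarity. This is not the models' actual API. The helper names (`truncate_embeddings`, `retrieve`), the dimensions (768 and 192), and the random vectors standing in for real image and text embeddings are all assumptions made for illustration.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and L2-normalize along the last axis."""
    if not 0 < dim <= embeddings.shape[-1]:
        raise ValueError("dim must be between 1 and the full embedding size")
    truncated = embeddings[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    # Guard against division by zero for all-zero vectors.
    return truncated / np.clip(norms, 1e-12, None)


def retrieve(query_emb: np.ndarray, candidate_embs: np.ndarray, dim: int) -> np.ndarray:
    """Return candidate indices sorted by cosine similarity, best match first."""
    q = truncate_embeddings(query_emb, dim)
    c = truncate_embeddings(candidate_embs, dim)
    scores = c @ q  # cosine similarity, since both sides are unit-length
    return np.argsort(-scores)


# Hypothetical stand-ins for real embeddings (sizes are assumptions).
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(5, 768))
# A "text" embedding that is a slightly noisy copy of image 2.
text_emb = image_embs[2] + 0.1 * rng.normal(size=768)

ranking_full = retrieve(text_emb, image_embs, dim=768)
ranking_small = retrieve(text_emb, image_embs, dim=192)
print(ranking_full[0], ranking_small[0])
```

Re-normalizing after truncation matters: the prefix of a unit-length vector is generally not unit-length itself, so skipping that step would turn the dot product into something other than cosine similarity.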