
akhooli/arabic-colbertv2-250k-norm

This Arabic ColBERT model was trained (reasonably well, though not fully) on 250k normalized queries sampled from the Arabic mMARCO dataset. Training parameters are in the metadata file. For background on why Arabic BERT tokenizers may need normalization, see https://www.linkedin.com/posts/akhooli_arabic-bert-tokenizers-you-may-need-to-normalize-activity-7225747473523216384-D1oH
Note that there is another model, partially trained on the normalized 711k dataset: akhooli/arabic-colbertv2-711k-norm.

This model should work well for ranking and retrieval, but it is not intended for critical tasks. One demo that uses it is the Quran Semantic Search. If you downloaded the model before Aug. 6, 2024, you are advised to refresh your copy.

You need to normalize your queries and documents for better results:

from unicodedata import normalize

# apply the same NFKC normalization to queries and documents before encoding
query_n = normalize('NFKC', query)
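
For a fuller picture, here is a minimal retrieval sketch. It assumes the RAGatouille library as the loader (not prescribed by this card; any ColBERT-compatible tool works), and the documents, query, and index name are placeholders to adapt to your own data.

from unicodedata import normalize
from ragatouille import RAGPretrainedModel  # assumed loader, not required by the model

# load the model from the Hugging Face Hub
RAG = RAGPretrainedModel.from_pretrained("akhooli/arabic-colbertv2-250k-norm")

# placeholder documents; normalize them the same way as queries
docs = [normalize("NFKC", d) for d in ["النص الأول للمثال", "النص الثاني للمثال"]]

# build an index, then search with a normalized query
RAG.index(collection=docs, index_name="arabic_colbert_demo")
results = RAG.search(normalize("NFKC", "مثال استعلام"), k=2)
for r in results:
    print(r["score"], r["content"])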
Model size: 135M parameters (Safetensors, F32).