This is an improved version of the AfriCOMET-STL (single-task) evaluation model. It receives a triplet (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation relative to both the source and the reference. Unlike the original AfriCOMET-STL, this MT evaluation model is built on an improved African-enhanced encoder, afro-xlmr-large-76L, which leads to better performance on African-related machine translation evaluation, as verified in the WMT 2024 Metrics Shared Task.
Paper
AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages (Wang et al., arXiv 2023)
License
Apache-2.0
Usage (AfriCOMET)
Using this model requires unbabel-comet to be installed:
pip install --upgrade pip # ensures that pip is current
pip install unbabel-comet
Then you can use it through the comet CLI:
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model masakhane/africomet-stl-1.1
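The CLI reads three parallel text files with one segment per line. If your data is already held as a list of triplets, the files can be produced with a few lines of Python. The following is a minimal sketch, not part of unbabel-comet: the helper name `write_parallel_files` and the output file names are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path


def write_parallel_files(triplets, out_dir):
    """Write the src/mt/ref fields to three line-aligned UTF-8 text files.

    Hypothetical helper for illustration; it is not part of unbabel-comet.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for key in ("src", "mt", "ref"):
        path = out_dir / f"{key}.txt"
        # Collapse whitespace runs (including newlines) into single spaces so
        # that every segment stays on exactly one line.
        lines = [" ".join(t[key].split()) for t in triplets]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths[key] = path
    return paths


triplets = [
    {
        "src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.",
        "mt": "He recently lost against Raonic in the Brisbane Open.",
        "ref": "He recently lost to Raoniki in the game Sisi Brisbeni.",
    },
]

paths = write_parallel_files(triplets, tempfile.mkdtemp())
# These paths can then be passed to the -s, -t and -r options respectively.
print(paths["src"], paths["mt"], paths["ref"])
```

Keeping the three files line-aligned matters because segment i of each file is scored together as one triplet.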
Or using Python:
from comet import download_model, load_from_checkpoint

# Download the checkpoint and load it.
model_path = download_model("masakhane/africomet-stl-1.1")
model = load_from_checkpoint(model_path)

# Each entry is one (source, translation, reference) triplet.
data = [
    {
        "src": "Nadal sàkọọ́lẹ̀ ìforígbárí o ní àmì méje sóódo pẹ̀lú ilẹ̀ Canada.",
        "mt": "Nadal's head to head record against the Canadian is 7–2.",
        "ref": "Nadal scored seven unanswered points against Canada."
    },
    {
        "src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.",
        "mt": "He recently lost against Raonic in the Brisbane Open.",
        "ref": "He recently lost to Raoniki in the game Sisi Brisbeni."
    }
]

# gpus=1 runs on a single GPU; set gpus=0 to run on CPU.
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
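Because every input entry must carry all three fields, it can help to check the data before starting a long scoring run. The sketch below is one possible pre-flight check. The helper `find_invalid_triplets` is hypothetical and not part of unbabel-comet; it only assumes the `src`, `mt` and `ref` keys used in the example above.

```python
REQUIRED_KEYS = ("src", "mt", "ref")


def find_invalid_triplets(data):
    """Return (index, reason) pairs for entries that are not usable triplets.

    Hypothetical helper for illustration; it is not part of unbabel-comet.
    """
    problems = []
    for i, item in enumerate(data):
        for key in REQUIRED_KEYS:
            value = item.get(key)
            # A field is unusable if it is absent, not a string, or blank.
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
    return problems


samples = [
    {
        "src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.",
        "mt": "He recently lost against Raonic in the Brisbane Open.",
        "ref": "He recently lost to Raoniki in the game Sisi Brisbeni.",
    },
    # Deliberately broken entry: empty translation and no reference.
    {"src": "Laipe yi o padanu si Raoniki ni ere Sisi Brisbeni.", "mt": ""},
]

for index, reason in find_invalid_triplets(samples):
    print(f"entry {index}: {reason}")
```

An empty result means every entry has the three non-empty string fields and can be handed to `model.predict`.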
Intended uses
Our model is intended to be used for MT evaluation.
Given a triplet (source sentence, translation, reference translation), it outputs a single score between 0 and 1, where 1 represents a perfect translation.
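When scoring many segments, the per-segment scores are usually summarized. In recent unbabel-comet releases the prediction object exposes a `scores` list and a `system_score` that, to our understanding, is the average of the segment scores. The sketch below shows that aggregation on a plain list of floats, plus a ranking of the lowest-scoring segments for manual inspection. The helper `summarize_scores` is hypothetical, and the numbers are placeholders, not real model output.

```python
from statistics import mean


def summarize_scores(scores, worst_n=3):
    """Summarize per-segment scores (hypothetical helper, not a COMET API)."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Segment indices ordered from lowest to highest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return {
        "system_score": mean(scores),
        "min": min(scores),
        "max": max(scores),
        "worst_segments": ranked[:worst_n],
    }


# Placeholder values for illustration only; with the real model you would
# pass the list of per-segment scores returned by predict instead.
example_scores = [0.82, 0.41, 0.67]
summary = summarize_scores(example_scores, worst_n=1)
print(summary)
```

Looking at the lowest-scoring segments first is a cheap way to spot systematic translation errors that a single averaged score would hide.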
Languages Covered:
There are 76 languages available:
- English (eng)
- Amharic (amh)
- Arabic (ara)
- Somali (som)
- Kiswahili (swa)
- Portuguese (por)
- Afrikaans (afr)
- French (fra)
- isiZulu (zul)
- Malagasy (mlg)
- Hausa (hau)
- chiShona (sna)
- Egyptian Arabic (arz)
- Chichewa (nya)
- Igbo (ibo)
- isiXhosa (xho)
- Yorùbá (yor)
- Sesotho (sot)
- Kinyarwanda (kin)
- Tigrinya (tir)
- Tsonga (tso)
- Oromo (orm)
- Rundi (run)
- Northern Sotho (nso)
- Ewe (ewe)
- Lingala (lin)
- Twi (twi)
- Nigerian Pidgin (pcm)
- Ga (gaa)
- Lozi (loz)
- Luganda (lug)
- Gun (guw)
- Bemba (bem)
- Efik (efi)
- Luvale (lue)
- Luba-Lulua (lua)
- Tonga (toi)
- Tshivenḓa (ven)
- Tumbuka (tum)
- Tetela (tll)
- Isoko (iso)
- Kaonde (kqn)
- Zande (zne)
- Umbundu (umb)
- Mossi (mos)
- Tiv (tiv)
- Luba-Katanga (lub)
- Fula (fuv)
- San Salvador Kongo (kwy)
- Baoulé (bci)
- Ruund (rnd)
- Luo (luo)
- Wolaitta (wal)
- Swazi (ssw)
- Lunda (lun)
- Wolof (wol)
- Nyaneka (nyk)
- Kwanyama (kua)
- Kikuyu (kik)
- Fon (fon)
- Bambara (bam)
- Chokwe (cjk)
- Dinka (dik)
- Dyula (dyu)
- Kabyle (kab)
- Kamba (kam)
- Kabiyè (kbp)
- Kanuri (knc)
- Kimbundu (kmb)
- Kikongo (kon)
- Nuer (nus)
- Sango (sag)
- Tamasheq (taq)
- Tamazight (tzm)
- N'ko (nqo)