---
language:
- ar
pipeline_tag: token-classification
tags:
- NER
- Darija
widget:
- text: "دونالد طرامب هو الرئيس لفايت د ميريكان"
- text: "لمقار ديال OPEC كاين ف فيينا العاصمة ديال لوتريش"
- text: "عوينة يغومان جماعة ترابية قروية كاينة ف إقليم آسا الزاݣ"
---

# darija-ner

This is the first model for Named Entity Recognition (NER) in the Moroccan dialect (Darija). It was trained on DarNERcorp, the first NER dataset in Darija, which is available on Mendeley Data: https://data.mendeley.com/datasets/286sss4k9v/4.

### Model Description

- **Developed by:** Hanane Nour Moussa
- **Model type:** Token classification
- **Language(s) (NLP):** Arabic, Darija

### Model Sources

- **Repository:** https://github.com/HananeNourMoussa/darija-ner
- **Paper (dataset):** Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, Data in Brief, Volume 48, 2023

### Metrics

F1 score.

### Results

- **DarNERcorp_test:** F1 = 66.06%
- **MixedNERcorp_test:** F1 = 70.06%

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA T4
- **Hours used:** 0.7
- **Cloud Provider:** Google Cloud
- **Compute Region:** europe-west1
- **Carbon Emitted:** 0.01 kg

## Citation

If you use the DarNERcorp dataset to train your models, please cite the following paper:

Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, Data in Brief, Volume 48, 2023, 109234, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.109234. (https://www.sciencedirect.com/science/article/pii/S2352340923003530)

## GitHub Repo

Our data curation and model training code is openly available on GitHub: https://github.com/HananeNourMoussa/darija-ner
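The F1 scores reported under Results do not specify how F1 is computed. The following is a minimal sketch that assumes entity-level, micro-averaged F1 over BIO-tagged sequences, the convention used by libraries such as seqeval. The tag names (`B-PER`, `I-LOC`, and so on) and the helpers `extract_spans` and `entity_f1` are hypothetical illustrations and do not come from the darija-ner repository or DarNERcorp.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical span representation: (entity type, start index, end index exclusive).
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tag format "B-X"/"I-X"/"O").

    A span starts at a B-X tag, or at an I-X tag that does not continue a span
    of the same type, and extends over the following I-X tags.
    """
    spans: Set[Span] = set()
    current_type: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current_type
        if current_type is not None and not continues:
            spans.add((current_type, start, i))
            current_type = None
        if prefix == "B" or (prefix == "I" and not continues):
            current_type = etype
            start = i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1: a prediction counts only on exact span and type match."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)  # spans found in both
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "O"]]  # the LOC entity is missed
    print(f"{entity_f1(gold, pred):.4f}")  # precision 1.0, recall 0.5 -> 0.6667
```

Under this convention a partially matched entity (for example, the right type with a wrong boundary) counts as both a false positive and a false negative, so entity-level F1 is usually stricter than token-level F1.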