language: sw
license: mit

gpt2-wechsel-swahili

A Swahili GPT-2 model, transferred from English GPT-2 with WECHSEL, the method introduced in "WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models".

Code: https://github.com/CPJKU/wechsel

Paper: https://arxiv.org/abs/2112.06598
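As we read the paper, the core of WECHSEL is how the new target-language subword embeddings are initialized. Each target subword gets a weighted average of the source model's embeddings of the source subwords most similar to it, where similarity is cosine similarity in multilingual static word embeddings aligned between the two languages. The sketch below shows only that weighting step. It assumes the aligned static subword vectors are already available, and the helper `wechsel_init` and its defaults (`k=10`, `temperature=0.1`) are hypothetical, not the API of the linked repository.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def wechsel_init(
    target_static: list[list[float]],
    source_static: list[list[float]],
    source_model_emb: list[list[float]],
    k: int = 10,
    temperature: float = 0.1,
) -> list[list[float]]:
    """Hypothetical helper: one initial model embedding per target subword.

    target_static / source_static: aligned static vectors for target / source subwords.
    source_model_emb: the source model's input embeddings, same order as source_static.
    """
    dim = len(source_model_emb[0])
    result = []
    for t_vec in target_static:
        # Keep the k source subwords closest to this target subword in the aligned space.
        neighbors = sorted(
            ((cosine(t_vec, s_vec), j) for j, s_vec in enumerate(source_static)),
            reverse=True,
        )[:k]
        # Softmax over similarity / temperature, shifted by the max for numerical stability.
        best = max(sim for sim, _ in neighbors)
        weights = [math.exp((sim - best) / temperature) for sim, _ in neighbors]
        total = sum(weights)
        # Weighted average of the neighbors' source-model embeddings.
        vec = [0.0] * dim
        for w, (_, j) in zip(weights, neighbors):
            for d in range(dim):
                vec[d] += (w / total) * source_model_emb[j][d]
        result.append(vec)
    return result
```

With `k=1` a target subword simply copies the embedding of its nearest source subword. Larger `k` or a higher `temperature` blends more neighbors into the initialization.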

Performance

RoBERTa

Model                         NLI Score  NER Score  Avg Score
roberta-base-wechsel-french       82.43      90.88      86.65
camembert-base                    80.88      90.26      85.57

Model                         NLI Score  NER Score  Avg Score
roberta-base-wechsel-german       81.79      89.72      85.76
deepset/gbert-base                78.64      89.46      84.05

Model                         NLI Score  NER Score  Avg Score
roberta-base-wechsel-chinese      78.32      80.55      79.44
bert-base-chinese                 76.55      82.05      79.30

Model                         NLI Score  NER Score  Avg Score
roberta-base-wechsel-swahili      75.05      87.39      81.22
xlm-roberta-base                  69.18      87.37      78.28
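In every row of the RoBERTa tables, Avg Score agrees with the unweighted mean of the NLI and NER scores to within rounding at two decimals. The sketch below assumes that definition, checks it against the reported values, and computes each WECHSEL model's Avg Score gain over the baseline it is paired with. The scores are copied from the tables, and the names `ROBERTA_SCORES`, `mean_score` and `avg_gain` are illustrative.

```python
# (NLI, NER, reported Avg) copied from the RoBERTa tables above.
ROBERTA_SCORES = {
    "roberta-base-wechsel-french": (82.43, 90.88, 86.65),
    "camembert-base": (80.88, 90.26, 85.57),
    "roberta-base-wechsel-german": (81.79, 89.72, 85.76),
    "deepset/gbert-base": (78.64, 89.46, 84.05),
    "roberta-base-wechsel-chinese": (78.32, 80.55, 79.44),
    "bert-base-chinese": (76.55, 82.05, 79.30),
    "roberta-base-wechsel-swahili": (75.05, 87.39, 81.22),
    "xlm-roberta-base": (69.18, 87.37, 78.28),
}

# WECHSEL model -> baseline it is compared against in the same table.
COMPARISONS = {
    "roberta-base-wechsel-french": "camembert-base",
    "roberta-base-wechsel-german": "deepset/gbert-base",
    "roberta-base-wechsel-chinese": "bert-base-chinese",
    "roberta-base-wechsel-swahili": "xlm-roberta-base",
}


def mean_score(nli: float, ner: float) -> float:
    """Unweighted mean of NLI and NER (assumed definition of Avg Score)."""
    return (nli + ner) / 2


def avg_gain(wechsel: str, baseline: str) -> float:
    """Reported Avg Score of the WECHSEL model minus that of its baseline."""
    return ROBERTA_SCORES[wechsel][2] - ROBERTA_SCORES[baseline][2]


if __name__ == "__main__":
    for name, (nli, ner, reported) in ROBERTA_SCORES.items():
        print(f"{name}: mean={mean_score(nli, ner):.3f} reported={reported}")
    for wechsel, baseline in COMPARISONS.items():
        print(f"{wechsel} vs {baseline}: {avg_gain(wechsel, baseline):+.2f}")
```

Run as a script, this reports Avg Score gains of +1.08 (French), +1.71 (German), +0.14 (Chinese) and +2.94 (Swahili) points.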

GPT2

Model                            PPL
gpt2-wechsel-french            19.71
gpt2 (retrained from scratch)  20.47

Model                            PPL
gpt2-wechsel-german             26.8
gpt2 (retrained from scratch)  27.63

Model                            PPL
gpt2-wechsel-chinese           51.97
gpt2 (retrained from scratch)  52.98

Model                            PPL
gpt2-wechsel-swahili           10.14
gpt2 (retrained from scratch)  10.58
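Perplexity (PPL) is the exponential of the average per-token negative log-likelihood on held-out text, so lower is better. The card does not describe the evaluation setup. The sketch below therefore assumes the standard token-level definition with natural-log probabilities, and it also turns the table values into a relative reduction. The helper names are illustrative.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the mean negative log-likelihood per token (natural-log probabilities)."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


# (WECHSEL PPL, from-scratch PPL) copied from the GPT2 tables above.
GPT2_PPL = {
    "french": (19.71, 20.47),
    "german": (26.8, 27.63),
    "chinese": (51.97, 52.98),
    "swahili": (10.14, 10.58),
}


def relative_reduction(wechsel_ppl: float, scratch_ppl: float) -> float:
    """Fraction by which the WECHSEL model lowers perplexity versus the from-scratch model."""
    return (scratch_ppl - wechsel_ppl) / scratch_ppl


if __name__ == "__main__":
    for lang, (wechsel_ppl, scratch_ppl) in GPT2_PPL.items():
        pct = 100 * relative_reduction(wechsel_ppl, scratch_ppl)
        print(f"{lang}: {pct:.1f}% lower perplexity with WECHSEL")
```

On these numbers the reduction is 3.7% for French, 3.0% for German, 1.9% for Chinese and 4.2% for Swahili. A model that assigns probability 0.5 to every token has perplexity 2.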

See our paper for details.

Citation

Please cite WECHSEL as:

@misc{minixhofer2021wechsel,
      title={WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models}, 
      author={Benjamin Minixhofer and Fabian Paischer and Navid Rekabsaz},
      year={2021},
      eprint={2112.06598},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}