metadata

language: sw
license: mit

gpt2-wechsel-swahili

A GPT-2 language model for Swahili, trained with WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

Code: https://github.com/CPJKU/wechsel

Paper: https://arxiv.org/abs/2112.06598
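This card does not include a usage snippet, so the following is a minimal sketch of how a GPT-2 checkpoint like this one is typically loaded with the Hugging Face `transformers` library. The Hub ID `benjamin/gpt2-wechsel-swahili`, the helper name `generate_text`, and the sampling settings are assumptions for illustration; substitute the ID under which the model is actually published.

```python
# Hub ID is an assumption: this card only names the model "gpt2-wechsel-swahili".
MODEL_ID = "benjamin/gpt2-wechsel-swahili"


def generate_text(prompt, max_new_tokens=30, model_id=MODEL_ID):
    """Continue a Swahili prompt with the causal language model (hypothetical helper)."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the prompt into PyTorch tensors and sample a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # illustrative sampling settings, not tuned
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("Habari ya leo ni")` downloads the weights on first use and returns the prompt followed by a sampled continuation, so the output differs from run to run.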

Performance

RoBERTa

Model	NLI Score	NER Score	Avg Score
`roberta-base-wechsel-french`	82.43	90.88	86.65
`camembert-base`	80.88	90.26	85.57

Model	NLI Score	NER Score	Avg Score
`roberta-base-wechsel-german`	81.79	89.72	85.76
`deepset/gbert-base`	78.64	89.46	84.05

Model	NLI Score	NER Score	Avg Score
`roberta-base-wechsel-chinese`	78.32	80.55	79.44
`bert-base-chinese`	76.55	82.05	79.30

Model	NLI Score	NER Score	Avg Score
`roberta-base-wechsel-swahili`	75.05	87.39	81.22
`xlm-roberta-base`	69.18	87.37	78.28
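The Avg Score column appears to be the arithmetic mean of the NLI and NER scores. The sketch below checks that reading against the eight rows above; the names `ROWS`, `mean_score`, and `max_avg_deviation` are illustrative, and the tolerance allows for the reported averages having been rounded from unrounded scores.

```python
# (model, NLI, NER, reported Avg) copied from the RoBERTa tables above.
ROWS = [
    ("roberta-base-wechsel-french", 82.43, 90.88, 86.65),
    ("camembert-base", 80.88, 90.26, 85.57),
    ("roberta-base-wechsel-german", 81.79, 89.72, 85.76),
    ("deepset/gbert-base", 78.64, 89.46, 84.05),
    ("roberta-base-wechsel-chinese", 78.32, 80.55, 79.44),
    ("bert-base-chinese", 76.55, 82.05, 79.30),
    ("roberta-base-wechsel-swahili", 75.05, 87.39, 81.22),
    ("xlm-roberta-base", 69.18, 87.37, 78.28),
]


def mean_score(nli, ner):
    """Arithmetic mean of the two task scores."""
    return (nli + ner) / 2


def max_avg_deviation(rows=ROWS):
    """Largest gap between the recomputed mean and the reported Avg Score."""
    return max(abs(mean_score(nli, ner) - avg) for _, nli, ner, avg in rows)


if __name__ == "__main__":
    # Every row should agree with the reported average to within rounding.
    print(max_avg_deviation() < 0.01)
```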

GPT2

Model	PPL
`gpt2-wechsel-french`	19.71
`gpt2` (retrained from scratch)	20.47

Model	PPL
`gpt2-wechsel-german`	26.8
`gpt2` (retrained from scratch)	27.63

Model	PPL
`gpt2-wechsel-chinese`	51.97
`gpt2` (retrained from scratch)	52.98

Model	PPL
`gpt2-wechsel-swahili`	10.14
`gpt2` (retrained from scratch)	10.58

See our paper for details.
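For readers unfamiliar with the PPL column, the sketch below assumes it denotes standard perplexity, that is, the exponential of the mean negative log-likelihood per token, where lower is better. The exact evaluation setup is described in the paper, not here; the function names and the toy inputs are illustrative only.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative natural-log likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def relative_reduction(baseline_ppl, model_ppl):
    """Fraction by which model_ppl improves on baseline_ppl."""
    return (baseline_ppl - model_ppl) / baseline_ppl


if __name__ == "__main__":
    # Toy case: a model that assigns probability 1/4 to each of four tokens.
    print(perplexity([math.log(0.25)] * 4))
    # Swahili row from the table above: 10.58 (from scratch) vs 10.14 (WECHSEL).
    print(relative_reduction(10.58, 10.14))
```

In the toy case the perplexity is about 4, matching the intuition of a uniform choice among four tokens. For the Swahili row, the WECHSEL model's perplexity is roughly 4.2% lower than that of the model retrained from scratch.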

Citation

Please cite WECHSEL as:

@misc{minixhofer2021wechsel,
      title={WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models}, 
      author={Benjamin Minixhofer and Fabian Paischer and Navid Rekabsaz},
      year={2021},
      eprint={2112.06598},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}