arxiv:2012.05628

As Good as New. How to Successfully Recycle English GPT-2 to Make Models for Other Languages

Published on Dec 10, 2020

Upvote

Authors:

Wietse de Vries ,

Malvina Nissim

Abstract

Large generative language models have been very successful for English, but other languages lag behind, in part due to data and computational limitations. We propose a method that may overcome these problems by adapting existing pre-trained models to new languages. Specifically, we describe the adaptation of English GPT-2 to Italian and Dutch by retraining lexical embeddings without tuning the Transformer layers. As a result, we obtain lexical embeddings for Italian and Dutch that are aligned with the original English lexical embeddings. Additionally, we scale up complexity by transforming relearned lexical embeddings of GPT-2 small to the GPT-2 medium embedding space. This method minimises the amount of training and prevents losing information during adaptation that was learned by GPT-2. English GPT-2 models with relearned lexical embeddings can generate realistic sentences in Italian and Dutch. Though on average these sentences are still identifiable as artificial by humans, they are assessed on par with sentences generated by a GPT-2 model fully trained from scratch.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 18

Browse 18 models citing this paper

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2012.05628 in a dataset README.md to link it from this page.

Spaces citing this paper 3

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.