As Good As New. How To Successfully Recycle English GPT-2 To Make Models For Other Languages

Wietse de Vries, Malvina Nissim. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 – 23 citations

[Paper]    
Training Techniques · Transformer · GPT · RAG · Model Architecture

Large generative language models have been very successful for English, but other languages lag behind, in part due to data and computational limitations. We propose a method that may overcome these problems by adapting existing pre-trained models to new languages. Specifically, we describe the adaptation of English GPT-2 to Italian and Dutch by retraining lexical embeddings without tuning the Transformer layers. As a result, we obtain lexical embeddings for Italian and Dutch that are aligned with the original English lexical embeddings. Additionally, we scale up complexity by transforming relearned lexical embeddings of GPT-2 small to the GPT-2 medium embedding space. This method minimises the amount of training and prevents losing information during adaptation that was learned by GPT-2. English GPT-2 models with relearned lexical embeddings can generate realistic sentences in Italian and Dutch. Though on average these sentences are still identifiable as artificial by humans, they are assessed on par with sentences generated by a GPT-2 model fully trained from scratch.
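The core of the adaptation recipe lends itself to a short illustration. The sketch below is not the authors' released code; it uses the Hugging Face Transformers API to show how the Transformer layers of English GPT-2 can be frozen so that only the lexical embeddings are retrained on target-language text. The tokenizer path, learning rate, and the commented least-squares mapping for the small-to-medium scale-up are illustrative assumptions.

```python
# A minimal sketch, assuming the Hugging Face Transformers API: retrain only the
# lexical (token and position) embeddings of English GPT-2 for a new language
# while keeping the Transformer layers frozen. Paths and hyperparameters are
# illustrative, not the authors' settings.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical: a BPE tokenizer retrained on the target language (e.g. Dutch)
# with the same vocabulary size as the original English tokenizer.
tokenizer = GPT2TokenizerFast.from_pretrained("path/to/target-language-tokenizer")

# Freeze every parameter ...
for param in model.parameters():
    param.requires_grad = False

# ... then unfreeze only the token and position embeddings. Because GPT-2 ties
# the output head to the input embeddings, the LM head is updated along with wte.
model.transformer.wte.weight.requires_grad = True
model.transformer.wpe.weight.requires_grad = True

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4  # assumed LR
)

# A standard causal-LM training loop over target-language text follows; only the
# embedding matrices receive gradient updates, so whatever the frozen Transformer
# layers learned during English pre-training is preserved.

# Scaling up (GPT-2 small to medium) can be sketched as a least-squares linear map
# fitted on the English embeddings and applied to the relearned target-language
# ones; this is an assumption about the exact procedure, shown only as comments:
#   W = torch.linalg.lstsq(E_small_en, E_medium_en).solution   # (768, 1024)
#   E_medium_target = E_small_target @ W
```

Freezing everything but the embeddings keeps the number of trainable parameters to roughly the vocabulary size times the hidden dimension, which is what makes this kind of adaptation far cheaper than pre-training a target-language model from scratch.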

Similar Work