On The Multilingual Capabilities Of Very Large-scale English Language Models

Jordi Armengol-Estapé, Ona de Gibert Bonet, Maite Melero. arXiv 2021

Few Shot, GPT, Language Modeling, Model Architecture, Pretraining Methods, Reinforcement Learning, Training Techniques, Transformer

Generative Pre-trained Transformers (GPTs) have recently been scaled to sizes unprecedented in the history of machine learning. These models, trained solely on the language modeling objective, have been shown to exhibit outstanding few-shot learning capabilities across a number of different tasks. Nevertheless, aside from anecdotal evidence, little is known about their multilingual capabilities, given that the pre-training corpus is composed almost entirely of English text. In this work, we investigate the multilingual skills of GPT-3, focusing on Catalan, a language that barely appears in the pre-training corpus, which makes the results especially meaningful; we assume that our results may be relevant for other languages as well. We find that the model shows outstanding performance, particularly in generative tasks, with predictable limitations mostly in language understanding tasks, where the results are nonetheless remarkable given the zero-shot scenario. We investigate its potential and limits in extractive question answering and natural language generation, as well as the effect of scale in terms of model size.
