Neural Machine Translation Data Generation And Augmentation Using ChatGPT

Yang Wayne, Nicolai Garrett. arXiv 2023

Applications GPT Model Architecture RAG

Neural models have revolutionized the field of machine translation, but creating parallel corpora is expensive and time-consuming. We investigate an alternative to manual parallel corpora: hallucinated parallel corpora created by generative language models. Although these models are themselves trained on parallel data, they can leverage a multilingual vector space to create data, and may be able to supplement small, manually procured corpora. Our experiments highlight two key findings: despite a lack of diversity in their output, the hallucinated data improves the translation signal, and it does so even when the domain clashes with the original dataset.
