"Vorbești Românește?" A Recipe To Train Powerful Romanian LLMs With English Instructions · The Large Language Model Bible

"Vorbești Românește?" A Recipe To Train Powerful Romanian LLMs With English Instructions

Masala Mihai, Ilie-Ablachim Denis C., Dima Alexandru, Corlatescu Dragos, Zavelca Miruna, Olaru Ovio, Terian Simina, Terian Andrei, Leordeanu Marius, Velicu Horia, Popescu Marius, Dascalu Mihai, Rebedea Traian. arXiv 2024

[Paper]    
RAG Reinforcement Learning Training Techniques

In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English; hence, their performance in English greatly exceeds that in other languages. To our knowledge, we are the first to collect and translate a large collection of texts, instructions, and benchmarks and to train, evaluate, and release open-source LLMs tailored for Romanian. We evaluate our methods on four different categories, including academic benchmarks, MT-Bench (manually translated), and a professionally built historical, cultural, and social benchmark adapted to Romanian. We argue for the usefulness and high performance of RoLLMs by obtaining state-of-the-art results across the board. We publicly release all resources (i.e., data, training and evaluation code, models) to support and encourage research on Romanian LLMs, while concurrently creating a generalizable recipe adequate for other low- or less-resourced languages.
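The recipe the abstract describes (collect English instruction data, translate it into the target language, then fine-tune) can be sketched at the data-preparation step. This is a minimal illustration, not the authors' code: the `translate` stub below is a hypothetical stand-in for a real machine-translation model or API, and all names are assumptions.

```python
# Hedged sketch of the translate-then-fine-tune data-preparation step.
# translate() is a placeholder for a real MT system (an assumption, not
# part of the paper's released code).

def translate(text: str, target_lang: str = "ro") -> str:
    """Hypothetical stand-in for a machine-translation call."""
    return f"[{target_lang}] {text}"  # placeholder output for illustration

def build_target_lang_instructions(english_pairs):
    """Translate (instruction, response) pairs into the target language,
    keeping each pairing intact for supervised fine-tuning."""
    return [
        {"instruction": translate(inst), "response": translate(resp)}
        for inst, resp in english_pairs
    ]

pairs = [("Summarize this article.", "The article argues that ...")]
dataset = build_target_lang_instructions(pairs)
print(dataset[0]["instruction"])  # -> "[ro] Summarize this article."
```

Because the same function is parameterized over `target_lang`, the sketch mirrors the paper's point that the recipe generalizes to other less-resourced languages.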

Similar Work