MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning

Yang Shu, Ali Muhammad Asif, Wang Cheng-long, Hu Lijie, Wang Di. arXiv 2024

Tags: Fine-Tuning, Pretraining Methods, Training Techniques

Adapting large language models (LLMs) to new domains and tasks, and enabling them to be efficient lifelong learners, is a pivotal challenge. In this paper, we propose MoRAL, i.e., Mixture-of-Experts augmented Low-Rank Adaptation for Lifelong Learning. MoRAL combines the multi-tasking abilities of MoE with the fine-tuning abilities of LoRA for effective lifelong learning of LLMs. In contrast to conventional approaches that use factual triplets as inputs, MoRAL relies on simple question-answer pairs, which is a more practical and effective strategy for robust and efficient learning. Owing to the new data settings, we introduce a new evaluation benchmark, Life Long Learning of LLM (5L-bench), which comprises a newly curated dataset of question-answer pairs and a set of evaluation metrics for rigorous evaluation of MoRAL in open-book and closed-book settings. Experimental evaluation shows that (i) LLMs learn fast in open-book settings, with up to 30.15% improvement in “RA” for Phi-2-2.7B compared to closed-book settings (for models fine-tuned with MoRAL); (ii) MoRAL shows larger performance improvements for models with a greater number of parameters; and (iii) MoRAL is robust to catastrophic forgetting, offering better knowledge retention than the baselines.
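
To make the idea of combining MoE with LoRA more concrete, the following is a minimal sketch of a linear layer whose frozen weight is augmented by several low-rank (LoRA) experts mixed by a router. It is not the authors' implementation: the abstract does not describe the router, the number of experts, or the gating rule, so the class name `MoELoRALinear`, the top-k softmax gating, and all hyperparameters (`rank`, `num_experts`, `top_k`, `alpha`) are illustrative assumptions.

```python
import math
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class MoELoRALinear:
    """Hypothetical frozen linear layer plus a router-weighted mix of LoRA experts."""

    def __init__(self, d_in, d_out, rank=2, num_experts=4, top_k=2, alpha=4.0, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

        self.W = rand(d_out, d_in)            # pretrained weight, kept frozen
        self.router = rand(num_experts, d_in)  # assumed linear router
        # Each expert i holds a low-rank pair (A_i, B_i); B_i starts at zero,
        # as in standard LoRA, so the layer initially matches the frozen one.
        self.A = [rand(rank, d_in) for _ in range(num_experts)]
        self.B = [[[0.0] * rank for _ in range(d_out)] for _ in range(num_experts)]
        self.scale = alpha / rank
        self.top_k = top_k

    def gate(self, x):
        """Return (expert index, weight) pairs for the top-k experts (assumed gating rule)."""
        logits = matvec(self.router, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        weights = softmax([logits[i] for i in top])
        return list(zip(top, weights))

    def forward(self, x):
        """Compute W x + sum_i g_i * scale * B_i (A_i x) over the selected experts."""
        y = matvec(self.W, x)
        for i, w in self.gate(x):
            delta = matvec(self.B[i], matvec(self.A[i], x))
            y = [yj + w * self.scale * dj for yj, dj in zip(y, delta)]
        return y


layer = MoELoRALinear(d_in=4, d_out=3)
x = [1.0, 2.0, 3.0, 4.0]
y = layer.forward(x)
```

In a sketch like this, only the expert matrices and the router would be trained while `W` stays fixed, which is what keeps the number of trainable parameters small; how MoRAL itself trains and routes its experts is specified in the paper, not here.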
