
KwaiYiiMath: Technical Report

Fu Jiayi, Lin Lei, Gao Xiaoyang, Liu Pengli, Chen Zhengzong, Yang Zhirui, Zhang Shengnan, Zheng Xue, Li Yan, Liu Yuliang, Ye Xucheng, Liao Yiqiao, Liao Chao, Chen Bin, Song Chengru, Wan Junchen, Lin Zijia, Zhang Fuzheng, Wang Zhongyuan, Zhang Di, Gai Kun. arXiv 2023

[Paper]    
Fine Tuning, Pretraining Methods, Reinforcement Learning, Training Techniques

Recent advancements in large language models (LLMs) have demonstrated remarkable abilities across a variety of natural language processing (NLP) downstream tasks, even mathematical tasks requiring multi-step reasoning. In this report, we introduce KwaiYiiMath, which enhances the mathematical reasoning abilities of KwaiYiiBase by applying Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) on both English and Chinese mathematical tasks. We also construct a small-scale Chinese primary-school mathematics test set (named KMath), consisting of 188 examples, to evaluate the correctness of the problem-solving process generated by the models. Empirical studies demonstrate that KwaiYiiMath achieves state-of-the-art (SOTA) performance on GSM8k, CMath, and KMath compared with models of similar size.
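As a rough illustration of the SFT stage named in the abstract (the RLHF stage is omitted for brevity), the sketch below fine-tunes a causal language model on a GSM8k-style (question, step-by-step solution) pair. The base checkpoint, data, and hyperparameters here are placeholders, not the paper's actual setup, since KwaiYiiBase weights and training details are not public.

```python
# Minimal SFT sketch, assuming a HuggingFace causal LM checkpoint and a
# GSM8k-style (question, chain-of-thought answer) pair. All names below
# are stand-ins, not the paper's configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; KwaiYiiBase is not publicly released

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One toy (prompt, solution) pair in the multi-step style GSM8k uses.
examples = [
    ("Q: Tom has 3 boxes with 4 apples each. How many apples?\nA:",
     " 3 boxes * 4 apples = 12 apples. The answer is 12."),
]

model.train()
for prompt, solution in examples:
    enc = tokenizer(prompt + solution, return_tensors="pt")
    # Mask prompt tokens with -100 so the loss is computed only on the
    # solution tokens, the usual convention for instruction-style SFT.
    labels = enc["input_ids"].clone()
    prompt_len = tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
    labels[:, :prompt_len] = -100
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Masking the prompt is a design choice rather than a requirement: computing the loss over the full sequence also works, but restricting it to the solution focuses training on the reasoning steps being supervised.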

Similar Work