Internlm2 Technical Report
Cai Zheng, Cao Maosong, Chen Haojiong, Chen Kai, Chen Keyu, Chen Xin, Chen Xun, Chen Zehui, Chen Zhi, Chu Pei, Dong Xiaoyi, Duan Haodong, Fan Qi, Fei Zhaoye, Gao Yang, Ge Jiaye, Gu Chenya, Gu Yuzhe, Gui Tao, Guo Aijia, Guo Qipeng, He Conghui, Hu Yingfan, Huang Ting, Jiang Tao, Jiao Penglong, Jin Zhenjiang, Lei Zhikai, Li Jiaxing, Li Jingwen, Li Linyang, Li Shuaibin, Li Wei, Li Yining, Liu Hongwei, Liu Jiangning, Hong Jiawei, Liu Kaiwen, Liu Kuikun, Liu Xiaoran, Lv Chengqi, Lv Haijun, Lv Kai, Ma Li, Ma Runyuan, Ma Zerun, Ning Wenchang, Ouyang Linke, Qiu Jiantao, Qu Yuan, Shang Fukai, Shao Yunfan, Song Demin, Song Zifan, Sui Zhihao, Sun Peng, Sun Yu, Tang Huanze, Wang Bin, Wang Guoteng, Wang Jiaqi, Wang Jiayu, Wang Rui, Wang Yudong, Wang Ziyi, Wei Xingjian, Weng Qizhen, Wu Fan, Xiong Yingtong, Xu Chao, Xu Ruiliang, Yan Hang, Yan Yirong, Yang Xiaogui, Ye Haochen, Ying Huaiyuan, Yu Jia, Yu Jing, Zang Yuhang, Zhang Chuyu, Zhang Li, Zhang Pan, Zhang Peng, Zhang Ruijie, Zhang Shuo, Zhang Songyang, Zhang Wenjian, Zhang Wenwei, Zhang Xingcheng, Zhang Xinyue, Zhao Hui, Zhao Qian, Zhao Xiaomeng, Zhou Fengzhe, Zhou Zaida, Zhuo Jingming, Zou Yicheng, Qiu Xipeng, Qiao Yu, Lin Dahua. Arxiv 2024
[Paper]
Agentic
Efficiency And Optimization
Fine Tuning
GPT
Model Architecture
Pretraining Methods
Reinforcement Learning
Training Techniques
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has
sparked discussions on the advent of Artificial General Intelligence (AGI).
However, replicating such advancements in open-source models has been
challenging. This paper introduces InternLM2, an open-source LLM that
outperforms its predecessors in comprehensive evaluations across 6 dimensions
and 30 benchmarks, long-context modeling, and open-ended subjective evaluations
through innovative pre-training and optimization techniques. The pre-training
process of InternLM2 is meticulously detailed, highlighting the preparation of
diverse data types including text, code, and long-context data. InternLM2
efficiently captures long-term dependencies, initially trained on 4k tokens
before advancing to 32k tokens in pre-training and fine-tuning stages,
exhibiting remarkable performance on the 200k ``Needle-in-a-Haystack” test.
InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel
Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF)
strategy that addresses conflicting human preferences and reward hacking. By
releasing InternLM2 models in different training stages and model sizes, we
provide the community with insights into the model’s evolution.
Similar Work