HyperCLOVA X Technical Report

Yoo Kang Min, Han Jaegeun, In Sookyo, Jeon Heewon, Jeong Jisu, Kang Jaewook, Kim Hyunwook, Kim Kyung-min, Kim Munhyong, Kim Sungju, Kwak Donghyun, Kwak Hanock, Kwon Se Jung, Lee Bado, Lee Dongsoo, Lee Gichang, Lee Jooho, Park Baeseong, Shin Seongjin, Yu Joonsang, Baek Seolki, Byeon Sumin, Cho Eungsup, Choe Dooseok, Han Jeesung, Jin Youngkyun, Jun Hyein, Jung Jaeseung, Kim Chanwoong, Kim Jinhong, Kim Jinuk, Lee Dokyeong, Park Dongwook, Sohn Jeong Min, Han Sujung, Heo Jiae, Hong Sungju, Jeon Mina, Jung Hyunhoon, Jung Jungeun, Jung Wangkyo, Kim Chungjoon, Kim Hyeri, Kim Jonghyun, Kim Min Young, Lee Soeun, Park Joonhee, Shin Jieun, Yang Sojin, Yoon Jungsoon, Lee Hwaran, Bae Sanghwan, Cha Jeehwan, Gylleus Karl, Ham Donghoon, Hong Mihak, Hong Youngki, Hong Yunki, Jang Dahyun, Jeon Hyojun, Jeon Yujin, Jeong Yeji, Ji Myunggeun, Jin Yeguk, Jo Chansong, Joo Shinyoung, Jung Seunghwan, Kim Adrian Jungmyung, Kim Byoung Hoon, Kim Hyomin, Kim Jungwhan, Kim Minkyoung, Kim Minseung, Kim Sungdong, Kim Yonghee, Kim Youngjun, Kim Youngkwan, Ko Donghyeon, Lee Dughyun, Lee Ha Young, Lee Jaehong, Lee Jieun, Lee Jonghyun, Lee Jongjin, Lee Min Young, Lee Yehbin, Min Taehong, Min Yuri, Moon Kiyoon, Oh Hyangnam, Park Jaesun, Park Kyuyon, Park Younghun, Seo Hanbae, Seo Seunghyun, Sim Mihyun, Son Gyubin, Yeo Matt, Yeom Kyung Hoon, Yoo Wonjoon, You Myungin, Ahn Doheon, Ahn Homin, Ahn Joohee, Ahn Seongmin, An Chanwoo, An Hyeryun, An Junho, An Sang-min, Byun Boram, Byun Eunbin, Cha Jongho, Chang Minji, Chang Seunggyu, Cho Haesong, Cho Youngdo, Choi Dalnim, Choi Daseul, Choi Hyoseok, Choi Minseong, Choi Sangho, Choi Seongjae, Choi Wooyong, Chun Sewhan, Go Dong Young, Ham Chiheon, Han Danbi, Han Jaemin, Hong Moonyoung, Hong Sung Bum, Hwang Dong-hyun, Hwang Seongchan, Im Jinbae, Jang Hyuk Jin, Jang Jaehyung, Jang Jaeni, Jang Sihyeon, Jang Sungwon, Jeon Joonha, Jeong Daun, Jeong Joonhyun, Jeong Kyeongseok, Jeong Mini, Jin Sol, Jo Hanbyeol, Jo Hanju, Jo Minjung, Jung Chaeyoon, Jung Hyungsik, Jung 
Jaeuk, Jung Ju Hwan, Jung Kwangsun, Jung Seungjae, Ka Soonwon, Kang Donghan, Kang Soyoung, Kil Taeho, Kim Areum, Kim Beomyoung, Kim Byeongwook, Kim Daehee, Kim Dong-gyun, Kim Donggook, Kim Donghyun, Kim Euna, Kim Eunchul, Kim Geewook, Kim Gyu Ri, Kim Hanbyul, Kim Heesu, Kim Isaac, Kim Jeonghoon, Kim Jihye, Kim Joonghoon, Kim Minjae, Kim Minsub, Kim Pil Hwan, Kim Sammy, Kim Seokhun, Kim Seonghyeon, Kim Soojin, Kim Soong, Kim Soyoon, Kim Sunyoung, Kim Taeho, Kim Wonho, Kim Yoonsik, Kim You Jin, Kim Yuri, Kwon Beomseok, Kwon Ohsung, Kwon Yoo-hwan, Lee Anna, Lee Byungwook, Lee Changho, Lee Daun, Lee Dongjae, Lee Ha-ram, Lee Hodong, Lee Hwiyeong, Lee Hyunmi, Lee Injae, Lee Jaeung, Lee Jeongsang, Lee Jisoo, Lee Jongsoo, Lee Joongjae, Lee Juhan, Lee Jung Hyun, Lee Junghoon, Lee Junwoo, Lee Se Yun, Lee Sujin, Lee Sungjae, Lee Sungwoo, Lee Wonjae, Lee Zoo Hyun, Lim Jong Kun, Lim Kun, Lim Taemin, Na Nuri, Nam Jeongyeon, Nam Kyeong-min, Noh Yeonseog, Oh Biro, Oh Jung-sik, Oh Solgil, Oh Yeontaek, Park Boyoun, Park Cheonbok, Park Dongju, Park Hyeonjin, Park Hyun Tae, Park Hyunjung, Park Jihye, Park Jooseok, Park Junghwan, Park Jungsoo, Park Miru, Park Sang Hee, Park Seunghyun, Park Soyoung, Park Taerim, Park Wonkyeong, Ryu Hyunjoon, Ryu Jeonghun, Ryu Nahyeon, Seo Soonshin, Seo Suk Min, Shim Yoonjeong, Shin Kyuyong, Shin Wonkwang, Sim Hyun, Sim Woongseob, Soh Hyejin, Son Bokyong, Son Hyunjun, Son Seulah, Song Chi-yun, Song Chiyoung, Song Ka Yeon, Song Minchul, Song Seungmin, Wang Jisung, Yeo Yonggoo, Yi Myeong Yeon, Yim Moon Bin, Yoo Taehwan, Yoo Youngjoon, Yoon Sungmin, Yoon Young Jin, Yu Hangyeol, Yu Ui Seon, Zuo Xingdong, Bae Jeongin, Bae Joungeun, Cho Hyunsoo, Cho Seonghyun, Cho Yongjin, Choi Taekyoon, Choi Yera, Chung Jiwan, Han Zhenghui, Heo Byeongho, Hong Euisuk, Hwang Taebaek, Im Seonyeol, Jegal Sumin, Jeon Sumin, Jeong Yelim, Jeong Yonghyun, Jiang Can, Jiang Juyong, Jin Jiho, Jo Ara, Jo Younghyun, Jung Hoyoun, Jung Juyoung, Kang Seunghyeong, Kim Dae Hee, Kim Ginam, Kim 
Hangyeol, Kim Heeseung, Kim Hyojin, Kim Hyojun, Kim Hyun-ah, Kim Jeehye, Kim Jin-hwa, Kim Jiseon, Kim Jonghak, Kim Jung Yoon, Kim Rak Yeong, Kim Seongjin, Kim Seoyoon, Kim Sewon, Kim Sooyoung, Kim Sukyoung, Kim Taeyong, Ko Naeun, Koo Bonseung, Kwak Heeyoung, Kwon Haena, Kwon Youngjin, Lee Boram, Lee Bruce W., Lee Dagyeong, Lee Erin, Lee Euijin, Lee Ha Gyeong, Lee Hyojin, Lee Hyunjeong, Lee Jeeyoon, Lee Jeonghyun, Lee Jongheok, Lee Joonhyung, Lee Junhyuk, Lee Mingu, Lee Nayeon, Lee Sangkyu, Lee Se Young, Lee Seulgi, Lee Seung Jin, Lee Suhyeon, Lee Yeonjae, Lee Yesol, Lee Youngbeom, Lee Yujin, Li Shaodong, Liu Tianyu, Moon Seong-eun, Moon Taehong, Nihlenramstroem Max-lasse, Oh Wonseok, Oh Yuri, Park Hongbeen, Park Hyekyung, Park Jaeho, Park Nohil, Park Sangjin, Ryu Jiwon, Ryu Miru, Ryu Simo, Seo Ahreum, Seo Hee, Seo Kangdeok, Shin Jamin, Shin Seungyoun, Sin Heetae, Wang Jiangping, Wang Lei, Xiang Ning, Xiao Longxiang, Xu Jing, Yi Seonyeong, Yoo Haanju, Yoo Haneul, Yoo Hwanhee, Yu Liang, Yu Youngjae, Yuan Weijie, Zeng Bo, Zhou Qian, Cho Kyunghyun, Ha Jung-woo, Park Joonsuk, Hwang Jihyun, Kwon Hyoung Jo, Kwon Soonyong, Lee Jungyeon, Lee Seungho, Lim Seonghyeon, Noh Hyunkyung, Choi Seungho, Lee Sang-woo, Lim Jung Hwa, Sung Nako. Arxiv 2024

Tags: Applications, Ethics and Bias, Fine-Tuning, Responsible AI

We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model’s cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.