
\(\textit{latent}\)-GLAT: Glancing At Latent Variables For Parallel Text Generation

Bao Yu, Zhou Hao, Huang Shujian, Wang Dongqi, Qian Lihua, Dai Xinyu, Chen Jiajun, Li Lei. Arxiv 2022

[Paper]    
Applications Attention Mechanism Efficiency And Optimization GPT Language Modeling Model Architecture Pretraining Methods Training Techniques

Recently, parallel text generation has received widespread attention due to its success in generation efficiency. Although many advanced techniques have been proposed to improve its generation quality, they still need the help of an autoregressive model during training to overcome the one-to-many multi-modality phenomenon in the dataset, which limits their applications. In this paper, we propose \(\textit{latent}\)-GLAT, which employs discrete latent variables to capture word categorical information and invokes an advanced curriculum learning technique, alleviating the multi-modality problem. Experimental results show that our method outperforms strong baselines without the help of an autoregressive model, which further broadens the application scenarios of the parallel decoding paradigm.
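The abstract describes two ingredients: discrete latent variables that summarize word-level categorical information, and a glancing-style curriculum in which some reference information is revealed to the model in proportion to how poorly it currently predicts. The snippet below is a minimal, hypothetical PyTorch sketch of that training loop, not the authors' implementation; all names (`LatentGlancingModel`, `glancing_mask`, the layer sizes, and the way latent targets are obtained) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentGlancingModel(nn.Module):
    """Toy parallel (non-autoregressive) model with a latent and a token head."""

    def __init__(self, vocab_size, num_latents, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.latent_emb = nn.Embedding(num_latents, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.latent_head = nn.Linear(d_model, num_latents)  # predicts discrete latents
        self.token_head = nn.Linear(d_model, vocab_size)    # predicts target tokens

    def forward(self, x):
        return self.encoder(x)


def glancing_mask(pred, target, ratio):
    """Reveal a random subset of reference positions, proportional to the number
    of wrong predictions (a simple curriculum: reveal more when the model is weak)."""
    wrong = (pred != target).float().sum(dim=-1, keepdim=True)   # (B, 1)
    reveal_prob = ratio * wrong / target.size(1)                  # fraction to reveal
    return torch.rand(target.shape) < reveal_prob                 # (B, T) bool mask


# Toy usage with random data; in practice latent_target would come from a
# learned discretization (e.g. vector quantization) of the reference sentence.
vocab_size, num_latents, B, T = 1000, 64, 2, 8
model = LatentGlancingModel(vocab_size, num_latents)
tokens = torch.randint(vocab_size, (B, T))
latent_target = torch.randint(num_latents, (B, T))

# First pass: predict discrete latents from the input embeddings alone.
h = model(model.token_emb(tokens))
latent_pred = model.latent_head(h).argmax(-1)

# Glancing step: mix ground-truth latent embeddings back into the input
# at the revealed positions only.
mask = glancing_mask(latent_pred, latent_target, ratio=0.5)
mixed = torch.where(mask.unsqueeze(-1),
                    model.latent_emb(latent_target),
                    model.token_emb(tokens))

# Second pass: predict tokens and latents in parallel from the mixed input.
h2 = model(mixed)
loss = (F.cross_entropy(model.token_head(h2).reshape(-1, vocab_size),
                        tokens.reshape(-1))
        + F.cross_entropy(model.latent_head(h2).reshape(-1, num_latents),
                          latent_target.reshape(-1)))
loss.backward()
```

The key idea this sketch tries to convey is that glancing operates on the discrete latent sequence rather than on raw reference tokens, so no autoregressive teacher is needed to resolve the one-to-many mapping during training.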

Similar Work