Taking A Deep Breath: Enhancing Language Modeling Of Large Language Models With Sentinel Tokens
Luo Weiyao, Zheng Suncong, Xia Heming, Wang Weikang, Lei Yan, Liu Tianyu, Chen Shuang, Sui Zhifang. arXiv 2024
[Paper]
Attention Mechanism
Language Modeling
Model Architecture
Pretraining Methods
RAG
Tools
Transformer
Large language models (LLMs) have shown promising efficacy across various
tasks, becoming powerful tools in numerous aspects of human life. However,
Transformer-based LLMs suffer performance degradation when modeling long-term
contexts because they discard some information to reduce computational overhead.
In this work, we propose a simple yet effective method to enable LLMs to take a
deep breath, encouraging them to summarize the information contained within
discrete text chunks. Specifically, we segment the text into multiple chunks
and insert a sentinel token at the end of each chunk. We then modify the
attention mask to integrate the chunk's information into the corresponding
sentinel token. This allows LLMs to interpret information not only from historical
individual tokens but also from the sentinel token, which aggregates the chunk's
semantic information. Experiments on language modeling and out-of-domain
downstream tasks validate the superiority of our approach.
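Below is a minimal sketch of the chunk-plus-sentinel idea described in the abstract. The sentinel token id, the fixed-size chunking, and the specific masking rule (each sentinel row is restricted to its own chunk so it is forced to summarize it, while ordinary tokens keep standard causal attention) are assumptions for illustration; the paper's exact construction may differ.

```python
import torch

# Hypothetical sentinel id; in practice this would be a new special token
# added to the tokenizer's vocabulary.
SENTINEL_ID = 32000

def insert_sentinels(token_ids, chunk_size, sentinel_id=SENTINEL_ID):
    """Split token_ids into fixed-size chunks and append a sentinel after each."""
    out = []
    for start in range(0, len(token_ids), chunk_size):
        out.extend(token_ids[start:start + chunk_size])
        out.append(sentinel_id)
    return out

def build_attention_mask(token_ids, sentinel_id=SENTINEL_ID):
    """Causal mask with one assumed modification: each sentinel token may only
    attend to positions inside its own chunk (and itself), so it must aggregate
    that chunk's information. Ordinary tokens keep standard causal attention,
    letting them read both raw history and earlier sentinel summaries."""
    n = len(token_ids)
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))  # standard causal mask
    chunk_start = 0
    for i, tok in enumerate(token_ids):
        if tok == sentinel_id:
            # Restrict the sentinel row to its chunk [chunk_start, i].
            mask[i, :chunk_start] = False
            chunk_start = i + 1  # next chunk begins after this sentinel
    return mask  # shape (n, n); True = attention allowed

# Example: 10 tokens, chunks of 4, sentinel appended after each chunk.
ids = insert_sentinels(list(range(10)), chunk_size=4)
print(build_attention_mask(ids).int())
```

In this sketch the mask would be passed to the Transformer in place of the usual causal mask; ordinary token rows are untouched, so the change only alters how the sentinel positions gather context.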
Similar Work