A Comprehensive Survey Of Accelerated Generation Techniques In Large Language Models

Khoshnoodi Mahsa, Jain Vinija, Gao Mingye, Srikanth Malavika, Chadha Aman. Arxiv 2024

Applications Efficiency And Optimization GPT Language Modeling Pretraining Methods Reinforcement Learning Survey Paper

Accelerating text generation in large language models (LLMs) is crucial for producing content efficiently, but the sequential nature of generation often leads to high inference latency, posing challenges for real-time applications. Various techniques have been proposed to address these challenges and improve efficiency. This paper presents a comprehensive survey of accelerated generation techniques in autoregressive language models, aiming to understand the state-of-the-art methods and their applications. We categorize these techniques into several key areas: speculative decoding, early exiting mechanisms, and non-autoregressive methods. We discuss each category’s underlying principles, advantages, limitations, and recent advancements. Through this survey, we aim to offer insights into the current landscape of accelerated generation techniques in LLMs and provide guidance for future research directions in this critical area of natural language processing.
