MatMul or No MatMul in the Era of 1-bit LLMs

Malekar Jinendra, Elbtity Mohammed E., Zand Ramtin. arXiv 2024

Attention Mechanism, Efficiency And Optimization, Model Architecture, Quantization, Reinforcement Learning

The advent of 1-bit large language models (LLMs) has attracted considerable attention and opened up new research opportunities. However, 1-bit LLMs improve only part of each model: they apply extreme quantization to the projection layers while leaving the attention heads unchanged. To avoid setting fundamentally wrong goals in future research, it is therefore crucial to understand the actual computation and memory savings that 1-bit LLMs can deliver. In this work, we present an adaptation of Amdahl's Law tailored to the 1-bit LLM context, which shows how partial improvements in 1-bit LLMs affect overall model performance. Through extensive experiments, we uncover key nuances across different model architectures and hardware configurations, offering a roadmap for future research in the era of 1-bit LLMs.
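Amdahl's Law explains why accelerating only the projection layers yields a bounded end-to-end gain. The sketch below uses the classic textbook form, speedup = 1 / ((1 - f) + f / s), where f is the fraction of work in the quantized layers and s is their local speedup; it is not necessarily the exact adaptation the paper derives. The helper `projection_fraction` is hypothetical: it assumes a standard decoder layer with FFN width 4·d, counts only matmul FLOPs per token (about 24·d² for the Q/K/V/output and FFN projections, and 4·n·d for the attention score and value products at context length n), and ignores softmax, normalization, and memory traffic.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the work is accelerated by factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    # Unaccelerated part (1 - f) runs as before; accelerated part shrinks to f / s.
    return 1.0 / ((1.0 - f) + f / s)


def projection_fraction(d_model: int, context_len: int) -> float:
    """Hypothetical estimate of the share of per-token matmul FLOPs in projections.

    Assumes FFN width = 4 * d_model and counts only matmul FLOPs in one layer:
      projections (Q, K, V, output, two FFN matrices): ~24 * d^2
      attention (Q K^T scores plus attention-weighted V): ~4 * n * d
    """
    proj = 24 * d_model * d_model
    attn = 4 * context_len * d_model
    return proj / (proj + attn)


if __name__ == "__main__":
    f = projection_fraction(d_model=4096, context_len=2048)
    print(f"projection share: {f:.3f}")  # 0.923 (= 12/13 under these assumptions)
    print(f"speedup at s=10: {amdahl_speedup(f, 10):.2f}")
    print(f"upper bound (s -> inf): {amdahl_speedup(f, float('inf')):.2f}")
```

Under these assumptions, f falls as the context length n grows relative to d, so the untouched attention matmuls increasingly cap the benefit of 1-bit projections; the ceiling 1 / (1 - f) holds no matter how fast the quantized layers become.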
