
AttentionLego: An Open-Source Building Block for Spatially-Scalable Large Language Model Accelerator with Processing-in-Memory Technology

Rongqing Cong, Wenyang He, Mingxuan Li, Bangning Luo, Zebin Yang, Yuchao Yang, Ru Huang, Bonan Yan. arXiv 2024

[Paper] [Code]    
Tags: Agentic, Attention Mechanism, Has Code, Model Architecture, Multimodal Models, Pretraining Methods, Transformer

Large language models (LLMs) with Transformer architectures have achieved phenomenal success in natural language processing, multimodal generative artificial intelligence, and agent-oriented artificial intelligence. The self-attention module is the dominant sub-structure in Transformer-based LLMs, and computing it on general-purpose graphics processing units (GPUs) places heavy demands on I/O bandwidth for transferring intermediate results between memory and processing units. To tackle this challenge, this work develops a fully customized vanilla self-attention accelerator, AttentionLego, as the basic building block for constructing spatially expandable LLM processors. AttentionLego provides a baseline implementation in fully customized digital logic that incorporates Processing-in-Memory (PIM) technology. It is built on PIM-based matrix-vector multiplication and a look-up-table-based Softmax design. The open-source code is available online: https://bonany.cc/attentionleg.
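To make the two primitives in the abstract concrete, here is a minimal NumPy sketch of a single-query attention step built from a matrix-vector multiply (performed inside PIM macros in the actual hardware; emulated here in software) and a softmax whose exponential is replaced by a precomputed look-up table. The 8-bit table resolution, the `[-8, 0]` input range, and all function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Assumed LUT parameters (not from the paper): 8-bit index over [-8, 0],
# which covers the post-max-subtraction scores where exp() is non-negligible.
LUT_BITS = 8
LUT_LO, LUT_HI = -8.0, 0.0
EXP_LUT = np.exp(np.linspace(LUT_LO, LUT_HI, 2 ** LUT_BITS))  # precomputed exp

def lut_softmax(scores: np.ndarray) -> np.ndarray:
    """Softmax where exp() is approximated by a table look-up."""
    shifted = scores - scores.max()  # max-subtraction for numerical range
    # Quantize each shifted score to the nearest table index.
    idx = np.clip(
        np.round((shifted - LUT_LO) / (LUT_HI - LUT_LO) * (2 ** LUT_BITS - 1)),
        0, 2 ** LUT_BITS - 1,
    ).astype(int)
    e = EXP_LUT[idx]
    return e / e.sum()

def attention_step(q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """One query of vanilla self-attention from the two primitives."""
    d = q.shape[0]
    scores = K @ q / np.sqrt(d)    # matrix-vector multiply (PIM in hardware)
    weights = lut_softmax(scores)  # LUT-based softmax
    return V.T @ weights           # second matrix-vector multiply

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(6, 4)), rng.normal(size=(6, 4))
print(attention_step(q, K, V))
```

The design point this illustrates is that once exp() becomes a table read, the entire attention step reduces to memory accesses and multiply-accumulates, which is what makes a fully digital PIM implementation feasible.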

Similar Work