Grad-sam: Explaining Transformers Via Gradient Self-attention Maps

Barkan Oren, Hauon Edan, Caciularu Avi, Katz Ori, Malkiel Itzik, Armstrong Omri, Koenigstein Noam. arXiv 2022

[Paper]    
Attention Mechanism Model Architecture Pretraining Methods Transformer

Transformer-based language models have significantly advanced the state of the art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM), a novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's prediction. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.
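The core idea, combining self-attention weights with their gradients to score input tokens, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, tensor shapes, and the choice to rectify gradients with ReLU and average over layers, heads, and query positions are assumptions made for the sketch.

```python
import numpy as np

def grad_sam_scores(attn_maps, attn_grads):
    """Token relevance scores in the spirit of Grad-SAM (sketch).

    attn_maps, attn_grads: arrays of shape (layers, heads, seq, seq),
    holding the self-attention weights and the gradients of the model's
    predicted logit with respect to those weights (shapes are assumed).
    """
    # Hadamard product of attention with ReLU-rectified gradients keeps
    # only attention units whose increase supports the prediction.
    weighted = attn_maps * np.maximum(attn_grads, 0.0)
    # Average over layers, heads, and query positions to obtain one
    # relevance score per input token (key position).
    return weighted.mean(axis=(0, 1, 2))

# Toy example with random stand-ins for attention and gradient tensors.
rng = np.random.default_rng(0)
L, H, T = 2, 4, 5                      # layers, heads, tokens
attn = rng.random((L, H, T, T))        # would come from a forward pass
grads = rng.standard_normal((L, H, T, T))  # would come from backprop
scores = grad_sam_scores(attn, grads)
print(scores.shape)                    # one score per input token
```

In a real setting, `attn` and `grads` would be collected from a Transformer's attention modules via forward and backward hooks, and the resulting scores ranked to pick the tokens that most support the prediction.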

Similar Work