Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

Li Kenneth, Liu Tianle, Bashkansky Naomi, Bau David, Viégas Fernanda, Pfister Hanspeter, Wattenberg Martin. arXiv 2024

[Paper]    
Attention Mechanism GPT Model Architecture Pretraining Methods Prompting Transformer

System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating instruction stability via self-chats between two instructed chatbots. Testing popular models like LLaMA2-chat-70B and GPT-3.5, we reveal a significant instruction drift within eight rounds of conversations. An empirical and theoretical analysis of this phenomenon suggests the transformer attention mechanism plays a role, due to attention decay over long exchanges. To combat attention decay and instruction drift, we propose a lightweight method called split-softmax, which compares favorably against two strong baselines.
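
The attention-decay intuition in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's benchmark or its split-softmax method; it only shows how the share of softmax attention that falls on system-prompt tokens shrinks as a dialog grows, under the simplifying assumption that every token receives the same attention logit. The function names (`system_prompt_mass`, `mass_over_turns`) and the parameters `n_system` and `tokens_per_turn` are hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def system_prompt_mass(logits, n_system):
    """Fraction of attention assigned to the first n_system positions.

    Assumes the system prompt occupies the start of the context.
    """
    weights = softmax(logits)
    return sum(weights[:n_system])


def mass_over_turns(n_system, tokens_per_turn, n_turns):
    """Attention mass on the system prompt after each dialog turn.

    Assumption (illustrative only): all tokens get equal logits, so the
    mass is simply n_system divided by the current context length.
    """
    masses = []
    for turn in range(1, n_turns + 1):
        context_len = n_system + turn * tokens_per_turn
        logits = [0.0] * context_len
        masses.append(system_prompt_mass(logits, n_system))
    return masses


if __name__ == "__main__":
    for turn, mass in enumerate(mass_over_turns(10, 10, 8), start=1):
        print(f"turn {turn}: system-prompt attention mass = {mass:.3f}")
```

Under this uniform-logit assumption the mass falls monotonically with each turn. Real attention patterns are not uniform, so this should be read only as a picture of why a fixed-length system prompt can lose influence as the context lengthens.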

Similar Work