
Inference To The Best Explanation In Large Language Models

Dalal Dhairya, Valentino Marco, Freitas André, Buitelaar Paul. arXiv 2024

[Paper]    
Applications GPT Interpretability And Explainability Model Architecture Reinforcement Learning Tools

While Large Language Models (LLMs) have found success in real-world applications, their underlying explanatory process is still poorly understood. This paper proposes IBE-Eval, a framework inspired by philosophical accounts of Inference to the Best Explanation (IBE) to advance the interpretation and evaluation of LLMs' explanations. IBE-Eval estimates the plausibility of natural language explanations through a combination of explicit logical and linguistic features, including consistency, parsimony, coherence, and uncertainty. Extensive experiments are conducted on Causal Question Answering (CQA), where IBE-Eval is tasked to select the most plausible causal explanation amongst competing ones generated by LLMs (i.e., GPT 3.5 and Llama 2). The experiments reveal that IBE-Eval can successfully identify the best explanation with up to 77% accuracy (≈27% above random), improving upon a GPT 3.5-as-a-Judge baseline (≈ +17%) while being intrinsically more efficient and interpretable. Additional analyses suggest that, despite model-specific variances, LLM-generated explanations tend to conform to IBE criteria and that IBE-Eval is significantly correlated with human judgment, opening up opportunities for future development of automated explanation verification tools.
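As a rough illustration of the selection step described in the abstract, the sketch below scores competing explanations by combining the four IBE criteria and picks the highest-scoring one. This is not the paper's implementation: the feature values, weights, and aggregation function here are hypothetical placeholders, whereas the paper derives these criteria from explicit logical and linguistic analyses of each explanation.

```python
# Illustrative sketch only (assumed, not the authors' code): combine IBE-style
# criteria into a plausibility score and select the best candidate explanation.
from dataclasses import dataclass


@dataclass
class IBEScores:
    consistency: float  # logical consistency with the question/premises (0-1)
    parsimony: float    # simplicity of the explanation (0-1)
    coherence: float    # internal coherence of the reasoning (0-1)
    uncertainty: float  # hedging/uncertainty in the language (0-1)


def plausibility(s: IBEScores, weights=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Aggregate the IBE criteria into a single plausibility estimate.

    Uncertainty counts against plausibility, so it enters with a negative sign.
    The equal weights are an assumption for illustration.
    """
    w_cons, w_pars, w_coh, w_unc = weights
    return (w_cons * s.consistency
            + w_pars * s.parsimony
            + w_coh * s.coherence
            - w_unc * s.uncertainty)


def select_best(candidates: dict[str, IBEScores]) -> str:
    """Return the candidate explanation with the highest combined IBE score."""
    return max(candidates, key=lambda name: plausibility(candidates[name]))


if __name__ == "__main__":
    # Two competing causal explanations for the same question (toy values).
    candidates = {
        "explanation_a": IBEScores(consistency=0.9, parsimony=0.7,
                                   coherence=0.8, uncertainty=0.2),
        "explanation_b": IBEScores(consistency=0.6, parsimony=0.9,
                                   coherence=0.5, uncertainty=0.4),
    }
    print(select_best(candidates))  # -> explanation_a
```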

Similar Work