Getting More Out Of Mixture Of Language Model Reasoning Experts

Si Chenglei, Shi Weijia, Zhao Chen, Zettlemoyer Luke, Boyd-Graber Jordan. arXiv 2023

Applications Merging Prompting RAG Tools

While recent large language models (LLMs) improve on various question answering (QA) datasets, it remains difficult for a single model to generalize across question types that require distinct reasoning abilities. We provide empirical evidence that state-of-the-art LLMs suffer from poor generalizability on reasoning types beyond those seen in the prompt. To remedy this, we propose a Mixture-of-Reasoning-Experts (MoRE) framework that ensembles diverse specialized language models. We specialize the backbone language model with prompts optimized for different reasoning categories, including factual, multihop, mathematical, and commonsense reasoning. Our key insight is to leverage agreement among the specialized experts to select the best answer for each question, or to abstain from answering. This gives MoRE higher accuracy than any single specialized model on a collection of 12 QA datasets from four reasoning types. Beyond generalizability, the interpretable design of MoRE improves selective question answering results compared to baselines without incorporating inter-expert agreement. This framework is also more interpretable and useful to human consumers of QA outputs. Our human study confirms that presenting expert predictions and the answer selection process helps annotators more accurately calibrate when to trust the system’s output. We release all code and data to facilitate future work.
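The agreement idea in the abstract can be illustrated with a small sketch. This is not the authors' answer selector; it is a simplified majority-agreement rule, assuming each expert returns a short answer string. The names `normalize`, `select_answer`, and the `min_agreement` threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Optional


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(answer.lower().split())


def select_answer(
    expert_answers: dict[str, str], min_agreement: int = 2
) -> tuple[Optional[str], int]:
    """Pick the answer most experts agree on, or abstain.

    expert_answers maps an expert name (e.g. "factual") to its predicted answer.
    Returns (answer, votes); answer is None when fewer than `min_agreement`
    experts agree, which stands in for abstaining.
    Hypothetical simplification: a plain vote count, not a learned selector.
    """
    if not expert_answers:
        return None, 0
    votes = Counter(normalize(a) for a in expert_answers.values())
    best, count = votes.most_common(1)[0]
    if count < min_agreement:
        return None, count  # too little agreement: abstain
    return best, count


if __name__ == "__main__":
    preds = {
        "factual": "Paris",
        "multihop": "paris",
        "math": "42",
        "commonsense": "Paris ",
    }
    print(select_answer(preds))  # three experts agree after normalization
```

Under these assumptions, raising `min_agreement` trades coverage for precision: the system answers fewer questions but only when more experts concur, which mirrors the selective question answering setting the abstract describes.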
