Metric-aware LLM Inference For Regression And Scoring

Lukasik Michal, Narasimhan Harikrishna, Menon Aditya Krishna, Yu Felix, Kumar Sanjiv. arXiv 2024

GPT, Pretraining Methods, Reinforcement Learning

Large language models (LLMs) have demonstrated strong results on a range of NLP tasks. Typically, outputs are obtained via autoregressive sampling from the LLM’s underlying distribution. Building on prior work on Minimum Bayes Risk decoding, we show that this inference strategy can be suboptimal for a range of regression and scoring tasks and their associated evaluation metrics. As a remedy, we propose metric-aware LLM inference: a decision-theoretic approach that optimizes for custom regression and scoring metrics at inference time. We report improvements over baselines on academic benchmarks and publicly available models.
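The decision-theoretic idea in the abstract can be illustrated with a minimal sketch. It assumes that several numeric outputs have already been sampled from the model and parsed into floats; the helper names `expected_loss` and `metric_aware_predict` are hypothetical and are not taken from the paper. Instead of returning a single sampled output, the sketch picks the candidate that minimizes the empirical expected loss over the samples, which is the Minimum Bayes Risk rule. For squared error this is approximated by the sample mean, and for absolute error by the sample median.

```python
from statistics import mean, median
from typing import Callable, Optional, Sequence

Loss = Callable[[float, float], float]  # loss(target, prediction)


def expected_loss(candidate: float, samples: Sequence[float], loss: Loss) -> float:
    """Empirical expected loss of `candidate`, treating the samples as draws of the target."""
    return sum(loss(s, candidate) for s in samples) / len(samples)


def metric_aware_predict(
    samples: Sequence[float],
    loss: Loss,
    candidates: Optional[Sequence[float]] = None,
) -> float:
    """Return the candidate with the lowest empirical expected loss.

    If no candidate pool is given, the samples themselves are used as candidates
    (an assumption of this sketch).
    """
    pool = list(candidates) if candidates is not None else list(samples)
    return min(pool, key=lambda c: expected_loss(c, samples, loss))


def squared_error(y: float, y_hat: float) -> float:
    return (y - y_hat) ** 2


def absolute_error(y: float, y_hat: float) -> float:
    return abs(y - y_hat)


# Hypothetical scores parsed from five sampled LLM outputs.
samples = [1.0, 1.0, 2.0, 5.0, 1.0]

# Closed-form minimizers for two common regression metrics.
mse_prediction = mean(samples)    # minimizes expected squared error
mae_prediction = median(samples)  # minimizes expected absolute error

# Generic search over a candidate pool for an arbitrary metric.
best_for_mse = metric_aware_predict(samples, squared_error)
best_for_mae = metric_aware_predict(samples, absolute_error)
```

In this toy example the most frequently sampled value is 1.0, yet the prediction that minimizes expected squared error is 2.0, which shows how the choice of metric can change the preferred output.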
