
Evaluating Large Language Models With Fmeval

Schwöbel Pola, Franceschi Luca, Zafar Muhammad Bilal, Vasist Keerthan, Malhotra Aman, Shenhar Tomer, Tailor Pinal, Yilmaz Pinar, Diamond Michael, Donini Michele. arXiv 2024

[Paper] [Code]
Tags: Applications, Ethics And Bias, Has Code, RAG, Reinforcement Learning, Responsible AI, Tools

fmeval is an open-source library for evaluating large language models (LLMs) across a range of tasks. It helps practitioners evaluate their models both for task performance and along multiple responsible AI dimensions. This paper presents the library and its underlying design principles: simplicity, coverage, extensibility, and performance. We then describe how these principles are realized in the scientific and engineering choices made when developing fmeval. A case study demonstrates a typical use of the library: picking a suitable model for a question-answering task. We close by discussing limitations and further work on the library. fmeval is available at https://github.com/aws/fmeval.
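
As a rough illustration of that case study, the sketch below shows how a question-answering evaluation might be wired up with fmeval: a DataConfig describing the dataset, a ModelRunner wrapping the candidate model (here a hypothetical Bedrock-hosted model), and the QAAccuracy algorithm producing aggregate scores. The dataset fields, S3 location, model id, and prompt template are illustrative assumptions rather than fmeval defaults; exact parameter names may differ between library versions, so consult the repository linked above for the current API.

```python
# Minimal sketch of a QA model-selection evaluation with fmeval.
# Class names follow the aws/fmeval repository; dataset schema, model id,
# and prompt template below are assumptions for illustration only.
from fmeval.data_loaders.data_config import DataConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy, QAAccuracyConfig

# Describe a custom JSON Lines QA dataset ("question"/"answer" fields and
# the S3 URI are hypothetical).
data_config = DataConfig(
    dataset_name="custom_qa_dataset",
    dataset_uri="s3://my-bucket/qa_dataset.jsonl",
    dataset_mime_type="application/jsonlines",
    model_input_location="question",
    target_output_location="answer",
)

# Wrap one candidate model behind fmeval's ModelRunner interface
# (example: an Anthropic Claude model served via Amazon Bedrock).
model_runner = BedrockModelRunner(
    model_id="anthropic.claude-v2",
    content_template='{"prompt": $prompt, "max_tokens_to_sample": 500}',
    output="completion",
)

# Run the QA accuracy evaluation and print aggregate dataset-level scores;
# repeating this per candidate model supports the model-selection use case.
eval_algo = QAAccuracy(QAAccuracyConfig(target_output_delimiter="<OR>"))
results = eval_algo.evaluate(
    model=model_runner,
    dataset_config=data_config,
    prompt_template="Answer the following question: $model_input",
    save=True,
)
for output in results:
    print(output.dataset_name, output.dataset_scores)
```

The same DataConfig and evaluation algorithm can be reused across several ModelRunner instances, which is how a practitioner would compare candidate models for the question-answering task described in the case study.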

Similar Work