Evaluating Large Language Models With Fmeval

Schwöbel Pola, Franceschi Luca, Zafar Muhammad Bilal, Vasist Keerthan, Malhotra Aman, Shenhar Tomer, Tailor Pinal, Yilmaz Pinar, Diamond Michael, Donini Michele. arXiv 2024

Applications, Ethics and Bias, Has Code, RAG, Reinforcement Learning, Responsible AI, Tools

fmeval is an open-source library for evaluating large language models (LLMs) on a range of tasks. It helps practitioners evaluate their models for task performance and along multiple responsible AI dimensions. This paper presents the library and lays out its underlying design principles: simplicity, coverage, extensibility and performance. We then describe how these principles were implemented in the scientific and engineering choices made when developing fmeval. A case study demonstrates a typical use case for the library: picking a suitable model for a question answering task. We close by discussing limitations and further work in the development of the library. fmeval can be found at https://github.com/aws/fmeval.
