Évaluation Des Capacités De Réponse De Larges Modèles De Langage (LLM) Pour Des Questions D'historiens

Chartier Mathieu, Dakkoune Nabil, Bourgeois Guillaume, Jean Stéphane. arXiv 2024

Applications GPT Model Architecture

Large Language Models (LLMs) such as ChatGPT or Bard have revolutionized information retrieval and captivated users with their ability to generate tailored responses in record time, whatever the topic. In this article, we assess the capabilities of various LLMs to produce reliable, comprehensive, and sufficiently relevant responses about historical facts in French. To this end, we constructed a testbed of numerous history-related questions of varying types, themes, and levels of difficulty. Our evaluation of responses from ten selected LLMs reveals numerous shortcomings in both substance and form. Beyond an overall insufficient accuracy rate, we highlight uneven handling of the French language, as well as verbosity and inconsistency in the responses the LLMs provide.
