
INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering Capability of LLMs for Indic Languages

Abhishek Kumar Singh, Rudra Murthy, Vishwajeet Kumar, Jaydeep Sen, Ganesh Ramakrishnan. arXiv 2024

[Paper]    
Applications, Few Shot, Reinforcement Learning

Large Language Models (LLMs) have demonstrated remarkable zero-shot and few-shot capabilities on unseen tasks, including context-grounded question answering (QA) in English. However, evaluating these capabilities in non-English languages is constrained by the scarcity of suitable benchmarks. To address this gap, we introduce Indic-QA, the largest publicly available context-grounded question-answering dataset for 11 major Indian languages spanning two language families. The dataset comprises both extractive and abstractive QA tasks and combines existing datasets with English QA datasets translated into Indian languages. Additionally, we generate a synthetic subset using the Gemini model, which produces question-answer pairs from a given passage; these pairs are then manually verified for quality assurance. We evaluate various multilingual LLMs and their instruction-fine-tuned variants on the benchmark and find that their performance is subpar, particularly for low-resource languages. We hope the release of this dataset will stimulate further research on the question-answering abilities of LLMs for low-resource languages.
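To make the evaluation setup concrete, below is a minimal sketch of how extractive, context-grounded QA is commonly scored with SQuAD-style token-overlap F1. The prompt template and the `model_generate` callable are hypothetical stand-ins, not the paper's released harness, and whitespace tokenization is a simplification that real evaluation on Indic scripts would need to refine.

```python
# Minimal sketch of context-grounded extractive QA evaluation with
# SQuAD-style token F1. Assumptions: the prompt template and the
# model_generate callable are illustrative placeholders, not the
# benchmark's actual evaluation pipeline.
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string."""
    # NOTE: whitespace splitting is naive for many Indic scripts;
    # a language-aware tokenizer would be used in practice.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)  # min counts
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical zero-shot prompt for context-grounded QA.
PROMPT = (
    "Answer the question using only the passage.\n"
    "Passage: {passage}\nQuestion: {question}\nAnswer:"
)


def evaluate(model_generate, examples):
    """Average token F1 over (passage, question, gold_answer) triples.

    `model_generate` is any callable mapping a prompt string to the
    model's answer string (standing in for a real LLM call).
    """
    scores = [
        token_f1(model_generate(PROMPT.format(passage=p, question=q)), gold)
        for p, q, gold in examples
    ]
    return sum(scores) / len(scores) if scores else 0.0
```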
