Automated Educational Question Generation At Different Bloom's Skill Levels Using Large Language Models: Strategies And Evaluation

Scaria Nicy, Chenna Suma Dharani, Subramani Deepak. Artificial Intelligence in Education. AIED 2024

[Paper]
Prompting

Developing questions that are pedagogically sound, relevant, and promote learning is a challenging and time-consuming task for educators. Modern large language models (LLMs) generate high-quality content across multiple domains, and could therefore help educators develop high-quality questions. Automated educational question generation (AEQG) is important for scaling online education to a diverse student population. Past attempts at AEQG have shown limited ability to generate questions at higher cognitive levels. In this study, we examine the ability of five state-of-the-art LLMs of different sizes to generate diverse, high-quality questions at the different cognitive levels defined by Bloom’s taxonomy. We use advanced prompting techniques of varying complexity for AEQG. We conducted expert and LLM-based evaluations to assess the linguistic and pedagogical relevance and quality of the generated questions. Our findings suggest that LLMs can generate relevant, high-quality educational questions at different cognitive levels when prompted with adequate information, although performance varies significantly across the five LLMs considered. We also show that automated evaluation is not on par with human evaluation.
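The paper's actual prompts are not reproduced here, but the core idea of steering generation toward a target cognitive level can be sketched with a plain chat-completion call. Below is a minimal, hypothetical Python example using the OpenAI client; the prompt wording, the `generate_question` helper, and the model name are illustrative assumptions, not the authors' setup.

```python
# Hypothetical sketch of Bloom's-level-conditioned question generation.
# The prompt text and helper are illustrative, not the paper's prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

BLOOM_LEVELS = [
    "Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create",
]

def generate_question(course_context: str, level: str, model: str = "gpt-4") -> str:
    """Ask the model for one question at the given Bloom's taxonomy level."""
    if level not in BLOOM_LEVELS:
        raise ValueError(f"Unknown Bloom's level: {level}")
    prompt = (
        "You are an instructor writing assessment questions.\n"
        f"Course material:\n{course_context}\n\n"
        f"Write one question at the '{level}' level of Bloom's taxonomy. "
        "Return only the question text."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    context = "Dijkstra's algorithm finds shortest paths in weighted graphs."
    for level in BLOOM_LEVELS:
        print(level, "->", generate_question(context, level))
```

Looping over all six levels, as above, mirrors the study's goal of producing a diverse question set per topic; richer variants would add course metadata or few-shot examples to the prompt, which is the kind of varying prompt complexity the abstract refers to.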

Similar Work