[Paper]
The advent of generative AI models holds tremendous potential for aiding teachers in the generation of pedagogical materials. However, numerous knowledge gaps concerning the behavior of these models obfuscate the generation of research-informed guidance for their effective usage. Here we assess trends in prompt specificity, variability, and weaknesses in foreign language teacher lesson plans generated by zero-shot prompting in ChatGPT. Iterating a series of prompts that increased in complexity, we found that output lesson plans were generally high quality, though additional context and specificity to a prompt did not guarantee a concomitant increase in quality. Additionally, we observed extreme cases of variability in outputs generated by the same prompt. In many cases, this variability reflected a conflict between 20th century versus 21st century pedagogical practices. These results suggest that the training of generative AI models on classic texts concerning pedagogical practices may represent a currently underexplored topic with the potential to bias generated content towards teaching practices that have been long refuted by research. Collectively, our results offer immediate translational implications for practicing and training foreign language teachers on the use of AI tools. More broadly, these findings reveal the existence of generative AI output trends that have implications for the generation of pedagogical materials across a diversity of content areas.