Testing LLM Performance On The Physics GRE: Some Observations

Gupta Pranav. arXiv 2023

With recent developments in large language models (LLMs) and their widespread availability through open-source models and/or low-cost APIs, several exciting products and applications are emerging, many of them in STEM educational technology for K-12 and university students. These powerful language models need to be evaluated on a range of benchmarks to understand their risks and limitations. In this short paper, we summarize and analyze the performance of Bard, a popular LLM-based conversational service made available by Google, on the standardized Physics GRE examination.
