[Paper]
Purpose: Enhanced health literacy has been linked to better health outcomes; however, few interventions have been studied. We investigate whether large language models (LLMs) can serve as a medium to improve health literacy in children and other populations. Methods: We ran 288 conditions using 26 different prompts through ChatGPT-3.5, Microsoft Bing, and Google Bard. Given constraints imposed by rate limits, we tested a subset of 150 conditions through ChatGPT-4. The primary outcome measurements were the reading grade level (RGL) and word counts of output. Results: Across all models, output for basic prompts such as “Explain” and “What is (are)” were at, or exceeded, a 10th-grade RGL. When prompts were specified to explain conditions from the 1st to 12th RGL, we found that LLMs had varying abilities to tailor responses based on RGL. ChatGPT-3.5 provided responses that ranged from the 7th-grade to college freshmen RGL while ChatGPT-4 outputted responses from the 6th-grade to the college-senior RGL. Microsoft Bing provided responses from the 9th to 11th RGL while Google Bard provided responses from the 7th to 10th RGL. Discussion: ChatGPT-3.5 and ChatGPT-4 did better in achieving lower-grade level outputs. Meanwhile Bard and Bing tended to consistently produce an RGL that is at the high school level regardless of prompt. Additionally, Bard’s hesitancy in providing certain outputs indicates a cautious approach towards health information. LLMs demonstrate promise in enhancing health communication, but future research should verify the accuracy and effectiveness of such tools in this context. Implications: LLMs face challenges in crafting outputs below a sixth-grade reading level. However, their capability to modify outputs above this threshold provides a potential mechanism to improve health literacy and communication in a pediatric population and beyond.