Applying And Evaluating Large Language Models In Mental Health Care: A Scoping Review Of Human-assessed Generative Tasks

Hua Yining, Na Hongbin, Li Zehan, Liu Fenglin, Fang Xiao, Clifton David, Torous John. arXiv 2024

[Paper]
Applications, Bias Mitigation, Ethics And Bias, Fairness, GPT, Merging, Model Architecture, Reinforcement Learning, Responsible AI, Security, Survey Paper, Tools

Large language models (LLMs) are emerging as promising tools for mental health care, offering scalable support through their ability to generate human-like responses. However, the effectiveness of these models in clinical settings remains unclear. This scoping review assessed the current generative applications of LLMs in mental health care, focusing on studies in which these models were tested with human participants in real-world scenarios. A systematic search across APA PsycNet, Scopus, PubMed, and Web of Science identified 726 unique articles, of which 17 met the inclusion criteria. These studies encompassed applications such as clinical assistance, counseling, therapy, and emotional support. The evaluation methods, however, were often non-standardized, with most studies relying on ad hoc scales that limit comparability and robustness. Privacy, safety, and fairness were also frequently underexplored. Moreover, reliance on proprietary models, such as OpenAI's GPT series, raises concerns about transparency and reproducibility. While LLMs show potential for expanding access to mental health care, especially in underserved areas, the current evidence does not fully support their use as standalone interventions. More rigorous, standardized evaluations and ethical oversight are needed to ensure these tools can be safely and effectively integrated into clinical practice.
