
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

De Wynter Adrian, Watts Ishaan, Altıntoprak Nektar Ege, Wongsangaroonsri Tua, Zhang Minghui, Farra Noura, Baur Lena, Claudet Samantha, Gajdusek Pavel, Gören Can, Gu Qilong, Kaminska Anna, Kaminski Tomasz, Kuo Ruby, Kyuba Akiko, Lee Jongho, Mathur Kartik, Merok Petter, Milovanović Ivana, Paananen Nani, Paananen Vesa-matti, Pavlenko Anna, Vidal Bruno Pereira, Strika Luciano, Tsao Yueh, Turcato Davide, Vakhno Oleksandr, Velcsov Judit, Vickers Anna, Visser Stéphanie, Widarmanto Herdyan, Zaikin Andrey, Chen Si-qing. arXiv 2024

[Paper]    
Ethics And Bias Prompting Reinforcement Learning Responsible AI

Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety remains a serious concern. With the advent of multilingual S/LLMs, the question becomes one of scale: can we expand multilingual safety evaluations of these models at the same velocity with which they are deployed? To this end, we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is specifically designed to detect culturally-specific toxic language. We evaluate seven S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they show low agreement with human judges when holistically judging the toxicity of a prompt, and have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g. microaggressions, bias). We release this dataset to help further reduce harmful uses of these models and improve their safe deployment.
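To make the evaluation setup concrete, the sketch below shows one common way to compare a model's toxicity labels against human annotations using both exact-match accuracy and a chance-corrected agreement statistic. The toy data, the 1-5 toxicity scale, and the choice of quadratic-weighted Cohen's kappa are illustrative assumptions, not the RTP-LX annotation schema or the paper's exact metrics.

```python
# Hypothetical sketch: measuring model-human agreement on toxicity labels.
# The data format and the 1-5 toxicity scale are illustrative assumptions,
# not the RTP-LX annotation schema itself.
from sklearn.metrics import cohen_kappa_score

# Toy annotations: each entry is (human_label, model_label) on a 1-5 scale.
examples = [
    (1, 1), (4, 3), (5, 5), (2, 4), (3, 3),
    (1, 2), (5, 4), (2, 2), (4, 4), (3, 1),
]

human_labels = [h for h, _ in examples]
model_labels = [m for _, m in examples]

# Exact-match accuracy can look acceptable even when judgments diverge.
accuracy = sum(h == m for h, m in examples) / len(examples)

# Weighted Cohen's kappa corrects for chance and penalizes large
# disagreements more than small ones, which is closer to how holistic
# toxicity judgments are usually compared against human raters.
kappa = cohen_kappa_score(human_labels, model_labels, weights="quadratic")

print(f"accuracy: {accuracy:.2f}, quadratic-weighted kappa: {kappa:.2f}")
```

A gap between a reasonable accuracy and a low agreement score of this kind is exactly the pattern the abstract reports for the evaluated S/LLMs.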
