Do LLMs Have Consistent Values?

Rozen Naama, Elidan Gal, Globerson Amir, Daniel Ella. arXiv 2024

Prompting Reinforcement Learning

Values are a basic driving force underlying human behavior. Large language model (LLM) technology is constantly improving towards human-like dialogue. However, little research has been done to study the values exhibited in text generated by LLMs. Here we study this question by turning to the rich literature on value structure in psychology. We ask whether LLMs exhibit the same value structure that has been demonstrated in humans, including the ranking of values and the correlations between values. We show that the results of this analysis depend strongly on how the LLM is prompted, and that under a particular prompting strategy (referred to as ‘Value Anchoring’) the agreement with human data is quite compelling. Our results serve both to improve our understanding of values in LLMs and to introduce novel methods for assessing consistency in LLM responses.
