[Paper]
Recent progress in Large Language Models (LLMs) has produced models that
exhibit remarkable performance across a variety of NLP tasks. However, it
remains unclear whether the existing focus of NLP research accurately captures
the genuine requirements of human users. This paper provides a comprehensive
analysis of the divergence between current NLP research and the needs of
real-world NLP applications via a large-scale collection of user-GPT
conversations. We analyze a large-scale collection of real user queries to GPT.
We compare these queries against existing NLP benchmark tasks and identify a
significant gap between the tasks that users frequently request from LLMs and
the tasks that are commonly studied in academic research. For example, we find
that tasks such as design'' and
planning’’ are prevalent in user
interactions but are largely neglected or different from traditional NLP
benchmarks. We investigate these overlooked tasks, dissect the practical
challenges they pose, and provide insights toward a roadmap to make LLMs better
aligned with user needs.