ViMQ: A Vietnamese Medical Question Dataset For Healthcare Dialogue System Development

Huy Ta Duc, Tu Nguyen Anh, Vu Tran Hoang, Minh Nguyen Phuc, Phan Nguyen, Bui Trung H., Truong Steven Q. H. arXiv 2023


Existing medical text datasets usually take the form of question-and-answer pairs that support the task of natural language generation, but they lack composite annotations of medical terms. In this study, we publish a Vietnamese dataset of medical questions from patients, with sentence-level and entity-level annotations for the Intent Classification and Named Entity Recognition tasks. The tag sets for the two tasks are in the medical domain and can facilitate the development of task-oriented healthcare chatbots with better comprehension of patient queries. We train baseline models for the two tasks and propose a simple self-supervised training strategy with span-noise modelling that substantially improves performance. Dataset and code will be published at https://github.com/tadeephuy/ViMQ
