PULSAR At MEDIQA-Sum 2023: Large Language Models Augmented By Synthetic Dialogue Convert Patient Dialogues To Medical Records

Schlegel Viktor, Li Hao, Wu Yuping, Subramanian Anand, Nguyen Thanh-Tung, Kashyap Abhinav Ramesh, Beck Daniel, Zeng Xiaojun, Batista-Navarro Riza Theresa, Winkler Stefan, Nenadic Goran. Arxiv 2023

Tags: Has Code, Tools, Training Techniques

This paper describes PULSAR, our system submission to the ImageCLEF 2023 MediQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework relies on domain-specific pre-training to produce a specialised language model, which is then trained on task-specific natural data augmented with synthetic data generated by a black-box LLM. We find limited evidence for the efficacy of domain-specific pre-training and data augmentation, while scaling up the language model yields the largest performance gains. Our approach was ranked second and third among 13 submissions on Task B of the challenge. Our code is available at https://github.com/yuping-wu/PULSAR.
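The augmentation step described above pairs natural dialogue-note examples with synthetic dialogues produced by a black-box LLM. The following is a minimal sketch of how such a mixing step could look, assuming the LLM is prompted to write a dialogue from an existing clinical note. The prompt text, the `max_synthetic_ratio` cap, and all function and class names are hypothetical; `llm` stands in for any black-box model API. This is not PULSAR's actual code, which lives in the linked repository.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    dialogue: str    # model input: patient-doctor conversation
    note: str        # target: clinical record text
    synthetic: bool  # True if the dialogue was generated by the LLM


# Hypothetical prompt template (an assumption, not the paper's prompt).
PROMPT_TEMPLATE = (
    "Write a realistic conversation between a doctor and a patient "
    "that contains the information in this clinical note:\n\n{note}"
)


def synthesize(notes: List[str], llm: Callable[[str], str]) -> List[Example]:
    """Generate one synthetic dialogue per clinical note via a black-box LLM."""
    return [
        Example(dialogue=llm(PROMPT_TEMPLATE.format(note=n)), note=n, synthetic=True)
        for n in notes
    ]


def build_training_set(
    natural: List[Example],
    synthetic: List[Example],
    max_synthetic_ratio: float = 1.0,
    seed: int = 0,
) -> List[Example]:
    """Keep all natural pairs and add at most ratio * len(natural) synthetic ones."""
    rng = random.Random(seed)
    cap = int(len(natural) * max_synthetic_ratio)
    sampled = rng.sample(synthetic, min(cap, len(synthetic)))
    mixed = list(natural) + sampled
    rng.shuffle(mixed)  # interleave natural and synthetic pairs
    return mixed


if __name__ == "__main__":
    natural = [Example("Doctor: What brings you in? Patient: A cough.", "Cough for 3 days.", False)]
    echo_llm = lambda prompt: prompt  # placeholder for a real LLM call
    synthetic = synthesize(["Fever since Monday."], echo_llm)
    for ex in build_training_set(natural, synthetic):
        print(ex.synthetic, ex.note)
```

Capping synthetic data relative to the natural set is one simple way to keep generated text from dominating training; given the paper's finding of limited evidence for augmentation, such a ratio would be a natural knob to vary.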

The Large Language Model Bible


Similar Work