Investigating Low-cost LLM Annotation for Spoken Dialogue Understanding Datasets

Druart Lucas, Vielzeuf Valentin, Estève Yannick (LIA). 2024

[Paper]    
Fine-Tuning, Pretraining Methods, Training Techniques

In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The dialogue course thus depends on the information provided by this semantic representation. While textual datasets provide fine-grained semantic representations, spoken dialogue datasets fall behind. This paper provides insights into the automatic enhancement of spoken dialogue datasets' semantic representations. Our contributions are threefold: we (1) assess the relevance of Large Language Model fine-tuning, (2) evaluate the knowledge captured by the produced annotations, and (3) highlight the implications of semi-automatic annotation.
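As a rough illustration of the annotation setting the paper studies, the sketch below shows how a transcribed user turn and a slot-value semantic frame could be serialized into an input/target pair for sequence-to-sequence LLM fine-tuning. The slot names, prompt wording, and serialization format here are hypothetical, not taken from the paper.

```python
# Minimal, hypothetical sketch (not code from the paper): flattening a
# transcribed dialogue turn and its semantic frame into an input/target
# pair suitable for sequence-to-sequence LLM fine-tuning.
from dataclasses import dataclass


@dataclass
class Turn:
    transcript: str        # ASR transcript of the user's utterance
    frame: dict[str, str]  # fine-grained semantic annotation (slot -> value)


def to_training_example(turn: Turn) -> dict[str, str]:
    """Serialize a dialogue turn into an input/target pair for fine-tuning."""
    target = "; ".join(f"{slot}={value}" for slot, value in turn.frame.items())
    return {
        "input": f"Annotate the user request: {turn.transcript}",
        "target": target,
    }


turn = Turn(
    transcript="i need a cheap hotel in the north with free parking",
    frame={"domain": "hotel", "price": "cheap", "area": "north", "parking": "yes"},
)
print(to_training_example(turn))
# {'input': 'Annotate the user request: i need a cheap hotel in the north
#  with free parking', 'target': 'domain=hotel; price=cheap; area=north;
#  parking=yes'}
```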
