DLGNet: A Transformer-based Model for Dialogue Response Generation
Olabiyi Oluwatobi, Mueller Erik T. arXiv 2019
[Paper]
GPT
Model Architecture
Pretraining Methods
RAG
Tokenization
Training Techniques
Transformer
Neural dialogue models, despite their successes, still suffer from lack of
relevance, diversity, and in many cases coherence in their generated responses.
These issues can be attributed to reasons including (1) short-range model
architectures that capture limited temporal dependencies, (2) limitations of
the maximum likelihood training objective, (3) the concave entropy profile of
dialogue datasets resulting in short and generic responses, and (4) the
out-of-vocabulary problem leading to generation of a large number of <UNK>
tokens. On the other hand, transformer-based models such as GPT-2 have
demonstrated an excellent ability to capture long-range structures in language
modeling tasks. In this paper, we present DLGNet, a transformer-based model for
dialogue modeling. We specifically examine the use of DLGNet for multi-turn
dialogue response generation. In our experiments, we evaluate DLGNet on the
open-domain Movie Triples dataset and the closed-domain Ubuntu Dialogue
dataset. DLGNet models, although trained with only the maximum likelihood
objective, achieve significant improvements over state-of-the-art multi-turn
dialogue models. They also produce the best performance to date on the two datasets
based on several metrics, including BLEU, ROUGE, and distinct n-gram. Our
analysis shows that the performance improvement is mostly due to the
combination of (1) the long-range transformer architecture with (2) the
injection of random informative paddings. Other contributing factors include
the joint modeling of dialogue context and response, and the 100% tokenization
coverage from the byte pair encoding (BPE).
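The abstract's core recipe (joint modeling of the dialogue context and response as one BPE-tokenized sequence, fed to a GPT-2-style autoregressive transformer and trained with maximum likelihood) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the end-of-turn separator, the Hugging Face `gpt2` checkpoint, and the `build_example` helper are assumptions, and the paper's injection of random informative paddings is not reproduced here.

```python
# A minimal sketch, not the authors' code: multi-turn context and response are
# jointly modeled as one BPE-tokenized sequence for a GPT-2-style autoregressive
# transformer trained with the maximum likelihood (language modeling) objective.
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # BPE gives full token coverage
model = GPT2LMHeadModel.from_pretrained("gpt2")

EOT = tokenizer.eos_token  # reused here as an end-of-turn separator (assumption)

def build_example(context_turns, response):
    # Joint modeling: dialogue context and response share a single token stream.
    text = EOT.join(context_turns) + EOT + response + EOT
    return tokenizer.encode(text, return_tensors="pt")

context = ["hi , how are you ?", "fine , thanks . and you ?"]
response = "doing well , just finished work ."
input_ids = build_example(context, response)

# Maximum-likelihood step: the model shifts the labels internally to predict
# each next token, covering both context and response positions.
outputs = model(input_ids, labels=input_ids)
outputs.loss.backward()
```

At inference time, one would encode only the context turns followed by the separator and let the model produce the response tokens, for example via `model.generate`.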
Similar Work