
TEncDM: Understanding The Properties Of Diffusion Model In The Space Of Language Model Encodings

Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov, Vladislav Lapikov, Roman Kim, Grigory Bartosh, Dmitry Molchanov, Sergey Markov, Dmitry Vetrov. arXiv 2024

[Paper]    
Tags: Applications, GPT, Language Modeling, Merging, Model Architecture, Pretraining Methods, Transformer

This paper presents the Text Encoding Diffusion Model (TEncDM), a diffusion model that operates in the space of pre-trained language model encodings. In contrast to the conventionally used embeddings, encodings integrate contextual information. The approach also employs a transformer-based decoder designed to incorporate context into the token prediction process. The authors conduct a comprehensive examination of the influence of the encoder, decoder, noise scheduler, and self-conditioning on zero-shot generation, and compare TEncDM with previous approaches on three conditional text generation tasks: QQP, XSum, and Wiki-Auto. The results show that TEncDM outperforms existing non-autoregressive diffusion models.
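To make the core idea concrete, below is a minimal sketch of one diffusion training step in encoding space. It uses `bert-base-uncased` as a stand-in contextual encoder; the small transformer denoiser, the linear noise schedule, and all hyperparameters are illustrative assumptions rather than the paper's actual configuration (which additionally studies self-conditioning and a context-aware decoder).

```python
# Hypothetical sketch: diffusion over language-model encodings, not embeddings.
# Encoder choice, denoiser size, schedule, and step count are all assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

hidden_dim = encoder.config.hidden_size  # 768 for bert-base
n_steps = 1000

# Linear beta schedule (the paper examines the effect of the noise scheduler).
betas = torch.linspace(1e-4, 0.02, n_steps)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# A small transformer denoiser operating on sequences of encodings.
denoiser = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=12, batch_first=True),
    num_layers=4,
)
time_embed = nn.Embedding(n_steps, hidden_dim)

def training_step(texts):
    """One diffusion training step in encoding space."""
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        # Contextual encodings (final hidden states) serve as the latents.
        x0 = encoder(**batch).last_hidden_state
    t = torch.randint(0, n_steps, (x0.size(0),))
    noise = torch.randn_like(x0)
    a = alphas_bar[t].view(-1, 1, 1)
    xt = a.sqrt() * x0 + (1.0 - a).sqrt() * noise     # forward diffusion
    pred_x0 = denoiser(xt + time_embed(t).unsqueeze(1))
    return nn.functional.mse_loss(pred_x0, x0)        # x0-prediction objective

loss = training_step(["diffusion in encoding space", "a second example"])
loss.backward()
```

At inference time, a decoder (a lightweight transformer in TEncDM) would map the denoised encodings back to tokens; that component is omitted here for brevity.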

Similar Work