Look Ahead Or Look Around? A Theoretical Comparison Between Autoregressive And Masked Pretraining

Zhang Qi, Du Tianqi, Huang Haotian, Wang Yifei, Wang Yisen. arXiv 2024

Tags: GPT, Has Code, Pretraining Methods, RAG, Tools, Training Techniques

In recent years, generative self-supervised learning (SSL) paradigms have exhibited impressive performance across visual, language, and multi-modal domains. While the varied designs of generative SSL objectives lead to distinct properties in downstream tasks, a theoretical understanding of these differences remains largely unexplored. In this paper, we establish the first theoretical comparisons between two leading generative SSL paradigms: autoregressive SSL and masked SSL. By establishing theoretical frameworks, we elucidate the strengths and limitations of autoregressive and masked SSL in the primary evaluation tasks of classification and content generation. Our findings demonstrate that in classification tasks, the flexibility of target tokens in masked SSL fosters more inter-sample connections than the fixed position of target tokens in autoregressive SSL, which yields superior clustering performance. In content generation tasks, the misalignment between the flexible lengths of test samples and the fixed length of unmasked texts in masked SSL (vs. the flexible lengths of conditional texts in autoregressive SSL) hinders its generation performance. So that each paradigm can leverage the other’s strengths and mitigate its own weaknesses, we propose diversity-enhanced autoregressive and variable-length masked objectives, which substantially improve the classification performance of autoregressive SSL and the generation performance of masked SSL. Code is available at https://github.com/PKU-ML/LookAheadLookAround.
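
To make the contrast in the abstract concrete, the following is a minimal sketch of how training pairs might be built under each objective. It is not taken from the paper or its repository: the function names `autoregressive_pairs` and `masked_pairs`, the `[MASK]` placeholder, and the `mask_ratio` parameter are all illustrative assumptions.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, chosen for illustration


def autoregressive_pairs(tokens):
    """Build (context, target) pairs where the target is always the next token.

    The target position is fixed relative to the context (immediately after it),
    while the context length varies from 1 to len(tokens) - 1.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_pairs(tokens, mask_ratio=0.3, seed=0):
    """Build (context, position, target) triples with randomly chosen targets.

    Target positions are flexible (sampled at random), while the context
    always has the same length as the full input.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    hidden = set(positions)
    context = [MASK if i in hidden else t for i, t in enumerate(tokens)]
    return [(context, i, tokens[i]) for i in positions]


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for context, target in autoregressive_pairs(sentence):
        print(len(context), context, "->", target)
    for context, pos, target in masked_pairs(sentence):
        print(len(context), context, f"-> position {pos}: {target}")
```

In this sketch the autoregressive contexts grow from one token to five, whereas every masked context has exactly six entries. That mirrors, in a simplified way, the two properties the abstract contrasts: fixed versus flexible target positions, and flexible versus fixed context lengths.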

