Exploring Low-cost Transformer Model Compression For Large-scale Commercial Reply Suggestions

Shrivastava Vaishnavi, Gaonkar Radhika, Gupta Shashank, Jha Abhishek. arXiv 2021

[Paper]
Applications Efficiency And Optimization Fine Tuning Model Architecture Pretraining Methods Quantization Security Training Techniques Transformer

Fine-tuning pre-trained language models improves the quality of commercial reply suggestion systems, but at the cost of unsustainable training times. Popular approaches to reducing training time are resource intensive, so we explore low-cost model compression techniques such as Layer Dropping and Layer Freezing. We demonstrate the efficacy of these techniques in large-data scenarios, reducing the training time of a commercial email reply suggestion system by 42% without affecting model relevance or user engagement. We further study the robustness of these techniques to ablations of pre-trained model and dataset size, and share several insights and recommendations for commercial applications.
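
Below is a minimal sketch of the two techniques named in the abstract, Layer Dropping (removing top encoder layers before fine-tuning) and Layer Freezing (excluding lower layers from gradient updates). The choice of `bert-base-uncased`, the layer counts, and the helper names are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch only: model name, layer counts, and schedules are
# assumptions for demonstration, not the paper's reported setup.
import torch
from torch import nn
from transformers import BertModel


def freeze_bottom_layers(model: BertModel, num_frozen: int) -> None:
    """Layer Freezing: keep the embeddings and the lowest `num_frozen`
    encoder layers fixed so no gradients are computed for them."""
    for param in model.embeddings.parameters():
        param.requires_grad = False
    for layer in model.encoder.layer[:num_frozen]:
        for param in layer.parameters():
            param.requires_grad = False


def drop_top_layers(model: BertModel, num_kept: int) -> None:
    """Layer Dropping: discard the top encoder layers before fine-tuning,
    shrinking the network that is trained and later served."""
    model.encoder.layer = nn.ModuleList(model.encoder.layer[:num_kept])
    model.config.num_hidden_layers = num_kept


if __name__ == "__main__":
    model = BertModel.from_pretrained("bert-base-uncased")
    drop_top_layers(model, num_kept=6)          # keep 6 of 12 encoder layers
    freeze_bottom_layers(model, num_frozen=3)   # update only the top 3 of those
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable:,} / {total:,}")
```

Both operations reduce the per-step compute (and, for dropping, the serving cost) without requiring extra resources such as a teacher model for distillation, which is what makes them "low-cost" relative to popular training time reduction approaches.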

Similar Work