Speech Translation With Large Language Models: An Industrial Practice

Huang Zhichao, Ye Rong, Ko Tom, Dong Qianqian, Cheng Shanbo, Wang Mingxuan, Li Hang. arXiv 2023

Building on the success of large language models (LLMs) across a wide range of tasks, we introduce LLM-ST, a novel and effective speech translation model built on a pre-trained LLM. By integrating the LLM with a speech encoder and applying multi-task instruction tuning, LLM-ST produces accurate timestamped transcriptions and translations, even from long audio inputs. We further find that Chain-of-Thought (CoT) prompting can benefit LLM-ST. Experiments on English and Chinese datasets show that LLM-ST performs strongly, establishing a new benchmark in speech translation. Demo: https://speechtranslation.github.io/llm-st/.


