
Prompting The Hidden Talent Of Web-scale Speech Models For Zero-shot Task Generalization

Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath. arXiv 2023 – 23 citations

[Paper] [Code]
Tags: Ethics and Bias, INTERSPEECH, RAG, Has Code, Security, Prompting

We investigate the emergent abilities of Whisper, a recently proposed web-scale speech model, by adapting it to unseen tasks through prompt engineering. We select three tasks: audio-visual speech recognition (AVSR), code-switched speech recognition (CS-ASR), and speech translation (ST) on unseen language pairs. We design task-specific prompts either by leveraging another large-scale model or by simply manipulating the special tokens in the default prompts. Experiments show that, compared to the default prompts, our proposed prompts improve performance by 10% to 45% on the three zero-shot tasks, and even outperform SotA supervised models on some datasets. In addition, our experiments reveal many interesting properties of Whisper, including its robustness to prompts, its bias on accents, and the multilingual understanding in its latent space. Code is available at https://github.com/jasonppy/PromptingWhisper
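To make the special-token manipulation concrete, here is a minimal sketch (not taken from the paper's repository) using the `openai-whisper` package. Whisper's default decoder prompt for English ASR is `<|startoftranscript|><|en|><|transcribe|><|notimestamps|>`; one manipulation of this prompt for zero-shot ST into a non-English target is to keep the transcribe task token but set the language token to the target language. The file name `clip.wav` is a placeholder.

```python
# pip install -U openai-whisper
import whisper

model = whisper.load_model("large")

# Load and preprocess the audio (hypothetical input file).
audio = whisper.load_audio("clip.wav")
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Default prompt: <|startoftranscript|><|en|><|transcribe|><|notimestamps|>
# Special-token manipulation for zero-shot ST (e.g., English audio -> Chinese
# text): keep task="transcribe" but swap the language token to the *target*
# language, so the prompt becomes <|startoftranscript|><|zh|><|transcribe|>.
options = whisper.DecodingOptions(
    task="transcribe", language="zh", without_timestamps=True
)
result = whisper.decode(model, mel, options)
print(result.text)
```

Since the manipulation only changes the tokens fed to the decoder, no fine-tuning or architectural change is required; the same pattern extends to the other tasks (e.g., prompting with tokens for both languages of a code-switched utterance).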
