Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark

Liu Xiao, Lin Jianfeng, Zhang Jiawei. arXiv 2023

Tags: Applications, GPT, Model Architecture, Multimodal Models, Prompting, Reinforcement Learning, Tools

The proliferation of Large Language Models (LLMs) like ChatGPT has significantly advanced language understanding and generation, impacting a broad spectrum of applications. However, these models predominantly excel at text-based tasks and overlook the complexity of real-world multimodal information. This study introduces MultiAPI, a comprehensive, large-scale API benchmark dataset aimed at expanding LLMs’ proficiency in multimodal contexts. Developed collaboratively with ChatGPT, MultiAPI consists of 235 diverse API calls and 2,038 contextual prompts, offering a platform for evaluating how tool-augmented LLMs handle multimodal tasks. Through comprehensive experiments, our findings reveal that while LLMs are proficient at deciding whether to call an API, they struggle with domain identification, function selection, and argument generation. Surprisingly, we also find that auxiliary context can actually impair performance. An in-depth error analysis points toward a new paradigm for addressing these challenges, suggesting a potential direction for future LLM research.
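The four stages named in the abstract (API call decision, domain identification, function selection, argument generation) can be pictured as a layered check on each prediction. The following is a minimal sketch under stated assumptions, not the benchmark's actual scoring code: the `ApiCall` record, its field names, the cascading rule (a later stage counts only if every earlier stage is correct), and exact-match comparison of arguments are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ApiCall:
    """Hypothetical record of one API call; field names are assumptions."""
    needs_api: bool                                 # should an API be called at all?
    domain: str = ""                                # e.g. "image", "audio"
    function: str = ""                              # name of the API function
    arguments: dict = field(default_factory=dict)   # keyword arguments


LEVELS = ("decision", "domain", "function", "arguments")


def score_prediction(pred: ApiCall, gold: ApiCall) -> dict:
    """Return a per-level correctness flag for one prediction.

    Assumption: scoring cascades, so a level is correct only if all
    earlier levels are correct too. Arguments use exact dict equality.
    """
    decision_ok = pred.needs_api == gold.needs_api
    domain_ok = decision_ok and pred.domain == gold.domain
    function_ok = domain_ok and pred.function == gold.function
    arguments_ok = function_ok and pred.arguments == gold.arguments
    return dict(zip(LEVELS, (decision_ok, domain_ok, function_ok, arguments_ok)))


def accuracy(pairs: list) -> dict:
    """Average the per-level flags over (prediction, gold) pairs."""
    if not pairs:
        raise ValueError("need at least one (prediction, gold) pair")
    totals = {level: 0 for level in LEVELS}
    for pred, gold in pairs:
        for level, ok in score_prediction(pred, gold).items():
            totals[level] += int(ok)
    return {level: count / len(pairs) for level, count in totals.items()}


# Illustrative data only; these function names are made up.
gold = ApiCall(True, "image", "caption_image", {"image_path": "cat.png"})
exact = ApiCall(True, "image", "caption_image", {"image_path": "cat.png"})
wrong_function = ApiCall(True, "image", "classify_image", {"image_path": "cat.png"})

report = accuracy([(exact, gold), (wrong_function, gold)])
print(report)
```

Reporting each level separately, rather than a single pass/fail score, is what lets an analysis distinguish a model that decides correctly to call an API from one that also picks the right function and arguments.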
