LLMs Meet Multimodal Generation and Editing: A Survey

He Yingqing, Liu Zhaoyang, Chen Jingye, Tian Zeyue, Liu Hongyu, Chi Xiaowei, Liu Runtao, Yuan Ruibin, Xing Yazhou, Wang Wenhai, Dai Jifeng, Zhang Yong, Xue Wei, Liu Qifeng, Guo Yike, Chen Qifeng. arXiv 2024

[Paper] [Code]    
Tags: Agentic, Applications, Has Code, Merging, Multimodal Models, RAG, Reinforcement Learning, Responsible AI, Survey Paper

With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning. Previous surveys of multimodal large language models (MLLMs) mainly focus on multimodal understanding. This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio. Specifically, we summarize the notable advancements with milestone works in these fields and categorize these studies into LLM-based and CLIP/T5-based methods. Then, we detail the various roles of LLMs in multimodal generation and exhaustively investigate the critical technical components behind these methods and the multimodal datasets utilized in these studies. Additionally, we dig into tool-augmented multimodal agents that can leverage existing generative models for human-computer interaction. Lastly, we discuss the advancements in the generative AI safety field, investigate emerging applications, and discuss future prospects. Our work provides a systematic and insightful overview of multimodal generation and processing, which is expected to advance the development of Artificial Intelligence Generated Content (AIGC) and world models. A curated list of all related papers can be found at https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
