Mani-GPT: A Generative Model For Interactive Robotic Manipulation

Zhang Zhe, Chai Wei, Wang Jiankun. arXiv 2023

[Paper]
GPT, Model Architecture, Pretraining Methods, RAG, Reinforcement Learning, Transformer

In real-world scenarios, human dialogues are multi-round and diverse. Furthermore, human instructions can be unclear and human responses are unrestricted. Interactive robots face difficulties in understanding human intents and generating suitable strategies for assisting individuals through manipulation. In this article, we propose Mani-GPT, a Generative Pre-trained Transformer (GPT) for interactive robotic manipulation. The proposed model has the ability to understand the environment through object information, understand human intent through dialogues, generate natural language responses to human input, and generate appropriate manipulation plans to assist the human. This makes the human-robot interaction more natural and humanized. In our experiment, Mani-GPT outperforms existing algorithms with an accuracy of 84.6% in intent recognition and decision-making for actions. Furthermore, it demonstrates satisfying performance in real-world dialogue tests with users, achieving an average response accuracy of 70%.
