Prompting Multi-modal Tokens To Enhance End-to-end Autonomous Driving Imitation Learning With Llms

Duan Yiqun, Zhang Qiang, Xu Renjing. Published as an oral presentation paper at the 2024

[Paper]    
Agentic Attention Mechanism Ethics And Bias Model Architecture Prompting Reinforcement Learning Tools

The utilization of Large Language Models (LLMs) within the realm of reinforcement learning, particularly as planners, has garnered significant attention in recent scholarly literature. However, a substantial proportion of existing research focuses on planning models for robotics that transmute the outputs of perception models into linguistic form, thus adopting a "pure-language" strategy. In this research, we propose a hybrid end-to-end learning framework for autonomous driving that combines basic driving imitation learning with LLMs through multi-modality prompt tokens. Instead of simply converting the perception results of a separately trained model into pure language input, our novelty lies in two aspects: 1) the end-to-end integration of visual and LiDAR sensory input into learnable multi-modality tokens, thereby intrinsically alleviating the description bias introduced by separately pre-trained perception models; and 2) rather than directly letting LLMs drive, exploring a hybrid setting in which LLMs help the driving model correct mistakes and handle complicated scenarios. Our experiments suggest that the proposed methodology attains a driving score of 49.21% and an impressive route completion rate of 91.34% in the offline evaluation conducted via CARLA. These performance metrics are comparable to the most advanced driving models.
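The multi-modality prompt-token idea in the abstract can be sketched roughly as follows. This is a minimal PyTorch-style illustration, assuming hypothetical module names (`MultiModalPromptTokens`, `img_proj`, `lidar_proj`), feature dimensions, and a cross-attention fusion scheme; none of it is taken from the paper's released implementation.

```python
# Sketch: camera and LiDAR features are projected into the LLM's embedding
# space as learnable multi-modality prompt tokens, rather than being
# verbalized as text. All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class MultiModalPromptTokens(nn.Module):
    def __init__(self, img_dim=512, lidar_dim=256, llm_dim=4096, n_tokens=8):
        super().__init__()
        # Linear adapters map each sensor's features into the LLM token space.
        self.img_proj = nn.Linear(img_dim, llm_dim)
        self.lidar_proj = nn.Linear(lidar_dim, llm_dim)
        # Learnable query tokens that summarize the fused sensory features.
        self.queries = nn.Parameter(torch.randn(n_tokens, llm_dim) * 0.02)
        self.attn = nn.MultiheadAttention(llm_dim, num_heads=8, batch_first=True)

    def forward(self, img_feats, lidar_feats):
        # img_feats:   (B, N_img, img_dim)   e.g. patch features from a camera backbone
        # lidar_feats: (B, N_pts, lidar_dim) e.g. pillar/voxel features from a LiDAR encoder
        fused = torch.cat([self.img_proj(img_feats),
                           self.lidar_proj(lidar_feats)], dim=1)
        queries = self.queries.unsqueeze(0).expand(img_feats.size(0), -1, -1)
        # Cross-attention pools the fused sensor tokens into a fixed number of
        # prompt tokens that can be prepended to the LLM's input embeddings.
        prompt_tokens, _ = self.attn(queries, fused, fused)
        return prompt_tokens  # (B, n_tokens, llm_dim)


# Usage: the resulting tokens would be prepended to the embedded text
# instruction before the sequence is fed to the LLM that advises the
# imitation-learned driving policy.
tokens = MultiModalPromptTokens()(torch.randn(2, 196, 512), torch.randn(2, 1024, 256))
```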

Similar Work