An Effective Data Creation Pipeline To Generate High-quality Financial Instruction Data For Large Language Model

Wang Ziao, Wang Jianning, Wu Junda, Zhang Xiaofeng. arXiv 2023

Applications GPT Model Architecture Reinforcement Learning

In the early era of large language models, generating a high-quality financial dataset is critical for fine-tuning a large language model on finance-related tasks. This paper therefore presents a carefully designed data creation pipeline for this purpose. Specifically, we use ChatGPT to initiate a dialogue between an AI investor and a financial expert, and we incorporate feedback from human financial experts to refine the dataset. The pipeline yields a robust instruction-tuning dataset of 103k multi-turn chats. Extensive experiments on this dataset evaluate the model's performance, using an external GPT-4 model as the judge. The promising results show that our approach leads to significant improvements in generating accurate, relevant, and financial-style responses from AI models, thus providing a powerful tool for applications in the financial sector.
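The abstract describes the pipeline only at a high level, so the Python sketch below shows just one possible shape for it, under explicit assumptions. Two role-playing chat functions stand in for ChatGPT prompted as an "AI investor" and as a "financial expert", and they alternate turns. A review callback stands in for the human financial experts' feedback. Approved dialogues are then mapped to user/assistant multi-turn records for instruction tuning. Every name here (`simulate_dialogue`, `build_dataset`, the record layout, the stub models, and the review rule) is hypothetical and not taken from the paper. The paper's actual prompts, refinement procedure, and data format are not given in this abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical interface: a "chat model" maps (topic, dialogue so far) to the next reply.
# In a real pipeline this would wrap a ChatGPT call with a role prompt; here it stays abstract.
ChatModel = Callable[[str, List[Dict[str, str]]], str]


@dataclass
class Conversation:
    topic: str
    turns: List[Dict[str, str]] = field(default_factory=list)


def simulate_dialogue(topic: str, investor: ChatModel, expert: ChatModel,
                      num_rounds: int = 3) -> Conversation:
    """Alternate investor questions and expert answers for num_rounds rounds."""
    conv = Conversation(topic=topic)
    for _ in range(num_rounds):
        question = investor(topic, conv.turns)
        conv.turns.append({"role": "investor", "content": question})
        answer = expert(topic, conv.turns)
        conv.turns.append({"role": "expert", "content": answer})
    return conv


def to_instruction_record(conv: Conversation) -> Dict[str, object]:
    """Map the simulated roles onto user/assistant roles for multi-turn tuning."""
    role_map = {"investor": "user", "expert": "assistant"}
    return {
        "topic": conv.topic,
        "messages": [{"role": role_map[t["role"]], "content": t["content"]}
                     for t in conv.turns],
    }


def build_dataset(topics: List[str], investor: ChatModel, expert: ChatModel,
                  approve: Callable[[Conversation], bool],
                  num_rounds: int = 3) -> List[Dict[str, object]]:
    """Generate one dialogue per topic and keep only those the reviewer approves."""
    records = []
    for topic in topics:
        conv = simulate_dialogue(topic, investor, expert, num_rounds)
        if approve(conv):  # stand-in for human financial-expert review
            records.append(to_instruction_record(conv))
    return records


if __name__ == "__main__":
    # Stub models so the sketch runs offline; they only reproduce the turn structure.
    def stub_investor(topic, history):
        return f"Question {len(history) // 2 + 1} about {topic}?"

    def stub_expert(topic, history):
        # Simulate a failed generation for one topic so the review step has work to do.
        return "" if topic == "meme stocks" else f"Answer to: {history[-1]['content']}"

    # Hypothetical review rule: reject any conversation with an empty expert reply.
    def approve(conv):
        return all(t["content"].strip() for t in conv.turns if t["role"] == "expert")

    data = build_dataset(["bond duration", "meme stocks", "ETF fees"],
                         stub_investor, stub_expert, approve, num_rounds=2)
    print(len(data))                           # 2
    print(data[0]["messages"][1]["content"])   # Answer to: Question 1 about bond duration?
```

Keeping the chat models and the review step as plain callables lets the same loop run against stubs for testing or against real model calls and a human-review queue, without changing the dataset-building logic.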
