Exploring The Limits Of ChatGPT In Software Security Applications

Wu Fangzhou, Zhang Qingzhao, Bajaj Ati Priya, Bao Tiffany, Zhang Ning, Wang Ruoyu "fish", Xiao Chaowei. arXiv 2023

[Paper]    
Applications Fine Tuning GPT Model Architecture Security Tools

Large language models (LLMs) have undergone rapid evolution and achieved remarkable results in recent years. OpenAI's ChatGPT, backed by GPT-3.5 or GPT-4, has gained instant popularity due to its strong capability across a wide range of tasks, including natural language tasks, coding, mathematics, and engaging conversations. However, the impacts and limits of such LLMs in the system security domain are less explored. In this paper, we delve into the limits of LLMs (i.e., ChatGPT) in seven software security applications, including vulnerability detection/repair, debugging, debloating, decompilation, patching, root cause analysis, symbolic execution, and fuzzing. Our exploration reveals that ChatGPT not only excels at generating code, the conventional application of language models, but also demonstrates strong capability in understanding user-provided commands in natural language, reasoning about control and data flows within programs, generating complex data structures, and even decompiling assembly code. Notably, GPT-4 showcases significant improvements over GPT-3.5 in most security tasks. Certain limitations of ChatGPT in security-related tasks are also identified, such as its constrained ability to process long code contexts.
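To make the kind of task the paper evaluates concrete, the sketch below shows one plausible way to prompt an OpenAI chat model to flag a vulnerability in a short C function. This is not the authors' evaluation harness: the model name, system prompt, and code snippet are illustrative assumptions, and only the general prompt-then-answer setup reflects the paper's methodology.

```python
# Minimal sketch (not from the paper): prompt-based vulnerability detection,
# one of the seven security applications studied. All prompt wording, the
# model name, and the C snippet are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

c_snippet = """
void copy_name(char *input) {
    char buf[16];
    strcpy(buf, input);   /* unbounded copy into a fixed-size buffer */
}
"""

response = client.chat.completions.create(
    model="gpt-4",  # the paper compares GPT-3.5 and GPT-4; pick either here
    messages=[
        {"role": "system", "content": "You are a software security assistant."},
        {"role": "user", "content": (
            "Does this C function contain a vulnerability? "
            "If so, name the CWE and suggest a fix.\n" + c_snippet
        )},
    ],
)

print(response.choices[0].message.content)
```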

Similar Work