Researchers from Trustwave’s SpiderLabs have tested how well ChatGPT can analyse source code and suggest ways to make it more secure.
The initial tests looked at whether ChatGPT could uncover buffer overflows in code, which occur when a program fails to allocate enough space to hold input data. The researchers said: “We provided ChatGPT with a lot of code to see how it would respond. It often responded with mixed results.”
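The kind of small, unambiguous sample the researchers describe might look something like the following C fragment. It is an illustrative example rather than code from Trustwave’s tests, but it shows the classic pattern: input longer than the fixed-size buffer overruns it.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative example only, not from Trustwave's tests: a classic
 * stack-based buffer overflow. strcpy() does not check the size of the
 * destination, so any name longer than 15 characters (plus the
 * terminating NUL) writes past the end of the buffer. */
void greet(const char *name)
{
    char buffer[16];
    strcpy(buffer, name);          /* no bounds check */
    printf("Hello, %s\n", buffer);
}

int main(int argc, char **argv)
{
    if (argc > 1)
        greet(argv[1]);            /* attacker-controlled length */
    return 0;
}
```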
When asked how to make the code more secure, they said ChatGPT suggested increasing the size of the buffer. Its other recommendations included using a more secure function for reading input and allocating memory dynamically. The researchers found that ChatGPT could refactor the code to apply any of the fixes it suggested, such as dynamic memory allocation.
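Applied to the snippet above, those suggestions might translate into something like the following. This is a sketch of the sort of fix being described, not ChatGPT’s actual output: fgets() bounds the read, and the copy of the input is sized dynamically to fit.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the kinds of fixes described above (not ChatGPT's actual
 * output): a bounded read with fgets(), and a dynamically allocated
 * copy sized to the input. */
void greet_safely(const char *name)
{
    /* Dynamic allocation: the buffer is sized to fit the input. */
    size_t len = strlen(name) + 1;
    char *buffer = malloc(len);
    if (!buffer)
        return;
    memcpy(buffer, name, len);
    printf("Hello, %s\n", buffer);
    free(buffer);
}

int main(void)
{
    char line[256];
    /* fgets() is the "more secure function for reading input": it never
     * writes more than sizeof(line) - 1 characters plus the NUL. */
    if (fgets(line, sizeof(line), stdin)) {
        line[strcspn(line, "\n")] = '\0';   /* strip trailing newline */
        greet_safely(line);
    }
    return 0;
}
```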
According to the researchers, ChatGPT did quite a good job at identifying potential issues in the sample code. “These examples were chosen because they are relatively unambiguous, so ChatGPT would not have to infer much context beyond the code that it was given,” they said.
However, when supplied with larger blocks of code or less straightforward issues, ChatGPT did not do as well at spotting them. Nevertheless, the researchers noted that human programmers would face similar difficulties tackling errors in more complex code. They said that, for the best results, ChatGPT needs more user input to illustrate the code’s purpose and elicit a contextualised response. In spite of these limitations, the researchers believe it can be used to support analysis of source code.
Trustwave said that while static analysis tools have been used for years to identify vulnerabilities in code, such tools are limited in their ability to assess broader security aspects – sometimes reporting vulnerabilities that are impossible to exploit. The researchers reported that ChatGPT demonstrates greater contextual awareness and is able to generate exploits, supporting a more comprehensive analysis of security risks. “The biggest flaw when using ChatGPT for this type of analysis is that it is incapable of interpreting the human thought process behind the code,” they warned.
Karl Sigler, threat intelligence manager at Trustwave, said: “ChatGPT is OK at code. It’s better than a junior programmer and can be a programmer’s best friend.” He added that since very few developers start building applications from scratch, ChatGPT offers a way for them to supplement the software development process. For instance, he believes it could help developers understand the application programming interfaces (APIs) and functionality available in new programming libraries. Given it has been designed to understand human language, Sigler also sees an opportunity for ChatGPT to sit in on meetings between business people and developers.
Microsoft recently demonstrated integration of ChatGPT with its Copilot product running with the Teams collaboration tool, where the AI keeps track of the discussion and takes notes and action points. Sigler believes such technology could be applied to help generate a formal specification for an application development project.
This would avoid misunderstandings that can easily creep in during such discussions. In theory, ChatGPT could then be used to check submitted code against the formal specification, helping both the client and the developer see whether what has been delivered deviates from what was agreed. Given its ability to understand human language, Sigler said there is also a lot of potential to use ChatGPT to help catch misinterpretations in specification documentation and compliance policies.
The researchers from Trustwave said ChatGPT could be particularly useful for generating skeleton code and unit tests since those require a minimal amount of context and are more concerned with the parameters being passed. This, they pointed out, is a task ChatGPT excelled at in their tests. “It is flexible enough to be able to respond to many different requests, but it is not necessarily suited to every job that is asked of it,” they said.
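A request of that sort might yield something along the following lines. It is a hypothetical illustration of the kind of skeleton and test being described, not output from the Trustwave study: a small function stub plus minimal assert-based unit tests driven purely by the parameters passed and the expected results.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical skeleton of the sort described above: the signature and
 * parameters alone are enough context to generate both the function
 * and its tests. */
int clamp(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

/* Minimal unit tests: each case only needs the parameters being passed
 * and the expected result, with no wider application context. */
int main(void)
{
    assert(clamp(5, 0, 10) == 5);
    assert(clamp(-3, 0, 10) == 0);
    assert(clamp(42, 0, 10) == 10);
    printf("all clamp() tests passed\n");
    return 0;
}
```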
Over the next two to five years, Sigler expects ChatGPT, and other generative AI systems, to become part of the software development lifecycle. One example of how this is being used today is a plugin for the IDA Pro binary code analysis tool, which decompiles binary code into human-readable, source code-like output.
However, without documentation, it can take a long time to reverse engineer that output and understand what the program was designed to do. A GitHub project called Gepetto provides a Python script that uses OpenAI’s gpt-3.5-turbo large language model, which the project’s maintainer says can give meaning to functions decompiled by IDA Pro. For instance, it can be used to ask gpt-3.5-turbo to explain what a function in the decompiled code does.
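Gepetto itself is a Python plugin for IDA, but the request it makes boils down to a single call to OpenAI’s Chat Completions API. A rough, hypothetical sketch of that kind of call in C using libcurl follows; the endpoint and model name are OpenAI’s, while the decompiled function in the prompt is an invented placeholder, not anything Gepetto actually produces.

```c
/* Hypothetical sketch (not Gepetto's code): ask gpt-3.5-turbo to explain
 * a decompiled function via OpenAI's Chat Completions API, using libcurl.
 * Assumes the API key is in the OPENAI_API_KEY environment variable. */
#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>

int main(void)
{
    const char *key = getenv("OPENAI_API_KEY");
    if (!key) {
        fprintf(stderr, "OPENAI_API_KEY not set\n");
        return 1;
    }

    /* Prompt containing a made-up fragment of decompiled pseudocode. */
    const char *body =
        "{\"model\": \"gpt-3.5-turbo\","
        " \"messages\": [{\"role\": \"user\","
        "  \"content\": \"Explain what this decompiled function does: "
        "int sub_401000(int a1){return a1*2+1;}\"}]}";

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;

    char auth[256];
    snprintf(auth, sizeof(auth), "Authorization: Bearer %s", key);

    struct curl_slist *hdrs = NULL;
    hdrs = curl_slist_append(hdrs, "Content-Type: application/json");
    hdrs = curl_slist_append(hdrs, auth);

    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://api.openai.com/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

    /* The JSON response containing the model's explanation is written
     * to stdout by libcurl's default write callback. */
    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return rc == CURLE_OK ? 0 : 1;
}
```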
According to Sigler, ChatGPT also allows the open source community to automate some of the auditing effort needed to maintain secure and manageable code.