Modern software applications usually consist of numerous files and several million lines of code. Because of this sheer volume, finding and correcting faults, known as debugging, is difficult. In many software companies, developers still search for faults manually, which takes up a large proportion of their working time. Studies indicate that debugging accounts for between 30 and 90 percent of the total development time.
Birgit Hofer and Thomas Hirsch from the Institute of Software Technology at the Graz University of Technology (TU Graz) have developed a solution based on existing natural language processing methods and metrics that can greatly speed up the process of finding faulty code and, thus, debugging.
Fault localization uses up the most time
“As a first step, we conducted surveys among developers to determine the biggest time-wasters when debugging. It turned out that bug fixing is not the big problem, but that programmers mainly get bogged down with locating faults, i.e., narrowing down the search to the right area in the program code,” explains Birgit Hofer.
Based on this finding, the researchers set out to develop a solution that also scales to applications with large code bases. Efficient model-based approaches exist, in which a program is converted into a logical representation (referred to as a model), but these only work for small programs because the computing effort increases exponentially with the size of the code. The approach taken by Hofer and Hirsch instead represents certain software properties as numbers – for example, the readability or complexity of code – and can also be used for large amounts of code, as the computational effort only increases linearly.
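To illustrate what "representing software properties in numbers" can look like, here is a minimal Python sketch. It is not the researchers' tool. The function name `simple_metrics` and the three metrics are hypothetical stand-ins for readability and complexity measures. Each line of the source is inspected once, so the effort grows linearly with the amount of code.

```python
import re

# Keywords that introduce a branch or an extra condition in Python code.
# Counting them is a crude, assumed proxy for complexity.
BRANCH_KEYWORDS = re.compile(r"\b(if|elif|for|while|except|and|or)\b")


def simple_metrics(source: str) -> dict:
    """Compute a few toy metrics for one source file, one pass per line."""
    lines = source.splitlines()
    # Ignore blank lines and full-line comments.
    code_lines = [
        ln for ln in lines
        if ln.strip() and not ln.strip().startswith("#")
    ]
    branch_count = sum(len(BRANCH_KEYWORDS.findall(ln)) for ln in code_lines)
    longest = max((len(ln) for ln in code_lines), default=0)
    return {
        "lines_of_code": len(code_lines),    # size
        "branch_keywords": branch_count,     # rough complexity proxy
        "max_line_length": longest,          # rough readability proxy
    }
```

Calling `simple_metrics` on the text of each file yields one small numeric record per file, which can then be compared across files or fed into a ranking step.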
Comparison of bug description and code
The starting point for fault localization is the bug report, for which testers or users fill out a form describing the observed failure and enter information about the software version, their operating system, the steps they took before the failure occurred, and other relevant information.
Based on this bug report, the combination of natural language processing and metrics analyses the entire code base, considering classes, the names of variables, files, methods and functions, and the calls to methods and functions. The application then identifies the code sections that best correspond to the bug report.
As a result, the developers receive a list of five to ten files ranked according to the probability of their being responsible for the observed failure. The developers also receive information on the type of fault that is most likely to be involved. This data can be used to locate and fix the bug more quickly.
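The idea of matching a bug report against code and returning a ranked file list can be sketched in a few lines of Python. This is an illustrative assumption, not the TU Graz implementation: it uses plain TF-IDF weighting with cosine similarity as a stand-in for the natural language processing step, and the names `tokenize` and `rank_files` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; identifiers are split on camelCase and snake_case."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", text):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        tokens.extend(p.lower() for p in parts)
    return tokens


def rank_files(bug_report, files, top_n=5):
    """Rank files (name -> source text) by textual similarity to a bug report."""
    docs = {name: Counter(tokenize(src)) for name, src in files.items()}
    n_docs = len(docs)

    # Document frequency: in how many files does each token occur?
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())
    # Smoothed inverse document frequency: rare tokens weigh more.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    def vectorize(counts):
        # Tokens that occur in no file carry no information and are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = vectorize(Counter(tokenize(bug_report)))
    scored = [(name, cosine(query, vectorize(c))) for name, c in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

Given a dictionary that maps file names to their contents, `rank_files(report_text, files, top_n=10)` returns up to ten `(file name, score)` pairs with the best match first. The score is a similarity value, not a calibrated probability; a real system would combine it with further signals such as code metrics.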
“The working time of software developers is expensive, yet they often spend more of this expensive time locating and fixing bugs than developing new features,” says Birgit Hofer. “As there are already a number of approaches to eradicating this problem, we have investigated how we can combine and improve them so that there is a basis for commercial application. We have now laid the foundations and the system works. However, in order to integrate it into a company, it would still have to be adapted to the company’s respective needs.”
The debugging system is available on GitHub. The project website contains papers and repositories associated with this research.