New Vulnerability Detector to Analyze Source Code


Detecting source code vulnerabilities aims to protect software systems from attacks by identifying inherent vulnerabilities. 

Prior studies often reduce the problem to a single binary classification task, which makes it hard for deep learning models to learn the distinct characteristics of different vulnerability types.

To address this, the following researchers introduced FGVulDet, a fine-grained vulnerability detector that uses multiple classifiers to distinguish between vulnerability types:

  • Shangqing Liu from Nanyang Technological University 
  • Wei Ma from Nanyang Technological University
  • Jian Wang from Nanyang Technological University
  • Xiaofei Xie from Singapore Management University
  • Ruitao Feng from Singapore Management University
  • Yang Liu from Nanyang Technological University

FGVulDet Vulnerability Detector

Each classifier learns type-specific semantics, and the researchers propose a novel data augmentation technique to increase the diversity of the training dataset.

Inspired by graph neural networks, FGVulDet utilizes an edge-aware GGNN to capture program semantics from a large-scale GitHub dataset encompassing five vulnerability types.

Five Vulnerability Types

Previous works have simplified the identification of source code vulnerability into a binary classification problem where all defect-prone functions are labeled as 1.

This approach is less accurate because it does not distinguish between specific vulnerability types.

In contrast, the researchers’ approach focuses on fine-grained vulnerability identification and aims to learn a separate prediction function for each vulnerability type in the dataset.

Each function is categorized based on its vulnerability type to predict its vulnerability status.
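
As a rough illustration of this per-type labeling, the sketch below splits a labeled collection of functions into one binary dataset per vulnerability type. The sample functions, the type names `type_a` and `type_b`, and the helper `build_per_type_datasets` are hypothetical placeholders, not the paper's data or code.

```python
from collections import defaultdict

# Hypothetical records: (function source, vulnerability type, or None if non-vulnerable).
# The type names are placeholders, not the five types used by the researchers.
samples = [
    ("void f(char *s) { char b[8]; strcpy(b, s); }", "type_a"),
    ("int g(int *p) { return *p; }", "type_b"),
    ("int h(int a, int b) { return a + b; }", None),
    ("void k(char *s) { char b[4]; strcpy(b, s); }", "type_a"),
]

def build_per_type_datasets(samples):
    """Return {vuln_type: [(code, label)]}, label 1 for that type, 0 for non-vulnerable."""
    clean = [code for code, vtype in samples if vtype is None]
    datasets = defaultdict(list)
    # Positive examples: functions labeled with this specific vulnerability type.
    for code, vtype in samples:
        if vtype is not None:
            datasets[vtype].append((code, 1))
    # Negative examples: the non-vulnerable functions, shared by every type.
    for vtype in datasets:
        datasets[vtype].extend((code, 0) for code in clean)
    return dict(datasets)

datasets = build_per_type_datasets(samples)
```

Each resulting dataset could then be used to train one binary classifier for its vulnerability type.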

Their framework has three core parts:

  • Data Collection
  • Vulnerability-preserving Data Augmentation
  • Edge-aware GGNN

The researchers train multiple binary classifiers, one per vulnerability type, and aggregate their predictions through voting during the prediction phase.

This task is difficult as obtaining high-quality datasets covering a broad range of vulnerabilities requires specialist knowledge.

The framework of FGVulDet (Source – Arxiv)

GGNN (gated graph neural network) is a widely used approach to modeling source code, but it learns node representations without taking edge information into account.

The researchers therefore propose an edge-aware GGNN that can make effective use of edge semantics in vulnerability detection.
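
The idea of letting edge semantics shape propagation can be sketched in a few lines. This is a deliberately simplified illustration, not the researchers' model. It assumes each edge type has its own embedding that is added to the neighbor's state, and it replaces the GRU update and learned weight matrices of a real GGNN with a plain `tanh`. The edge type names `ast` and `cfg` and the function `edge_aware_step` are hypothetical.

```python
import math

def edge_aware_step(node_states, edges, edge_embeddings):
    """One simplified propagation step over a program graph.

    node_states:     {node_id: [float, ...]}
    edges:           list of (src, dst, edge_type)
    edge_embeddings: {edge_type: [float, ...]}, same dimension as the node states
    """
    dim = len(next(iter(node_states.values())))
    incoming = {n: [0.0] * dim for n in node_states}
    for src, dst, etype in edges:
        # The message combines the neighbor's state with the edge-type embedding,
        # so the same neighbor contributes differently over different edge types.
        for k in range(dim):
            incoming[dst][k] += node_states[src][k] + edge_embeddings[etype][k]
    # Stand-in for the GRU update of a real GGNN (assumption made for brevity).
    return {
        n: [math.tanh(h + m) for h, m in zip(node_states[n], incoming[n])]
        for n in node_states
    }

states = {0: [1.0, 0.0], 1: [0.0, 0.0]}
emb = {"ast": [0.5, 0.0], "cfg": [0.0, 0.5]}  # hypothetical edge types
via_ast = edge_aware_step(states, [(0, 1, "ast")], emb)
via_cfg = edge_aware_step(states, [(0, 1, "cfg")], emb)
```

In this toy graph, node 1 ends up with a different state depending on whether the edge from node 0 is typed `ast` or `cfg`, which is the property an edge-aware model relies on.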

Each type of vulnerability has its own binary classifier, which is trained by using datasets of both vulnerable and non-vulnerable functions.

The final prediction is made through majority voting across all the classifiers.
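
A minimal sketch of this aggregation step is shown below, assuming that each per-type classifier returns 0 or 1 and that a function is flagged only when a strict majority of classifiers vote 1. The tie handling, the keyword-based stand-in classifiers, and the names `predict` and `classifiers` are assumptions made for illustration; the real classifiers would be trained models.

```python
from typing import Callable, Dict

Classifier = Callable[[str], int]  # returns 1 (vulnerable) or 0 (not vulnerable)

def predict(code: str, classifiers: Dict[str, Classifier]) -> int:
    """Aggregate per-type binary predictions by strict majority vote."""
    votes = {name: clf(code) for name, clf in classifiers.items()}
    # Flag the function only if more than half of the classifiers vote 1;
    # a tie counts as not vulnerable (assumption).
    return 1 if 2 * sum(votes.values()) > len(votes) else 0

# Toy stand-ins for trained per-type classifiers (hypothetical heuristics).
classifiers = {
    "type_a": lambda code: int("strcpy" in code),
    "type_b": lambda code: int("*" in code),
    "type_c": lambda code: int("free" in code),
}

flagged = predict("strcpy(b, *s);", classifiers)
not_flagged = predict("return a + b;", classifiers)
```

Here the first snippet receives two of three votes and is flagged, while the second receives none.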

Because the researchers’ dataset covers common vulnerabilities, the approach can be extended to detect other types as well.

Overall, FGVulDet combines multiple classifiers with a novel data augmentation technique for effective fine-grained vulnerability detection.
