Trends and dangers in open-source software dependencies

September 16, 2024

A C-suite perspective on potential vulnerabilities within open-source dependencies or software packages reveals that, while remediation costs for dependency risks are perilously high, function-level reachability analysis still offers the best value in this critical area, according to Endor Labs.

The research is based on analysis of Endor Labs vulnerability data, the Open Source Vulnerabilities (OSV) database for comparison, information from customer tenants, and Java Archives (JARs) of hundreds of versions of the top 15 open source dependencies to compute breaking changes.

“A lot of organizations are struggling with managing dependency risks. They’re drowning in vulnerability alerts, many of which don’t represent relevant risk; researching the alerts is expensive for security teams (and software teams), and trying to fix everything is even more expensive. Research shows that analysis-based vulnerability prioritization has become a critical capability because of this, and highlights other trends and challenges related to dependency management,” said Darren Meyer, staff research engineer at Endor Labs.

Prioritization factor cuts remediation costs

For a vulnerability in an open source library to be exploitable, there must be, at minimum, a call path from the application to the vulnerable function in that library. The report finds this to be true for fewer than 9.5% of all vulnerabilities in the seven languages explored (Java, Python, Rust, Go, C#/.NET, Kotlin, and Scala).

Filtering out the findings that have no such call path therefore removes more than 90.5% of remediation activities, and can cut remediation costs by a comparable margin. This is achieved with just one prioritization factor, which makes function-level reachability by far the most valuable single noise-reduction strategy available.
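The call-path condition can be pictured as a search over a call graph. The following is a minimal sketch, not Endor Labs' method: the graph, the function names, and the finding IDs are all hypothetical, and a real analysis would first have to build the call graph from the application and its dependencies.

```python
from collections import deque


def is_reachable(call_graph, entry_points, vulnerable_function):
    """Return True if any entry point can reach vulnerable_function through calls."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        caller = queue.popleft()
        if caller == vulnerable_function:
            return True
        for callee in call_graph.get(caller, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


# Hypothetical call graph: each key calls the functions listed in its value.
call_graph = {
    "app.main": ["app.load_config", "yamllib.safe_load"],
    "app.load_config": ["jsonlib.parse"],
    "yamllib.safe_load": ["yamllib.scan"],
    "yamllib.unsafe_load": ["yamllib.construct_object"],
}

# Hypothetical findings: vulnerability ID -> vulnerable function.
findings = {
    "VULN-A": "yamllib.construct_object",  # only called from unsafe_load
    "VULN-B": "jsonlib.parse",             # called via app.load_config
}

reachable = {
    vuln_id
    for vuln_id, function in findings.items()
    if is_reachable(call_graph, ["app.main"], function)
}
print(sorted(reachable))  # prints ['VULN-B']
```

In this toy example, only one of the two findings would be passed on for remediation, because the application never calls the function affected by the other.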

The research also turns a spotlight on the speed of response to emerging risks. It reveals that nearly 70% of vulnerability advisories are published after the corresponding security release, with a median delay of 25 days. This delay widens the window of opportunity for attackers to exploit vulnerable systems.

The problems go even deeper: Across the six ecosystems explored, 47% of advisories in public vulnerability databases do not contain any code-level vulnerability information at all; 51% contain one or more references to fix commits; and only 2% contain information about affected functions.

This is a serious drawback because the application of program analysis techniques requires code-level information about vulnerabilities, such as the names of affected functions or the fix commits that were developed by open source project maintainers to overcome a vulnerability. Without this kind of information, it’s effectively impossible to establish whether known-vulnerable functions can be executed in the context of a downstream application.
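To make the three categories concrete, here is a rough sketch of how an advisory could be classified by the code-level detail it carries. It assumes OSV-style records in which fix commits appear as references of type "FIX"; the location of affected function names (`ecosystem_specific.imports[].symbols`) is an assumption chosen for illustration, since this differs between databases. The advisory IDs and URLs are made up.

```python
from collections import Counter


def code_level_info(advisory):
    """Classify an OSV-style advisory dict by the code-level detail it carries."""
    # Assumption: affected functions are listed as "symbols" under
    # affected[].ecosystem_specific.imports[]; other databases may differ.
    for affected in advisory.get("affected", []):
        imports = affected.get("ecosystem_specific", {}).get("imports", [])
        if any(entry.get("symbols") for entry in imports):
            return "affected_functions"
    # References of type "FIX" point at the change that resolves the issue.
    if any(ref.get("type") == "FIX" for ref in advisory.get("references", [])):
        return "fix_commit"
    return "none"


# Hypothetical advisories, one per category.
advisories = [
    {"id": "EXAMPLE-1",
     "references": [{"type": "ADVISORY", "url": "https://example.com/advisory/1"}]},
    {"id": "EXAMPLE-2",
     "references": [{"type": "FIX", "url": "https://example.com/repo/commit/abc123"}]},
    {"id": "EXAMPLE-3",
     "affected": [{"ecosystem_specific": {
         "imports": [{"path": "example.com/pkg", "symbols": ["Parse"]}]}}]},
]

counts = Counter(code_level_info(advisory) for advisory in advisories)
for category in ("none", "fix_commit", "affected_functions"):
    print(category, counts[category])
```

Only advisories in the last two categories give a program analysis tool something to work with: a fix commit from which changed functions can be derived, or the function names themselves.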

In this environment, several context-based strategies deserve attention, such as excluding vulnerabilities that are only relevant for non-production code. However, even in combination, these approaches do not match the impact of function-level reachability.

Key issues in supply chain security

Pinpointing the worst offenders: Prioritization enables organizations to focus on less than 5% of their total vulnerabilities. Within the Python ecosystem, for example, updating the top 20 components to non-vulnerable versions would remove more than 75% of all vulnerability findings. The figures for the other ecosystems are lower but still substantial: 60% for Java and 44% for npm. TensorFlow has the highest number of reported vulnerabilities, and because it is often installed without manifest files, it underlines the importance of covering “phantom dependencies”.
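The "top components" figure is a simple concentration measure. As an illustrative sketch with made-up findings (the component names are real packages, but the counts and IDs are invented), the share of findings removed by updating the N worst components could be computed like this:

```python
from collections import Counter


def share_removed_by_top_components(findings, top_n):
    """Fraction of findings that sit in the top_n components by finding count.

    findings is a list of (component, vulnerability_id) pairs.
    """
    per_component = Counter(component for component, _ in findings)
    total = sum(per_component.values())
    if total == 0:
        return 0.0
    top = per_component.most_common(top_n)
    return sum(count for _, count in top) / total


# Hypothetical scan result: eight findings spread over five components.
findings = [
    ("tensorflow", "V1"), ("tensorflow", "V2"), ("tensorflow", "V3"),
    ("pillow", "V4"), ("pillow", "V5"),
    ("requests", "V6"), ("urllib3", "V7"), ("jinja2", "V8"),
]

print(share_removed_by_top_components(findings, top_n=2))  # prints 0.625
```

Here, updating the two components with the most findings would clear five of the eight findings.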

Phantom dependencies and other trouble spots: Among select customers scanned for this report, Python phantom dependencies make up between 0% and 60% of all dependencies. More importantly, those phantom dependencies account for as much as 85% of all vulnerabilities. “Rebundling” is also a serious issue across ecosystems: thousands of Python and Java components rebundle binary code from other open source projects.

Finding known-vulnerable code: Identifying which applications are affected by which vulnerabilities is at the core of strengthening security, but numerous technical challenges make it hard to link the two through their dependencies. Building databases that support this kind of dependency identification, particularly with regard to the quality of the vulnerability data, is key to avoiding false positives and false negatives.

Remediating known vulnerabilities: 24% of 1,250 updates from vulnerable to non-vulnerable component versions (published by the 15 most problematic libraries after 2016) require a major version update, while the remaining 76% can be done by updating the minor or patch version.
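The split between major and minor-or-patch updates can be derived from version numbers alone. The sketch below assumes plain MAJOR.MINOR.PATCH version strings, which not every library follows, and uses invented version pairs rather than the report's data.

```python
from collections import Counter


def parse(version):
    """Split a MAJOR.MINOR.PATCH string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_type(vulnerable, fixed):
    """Classify the update from a vulnerable version to a fixed one."""
    old, new = parse(vulnerable), parse(fixed)
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "patch"


# Hypothetical (vulnerable, fixed) version pairs.
updates = [
    ("1.4.2", "2.0.0"),
    ("1.4.2", "1.4.3"),
    ("3.1.0", "3.2.0"),
    ("2.9.9", "2.9.10"),
]

counts = Counter(bump_type(old, new) for old, new in updates)
shares = {kind: counts[kind] / len(updates) for kind in ("major", "minor", "patch")}
print(shares)  # prints {'major': 0.25, 'minor': 0.25, 'patch': 0.5}
```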

In terms of overall solutions, using the Exploit Prediction Scoring System (EPSS) as a prioritization tool is a strong second-order activity: 80% of reachable vulnerabilities have a predicted exploitation probability of 1% or less.
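Layering EPSS on top of reachability could look roughly like the following. This is a sketch under assumptions: the `Finding` record, its field names, and the sample scores are hypothetical, and the 1% cut-off simply mirrors the figure quoted above rather than a recommended setting.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    vuln_id: str
    reachable: bool  # result of function-level reachability analysis
    epss: float      # predicted exploitation probability, between 0 and 1


def prioritize(findings, epss_threshold=0.01):
    """Keep reachable findings whose EPSS score exceeds the threshold, highest first."""
    kept = [f for f in findings if f.reachable and f.epss > epss_threshold]
    return sorted(kept, key=lambda f: f.epss, reverse=True)


# Hypothetical findings with made-up scores.
findings = [
    Finding("VULN-A", reachable=True, epss=0.004),
    Finding("VULN-B", reachable=True, epss=0.27),
    Finding("VULN-C", reachable=False, epss=0.91),
    Finding("VULN-D", reachable=True, epss=0.03),
]

queue = [f.vuln_id for f in prioritize(findings)]
print(queue)  # prints ['VULN-B', 'VULN-D']
```

Reachability is applied first, so the unreachable finding is dropped despite its high score, and EPSS then orders what remains.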