What Works and What Doesn’t
A software package is the dream of reusability made possible. Individual developers and organizations of all kinds contribute software components to public repositories in standardized ‘package’ formats, and package managers allow anyone to install and integrate those packages into their own software. That is what made open source so pervasive in modern society. The infrastructure of public package registries, package managers, and continuous build and deployment tools made the adoption of open source software widespread. This infrastructure, along with cloud computing, constitutes the perfect foundation for our information-addicted society.
In recent years, wrongdoers of all kinds have used open source software registries to deliver malicious behavior. These activities are as old as open source itself, but their frequency has exploded in the last three years. Last year a whopping 245,000 malicious packages were seen.
Why has this attack vector become so popular with threat actors? Criminals are always looking for novel ways to carry out their attacks. They have found that by infecting a single popular open source component, they can affect thousands of victims. This amplification effect is the primary driver. And the open source infrastructure allows anyone, anywhere to create an ephemeral account in a component registry or collaborative development platform. Zero cost, no vetting, and lots of opportunities to leverage the excessive trust that software teams have traditionally placed in third-party components.
Should we stop using open source components? No. Is the value they provide worth the risk? Sure. Should I keep using open source the same way? Absolutely not.
Why does the infrastructure allow such easy attacks?
Package registries are open, often requiring minimal verification of the publisher’s identity. “Anyone is welcome to publish their software here!” The bar for attackers is set low: they use disposable email addresses and GitHub accounts to create hundreds of malicious packages in short, phishing-like campaigns. We often witness the creation of a credible-looking GitHub source repository with many stars and commits from multiple fake contributors.
Package managers were designed for ease of use, not for security. They can run installation scripts. They also install from multiple sources, and sometimes the default is to use public registries, even for internal components. Finally, they do not check for a mismatch between the published package and the package sources.
Dependencies are nested, forming a graph that no one can track manually. In certain ecosystems, small-scale dependencies accumulate by the hundreds or thousands. It is one thing to have strict control over the direct dependencies declared by my software projects, but transitive dependencies are much harder to control. Open source followed “the friends of my friends are my friends”. Brotherhood is the norm in the wild Far East! Threat actors know this and hide their malicious behavior deep in obscure, unknown dependencies.
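To make the gap between direct and transitive dependencies concrete, here is a minimal sketch that walks a dependency graph breadth-first and records how deep each package sits. The graph and every package name in it are hypothetical, invented only for illustration; a real project would read this information from its lockfile.

```python
from collections import deque

# Hypothetical dependency graph: package -> its direct dependencies.
GRAPH = {
    "my-app": ["web-framework", "http-client"],
    "web-framework": ["template-engine", "tiny-util"],
    "http-client": ["tiny-util", "url-parser"],
    "template-engine": ["string-pad"],
    "tiny-util": [],
    "url-parser": ["string-pad"],
    "string-pad": [],
}

def transitive_dependencies(graph, root):
    """Return {package: depth} for every package reachable from root."""
    depths = {}
    queue = deque((dep, 1) for dep in graph.get(root, []))
    while queue:
        name, depth = queue.popleft()
        if name in depths:
            continue  # already reached through a shorter or equal path
        depths[name] = depth
        queue.extend((child, depth + 1) for child in graph.get(name, []))
    return depths

deps = transitive_dependencies(GRAPH, "my-app")
direct = sorted(name for name, depth in deps.items() if depth == 1)
indirect = sorted(name for name, depth in deps.items() if depth > 1)
print("declared by the project:", direct)
print("pulled in transitively: ", indirect)
```

Even in this toy graph, two declared dependencies bring in four more packages that the project never chose explicitly.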
In recent years the open source infrastructure has added some security controls, such as multi-factor authentication under some conditions, registration procedures with proof of ownership for organizations, and digital signatures on packages and container images, but this has not slowed the pace of attacks at all.
Which malicious behavior?
The most common is the infostealer or credentials drainer. Over 90% of the unsophisticated attacks are simple stealers mainly looking for credentials like passwords, access tokens, or API keys. It is probably the simplest kind of malware to write. The idea is simple: “I publish a stealer for phishing credentials, so I can later use the credentials for launching a directed attack”. Significant ransomware attacks have occurred after credentials were stolen in this way.
Second in popularity are droppers/downloaders of conventional binary malware variants; more than a third of malicious components contain droppers. They install backdoors, spyware, and crypto drainers, among others.
Financially motivated adversaries are willing to use your cloud assets for running cryptominers. They do not care about the low profit ratio of $1 for every $53 charged to the victim for the stolen cloud infrastructure: they do not pay the bills!
In addition to legitimate and malicious components, we have observed several abuses, including spam packages (easy earnings, snake oil, links to Viagra offerings, and all that), and bug bounty and security research hoaxes.
Some good news? We have not seen (yet) direct ransomware attacks delivered through malicious components. For unknown reasons, cybercriminals seem to prefer more traditional delivery mechanisms.
My organization builds some software. Should I be worried?
The risk is high. First, there is no comprehensive database of malicious packages. Vulnerabilities have a code (CVE ID) assigned, but only the few malicious packages that make headlines are given one. The usual anti-malware tools do not make specific provisions for malicious components (that would be welcome!). It is not easy to determine whether version X of component Y contains malware, directly or indirectly.
A closer look at the problem (we cybersecurity vendors often err on the side of doomsaying) tempers the real magnitude of the risk somewhat. In numbers, most of the malicious packages are simple and low-risk: they try to “phish” naïve developers with typosquatting packages that perform basic credentials exfiltration, often for second-stage attacks. 98% of all detected malicious packages are that simple. But behind that background noise of trivial malware (we call them “anchovies”), sometimes different beasts (“sharks” as we name them) emerge out of the blue. Those are the ones that make headlines.
What does not work?
Most security-aware professionals have ideas about how to handle this threat. We have heard security managers say without hesitation that Software Composition Analysis (SCA) tools already tell you when a package version is malware. Or that they depend on well-known, highly reviewed software components, where any malware would be promptly detected and removed. Or that they use open minor/patch versions to get vulnerability fixes automatically, and that this is the proper, recommended way to lower the risk of open source dependencies, following the “patch early, patch often” principle. When it comes to malicious software, these ideas are plainly wrong: they are misconceptions that are easy to disprove.
SCA tools indeed report (some) malicious components, but after the fact, when it is probably too late if the bad component was used in a software build. Secrets might have been exfiltrated, additional malware downloaded and installed, and perhaps the adversary moved laterally and gained access elsewhere. Malicious components are removed from the public registry only after the registry security team confirms them as malware. By the time the SCA tools know about this, the malware is in fact disarmed, but it is too late for the victims that installed it!
Another misconception is that blocking installation scripts at build time prevents malicious behavior from open-source components. This is partially true if the malicious behavior happens there, but attackers now mostly run the attack payload at runtime.
There is a tradeoff between “patching early and often”, with open versions that let the package manager automatically install new updates when security fixes become available, and “version pinning”, which keeps all the direct and transitive dependencies of a piece of software at fixed versions. Security principles are stubborn and sometimes contradictory. Some package managers make automatic updates with semver ranges the recommended way. Great if you also want to receive the malicious updates! Yes, components must be updated to receive security fixes that close vulnerabilities as soon as possible, but … never let the package manager do this automatically.
And finally, assuming that using popular, trusted components is safe, and that any malicious version would be promptly found, disclosed and removed, is wishful thinking. The belief is that a component can be trusted because it is highly popular, with many eyeballs looking for vulnerabilities, a large number of contributors for maintenance, and multiple core maintainers who diligently review all change requests. The reality is quite different. Most essential components are maintained by a single, unpaid developer. At best, they have a few regular contributors. Attackers also put their many eyeballs on the most popular packages, for obvious reasons. And forget about “promptness” in detection and removal: it can take days for a new malicious component to be removed from the public registry. Registries are cautious about removing a component version, and for good reason. Our experience is that once we report a malicious version, the median time for the registry to remove it is 39 hours, more than a day and a half. Some malicious components remain available a week after our initial report. And in some cases, the component is removed only after a victim reports an incident involving it.
What can I do to be better protected against malicious open source software?
Solid version handling. Version pinning with controlled and informed version bumps is the way to go, balancing the need to remove vulnerabilities against the risk of receiving malware. When you need to update a version, you need to know whether the new version contains malicious behavior.
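As a rough illustration of what checking for pinned versions can look like, the sketch below flags every dependency whose version specifier is not an exact `major.minor.patch` number. The manifest and its package names are hypothetical, and the check is deliberately simplified: it assumes npm-style specifiers and would also flag exact pre-release versions such as `1.0.0-beta.1`.

```python
import re

# An exact pin is three dot-separated numbers and nothing else.
# Range operators (^, ~, >=, *, x) make the specifier "open".
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")

def unpinned(dependencies):
    """Return the names of dependencies that are not pinned to one version."""
    return sorted(
        name for name, spec in dependencies.items()
        if not EXACT_VERSION.match(spec)
    )

# Hypothetical manifest: package name -> version specifier.
manifest = {
    "tiny-util": "1.4.2",
    "http-client": "^2.0.0",
    "url-parser": "~3.1.0",
    "string-pad": "*",
    "app-logger": "4.0.1",
}

open_specs = unpinned(manifest)
print("not pinned:", open_specs)
```

A check like this only covers declared dependencies; pinning the transitive ones typically relies on a lockfile that the build is required to honor.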
This leads to the second line of defense, dubbed “early warning”. One approach to the problem of malicious components is an early warning system (Malware Early Warning or MEW), where new versions published are analyzed by a detection engine, which when enough evidence is found may classify the new version as potentially malicious. Automation is essential here, as it is impossible to manually review all the new components at the current publishing rate. So the detection engine needs to combine a variety of techniques, including static, dynamic, and capability analysis, user reputation, and other evidence.
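One simple way to picture how a detection engine might combine evidence is a weighted score with an alert threshold. The sketch below is only that, a picture: the signal names, weights, and threshold are invented for illustration and do not describe any particular product's engine.

```python
# Hypothetical evidence signals and weights (illustrative values only).
WEIGHTS = {
    "install_script_added": 3,
    "obfuscated_code": 4,
    "network_call_to_unknown_host": 3,
    "reads_credential_files": 5,
    "new_publisher_account": 2,
    "source_repo_mismatch": 3,
}
ALERT_THRESHOLD = 8  # assumed cut-off for raising an early warning

def classify(signals):
    """Sum the weights of the observed signals and compare to the threshold."""
    score = sum(WEIGHTS[s] for s in signals if s in WEIGHTS)
    verdict = "potentially malicious" if score >= ALERT_THRESHOLD else "no alert"
    return score, verdict

print(classify({"install_script_added"}))
print(classify({"obfuscated_code", "reads_credential_files"}))
```

A single weak signal stays below the threshold, while several signals together raise the warning, which mirrors the idea of classifying a version only when enough evidence accumulates.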
Another line of defense is known as “dependency firewalling”: to have a comprehensive whitelist of components for all dependency graphs used in your software, so in any build pipeline run in your organization only approved component versions can be installed and used. The “firewall” is enforced using an internal registry where the tarballs for the allowed component versions are served (cached or proxied). Please note that whitelisting will not work unless you have the technology, such as an early warning system, to classify any new version as reasonably safe so it can be added to the whitelist.
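The core check of such a firewall can be sketched in a few lines: a component version may be installed only if it is on the allowlist. Comparing the tarball digest against the approved one is an extra assumption added here, not something the text above prescribes, and the package name and tarball bytes are placeholders.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Placeholder bytes standing in for a reviewed, approved tarball.
approved_tarball = b"contents of string-pad 1.2.0"

# Hypothetical allowlist: (name, version) -> digest of the approved tarball.
ALLOWLIST = {
    ("string-pad", "1.2.0"): sha256_hex(approved_tarball),
}

def may_install(name: str, version: str, tarball: bytes) -> bool:
    """Allow only approved versions whose content matches the approved digest."""
    expected = ALLOWLIST.get((name, version))
    return expected is not None and expected == sha256_hex(tarball)

print(may_install("string-pad", "1.2.0", approved_tarball))     # approved version
print(may_install("string-pad", "1.2.1", approved_tarball))     # version not reviewed
print(may_install("string-pad", "1.2.0", b"tampered contents")) # content changed
```

In practice the internal registry would enforce this decision when serving tarballs, so individual build pipelines do not need to implement it themselves.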
The last approach works at runtime: capture and learn the expected behavior of the component versions used in the software, then detect and block any anomalies in what the software tries to do. The idea is to instrument the runtime to monitor and block software that deviates from normalcy. It is a promising idea that will be added to the arsenal of protection mechanisms against the malicious component pest.
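The learn-then-enforce idea can be sketched as follows, assuming behavior is reduced to simple action labels per component. The class, the component name, and the action strings are all hypothetical; real instrumentation would hook system calls or runtime APIs rather than receive labels by hand.

```python
class BehaviorBaseline:
    """Learn the actions each component performs, then flag anything new."""

    def __init__(self):
        self._known = {}      # component -> set of actions seen while learning
        self.learning = True  # switch to False to start enforcing

    def observe(self, component: str, action: str) -> bool:
        """Return True if the action is allowed, False if it is an anomaly."""
        if self.learning:
            self._known.setdefault(component, set()).add(action)
            return True
        return action in self._known.get(component, set())

baseline = BehaviorBaseline()

# Learning phase: record what a trusted version normally does.
baseline.observe("string-pad", "read:input-buffer")
baseline.observe("string-pad", "write:output-buffer")

# Enforcement phase: anything outside the learned set is an anomaly.
baseline.learning = False
print(baseline.observe("string-pad", "write:output-buffer"))       # expected behavior
print(baseline.observe("string-pad", "connect:203.0.113.5:443"))   # never seen before
```

The quality of such a baseline depends entirely on how representative the learning phase is; behavior that is legitimate but rare would be flagged until it is added.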
Concluding Remarks
The recommended strategy combines different techniques in the software development process, taking control of version updates to block incoming malicious components. We must reconcile version pinning, which avoids automatic infection, with updating versions to get fixes for the vulnerabilities that matter. That requires a quick and efficient assessment of direct and indirect dependencies during version updates, so there is enough evidence that they are not malware-ridden. Builds of software that depend on known malicious components must be blocked. And all of this must be enforced.
Consuming open-source software safely is not easy, and the malware factor must be fully taken into account, with an effort similar to the one put into vulnerability handling. Regulators are putting the liability burden onto software producers, so it is time to think about what we can do to avoid distributing malware in our software.
About the Author
Luis Rodríguez is a physicist and mathematician with 15+ years of experience in cybersecurity, mainly in software security. He has worked on different projects in the domain as a technical evangelist and is currently co-founder and CTO for Xygeni Security.
Luis Rodríguez can be reached online at [email protected] or https://www.linkedin.com/in/luis-rodríguez-xygeni/ and at our company website https://xygeni.io/.