Trellix and GitHub have collectively fixed 61,895 open source projects that were found to be susceptible to a 15-year-old path traversal vulnerability in Python’s tarfile module.
The firm’s Advanced Research Center team highlighted the prevalence of CVE-2007-4559 in September 2022, after they identified that 15 years after it was first discovered, it was still present in an estimated 350,000 open source projects, and an unknowable number of closed source ones.
The team stumbled across the vulnerability while investigating an unrelated issue and at first thought it was a brand-new zero-day, but after they started tugging on the thread, they discovered they were in fact looking at a veteran bug in the “extract” and “extractall” functions in Python’s tarfile module.
When exploited, CVE-2007-4559 lets a user-assisted remote attacker overwrite arbitrary files via a “..” (dot dot) sequence in filenames in a TAR archive, achieving arbitrary code execution or control of the target device.
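To illustrate the kind of guard that closes this hole, the following is a minimal sketch, not the patch Trellix actually shipped. It checks every archive member before extraction and refuses any whose resolved path would land outside the destination directory. The function names `is_within_directory`, `safe_extractall` and `make_tar_bytes` are hypothetical, and the check does not cover symlink or hard link members, which would need separate handling.

```python
import io
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """Return True if target resolves to a path inside directory."""
    base = os.path.realpath(directory)
    resolved = os.path.realpath(target)
    return os.path.commonpath([base, resolved]) == base


def safe_extractall(tar: tarfile.TarFile, dest: str) -> None:
    """Extract all members, refusing archives whose paths escape dest."""
    for member in tar.getmembers():
        member_path = os.path.join(dest, member.name)
        if not is_within_directory(dest, member_path):
            raise ValueError(f"blocked path traversal attempt: {member.name}")
    tar.extractall(dest)


def make_tar_bytes(member_name: str, payload: bytes = b"demo") -> bytes:
    """Build a tiny in-memory TAR archive with one file (demo helper only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

An archive built with `make_tar_bytes("../evil.txt")` would be rejected by `safe_extractall` with a `ValueError`, whereas one containing only `ok.txt` would extract normally.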
Back in October 2007, the bug was deemed to be of low importance, and it remains widespread in multiple frameworks, including some created by Amazon Web Services, Google, Intel and Netflix, and multiple other applications used for machine learning, automation and Docker containerisation.
Doug McKee, Trellix principal engineer and director of vulnerability research, said that since then, the team had been working on a four-month-long effort to automate the patching of vulnerable open source projects, taking inspiration from a talk at DEFCON 2022 by researcher Jonathan Leitschuh.
“Through GitHub, developers and community members are able to push code to projects or repositories on the platform via a process called pull request,” he said. “Once a request is opened, the project maintainers review the suggested code, request collaboration or clarification if needed, and accept the new code. In our case, the code pushed via pull request delivered unique patches to each of the vulnerable GitHub projects.
“As we outlined a process to automate patching … our Advanced Research Center vulnerability team was able to automate most of the processes, except for quality control. We broke the process into two steps, the patching phase and the pull request phase, both of which were automated and simply needed to be executed.”
After getting a list of repositories and files containing the keyword “import tarfile” from GitHub, the Trellix team compiled a unique list of repositories to scan, and cloned and scanned each one using an app vulnerability checking tool called Creosote that it created for the purpose. If Creosote found a vulnerable repository, the team patched the file and created a local patch diff containing the patched file, the original file (so that the two could be compared), and repository metadata.
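The scanning step can be pictured with a rough sketch. This is not how Creosote itself works, which the article does not detail. It is a simple heuristic, under the assumption that a file is worth flagging if it imports tarfile and calls a method named `extract` or `extractall`. The function name `find_tar_extract_calls` is hypothetical, and the sketch makes no attempt to tell whether a flagged call is already guarded.

```python
import ast


def find_tar_extract_calls(source: str) -> list[int]:
    """Return line numbers of .extract()/.extractall() calls in code that imports tarfile."""
    tree = ast.parse(source)

    # Only consider files that import the tarfile module in some form.
    imports_tarfile = any(
        (isinstance(node, ast.Import) and any(a.name == "tarfile" for a in node.names))
        or (isinstance(node, ast.ImportFrom) and node.module == "tarfile")
        for node in ast.walk(tree)
    )
    if not imports_tarfile:
        return []

    # Flag any attribute call named extract or extractall as a candidate.
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in {"extract", "extractall"}
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

Each flagged line is only a candidate for review, which fits McKee’s remark that quality control was the one part of the process that could not be automated.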
This done, the team reviewed the list of local patch diffs, created a fork of the vulnerable repository, cloned it, then replaced the original file with the patched file if they found it had not changed in the meantime – if it had, they took pains not to overwrite any other changes.
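The “replace only if unchanged” check could be sketched roughly as follows. This is an illustration under stated assumptions rather than Trellix’s actual tooling: it assumes a digest of the original file was saved alongside the patch, and the names `sha256_of` and `apply_patch_if_unchanged` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def apply_patch_if_unchanged(target: Path, original_digest: str, patched: bytes) -> bool:
    """Overwrite target with patched content only if it still matches the scanned original.

    Returns True if the file was replaced, False if it has changed upstream
    since the scan and therefore needs a manual merge instead of an overwrite.
    """
    current = target.read_bytes()
    if sha256_of(current) != original_digest:
        return False
    target.write_bytes(patched)
    return True
```

Returning `False` instead of overwriting leaves any newer upstream changes untouched, so that case can be handled separately.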
The changes were then committed to the fork, and a pull request was opened from the forked repository back to the original, with a message explaining to the repository owners what was happening. It is now down to the repository owners themselves to accept the patch, added McKee.
“The vulnerable tarfile module is included in the base Python package and is a readily available solution for a common problem – it is also, without a direct fix from Python, firmly embedded in the supply chain of many projects,” he said.
“Its permanence, along with the fact that nearly all the learning material for how to properly use the tarfile module teaches developers how to use it improperly, creates a broad attack surface. Through these efforts to automate and patch vulnerable projects, the software supply chain attack surface is narrowed.
“This work to narrow the attack surface cannot be done without collaboration across our industry,” added McKee. “As an industry we cannot afford to ignore the need to seek out and eradicate foundational vulnerabilities. Mass patching of open source projects can be done, even if it takes a lot of time, and it can deliver benefits to organisations of all sizes, across sectors and regions.”
He urged any and all organisations using code libraries and frameworks in their applications to put in place proper checks and evaluation measures to ensure proper visibility into their software supply chains, and emphasised the importance of leaning on developers to get educated on all layers of the tech stack.