According to GitGuardian, 12.8 million new secret occurrences were leaked publicly on GitHub in 2023, up 28% from 2022. Remarkably, the incidence of publicly exposed secrets has quadrupled since the company started reporting in 2021.
Companies need to manage sensitive information exposure
The growing number of code repositories on GitHub, with 50 million new repositories added in the past year (+22%), increases the risk of both accidental and deliberate exposure of sensitive information.
This reality underscores the vital need for companies to track and manage the exposure of their sensitive information. Too many remain vulnerable to breaches, lacking either awareness of their exposure or the means to mitigate it.
In 2023 alone, over 1 million valid occurrences of Google API secrets, 250,000 Google Cloud secrets, and 140,000 AWS secrets were detected.
While the IT sector, which includes software vendors, is the most affected industry, with 65.9% of all detected leaks, other industries are also impacted. These include education, science & tech, retail, manufacturing, and finance & insurance, which account for 20.1%, 7%, 1.5%, 1.2%, and 1% of leaks, respectively.
This highlights the need for increased vigilance and proactive measures to protect sensitive information across all industries as the risks associated with secret sprawl continue to grow.
The research sheds light on an important security gap: 90% of exposed valid secrets remain active for at least five days after they are discovered, even after the author is notified. The secrets that most often go unrevoked include API keys and authentication tokens for major service providers such as Cloudflare, AWS, OpenAI, and even GitHub.
“Developers erasing leaky commits or repositories instead of revoking are creating a major security risk for companies, which will remain vulnerable to threat actors mirroring public GitHub activity for as long as the credential remains valid. These zombie leaks are the worst,” said Eric Fourrier, CEO of GitGuardian.
The prevalence of zombie leaks may be underestimated
To assess the prevalence of zombie leaks, the study selected a random sample of 5,000 erased commits that had exposed a secret. Of the repositories that hosted these commits, only 28.2% were still accessible at the time of the study.
This indicates that the remaining repositories were likely deleted or made private in response to the leak, suggesting that the prevalence of zombie leaks may be underestimated.
Furthermore, the study hypothesizes that companies may use DMCA takedowns as a means to govern leaky repositories over which they do not have control. In support of this, the study found that in 2023, 12.4% of the 2,050 repositories taken down by GitHub exposed at least one secret, representing a 37.8% increase from 2020.
These findings are crucial for grasping the full scope of the secrets sprawl issue. While most security initiatives focus on detecting leaks, the bottleneck lies in improving the security posture. Simply alerting developers falls short; what’s truly essential is providing them with the necessary guidance and support to rectify their mistakes effectively.
“The Toyota breach in 2022, which occurred after a hacker obtained credentials for one of its servers from source code published on GitHub, is proof that even five years after a leak, a compromise can still happen,” concluded Fourrier.
Secrets sprawl affects more than code repositories
The year 2023 marked the breakthrough of generative AI, significantly impacting various professional fields with rapid adoption facilitated by user-friendly chats and developer-friendly APIs. Developers, as we have seen, are at the forefront of this new wave, and there is no doubt that this powerful technology, in the hands of both good and bad actors, will have an outsized impact on cybersecurity.
The study also reveals that 3.11% of secrets leaked in private repositories were also exposed in public repositories. This dismantles the idea that relying on the privacy of source code as a security layer is a valid strategy.
This year, GitGuardian expanded its investigation into the pervasiveness of leaked secrets within PyPI (the official third-party package management system for the Python community). In 2023, 11,054 unique secrets were exposed in package releases. Approximately 10,000 of those secrets had been there since before 2023, and over 1,000 had been introduced that year.
Lastly, the report provides a set of valuable recommendations for organizations committed to tackling secrets sprawl. A blend of awareness, training, and efficient, automated processes is essential. However, organizations must also employ discovery tools and robust controls. This is where secrets detection and remediation platforms come in, facilitating continuous security assessment of secrets, enforcing consistent policies throughout the software development lifecycle, and speeding up incident resolution.
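To make the idea of automated secrets detection concrete, here is a minimal sketch of a pattern-based scanner in Python. It is an illustration only and does not reflect how GitGuardian's platform or any other product works. The two regular expressions (for AWS access key IDs and classic GitHub personal access tokens) follow the publicly documented formats of those credentials, and the names PATTERNS, Finding, scan_text and redact are hypothetical, chosen for this example.

```python
import re
from typing import NamedTuple

# Illustrative patterns only (an assumption for this sketch, not a vendor's
# detector set). Real scanners use many more detectors plus validity checks.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


class Finding(NamedTuple):
    line_no: int  # 1-based line number in the scanned text
    kind: str     # key of the pattern that matched
    match: str    # the matched candidate secret


def scan_text(text: str) -> list[Finding]:
    """Return every candidate secret found in text, in line order."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            for m in pattern.finditer(line):
                findings.append(Finding(line_no, kind, m.group(0)))
    return findings


def redact(secret: str, keep: int = 4) -> str:
    """Mask all but the first `keep` characters so reports do not re-leak."""
    return secret[:keep] + "*" * (len(secret) - keep)


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"\nname = "demo"\n'
    for f in scan_text(sample):
        print(f.line_no, f.kind, redact(f.match))
```

A scanner like this could run as a pre-commit check or in CI, but a match is only the start of the process. As the findings above stress, deleting the offending commit does not neutralize the leak; the credential itself has to be revoked and rotated with the provider.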
As GitHub’s popularity soars, it increasingly attracts malevolent actors, positioning it as a central hub for cyber threats.