Making the case for GitHub’s Secret scanning

16 Dec, 2023

After scanning the GitHub Actions Marketplace for the security of those actions (read that post here), I was curious to see what would happen if I enabled Secret Scanning on the forked repositories. I regularly teach classes on GitHub Advanced Security (of which secret scanning is a part) and I always tell my students that they should enable secret scanning on their repositories. I even have a course on LinkedIn Learning about GitHub Advanced Security in case you want to learn more about it.

What is GitHub Secret Scanning?

Secret scanning is part of the GitHub Advanced Security offering if you have a GitHub Enterprise account, but for public repos it is free to use (and enabled by default). GitHub scans the repository (including its full history) as well as the contents of issues and pull requests to see if they contain a secret. Detection is based on a set of regular expressions supplied by GitHub’s secret scanning partners (over 100 and counting). If a secret is detected, GitHub notifies the repository owner as well as the secret scanning partner. Depending on the context (for example, whether the repo is public or not), the secret scanning partner can decide to revoke the secret immediately, which I think most partners do.
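To make the regular-expression idea concrete, here is a minimal sketch in Python. It is not GitHub's implementation: the two patterns are simplified assumptions about what a classic GitHub personal access token and an AWS access key ID look like, and `find_secrets` is a hypothetical helper name. The real patterns are maintained by GitHub and its partners.

```python
import re

# Illustrative patterns only. These are simplified assumptions, not the
# patterns GitHub or its partners actually use.
PATTERNS = {
    # Assumption: classic GitHub personal access tokens start with "ghp_"
    # followed by 36 alphanumeric characters.
    "GitHub Personal Access Token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # Assumption: AWS access key IDs start with "AKIA" followed by
    # 16 uppercase letters or digits.
    "Amazon AWS Access Key ID": re.compile(r"\bAKIA[A-Z0-9]{16}\b"),
}


def find_secrets(text):
    """Return a list of (secret type, matched string) pairs found in text."""
    findings = []
    for secret_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((secret_type, match.group(0)))
    return findings


# An obviously fake token, built at runtime so no token-shaped literal
# ends up in the source file.
fake_token = "ghp_" + "a" * 36
issue_body = f"Here is my token: {fake_token} please help me debug"

print(find_secrets(issue_body))
```

Running a check like this over every commit, issue, and pull request is the basic shape of what happens in the background; the notification and revocation steps are handled by GitHub and the partner and are not modeled here.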

This functionality is so good and fast that I routinely post my GitHub Personal Access Tokens to GitHub issues during my trainings to show the power of secret scanning. Usually I already have an email and a revoked token before I finish explaining what is happening in the background.


Analyzing the actions repositories

For my GitHub Actions marketplace scan, I have the repositories of 14k actions forked into an organization, so I can enable secret scanning there and see what it reports. Since all Action repositories on the marketplace are public, any secret found in my forks will already have been detected in the original repository, so I expect all of these secrets to have been revoked already.

Overall results: secret scanning found 1353 secrets for the organization in 1110 repositories (out of 13954 repos scanned). Here are the 15 most frequently found secret types:

Secret scanning alerts

Alert type | Count
GitHub App Installation Access Token | 692
Azure Storage Account Access Key | 155
GitHub Personal Access Token | 120
Amazon AWS Secret Access Key | 50
Plivo Auth ID | 40
Amazon AWS Access Key ID | 40
Google API Key | 34
Slack API Token | 31
Slack Incoming Webhook URL | 27
Atlassian API Token | 22
Plivo Auth Token | 16
GitHub SSH Private Key | 12
Amazon AWS Session Token | 12
HashiCorp Vault Service Token | 11
PyPI API Token | 10
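A per-type overview like this one can be produced by tallying the alerts by their type. The sketch below is not the tooling used for this post. It assumes the alerts have already been fetched (for example from GitHub's REST API for listing an organization's secret scanning alerts) and that each alert carries a `secret_type_display_name` field; `count_by_type` and the sample data are hypothetical.

```python
from collections import Counter


def count_by_type(alerts):
    """Count alerts per secret type, most frequent first."""
    counts = Counter(alert["secret_type_display_name"] for alert in alerts)
    return counts.most_common()


# Made-up sample data in the assumed shape of the API response.
sample_alerts = [
    {"number": 1, "secret_type_display_name": "GitHub Personal Access Token"},
    {"number": 2, "secret_type_display_name": "Slack API Token"},
    {"number": 3, "secret_type_display_name": "GitHub Personal Access Token"},
]

for secret_type, count in count_by_type(sample_alerts):
    print(f"{secret_type} | {count}")
```

Taking the first 15 entries of the returned list would give a top 15 like the one shown here.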

With all the recent news about leaked credentials and malicious actors abusing them, I think it is very important to enable secret scanning on your repositories! Having this data really shows the power of GitHub and its secret scanning partners.

Analyzing the results

I wanted these results to get a feel for how much secret scanning would find. In my opinion, the maintainers of these actions have a high level of understanding of Git and GitHub, so you’d expect a relatively low number of secrets to be found. Still, secrets were found in 1110 out of 13954 repositories, which is almost 8% of all repos. This shows how easy it is to accidentally commit secrets to a repository. Even in my own repos, a secret was found (a GitHub Personal Access Token, no less) that I had accidentally committed in an environment file! And that while I teach people about these things!
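For anyone who wants to check the arithmetic, the share can be recomputed from the two counts in this post. The secrets-per-repository figure is an extra derived number, not one reported above.

```python
repos_with_secrets = 1110
repos_scanned = 13954
secrets_found = 1353

# Share of scanned repositories with at least one secret
share = repos_with_secrets / repos_scanned

# Average number of secrets per affected repository
secrets_per_repo = secrets_found / repos_with_secrets

print(f"Repos with secrets: {share:.2%}")
print(f"Secrets per affected repo: {secrets_per_repo:.2f}")
```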

I think this is a good number to show to my students and customers to make the case for enabling secret scanning on their repositories. Even folks with a high level of understanding will still make mistakes, and secret scanning helps by finding those mistakes for you every time you make a change to your repository.

Rob Bos
Rob has a strong focus on ALM and DevOps, automating manual tasks and helping teams deliver value to the end-user faster, using DevOps techniques. This is applied on anything Rob comes across, whether it’s an application, infrastructure, serverless or training environments. Additionally, Rob focuses on the management of production environments, including dashboarding, usage statistics for product owners and stakeholders, but also as part of the feedback loop to the developers. A lot of focus goes to GitHub and GitHub Actions, improving the security of applications and DevOps pipelines. Rob is a Trainer (Azure + GitHub), a Microsoft MVP and a LinkedIn Learning Instructor.
