Tools & Services

Behind the Scenes: Why Netlify Engineering Uses GitHub Advanced Security

Netlify has spent a great deal of time integrating Static Application Security Testing (SAST) tooling into CI pipelines over the last couple of years, and we’ve noticed that even the greatest SAST tools often fail in practice. We evaluated and tried many different types of tools before deciding on GitHub Advanced Security, which we use with a somewhat unique approach, and we would like to share our experiences with you.

The Challenges of SAST (And Why We Chose GitHub Advanced Security)

Before we started using GitHub Advanced Security, we mostly wrapped open source tools in scripting languages to add the post-scan functionality we felt was necessary. These wrapped OSS tools look something like ‣ or ‣. But Netlify’s developers didn’t want additional configuration files to manage security scan functionality like suppressions, issue creation, and Slack alerting. With this approach, our Security Engineering team would also fall behind on the maintenance of these wrapped tools. So we began our search for a comprehensive tool that already existed on the market.

First and foremost, everyone knows we need security tools that locate quality findings. But looking beyond quality of findings, the main challenge for Security Engineers focusing on SAST is getting the vulnerability notification in front of developers in a way that is worthwhile and actionable. This is extremely challenging for a variety of reasons, each of which has its own caveats that must be understood. Here are what we consider to be the most pressing challenges:

Context switch from CI UI to Triage UI

Want to find a way to burn out your devs? Have them jump to various security UIs to triage their issues. Context switching can be one of the biggest buzzkills during development. Your build isn’t passing because a status check is asking you to jump to some security tool UI, which you have to learn how to navigate, and solve some vulnerability issue that you, as the developer, wish didn’t exist at all. Generally after all that work we find out the finding was a false positive. Context switching is a huge burnout concern.

There is also a cost consequence: most of these tools charge per user license, so fewer users in fewer tools means lower cost. Using GitHub Advanced Security simply means switching tabs in the same UI, handling multiple SAST needs such as code scanning, secret scanning, and dependency analysis in one place.
Adjusting the alerting severity

Being able to flexibly define the severity at which CI stops the build, per service or per repository, is crucial. GitHub Advanced Security frees up the security team’s time by letting repo administrators manage individualized repo settings. For example, teams can start by blocking builds on High or greater severity. Then, once they have triaged all the Medium findings manually, they can adjust to block builds when a new Medium or greater finding is discovered. This method of working downward through severities equates to iterative security improvement. Asking too much on the first day will lead to developer burnout and a distaste for security tooling and policy.
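
To make the idea of a per-repository threshold concrete, here is a minimal sketch in Python. It is not how GitHub Advanced Security implements this setting; the repository names, the `BLOCK_AT` mapping, and the `should_block_build` helper are all hypothetical and exist only to illustrate working downward through severities.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical per-repository thresholds (assumption, not a real setting).
# A team lowers its entry once it has triaged the next severity level down.
BLOCK_AT = {
    "api-service": Severity.HIGH,  # just onboarded: block on High or greater
    "billing": Severity.MEDIUM,    # Medium backlog already triaged
}
DEFAULT_THRESHOLD = Severity.HIGH


def should_block_build(repo, new_findings):
    """Return True if any new finding meets or exceeds the repo's threshold."""
    threshold = BLOCK_AT.get(repo, DEFAULT_THRESHOLD)
    return any(severity >= threshold for severity in new_findings)
```

With these assumed values, a new Medium finding would block a build in `billing` but not in `api-service`, which is the gradual tightening described above.
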
Suppressions
Suppressions for secret scanning are a huge must-have. If a team uses a mock secret token in many tests that is definitely not sensitive, they should not be constantly alerted on it. Repeated false positives are a great way to create alert fatigue and developer burnout. GitHub Advanced Security has suppressions at the repo level and the organization level, which is fantastic.
Creating a workable issue
GitHub CodeQL code scanning can create a workable GitHub issue for the code vulnerabilities that it discovers. Notice that in both of the tools we used previously, which are mentioned at the beginning of this article, “creating issues” is a must-have feature. Maintaining this feature can be problematic. For example, before creating an issue for a finding, we must check whether an issue has already been created. Through our analysis we discovered that we would spend up to 25% of a Security Engineer’s time just maintaining these “must-have” functionalities. Bugs like these and other problems faced by Netlify developers led us away from wrapping OSS tooling, and we took a hard look at comprehensive security products on the market.
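
The "check before creating" step is the kind of glue code that has to be maintained in a wrapped tool. The sketch below shows one possible way to do it in Python by fingerprinting each finding. It is an illustration only, not the code from our previous wrappers: the finding fields (`rule_id`, `path`, `snippet`) and both function names are assumptions.

```python
import hashlib


def finding_fingerprint(rule_id, path, snippet):
    """Build a stable identifier for a finding (hypothetical scheme)."""
    raw = f"{rule_id}|{path}|{snippet.strip()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def issues_to_create(findings, existing_fingerprints):
    """Return issue payloads only for findings that have no issue yet.

    `existing_fingerprints` stands in for whatever the wrapper would read
    back from the issue tracker, for example a marker stored in each issue.
    """
    seen = set(existing_fingerprints)
    new_issues = []
    for finding in findings:
        fp = finding_fingerprint(
            finding["rule_id"], finding["path"], finding["snippet"]
        )
        if fp in seen:
            continue  # an issue already exists, or a duplicate in this scan
        seen.add(fp)
        new_issues.append(
            {"title": f"[{finding['rule_id']}] {finding['path']}", "fingerprint": fp}
        )
    return new_issues
```

Even this small sketch hides real maintenance questions, such as what happens to the fingerprint when a file is renamed, which is part of why this functionality is costly to own.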

Having issues created is especially nice for lower severity findings, because they can be left for another developer to work on when the prioritization makes sense.
Reducing reliance on the security team, while not overburdening developers.

Giving developers the power to triage the security findings on the repos that they administer can be great for the security team, but it means asking the developers to do more. By putting all the status checks and triage mechanisms in a single UI, developers are less burdened.
It is also worth noting that the quality of the secrets and code vulnerability findings is great, and the false positives are minimal. CodeQL actually turns the pieces of code into queryable elements in a database. Since it has granular access to the code at the repository level, it can effectively analyze the code with repeatable, customizable queries against that database. This greatly increases the repeatability of code vulnerability queries, and consequently reduces false positives.
Empower the developer’s security intentions.

Netlify likes to say that “Everyone should be telling the security story.” This is especially true for our developers, who are the people that effect change in Netlify’s security posture, and effectively become the innermost line of defense. After all, if it’s not built securely, it can’t run securely.

We’ve done something unique at Netlify. Instead of mandating GitHub Advanced Security on a specific day, we train and empower developers on the reasons we want to enable many types of SAST (and DAST) tooling. So in our Advanced Security example, developers are presented with the following issue in their repos

Notice that the actual implementation of the security tools is controlled through labels, which anyone with the ability to create labels can apply. When the labels are set to false, the action merely stops the issue from being recreated.
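
The decision the action makes from those labels can be sketched roughly as follows. This is a simplified illustration in Python, not our actual action: the label format (`code-scanning:true`), the tool name, and the returned action names are all hypothetical.

```python
def next_action(labels, tool="code-scanning"):
    """Decide what the automation should do for one tool on one repository.

    Assumed behavior: a "true" label turns the tool on, a "false" label
    only stops the onboarding issue from being recreated, and no label
    means the issue should be (re)created.
    """
    normalized = {label.strip().lower() for label in labels}
    if f"{tool}:true" in normalized:
        return "enable_tool"
    if f"{tool}:false" in normalized:
        return "skip_issue"
    return "create_issue"
```

The useful property is that opting out is explicit and visible on the repository, rather than being a silent absence of configuration.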

Both presenting GitHub Advanced Security as an issue to developers, and enabling their control of activation with labels have huge advantages:

This allows developers to see the necessity of implementing security scans and validations as part of the normal issue workflow.
It also permits the implementation to happen at a time that suits the schedule of the codeowner team for the repository or service, while still emphasizing the need to do so based on policy. At Netlify, some larger teams might have an on-call person who works on these. Some smaller teams leverage our security engineers’ assistance.
One thing to note is that in many cases, the initial scan findings can be a bit overwhelming to triage, so we made sure security engineers had time available to help our development teams with the triage.
The process opens a CodeQL code scanning workflow PR, saving developers the additional step of going through the wizard found under the Security tab.

Netlify’s Security team is then able to perform continuous monitoring of the overall implementation metrics at enterprise and organization levels, to understand the overall security posture of the organization, and gauge the level of implementation of dependency checking, code scanning, and secret scanning across our GitHub organizations and repositories. Having this enterprise-wide observability enables us to emphasize trends to our Risk Management Committee for comprehensive security awareness by leaders and staff across the company.

Remaining Issues to Solve:

There are definitely some larger challenges to implementing these types of SAST tools that are important, but solutions are generally not available in open-source or commercial tooling. Our opinion is that this is due to a disconnect between the people using the tools, and the people creating them. Here are some of the major challenges that never really seem to be solved well.

What do we do about monorepos?

Monorepos make establishing codeowners difficult, and therefore make establishing accountability for security alerts more difficult. Often, figuring out which team is responsible for a finding involves understanding nuances of the service and code paths.
Perhaps being able to proactively map files, or even portions of a file, to specific groups or teams, for notification of issues in those portions, would be a good first step.
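
As a rough illustration of that first step, the Python sketch below maps a finding's file path to a team using the longest matching path prefix. It covers whole files only, not portions of a file, and the `OWNERS` table, team names, and `owner_for` helper are hypothetical.

```python
# Hypothetical path-prefix-to-team mapping for a monorepo (assumption).
OWNERS = {
    "services/": "team-platform",
    "services/billing/": "team-billing",
    "web/": "team-frontend",
}


def owner_for(path, owners=OWNERS, default="security-engineering"):
    """Return the team for the most specific prefix that matches `path`."""
    matches = [prefix for prefix in owners if path.startswith(prefix)]
    if not matches:
        return default  # nobody claimed this path; fall back to security
    return owners[max(matches, key=len)]
```

Choosing the longest prefix lets a broad owner cover a directory while a more specific team claims a subdirectory, and the fallback keeps unowned paths from going unnotified.
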
“Recasting” the severity or “changing the severity of a vulnerability”

I recently spoke to our compliance officer, and he told me recasting wasn’t that cool anyway. He went on to explain that downgrading security vulnerabilities is a “cop-out” (because let’s face it, who has ever recast a vulnerability risk to a higher level?). He said, “If it’s a thing, we should fix it, not just minimize it.” I myself have recast vulnerabilities before, mostly from “low” to “informational”, and mostly just as a reporting formality.
Exceptions and Time-boxed Exceptions

Similar to suppressions, one of the biggest challenges we see in the implementation of SAST tooling in a CI/CD pipeline is exceptions.
Allowing a finding, with developer justification and peer approval, should be possible. There are many reasons why a finding may be allowed into an environment, the simplest being that the pace of development somehow outweighs the risk to the business. Audit reviews need documented exceptions if something is permitted for a reason. GitHub Advanced Security does this well with both UI and API methods to mark a false positive, and offers some decent canned responses, such as the finding being used in tests.

However, exceptions should have the possibility of being permitted only for a set amount of time. Most security exception review or audit practices revisit documented exceptions on a monthly or quarterly basis, so allowing exceptions for 30, 60, and 90 day intervals only makes sense. When the exception runs out, the alert would fire again. If the developer still cannot fix the finding and still requires the exception, they would go through the process again. This is one of the few situations where putting a burden on the developer does become imperative.
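
A time-boxed exception of this kind could be modeled along the following lines. This is a sketch of the behavior we would like to see, not a feature of any existing tool; the `SecurityException` class, its fields, and `should_alert` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta

ALLOWED_DAYS = (30, 60, 90)  # intervals suggested in the text above


@dataclass(frozen=True)
class SecurityException:
    finding_id: str
    justification: str  # developer's documented reason, kept for audits
    approved_by: str    # peer who approved the exception
    granted_on: date
    days: int

    def __post_init__(self):
        if self.days not in ALLOWED_DAYS:
            raise ValueError(f"days must be one of {ALLOWED_DAYS}")

    @property
    def expires_on(self):
        return self.granted_on + timedelta(days=self.days)

    def is_active(self, today):
        return today < self.expires_on


def should_alert(finding_id, exceptions, today):
    """Alert unless an unexpired exception covers this finding."""
    return not any(
        e.finding_id == finding_id and e.is_active(today) for e in exceptions
    )
```

Once `expires_on` passes, `should_alert` returns True again, so renewing the exception means creating a new record with a fresh justification and approval.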

Conclusion

Netlify uses GitHub Advanced Security to discover dependency, code, and exposed secret vulnerabilities early in the Software Development Lifecycle. GitHub has a comprehensive UI that reduces context switching and toil for developers tasked with remediation of these vulnerabilities. Netlify’s security-minded customers who host their source code in GitHub can help assure the security of their sites, workloads, and apps that run on Netlify by utilizing GitHub Advanced Security.