A Rapidly Growing Problem Named Vulnerability
When I talk to friends and practitioners in the space, a recurring topic is how the number of vulnerabilities we need to deal with seems to be growing, and how challenging it has become to prioritize which of them to fix first. Although I agree with this premise, we are only human, and maybe we are just under the impression that there are more vulnerabilities to deal with than ever. After all, we have also been busier than ever, and maybe we simply have less time to deal with this problem.
Throughout the research that led to this article, I set out to find out whether this premise is really true. Are we really dealing with more vulnerabilities than ever? But finding out whether it is true isn't as satisfying as understanding why, so I also tried my best to explain the growth (or lack thereof) of vulnerabilities over time with actual data.
Are we collectively generating more vulnerabilities?
The first part of this research was a simple yes-or-no question: are we collectively generating more vulnerabilities? To answer it, though, we need to get a bit into the weeds of cybersecurity vulnerabilities. If you are already familiar with the terms below, feel free to skip to the next chart.
The first term we need to understand is CVE. The Common Vulnerabilities and Exposures (CVE) program, maintained by the MITRE Corporation and sponsored by the U.S. Department of Homeland Security (DHS) Cybersecurity and Infrastructure Security Agency (CISA), is a dictionary or glossary of vulnerabilities that have been identified for specific code bases, such as software applications or open source libraries.
The CVE List is the list of all CVE IDs, and it allows interested parties to look up the details of a vulnerability by its unique identifier, the CVE ID. It is important to note that, for a variety of reasons, not every vulnerability is assigned a CVE ID, but the CVE List is by far the most recognized way to learn about cybersecurity vulnerabilities.
Thanks to the CVE program's efforts, answering my original question turned out to be easier than I initially anticipated. The program has a dedicated page for CVE metrics that includes the number of published CVE Records. It's important to know that a CVE Record contains descriptive data (i.e., a brief description and at least one reference) about a vulnerability associated with a CVE ID.
With this free data in hand, I plotted it in the chart below to help us visualize the number of published CVEs over time:
It doesn't take a lot of effort to see a clear pattern of growth in newly published CVEs from 2017 onwards, so the hypothesis holds. If we zoom in on the window from 2020 to 2023, we can see that the number of published CVEs grew by over 57% in this period alone! To put these numbers in perspective, in 2023 alone there were 79 newly published CVE IDs a day on average. That’s over 3 newly published vulnerabilities an hour!
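For readers who like to check the arithmetic, here is a minimal sketch of the per-hour figure. The only input is the 79-per-day average quoted above; the yearly total it prints is a rough back-calculation from that rounded average, not an official count.

```python
# Average number of newly published CVE IDs per day in 2023, as quoted in the article.
CVES_PER_DAY_2023 = 79

HOURS_PER_DAY = 24
DAYS_IN_2023 = 365  # 2023 was not a leap year

# Spread the daily average evenly across the 24 hours of a day.
cves_per_hour = CVES_PER_DAY_2023 / HOURS_PER_DAY

# Rough yearly total implied by the rounded daily average (an approximation only).
approx_annual_total = CVES_PER_DAY_2023 * DAYS_IN_2023

print(f"{cves_per_hour:.2f} newly published CVEs per hour")
print(f"roughly {approx_annual_total:,} newly published CVEs in the year")
```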
But why?
This would have been a really lame article, and research, if it were only about confirming the hypothesis that we are generating more vulnerabilities than ever. I wanted to understand, and share with you, the why behind this growth. The reality is, though, that I can't pinpoint the reasons behind this growth with 100% certainty.
This shouldn't stop us from speculating on some of the reasons, though. So my first hypothesis for this growth is simple: the growth in the number of vulnerabilities should be (close to) directly proportional to the growth in lines of code developed. To test this hypothesis, however, one needs to understand how much the codebase grew in the same period.
The challenge here is that there's no absolute way to know how much the global codebase grew in this period. We can approximate it, though, using data from GitHub, the most used SCM platform, as a proxy for how much code was created.
Luckily for me, GitHub maintains a repository, called innovationgraph, that contains structured data files of public activity on GitHub itself. One of the metrics that is captured and shared publicly is the number of “git push” operations over time. For the uninitiated, “git push” is the command a developer runs to send local changes, which may add or remove code, to a remote git server, which is GitHub in this case. And although growth in the number of “git push” operations doesn't necessarily mean growth in the number of lines of code, for this exercise let's assume that the average “lines of code” per “git push” hasn’t changed over time and that it is positive.
With that in mind, let’s visualize this data, plotted in the chart below:
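As an illustration of how push counts can be turned into a growth figure, here is a minimal sketch. It assumes the data is a CSV with one row per year and quarter; the column names (`year`, `quarter`, `git_pushes`) and every number in the sample are made up for this example and may not match the actual files in the repository.

```python
import csv
import io
from collections import defaultdict

# Synthetic sample, NOT real GitHub data. Column names are assumptions.
SAMPLE_CSV = """year,quarter,git_pushes
2020,1,10
2020,2,20
2020,3,30
2020,4,40
2023,1,30
2023,2,40
2023,3,50
2023,4,60
"""


def pushes_per_year(csv_text):
    """Sum the quarterly push counts into one total per year."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[int(row["year"])] += int(row["git_pushes"])
    return dict(totals)


def percent_growth(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100


totals = pushes_per_year(SAMPLE_CSV)
growth = percent_growth(totals[2020], totals[2023])
print(totals)
print(f"growth from 2020 to 2023: {growth:.1f}%")
```

The same `percent_growth` helper applies to any of the yearly series discussed in this article, such as published CVEs per year.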
As expected, codebases on GitHub grew over time. If we zoom in on the window from 2020 to 2023 again, we will see something interesting: the number of pushes, our proxy for codebase growth, grew by just over 40% in this period, which is significantly lower than the 57% increase in vulnerabilities for the same period. This goes against my hypothesis that the growth of vulnerabilities was (close to) directly proportional to the growth of the codebase. But why such a difference between growth rates?
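To make the gap between the two growth rates concrete, here is a small sketch that estimates how much the number of published CVEs per push changed. It uses the two rounded percentages from this article (both stated as "over", so the result is approximate) and relies on the same simplifying assumption as above, namely that pushes are a fair proxy for code volume. The function name is just an illustrative helper.

```python
def relative_change(numerator_growth_pct, denominator_growth_pct):
    """Percent change of a ratio when its numerator and denominator
    each grow by the given percentages."""
    ratio = (1 + numerator_growth_pct / 100) / (1 + denominator_growth_pct / 100)
    return (ratio - 1) * 100


# Rounded figures quoted in the article for 2020 to 2023.
CVE_GROWTH_PCT = 57
PUSH_GROWTH_PCT = 40

change = relative_change(CVE_GROWTH_PCT, PUSH_GROWTH_PCT)
print(f"published CVEs per push changed by about {change:.0f}%")
```

Under strict proportionality the result would be close to 0%. With these figures it comes out at roughly 12%, which is the disparity the rest of this section tries to explain.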
Again, I can only speculate here. The cybersecurity discipline has matured a lot in the past few years, and we can look at this number with a positive spin, congratulating ourselves for collectively getting better at detecting and disclosing vulnerabilities. Or, if we are the glass-half-empty type, we could lament how code quality, when it comes to security, has decreased.
Quick sidetrack: I want to add a touch of personal opinion here. It's almost impossible to find content today that doesn't mention Generative AI (GenAI), and this article is no different. If lower quality code is indeed the reason why we saw a disparity between the growth of vulnerabilities and the growth of "git push", we can speculate on the impact of GenAI on the future of vulnerabilities. As you might know, GenAI is trained on existing content to generate new content, and developer copilots (GenAI agents purpose-built to generate code) are no different. So, if code quality is lower than ever, copilots will generate lower quality code as well. Worse: more code is then pushed to repositories, which will eventually lead to faster growth of (low quality) codebases and, as a consequence, of vulnerabilities.
Deployment Frequency
Vulnerable code doesn't become an actual vulnerability just because the codebase was changed. A vulnerability only exists once said vulnerable code is part of a new version of a piece of software or an open source library, so it needs to be deployed first. With that in mind, I decided to research modern deployment frequency to understand if that could also impact the number of vulnerabilities.
To help me understand how deployment frequency changed in the last few years, I turned to DORA, the DevOps Research and Assessment group. DORA has four software delivery performance metrics that many organizations use to measure their own efficiency and maturity level when it comes to delivering value to their customers.
One of these metrics is Deployment Frequency, or how often an organization can successfully deploy to production, which is exactly the data I need to answer the question I had in mind. Every year since 2014, with the exception of 2020, DORA has released the Accelerate State of DevOps Report, which includes the results of a benchmark assessment of DevOps performance across hundreds of organizations, so this data can help us understand how they evolved over time.
Although organizations are divided into Elite, High, Medium or Low performance tiers based on certain criteria, simply showing how the percentage of organizations in each tier evolved over time wouldn't be enough, as the criteria for each tier also changed over time. Instead, I've plotted a chart below that represents the percentage of organizations capable of deploying multiple times a week, over the years:
As expected, despite a dip in 2022 that DORA theorizes may be a consequence of the pandemic, the share of organizations deploying software more than once a week grew by over 88% from 2021 to 2023. This adds another dimension to the AppSec practitioner's job: not only do they need to deal with more vulnerabilities and larger codebases than ever, as we saw before, but they also need to keep up with deployments that are faster than ever. So they have less time to assess code quality if they don't want to slow down this fast deployment pipeline!
But we have an increased workforce… Right?
So far, we have seen that the number of vulnerabilities grew, that there are more code changes to analyze than ever and that organizations are deploying to production at an increased rate. The impact of all these changes could be minimized, however, if we had more people working on protecting these applications.
Getting good data on how much the Cyber Security job market grew in the last few years, let alone the AppSec market, proved to be tough. Getting this data for the American market, however, turned out to be easier. Using the CyberSeek website as a reference, I captured the data and plotted the following chart, which includes both openings and filled positions in the Cyber Security job market in the USA:
Unfortunately, and as one would probably expect, the Cyber Security job market, at least in the US, grew by less than 10% from 2020 to 2023. Of course these aren't global numbers, but I believe they are a reasonable proxy for global growth.
Nevertheless, that means that the number of people working on securing applications didn't grow as fast as the number of vulnerabilities.
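Putting the figures side by side gives a rough sense of how the load per security professional changed between 2020 and 2023. This is a back-of-the-envelope sketch using only the rounded percentages quoted in this article, and it assumes the US workforce figure is representative. Since workforce growth was under 10% and the other figures were stated as "over", the results should be read as approximate lower bounds.

```python
# Rounded growth figures for 2020 to 2023, as quoted in the article.
GROWTH_PCT = {
    "published CVEs": 57,
    "git pushes": 40,
}

# The article states the US workforce grew by LESS than 10%, so this is an upper bound.
WORKFORCE_GROWTH_UPPER_BOUND_PCT = 10

workforce_factor = 1 + WORKFORCE_GROWTH_UPPER_BOUND_PCT / 100

# Increase in each metric per security professional over the period.
per_person_increase = {
    name: ((1 + pct / 100) / workforce_factor - 1) * 100
    for name, pct in GROWTH_PCT.items()
}

for name, increase in per_person_increase.items():
    print(f"{name} per security professional: at least ~{increase:.0f}% more")
```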
Conclusion
When I set out to answer the question of whether there are more published vulnerabilities today than ever before, I must confess that I expected the answer to be "of course yes". But I wanted to be sure. As we saw, that's exactly what the data shows. But we couldn't stop there; we had to understand the why.
As we saw, developers are also pushing code more frequently than ever, while organizations are maturing their development processes to deploy these changes at a frequency that just a few years ago would have been unimaginable to most.
All this means that we need to be smarter. Dealing with this influx of vulnerabilities at the speed of DevOps doesn't allow us to keep using the same processes and tools that we have been using so far. In part 2 of this article, we will discuss how modern tools like SCA, EPSS and diverse techniques can help us minimize the impact of the growth of vulnerabilities and of deployment speed on our AppSec programs.
Stay tuned.