A study of security vulnerabilities on Docker Hub

A study of security vulnerabilities on Docker Hub Shu et al., CODASPY ’17

This is the first of five papers we’ll be looking at this week from the ACM Conference on Data and Application Security and Privacy which took place earlier this month. Today’s choice is a study looking at image vulnerabilities for container images in Docker Hub. (You may recall an earlier study by Banyan which looked at just official images. Jérôme Petazzoni’s response to that is also well worth reading.) It’s important to note the timing of this study – the analysis was done in April of 2016, and one month later Docker announced the Docker Security Scanning service (as you’ll soon see, it was very much needed!). This service provides automated security analysis, validation, and continuous monitoring for binary images hosted on Docker Hub. Even more recently, Docker introduced the Docker Store with Docker Verified Images (community/Hub images are not verified by Docker). So we would hope, if the analysis was repeated today, that the official and verified images would not contain any known vulnerabilities. There are plenty of unverified images out there too though, and this paper is a very good reminder of the need for software supply-chain management (a phrase I first heard via Joshua Corman).

It’s the same scenario as we recently saw with JavaScript dependencies on the web. As a developer, it’s incredibly easy to add a dependency to your project, which tends to lead to dependencies being added with little thought. In fact of course, adding dependencies is actively encouraged versus rewriting functionality yourself from scratch. Furthermore, any one dependency you bring in often has dependencies of its own. We need to add a little more discipline around the inclusion of dependencies (images, libraries, packages etc.) in projects. I recommend something like the following checklist.

Software supply-chain management checklist

  1. ☑️ Is the license of this dependency compatible with my project?
  2. ☑️ Is the dependency actively maintained, and do I know where information about vulnerabilities in the dependency will be published?
  3. ☑️ Is the version of the dependency I propose to include currently free from vulnerabilities? (Including any vulnerabilities in the transitive closure of dependencies).
  4. ☑️ Do I have an active process in place to alert me when a new vulnerability is found in the dependency?
  5. ☑️ Do I know where new versions of the dependency will be announced?
  6. ☑️ Do I have an active process in place to alert me when a new version is released? (The release notes may contain information about important fixes that didn’t get flagged as a vulnerability for some reason).

Such a process works best if you also have an automated way to catch and flag any new dependency being added to a project – via commits or pull requests for example. (I’ve been using Snyk.io to do this on one of my recent projects, other tools are available). Then you can make sure to go through the checklist during code review – and perhaps even go so far as to fail the build if it is not completed. For your own reference, a good place to document the checklist answers (where to go to find vulnerability information, and the processes in place for notification) might be in a dependencies.md file or similar checked into the root of your project.

Anyway, let’s get back to the study of security vulnerabilities in Docker Hub images…

Docker images and the Docker Hub

In case you’ve been living under a rock for the last couple of years:

Docker distributes applications (e.g., Apache, MySQL) in the form of images. Each image contains the target application software as well as its supporting libraries and configuration files… Docker Hub, introduced in 2014, is a cloud registry service for sharing application images. Images are distributed using repositories, which allowed versioned image development and maintenance.

Official repositories contain certified images from vendors, community repositories can be created by any user or organisation.

Analysis process

This is pretty much as you would expect: download images and scan them! Of course, there are lots of images so you want to do that in parallel. One difficulty is that although official image repositories are listed, there is no such list of community repositories. Not to be put off, the authors generated 5,000,000 unique strings, used the Docker Hub API to query Docker Hub for each of them, and recorded the results. In this way almost 100,000 unique repositories were discovered, ultimately leading to vulnerability scanning of 356,218 community images and 3,802 official images.

The Clair open source tool from CoreOS was used to do the scanning. Clair matches package metadata against the CVE database and related vulnerability tracking databases. Each found vulnerability is graded ‘low’, ‘medium’, or ‘high’ risk based on the Common Vulnerability Scoring System.

The analysis also examines how vulnerabilities propagate through across layers and images. For example:

Findings

The table below shows the number of vulnerabilities found for in images. Note that (as of April 2016) the worst offending community images contained almost 1,800 vulnerabilities! Official images were much better, but still contained 392 vulnerabilities in the worst case. Perhaps the most useful number is the median number of vulnerabilities in the :latest version of each image: 153 for community images, and 76 for official images. For official images, the :latest version tends to far much better than its predecessors (indicating active updating of vulnerable components), whereas for community images there’s not much difference.

If we look at the distribution of vulnerability severities, we see that pretty much all of them are high severity, for both official and community images. What we’re not told is the underlying distribution of vulnerability severities in the CVE database, so this could simply be a reflection of that distribution.

Over 80% of the :latest versions of official images contained at least on high severity vulnerability!

What kind of vulnerabilities did the authors find? Here’s the breakdown for the latest official images (year columns are the year the vulnerability was first reported in the CVE database, not the year of the image – Docker isn’t that old for a start!).

Many Docker Hub repositories are well maintained, whereas others remain unmaintained. Intuitively, an image that has not been updated in a long time is more likely to contain vulnerabilities. Therefore we seek to characterize the age of images at the time of analysis.

Here’s the CDF. It shows that official images tend to be actively updated, whereas the latest versions of community images can be very old. By my reading, about a third of community :latest images are over a year old.

When analysed by layer, we see that child images inherit on average 80 or more vulnerabilities from their parents:

This is an interesting observation, because it suggests that when a child installs new software packages, the maintainer is not applying security updates (e.g., with apt-get upgrade).

Tools

The paper mentions several tools that can assist in scanning container images for vulnerabilities:

To this list we can also add the following (and maybe others I’m not aware of or have forgotten too):

See also Docker’s ‘Benchmark for Security‘ recommendations. Be careful out there!