Twice the bits, twice the trouble: vulnerabilities induced by migrating to 64-bit platforms

Twice the bits, twice the trouble: vulnerabilities induced by migrating to 64-bit platforms Wressnegger et al. CCS 2016

64-bit is not exactly new anymore, but many codebases which started out as 32-bit have been ported to 64-bit. In this study, Wressnegger et al. reveal how a codebase originally written for 32-bit, and which is perfectly secure on 32-bit platforms, can have new vulnerabilities simply by compiling it for 64-bit systems. Beginning with a theoretical analysis of how such vulnerabilities might be introduced, the team then go and look for such vulnerabilities in the wild…

To assess the presence of migration flaws in practice, we conduct an empirical analysis and search for 64-bit issues in the source code of 200 GitHub projects and all package from Debian stable (“Jessie”) marked as Required, Important or Standard. We find that integer truncations and signedness issues induced by 64-bit migration are abundant in both datasets… Although the vast majority of these issues are not necessarily vulnerabilities, the sheer amount indicates that developers are unaware of the subtle changes resulting from migrating code to LP64.

They did also find genuine vulnerabilities among those issues, in every single area the theory predicted they might exist. These include vulnerabilities in high profile projects such as Google’s Chromium, the GNU C Library, the Linux Kernel, and the Boost C++ Libraries. The paper contains case studies in each of these areas.

Our analysis reveals that migration vulnerabilities are a widespread problem that is technically difficult to solve and requires significant attention during software development and auditing.

Lots of people have studied integer-based flaws, but this is the first work to consider those introduced solely from the migration to 64-bit. Another thing to add to the ever-growing worry list!

Many software vulnerabilities are rooted in subtleties of correctly processing integers, in particular, if these integers determine the size of memory buffers or locations in memory. Leveraging these flaws, an attacker can trigger buffer overflows, write to selected memory locations, or even execute arbitrary code.

In standard C there are 5 different integer base types: char, short, int, long, and long long, and these come in both signed and unsigned variations. Unsigned types hold only positive numbers, and thus can store larger positive numbers than their signed counterparts which can store both positive and negative numbers. You can also convert between integer types, on the basis of a rank ordering char < short < int < long < long long. It’s the conversions between signed and unsigned, and between the different base types that are the root of the troubles.

What all this means on a given platform is dependent on the platform data model, which associates each base type with a given width in bytes ({1,2,4,8}). There are two rules a platform model must follow: the signed and unsigned versions of a given base type must be the same width; and the width of a higher-ranked type must be ≥ the width of any lower-ranked type.

Here are some data models used in the wild:

This paper focuses in particular on the shift from the ILP32 data model used by Win32 and Linux, to the LLP64 (Win64) and LP64 (Linux) data models.

With 64-bit memory we need 8 bytes to hold an address. Under ILP32 therefore an address fits in pointer/size_t, int, long (and of course, long long). Under Win64 (LLP64), addresses no longer fit inside int and long, and under Linux (LP64) they no longer fit inside int.

The three most common sources for integer-related vulnerabilities are truncations, underflows/overflows, and signedness issues.

A truncation occurs when you assign a wider type to a narrower type. For example, assigning a pointer to an int variable under ILP32 is never truncating (4 -> 4), but on the 64-bit platforms it is (8 -> 4). Note that only the data model determines when truncations occur – you can’t tell just from base types or signedness.

Here’s a contrived buffer overflow example illustrating how truncation can get you into trouble, it results in x (wider type) bytes being copied into a buffer of size y (narrower type):

An overflow occurs when you try to store a larger value in a variable than its type can hold (again, data model dependent). Here’s a simple example in which adding a CONST value may trigger an overflow. In this case passing in a value of x just below MAXINT, such that adding CONST to it causes it to wrap will result once more in copying large amounts of data (x) into a very small buffer (the wrapped value of x + CONST).

Signedness issues occur when assigning from unsigned to signed variables, and vice versa. If the target variable is narrower than the donor variable, we have a truncation issue. If we assign a signed variable to an unsigned target variable with greater width (data model dependent), then a sign-extension occurs in which the most significant bit of the narrower type is propagated to fill the larger width of the target variable.

Figure 3 (below) illustrates a buffer overflow that is caused by a change of signedness and a sign-extension due to upcasting to a larger unsigned type (unsigned short to size_t). The example is however modified to work on all three considered platforms. An attacker controlling the variable x of type short can overflow the buffer by providing a negative number, for instance -1, which is converted to 0x0000ffff in line 2 and due to a sign extension to 0xffffffff in line 3.

A final signedness issue occurs when comparing two expressions of different types, and the comparison is signed when it should be unsigned:

64-bit migration vulnerabilities

All types of integers available on 32-bit platforms also exist in 64-bit data models, however, their width may differ. These changes introduce previously non-existent truncations and sign extensions in assignments. Surprisingly, the migration to 64 bit may even flip the signedness of comparisons and render checks for buffer overflows ineffective.

Here’s a matrix packed with information, though it takes a while to understand what it’s telling you. Suppose we assign a variable of type source type (columns) to a variable of type dest type (rows), what can go wrong? In each cell of the matrix the three circles correspond to 32-bit, 64-bit Windows, and 64-bit Linux respectively. White circles indicate no problems, grey circles are signedness issues, and black circles are truncations. Any time you don’t see three circles of the same colour in a cell, you’re looking at the potential to introduce a vulnerability just by porting.

As a result, this program is safe on 32-bit platforms, but vulnerable on 64-bit (line 5 introduces an integer truncation on porting):

Unfortunately, vulnerabilities of this type are supported by the design of standard library functions, such as fgets, fseek and snprintf, which receive or return size information as type int and long. The common idiom of using variables of type int to iterate over buffers further adds to this problem

Here’s what happens when you do the same analysis for comparisons as opposed to assignments:

In addition to flaws that result from changes in integer widths, code running on 64-bit platforms has to be able to deal with larger amounts of memory as the size of the address space has increased from 4 Gigabytes to several hundreds of Terabytes. In effect, the developer can no longer assume that buffers larger than 4 Gigabytes cannot exist in memory. As a result, additional integer truncations and overflows emerge, which do exist on 32-bit data model in the first place, but cannot be triggered on the corresponding platforms in practice.

See §3.2 for several examples.

Five common vulnerability patterns

  1. New truncations (that exclusively happen on 64-bit systems), e.g. when using atol
  2. New signedness issues, for example using a signed variable of type int to specify the amount of memory to copy with memcpy.
  3. Dormant integer overflows – e.g loops using variables of size_t that increment or decrement an (unsigned) int in their loop body.
  4. Dormant signedness issues, e.g. calls to strlen that do not take a string literal, and assign the return value to an (unsigned) int.
  5. Unexpected behaviour of library functions, e.g. calls to snprintf that make use of the return value.

We find that on average, C/C++ projects from Debian stable tagged as Required, Important or Standard spawn 1,798 warnings concerning type conversions, 703 of which are exclusive to 64-bit systems… 10% of all invocations to the memcpy function in the inspected Debian and GitHub projects, are called with a signed value of 32-bit in size rather than the 64 bit wide size_t as parameter for the number of bytes to copy. Finally, we make use of this systematization and the experience thus gained to uncover 6 previously unknown vulnerabilities in popular software projects, such as the Linux kernel, Chromium, the Boost C++ Libraries and the compression libraries libarchive and zlib—all of which have emerged from the migration from 32-bit to 64-bit platforms.