Why your encrypted database is not secure

Why your encrypted database is not secure Grubbs et al., HotOS’17

This is the third paper we’ve looked at so far in The Morning Paper on the topic of encrypted databases. The clear takeaway for me is that practical, provable security guarantees are very hard to deliver! Don’t confuse better protection with unbreakable protection, and don’t become complacent because you think you have more security guarantees than you really do.

In Generic attacks on secure outsourced databases we saw that if an attacker can observe enough queries and responses – even when both the queries and the responses are encrypted and opaque – then reconstruction attacks can recover the secret information stored in the database. In Breaking web applications built on top of encrypted data we saw how hard it is to port (or indeed, just design) applications to work well with encrypted databases without unintentionally leaking information, and how hard it is to protect against active attackers.

If we can’t offer protection against active attackers, nor against persistent passive attackers who are able to simply observe enough queries and their responses, the fallback is to focus on weaker guarantees around snapshot attackers, who can only obtain a single static observation of the compromised system (e.g., an attacker that does a one-off exfiltration). Today’s paper pokes holes in the security guarantees offered in the face of snapshot attacks too.

Many recent encrypted databases make strong claims of “provable security” against snapshot attacks. The theoretical models used to support these claims are abstractions. They are not based on analyzing the actual information revealed by a compromised database system and how it can be used to infer the plaintext data.

A quick refresher on encrypted databases (EDBs)

Examples of encrypted data systems include CryptDB, Mylar, Arx, and Seabed, as well as CASB (Cloud Access Security Broker) solutions from CipherCloud and Skyhigh Networks. It’s worth noting that the first two authors of this paper declare “large financial stakes” in Skyhigh Networks.

Encrypted databases operate on top of a commodity database management system (DBMS) such as MySQL or MongoDB but store data in an encrypted form so that even if the DBMS or underlying OS is compromised, the attacker cannot obtain the protected data… For efficiency, encrypted databases rely on specialized encryption schemes that allow the server, given only the ciphertexts, to perform some computations in response to client queries. The price is the leakage of partial information about plaintexts.

As we’ve already seen, persistent attackers who observe query evaluations can exploit the information leaked by property-revealing encryption (PRE) schemes such as order-revealing encryption, deterministic encryption, and searchable encryption.

Since PRE schemes are always vulnerable to persistent attacks, many EDBs claim security against snapshot attacks only. Examples include the latest claims for CryptDB and Mylar (revised after the original claims were shown false by, respectively, “Inference attacks on property-preserving encrypted databases” and “Breaking web applications built on top of encrypted data”), new systems Arx and Seabed, and new cryptographic schemes such as Lewi-Wu order-revealing encryption.

It’s time to take a closer look at snapshot attacks…

Snapshot attack mechanisms

A running DBMS will have state information in any (or all!) of four places: volatile DB state in RAM and CPU registers, persistent DB state on disk, volatile OS state, and persistent OS state.

  • Disk theft will yield the persistent state stored by the DB and OS, but not any volatile state. (This analysis ignores full disk encryption, since EDBs claim protection even in its absence.)
  • SQL injection can enable arbitrary code injection and full control of the memory space of the DB process, yielding the persistent and volatile DB state.
  • A DBMS running in a virtual machine is vulnerable to VM image leaks which give you everything – persistent and volatile state for both DB and OS.
  • A full-system compromise (e.g., rooting the DBMS) gives full access to the persistent and volatile OS and DB state. “This enables persistent passive and active attacks, but ‘smash-and-grab’ attacks that simply grab available data and leave are prevalent.”
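The four attack vectors and the four kinds of state above can be written down as a small lookup table. This is only an illustrative sketch: the `State` flag, the attack names, and the helper functions are hypothetical names chosen here, and the table encodes nothing beyond what the list above says.

```python
from enum import Flag, auto


class State(Flag):
    """The four places a running DBMS keeps state (names are illustrative)."""
    DB_PERSISTENT = auto()  # DB state on disk
    DB_VOLATILE = auto()    # DB state in RAM and CPU registers
    OS_PERSISTENT = auto()  # OS state on disk
    OS_VOLATILE = auto()    # OS state in memory


ALL_STATE = (State.DB_PERSISTENT | State.DB_VOLATILE
             | State.OS_PERSISTENT | State.OS_VOLATILE)

# What each snapshot mechanism yields, as described in the list above.
EXPOSED = {
    "disk_theft": State.DB_PERSISTENT | State.OS_PERSISTENT,
    "sql_injection": State.DB_PERSISTENT | State.DB_VOLATILE,
    "vm_image_leak": ALL_STATE,
    "full_system_compromise": ALL_STATE,
}


def exposes(attack: str, state: State) -> bool:
    """Return True if the attack reveals every kind of state in `state`."""
    return (EXPOSED[attack] & state) == state


def attacks_exposing(state: State) -> list[str]:
    """List, alphabetically, the attacks that reveal all of `state`."""
    return sorted(a for a in EXPOSED if exposes(a, state))
```

For instance, `attacks_exposing(State.DB_VOLATILE)` lists every mechanism except disk theft, which is why in-memory caches and heaps matter so much in the rest of the analysis.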

Information leaks

“We have the strongest, most secure front door of any building in the city.”

“Yes, but you left the windows open.”

If you have access to the disk, you have access to the DBMS logs. If you have access to the query engine (SQL injection), you have access to the DBMS’ diagnostic tables. And if you have access to memory, you can see in-memory data structures and caches. None of this is good news, it turns out.

Information in logs

… logs record changes to the individual database records at the byte level. Using standard forensic techniques for reconstructing insert, update, and delete transactions from these logs, an attacker who compromised the disk can reconstruct queries that modified the database.

The above example refers to MySQL, which also has a separate binary log supporting replicated transactions and point-in-time recovery, from which the timing of queries can be inferred. (For certain types of encrypted databases, timing information can leak sensitive data.) MongoDB has a similar mechanism.

We’re not talking about the kind of logs you might send to your favourite logging service here, we’re talking about the transaction undo/redo logs that are a critical part of the functionality of a DBMS.

Information in diagnostic tables

… modern DBMS’s include tables – extractable via SQL injection – that store a great deal of performance statistics intended to help tune specific databases to their workloads and diagnose problems and performance bottlenecks.

For example, MySQL’s information_schema database contains information about the contents of caches and currently executing queries, and its performance_schema database contains aggregate statistics about query execution as well as a threads table with information about the current statements being executed by all threads. Here you’ll also find historical information about past queries, the number of rows examined, and the number of rows returned.

Information in in-memory data structures

If you can get access to memory, then a lot of useful information will be in caches. For example, InnoDB maintains adaptive hash indices for frequently accessed pages, and the MySQL query cache can be configured to keep the results of certain SELECT queries. Even with the cache disabled, queries persist in MySQL’s internal heap long after their execution.

As an experiment, the authors issued a SELECT query to MySQL using a random string (that does not appear anywhere in the database) as a column name. They followed this with 1000 SELECT queries, then 500 inserts, then a twenty-minute wait and 100,000 more SELECT queries. Then they dumped the memory of the MySQL process.

The full text of the original query appeared in three distinct locations in memory, and the random string appeared in three additional locations by itself… This leak is not surprising since MySQL is not designed for security-critical operations and does not implement secure deletion. In Section 6, we show that in the context of encrypted databases this otherwise minor oversight has dramatic implications for the (lack of) security.
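The analysis step of that experiment amounts to searching a raw memory dump for a known marker. The sketch below illustrates the idea on a synthetic byte string; the marker value, the query text, and the layout of the fake dump are all made up for illustration and are not taken from the paper.

```python
def find_all(dump: bytes, needle: bytes) -> list[int]:
    """Return the byte offset of every occurrence of `needle` in `dump`."""
    offsets = []
    pos = dump.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = dump.find(needle, pos + 1)
    return offsets


# Hypothetical random marker used as a column name in the probe query.
marker = b"zq7Jw0kX3mLp"
query = b"SELECT " + marker + b" FROM t"

# Synthetic stand-in for a process memory dump: two leftover copies of the
# full query text plus one stray copy of the marker on its own.
dump = b"\x00" * 16 + query + b"\xff" * 8 + marker + b"\x00" * 4 + query

full_hits = find_all(dump, query)     # places where the whole query survives
marker_hits = find_all(dump, marker)  # includes the copies inside the queries
standalone = len(marker_hits) - len(full_hits)
```

Against a real dump, the same scan would be run over the dumped process image instead of the synthetic `dump` bytes.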

Breaking confidentiality

The aforementioned ‘section 6’ shows how the information gained from these sources can compromise the confidentiality of data across a wide range of EDBs designed to work on top of existing commodity DBMSs.

For example, many schemes break if the attacker can obtain even a single token value – but the text of queries (and therefore the search token) is stored in several locations in MySQL as we just saw.

Tokens will thus be available to any realistic snapshot attacker. The consequences depend on the system. For CryptDB, Mylar, and any other system using variants of searchable encryption, a snapshot attacker can use leakage-abuse attacks to infer the query and the plaintext of any record it matches.
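To see why a single leaked token matters, here is a toy sketch of a deterministic search index. It is an assumption-laden illustration, not the construction used by CryptDB, Mylar, or any other system named here: the HMAC-based token, the record contents, and the auxiliary keyword counts are all hypothetical.

```python
import hashlib
import hmac


def search_token(key: bytes, keyword: str) -> bytes:
    """Toy deterministic token: the same keyword always yields the same token."""
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()


def build_index(key: bytes, records: dict[int, list[str]]) -> dict[bytes, set[int]]:
    """Server-side index mapping token -> ids of the records that match it."""
    index: dict[bytes, set[int]] = {}
    for rid, keywords in records.items():
        for kw in keywords:
            index.setdefault(search_token(key, kw), set()).add(rid)
    return index


# Client side: the key never leaves the client.
key = b"client-secret-key"
records = {1: ["flu"], 2: ["flu", "asthma"], 3: ["flu"], 4: ["asthma"], 5: ["hiv"]}
index = build_index(key, records)

# A past query leaves its token behind in the server's memory or logs.
leaked_token = search_token(key, "asthma")

# Snapshot attacker: holds the index and the leaked token, but not the key.
matching = index.get(leaked_token, set())

# Hypothetical auxiliary knowledge: roughly how many records mention each term.
aux_counts = {"flu": 3, "asthma": 2, "hiv": 1}
guesses = [kw for kw, n in aux_counts.items() if n == len(matching)]
```

With the token in hand the attacker learns exactly which records the past query matched, and the size of that result set alone is enough here to guess the keyword.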

For the newer Lewi-Wu scheme, information about partial histograms from range queries, combined with query tokens, also leaks sensitive information.

In summary, query tokens found in system snapshots enable a snapshot adversary to recover large amounts of protected data in all existing encrypted databases.

Seabed turns out to be vulnerable to frequency-analysis-based attacks using information from query histograms, and Arx leaves information behind in logs that can also be used to recover sensitive information.
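As a reminder of what frequency analysis looks like in its simplest form, here is a generic rank-matching sketch. It is not the specific attack the paper mounts on Seabed; the ciphertext labels and the auxiliary ranking of plaintext values are invented for illustration, and the example assumes there are no ties in the counts.

```python
from collections import Counter


def frequency_attack(ciphertexts: list[str], aux_ranking: list[str]) -> dict[str, str]:
    """Guess plaintexts by pairing the i-th most common ciphertext
    with the i-th most common value in the attacker's auxiliary data."""
    by_freq = [c for c, _ in Counter(ciphertexts).most_common()]
    return dict(zip(by_freq, aux_ranking))


# Hypothetical histogram recovered from a snapshot: opaque values and counts.
observed = ["c9"] * 5 + ["a1"] * 3 + ["f4"] * 1

# Hypothetical public knowledge: values ordered from most to least common.
aux_ranking = ["O+", "A+", "AB-"]

guess = frequency_attack(observed, aux_ranking)
```

The attacker never decrypts anything; the histogram alone is enough whenever the underlying distribution is skewed and roughly known in advance.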

This all turns out to be much harder than we expected

Deploying encrypted databases on commodity DBMS’s can have unexpectedly bad consequences for security. Logs, caches, and data structures kept by DBMS’s leak information that is not accounted for in the threat models used by the designers of encrypted databases. Critically, today there is no such thing as a “snapshot” attacker who cannot observe past queries, workloads, and access patterns – because any realistic snapshot of the system contains this information.

What if you designed a database from the ground up for security then? You’d still run into fundamental tensions between security and the need for things like caches and performance statistics (so that the DBMS performs well). There are such things as history-independent data structures, however:

… whether history independence can be achieved for practical encrypted databases remains an open question. Solving it requires new research into designing and implementing databases that efficiently hide queries and access patterns.