Breaking web applications built on top of encrypted data Grubbs et al. CCS 2016
Data security and privacy is one of the big issues of our time, and uploading data to cloud services tends to put that issue into sharp focus. Do I trust this SaaS service with my data? What happens if there’s a breach? Can an adversary uncover information I’d rather be kept private? A very interesting strand of research looks at these and related problems, using something called property revealing encryption (PRE). The basic idea is that data inside the cloud service is kept encrypted with a key known only to the client, and so the data is opaque to the cloud service. However, if that’s all we did systems would be very inefficient as any operation on the data would require shipping it back to the client to be decrypted and operated upon. And search becomes pretty much impossible. This is where the property revealing part of the encryption comes in, the properties revealed should be just enough to let the server-side part of the application do its job (typically sorting, searching, sharing and so on). The class of systems built in this way are termed ‘BoPETs’ by the authors, since they Build-On-Property-revealing-EncrypTion.
One of the valuable things this paper provides is an insight into the current status of BoPET systems – what we can and can’t expect from them, and what future research and development might still be required. The discussion is anchored in the context of a particular open source BoPET implementation called Mylar, built on top of the Meteor.js web framework. Mylar uses a variation of property revealing encryption called multi-key searchable encryption (MKSE) based on the work of Popa and Zeldovich. We also get an important reminder about relying on formally (or informally) proved results – always check the (possibly implicit) assumptions and conditions giving the context in which the proof is valid. It’s easy to violate some of those when lifting the results into a real world project – at which point of course all bets are off.
The most important lessons transcend Mylar and apply generically to the entire class of BoPETs. First, we give concrete illustrations of how encryption schemes designed to be secure against snapshot passive adversaries end up being completely insecure when deployed in a system requiring security against an active adversary. Second, the natural process for porting applications – encrypt all data and adapt server operations so they can be peformed over ciphertexts – leaves metadata exposed. This reveals a lot of sensitive information even to snapshot passive adversaries. Another lesson is that BoPETs need to define realistic threat models and then develop formal cryptographic definitions that capture security in those threat models. Building a scheme first and then crafting a cryptographic model in which the scheme can be proved secure can result in schemes whose security breaks down in practice.
A quick primer on MKSE and Mylar
The unit of access control is a prinicipal, which is associated with a public/private key pair. In Mylar, documents are stored in MongoDB, and the app developer specifies which fields are to be encrypted, and with which principals. The mapping between principals, fields, and documents is up to the developer to decide based on the needs of the app.
Unencrypted confidential data exists only in users’ browsers and is ostensibly never sent to or stored on the server.
Users are also principals, and Mylar uses certificates to protect the binding between users’ identities and their keys.
Multi-key searchable encryption (MKSE) allows efficient keyword search over encrypted keywords. At the heart of this is a mechanism by which tokens can be adapted to allow searching over documents the server can’t read: for each encryption key, the client generates and sends to the server a delta value .
A user may have access to multiple principals, thus Mylar needs to support keyword search over documents encrypted with different keys. A straightforward approach is to have the user submit a separate search token for each principal it has access to, but this is expensive with many principals. For efficiency, Mylar relies on the server to generate tokens for searching over multiple principals. The user submits a single token, for documents encrypted with the user’s principal. The server then uses a client-generated delta value to convert this token to another token for documents encrypted with a different principal. Whenever a user is given access to a new principal, their client automatically sends the delta associated with that principal to the server.
Mylar claims to protect confidentiality of users’ data and queries against actively malicious servers, including servers that collude with malicious users, under the assumption that honest users never have access to any document controlled by the adversary.
- Snapshot passive attacks are one-time compromises of the server that give the attacker a complete snapshot of the server’s state at the time of the attack.
- Persistent passive attacks involve an attacker who can observe the server’s operations over a period of time. Over and above a snapshot passive attack, this means the attacker is able to use patterns of access to infer information.
- Active attacks are those where an arbitrarily malicious attacker can tamper with messages to and from clients and perform any operation on the server.
Commercial BoPETs all encrypt data before uploading to a server but are vague about their adversary models… Some academic BoPETs claim security against active (and therefore passive) attacks with the important caveat of excluding attacks based on access patterns or metadata. This restriction stems from the fact that the state-of-the-are PRE schemes upon which BoPETs are based leak this information for the sake of efficiency. This leakage may be inevitable, but we need methodologies for assessing the damage it can cause…
Difficulties in porting apps to use PRE (safely)
The only official sample shipped with Mylar was a chat application called kChat. The authors ported three additional open-source Meteor apps to use Mylar including a medical appointment app, an OpenDNA app that allows users to upload DNA sequencing information from 23andMe and similar testing services to check for risk groups, and a sample ecommerce app called MeteorShop. The porting process aimed to preserve the user experience of the original app, to follow the app’s data model unless changes are required to encrypt confidential data, and to change as few of the relationships between data structures as feasible. “We believe this process reflects what developers would do when porting their Meteor apps to Mylar.”
Deciding how to create access-control principals and which field to encrypt with which principal requires an understanding of how data is used and shared within the app. These decisions are the most subtle and critical parts of the porting process. For apps where multiple documents are encrypted with the same key, we created principals associated with the documents’ common feature. For example, all messages in a chat room should be encrypted with the same key, thus principals are associated with chat rooms. If each document is encrypted with its own key, principals correspond to individual documents. For example, each medical appointments in MDaisy has its own principal. If functionality is moved from the server to the client, the developer may need to update user permissions for the data, which is notoriously challenging. The user must be given enough permissions for the app to work correctly, without enabling him to access another user’s private data. We conjecture that many developers will struggle to make these decisions correctly and that independently developed Mylar apps will contain vulnerabilities caused by developers’ mistakes. Even the authors of Mylar made a security error when porting kChat to Mylar.
Let’s see how the natural results of the porting process stand up to the different threats.
Snapshot passive attackers can exploit metadata to try and infer information from the database. Persistent passive attackers may additionally exploit access patterns, and active attackers have even more means at their disposal.
The authors give several examples of metadata or non-encrypted data revealing information. For example, in the MDaisy medical appointment app each appointment is created by a member of the medical staff and shared with only one patient. Starting with an appointment, you can therefore find the encrypting principal, and following the principal’s wrapped keys, find out which patient(s) have access to the appointment. If an oncologist doctor has several appointments with the same patient over a number of months, you can clearly infer what the patient might be suffering with.
This leakage is inherent in any application where the semantics of inter-user relationships reveal sensitive information about users. Preventing it requires hiding the access graph from the server—a complicated feat in its own right—as well as hiding users’ interactions and data accesses.
Other examples include information in the names of principals (often chosen to be human understandable). These are not encrypted on the server, and in the kChat example, will reveal information about which chat rooms exist. Even encrypted information can leak data – the size of the encrypted data itself can be revealing (for example, patient objects are larger than medical staff objects in MDaisy). Preventing this requires padding data to a pre-defined size, but introduces large overheads.
Exploiting access patterns
In MDaisy, if different patients access the same encrypted procedure information, you can infer they are both undergoing the same treatment. Given more users and more treatments, if any single user publicly discloses their treatment then all other users undergoing the same treatment may lose their privacy.
In MeteorShop, items in the user’s cart are encrypted, as are item prices. But products added to the cart are highly likely to come from the list of products recently visited by a user.
In OpenDNA the server can learn sensitive information based on whether or not a user’s searches match.
Access patterns are very revealing…
In the searchable encryption literature, it is common to assume for simplicity that the search index is static, i.e., fixed before a security experiment begins and does not change.
MKSE’s formal model assumes a fixed corpus of documents. But in real applications documents can be added, updated, and deleted. A server can generate its own keys, use them to encrypt a document, and then give a target access to that document (sharing it with them).
A malicious server can create its own principal, for which it knows the keys. We call such principals tainted. It can then add the tainted principal’s wrapped keys to the victim’s accessInbox, and the victim’s client will automatically generate the delta for keyword searches over the tainted principal.
Given the client’s keyword search delta, a malicious server can now search all of that users private files for the given keyword. Also, for efficiency Mylar trusts the server to convert search tokens between principals, which enables the server to perform dictionary attacks on all of the user’s queries over any principal.
The problem of building client-server application frameworks that provide meaningful security against persistent passive and active attackers on the server remains open… To protect against active attacks, every essential operation performed by the server must be either executed or at least verified on the client side of the application. This runs contrary to the entire premise of BoPETs, since they aim to preserve the existing split of application functionality between clients and servers with all concomitant benefits and rely on clever encryption to protect data from untrusted servers.
It’s possible to break open safes too of course, but that doesn’t mean that companies store all of their valuable physical assets out in the open because of it. Thinking you have more security than you really do though, is definitely a dangerous thing.