Personal Data: Thinking Inside the Box

Personal Data: Thinking inside the box – Haddadi et al. 2015.

This paper got some coverage in the press recently with a story in the British newspaper “The Guardian” headlined “Fightback against internet giants’ stranglehold on personal data starts here.” Amongst the authors we also find Jon Crowcroft (who wrote yesterday’s paper ‘On the duality of resilience and privacy’) and members of the Nymote project from the University of Cambridge.

We are introduced to the concept of a ‘Databox’ as something you own that captures, indexes, stores and manages data about you and data generated by you.

The purpose of a Databox is not simply to gather all your personal data into one place, but to enable controlled access to that data. By this we mean that it must be selectively queryable: users should have fine-grained control over what data are made available to third parties. More complex possibilities include supporting privacy-preserving data analytics techniques such as differential privacy and homomorphic encryption.
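To make "selectively queryable" a little more concrete, here is a minimal sketch of what per-service, per-field grants could look like. This is my own illustration, not the authors' design: the `Databox` class, its `grant`/`revoke`/`query` methods, and the sample record are all hypothetical, and a real system would need authentication, persistence, and auditing on top.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Databox:
    """Hypothetical in-memory Databox: records plus per-service field grants."""

    records: list[dict] = field(default_factory=list)
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, service: str, *fields: str) -> None:
        # Allow `service` to read the named fields.
        self.grants.setdefault(service, set()).update(fields)

    def revoke(self, service: str, *fields: str) -> None:
        # Withdraw access to the named fields (no-op for unknown services).
        self.grants.get(service, set()).difference_update(fields)

    def query(self, service: str, wanted: list[str]) -> list[dict]:
        # Return only the requested fields that this service has been granted.
        allowed = self.grants.get(service, set())
        visible = [f for f in wanted if f in allowed]
        if not visible:
            return []
        return [{f: r[f] for f in visible if f in r} for r in self.records]


# Made-up sample data, purely for illustration.
box = Databox(records=[{"date": "day-1", "steps": 8421, "location": "home"}])
box.grant("fitness-app", "date", "steps")
print(box.query("fitness-app", ["date", "steps", "location"]))
# prints [{'date': 'day-1', 'steps': 8421}]
```

The fitness app asks for location too, but never sees it because the user did not grant that field. Techniques such as differential privacy would sit behind the same `query` boundary, returning noisy aggregates rather than raw fields.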

Making this fine-grained control practical for end users remains an open challenge in my view – think about cookie management in your browser, for example. A Databox has much broader application than cookies, though, and access could be local-only, or permission could be granted to copy selected data out to third-party services.

In a system where access is granted to process data locally but not to take copies of data, this is relatively straightforward; but in a system where data is, by default, copied out to the third-party, cooperation on their part is required to implement something like a time-to-live function for data.
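The difference between the two modes can be sketched in a few lines. Everything here is hypothetical and of my own invention (`run_locally`, `Copy`, `CooperativeStore`); the point is only to show where the cooperation is needed. In the local case only the result of the analysis leaves the box. In the copy-out case the data carries an expiry tag, and it is the third party's own code that has to honour it.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


def run_locally(records: list, analysis: Callable):
    # Local-only access: the analysis runs inside the box and only
    # its result is handed back, never the records themselves.
    return analysis(records)


@dataclass(frozen=True)
class Copy:
    """A copied-out value tagged with the expiry the user asked for."""

    value: str
    expires_at: float  # seconds on a clock both parties agree on (assumption)


class CooperativeStore:
    """Hypothetical third-party store that voluntarily honours the expiry tag."""

    def __init__(self) -> None:
        self._items: dict[str, Copy] = {}

    def receive(self, key: str, copy: Copy) -> None:
        self._items[key] = copy

    def purge(self, now: float) -> None:
        # Drop everything whose time-to-live has run out.
        self._items = {k: c for k, c in self._items.items() if c.expires_at > now}

    def read(self, key: str, now: float) -> Optional[str]:
        self.purge(now)
        copy = self._items.get(key)
        return copy.value if copy else None


store = CooperativeStore()
store.receive("steps", Copy("8421", expires_at=100.0))
print(store.read("steps", now=50.0))   # prints 8421
print(store.read("steps", now=150.0))  # prints None
```

Nothing in the Databox can force `purge` to run on someone else's servers: a store that simply skips that call keeps the data forever.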

In other words, we ultimately still have to trust third parties with our data. It would take a forward-looking service with clear privacy statements to earn such trust. After all, as Crowcroft himself told us yesterday:

It is a truth universally acknowledged that centralized cloud services offer the opportunity to monetize vast aggregates of personal data.

(and to create fast feedback loops, and use machine learning and predictive analytics to improve the service).

Why would a cloud service voluntarily give that up? Here’s a thought experiment: imagine a service with a very upfront and honest policy that gives you a choice of either (a) free usage of the service in return for allowing the company to store and analyse your data to be used for commercial purposes such as advertising, or (b) a small monthly fee but your data remains private to you at all times. Would you pay?

A consequence of the controlled access envisioned above is that users may choose to deny third-party services (e.g., advertisers or cloud service providers) access to their data. In the simplest case this might lead to those users simply no longer being able to make use of those services. However, a more acceptable and scalable option would rather be to provide means for those services to charge the user in other ways: those who wish to pay through access to their data may do so, while those who do not may pay through more traditional monetary means. That is, the Databox must be able to ‘talk money’, enabling users to trace payments alongside data flow to and from different third-party services, available via some form of app store.

The authors put forward a second argument for why companies might be motivated to give up their data hoards: it could also act as an exposure reduction mechanism.

The Databox could also act as an exposure reduction mechanism for commercial organisations which may no longer intend to hold and control a range of private data directly (e.g., health records), and rather let the data subject take control of their sensitive information. The commercial organisation could still access and query the data as previously described. This is particularly relevant for international organisations that otherwise have to be aware of a plethora of legal frameworks. An analogy might be the way online stores use third-party payment services such as PayPal or Google Wallet to avoid the overhead of Payment Card Infrastructure compliance for processing credit card fees.

There are fundamental barriers, technical and social, that have yet to be successfully addressed before the Databox concept can take off. These include:

  • The Databox must be highly available if it is to take such a central place in our lives.
  • There must be trust in the system.
  • Ease of use:

Existing systems intended to help users manage their personal data have found it difficult to control the associated complexity. User preferences in this space are inherently complex: socially derived and context dependent… A successful Databox will need consistent user interaction models and will enable developers of Databox applications to make use of these models.

  • Figuring out the cost model for all stakeholders involved.

Perhaps the Databox concept can open up the potential for new business models. These could be explored in the context of an individual service (or set of related services) without necessarily needing to build the full generic Databox platform first.

Considering the churn experienced in the personal data startup space, with a number of new but typically short-lived entrants and offerings, it seems that few truly viable business models have yet been discovered in this space. Our belief is that the power of personal data can only be realised when proper consideration is given to its social character, and it can be legibly and negotiably combined with data from external sources. In this case, we might anticipate many potential business models.

The Nymote project (and its constituent components including Mirage, Irmin, and Signpost) is exploring this direction. It seeks to provide ‘lifelong control of your networked personal data.’ Well worth checking out!