It’s another cut of similar data today, but this time looking at how privacy information is leaked over time in different versions of an (Android mobile) app. You probably don’t need to read the paper to answer the question posed in the very opening sentence of the abstract: “Is mobile privacy getting better or worse over time?” Nevertheless, it is still interesting to see this quantified, to learn of the shockingly slow rate of adoption of HTTPS, and to realise that (through design or otherwise) by rotating the PII leaked by an application across a number of releases, it’s possible to build up a very complete picture of end users. I bet you won’t find that in the release notes!
…we compile a dataset that informs what information is exposed over the Internet (identifiers, locations, passwords, etc.), how it is exposed (encrypted or plaintext), and to whom that information is exposed (first or third party).
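To make those three dimensions concrete, here is a minimal Python sketch of labelling a single captured request with what was exposed, how, and to whom. It makes several assumptions: the true PII values of the test device are known, the request body has already been decrypted (for example by an intercepting proxy), and we know which domains belong to the app's developer. `Leak`, `label_flow`, and the example values are hypothetical names for illustration, not the authors' tooling. Plain substring matching like this misses encoded or hashed values, so treat it as an illustration only.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Leak:
    pii_type: str      # what was exposed (e.g. "email", "ad_id")
    encrypted: bool    # how: True for HTTPS, False for plaintext HTTP
    third_party: bool  # to whom: True if the host is not a first-party domain


def label_flow(url: str, body: str, device_pii: dict[str, str],
               first_party: set[str]) -> list[Leak]:
    """Label one request, assuming the device's true PII values are known (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    encrypted = parts.scheme == "https"
    # A host is first party if it equals, or is a subdomain of, a first-party domain.
    third_party = not any(host == d or host.endswith("." + d) for d in first_party)
    haystack = url + "\n" + body
    # Naive substring matching: misses URL-encoded, hashed, or obfuscated values.
    return [Leak(kind, encrypted, third_party)
            for kind, value in device_pii.items() if value and value in haystack]


pii = {"email": "alice@example.com",
       "ad_id": "38400000-8cf0-11bd-b23e-10b96e40000d"}
print(label_flow("http://tracker.example.net/c?aid=38400000-8cf0-11bd-b23e-10b96e40000d",
                 "", pii, {"pinterest.com"}))
# → [Leak(pii_type='ad_id', encrypted=False, third_party=True)]
```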
You can find the raw data and analysis online at https://recon.meddle.mobi/appversions/. Here’s the short summary:
- Privacy is worsening over time, with over 50% of apps getting worse with subsequent releases and just 26.3% getting better (the others stay constant or are highly variable).
- If you just look at a single version of an app, you miss a lot of the PII gathered by that app over time.
- HTTPS adoption is really slow – for half of the domains, it takes about five years for over 50% of apps to start using it (we leak your data, and we leak it in the clear).
- Third-party tracking is pervasive.
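The “getting worse” and “getting better” percentages depend on how an app's trajectory across releases is classified. The sketch below uses a deliberately simple, hypothetical rule, not necessarily the one the authors use: an app is “worse” if the number of distinct PII types it leaks never decreases from one release to the next and rises at least once, “better” in the mirror case, “constant” if it never changes, and “variable” otherwise. The function names and sample data are illustrative.

```python
from collections import Counter


def classify_trend(counts: list[int]) -> str:
    """Hypothetical rule: classify an app by how its PII-leak count moves across releases."""
    diffs = [b - a for a, b in zip(counts, counts[1:])]
    if all(d == 0 for d in diffs):
        return "constant"
    if all(d >= 0 for d in diffs):
        return "worse"     # never improves, and gets worse at least once
    if all(d <= 0 for d in diffs):
        return "better"    # never regresses, and improves at least once
    return "variable"      # goes both up and down


def summarize(apps: dict[str, list[int]]) -> dict[str, float]:
    """Fraction of apps falling into each trend category."""
    tally = Counter(classify_trend(series) for series in apps.values())
    return {label: n / len(apps) for label, n in tally.items()}


# Made-up per-version counts of distinct PII types for four apps.
apps = {"a": [1, 2, 4], "b": [3, 3, 1], "c": [2, 2, 2], "d": [1, 3, 2]}
print(summarize(apps))
# → {'worse': 0.25, 'better': 0.25, 'constant': 0.25, 'variable': 0.25}
```

A stricter or looser rule (for example, comparing only the first and last releases) would shift apps between categories, which is worth keeping in mind when comparing percentages across studies.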
The study looks at 512 of the most popular Android apps over a period of 8 years:
PII (personally identifiable information) is “information that can be used to distinguish or trace an individual’s identity.” For the purposes of this study, the authors track the following kinds of PII:
Here are the headline numbers for the type of PII leaked by applications and their APKs over time:
The Ad ID is a user-resettable ID that Google began requiring apps to use instead of persistent identifiers in August 2014. This change had a noticeable effect on app behaviour:
The most commonly leaked PII types are unique identifiers (more than half of all apps leak an advertiser ID and/or hardware serial number) and locations (53.1% of apps). We nonetheless still find a substantial fraction of apps (more than 10%) leaking highly personal and security-sensitive information such as email addresses, phone numbers, and gender.
Thirteen apps leak passwords, and six of those do so in plaintext!! In general, apps that leak the most PII types also send a significant fraction of their traffic (34-47%) without encryption, thus exposing PII to network eavesdroppers.
The authors then studied how privacy leaks (both the type of information leaked and the frequency of leaks) change over time for individual apps.
…for all but 7% of the apps in our dataset, a study using only one version is guaranteed to underestimate the PII gathered over the lifetime of the app.
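The reason is simple set arithmetic: if each release leaks a different subset of PII types, only the union across releases shows what has been gathered over the app's lifetime. A minimal sketch, assuming we already have the set of PII types observed for each version (the version labels, sets, and function names are made up for illustration):

```python
def lifetime_pii(versions: dict[str, set[str]]) -> set[str]:
    """Every PII type leaked by at least one version of the app."""
    return set().union(*versions.values())


def single_version_underestimates(versions: dict[str, set[str]]) -> bool:
    """True if every individual version leaks a strict subset of the lifetime set."""
    if not versions:
        return False
    total = lifetime_pii(versions)
    return all(leaked < total for leaked in versions.values())


# Hypothetical app that rotates which PII it sends across releases.
app = {"1.0": {"ad_id", "location"},
       "2.0": {"ad_id", "email"},
       "3.0": {"location", "gender"}}
print(sorted(lifetime_pii(app)))           # → ['ad_id', 'email', 'gender', 'location']
print(single_version_underestimates(app))  # → True
```

No single version here reveals more than half of the lifetime picture, which is exactly the “rotating PII” effect a one-version study cannot see.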
Here’s a Pinterest example showing the PII leaked (and to where) across different versions of the app over time:
The plot shows that the app sends user passwords to a third party and starts leaking gender, location, advertiser ID, and GSF ID in more recent versions. In addition, the frequency of Android ID leaks increases by two orders of magnitude.
(The password leak was responsibly disclosed by the authors and fixed in a later version of the app not included in the study.)
Across all apps, the following table summarises the frequency with which PII is leaked:
Our analysis is in part motivated by findings from Harvest, a documentary film that used ReCon to identify PII leaked over the course of a week from a woman’s phone. Specifically, her GPS location was leaked on average every two minutes by the Michaels and Jo-Ann Fabrics apps. This behavior, thankfully, was isolated to one version of the apps; however, it raises the question of how often such “mistakes” occur in app versions.
Time for a little name-and-shame: “some versions of apps leak PII once every 1 to 10 seconds during an experiment. Example apps include AccuWeather, Learn 50 Languages, Akinator the Genie FREE, and JW Library, which leak either location or unique ID or both, nearly constantly.”
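Leak frequency can be expressed as the mean interval between consecutive leaks of a PII type during one experiment. A minimal sketch, assuming a hypothetical list of leak timestamps in seconds from the start of a run; the function names are illustrative, and the 10-second default threshold simply mirrors the “once every 1 to 10 seconds” figure quoted above.

```python
from typing import Optional


def mean_leak_interval(timestamps: list[float]) -> Optional[float]:
    """Average gap in seconds between consecutive leaks; None if there are fewer than two."""
    ts = sorted(timestamps)
    if len(ts) < 2:
        return None
    # The mean of consecutive gaps telescopes to (last - first) / (number of gaps).
    return (ts[-1] - ts[0]) / (len(ts) - 1)


def leaks_nearly_constantly(timestamps: list[float], threshold_s: float = 10.0) -> bool:
    """Hypothetical flag for runs that leak at least once every `threshold_s` seconds on average."""
    interval = mean_leak_interval(timestamps)
    return interval is not None and interval <= threshold_s


# Example: a location leak at 0, 3, 9, and 12 seconds into a run.
print(mean_leak_interval([0.0, 3.0, 9.0, 12.0]))  # → 4.0
```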
And here are the top ten third parties gathering the most data. Google is so far out in front in this table that it beats the rest of the top ten combined.
Noticeably missing from this table, but picked up in the analysis we saw yesterday, is Facebook (the #2 provider of ad tracking services after Google in the Lumen study).
Some third-party domains track both unique identifiers and other more personal information like location, email address, and gender, which allow the domain to link individuals and personal information (including locations of interest such as home, work, etc.) to tracking identifiers. In other words, even if a third party makes a link between unique ID and a sensitive piece of personal information once, it can tie this personal information to unique ID without collecting the former in the future. This is particularly problematic for user privacy, since it erodes their ability to control how they are monitored and allows cross-app tracking.
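The linkage problem is essentially a join on the unique identifier. In this minimal sketch, each flow a tracker receives is assumed to have been reduced to a hypothetical dictionary of fields (including which app sent it). Grouping by Ad ID is then enough to attach an email address seen once, in one app, to locations reported later by other apps. `link_by_id` and the sample flows are illustrative only.

```python
from collections import defaultdict


def link_by_id(flows: list[dict[str, str]],
               id_key: str = "ad_id") -> dict[str, dict[str, set[str]]]:
    """Group every field a tracker received under the unique ID it arrived with."""
    profiles: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for flow in flows:
        uid = flow.get(id_key)
        if uid is None:
            continue  # flows without the identifier cannot be joined in this sketch
        for field, value in flow.items():
            if field != id_key:
                profiles[uid][field].add(value)
    return profiles


flows = [
    {"ad_id": "u1", "app": "recipes", "email": "alice@example.com"},  # email seen once
    {"ad_id": "u1", "app": "weather", "location": "51.50,-0.12"},     # later: ID + location only
    {"ad_id": "u1", "app": "crafts", "location": "51.52,-0.10"},
]
profile = link_by_id(flows)["u1"]
print(sorted(profile["app"]), profile["email"])
# → ['crafts', 'recipes', 'weather'] {'alice@example.com'}
```

The later flows never contain the email address, yet the resulting profile ties it to every app and location, which is the cross-app tracking concern described above.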
There remains a substantial amount of traffic flowing over HTTP (i.e., unencrypted).
Apps adopt HTTPS extremely slowly: “for half of the domains, it takes over two years for only 10% of apps to adopt HTTPS; and five years for over 50% of apps.” (These times are measured from the moment HTTPS was first known to be available for a domain.)
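Figures like these can be read off per-domain data as “how long until a given fraction of apps used HTTPS”. A minimal sketch, assuming that for one domain we have the date HTTPS was first seen to be available and, for each app contacting that domain, the date it first used HTTPS (or `None` if it never did). The function name and dates are hypothetical.

```python
import math
from datetime import date, timedelta
from typing import Optional


def days_to_adoption(available: date, adopted: list[Optional[date]],
                     fraction: float) -> Optional[int]:
    """Days after HTTPS became available until `fraction` of apps had adopted it (None if never)."""
    n = len(adopted)
    # round() absorbs floating-point noise in fraction * n before taking the ceiling.
    needed = math.ceil(round(fraction * n, 9))
    if needed == 0:
        return 0
    delays = sorted((d - available).days for d in adopted if d is not None)
    if len(delays) < needed:
        return None  # the target fraction was never reached in the observed period
    return delays[needed - 1]


start = date(2012, 1, 1)
apps = [start + timedelta(days=100), start + timedelta(days=800),
        None, start + timedelta(days=1900)]
print(days_to_adoption(start, apps, 0.10))  # → 100
print(days_to_adoption(start, apps, 0.50))  # → 800
print(days_to_adoption(start, apps, 0.90))  # → None
```

Computing this per domain and then taking the median across domains would correspond to the “for half of the domains” phrasing in the quote.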
Tracking privacy risk
Using a privacy risk score (see section VI for details) the authors track how overall privacy risk has been changing over time:
…when it comes to leaking PII and contacting third parties, apps have gotten substantially worse over time.
You can explore the privacy risks of any app in the dataset for yourself online at https://recon.meddle.mobi/appversions/.
The last word
We found that the PII shared with other parties changes over time, with the following trends: (1) overall privacy trends to worsen across versions; (2) the types of gathered PII change across versions, limiting the generalizability of single-version studies; (3) HTTPS adoption is relatively slow for mobile apps; (4) third parties not only track users pervasively, but also gather sufficient information to know what apps a user interacts with, when they do so, and where they are located when they do.