50 ways to leak your data: an exploration of apps’ circumvention of the Android permissions system
Reardon et al., USENIX Security Symposium 2019
The problem is all inside your app, she said to me / The answer is easy if you take it logically / I’d like to help data in its struggle to be free / There must be fifty ways to leak their data.
You just slip it out the back, Jack / Make a new plan, Stan / You don’t need to be coy, Roy / Just get the data free.
Hop it on the bus, Gus / You don’t need to discuss much / Just drop off the key, Lee / And get the data free…
— Lyrics adapted from “50 ways to leave your lover” by Paul Simon (fabulous song btw., you should definitely check it out if you don’t already know it!).
This paper is a study of Android apps in the wild that leak permission-protected data (identifiers which can be used for tracking, and location information), where those apps should not have been able to see such data because they had not been granted the relevant permissions. By detecting such leakage and analysing the responsible apps, the authors uncover a number of covert and side channels in real-world use.
These deceptive practices allow developers to access users’ private data without consent, undermining user privacy and giving rise to both legal and ethical concerns.
A covert channel occurs when one application that does have the required permission to access data captures it and passes it on to another app that does not have such access, through some communication mechanism. A side channel is any mechanism that allows an app to directly obtain privileged information without having the privileges to do so (and without an accomplice!).
Side-channels are typically an unintentional consequence of a complicated system.
There are four main parts to the story here:
- The testing environment that runs apps in a ‘honeypot’ and figures out which of those apps are leaking data they don’t have permission to obtain.
- Reverse engineering of those apps to uncover the mechanisms they are using.
- Creation of fingerprints based on the above analysis, which can be used to surface other apps using the same techniques.
- Analysis of the results to figure out what mechanisms are being used in the wild, and how prevalent they are.
Before we dive in, I’d just like to note that the authors responsibly disclosed their findings to Google, who have announced that they will be fixing many of the identified issues in Android Q.
Google, to their credit, have announced that they are addressing many of the issues that we reported to them. However, these fixes will only be available to users able to upgrade to Android Q — those with the means to own a newer smartphone. This, of course, positions privacy as a luxury good…
Finding apps that leak data
Using a purpose-built Google Play scraper the team downloaded 252,864 versions of 88,113 different Android apps. Each app was then executed on a physical mobile phone equipped with a custom OS and network monitor. The apps are driven using Android’s Application Exerciser Monkey which injects a pseudo-random stream of simulated user input events into the app (a UI fuzzer).
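For readers who haven’t used Monkey, here is a minimal sketch of how one might launch it against a single package from a host machine via adb. The package name, event count, seed and throttle are hypothetical values; the paper’s actual Monkey settings aren’t given here.

```python
import subprocess
from typing import List, Optional


def monkey_command(package: str, events: int = 1000, seed: Optional[int] = None,
                   throttle_ms: int = 0) -> List[str]:
    """Build an `adb shell monkey` invocation restricted to one package."""
    cmd = ["adb", "shell", "monkey", "-p", package]  # -p: only send events to this package
    if seed is not None:
        cmd += ["-s", str(seed)]  # a fixed seed makes the pseudo-random event stream repeatable
    if throttle_ms > 0:
        cmd += ["--throttle", str(throttle_ms)]  # delay between events, in milliseconds
    cmd += ["-v", str(events)]  # -v: verbose output; the final argument is the event count
    return cmd


def run_monkey(package: str, **kwargs) -> subprocess.CompletedProcess:
    """Actually run it (requires adb on PATH and a connected device)."""
    return subprocess.run(monkey_command(package, **kwargs), capture_output=True, text=True)
```

Building the argument list separately from running it keeps the command inspectable and easy to log alongside the traffic captured for each run.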
The test mobile phones run an instrumented version of Android Marshmallow, on top of a custom Linux kernel which records all file I/O. Network traffic is also monitored, including all TLS-secured traffic where the developers hadn’t used certificate pinning (i.e., most apps).
Once the app has run in the rig for a while, the network traffic generated by the app is inspected to uncover leaked information. If information is found, this is compared to the permissions requested by the app. An app leaking data it didn’t have permission to see has presumably obtained that data by nefarious means. So far so good, but there’s a catch…
…identifying personal information inside network transmissions requires significant effort because apps and embedded third-party SDKs often use different encodings and obfuscation techniques to transmit data.
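Setting the decoding problem aside for a moment, the comparison step described above amounts to a set check: which data types turned up in the traffic for which the app held none of the permissions that would legitimately expose them? The sketch below is illustrative only. The IMEI and location entries follow Android’s documented permissions, the device-MAC entry follows the ACCESS_NETWORK_STATE permission mentioned later in this post, and the router-MAC entry is an assumption.

```python
from typing import Dict, FrozenSet, Set

# Illustrative mapping from leaked data type to permissions that would legitimately grant it.
# "router_mac" -> ACCESS_WIFI_STATE is an assumption, not taken from the paper.
REQUIRED: Dict[str, FrozenSet[str]] = {
    "imei": frozenset({"android.permission.READ_PHONE_STATE"}),
    "location": frozenset({"android.permission.ACCESS_FINE_LOCATION",
                           "android.permission.ACCESS_COARSE_LOCATION"}),
    "device_mac": frozenset({"android.permission.ACCESS_NETWORK_STATE"}),
    "router_mac": frozenset({"android.permission.ACCESS_WIFI_STATE"}),
}


def unexplained_leaks(leaked: Set[str], granted: Set[str]) -> Set[str]:
    """Data types seen in the app's traffic for which it holds none of the required permissions."""
    return {kind for kind in leaked if not (REQUIRED[kind] & granted)}
```

Anything this returns is a candidate for the covert- or side-channel analysis that follows.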
The following table shows the kinds of information the authors search for (as well as how many apps were found to leak it and by what channels).
The traffic reverse engineering includes decoding gzip, base64, and ASCII-encoded hexadecimal, then looking for personal information both in the clear and as MD5, SHA1, and SHA256 hashes. In terms of personal information, the authors focus explicitly on location data and on identifiers that can be used for tracking purposes: IMEI, network MAC address, and router MAC address.
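As a rough illustration of that search, the sketch below tries each of the three decodings on a payload, recurses a few levels so that nested encodings (say, base64 of gzip) are unwrapped, and looks for one known value in plain form or as an MD5/SHA1/SHA256 hex digest. It is a minimal sketch, not the authors’ pipeline: the depth limit, the case-insensitive matching, and the sample IMEI used in the tests are assumptions.

```python
import base64
import binascii
import gzip
import hashlib
import zlib
from typing import Dict, List, Set


def decode_once(blob: bytes) -> List[bytes]:
    """Try gzip, base64 and ASCII hex on `blob`; return whichever decodings succeed."""
    decoded = []
    try:
        decoded.append(gzip.decompress(blob))
    except (OSError, EOFError, zlib.error):
        pass
    try:
        decoded.append(base64.b64decode(blob, validate=True))  # reject non-alphabet bytes
    except binascii.Error:
        pass
    try:
        decoded.append(binascii.unhexlify(blob.strip()))
    except binascii.Error:
        pass
    return decoded


def hash_forms(value: str) -> Dict[str, bytes]:
    """The value itself plus its hex-encoded MD5, SHA1 and SHA256 digests."""
    raw = value.encode()
    forms = {"plain": raw}
    for algo in ("md5", "sha1", "sha256"):
        forms[algo] = hashlib.new(algo, raw).hexdigest().encode()
    return forms


def find_pii(payload: bytes, value: str, max_depth: int = 3) -> Set[str]:
    """Return which forms of `value` appear in `payload` or in its decodings."""
    needles = {name: form.lower() for name, form in hash_forms(value).items()}
    hits: Set[str] = set()
    seen: Set[bytes] = set()
    frontier = [payload]
    for _ in range(max_depth + 1):
        next_frontier = []
        for blob in frontier:
            if blob in seen:
                continue
            seen.add(blob)
            haystack = blob.lower()  # digests may be sent in upper- or lower-case hex
            hits.update(name for name, needle in needles.items() if needle in haystack)
            next_frontier.extend(decode_once(blob))
        frontier = next_frontier
    return hits
```

Trying every decoding on every blob is wasteful but simple; a real pipeline handling many values across many flows would want something smarter.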
Finding out how those apps leak data
Given network traffic known to contain personal information, the authors group the transmissions by the type of information and by the network destination it is sent to, and then reverse engineer one app from each group to find out which covert and side channels were used to gather the data.
Because many of the transmissions are caused by the same SDK code, we only needed to reverse engineer each unique circumvention technique: not every app, but instead for a much smaller number of unique SDKs. The destination endpoint for the network traffic typically identifies the SDK responsible.
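The grouping step might be sketched as below: bucket each observed leak by (data type, destination) and pick one app per bucket to reverse engineer. The `Leak` record, the package and host names in the test, and the rule of picking the alphabetically first app are all hypothetical; the paper doesn’t say how the representative was chosen.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Leak(NamedTuple):
    app: str          # package name of the leaking app
    data_type: str    # e.g. "imei", "router_mac"
    destination: str  # hostname the data was sent to (usually identifies the SDK)


Group = Tuple[str, str]


def group_leaks(leaks: List[Leak]) -> Dict[Group, List[str]]:
    """Bucket leaking apps by (data type, destination)."""
    groups: Dict[Group, List[str]] = defaultdict(list)
    for leak in leaks:
        groups[(leak.data_type, leak.destination)].append(leak.app)
    return dict(groups)


def pick_representatives(groups: Dict[Group, List[str]]) -> Dict[Group, str]:
    """One app per group to reverse engineer; taking the alphabetically first is arbitrary."""
    return {key: min(apps) for key, apps in groups.items()}
```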
Determining how widely the various techniques are used
The final step is to craft a unique fingerprint that identifies the presence of an exploit in an SDK (e.g. a string constant corresponding to a fixed encryption key), and then decompile all the apps in the corpus and search for those strings.
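Once the apps have been decompiled (by some decompiler; the tool isn’t named here), the fingerprint search reduces to a substring scan over the decompiled text. In this sketch the fingerprint is the Salmonads SD-card file name mentioned later in the post, used purely as a hypothetical example; we don’t know which strings the authors actually chose.

```python
from pathlib import Path
from typing import Dict, Set


def load_decompiled(root: Path) -> Dict[str, str]:
    """Read every file under `root` (the decompiler's output) as text, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): p.read_text(errors="ignore")
        for p in root.rglob("*")
        if p.is_file()
    }


def scan_for_fingerprints(sources: Dict[str, str],
                          fingerprints: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each fingerprint name to the set of files containing its string constant."""
    return {
        name: {path for path, text in sources.items() if needle in text}
        for name, needle in fingerprints.items()
    }
```

Keeping the scan separate from file loading means the same matcher works on an in-memory corpus or on a directory tree.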
50 ways to leak your data…
So finally, the bit you’ve been waiting for! What are all those apps doing?
One covert channel in use is very simple: writing to the SD card. An app with permission to see some data writes it to a well-known file name on the SD card, and other apps that don’t have the permission can read it from there. Salmonads, “a third-party developers’ assistant platform in Greater China”, uses this technique to write the IMEI to the file /sdcard/.googlex9/.xamdeco0962. Another app using the SDK, but without permission to obtain this identifier, can read it from there. According to Google Play metadata, apps using this channel to obtain the IMEI without permission have been downloaded at least 17.6 million times.
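The channel is simple enough to simulate on an ordinary file system. In the sketch below a temporary directory stands in for /sdcard, the relative path is the one reported for Salmonads, and the file content is assumed to be the plain IMEI (the real file format isn’t described here). The IMEI value is made up.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Path reported for Salmonads, relative to the SD card root.
SHARED_RELATIVE_PATH = Path(".googlex9") / ".xamdeco0962"


def publish_identifier(storage_root: Path, imei: str) -> None:
    """Run by an app that DOES hold the permission: stash the IMEI in the well-known file."""
    target = storage_root / SHARED_RELATIVE_PATH
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(imei)  # assumption: stored as plain text


def read_identifier(storage_root: Path) -> Optional[str]:
    """Run by an app WITHOUT the permission: simply read the file if it is there."""
    target = storage_root / SHARED_RELATIVE_PATH
    return target.read_text() if target.exists() else None


def demo() -> Tuple[Optional[str], Optional[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stands in for /sdcard
        before = read_identifier(root)
        publish_identifier(root, "356938035643809")  # made-up IMEI
        after = read_identifier(root)
    return before, after


if __name__ == "__main__":
    print(demo())
```

Nothing in the reader touches a permission-protected API; it only needs to know where to look.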
Baidu’s Maps SDK uses the same technique to leak the IMEI. Apps found to include this data-leaking SDK include Disney’s Hong Kong and Shanghai theme park apps, and the Samsung Health and Browser apps. Apps identified as containing this SDK have a lower bound of 2.6 billion installations! Many of these apps do have permission to read the IMEI, and they handily write it out for the ~700M installations of apps using the SDK that don’t.
Unity (the cross-platform game engine) uses a UNIX ioctl call to obtain the MAC address of the WiFi network interface. 748 apps were found to be gathering and sending the MAC address in this way without holding the ACCESS_NETWORK_STATE permission.
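To show what such a call looks like, here is the same kind of request made from Python on Linux. This illustrates the mechanism, not Unity’s actual (native) code. SIOCGIFHWADDR is one standard Linux ioctl request that returns an interface’s hardware address; we are assuming, not asserting, that it is the request Unity issues, and the interface name `wlan0` is just an example.

```python
import fcntl  # Unix-only
import socket
import struct

SIOCGIFHWADDR = 0x8927  # Linux ioctl request: get an interface's hardware (MAC) address


def mac_from_ifreq(ifreq: bytes) -> str:
    """Extract the MAC from a struct ifreq buffer.

    Layout: 16-byte interface name, then a struct sockaddr whose first 2 bytes are the
    address family; the 6 MAC bytes follow at offsets 18..23.
    """
    return ":".join(f"{b:02x}" for b in ifreq[18:24])


def mac_via_ioctl(ifname: str = "wlan0") -> str:
    """Ask the kernel for `ifname`'s MAC address (interface name is an example)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        request = struct.pack("256s", ifname.encode()[:15])  # name field, NUL-padded
        reply = fcntl.ioctl(s.fileno(), SIOCGIFHWADDR, request)
    return mac_from_ifreq(reply)
```

The point is that this path goes straight to the kernel through an ordinary socket, bypassing the permission-checked Java API entirely.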
Other apps obtain the router MAC address either by reading the ARP cache (opening the file /proc/net/arp and reading its contents) or by simply requesting igd.xml (the Internet gateway device configuration file) directly from the router itself.
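Reading the ARP cache needs nothing more than parsing a text file. A minimal parser for the Linux /proc/net/arp layout might look like this; it keeps only completed entries (flag 0x2, ATF_COM). The sample table and its addresses are made up, and working out which entry is the router (typically the default gateway’s IP) is left out.

```python
from typing import List, NamedTuple

ATF_COM = 0x2  # Linux ARP flag: entry is complete (hardware address resolved)


class ArpEntry(NamedTuple):
    ip: str
    mac: str
    device: str


def parse_proc_net_arp(text: str) -> List[ArpEntry]:
    """Parse the contents of /proc/net/arp into completed (ip, mac, device) entries."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        if int(flags, 16) & ATF_COM:
            entries.append(ArpEntry(ip, mac.lower(), device))
    return entries


def read_arp_cache(path: str = "/proc/net/arp") -> List[ArpEntry]:
    with open(path) as f:
        return parse_proc_net_arp(f.read())


# Made-up sample in the /proc/net/arp layout.
SAMPLE_ARP = """IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         AA:BB:CC:DD:EE:FF     *        wlan0
192.168.1.23     0x1         0x0         00:00:00:00:00:00     *        wlan0
"""
```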
The OpenX SDK was found to be using the ARP cache side channel, and the code makes it clear that its developers knew what they were doing. No forgiveness here:
…a close analysis of the code indicated that it would first try to get the data legitimately using the permission-protected Android API; this vulnerability is only used after the app has been explicitly denied access to this data.
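The control flow described in that quote can be paraphrased in a few lines. This is a sketch of the described behaviour, not OpenX’s code: `PermissionDenied` is a hypothetical stand-in for the platform refusing the call, and the two callables stand for the permission-protected API and a side channel such as the ARP cache read above.

```python
from typing import Callable, Optional


class PermissionDenied(Exception):
    """Hypothetical stand-in for the platform refusing a permission-protected call."""


def router_mac_with_fallback(official_api: Callable[[], str],
                             side_channel: Callable[[], Optional[str]]) -> Optional[str]:
    """Ask the permission-protected API first; only if that is refused, use the side channel."""
    try:
        return official_api()
    except PermissionDenied:
        # Reached only after the user has explicitly denied access.
        return side_channel()
```

The ordering is what makes the intent unambiguous: the side channel is reserved for exactly the case where the user said no.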
One new side channel was found for leaking geolocation data: EXIF metadata in images. In fact, once you give an app access to your photos, it can learn a lot about your past locations too…
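Turning EXIF GPS tags into coordinates takes only a little arithmetic. GPSLatitude and GPSLongitude hold degrees, minutes and seconds as three rationals, and GPSLatitudeRef/GPSLongitudeRef give the hemisphere. The sketch below assumes those values have already been pulled out of the image by some EXIF reader (not shown).

```python
from fractions import Fraction
from typing import Sequence, Union

Rational = Union[int, Fraction]


def dms_to_decimal(dms: Sequence[Rational], ref: str) -> float:
    """Convert an EXIF (degrees, minutes, seconds) triple plus hemisphere ref to decimal degrees."""
    degrees, minutes, seconds = (Fraction(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative.
    return float(-value if ref.upper() in ("S", "W") else value)
```

For example, `dms_to_decimal((51, 30, 0), "N")` gives 51.5, and `dms_to_decimal((0, 7, 30), "W")` gives -0.125. Repeat that over a photo library and you have a location history.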
The behaviors that we document in this paper constitute clear privacy violations. From a legal and policy perspective, these practices are likely to be considered deceptive or otherwise unlawful.