Not that CSI. CSI in this case stands for channel state information, which represents the state of a wireless channel in a signal transmission process.
WindTalker is motivated from the observation that keystrokes on mobile devices will lead to different hand coverage and finger motions, which will introduce a unique interference to the multi-path signals and can be reflected by the channel state information (CSI).
By setting up a rogue access point, determining the point in time when a user is entering a PIN (in the demonstrated attack, for Alipay, the largest mobile payments company in the world), and observing the fluctuations in the wifi signal, it’s possible to recover the PIN. Particularly with side-channel attacks, I usually feel a mix of “oh wow, you can do that, that’s really ingenious…” coupled with a sense of despair at just how insecure everything really is in the presence of skilled attackers. Today’s paper, as with yesterday’s, is no exception.
WindTalker is the latest in a long line of keystroke / PIN inference methods. Owusu et al. demonstrated accelerometer-based keystroke inference that recovers six-character passwords on smartphones. Liu et al. applied a similar idea, using a smart watch to track hand movements over a keyboard, and achieved 65% recognition accuracy. Zhu et al. showed that smartphone microphones can be used to record the unique acoustic sound of keystrokes, and Liu et al. exploited smartphone audio hardware to recover 94% of keystrokes. Yue et al. demonstrated Google Glass / webcam based keystroke inference with a success rate over 90%. Shukla et al.’s video-based attack breaks over 50% of PINs. WiKey uses CSI waveform patterns to distinguish keystrokes on an external keyboard. WiPass detects graphical unlock passwords. (See §8.2 for references.) But WindTalker is particularly effective because it doesn’t require any access to the victim’s phone, works with regular mobile phones, and piggy-backs on an existing wifi connection.
The basic outline of the attack is as follows:
- Set up a public wifi hotspot in a place where a target is likely to remain relatively stationary and within ~1.5m of the hotspot, for example by targeting a particular table in a cafe.
- When the target connects, monitor their wifi traffic. This is (hopefully!) transmitted over https, but the meta data is not encrypted, and that’s enough for WindTalker to work from. What WindTalker needs to do is determine the point in time that the target is about to enter a PIN, for example, when completing an Alipay mobile payment. It turns out an IP address is sufficient information:
To determine the sensitive input windows, WindTalker runs in a real-time fashion to collect the meta data (e.g., IP address) of the targeted sensitive mobile payment applications (e.g., Alipay). For example, in the experiment, Alipay applications will always route their data to the server of some specific IP address such as “110.75.xx.xx”. This IP address will be kept to be relatively stable for one or two weeks. With the traffic meta data, WindTalker obtains the rough start time and end point of the Sensitive Input Window via searching packets whose destination is “110.75.xx.xx”. Then WindTalker begins to analyze the corresponding CSI data in that period of time…
- During the sensitive input window WindTalker sends ICMP Echo requests to the target’s smartphone (about 800 packets a second). The phone replies to each with an Echo Reply. For a 98 byte ICMP packet, at 800 packets/second the bandwidth consumption is only 78.4 KB/s (98 bytes × 800 = 78,400 bytes per second), which will not noticeably degrade the wifi experience of the target.
- At the end of the input window, analyse the data, and recover the PIN!
I guess that last part warrants a little further explanation…
First we need to clean up some of the noise in the data. This is achieved using a low-pass filter to remove high-frequency noise, since the variations in CSI waveforms caused by finger motion lie at the low end of the spectrum. The next step is dimension reduction using PCA (Principal Component Analysis). PCA is a well-established algorithm for finding the most significant / influential components in the CSI time series.
With PCA, we can identify the most representative components influenced by the victim’s hand and finger’s movement and remove the noisy components at the same time. In our experiment, it is observed the first k = 4 components show the most significant changes in CSI streams and the rest of the components are noise.
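To make the two clean-up steps concrete, here is a minimal sketch on synthetic data. It is not the authors' code: the moving-average filter stands in for whatever low-pass filter the paper uses, the 30 subcarriers and the 1.5 Hz "finger motion" signal are assumptions made up for illustration, and the function names `moving_average` and `top_components` are hypothetical.

```python
import numpy as np

def moving_average(x, width):
    """Crude low-pass filter: smooth each column with a sliding mean."""
    kernel = np.ones(width) / width
    return np.column_stack(
        [np.convolve(col, kernel, mode="same") for col in x.T]
    )

def top_components(csi, k=4):
    """Project a (samples x subcarriers) matrix onto its first k principal components."""
    centered = csi - csi.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return centered @ vt[:k].T, explained[:k]

# Synthetic stand-in for CSI: one slow component shared by every
# subcarrier (assumed), plus independent high-frequency noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 5, 4000)                # roughly 5 s at 800 samples/s
motion = np.sin(2 * np.pi * 1.5 * t)
weights = rng.uniform(0.5, 1.5, size=30)   # 30 subcarriers is an assumption
csi = np.outer(motion, weights) + 0.3 * rng.standard_normal((4000, 30))

smoothed = moving_average(csi, width=41)
components, explained = top_components(smoothed, k=4)
print(components.shape, explained.round(3))
```

On this toy input nearly all of the variance lands in the first component, because only one underlying motion was simulated. Real CSI traces are messier, which is presumably why the paper keeps the first four components.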
Now that we have a cleaned up CSI waveform, it’s time to start identifying the individual finger touches within it. This is done with a three-step process:
The starting point is shown in (a) above. Another (Butterworth) filter is applied with a 10Hz cutoff to smooth the waveform (b). The CSI waveforms for individual keystrokes now need to be extracted (they’re easy to spot by eye, indicating this should not be too difficult!):
It is observed that the CSI segments during the keystroke period show a much larger variance than those happening out of the period (d). So segments are extracted where variance is greater than a pre-determined threshold (c).
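A variance-threshold extraction of this kind can be sketched in a few lines. The window size, the threshold, and the name `extract_segments` are illustrative assumptions, not values from the paper, and the input here is a toy signal with two artificial bursts.

```python
from statistics import pvariance

def extract_segments(signal, window=5, threshold=0.05):
    """Return inclusive (first, last) sample indices of high-variance runs.

    A sliding window is 'active' when its variance exceeds the threshold;
    consecutive active windows are merged into one segment.
    """
    segments = []
    start = None
    last_window = len(signal) - window
    for i in range(last_window + 1):
        active = pvariance(signal[i:i + window]) > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            # The previous window (i - 1) was the last active one.
            segments.append((start, i - 1 + window - 1))
            start = None
    if start is not None:
        segments.append((start, len(signal) - 1))
    return segments

# Toy waveform: flat, burst, flat, burst, flat.
burst = [1.0 if j % 2 == 0 else -1.0 for j in range(10)]
signal = [0.0] * 20 + burst + [0.0] * 20 + burst + [0.0] * 20

segments = extract_segments(signal)
print(segments)  # [(16, 33), (46, 63)]
```

Each reported segment is padded by up to `window - 1` samples on either side of the burst, since any window that overlaps the burst counts as active.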
Given a keystroke segment, the final stage is to work out what keystroke it represents.
As shown in Fig.7 (below), it is observed that different keystrokes will lead to different waveforms, which motivates us to choose waveform shape as the feature for keystroke classification. To compare the waveforms of different keystrokes, we adopt Dynamic Time Warping (DTW) to measure the similarity between the CSI time series of two keystrokes. However, directly using the keystroke waveforms as the classification features leads to high computational costs in the classification process since waveforms contain many data points for each keystroke. Therefore, we leverage Discrete Wavelet Transformation to compress the length of the CSI waveform.
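For readers who have not met these two techniques, the following is a small self-contained sketch. It uses the Haar wavelet because it is the simplest one to write down, which is an assumption on my part and may not be the wavelet the authors chose. The DTW is the textbook dynamic-programming version with absolute difference as the local cost.

```python
def haar_approximation(signal, levels=1):
    """Keep only the Haar approximation coefficients.

    Each level halves the length, which is what makes the later
    DTW comparison cheaper.
    """
    out = list(signal)
    for _ in range(levels):
        if len(out) % 2:
            out.append(out[-1])  # pad odd lengths by repeating the last sample
        out = [(out[i] + out[i + 1]) / 2 ** 0.5 for i in range(0, len(out), 2)]
    return out

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]

# The same shape played at half speed is still a perfect match under DTW.
fast = [1, 2, 3]
slow = [1, 1, 2, 2, 3, 3]
print(dtw_distance(fast, slow))                          # 0.0
print(len(haar_approximation(list(range(16)), levels=2)))  # 4
```

The cost of DTW grows with the product of the two lengths, so two levels of compression on both waveforms cut the work by roughly a factor of sixteen.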
Finally, a classifier uses the DTW distances between the input waveform and the key number waveforms in a reference dataset. The classifier chooses the key number with the minimum score as the predicted key number, but all scores are saved in order to generate password candidates.
Where do those reference key number waveforms come from? Right now, WindTalker needs a per-user pre-training session. With enough data from enough users, perhaps that won’t be required in the future. It seems a bit much to ask a target to pre-train on your system so that you can nab their password! But the pre-training required is minimal: with just one sample per keystroke, the recovery rate is as high as 68.3%.
In practice, the attackers have more choices to achieve the user specific training. For example, they can simply offer the user free WiFi access and, as the return, the victim should finish the online training by clicking the designated numbers. It can also mimic a Text Captchas to require the victim to input the chosen numbers. We further analyze the impact of the number of training data on recovery rate in WindTalker. Table.2 shows the recovery rate increases with the training loop increases. Even if there is only one training sample for a keystroke, WindTalker can still achieve whole recovery rate of 68.3%.
The paper concludes with a case study of cracking Alipay PINs with volunteer users. Alipay PINs are six digits long, so the system knows to look for segments inside a sensitive input window with six keystrokes:
After the keystroke extraction and recognition process, WindTalker lists possible password candidates. In the example above, the top three candidates suggested by WindTalker were 773919, 773619, and 773916. The actual password was indeed 773919. Especially if an application allows multiple attempts at entering a passcode, there’s a good chance of getting the right one.
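One plausible way to go from saved per-keystroke scores to a ranked candidate list is sketched below. The write-up above does not spell out how candidates are scored, so summing the distances is an assumption, the function names are hypothetical, and the distances are made-up numbers for a two-key example rather than data from the paper.

```python
from itertools import product

def classify(scores):
    """Pick the key whose reference waveform has the smallest DTW distance."""
    return min(scores, key=scores.get)

def rank_candidates(per_key_scores, keep=3, top_n=3):
    """Rank PIN candidates by total distance.

    Only the `keep` best keys per keystroke are combined, which keeps
    the search small (keep ** number_of_keystrokes combinations).
    """
    shortlists = [
        sorted(scores.items(), key=lambda kv: kv[1])[:keep]
        for scores in per_key_scores
    ]
    ranked = sorted(
        product(*shortlists),
        key=lambda combo: sum(distance for _, distance in combo),
    )
    return ["".join(key for key, _ in combo) for combo in ranked[:top_n]]

# Made-up distances for two keystrokes (lower means more similar).
per_key_scores = [
    {"1": 0.2, "4": 0.9, "7": 1.5},
    {"3": 0.4, "6": 0.5, "9": 1.1},
]

print(classify(per_key_scores[0]))                 # 1
print(rank_candidates(per_key_scores, keep=2))     # ['13', '16', '43']
```

For a six-digit PIN with `keep=3` this examines 729 combinations, so trying the top handful of candidates against an application that allows retries is cheap.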