
FM Backscatter: Enabling connected cities and smart fabrics

April 28, 2017

FM Backscatter: Enabling connected cities and smart fabrics Wang et al., NSDI’17

If we want to connect all the things, then we need a means of sending and/or receiving information at each thing. These transmissions require power, and no-one wants to have to plug in chargers or keep swapping batteries for endless everyday objects. So where will this power come from? One promising approach uses ambient backscattering. Ambient backscatter devices pick up existing transmissions (e.g., TV or Wi-Fi signals) and convert them to tiny amounts of electricity; this power is then used to modify and reflect the signal with encoded data.

This paper enables connectivity on everyday objects by transforming them into FM radio stations. To do this, we show for the first time that ambient FM radio signals can be used as a signal source for backscatter communication.

It turns out that, for example, all those talk show radio programmes really are a powerful medium. In the literal sense.

The prototype system built by the authors achieved data rates of up to 3.2 kbps, and ranges of 5-60 feet, while consuming as little as 11.07 μW of power.

Requirements for communicating everyday objects

The goal is to enable communication from everyday objects in outdoor scenarios (for example, from bus stop posters, street signs, smart clothing, and so on) to smart phones and cars. Wide outdoor deployment of RFID readers would be expensive, Wi-Fi backscatter is only really useful indoors, and TV signal receivers are not included in smartphones and most cars.

What about Bluetooth Low Energy (BLE)? In broadcast mode, BLE can send short packets every 100 ms, and therefore won’t work for streaming audio. In addition, the Bluetooth antenna in cars is positioned inside to interact with mobile phones and other devices. The car body itself would shield such an antenna from any outside signals.

What we really want is a solution that:

  • Uses ambient signals that are already ubiquitous in outdoor environments
  • Can be easily received by cars and smartphones using existing hardware
  • Is legal to backscatter in the desired frequencies without a license (which rules out cellular transmissions)
  • Enables decoding of the raw incoming signal without any additional hardware.

The suitability of FM radio

Enter FM radio!

  • Broadcast FM radio infrastructure exists in cities around the world
  • FM radio towers transmit at high power (several hundred kilowatts), providing an ambient signal source suitable for backscattering
  • FM radio receivers are included in the LTE and Wi-Fi chipsets of almost every smartphone (and of course, are in cars)
  • In the US, low-power transmitters can operate on FM bands without requiring a license
  • FM radios provide access to the raw audio decoded by the receiver, which can be used to extract the backscattered data

Each FM station transmits a carrier at a centre frequency f_c in one of these FM bands, and information at each time instant is encoded by a deviation from f_c.

Backscattering FM radio

The authors describe three mechanisms for backscattering FM radio transmissions: overlay backscatter, stereo backscatter, and cooperative backscatter.

Backscattering is a multiplicative operation in the RF domain. Say a signal source is transmitting a tone signal \cos (2\pi f_c t) at a centre frequency f_c. For a backscatter signal with baseband B(t), the backscatter device generates the signal B(t) \cos (2\pi f_c t).

If we pick B(t) such that it is centred at f_{back} and uses audio signal FM_{back}(t), then an FM radio tuned to the frequency f_c + f_{back} will output the audio signal FM_{audio}(t) + FM_{back}(t).
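The step from multiplication to addition follows from the product-to-sum identity. As a sketch, write the ambient FM signal as a carrier with time-varying phase \phi_{audio}(t) and the backscatter baseband as a cosine at f_{back} with phase \phi_{back}(t). Both phase symbols are notation introduced here for illustration, not taken from the paper.

```latex
\begin{aligned}
\cos\bigl(2\pi f_c t + \phi_{audio}(t)\bigr)\,\cos\bigl(2\pi f_{back} t + \phi_{back}(t)\bigr)
  &= \tfrac{1}{2}\cos\bigl(2\pi (f_c + f_{back}) t + \phi_{audio}(t) + \phi_{back}(t)\bigr) \\
  &\quad + \tfrac{1}{2}\cos\bigl(2\pi (f_c - f_{back}) t + \phi_{audio}(t) - \phi_{back}(t)\bigr)
\end{aligned}
```

An FM receiver outputs a signal proportional to the rate of change of the phase, so a receiver tuned to f_c + f_{back} sees the sum of the two phases and therefore outputs the sum of the two audio signals.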

Thus, by picking the appropriate backscatter signal we can use the multiplicative nature of backscatter in the RF domain to produce an additive operation in the received audio signals. We call this overlay backscatter because the backscattered information is overlaid on top of existing signals and has the same structure as the underlying data.

f_{back} is chosen so that f_c + f_{back} lies at the centre of an unoccupied FM channel. There are plenty of such channels available, as the following indicates:

To send audio (e.g., a music sample for a band advertised on a poster) the desired audio can be overlaid directly. To send data, audio signals are generated using modulation techniques. The desired cosine signals are approximated with a square wave alternating between +1 and -1.

These two discrete values can be created on the backscatter device by modulating the radar cross-section of an antenna to either reflect or absorb the ambient signal at the backscatter device. By changing the frequency of the resulting square wave, we can approximate a cosine signal with the desired time-varying frequencies.
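To see why a two-level signal is a workable stand-in for a cosine, here is a minimal numerical sketch. It builds a ±1 square wave and measures how strongly it correlates with a cosine of the same frequency. The function names and the sample rate are assumptions made for illustration; nothing here is taken from the paper's implementation.

```python
import math

def square_wave(freq_hz, t):
    """Return +1 or -1 depending on the sign of a cosine at freq_hz."""
    return 1.0 if math.cos(2 * math.pi * freq_hz * t) >= 0 else -1.0

def fundamental_amplitude(freq_hz, sample_rate_hz, n_cycles=50):
    """Estimate the amplitude of the cosine component at freq_hz
    contained in the square wave, by correlating over whole cycles."""
    n = int(round(n_cycles * sample_rate_hz / freq_hz))
    acc = 0.0
    for k in range(n):
        t = k / sample_rate_hz
        acc += square_wave(freq_hz, t) * math.cos(2 * math.pi * freq_hz * t)
    return 2.0 * acc / n

if __name__ == "__main__":
    amp = fundamental_amplitude(1_000.0, 1_000_000.0)
    print(f"measured {amp:.4f}, theory 4/pi = {4 / math.pi:.4f}")
```

The square wave carries a cosine component at the wanted frequency with amplitude 4/π. The rest of its energy sits at odd harmonics, well away from the FM channel the receiver is tuned to.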

If an FM radio station is broadcasting in mono, then audio or data can be backscattered in the stereo stream without interference from the audio signals in the ambient FM transmissions. To tell the receiver to pick up the stereo stream, the backscattering device also needs to generate the 19 kHz pilot signal. This is the stereo backscatter approach. There’s a variation that works well for talk shows too:

While many FM stations transmit in the stereo mode, i.e. with the 19 kHz pilot tone, in the case of news and talk radio stations, the energy in the stereo stream is often low. This is because the same human speech signal is played on both the left and right speakers… based on this observation, we can backscatter data/audio in the stereo stream with significantly less interference from the underlying FM signals.

Cooperative backscattering can give much lower error rates, but requires the cooperation of two receiving devices to create a MIMO system. One device is tuned to the original band of the FM signal (f_c) and the other is tuned to f_c + f_{back}. The joint information can be used to cancel the ambient FM signal and decode the backscattered signal. See section 3.3 in the paper for further details.

Data encoding

Data is encoded using frequency-shift keying (FSK). Two schemes are described, a low rate scheme (100 bps) for low signal-to-noise ratio (SNR) environments, and a higher rate (1.6 or 3.2 kbps) scheme for high SNR environments.

For the 100 bps transmission, a binary FSK scheme (2-FSK) is used where zero and one bits are represented by the frequencies 8 and 12 kHz respectively. For high bit rates, a combination of 4-FSK and frequency division multiplexing is used: sixteen frequencies between 800 Hz and 12.8 kHz are grouped into four consecutive sets; within each set, 4-FSK is used to transmit two bits. Experiments showed that the bit-error rate (BER) performance degraded significantly above 3.2 kbps, so this is the maximum achievable rate for the target applications.
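The low-rate scheme is simple enough to sketch end to end. The code below modulates bits as 8 kHz and 12 kHz tones at 100 bps and decodes them by comparing the energy at the two frequencies in each bit window. The 48 kHz sample rate, the correlation-based detector, and all function names are assumptions for illustration, not the authors' decoder. The helper at the end does the arithmetic for the high-rate scheme: four sets carrying two bits each give eight bits per symbol.

```python
import math

SAMPLE_RATE_HZ = 48_000   # assumed audio sample rate
BIT_RATE_BPS = 100        # low-rate scheme from the text
FREQ_ZERO_HZ = 8_000      # tone for a 0 bit
FREQ_ONE_HZ = 12_000      # tone for a 1 bit
SAMPLES_PER_BIT = SAMPLE_RATE_HZ // BIT_RATE_BPS

def modulate(bits):
    """Turn a list of 0/1 bits into audio samples, one tone per bit."""
    samples = []
    for bit in bits:
        freq = FREQ_ONE_HZ if bit else FREQ_ZERO_HZ
        for n in range(SAMPLES_PER_BIT):
            samples.append(math.cos(2 * math.pi * freq * n / SAMPLE_RATE_HZ))
    return samples

def tone_power(window, freq_hz):
    """Energy of the window at freq_hz (in-phase and quadrature correlation)."""
    i = sum(s * math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
            for n, s in enumerate(window))
    q = sum(s * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
            for n, s in enumerate(window))
    return i * i + q * q

def demodulate(samples):
    """Recover bits by picking the stronger of the two tones per window."""
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        window = samples[start:start + SAMPLES_PER_BIT]
        one = tone_power(window, FREQ_ONE_HZ)
        zero = tone_power(window, FREQ_ZERO_HZ)
        bits.append(1 if one > zero else 0)
    return bits

def symbols_per_second(bit_rate_bps, groups=4, bits_per_group=2):
    """Symbol rate implied by the high-rate scheme: 4 sets x 2 bits per symbol."""
    return bit_rate_bps / (groups * bits_per_group)

if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1, 0]
    print(demodulate(modulate(message)) == message)
    print(symbols_per_second(1600), symbols_per_second(3200))
```

On these assumptions, 1.6 kbps corresponds to 200 symbols per second and 3.2 kbps to 400 symbols per second.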

Implementation and evaluation

The FM backscatter design is implemented in an integrated circuit with a TSMC 65 nm LP CMOS process (see section 4 of the paper for detail). A Moto G1 smartphone with Sennheiser MM30i headphones as its antenna is used for smartphone receiving, and a 2010 Honda CRV is used for car testing.

For the smartphone, the following plot shows how SNR changes as a function of the distance between the backscatter device and the FM receiver at five different power levels. At a -50 dBm power level, the power in the backscattered signal is still reasonably high at close distances.

At 100 bps, the BER is nearly zero at distances up to 6 feet across all power levels, and for power levels greater than -60 dBm, range increases to over 12 feet. At 1.6 and 3.2 kbps, range is reduced, but BERs are still low up to 3 and 6 feet away at -60 and -50 dBm respectively.

For the car receiver, the system can work well up to about 60 ft.

We leverage the low power backscatter techniques described in this paper to show that posters can broadcast information directly to cars and smartphones in outdoor environments. To evaluate this, we design two poster form factor antennas…

The results showed that data could be decoded at 100 bps at distances of up to ten feet with a smartphone. An overlaid snippet of music from the band advertised on the poster could be decoded at a distance of up to 4 ft. The car could detect the same signal at 10 ft.

The team also designed a smart shirt by machine sewing an antenna into it:

We perform this experiment in an outdoor environment in which the prototype antenna receives ambient radio signals at a level of -35 dBm to -40 dBm. Fig. 17b (above) compares the bit error rate when the user is standing still, running (2.2 m/s) or walking (1 m/s). The plot shows that at a bit rate of 1.6 kbps while using MRC, the BER was roughly 0.02 while standing and increases with motion. However at a lower bit rate of 100 bps, the BER was less than 0.005 even when the user was running. This demonstrates that FM radio backscatter can be used for smart fabric applications.

ViewMap: Sharing private in-vehicle dashcam videos

April 27, 2017

ViewMap: Sharing private in-vehicle dashcam videos Kim et al., NSDI’17

In the world of sensor-laden connected cars that we’re rushing towards, ViewMap addresses an interesting question: how can we use the information collected by those cars for common good, without significant invasion of privacy? It raises deeper questions too about the limits of state surveillance and where we think they should be. In short, ViewMap assumes cameras in cars (e.g., DashCams) that are continuously recording their environment. Anonymous fingerprints of short video segments are uploaded to a government-controlled service which can then request video footage from the scene of a traffic accident or crime. The challenge in all of this is preserving privacy.

Motivating examples and privacy concerns

Dashcams are becoming increasingly popular in many parts of Asia and Europe (over 60% adoption in South Korea, for example). For countries such as South Korea, Russia, and China, the use of dashcams is an integral part of the driving experience. Other countries, such as Austria and Switzerland, strongly discourage dashcams due to visual privacy concerns.

Where they exist, dashcam videos are often useful to accident investigations. Vehicles directly involved in the accident may have their own videos, but nearby vehicles may have wider views offering different angles.

Authorities such as police want to exploit the potential of dashcams because their videos, if collected, can greatly assist in the accumulation of evidence, providing a complete picture of what happened in incidents.

How do you find those videos though, and persuade the drivers of the vehicles in question to share them?

At least in the case of a traffic accident, other drivers should be aware that they’ve witnessed an accident and come forward. For general crimes though, the situation may be different:

While CCTV cameras are installed in public places, there exist a countless number of blind spots. Dashcams are ideal complements to CCTV since pervasive deployment is possible (cars are everywhere). However, the difficulty here is that users are not often aware whether or not they have such video evidence.

It’s easy to see the privacy risk when providing personal location and time-sensitive information – this could easily be used to track individuals. In addition to potential privacy leakage for participating drivers, there is also a privacy risk for other drivers and individuals that are captured in the recorded video. This latter information is of course exactly what the authorities are interested in when investigating incidents.

ViewMap system design

The ViewMap system places a Raspberry Pi powered dashcam in each vehicle, connected via Bluetooth to a Galaxy S5 phone (for uploading information). A dedicated short-range communications (DSRC) radio is placed in the back of the car for vehicle-to-vehicle communication.

Number plates in the captured video are blurred in realtime before being stored. The system does not currently do any face detection and blurring. Every minute of recorded video is represented by a data structure called a View Profile (VP) that summarises time and location/trajectory, a video fingerprint, and a Bloom filter of the video fingerprints of video taken by neighbouring vehicles (exchanged over the DSRC links). Vehicles anonymously upload their View Profiles using a Tor client on the phone. The anonymised, self-contained VPs are stored in a central VP database by authorities. The dashcam system itself keeps a rolling window (typically 2-3 weeks’ worth of driving time) of video footage associated with these VPs. Original video footage will not be available after this time.

Video fingerprinting

Dashcams are time-synchronized using GPS and record new videos every minute on the minute. Every second, each vehicle produces and broadcasts a View Digest (VD) of the video segment it is currently recording.

The view digest at time i is a concatenation of:

  • Current time
  • Current location
  • The byte-size of the video so far for this view profile window
  • The initial location for the view profile
  • The current view profile identifier
  • A hash of the view digest from time i-1 together with the last one second of video data
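The last field is what ties each digest to both the video and the digest before it. A minimal sketch of such a cascading hash is shown below. To keep it short, it chains only the previous hash with each one-second chunk of video, rather than the full previous digest, and it assumes SHA-256; the function names are hypothetical and the paper may use a different hash or serialisation.

```python
import hashlib

def next_digest_hash(prev_hash: bytes, video_second: bytes) -> bytes:
    """Hash the previous link together with the latest second of video."""
    return hashlib.sha256(prev_hash + video_second).digest()

def chain_hashes(video_seconds, seed=b""):
    """Return one hash per second of video, each depending on all earlier ones."""
    hashes = []
    prev = seed
    for chunk in video_seconds:
        prev = next_digest_hash(prev, chunk)
        hashes.append(prev)
    return hashes

def verify_video(video_seconds, claimed_hashes, seed=b""):
    """Check that a submitted video reproduces the hashes that were uploaded."""
    return chain_hashes(video_seconds, seed) == claimed_hashes

if __name__ == "__main__":
    chunks = [b"second-0", b"second-1", b"second-2"]
    uploaded = chain_hashes(chunks)
    print(verify_video(chunks, uploaded))
    print(verify_video([b"second-0", b"tampered", b"second-2"], uploaded))
```

Because every hash depends on all the video before it, changing any second of footage changes every later hash in the minute.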

When a vehicle receives a view digest broadcast it verifies that the time and location are within acceptable ranges before storing it. At most two view digests are kept per neighbour per view profile identifier – the first and the last received in the window.

Once the current minute of recording is complete, each vehicle A generates its View Profile, VP, for that minute. The VP contains the sixty View Digests for the minute, plus a Bloom filter of the first and last View Digests received from each neighbouring vehicle as above.
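A Bloom filter lets a VP record which neighbour digests it heard without listing them. The sketch below is a generic Bloom filter, assuming 2048 bits and four SHA-256-derived hash positions; these sizes and the class name are illustrative choices, not values from the paper.

```python
import hashlib

class BloomFilter:
    """Fixed-size set summary: no false negatives, rare false positives."""

    def __init__(self, num_bits=2048, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive several bit positions by salting the hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

if __name__ == "__main__":
    neighbours = BloomFilter()
    neighbours.add(b"first-digest-from-B")
    neighbours.add(b"last-digest-from-B")
    print(b"first-digest-from-B" in neighbours)
    print(b"digest-from-stranger" in neighbours)
```

A membership query can return a false positive but never a false negative, which is the property a later check of "did these two VPs really hear each other?" relies on.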

Protecting location privacy

As described so far, it would be possible to follow a user’s path by linking a series of VPs adjacent in space and time. To protect against this, each vehicle also uploads a number of guard (fake) VPs.

Suppose that a vehicle V accepts digests from m other vehicles during its VP window. For a configurable percentage α of these (the study uses α = 0.5), V creates guard VPs with a trajectory starting at the other vehicle’s initial location (as recorded in the view digest), and ending at its own (V’s) location. This can be seen visually in the following figure.

There are readily available on/offline tools that instantly return a driving route between two points on a road map. In this work, we use the Google Directions API for this purpose. In an effort to make guard VPs indistinguishable from actual VPs, we arrange their VDs variably spaced (within the predefined margin) along the given route.

Notice therefore that privacy depends on sufficiently wide adoption of the system. Using a simulation with n = 50 to 200 vehicles travelling in a 4 km² area, the authors studied the uncertainty in a vehicle’s position over time, measured as location entropy in bits (i.e., 8 bits of entropy means a 1 in 256 chance of guessing the true location). In the lowest density case of 50 vehicles, 3 bits of entropy (1 in 8 chance) are arrived at within 10 minutes of driving.

(The markers seem to be messed up in the figure above and the one following, but it’s enough to get the idea…)
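The relationship between entropy in bits and the odds of a correct guess can be checked with a few lines. This is a plain Shannon entropy calculation, assuming the tracker's belief is given as a list of probabilities over candidate locations; the function name is hypothetical.

```python
import math

def location_entropy_bits(probabilities):
    """Shannon entropy, in bits, of a distribution over candidate locations."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

if __name__ == "__main__":
    # 256 equally likely candidates correspond to 8 bits, 8 candidates to 3 bits.
    print(location_entropy_bits([1 / 256] * 256))
    print(location_entropy_bits([1 / 8] * 8))
```

The "1 in 256" reading applies when all candidates are equally likely; a skewed distribution with the same number of candidates has lower entropy.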

The following figure looks at the success of a tracker in determining the true location over time (as compared to a ground truth made available for the simulation).

We see that, in the sparse case of n = 50, the tracking success ratio decreases to 0.2 before ten minutes and further drops below 0.1 before fifteen minutes. In the case without guard VPs, on the other hand, the tracking success ratio still remains above 0.9 even after twenty minutes. This result shows: (i) the privacy risk from anonymous location data in its raw form; and (ii) the privacy protection via guard VPs in the VP database.

Collecting video evidence

To gather video evidence of an incident, the system builds a series of view maps, one for each one-minute period of interest. The ViewMap system is designed to protect against attackers gaming the system by uploading fake VPs for various purposes, thus we need to establish trust in the view map. Trust is anchored in special trusted VPs. These come from authorities such as police cars.

Building a view map for a given location starts by finding the nearest trusted VP. We then have to build a chain of trust from that VP to the actual location of interest; each link in the chain is called a viewlink.

To start off with, we gather all VPs whose claimed locations at the time of interest are within the geographical area between the anchor trusted VP and the target location. For each member of this set, we find neighbour candidates with time-aligned locations within the radius of the DSRC radios. These candidates are filtered using Bloom filter membership queries to make sure they really did communicate in that time window. Edges (viewlinks) are then created to the filtered candidates. These edges signify that the two connected VPs are line-of-sight neighbours at some point in their timelines.

Given the graph formed by the above process, we’re now in a position to weed out fake VPs using a variation of the TrustRank algorithm originally developed to output a probability distribution of a person arriving at a given web page after randomly clicking on links from a given seed page.

To exploit the linkage structure of viewmaps, we adopt the TrustRank algorithm tailored for our viewmap. In our case, a trusted VP (as a trust seed) has an initial probability (called trust scores) of 1, and distributes its score to neighbor VPs divided equally among all adjacent ‘undirected’ edges (unlike out-bound links in the web model). Iterations of this process propagate the trust scores over all the VPs in a viewmap via its linkage structure.
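Here is a small sketch of that kind of trust propagation over an undirected viewmap. It follows the usual TrustRank shape: each VP splits its score equally among its neighbours, and a damping factor keeps feeding score back into the trusted seed. The damping value of 0.85, the iteration count, and the function name are assumptions; the paper's tailored variant may differ in its details.

```python
def trust_scores(edges, seeds, damping=0.85, iterations=50):
    """Propagate trust from seed VPs over undirected viewlinks."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    for seed in seeds:
        neighbours.setdefault(seed, set())

    # Seeds share an initial total score of 1; everyone else starts at 0.
    seed_mass = {v: (1.0 / len(seeds) if v in seeds else 0.0)
                 for v in neighbours}
    scores = dict(seed_mass)

    for _ in range(iterations):
        incoming = {v: 0.0 for v in neighbours}
        for v, nbrs in neighbours.items():
            if nbrs:
                share = scores[v] / len(nbrs)  # split equally over all edges
                for n in nbrs:
                    incoming[n] += share
        scores = {v: damping * incoming[v] + (1 - damping) * seed_mass[v]
                  for v in neighbours}
    return scores

if __name__ == "__main__":
    viewlinks = [("police", "a"), ("a", "b"), ("b", "target"),
                 ("fake1", "fake2")]
    for vp, score in sorted(trust_scores(viewlinks, ["police"]).items()):
        print(vp, round(score, 3))
```

In this toy viewmap the two fake VPs are linked only to each other, so no trust ever reaches them and their scores stay at zero.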

Once legitimate VPs near a given incident have been identified, the system solicits the videos. This is done by posting ‘request for video’ announcements in a well-known place which users check. If a user has a matching video, they can upload it anonymously. (ViewMap includes an anonymous reward mechanism to encourage this behaviour – see section 5.3 in the paper for details). The uploaded video can be validated via cascading hash operations against the system-owned (anchor of trust) VP, and then reviewed by human investigators.

In a controlled experiment, various locations are selected where vehicles are either in line-of-sight (LOS), non-line-of-sight (NLOS), or mixed locations:

It can be seen from the results below that creating view link edges in the graph does indeed correspond highly with line-of-sight situations and with at least one of the two vehicles involved appearing in the other’s video.

Improving user perceived page load time using gaze

April 26, 2017

Improving user perceived page load time using gaze Kelton, Ryoo, et al., NSDI 2017

I feel like I’m stretching things a little bit including this paper in an IoT flavoured week, but it does at least bridge from the physical world to the virtual, if only via a webcam. What’s really interesting here to me is what the paper teaches us about web page load performance. We know that faster loading pages are correlated with all sorts of user engagement improvements, but what exactly is a faster loading page? If we want faster loading pages because they drive better user engagement, then the ideal page load time metric should correspond with the way that a user perceives page load times. The most popular page load metrics, OnLoad and Speed Index, have the advantage of being easy to measure, but it turns out they’re not always a good match with the user-perceived page load time, uPLT.

  • The OnLoad PLT metric measures the time taken for the browser OnLoad event to be fired (i.e., once all objects on the page are loaded). OnLoad tends to over-estimate page load time because users are often only interested in content ‘above the fold’ (visible without scrolling) when first waiting for a page to load. (Be warned though, with some page structures it can also under-estimate).
  • The SpeedIndex PLT metric or Above Fold Time (AFT) measures the average time for all above-the-fold content to appear on the screen. “It is estimated by first calculating the visual completeness of a page, defined as the pixel distance between the current frame and the ‘last’ frame of the Web page. The last frame is when the Web page content no longer changes. Speed Index is the weighted average of visual completeness over time. The Speed Index value is lower (and better) if the browser shows more visual content earlier.” The issue with Speed Index is that it doesn’t take into account the relative importance of content.
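To make the Speed Index definition concrete, here is a sketch that computes it from a list of (time, visual completeness) samples as the area above the completeness curve, which is how the metric is commonly defined. The step-wise integration, the function name, and the sample data are assumptions for illustration.

```python
def speed_index_ms(frames):
    """Area above the visual-completeness curve, in milliseconds.

    frames: list of (time_ms, completeness) pairs sorted by time,
    with completeness in [0, 1] and the last frame at 1.0.
    Completeness is held constant until the next frame.
    """
    total = 0.0
    for (t0, vc0), (t1, _) in zip(frames, frames[1:]):
        total += (1.0 - vc0) * (t1 - t0)
    return total

if __name__ == "__main__":
    # Both pages finish at 3 s, but one shows most content after 1 s.
    early = [(0, 0.0), (1000, 0.8), (3000, 1.0)]
    late = [(0, 0.0), (1000, 0.1), (3000, 1.0)]
    print(speed_index_ms(early), speed_index_ms(late))
```

The two example pages have the same final frame time, yet the one that paints most of its content early scores about 1400 ms against about 2800 ms. Note that every pixel counts equally here, which is exactly the limitation described above.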

The authors conducted a study across 45 web pages and 100 different users using videos of pages loading to ensure that each user saw an identical experience for each page. Additional studies were conducted with simulations of different network speeds. The users were asked to press a key on the keyboard when they perceived the page to be loaded (see section 3 in the paper for full details of the study setup).

Here’s an example of how the different metrics look for the site:

And here are the overall results across all 45 web pages and 100 users:

OnLoad tends to either over-estimate (on average by 6.4 seconds) or under-estimate (by 2.1 seconds on average) when compared to true uPLT. Pages that are heavy in Javascript and/or images tend to have even larger OnLoad time gaps. Speed Index is about 3.5 seconds lower than uPLT for 87% of Web pages.

Using gaze to improve page load times

Having understood that neither OnLoad nor the Speed Index metric is a good indicator of the true user-perceived page load time, the authors turn their attention to figuring out what to focus on in order to reduce perceived page load times. Intuitively, what a user is looking at (visual attention) should tell us what is important on the page. We can track this using gaze tracking software…

Recently, advances in computer vision and machine learning have enabled low cost gaze tracking. The low cost trackers do not require custom hardware and take into account facial features, user movements, a user’s distance from the screen, and other user differences.

The study uses an off-the-shelf webcam based gaze tracker called GazePointer. A 50-user study is conducted using the GazePointer setup, across 45 web pages. An auxiliary study also used a much more expensive custom gaze tracker and confirmed that the results concur with the webcam-based solution. Each web page is divided into a set of regions, and the study tracks the regions associated with a user’s fixation points (i.e., when the user is focusing on something). For example, here are the visual regions for

Here’s a heat map across the 45 websites, showing where the attention budget is spent. For example, when looking at the first web site we see that 5 regions combined are fixated on by 90% of users, whereas the remaining 75% of regions are fixated on by less than half of the users.

In general, we find that across the Web pages, at least 20% of the regions have a collective fixation of 0.9 or more. We also find that on average, 25% of the regions have a collective fixation of less than 0.3, i.e., 25% of regions are viewed by less than 30% of the users.

This leads to the following hypothesis: prioritising loading the parts of a web page that hold users’ attention should result in faster perceived page load times.

The WebGaze system collects gaze feedback from a subset of users as they browse, and uses the gaze feedback to determine which Web objects to prioritise during page load. (Good luck getting many people to opt-in to having their gaze tracked by webcam while they’re browsing though! Let’s just assume that getting sufficient gaze feedback is possible – e.g., from internal user testing).

To identify which Web objects to prioritize, we use a simple heuristic: if a region has a collective fixation of over a prioritization threshold, then the objects in the region will be prioritized. In our evaluation, we set the prioritization threshold to be 0.7.

Each visual region may have multiple objects. The CSS bounding rectangles for all objects visible in the viewport can be obtained via the DOM. An object is said to be in a given region if its bounding rectangle intersects with the region. When an object belongs to multiple regions, it is assigned the priority of the highest priority of those regions. Having found the visible Web objects to be prioritised, the next task is to extend the set of prioritised objects to include any other objects they may depend on.
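The selection heuristic and the dependency step can be sketched in a few lines. The rectangle type, the function names, and the example page are all hypothetical; only the 0.7 threshold, the intersection rule, and the take-the-highest-region rule come from the description above.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float

def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap."""
    return (a.left < b.right and b.left < a.right and
            a.top < b.bottom and b.top < a.bottom)

def prioritised_objects(regions, objects, threshold=0.7):
    """Map each object to its best region's fixation, if above the threshold.

    regions: list of (Rect, collective_fixation); objects: name -> Rect.
    """
    result = {}
    for name, rect in objects.items():
        fixations = [f for region, f in regions if intersects(rect, region)]
        if fixations and max(fixations) > threshold:
            result[name] = max(fixations)
    return result

def with_dependencies(names, depends_on):
    """Add everything the prioritised objects transitively depend on."""
    closed = set(names)
    stack = list(names)
    while stack:
        for dep in depends_on.get(stack.pop(), ()):
            if dep not in closed:
                closed.add(dep)
                stack.append(dep)
    return closed

if __name__ == "__main__":
    regions = [(Rect(0, 0, 100, 50), 0.9), (Rect(0, 50, 100, 100), 0.3)]
    objects = {"hero.jpg": Rect(10, 10, 60, 40),
               "footer.png": Rect(10, 60, 60, 90),
               "banner.css": Rect(0, 40, 100, 60)}
    chosen = prioritised_objects(regions, objects)
    deps = {"hero.jpg": ["app.js"], "app.js": ["lib.js"]}
    print(sorted(with_dependencies(chosen, deps)))
```

In the example, `banner.css` straddles both regions and inherits the higher fixation, while `footer.png` sits entirely in the low-attention region and is left to load normally.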

The WProf tool is used to extract dependencies. “While the contents of sites are dynamic, the dependency information has been shown to be temporally stable. Thus, dependencies can be gathered offline.”

To actually implement prioritization, WebGaze uses HTTP/2’s Server Push functionality.

Server Push decouples the traditional browser architecture in which Web objects are fetched in the order in which the browser parses the page. Instead, Server Push allows the server to preemptively push objects to the browser, even when the browser did not explicitly request these objects. Server Push helps (i) by avoiding a round trip required to fetch an object, (ii) by breaking dependencies between client side parsing and network fetching, and (iii) by better leveraging HTTP/2’s multiplexing.

In some pathological cases, Server Push can make things much worse! WebGaze reverts back to the default case without optimization when this is detected. This happened with 2 of the 45 pages in the study.

Does gaze-based prioritisation actually improve perceived user page load times though? An evaluation compared WebGaze with three alternative strategies:

  • Default: the page loads as-is without prioritisation
  • Push-all: all of the objects on the Web page are pushed using Server Push
  • Klotski: the Klotski algorithm is used to push objects and dependencies with an objective of maximising the amount of above-the-fold content that can be delivered within 5 seconds.

Figure 11 (below) shows the CDF of the percentage improvement in uPLT compared to alternatives. On average, WebGaze improves uPLT by 17%, 12%, and 9% over Default, Push-All, and Klotski respectively. At the 95th percentile, WebGaze improves uPLT by 64%, 44%, and 39% compared to Default, Push-All, and Klotski respectively.

In about 10% of cases, WebGaze does worse than Klotski. In these cases, Klotski is sending less data than WebGaze. “This suggests we need to perform more analysis on determining the right amount of data that can be pushed without affecting performance.”

The authors note the significant security and privacy concerns with deploying gaze tracking in the wild. If you want to experiment, my recommendation would be to conduct your own (opt-in, in-the-lab) user studies using gaze tracking, and then use the information gleaned to improve Web object prioritisation for production systems.

FarmBeats: An IoT platform for data-driven agriculture

April 25, 2017

FarmBeats: An IoT platform for data-driven agriculture Vasisht et al., NSDI ’17

Today we have another pragmatic, low cost, IoT system case study. And it’s addressing a problem almost as important as cricket: how can we help to meet the burgeoning demand for food across the globe by increasing farm productivity? [Just in case British humour doesn’t translate to all cultures reading this post, yes, that’s a joke!].

… field trials have shown that techniques that use sensor measurements to vary water input across the farm at a fine granularity (precision irrigation) can increase farm productivity by as much as 45% while reducing the water intake by 35%. Similar techniques to vary other farm inputs like seeds, soil nutrients, etc. have proven to be beneficial.

Given that, why isn’t everyone doing it? Manual sensor data collection is expensive and time-consuming, but automated sensor data collection typically requires expensive cellular data loggers and accompanying subscriptions – and even then the limited data rates don’t let you upload all the data you’d really like to. Cellular coverage can also be poor on farms, and prone to weather-based outages.

What if we could use wifi out on the farm?

Ingredients for a smart farm

  • A collection of sensors for e.g., soil temperature, pH, and moisture. Also, some IP cameras may be useful to keep a remote eye on things.
  • For sensors without wifi support, some Arduinos, Particle Photons ($20 each), or NodeMCUs to add wifi capability.
  • Some weatherproof boxes to enclose the sensors
  • Some IoT base stations providing wifi across the farm – these are connected to the farmer’s home internet connection using an unlicensed TV White Spaces (TVWS) network using FCC certified Adaptrum ACRS 2 radios ($200 each).
  • A solar charging system to power the base station, comprising two 60 Watt solar panels connected to a solar charge controller and backed by four batteries.
  • An 8-port digital logger PoE switch through which the solar power is routed, giving the ability to turn individual base station components on or off.
  • A Raspberry Pi with a 64GB SD card to serve as the base station controller
  • An IoT gateway device in the farmer’s home – basically a laptop.
  • A drone for capturing imagery (e.g., DJI Phantom 2, Phantom 3, or Inspire 1).
  • A cloud platform for long-term storage and analytics – in this instance the Azure IoT suite was used.

The cost per sensor using this network approach is an order of magnitude less than existing systems.

Having assembled all of the ingredients, connect them together like this:


FarmBeats leverages recent work in unlicensed TV White Spaces (TVWS) to setup a high bandwidth link from the farmer’s home Internet connection to an IoT base station on the farm. Sensors, cameras, and drones can connect to this base station over a Wi-Fi front-end. This ensures high bandwidth connectivity within the farm.

The farm itself may not have great Internet connectivity though. Even at the best of times uploading high bandwidth drone videos may be challenging, on top of which farms are prone to weather-related network outages that may last for weeks. The solution to both challenges is a gateway device installed at the farmer’s home which can provide continuous operation for the farm network even when the uplink to the Internet is down. The gateway also performs significant local computation on raw farm data before uploading it to the cloud. The farmer can access detailed data about the farm via the gateway whenever he or she is on the farm network.


The IoT base station on the farm is powered by solar panels with battery back-up. The base station controller caches sensor data collected from the sensors and syncs it with the IoT gateway when the TVWS device is switched on. It also plans and enforces duty cycle rates depending on the current battery status and weather conditions.

The solar power output varies with the time of day and weather conditions, and the goal within a given planning period (1 day) is to consume at most the amount of power that can be harvested from the solar panels. This is not enough to power the TVWS device continually (it uses 5x the power of the base station). Duty cycle planning figures out a power budget based on estimates of the solar panel output given the weather conditions (forecasts from the OpenWeather API). This power has to be used to collect data from sensors, upload the data to the gateway over the TVWS link, and support farmers using the base station for internet connectivity via their smart phones while on the farm (this is a variable component of the power budget).

FarmBeats’ duty cycling algorithm aims to minimise the length of the largest data gaps, under the constraints of energy neutrality and variable access.

A greedy algorithm is used to determine which data to upload to the gateway when the TVWS device is powered on, and the schedule is set such that the TVWS device is not powered on when there is no data to be uploaded. The base station is also duty cycled so that it can connect to the sensors frequently enough to capture their data. The sensors themselves consume very little power compared to the base station, and are set with a duty cycle off time less than the base station transfer windows.

The following charts show the impact of power-aware duty cycling:

Imaging and bandwidth management

Aerial photography from drones is an important part of data gathering. A flight path must be planned for a drone to cover the farm in as efficient a manner as possible, and then the resulting photos must be stitched together into an orthomosaic.

To cover as much of the farm as possible in as little time as possible, the most efficient plan is to minimise the number of waypoints in the flight path since at each waypoint the drone decelerates and comes to a halt before accelerating again. Most existing commercial flight planning systems use an east-west flight path. FarmBeats instead calculates the convex hull of the area to be covered and creates a flight path that minimises waypoints. This reduced the time taken to cover an area by 26% in the FarmBeats trial deployments.
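The convex hull step can be sketched with the standard monotone chain algorithm. This shows only how a hull is obtained from boundary and interior points of a field; how FarmBeats turns the hull into a minimum-waypoint path is not described here, and the sample coordinates are made up.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Hypothetical field survey points (metres); the two interior points are discarded.
field = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1), (1, 2)]
hull = convex_hull(field)
```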

FarmBeats further saves on flight time (and hence power requirements) by taking advantage of the wind.

Since farms are large open spaces and typically very windy, we observed that quadrotors that have an asymmetric physical profile can exploit the wind either for more efficient propulsion or deceleration. Figure 3 (below) shows an example of a quad rotor (DJI Inspire 1) that has an asymmetrical profile, where its front and side are considerably different; thus, it can exploit the wind similar to sailboats.

The angle of the quadrotor with respect to the vertical axis is called yaw. During acceleration with a favourable wind behind, the longer side is turned to face the wind to benefit from wind assistance. Once at speed, the narrower side is turned to the wind to reduce air drag. Deceleration can once again exploit air drag (and/or a headwind). Depending on wind velocity, this adaptive wind-assisted yaw control algorithm reduced flight time by up to 5%.

A 4-minute flight capturing 1080p video at 30 frames per second is almost a gigabyte of video data. Existing solutions ship the video to the cloud and create an orthomosaic there, but FarmBeats incorporates the orthomosaic video processing pipeline into the gateway to do as much processing locally as possible. This is challenging since the computation of high-resolution surface maps is both compute and memory intensive and not suitable for the farm gateway device.

We have developed a hybrid technique which combines key components from both 3D mapping and image stitching methods. On a high level, we use techniques from the aerial 3D mapping systems, just to estimate the relative position of different video frames; without computing the expensive high resolution digital surface maps. Since this process can be performed at a much lower resolution, this allows us to get rid of the harsh compute and memory requirements, while removing the inaccuracies due to non-planar nature of the farm. Once these relative positions have been computed, we can then use standard stitching software (like Microsoft ICE) to stitch together these images.

Given the orthomosaic and the sensor readings, the final challenge is to create precision agriculture maps for the whole farm. For example, moisture maps, pH maps, and temperature maps.

A machine learning model based on probabilistic graphical models that embed Gaussian processes is used to extrapolate from the sensor data points to the full territory. This model seeks to balance spatial and visual smoothness:

  • Since we are measuring physical properties of the soil and the environment, the sensor readings for locations that are nearby should be similar (spatial smoothness).
  • Areas that look similar should have similar sensor values. For example, a recently irrigated area has more moisture and hence looks darker (visual smoothness).
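To make the two smoothness assumptions concrete, here is a deliberately simplified sketch. It is not the paper's Gaussian-process graphical model: it swaps in a plain kernel-weighted average, where each sensor's weight decays with both spatial distance and difference in appearance. The length scales, the single "brightness" feature, and the sample readings are all assumptions for illustration.

```python
import math


def predict(sensors, query_xy, query_brightness,
            spatial_scale=50.0, visual_scale=0.2):
    """Estimate a reading at a location with no sensor.

    sensors: list of (x, y, brightness, reading) tuples.
    A sensor counts for more when it is nearby (spatial smoothness)
    and when its surroundings look like the query pixel (visual smoothness).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, brightness, reading in sensors:
        dist_sq = (x - query_xy[0]) ** 2 + (y - query_xy[1]) ** 2
        look_sq = (brightness - query_brightness) ** 2
        weight = (math.exp(-dist_sq / (2 * spatial_scale ** 2))
                  * math.exp(-look_sq / (2 * visual_scale ** 2)))
        weighted_sum += weight * reading
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical moisture sensors: one in dark (wet) soil, one in light (dry) soil.
sensors = [(0.0, 0.0, 0.2, 0.40), (100.0, 0.0, 0.8, 0.10)]
# Both queries are equidistant from the sensors, so appearance decides the estimate.
dark_patch = predict(sensors, (50.0, 0.0), 0.25)
light_patch = predict(sensors, (50.0, 0.0), 0.75)
```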

FarmBeats deployments have been running at two different farms for over six months, connecting to around 10 different sensor types, three different camera types, three different drone versions, and the farmers’ phones. The farmers use it for precision agriculture, animal monitoring, and storage monitoring.

Bringing IoT to sports analytics

April 24, 2017

Bringing IoT to sports analytics Gowda et al., NSDI 17

Welcome back to the summer term of #themorningpaper! To kick things off, we’ll be looking at a selection of papers from last month’s NSDI’17 conference.

We haven’t looked at an IoT paper for a while, and this one happens to be about cricket – how could I resist?! The results are potentially transferable to other ball sports – especially baseball – so if that’s more your thing the results should also be of interest. More generally, it’s an interesting case study in embedding intelligence into everyday objects.

Sports analytics is becoming big business – for broadcasters, coaches, sports pundits and the like. Typically the data for such systems comes from expensive high-quality cameras installed in stadiums.

We explore the possibility of significantly lowering this cost barrier by embedding cheap Inertial Measurement Unit (IMU) sensors and ultrawide band (UWB) radios inside balls and players’ shoes. If successful, real-time analytics should be possible anytime, anywhere. Aspiring players in local clubs could read out their own performance from their smartphone screens; school coaches could offer quantifiable feedback to their students.

A high-end camera network can cost upwards of $10,000, whereas the system developed in this paper (iBall) has an overall cost of around $250. Furthermore, it’s the first serious IoT-based effort to “accurately characterize 3D ball motion, such as trajectory, orientation, revolutions per second, etc.”

For those less familiar with the game, this is cricket:

(I never expected to see a figure like that in a CS paper!). The main focus is on ball instrumentation during delivery – speed, flight-path, and spin are all of keen interest in cricket, but the system can also track the path of the ball once hit by the batsman, as well as the location of players on the cricket pitch. It’s still very much a prototype system (evaluated indoors only, with a ball that needs charging every 75-90 minutes, and not fully weight balanced) but there’s a lot of promise here: results from 100 different throws (we call them deliveries) achieved a median location accuracy of 8cm, and orientation errors of 11.2 degrees. Players are tracked with a median error of 1.2m, even when near the periphery of the field (we call it the boundary).

The core of the system is an instrumented ball containing an Intel Curie board with IMU (Inertial Measurement Unit) sensors and a UWB (ultrawide band) radio.

Sensor data is streamed via the UWB radio to receiving ‘anchors’ located at each wicket:

UWB signals are exchanged between the ball and anchors to compute the ball’s range as well as the angle of arrival (AoA) from the phase differences at different antennas. The range and AoA information are combined for trajectory estimation. For spin analytics, the sensors inside the ball send out the data for off-ball processing. Players in the field can optionally wear the same IMU/UWB device (as in the ball) for 2D localization and tracking.
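The relationship between phase difference and angle of arrival can be sketched with the textbook two-antenna, far-field formula sin(θ) = Δφ·λ / (2π·d). This is a generic illustration rather than the paper's processing chain; the carrier frequency and half-wavelength antenna spacing in the example are assumptions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def angle_of_arrival(phase_diff_rad, antenna_spacing_m, carrier_hz):
    """Angle (radians, relative to broadside) from the phase difference
    measured between two antennas, assuming a far-field plane wave."""
    wavelength = SPEED_OF_LIGHT / carrier_hz
    sin_theta = phase_diff_rad * wavelength / (2 * math.pi * antenna_spacing_m)
    # Measurement noise can push the value slightly outside [-1, 1].
    sin_theta = max(-1.0, min(1.0, sin_theta))
    return math.asin(sin_theta)


# Hypothetical set-up: 6.5 GHz carrier, antennas half a wavelength apart.
carrier = 6.5e9
spacing = (SPEED_OF_LIGHT / carrier) / 2
theta_deg = math.degrees(angle_of_arrival(math.pi / 2, spacing, carrier))
```

With half-wavelength spacing the formula reduces to sin(θ) = Δφ/π, so a quarter-cycle phase difference corresponds to 30 degrees.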

Let’s look next at the challenges involved in spin tracking, trajectory tracking, and player tracking.

Spin tracking

Three main spin-related metrics are of interest to Cricketers: (1) revolutions per second, (2) rotation axis, and (3) seam plane. From a sensing perspective, all these 3 metrics can be derived if the ball’s 3D orientation can be tracked over time.

You might initially think that gyroscopes and accelerometers would be great for this: but gyroscopes can only cope with spins up to about 5rps (while professional bowlers can generate spin of over 30rps), and gravity is not measured in accelerometers during free-fall. This leaves magnetometers. When the ball is in-flight, in the absence of air drag, the ball’s rotation will be restricted to a single axis throughout the flight. Furthermore, from the ball’s local reference frame, the magnetic north vector spins around some axis. This means that a magnetometer can infer both the magnitude and direction of rotation.

With air drag, the ball experiences an additional rotation along a changing axis: the ball continues to spin around the same vertical axis, but the seam plane gradually changes to lie on the vertical plane – this is called “wobble.”

The rotation of the magnetic vector traces out a cone which can be approximated from three successive measurements. This enables estimation of the local rotation (wobble).
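One way to see why three measurements suffice: every reading on the cone makes the same angle with the axis, so the differences between successive readings are perpendicular to the axis, and their cross product points along it. The sketch below applies that geometric argument to noise-free synthetic data. It is an illustration of the idea only, not iBall's estimator, and the 1 kHz sample rate, 10 rps spin, and cone angle are assumed values.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def rotation_axis(m1, m2, m3):
    """Axis of the cone passing through three successive magnetometer readings."""
    return unit(cross(sub(m2, m1), sub(m3, m2)))


def revs_per_second(m1, m2, axis, dt):
    """Spin rate from the angle swept around the axis between two readings."""
    # Remove the component along the axis, leaving the part that rotates.
    p1 = sub(m1, tuple(dot(m1, axis) * c for c in axis))
    p2 = sub(m2, tuple(dot(m2, axis) * c for c in axis))
    cos_angle = dot(p1, p2) / math.sqrt(dot(p1, p1) * dot(p2, p2))
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    return angle / (2 * math.pi * dt)


def sample(k, rps=10.0, rate_hz=1000.0, half_angle=math.radians(60)):
    """Synthetic reading k: a unit vector rotating about the z axis."""
    theta = 2 * math.pi * rps * k / rate_hz
    return (math.sin(half_angle) * math.cos(theta),
            math.sin(half_angle) * math.sin(theta),
            math.cos(half_angle))


m1, m2, m3 = sample(0), sample(1), sample(2)
axis = rotation_axis(m1, m2, m3)
rps = revs_per_second(m1, m2, axis, dt=1 / 1000.0)
```

With real, noisy readings the three-point estimate would be unstable on its own, which is presumably why the paper fits a parametric moving-cone model over many samples.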

… we model the motion of a moving cone, under the constraints that the center of the cone is moving on a quadratic path and that the cone angle is constant. We pick the best 6 parameters of this model that minimize the variance of the cone angles as measured over time.

To measure the global rotation (the spin imparted by the bowler), iBall uses sensor measurements from just before the bowler releases the ball. Here the gyroscope and accelerometer are both useful (angular velocity saturates the gyroscope only at the last few moments before ball release).

Gyroscope dead-reckoning right before ball release, combined with local rotation axis tracking during flight, together yields the time-varying orientation of the ball.

Trajectory tracking

In addition to spin, we’re also interested in the bowler’s line, length, and speed (length is the distance to the first bounce). These are all derivatives of the ball’s 3D trajectory.

Our approach to estimating 3D trajectory relies on formulating a parametric model of the trajectory, as a fusion of the time of flight (ToF) of UWB signals, angle of arrival (AoA), physics motion models, and DoP constraints (explained later). A gradient descent approach minimizes a non-linear error function, resulting in an estimate of the trajectory.

The time of the first bounce is easy to determine via the accelerometer, and forms a useful constraint on the possible trajectories (we know the ball is at height 0 when it hits the ground). The overall time of flight can be computed via the UWB radios, which offer time resolution at 15.65ns.

Briefly, the ball sends a POLL, the anchor sends back a RESPONSE, and the ball responds with a FINAL packet. Using the two round trip times, and the corresponding turn-around delays, the time of flight is computed without requiring time synchronization between the devices. Multiplied by the speed of light, this yields the ball’s range.
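The write-up does not give the formula, but a common choice for this three-message exchange is asymmetric double-sided two-way ranging, assumed in the sketch below. Each device only measures intervals on its own clock, so no synchronization is needed. The reply delays and the 15 m distance in the example are made-up values.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def time_of_flight(t_round1, t_reply1, t_round2, t_reply2):
    """One-way time of flight from a POLL / RESPONSE / FINAL exchange.

    t_round1: ball's clock, POLL sent -> RESPONSE received
    t_reply1: anchor's clock, POLL received -> RESPONSE sent
    t_round2: anchor's clock, RESPONSE sent -> FINAL received
    t_reply2: ball's clock, RESPONSE received -> FINAL sent
    """
    return ((t_round1 * t_round2 - t_reply1 * t_reply2)
            / (t_round1 + t_round2 + t_reply1 + t_reply2))


def range_metres(t_round1, t_reply1, t_round2, t_reply2):
    return SPEED_OF_LIGHT * time_of_flight(t_round1, t_reply1, t_round2, t_reply2)


# Simulate an exchange over a known distance with unequal turn-around delays.
true_tof = 15.0 / SPEED_OF_LIGHT
reply1, reply2 = 300e-6, 400e-6          # hypothetical turn-around delays
round1 = 2 * true_tof + reply1           # each round trip = two flights + one reply
round2 = 2 * true_tof + reply2
estimated_range = range_metres(round1, reply1, round2, reply2)
```

Substituting round = 2·ToF + reply into the expression shows that the reply delays cancel exactly, which is why the two turn-around times do not need to match.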

With only 2 anchors, both at ground level, we can’t resolve the 3D location of the ball though. We know the distance from the two anchors, and the ball could lie on any intersection of spheres with these radii.

As mentioned previously, we have the bouncing constraint to help pin down part of the flight path. A simple physics model (remember e.g. s = ut + ½at²?) further constrains the flight path. The initial ball release location is assumed to be within a 60cm cube, as a function of the bowler’s height. Given these constraints, a model is fitted using gradient descent, and the median ranging error is only 3cm.
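As a small illustration of how the physics model and the bounce constraint interact, the sketch below applies s = ut + ½at² per axis, ignoring air drag and spin, and solves for the moment the height reaches zero. The release point and velocity are hypothetical; in the real system these are among the unknowns being fitted.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def position(p0, v0, t):
    """Drag-free position after t seconds; gravity acts on the z axis only."""
    return (p0[0] + v0[0] * t,
            p0[1] + v0[1] * t,
            p0[2] + v0[2] * t - 0.5 * G * t * t)


def bounce_time(z0, vz0):
    """Positive root of z0 + vz0*t - 0.5*G*t^2 = 0 (ball reaches the ground)."""
    return (vz0 + math.sqrt(vz0 * vz0 + 2 * G * z0)) / G


# Hypothetical delivery: released 2 m up, 30 m/s down the pitch, slightly downward.
release = (0.0, 0.0, 2.0)
velocity = (30.0, 0.0, -1.0)
t_bounce = bounce_time(release[2], velocity[2])
landing = position(release, velocity, t_bounce)
```

Here `landing[0]` plays the role of the delivery's length. Knowing the bounce time from the accelerometer ties the release height and vertical velocity together, shrinking the space of trajectories the fit has to search.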

It just remains to translate range to location. Unfortunately we lose accuracy here due to a phenomenon known as dilution of precision.

Ideally, the intersection of two UWB range measurements (i.e., two spheres centered at the anchors) is a circle – the ball should be at some point on this circle. In reality, ranging error causes the intersection of spheres to become 3D “tubes”. Now, when the two spheres become nearly tangential to each other, say when the ball is near the middle of two anchors, the region of intersections becomes large. Fig.13(b) shows this effect. This is called DoP and seriously affects the location estimate of the ball.
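The amplification can be checked numerically. The sketch below computes the intersection circle of two range spheres and compares how a small range bias shifts the circle's radius at mid-pitch versus close to one anchor. The 20 m anchor separation, 0.5 m ball height, and 3 cm bias are assumed values chosen for illustration.

```python
import math


def intersection_circle(d, r1, r2):
    """Intersect spheres of radius r1 and r2 centred at (0,0,0) and (d,0,0).

    Returns (a, h): the circle's position along the anchor axis and its radius,
    or None if the spheres do not intersect.
    """
    a = (d * d + r1 * r1 - r2 * r2) / (2 * d)
    h_squared = r1 * r1 - a * a
    if h_squared < 0:
        return None
    return a, math.sqrt(h_squared)


def radius_error(ball_x, ball_height, d=20.0, range_bias=0.03):
    """Change in circle radius when both ranges are over-estimated by range_bias."""
    r1 = math.hypot(ball_x, ball_height)
    r2 = math.hypot(d - ball_x, ball_height)
    _, true_radius = intersection_circle(d, r1, r2)
    _, biased_radius = intersection_circle(d, r1 + range_bias, r2 + range_bias)
    return abs(biased_radius - true_radius)


mid_error = radius_error(ball_x=10.0, ball_height=0.5)   # middle of the pitch
near_error = radius_error(ball_x=2.0, ball_height=0.5)   # close to one anchor
```

Under these assumptions a 3 cm range bias becomes a radius error of tens of centimetres at mid-pitch, and noticeably less near an anchor, which matches the qualitative picture in the quote.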

In the middle of the flight, the 90th percentile error can be on the order of 80cm (median ~40cm). To account for this, errors in the minimisation function are weighted so that less importance is given to range measurements where the DoP effect is large. This brings the median error back down to 16cm once more.

Precision is further increased by incorporating angle of arrival (AoA) measurements from a MIMO receiver in the bowler side anchor (the batsman interferes with signal at the batsman’s end). See section 4.5 for details.

Player tracking

Player tracking is relatively straightforward using e.g. clip-on UWB devices. A challenge arises due to the DoP effect again when the players are in line with the wickets – close to this axis, the 90th percentile location uncertainty increases to 15m. To reduce this, the accelerometer is used to detect when a player is running, with velocity estimated using Kalman Filtering. The motion model is combined with the UWB ranging estimates. iBall applies the same techniques to the ball, which can also be tracked after it has been hit by the batsman.
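The idea of blending a motion model with noisy UWB positions can be sketched with a one-dimensional Kalman filter step. This is a toy simplification: the paper's state vector and noise parameters are not given here, so the process noise `q`, measurement noise `r`, and all example values are assumptions.

```python
def kalman_step(x, p, velocity, dt, z, q=0.05, r=1.0):
    """One predict/update cycle for a scalar position estimate.

    x, p:     previous position estimate and its variance
    velocity: speed from the motion model (e.g. zero when not running)
    z:        new UWB-derived position measurement
    q, r:     process and measurement noise variances (assumed values)
    """
    # Predict: move the estimate with the motion model, grow the uncertainty.
    x_pred = x + velocity * dt
    p_pred = p + q
    # Update: blend in the measurement according to relative confidence.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


# A player running at 2 m/s; the UWB fix says 2.5 m, the motion model says 2.0 m.
x_new, p_new = kalman_step(x=0.0, p=1.0, velocity=2.0, dt=1.0, z=2.5)
```

The estimate lands between the model prediction and the measurement, and its variance shrinks. When DoP makes the UWB fix unreliable, raising `r` shifts trust towards the motion model.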


Evaluation was done indoors so that 8 ViCon IR cameras could be installed on the ceiling to obtain ground truth measurements.

The authors pretended to be cricket players, and threw the ball at various speeds and spins (for a total of 100 throws).

Why only 100 I’m not sure (and it really should be a multiple of six – instead of 50 spin throws and 50 regular throws, it should be 10 overs of spin, and 10 overs of swing/seam/pace bowling… 😉 ).

For spin tracking, iBall performs very close to the ViCon ground truth when measuring the cumulative angle error (the total cumulative angle truly rotated by the ball). Across 50 deliveries, a median angular velocity error of 1.0%, and a maximum error of 3.9% is observed. ORE, or orientation error, measures the difference between the estimated and true orientation of the ball at any point in time. The median error here is 11.2 degrees, increasing with spin rate.

For trajectory tracking, iBall achieves a median location error of 8cm. Speed is estimated with a median error of 0.4m/s (0.9 mph). “Upon discussion with domain experts, we gather that this level of accuracy is valuable for coaching and analytics.”

In the x-axis, the iBall trajectory prediction error is always less than or equal to 9.9cm. This means that according to the International Cricket Association, iBall is (just) accurate enough to be used to assist in LBW decisions.

The authors are now turning their attention to baseball and frisbee…

End of term, and thank you to the ACM

April 10, 2017

We’ve reached the end of term again, and The Morning Paper will be taking a two week break to recharge my batteries and my paper backlog! We covered a lot of ground over the last few months, and I’ve selected a few highlighted papers/posts at the end of this piece to tide you over until Monday 24th April when The Morning Paper will resume normal service.

I’d like to take this opportunity to thank you all once more for reading! The Morning Paper flies in the face of fashion – I write long-form pieces, and although I try to explain the material as simply as I can, the subject matter invariably makes for dense reading at times. The blog is hosted on a WordPress site using a very basic theme (all the cool kids are on Medium I hear), and the primary distribution mechanism is an email list (how 90’s!). Despite all that, The Morning Paper mailing list passed the 10,000 subscriber mark this last quarter – it’s wonderful to know that there are so many people out there interested in this kind of material. I’d also like to thank all of the researchers whose work I get to cover – you make researching and writing The Morning Paper a joy.

While we’re on the subject of thank yous, I’d also like to say thank you to the team at the ACM who recently worked on a mechanism to provide open access to any paper from the ACM Digital Library that is covered on The Morning Paper.

I always try to select papers that are open access, which often means scrabbling around to try and find a version an author has posted on their personal site. As well as opening up new potential content for the blog (for example, the ACM Computing Surveys), being able to link to ACM DL content should hopefully provide more stable links over time. If you see a link to an ACM DL piece in the blog and you’re not an ACM DL subscriber, please don’t be put off – you should be able to click through and download the pdf. Any difficulties just let me know and I’ll look into it for you.

One last thing before we get to the selections, there are now over 550 paper write-ups on this blog! If you’ve joined recently, that means there is a ton of great research you may have missed out on. Currently the only real way to explore that backlog is browsing through the archives by month. During this Easter break, I’m going to try and get my act together with a tagging scheme so that you can more easily find papers of interest from the backlog.

In TMP publication order, here are a few edited highlights from the first three months of 2017:

(Yes ok, I had a bit of trouble choosing this time around, that was rather a long list and it was difficult even getting it down to just those picks!).

Also, don’t forget we started working through the top 100 awesome deep learning papers list, and you can find the first week of posts from that starting here, and the second week here.

See you in a couple of weeks, Adrian.

SGXIO: Generic trusted I/O path for Intel SGX

April 7, 2017

SGXIO: Generic trusted I/O path for Intel SGX Weiser & Werner, CODASPY ’17

Intel’s SGX provides hardware-secured enclaves for trusted execution of applications in an untrusted environment. Previously we’ve looked at Haven, which uses SGX in the context of cloud infrastructure, SCONE which shows how to run Docker containers under SGX, and Panoply which looks at what happens when multiple application services, each in their own enclave, need to communicate with each other (think Kubernetes pod or a collection of local microservices). SGXIO provides support for generic, trusted I/O paths to protect user input and output between enclaves and I/O devices. (The generic qualifier is important here – currently SGX only works with proprietary trusted paths such as Intel’s Protected Audio Video Path).

A neat twist is that instead of the usual use cases of DRM and untrusted cloud platforms, SGXIO looks at how you might use SGX on your own laptop or desktop. The trusted IO paths could then be for example a trusted screen path, and a trusted keyboard path. Running inside the enclave would be some app working with sensitive information (the example in the paper is a banking app), and the untrusted environment is your own computer – giving protection against any malware, keystroke loggers etc. that may have compromised it. It might be that your desktop or laptop already has SGX support – check the growing hardware list here (or compile and run test-sgx.c on your own system). No luck with my 2015 Dell XPS 13 sadly.

SGXIO surpasses traditional use cases in cloud computing and digital rights management, and makes SGX technology usable for protecting user-centric, local applications against kernel-level keyloggers and likewise. It is compatible with unmodified operating systems and works on a modern commodity notebook out of the box.

Threat model

SGX enforces a minimal TCB comprising the processor and enclave code. Even the processor’s physical environment is considered potentially malicious. In the local setting considered by SGXIO we’re trying to protect the local user against a potentially compromised operating system – physical attacks are not included in the threat model.

SGXIO supports user-centric applications like confidential document viewers, login prompts, password safes, secure conferencing and secure online banking. To take latter as example, with SGXIO an online bank cannot only communicate with the user via TLS but also with the end user via trusted paths between banking app and I/O devices, as depicted in Figure 1 [below]. This means that sensitive information like login credentials, the account balance, or the transaction amount can be protected even if other software running on the user’s computer, including the OS, is infected by malware.

SGXIO Architecture

At the base level of SGXIO we find a trusted hypervisor on top of which a driver host can run one or more secure I/O drivers. The hypervisor isolates all trusted memory. It also binds user devices to drivers and ensures mutual isolation between drivers. In addition to hypervisor isolation, drivers are also executed in enclaves.

We recommend to use seL4 as a hypervisor, as it directly supports isolation of trusted memory, as well as user device binding via its capability system.

The trusted hypervisor, drivers, and a special enclave that helps with hypervisor attestation, called the trusted boot (TB) enclave, together comprise the trusted stack of SGXIO. On top of the hypervisor a virtual machine hosts the (untrusted, commodity) operating system, in which user applications are run. The hypervisor configures the MMU to strictly partition hypervisor and VM memory, preventing the OS from escaping the VM and tampering with the trusted stack.

User apps want to communicate securely with the end user. They open an encrypted communication channel to a secure I/O driver to tunnel through the untrusted OS. The driver in turn requires secure communication with a generic user I/O device, which we term user device. To achieve this, the hypervisor exclusively binds user devices to the corresponding drivers.

(Other, non-protected devices can be assigned directly to the VM).

Secure user apps process all of their sensitive data inside an enclave. Any data leaving this enclave is encrypted.

For example, the user enclave can securely communicate with the driver enclave or a remote server via encrypted channels or store sensitive data offline using SGX sealing.

(Other, non-protected apps can just run directly within the VM as usual).

Creating an encrypted channel (trusted path) between enclaves (e.g., an app enclave and a driver enclave) requires the sharing of an encryption key. The authors achieve this by piggy-backing on the SGX local (intra-platform) attestation mechanism (see section 3.1 in this Intel white paper). One enclave (in this case the user application enclave) can generate a local attestation report designed to be passed to another enclave (in this case the driver enclave) to prove that it is running on the same platform.

The user enclave generates a local attestation report over a random salt, targeted at the driver enclave. However, instead of delivering the actual report to the driver enclave, the user enclave keeps it private and uses the report’s MAC (Message-Authentication Code) as a symmetric key. It then sends the salt and its identity to the driver enclave, which can recompute the MAC to obtain the same key.
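The key agreement trick can be simulated in a few lines. In the sketch below the CPU's role is played by ordinary functions and a random `_PLATFORM_SECRET`, and HMAC-SHA256 from the standard library stands in for the MAC that SGX computes in hardware. All function names and enclave identities are hypothetical; this models the message flow described above, not the real SGX instructions or report layout.

```python
import hashlib
import hmac
import os

# Hypothetical stand-in for the secret held inside the CPU, invisible to software.
_PLATFORM_SECRET = os.urandom(32)


def _report_key(target_id: bytes) -> bytes:
    """Simulated per-target report key: only the CPU and the target enclave get it."""
    return hmac.new(_PLATFORM_SECRET, b"report-key|" + target_id,
                    hashlib.sha256).digest()


def simulated_report_mac(source_id: bytes, target_id: bytes, data: bytes) -> bytes:
    """Simulates the CPU producing a report in source_id aimed at target_id."""
    return hmac.new(_report_key(target_id), source_id + b"|" + data,
                    hashlib.sha256).digest()


def user_enclave_setup(user_id: bytes, driver_id: bytes):
    """User enclave: keep the report private, use its MAC as the channel key."""
    salt = os.urandom(16)
    key = simulated_report_mac(user_id, driver_id, salt)
    message = (salt, user_id)  # only this leaves the enclave
    return key, message


def driver_enclave_derive(driver_id: bytes, message) -> bytes:
    """Driver enclave: fetch its own report key and recompute the MAC."""
    salt, claimed_user_id = message
    own_key = _report_key(driver_id)
    return hmac.new(own_key, claimed_user_id + b"|" + salt,
                    hashlib.sha256).digest()


user_key, sent = user_enclave_setup(b"banking-app", b"screen-driver")
driver_key = driver_enclave_derive(b"screen-driver", sent)
```

Because the key depends on both identities, a different driver enclave, or an application claiming someone else's identity, ends up with a different key and cannot decrypt the channel.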

Establishing trust from the ground up

The previous section described how SGXIO works when everything is up and running. We also need to be able to trust the hypervisor hasn’t been tampered with in the first place, which is achieved via a trusted boot and hypervisor attestation process.

Enclaves can use local attestation to extend trust to any other enclaves in the system… enclaves [also] need confidence that the hypervisor is not compromised and binds user devices correctly to drivers. Effectively, this requires enclaves to invoke hypervisor attestation. SGXIO achieves this with the assistance of the TB enclave.

SGXIO requires a hardware TPM (Trusted Platform Module) for trusted booting. Each boot stage measures the next one in a cryptographic log inside the TPM. Measurements accumulate in a TPM Platform Configuration Register (PCR) whose final value reflects the boot process and is used to prove integrity of the hypervisor.
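The accumulation works because a PCR can only be extended, never set: the new value is a hash of the old value concatenated with the new measurement. A minimal sketch, assuming a SHA-256 PCR bank and made-up boot stage contents:

```python
import hashlib


def pcr_extend(pcr: bytes, component: bytes) -> bytes:
    """New PCR value = H(old PCR || H(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_boot(stages) -> bytes:
    """Fold every boot stage into a PCR that starts at all zeros."""
    pcr = bytes(32)
    for stage in stages:
        pcr = pcr_extend(pcr, stage)
    return pcr


# Hypothetical boot chains; the byte strings stand in for the real images.
expected = measure_boot([b"bootloader", b"hypervisor"])
tampered = measure_boot([b"bootloader", b"hypervisor-with-backdoor"])
reordered = measure_boot([b"hypervisor", b"bootloader"])
```

Any change to a stage, or to the order of stages, yields a different final value, so a single comparison against the expected PCR covers the whole chain.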

The TB enclave verifies the PCR value by requesting a TPM quote (cryptographic signature over the PCR value alongside a fresh nonce). Once the TB enclave has attested the hypervisor, any driver enclave can query the TB enclave to determine if the attestation succeeded.

So far so good, but we still need to defend against remote TPM cuckoo attacks:

Here, the attacker compromises the hypervisor image, which yields a wrong PCR value during trusted boot. To avoid being detected by the TB enclave, the attacker generates a valid quote on an attacker-controlled remote TPM and feeds it into the TB enclave, which successfully approves the compromised hypervisor.

To defend against such cuckoo attacks, the TB enclave also needs to identify the TPM it is talking to, by means of the TPM’s Attestation Identity Key (AIK). And how does the TB enclave get the correct value of the AIK to compare against? This part requires external provisioning… and sounds pretty fiddly in practice:

Provisioning of AIKs could be done by system integrators. One has to introduce proper measures to prevent attackers from provisioning arbitrary AIKs. For example, the TB enclave could encode a list of public keys of approved system integrators, which are allowed to provision AIKs.

To prevent an attacker creating additional enclaves under the attacker’s control and directing legitimate TPM quote or TB enclave approval requests to these enclaves, the hypervisor hides the TPM as well as the TB enclave from the untrusted OS.

Only the legitimate TB enclave is given access to the TPM. Thus, the TB enclave might only succeed in hypervisor attestation if it has been legitimately launched by the hypervisor. Likewise, only legitimate driver enclaves are granted access to the legitimate TB enclave by the hypervisor. A driver enclave might only get approval if it can talk to the legitimate TB enclave, which implies that the driver enclave too has been legitimately launched by the hypervisor.

User verification

After all this work, how does a user know that they’re actually talking with the correct app via a trusted path? The answer relies on presenting a secret piece of information which is shared between the user and the app. For example, once a trusted path has been established to the screen, the app can display the pre-shared secret to the user via the screen driver.

Since the attacker does not know the secret information, he cannot fake this notification.

Once more though, we have to rely on an external provisioning mechanism to get the secret in place to start with:

This approach requires provisioning secret information to a user app, which seals it for later usage. Provisioning could be done once at installation time in a safe environment, e.g., with assistance of the hypervisor, or at any time via SGX’s remote attestation feature.

One last thing

I don’t know what Intel would think about this, but…

Intel’s licensing scheme for production enclaves might be too costly for small businesses or even incompatible with the open source idea. We show how to level up debug enclaves to behave like production enclaves in our model.

All production enclaves need to be licensed by Intel, whereas unlicensed enclaves can be launched in debug mode. Once launched though, the only difference between a debug and production enclave is the presence of SGX debug instructions.

The core of the idea is to intercept all SGX debug instructions in the hypervisor, so that only the trusted hypervisor itself can debug enclaves. See section 7 in the paper for the fine print…