Speeding up Web Page Loads with Shandian – Wang et al. 2016
Despite its importance and various attempts to improve page load time (PLT), the end-to-end PLT for most pages is still a few seconds on desktops and more than ten seconds on mobile devices.
Page load times are very important for user experience and translate directly into commercial results for many sites: Shopzilla increased revenue by 12% by reducing PLT from 6 seconds to 1.2 seconds, and Amazon famously found that every 100ms of increase in PLT cost them 1% in sales. By determining and prioritising precisely the resources that are needed during initial page load, Shandian is able to demonstrate significant PLT improvements and remain compatible with caching and CDN services.
We evaluate Shandian on the top 100 Alexa web pages which have been heavily optimized by other technologies. Our evaluations still show that Shandian reduces PLT by more than half with a reasonably powerful proxy server on a variety of mobile settings with varied RTT, bandwidth, CPU power, and memory… Unlike many techniques that only improve network or computation, Shandian shows consistent benefits on a variety of settings. We also find that the amount of load-time state is decreased while the total amount of traffic is increased moderately by 1%.
The basic idea: a proxy server loads the page up to the load event, computes the resulting state, and sends just that to the browser for the initial page load. On the browser, the state is unmarshalled and used to display the initial page. The trick of course is to do all of this with low overhead, in a manner that is compatible with existing web infrastructure, and such that it does not break subsequent web page functionality (what happens after the load event…). Before you get too excited, Shandian does require a lightly modified client-side web browser.
A proxy server is set up to preload a web page… the preload is expected to be fast since it exploits greater compute power at the proxy server and since all the resources that would normally result in blocking transfers are locally available. When migrating state (logic that determines a Web page and the state of the page load process) to the client, the proxy server prioritizes state needed for the initial page load over state that will be used later, so as to convey critical information as fast as possible. After all the state is fully migrated, the user can interact with the page normally as if the page were loaded directly without using a proxy server.
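As a rough illustration of that prioritisation, here is a minimal Python sketch. The item names and the `needed_for_load` flag are invented for this example; the real system decides what counts as load-time state inside the browser engine, not from a hand-written flag.

```python
def migration_order(state_items):
    """Yield load-time state first, then post-load state.

    The relative order inside each group is preserved.
    """
    load_time = [s for s in state_items if s["needed_for_load"]]
    post_load = [s for s in state_items if not s["needed_for_load"]]
    yield from load_time
    yield from post_load


# Hypothetical pieces of page state, in the order the proxy discovered them.
STATE = [
    {"name": "analytics.js", "needed_for_load": False},
    {"name": "dom", "needed_for_load": True},
    {"name": "unused css rules", "needed_for_load": False},
    {"name": "matched css rules", "needed_for_load": True},
]

order = [s["name"] for s in migration_order(STATE)]
print(order)
```

Everything the initial render depends on goes over the wire before anything that only matters after the load event.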
Why are Page Loads so Slow?
While [techniques such as SPDY] are moderately effective at speeding up the individual activities corresponding to a page load, they have had limited impact in reducing overall PLT because they still communicate redundant code, stall in the presence of conflicting operations, and are constrained by the limited parallelism in the page load process.
Ideally a browser would fetch the web objects of a page fully in parallel, but this is often prevented by dependencies among web objects. Consider loading this sample page:
To understand inefficiencies in the Web page load process, we conduct a study on the top 100 Alexa pages by using Chrome (which is a highly optimized browser)…
- CSS files often contain rules that are never used in a page, or at least not used during initial page load. In the top 100 sites, 75% of CSS rules are unused in the median case. “Surprisingly, 80% and 96% of CSS rules are unused for google.com and facebook.com respectively.”
- 80% of pages have sequentially loaded Web objects on the critical path.
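To see why sequentially loaded objects on the critical path matter, here is a small Python sketch that finds the longest dependency chain in a page. The object names, load times, and dependency edges are all made up for illustration and are not taken from the paper.

```python
from functools import lru_cache

# Hypothetical page: object -> (load time in ms, objects it must wait for).
PAGE = {
    "index.html": (100, []),
    "style.css": (80, ["index.html"]),
    "app.js": (120, ["index.html", "style.css"]),  # script blocked on CSS
    "hero.png": (200, ["index.html"]),
    "data.json": (90, ["app.js"]),  # fetched by the script
}


def critical_path(page):
    """Return (total ms, chain) for the longest dependency chain."""

    @lru_cache(maxsize=None)
    def finish(name):
        cost, deps = page[name]
        # An object can only start once its slowest dependency has finished.
        best = max((finish(d) for d in deps), default=(0, ()))
        return (best[0] + cost, best[1] + (name,))

    return max(finish(name) for name in page)


total_ms, chain = critical_path(PAGE)
print(total_ms, " -> ".join(chain))
```

In this toy page the slowest single object takes 200ms, but the chain index.html, style.css, app.js, data.json takes 390ms end to end. Extra bandwidth or more parallel connections cannot shorten that chain; only removing dependencies can.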
Capturing Page Load State
Precisely identifying the state that is needed during a page load (load-time state) is non-trivial since load-time state and post-load state are largely mingled.
The Web page rendered using Shandian also needs to be functionally equivalent to one that is computed solely on the client, so the server needs the relevant client-side state (e.g. browser size, cookies, HTML5 local storage) to function properly.
CSS evaluation is also slow and should be avoided during initial page load as much as possible. The result of CSS evaluation is unfortunately often a detailed and unwieldy list of styles for each HTML element. The most expensive part is the CSS selector matching step that matches the selectors of all the CSS rules to each HTML element:
Our design decision here is to perform CSS parsing and matching on the server, but leave style computations to be performed on the client. We migrate all the inputs required by style computations as part of load-time state.
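The split between matching on the server and style computation on the client can be sketched in Python as follows. This is a toy model under heavy assumptions: selectors are limited to a tag, a `.class`, or an `#id`, the specificity ordering is simplified, and all function and field names are hypothetical rather than Shandian's.

```python
def matches(selector, element):
    """Match a simple selector (tag, .class, or #id) against an element."""
    if selector.startswith("#"):
        return element.get("id") == selector[1:]
    if selector.startswith("."):
        return selector[1:] in element.get("classes", [])
    return element["tag"] == selector


def specificity(selector):
    """Simplified specificity: id beats class, class beats tag."""
    return 2 if selector.startswith("#") else 1 if selector.startswith(".") else 0


def server_match(rules, elements):
    """Proxy side: for each element, keep only the rules that match it."""
    return {
        el["key"]: [r for r in rules if matches(r["selector"], el)]
        for el in elements
    }


def client_compute_style(matched_rules):
    """Client side: apply the cascade; the stable sort lets later rules win ties."""
    style = {}
    for rule in sorted(matched_rules, key=lambda r: specificity(r["selector"])):
        style.update(rule["declarations"])
    return style


RULES = [
    {"selector": "p", "declarations": {"color": "black", "margin": "0"}},
    {"selector": ".note", "declarations": {"color": "gray"}},
    {"selector": "#footer", "declarations": {"color": "blue"}},  # never matches
]
ELEMENTS = [
    {"key": "e1", "tag": "p", "classes": ["note"]},
    {"key": "e2", "tag": "p"},
]

matched = server_match(RULES, ELEMENTS)
styles = {key: client_compute_style(rules) for key, rules in matched.items()}
print(styles)
```

The `#footer` rule never reaches the client because nothing on the page matches it, which is where the savings from unused CSS rules come from. The client only does the cheap part, merging declarations it has already been told apply.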
Once the page load event fires in the Shandian proxy, the HTML elements in the DOM and the matched CSS rules are serialized to JSON.
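A minimal Python sketch of what such a serialization round trip could look like is below. The field names (`tag`, `attrs`, `matched_rules`, `children`) are assumptions made for this example; the post does not describe Shandian's actual wire format.

```python
import json

# Hypothetical load-time state: a DOM tree whose nodes carry their matched rules.
load_state = {
    "tag": "body",
    "attrs": {},
    "matched_rules": [{"selector": "body", "declarations": {"margin": "0"}}],
    "children": [
        {
            "tag": "p",
            "attrs": {"class": "note"},
            "matched_rules": [
                {"selector": ".note", "declarations": {"color": "gray"}}
            ],
            "children": [],
            "text": "Hello",
        }
    ],
}


def count_nodes(node):
    """Count the elements in a (de)serialized DOM tree."""
    return 1 + sum(count_nodes(child) for child in node["children"])


wire = json.dumps(load_state, separators=(",", ":"))  # proxy side: marshal
restored = json.loads(wire)  # client side: unmarshal
print(len(wire), count_nodes(restored))
```

The client receives elements and their already-matched rules together, so it can build the DOM and compute styles without parsing HTML or CSS itself.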
Here’s an example of Shandian load state and page loading for the example we saw previously:
Following the initial load, all the other resources needed by the page must be made available in a way that makes the behaviour of the page identical to one loaded without Shandian.
For pages that use document.write, this approach can no longer work (such use is considered bad practice), and Shandian is disabled for such pages.
The Shandian reverse proxy is implemented as a webserver extension based on Chrome’s content shell, with most modifications made to Blink and a few to V8. “The client-side browser is also based on Chrome, and we modify it as little as possible.”
The evaluation shows that:
- Shandian significantly improves PLT under a variety of scenarios
- Shandian does not significantly hurt data usage, and
- The amount of client-side state that needs to be transferred to the server is small.
Page load times for the mobile home pages of the Alexa top 100 websites, measured using a modified Android Chrome, are reduced by up to 60% in the median case.
Using local assets and Dummynet to emulate varying bandwidths and RTTs showed that Shandian is insensitive to RTT, that bandwidth is not a limiting factor of PLTs, and that varying CPU and memory has the same impact for both Shandian and Chrome.
In summary, Shandian significantly improves PLT compared to Chrome under a variety of realistic mobile scenarios. This is rare since most techniques are specific to improve one of computation and network. But Shandian improves both…
The size of the total data loaded by the browser (pre and post load) increases by 7% before compression when using Shandian, but this drops to only 1% with standard gzip compression.
Shandian is compatible with existing latency-reduction techniques with notable examples of caching and CDNs.