Analysis of join-the-shortest-queue routing

Analysis of join-the-shortest-queue routing for web server farms – Gupta et al., 2007

What’s the best way to balance web requests across a set of servers? Round-robin is the simple algorithm that everyone knows, but there is a better way… This paper analyzes the Join the Shortest Queue (JSQ) routing policy and shows that it delivers near-optimal results.

It is assumed that the load balancer (dispatcher) immediately routes each incoming request to one of the servers in the farm, and that servers split their capacity equally over the requests they are currently processing. This is known as a processor-sharing (PS) scheduling policy.

We are thus interested in a PS server farm with immediate dispatch.

Although the paper considers the serving of static web pages, immediate dispatch and PS serving are equally applicable to web apps, REST APIs, and so on. The JSQ routing policy is widely used in commercial dispatchers.

Under JSQ, an incoming request is routed to the server with the least number of unfinished requests. JSQ thus strives to balance load across the servers, reducing the probability of one server having several jobs while another sits idle. From the point of view of a new arrival it is a greedy policy for the case of PS servers, because the arrival would prefer to share a server with as few other jobs as possible.
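
To make the policy concrete, here is a minimal sketch of a JSQ dispatcher (my own illustration, not code from the paper or from any particular load balancer; the class and method names are made up). The essential idea is just the greedy `min` over per-server counts of unfinished requests.

```python
class JSQDispatcher:
    """Minimal JSQ sketch: route each request to the backend with the
    fewest unfinished requests (illustrative, not a production dispatcher)."""

    def __init__(self, backends):
        # in_flight[b] = number of unfinished requests currently at backend b
        self.in_flight = {b: 0 for b in backends}

    def route(self):
        # Greedy choice: under PS serving, the backend with the fewest
        # unfinished requests gives a new arrival the largest share of capacity.
        backend = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[backend] += 1
        return backend

    def complete(self, backend):
        # Call this when a backend reports that a request has finished.
        self.in_flight[backend] -= 1


# Usage sketch:
dispatcher = JSQDispatcher(["server-1", "server-2", "server-3"])
target = dispatcher.route()      # forward the incoming request to `target`
# ... later, when the response comes back:
dispatcher.complete(target)
```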

Gupta et al. build a queueing model parameterized by the average request arrival rate (arrivals are assumed to follow a Poisson process), the number of servers, and the (mean) time a job takes to run on a server in isolation. In standard queueing notation this is an M/G/K/JSQ/PS model: Markovian (Poisson) arrivals, a General distribution of service times, K servers, JSQ dispatching, and PS serving.
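
If you want to play with the model yourself, the following is a toy discrete-event simulation of such an M/G/K/JSQ/PS farm, a sketch of my own rather than the paper's analysis or simulation code. Here `lam` is the total Poisson arrival rate into the dispatcher, `K` the number of unit-speed servers, and `job_size_sampler` draws service requirements from whatever distribution you like; each PS server divides its capacity evenly over the jobs it currently holds.

```python
import math
import random


def simulate_jsq_ps(lam, K, job_size_sampler, n_arrivals, seed=0):
    """Toy event-driven simulation of an M/G/K/JSQ/PS server farm.

    lam              -- total Poisson arrival rate into the dispatcher
    K                -- number of identical, unit-speed PS servers
    job_size_sampler -- function rng -> service requirement of one job
    n_arrivals       -- number of arrivals to simulate
    Returns (mean response time, mean number of jobs in the system).
    """
    rng = random.Random(seed)
    servers = [[] for _ in range(K)]   # per-server list of [remaining_work, arrival_time]
    now, area_jobs = 0.0, 0.0          # clock, and time-integral of jobs in system
    next_arrival = rng.expovariate(lam)
    arrivals = 0
    response_times = []

    while arrivals < n_arrivals or any(servers):
        # Next completion: at a PS server holding n jobs, each job is served at
        # rate 1/n, so the job with the least remaining work finishes first.
        next_completion, busy = math.inf, None
        for k, jobs in enumerate(servers):
            if jobs:
                t = now + min(j[0] for j in jobs) * len(jobs)
                if t < next_completion:
                    next_completion, busy = t, k

        t_arrival = next_arrival if arrivals < n_arrivals else math.inf
        t_next = min(t_arrival, next_completion)

        # Advance the clock, depleting work at every busy server.
        dt = t_next - now
        area_jobs += dt * sum(len(jobs) for jobs in servers)
        for jobs in servers:
            if jobs:
                share = dt / len(jobs)
                for job in jobs:
                    job[0] -= share
        now = t_next

        if t_arrival <= next_completion:
            # JSQ dispatch: route the arrival to the server with the fewest jobs.
            k = min(range(K), key=lambda i: len(servers[i]))
            servers[k].append([job_size_sampler(rng), now])
            arrivals += 1
            next_arrival = now + rng.expovariate(lam)
        else:
            # Remove the finished job (the one with the least remaining work).
            jobs = servers[busy]
            idx = min(range(len(jobs)), key=lambda i: jobs[i][0])
            done = jobs.pop(idx)
            response_times.append(now - done[1])

    return sum(response_times) / len(response_times), area_jobs / now
```

For example, `simulate_jsq_ps(8.0, 10, lambda rng: rng.expovariate(1.0), 50_000)` simulates ten servers at roughly 80% load with exponentially distributed job sizes of mean 1.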

Despite the ubiquity of JSQ/PS server farms, prior to this paper no one had analyzed the performance of JSQ in this setting.

The full analysis is heavy reading at points, but thankfully the key results are all summarized for us. Firstly, it is shown that a single-queue approximation (focusing on what happens at just one of the servers) can be used to understand the behaviour of the system as a whole, and furthermore that this approximation is exact if the job-size distribution is exponential.

Does the behaviour of the system depend on the distribution of job sizes (i.e. the mix of short and long-running requests)?

The JSQ/PS system shows near insensitivity to the variability of the job-size distribution… This is a non-trivial result, since very similar routing policies for PS server farms, like Least-Work-Left (sending the job to the host with the least total outstanding work) or Round-Robin, are highly sensitive to the job-size distribution.
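
You can get a feel for this near-insensitivity with the toy simulator above by holding the arrival rate and the mean job size fixed while cranking up the variability. The parameter choices below (ten servers at roughly 80% load, exponential vs. lognormal job sizes of mean 1) are arbitrary illustrations, not the paper's experimental setup:

```python
K, lam = 10, 8.0        # ten servers, offered load ~ lam * E[S] / K = 0.8

exp_jobs = lambda rng: rng.expovariate(1.0)                          # mean 1, squared CV = 1
sigma = 1.5                                                          # much higher variability,
lognorm_jobs = lambda rng: rng.lognormvariate(-sigma**2 / 2, sigma)  # still mean 1

for name, sampler in [("exponential", exp_jobs), ("lognormal", lognorm_jobs)]:
    mean_T, mean_N = simulate_jsq_ps(lam, K, sampler, 50_000)
    print(f"{name:>12}: mean response time {mean_T:.2f}, mean jobs in system {mean_N:.1f}")
```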

The model was tested against extensive simulations and found to be highly accurate:

…our analytical approximation method is always within 2.5% of simulation estimates for mean queue length and response time, under all job-size distributions examined.

(which also means you could use this model for capacity planning and/or response time estimations).
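
As a rough sketch of the capacity-planning idea, again using the toy simulator rather than the paper's analytical approximation (which would give such estimates without simulating at all), you could sweep the farm size until the predicted mean response time meets a hypothetical target:

```python
lam, target = 8.0, 1.5                         # hypothetical workload and response-time target
exp_jobs = lambda rng: rng.expovariate(1.0)    # mean job size 1 in isolation

for K in range(9, 17):                         # lam * E[S] = 8, so we need K > 8 for stability
    # Short run for illustration; increase n_arrivals for tighter estimates.
    mean_T, _ = simulate_jsq_ps(lam, K, exp_jobs, 20_000)
    print(f"K={K:2d}: mean response time ~ {mean_T:.2f}")
    if mean_T <= target:
        print(f"-> {K} servers meet the target of {target}")
        break
```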

So JSQ has some very nice properties, but is it actually a good choice for your load-balancing algorithm?

We show, via simulation, that it is unlikely there is a routing policy that outperforms JSQ by more than about 10%.

So there you have it: easy to implement, nearly insensitive to the job-size distribution, useful for modelling, and near-optimal. If you need to implement a load-balancing function for PS servers, JSQ is the way to go…