Particle Swarm Optimization | the morning paper

Particle Swarm Optimization – Kennedy & Eberhart 1995

What do flocks of birds, schools of fish, ant colonies, bat echo-location, firefly flashlights, cuckoos, and flower-pollination all have in common? They’ve all inspired optimisation algorithms! I’ve been perusing Xin-She Yang’s book on Nature-Inspired Optimization Algorithms and thought it might be fun to make a mini-series out of some of the algorithms featured there. So this week we’ll be getting a little closer to nature.

Let’s kick things off with the flocking behaviour of birds and of schools of fish, the same starting point as Kennedy and Eberhart took in 1995 on their journey towards the particle swarm optimization (PSO) algorithm…

A number of scientists have created computer simulations of various interpretations of the movement of organisms in a bird flock or fish school. Notably, Reynolds and Heppner and Grenander presented simulations of bird flocking. Reynolds was intrigued by the aesthetics of bird flocking choreography, and Heppner, a zoologist, was interested in discovering the underlying rules that enabled large numbers of birds to flock synchronously, often changing direction suddenly, scattering and regrouping, etc. Both of these scientists had the insight that local processes, such as those modeled by cellular automata, might underlie the unpredictable group dynamics of bird social behavior….

Could the same rules underlie animal social behaviour, including that of humans? Sociobiologist E.O. Wilson wrote that “In theory at least, individual members of the school can profit from the discoveries and previous experience of all other members of the school during the search for food. This advantage can become decisive, outweighing the disadvantages of competition for food items, whenever the resource is unpredictably distributed in patches.” The ideal that social sharing of information offers an evolutionary advantage was a key insight that led the authors towards the discovery of PSO. To model problems such as how human groups ‘flock’ in their beliefs and attitudes, Kennedy and Eberhart kept with the movement paradigm and simply added more dimensions. And thus the PSO algorithm began as a simulation of a simplified social milieu, in which agents were thought of as collision-proof birds.

A satisfying simulation was rather quickly written, which relied on two props: nearest-neighbor velocity matching and “craziness.” A population of birds was randomly initialized with a position for each on a torus pixel grid and with Xand Y velocities. At each iteration a loop in the program determined, for each agent (a more appropriate term than bird), which other agent was its nearest neighbor, then assigned that agent’s X and Y velocities to the agent in focus. Essentially this simple rule created a synchrony of movement.

To prevent the flock quickly settling on a unanimous direction, some craziness (randomness) was added back in to the X and Y velocities. This resulted in a lifelike appearence, but the variation was wholly artificial. The next step was to get rid of this artificiality. The insight leading to the breakthrough was that birds have a goal in mind…

While the idea of a roost was intriguing, it led to another question which seemed even more stimulating. Heppner’s birds knew where their roost was, but in real life birds land on any tree or telephone wire that meets their immediate needs. Even more importantly, bird flocks land where there is food. How do they find food? Anyone who has ever put out a bird feeder knows that within hours a great number of birds will likely find it, even though they had no previous knowledge of its location, appearance,etc. It seems possible that something about the flock dynamic enables members of the flock to capitalize on one another’s knowledge…

So in the next iteration, each agent was programmed to evaluate how good its current location was. The agents remembered the coordinates of the best location they had been too. As the agent moved through the simulation space, its X and Y velocities were adjusted by a random weight that moved it towards the best position it had seen.

There was also one unnatural (or so we believe!) element – a shared global consciousness so that each bird (agent) also knew the globally best position that any one member of the flock had found. On each iteration the X and Y velocities of an agent were also adjusted by a random weight to move the agent towards the global best. p-increment and g-increment represent the overall weights given to the agent-specifc and globally best positions respectively in this algorithm.

With p-increment and g-increment set low, the flock swirled around the goal,realistically approaching it, swinging out rhythmically with subgroups synchronized,and finally “landing” on the target.

Now it came time to simplify the algorithm, reducing it to the essential components that were able to simulate the desired behaviours. The algorithm worked just as well without craziness, so it was removed. The nearest neighbour velocity matching was also removed, and the algorithm works slightly faster without it – though the overall visual effect is more like a swarm than a flock.

The variables pbest and gbest and their increments are both necessary. Conceptually pbest resembles autobiographical memory, as each individual remembers its own experience (though only one fact about it), and the velocity adjustment associated with pbest has been called “simple nostalgia”in that the individual tends to return to the place that most satisfied it in the past. On the other hand, gbest is conceptually similar to publicized knowledge, or a group norm or standard, which individuals seek to attain. In the simulations, a high value of p-increment relative to g-increment results in excessive wandering of isolated individuals through the problem space, while the reverse (relatively high g-increment) results in the flock rushing prematurely toward local minima. Approximately equal values of the two increments seem to result in the most effective search of the problem domain.

The next step was a refinement to the velocity adjustments. Rather than simply testing current position relative to the known best and randomly adjusting velocity to move towards the best, velocities were adjusted relative to their distance from the known best location.

vx[][] = vx[][] + rand()*p-increment*(pbestx[][] - presentx[][])

It was soon realized that there is no good way to guess whether p – or g-increment should be larger. Thus, these terms were also stripped out of the algorithm. The stochastic factor was multiplied by 2 to give it a mean of 1,so that agents would “overfly” the target about half the time. This version outperforms the previous versions. Further research will show whether there is an optimum value for the constant currently set at 2, whether the value should be evolved for each problem, or whether the value can be determined from some knowledge of a particular problem. The current simplified particle swarm optimizer now adjusts velocities by the following formula:

vx[][] = vx[][] + 
         2 * rand()*(pbestx[][] - presentx[][]) + 
         2 * rand()*(pbestx[][gbest] - presentx[][])

The particle swarm optimization concept adheres to all five of Milonas’s principles of swarm intelligence;

Basic to the paradigm are n-dimensional space calculations carried out over a series of time steps. The population is respondingto the quality factors pbest and gbest. The allocation of responses between pbest and gbest ensures a diversity of response. The population changes its state (mode of behavior) only when gbest changes,thus adhering to the principle of stability. The population is adaptive because it does change when gbest changes.

The trained weights found by particle swarms sometimes generalize from a training set to a test set better than the solutions found by gradient descent. “For example, on a data set representing electroencephalogram spike waveforms and false positives, a backpropagation NN achieved 89 percent correct on the test data. The particle swarm optimizer was able to train the network so as to achieve 92 percent correct. ”

The particle swarm optimizer was compared to a benchmark for genetic algorithms in Davis: the extremely non-linear Schaffer f6 function. This function is very difficult to optimize, as the highly discontinuous data surface features many 1ocal optima. The particle swarm paradigm found the global optimum each run,and appears to approximate the results reported for elementary genetic algorithms in Chapter 2 of [Davis] in terms of the number of evaluations required to reach certain performance levels.

The authors conclude:

Particle swarm optimization is an extremely simple algorithm that seems to be effective for optimizing a wide range of functions. We view it as a mid-level form of A-life or biologically derived algorithm, occupying the space in nature between evolutionary search, which requires eons, and neural processing, which occurs on the order of milliseconds.