5.4.2 Simulated Annealing Algorithms

Simulated Annealing is a stochastic computational method for finding global extrema of large optimization problems. It was first proposed as an optimization technique by Kirkpatrick in 1983 [102] and Cerny in 1984 [103]. The optimization problem can be formulated as a pair $ \left(C,f\right)$, where $ C$ describes a discrete set of configurations (i.e. parameter values) and $ f$ is the objective function to be optimized. The problem is then to find a set $ \left\{C_\mathrm{opt}\right\}$ such that $ f\left(c\right)$ is optimal for all $ c \in \left\{C_\mathrm{opt}\right\}$.

The optimization algorithm is based on an analogy to Physical Annealing. Physical Annealing is a process in which a solid is first heated until all particles are randomly arranged in a liquid state, followed by a slow cooling process. At each (cooling) temperature enough time is spent for the solid to reach thermal equilibrium, where the occupation of energy levels follows the Boltzmann distribution. As the temperature decreases, the probability concentrates on low-energy states. Care must be taken to reach thermal equilibrium prior to decreasing the temperature. At thermal equilibrium, the probability that the system is in a macroscopic configuration $ i$ with energy $ E_i$ is given by the Boltzmann distribution

$\displaystyle p_i = \frac{\exp\left(-\frac{E_i}{kT}\right)}{S},$ (5.9)

with the partition function $ S$, summing over all possible configurations (or states),

$\displaystyle S = \sum^N_{i=1}\exp\left(-E_i/kT\right).$ (5.10)

Here $ T$ is the absolute temperature, $ k$ the Boltzmann constant, $ N$ the total number of configurations, and

$\displaystyle \sum_{i=1}^N{p_i} = 1.$ (5.11)

The behavior of a system of particles can be simulated using a stochastic relaxation technique developed by Metropolis et al. [104]: Starting at time $ t$ in configuration $ q$, a candidate configuration $ r$ for time $ t+1$ is generated randomly. The new candidate is accepted or rejected based on the difference between the energies associated with states $ q$ and $ r$. The condition for $ r$ to be accepted unconditionally is

$\displaystyle p = \frac{p_r}{p_q} = \exp\left(-\frac{E_r - E_q}{kT}\right) > 1.$ (5.12)

If $ p \le 1$, then $ r$ is accepted with probability $ p$. It was shown that for $ t\to\infty$ the probability that the system is in configuration $ s$ equals $ p_s$ [105]. One feature of the Metropolis algorithm is that a transition out of a local minimum is always possible at nonzero temperature. An equally interesting property of the algorithm is that it performs a kind of adaptive divide-and-conquer: gross features of the system appear at higher temperatures, fine features develop at lower temperatures.
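The Metropolis acceptance rule can be sketched in a few lines of Python; the function name and interface are illustrative, with the Boltzmann constant folded into the temperature:

```python
import math
import random

def metropolis_accept(E_q, E_r, T, rng=random):
    """Metropolis criterion (cf. Eq. 5.12, with k folded into T):
    a candidate with lower or equal energy is always accepted;
    an uphill move is accepted with probability exp(-(E_r - E_q)/T)."""
    if E_r <= E_q:                       # p >= 1: always accept
        return True
    return rng.random() < math.exp(-(E_r - E_q) / T)
```

Because the exponential is strictly positive for any nonzero $ T$, an uphill move always has a nonzero chance of being accepted, which is precisely what allows the walk to escape local minima.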

Optimization by Simulated Annealing involves the following preparatory steps:

  1. The analogy between the physical system and the optimization problem must be established. The energy function becomes a so-called cost function, the configuration of particles becomes the configuration of the parameters of the problem to be optimized, and the temperature becomes the control parameter of the optimization.

  2. An annealing schedule needs to be selected that defines a decreasing set of temperatures and also the amount of time to spend at each temperature.

  3. A way to generate and select new states must be defined.

The optimization algorithm therefore comprises three basic functional relationships: a probability density $ g(\vec{x})$, where $ \vec{x}$ is a $ D$-dimensional vector, the acceptance function $ h(\vec{x})$, and the annealing schedule function $ T(k)$ with the time step $ k$. The optimization itself proceeds iteratively. Initially, the algorithm starts from a randomly chosen point, for which the cost is computed. Next, a new point is drawn from a random number generator with probability density $ g(\vec{x})$. If the cost of this point is better than that of the current one, the new point is adopted. If the cost is worse, the point is accepted with probability $ h(\vec{x})$. Each new point is generated based on the best point found so far. With each iteration the probabilities both for large deviations from the best point and for accepting a worse point decrease. As a result, distant points are explored at the beginning (high temperature), but are generated less often and rejected more often as the temperature cools down.
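The iteration just described can be sketched as a generic loop; the helper names (`cost`, `generate`, `accept`, `schedule`) are illustrative assumptions standing in for the problem-specific cost function and for $ g$, $ h$, and $ T(k)$:

```python
import math
import random

def simulated_annealing(cost, x0, generate, accept, schedule, n_steps, rng):
    """Generic annealing loop: draw a candidate around the best point so
    far, adopt it if it improves the cost, and otherwise adopt it only
    with the acceptance probability at the current temperature."""
    best_x = x = x0
    best_E = E = cost(x0)
    for k in range(1, n_steps + 1):
        T = schedule(k)                   # annealing schedule T(k)
        cand = generate(best_x, T, rng)   # new point based on the best so far
        E_cand = cost(cand)
        if E_cand < E or rng.random() < accept(E, E_cand, T):
            x, E = cand, E_cand           # take the candidate over
            if E < best_E:
                best_x, best_E = x, E
    return best_x, best_E
```

For example, minimizing $ (x-3)^2$ with Gaussian generation, exponential acceptance, and a logarithmic schedule should approach $ x = 3$.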

For the standard Boltzmann annealing $ g(\vec{x})$, $ h(\vec{x})$ and $ T(k)$ are given by

$\displaystyle g(\vec{x})$ $\displaystyle =$ $\displaystyle (2\pi T)^{-\frac{D}{2}}\exp\left(-\frac{\left(\Delta{\vec{x}}\right)^2}{2T}\right),$ (5.13)
$\displaystyle h(\vec{x})$ $\displaystyle =$ $\displaystyle \frac{1}{1+\exp\left(\frac{E_{k+1}-E_k}{T}\right)},$ (5.14)
$\displaystyle T(k)$ $\displaystyle =$ $\displaystyle \frac{T_0}{\ln{k}}$ (5.15)

with the deviation $ \Delta{\vec{x}} = \vec{x}-\vec{x}_0$ of the new state from the previous one. Fig. 5.14 and Fig. 5.15 depict plots of the probability density $ g(\vec{x})$ and of the acceptance function $ h(\vec{x})$, respectively, for a one-dimensional optimization problem.
Figure 5.14: Generation probability density. This plot shows $ g(\vec{x})$ for a one-dimensional optimization problem. $ T$ denotes the normalized temperature. For low temperatures the generation probability for points with large distances $ \Delta{\vec{x}}$ decreases rapidly.
\begin{figure}\centering\psfig{file=pics/siman-g, width=0.7\linewidth}\par\end{figure}

Figure 5.15: Acceptance function. This plot shows the acceptance probability $ h(\vec{x})$. For low temperatures the probability for a distant state to get accepted is very low.
\begin{figure}\centering\psfig{file=pics/siman-h, width=0.7\linewidth}\par\end{figure}
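The three Boltzmann-annealing relationships can be written down directly; the following one-dimensional sketch (function names are illustrative) reproduces the qualitative behavior shown in Fig. 5.14 and Fig. 5.15:

```python
import math

def g(dx, T, D=1):
    """Generation probability density, Eq. (5.13), for a step dx."""
    return (2.0 * math.pi * T) ** (-D / 2.0) * math.exp(-dx ** 2 / (2.0 * T))

def h(dE, T):
    """Acceptance probability, Eq. (5.14), for a cost change dE = E_{k+1} - E_k."""
    return 1.0 / (1.0 + math.exp(dE / T))

def T_boltzmann(k, T0=1.0):
    """Logarithmic annealing schedule, Eq. (5.15); meaningful for k >= 2."""
    return T0 / math.log(k)
```

Lowering $ T$ suppresses both the generation of distant points and the acceptance of cost increases, which is exactly the freeze-out behavior visible in the two figures.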

Several annealing schedules other than the one defined in the original algorithm have evolved over the years. A fixed schedule, characterized by a fixed initial temperature, a constant temperature decrement, and a constant number of steps at each temperature, is usually not practical for general optimization problems, since it requires several trial runs with different parameter values. Adaptive algorithms, which change their parameters during the evolution of the cost function, are more appropriate. Such an adaptive algorithm was presented by Huang [106], where the parameters are chosen according to the mean value and standard deviation of the cost function. Also, in case the evaluation of the cost function itself is computationally expensive, a parallelized version of the Simulated Annealing algorithm is preferred over a sequential one.
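As an illustration only (the exact update rule and the factor below are assumptions, not taken from [106]), an adaptive decrement in this spirit could scale the cooling step by the spread of the costs sampled at the current temperature:

```python
import math
import statistics

def next_temperature(T, costs, lam=0.7):
    """Hedged sketch of an adaptive temperature decrement: the larger
    the cost spread sigma relative to T, the gentler the cooling.
    lam and the exponential form are illustrative assumptions."""
    sigma = statistics.pstdev(costs)
    if sigma == 0.0:             # flat cost samples: fall back to a fixed factor
        return 0.9 * T
    return T * math.exp(-lam * T / sigma)
```

A schedule of this kind cools quickly while the cost landscape is rough and slows down as the system freezes, without requiring trial runs to tune a fixed decrement.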