8.5 Schemata and the Schema Theorem

The notion of schemata and the Schema theorem provide the theoretical background why genetic algorithms work so well in practice. The notion of schemata is fundamental to the following analysis. After introducing it the Schema theorem will be deduced.

The theoretical background of genetic algorithms is seen most clearly using the binary string representation and the classic algorithm (cf. Section 8.4.2). First the wildcard $ \star$ shall match any single component of a vector. A schema is built of the alphabet of genes (the fixed components) augmented by the wildcard symbol [49,77], e.g., $ (0,1,\star)$ matches $ (0,1,0)$ and $ (0,1,1)$ if the alphabet is $ \{0,1\}$. Hence a schema represents all strings of the search space which match the schema at all components except $ \star$.

The following analysis is performed for the classic genetic algorithm, whose inner loop is the following.

  1. $ t:=t+1$.
  2. Select the new population $ P(t)$ from the old one $ P(t-1)$ by single string selection based on a positive fitness function.
  3. Recombine $ P(t)$ by one point crossover and mutation.
  4. Evaluate $ P(t)$.
  5. Repeat.
First we give some definitions. Let $ \delta(S)$ denote the defining length of a schema $ S$, i.e., the distance between the first and the last fixed component. It measures how compact the information contained in a schema is. For example $ \delta\bigl((\star,\star,0,1,\star,1,0)\bigr)=4$. Let $ o(S)$ denote the order of the schema $ S$, i.e., the number of fixed components. For example $ o\bigl((\star,\star,0,1,\star,1,0)\bigr)=4$. Let $ \mathrm{eval}(S,t)$ be the fitness of the schema $ S$ at time $ t$, i.e., the average fitness of all strings in the population that are matched by $ S$. If $ S$ matches the $ n$ individuals $ \{ v_1, \ldots, v_n \}$, then $ \mathrm{eval}(S,t) := (1/k) \sum_{k=1}^n \mathrm{eval}(v_k)$, where $ \mathrm{eval}(v_k)$ is the fitness of a single individual $ v_k$. Finally $ F(t)$ is the total fitness of the whole population at time $ t$, i.e., $ F(t) :=
\sum_k \mathrm{eval}(v_k)$. $ N(t)$ is the size of the population at time $ t$.

In the single string selection step of the classic genetic algorithm considered, an individual $ v_k$ has the probability $ \mathrm{eval}(v_k)/F(t)$ of being selected. After the selection step it is expected that

$\displaystyle \xi(S,t+1) = \xi(S,t) \cdot N(t) \cdot \frac{\mathrm{eval}(S,t)}{F(t)}
$

strings match the schema $ S$, since $ S$ matches $ \xi(S,t)$ individuals, the number of single string selections is $ N(t)$, and the probability that an average string matched by $ S$ is selected in a single string selection is $ \mathrm{eval}(S,t)/F(t)$.

Introducing the average fitness of the population $ \overline{F}(t) :=
F(t)/N(t)$, this can be written as

$\displaystyle \xi(S,t+1) = \xi(S,t) \cdot \frac{\mathrm{eval}(S,t)}{\overline{F}(t)}.
$

The interpretation of this formula is that the number of individuals matching a schema $ S$ grows in each time step like the ratio of the fitness of $ S$ and the total fitness. Thus an above average schema, i.e., a schema with $ \mathrm{eval}(S,t) > \overline{F}(t)$, will proliferate successfully in the next generation, but a below average schema will not. In the long term the formula implies that above average schemata will proliferate exponentially.

The next step after selection is recombination, i.e., crossover and mutation, where new individuals are introduced into the population. The effects of crossover are considered first. A crossover site is selected uniformly among $ m-1$ possible sites, where $ m$ is the length of the vector representing an individual. Hence $ \delta(S) / (m-1)$ is the probability that the schema $ S$ is destroyed if a certain vector (or chromosome) undergoes crossover, and thus

$\displaystyle p_{cs}(S) \ge 1 - p_c \frac{\delta(S)}{m-1}
$

holds for the probability $ p_{cs}(S)$ of its survival, where $ p_c$ is the probability of crossover. $ p_{cs}(S)$ is greater equal, and not equal, to the right hand side, since it may happen that a schema is conserved after splitting it in middle if both ancestors happen to contain parts of the schema at the right places.

Thus taking into account selection and crossover, the number of individuals matching a schema $ S$ can be estimated by

$\displaystyle \xi(S,t+1) \ge \xi(S,t) \cdot \frac{\mathrm{eval}(S,t)}{\overline{F}(t)} \cdot
\left( 1 - p_c \frac{\delta(S)}{m-1} \right).
$

The final operator to be considered is mutation. Let $ p_m$ be the probability that a single component is changed. Since mutations are independent from one another, the probability that a schema $ S$ survives a mutation is

$\displaystyle p_{ms}(S) = (1-p_m)^{o(S)}.
$

Furthermore $ p_{ms} \approx 1 - o(S)p_m$, since $ p_m \ll 1$.

Hence the final estimation of $ \xi(S,t+1)$ is

$\displaystyle \xi(S,t+1)
\gtrapprox
\xi(S,t) \cdot \frac{\mathrm{eval}(S,t)}{F(t)} \cdot
\left( 1 - p_c \frac{\delta(S)}{m-1} - o(S)p_m \right),
$

which takes selection, crossover, and mutation in the classic genetic algorithm into account.

The Schema theorem can thus be stated as follows [40,77,13].

Theorem 8..1 (Schema Theorem)   The inequality

$\displaystyle \xi(S,t+1)
\gtrapprox
\xi(S,t) \cdot \frac{\mathrm{eval}(S,t)}{F(t)} \cdot
\left( 1 - p_c \frac{\delta(S)}{m-1} - o(S)p_m \right),
$

holds with the above notation. This means that short, low order, above average schemata receive exponentially increasing trials in subsequent generations of the classic genetic algorithm and below average schemata receive exponentially decreasing trials.

Finally it is noted that this inequality was derived for the classic genetic algorithm described at the beginning of this section and assuming that the fitness function is positive. A positive function can, however, always be achieved using a suitable transformation. This inequality gives insight how genetic algorithms work and why they can sample big search spaces very fast. Above average schemata receive exponentially increasing attention and they are combined by crossover. Furthermore mutation increases the variability of the population, but the disruptive effective of the crossover and mutation operators is not significant for short schemata.

Clemens Heitzinger 2003-05-08