# 8.5 Schemata and the Schema Theorem

The notion of schemata and the Schema theorem provide the theoretical background why genetic algorithms work so well in practice. The notion of schemata is fundamental to the following analysis. After introducing it the Schema theorem will be deduced.

The theoretical background of genetic algorithms is seen most clearly using the binary string representation and the classic algorithm (cf. Section 8.4.2). First the wildcard  shall match any single component of a vector. A schema is built of the alphabet of genes (the fixed components) augmented by the wildcard symbol [49,77], e.g., matches and if the alphabet is . Hence a schema represents all strings of the search space which match the schema at all components except .

The following analysis is performed for the classic genetic algorithm, whose inner loop is the following.

1. .
2. Select the new population from the old one by single string selection based on a positive fitness function.
3. Recombine by one point crossover and mutation.
4. Evaluate .
5. Repeat.
First we give some definitions. Let denote the defining length of a schema , i.e., the distance between the first and the last fixed component. It measures how compact the information contained in a schema is. For example . Let denote the order of the schema , i.e., the number of fixed components. For example . Let be the fitness of the schema  at time , i.e., the average fitness of all strings in the population that are matched by . If matches the individuals , then , where is the fitness of a single individual . Finally is the total fitness of the whole population at time , i.e., . is the size of the population at time .

In the single string selection step of the classic genetic algorithm considered, an individual has the probability of being selected. After the selection step it is expected that

strings match the schema , since matches individuals, the number of single string selections is , and the probability that an average string matched by  is selected in a single string selection is .

Introducing the average fitness of the population , this can be written as

The interpretation of this formula is that the number of individuals matching a schema  grows in each time step like the ratio of the fitness of  and the total fitness. Thus an above average schema, i.e., a schema with , will proliferate successfully in the next generation, but a below average schema will not. In the long term the formula implies that above average schemata will proliferate exponentially.

The next step after selection is recombination, i.e., crossover and mutation, where new individuals are introduced into the population. The effects of crossover are considered first. A crossover site is selected uniformly among possible sites, where is the length of the vector representing an individual. Hence is the probability that the schema  is destroyed if a certain vector (or chromosome) undergoes crossover, and thus

holds for the probability of its survival, where is the probability of crossover. is greater equal, and not equal, to the right hand side, since it may happen that a schema is conserved after splitting it in middle if both ancestors happen to contain parts of the schema at the right places.

Thus taking into account selection and crossover, the number of individuals matching a schema  can be estimated by

The final operator to be considered is mutation. Let be the probability that a single component is changed. Since mutations are independent from one another, the probability that a schema  survives a mutation is

Furthermore , since .

Hence the final estimation of is

which takes selection, crossover, and mutation in the classic genetic algorithm into account.

The Schema theorem can thus be stated as follows [40,77,13].

Theorem 8..1 (Schema Theorem)   The inequality

holds with the above notation. This means that short, low order, above average schemata receive exponentially increasing trials in subsequent generations of the classic genetic algorithm and below average schemata receive exponentially decreasing trials.

Finally it is noted that this inequality was derived for the classic genetic algorithm described at the beginning of this section and assuming that the fitness function is positive. A positive function can, however, always be achieved using a suitable transformation. This inequality gives insight how genetic algorithms work and why they can sample big search spaces very fast. Above average schemata receive exponentially increasing attention and they are combined by crossover. Furthermore mutation increases the variability of the population, but the disruptive effective of the crossover and mutation operators is not significant for short schemata.

Clemens Heitzinger 2003-05-08