Rev.Acad.Canar.Cienc., V (Núm. 1), 95-110 (1993)
G·enetic algorithms
applied to recognition problems
H. Van Hove
A. Verschoren *
University of Antwerp, RUCA
Department of Mathematics and Computer Science
Groenenborgerlaan 171
B-2020 Antwerp, Belgium
l. Introduction
Visual information usually contains a large amount of redundant data. For
example, if we represent the numbers O to 9 by 6 X 9 bitmaps (printer characters,
say), then we have, in principie, to consider 54 individual bits; in
order to recognize a single character. In this way, we clearly perform too
much work, since these 54 bits describe 254 "" 1016 possible bitmaps, whereas
we are only interested in the much smaller set representing {O, ... , 9}. A
better strategy thus consists in trying to discover significant bits that permit
to differentiate between these bitmaps ( triggers). The recognition procedure
for a character will then consist in testing these particular bits. Checking
whether a specific trigger is "on" or "off" may then lead, for example, to
the conclusion that the character to recognize belongs to {O, 1, 3, 7} or to
{2, 4, 5, 6, 8, 9}, say. Within these classes other triggers may then be used in
order to differentiate between O and 3 or 1 and 7, for example. It is clear that
the recognition procedure for bitmaps may thus be modeled into the form of
a binary tree, where each node will correspond to a test on a particular bit.
*This text is an expanded version of a talk given by the second author at the University
of La Laguna. He wishes to thank the organizers far their warm hospitality.
95
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
Somewhat more formally, let us start from a collection of objects S, the
problem space (for example 1024 x 1024 bitmaps), endowed with a family
of maps f; : S ---+ {O, 1 }; the feature maps (for example, the individual bits
for these bitmaps or a subset of these). Consider also a so-called test space
T ~ S, whose objects w~ want to recognize (a particular sample of 1024x1024
bitmaps; for example). We want the objects in T to be determined by the
values of the triggers f;, i.e, we want for any t', t" E T that f;(t') = f;(t") for
all i implies t' = t".
The recognition problem may then naively be formulated as: find a minimal
collection of feature maps amongst the f;, which still permits to differentiate
between the objects in T. Of course, the term "minimal" is rather ambiguous
here. lndeed, we may refer to tbe number of triggers needed or the number
of tests to be actually performed, but also to the cost of performing these
tests ( calculation time or implementation difficulty). Moreover, the evaluations
may be mutually dependent or may have to be performed in a specific
order. It is thus clear that we should work directly with so-called recognition
trees, which permit to recognize the objects in T. More precisely, we will
be interested in binary trees of the forro below, whose nodes correspond to
certain triggers ( or tests), whose edges are labeled by { +, - } ( expressing the
outcome of the test corresponding to the node from where they start) and
whose final nodes, labeled by *• should correspond bijectively to the objects
to be recognized.
P1
+/ '\.-
* P2
+/ '\.-
* p3
+/ '\.-
* *
We assume to be given an externally defined fitness function, whose value
expresses the overall value of the recognition tree with respect to the particular
problem (e.g., the number of objects in T which are actually recognized
by the tree).
We will show in this text how a suitable version of genetic algorithms yields
a solution to this problem, i.e., allows to construct recognition trees which
behave optimally with respect to the objective fitness function. For a more
96
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
detailed treatment and other applications, we refer to [9, 10, 11, 12] and work
in progress.
2. Genetic Algorithms
(2.1) Genetic algorithms, introduced by J. Holland [5], are search algorithms
based upan the mechanism of natural selection, as present in the
general principies of Darwinism and genetics. In particular, they simulate
ideas like "survival of the fittest", "mutation" and "structured information
exchange". We refer to [4] for a survey of sorne of the domains, where they
are succesfully applied.
The power of genetic algorithms resides in the following characteristics:
• genetic algorithms do not work directly upan the parameters of a problem,
but on a suitable en~oding of these;
• genetic algorithms do not work with a single object, but with a population
of objects;
• genetic algorithms are "blind"; they do not work with special, additional
information or characteristics, but only with direct information,
given by the evaluation of an ( objective!) fitness function;
• genetic algorithms do not work deterministically, but use randomized
(probabilistic) transition rules.
To illustrate this, consider the "problem" of maximizing f( x) = x2 on the
interval [O, 31]. lnstead of using traditional techniques (like hill-climbing and
gradient methods ), all of which are non-robust, one encades the parameter
space [O, 31] by using 5-digit binary numbers and one works afterwards on
the individual bits. One then starts from a random population of 5-digit
binary strings, which is modified through the probabilistic genetic operators
described below. Working with populations implies an implicit form of
parallelism, which allows to find global extrema instead of local ones, thus
eliminating one of the main pitfalls of classical algorithms.
(2.2) The simple genetic algorithm uses three elementary operators: reproduction,
crossover and mutation. If we restrict ourselves to these ·operators,
97
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
the algorithm proceeds as follows: Start from a (random) population A(O)
of N binary strings of length e, i.e., strings only consisting of O and l. We
denote the set of these (there are 2t of them) by Re . Each string A; E A(O)
is given a fitness value f;, determined by the fitness map f : Rt --t R+,
which describes the "quality" or "strength" of each string, depending upon
the problem to be solved.
During the reproduction phase, the algorithm selects strings in A(O), where
the probability of a particular string being selected is proportional to its
fitness. Reproduction thus obviously mimicks the "survival of the fittest"
principle.
To each pair of selected strings, one then applies crossover with a certain
probability Pe, by choosing randomly a crossover site 1 :S k :S e - 1, cutting
both strings into two parts at this site and interchanging the pieces thus
obtained. For example, if we start from the strings
A1 : 1100 l 101
A2 : 0110 l 110
where the crossover site (k = 4) is indicated by a vertical line (I), then we
obtain, after crossover, two new strings
A;:ll00110
A~ : 0110101
After this process, individual bits ("genes") are mutated (O --t 1, 1 --t O), with
a very small probability, say Pm = 0.001. The latter operation is introduced in
order to really induce changes, not inherently present within the components
of the initial population and to reintroduce properties that got lost while
applying the other operators.
We thus obtain a new population, denoted by A(l), on which the genetic
algorithm may be applied again. For a precise description of the convergence
features of this algorithm, we refer to [4, 5, et al].
(2.3) Given a population, what kind of information is contained within the
strings in this population, that permits (through the fitness function) to
apply directed search towards optima! values? If we consider the following
example:
98
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
string fitness
01101 169
11000 576
º~ººº 64
10011 361
(which corresponds to the example f: {O, ... ,31}-+ R: x r-+ x 2 , of course),
then we obviously observe that strings starting with 1 have the highest fitness
and that, of these, strings starting with 11 score better than those starting
with 10. We thus see that individual strings possess local information (contained
in the fitness of that string), while a population of strings possesses
global information ( the fitness of the composing strings, combined with structural
information).
This structural, global information may be described through the use of
schemata. A length l schema ( corresponding to a population of length l
strings) is a length C string consisting of the symbols O, 1 and O, where the
latter should be viewed as a "wild card" ( don't know, don't care) symbol.
We say that a length l string A satisfies the schema H ( or that H represents
the string A), and we denote this by H -+ A, if A and H coincide whenever
possible, i.e., in these places where H has a symbol different from D.
For example, it is clear that
0100
0110
1100
1110
If H has order o(H), i.e., if H has exactly o(H) bits different from O, then
there are exactly 2t-o(H) strings A with H -+ A. Let us call defining length
of a schema ( denoted by 8 ( H)) the distan ce between the first and the last
significant bit of H . One may then show, cf. [4, 5], that schemata with high
fitness (i.e., schemata which represent strings with fitness above average), low
order and low defining length (which are usually referred to as buil<f,ing blocks)
will receive .an essentially exponentially increasing number of representatives
throughout the successive generations produced by the genetic algorithm.
Without going into numerical or combinatorical details, let us just point out
the following example. Consider the schemata
99
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
string o(H) ó(H)
H1: 1 o o o o 2 4
H2: o o 1 1 o 2 1
H3: D 1 o l o 4 3
Applying crossover on strings represented by these schemata, it is intuitively
clear that those represented by H1 and H3 have highest probability of being
destroyed (i.e., applying crossover to a string satisfying the schema will not
necessarily yield a new string represented by it ). The reason for this comes
from the fact that o(H3 ) and 8(H1 ) are "high". Somewhat more precisely,
for any schema H, let us denote by m(H, t) the number of representatives of
H within the population A(t). The influence of the genetic operators may
then be descri bed by
J(H) ó(H)
m(H, t + 1) 2 m(H, t)¡(l - Pe f, _ l - o(H)pm),
where J(H) is the average fitness of the strings represented by H and] is the
average fitness of the whole population A(t). From this the above Schema
Theorem follows easily.
3. Recognition trees
(3.1) Consider a problem space S endowed with a family of feature. maps
Ji, ... , fr : S -t {O, 1 }. A pattern over S is a string p = a1 ... ar, where a; E
{O, 1, O} should be viewed as the value (if determined) of the corresponding
feature map f;. We say that s E S satisfies the pattern p = a1 ... ar if
f;(s) =a; for all l :Si :Sr with a;# D. We write T(p,s) = + or T(p,s) = -
depending on whether s satisfies p or not. A recognition tree of width n is
a binary tree with n terminal nodes (labeled by * ), whose other nodes are
labeled by patterns and whose edge pairs are labeled by { +, - }. We denote
the width of a recognition tree A by w(A).
100
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
Example (n = 4)
P1
+/ '\.-
* P2
+/ '\.-
p3 *
+/ '\.-
* *
For each non-terminal node, labeled by the pattern p say, the edge labeled
by + resp. - starting in this node will be referred to as the left resp. right
edge starting in p.
(3.2) Recognition trees may be encoded as follows. First, if A is a 1-
recognition tree, i.e., if T has just one (final!) node, then we put Ene( A) = *·
If A is a 2-recognition tree, then it is of the form
* *
and we encocle it as Ene( A) = (p * *). In general, if A is an n-recognition
tree with n 2: 2, say with top labeled by sorne pattern p, then the left and
right edge starting in p define "left" and "right" subtrees A+ resp. A_ of A.
In a picture:
P,/,_
___'\_.,
A_
We then define (recursively) Enc(A) = (p Enc(A+) Enc(A_)) .
For example, the 4-recognition tree A given by
PI
+/ '\.-
* P2
+/ '\.-
p3 *
+/ '\.-
* *
101
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
is encodcd a~ Ene( A) = (711 * (¡12(P:i* * )*) ). F:<tch (non-terminal) nodc (labeled
by thc pattern) p i11 a trPe dctl'rmincs a subtrce Ap with top node p. We
cal] Enc(J\p) the block determi11Pd by p. Por example, in the previous tree,
(pi.(p3 * * )*) is the block dctermi11Pd by p2 •
(3.3) lf T <::::Sin a test space, then we say that a recognitio11 tree A (with
w(A) =I T 1) recognizes T if the objects t ET correspo11d bijcctively to the
final nodes of A ancl if they rf'alizc cach test in the unique path from the top
of A corrcsponcling to llw terminal node corresponcling to t.
Por example, if t E T sho11ld corrcspond to the terminal node as indicated in
the figure below:
then we want:
* 112
+/ "'-
]J3 *
+/ "'-
* * <-'> t
r(71 1,1) = -
r(pz, l) = +
r(p3, l) = -
4. Genetic operators
( 4.1) Let us start from a population A(O ) of ( n- )recognition trees and assume
we are given a fitnr·ss f1mclion f, which to each recognition tree A
associates its fitness J(A). This fitness function will usually consist of a local
component (depending on the patterns occurring in the recognition tree A)
and a global component ( clepending mainly on the structure of A itself).
Por example, if we want to find "optima]" recognition trees in the problem
of recognizing a test space T <:::: S as in the previous section, then a typical
fitness function coulcl be given by lett ing
f(A) =ne E( A)+ ar(I T 1 -r(A, T))
102
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
for each recognition tree A. Here E(A) denotes the average (or total) cost of
evaluating the tests corresponding to the patterns in the tree, while r(A, T)
denotes the number of elements in T recognized by A. The constants O:'e
and O:'r are user defined parameters, which should be calibrated in terms of
the user's desiderata (what do we prefer: recognizing a maximal number of
objects in T or minimizing the global cost of the recognition process?).
Let us now introduce, as in [10], operators to be used by the genetic algorithm.
( 4.2) Reproduction. The reproduction operator in A(O) and its successive
generations works in exactly the same way as for the ordinary genetic
algorithm. So, recognition trees will be selected with a probability proportional
with their fitness. If we denote this probability by PA, then
f(A)
PA =
LBE.A(O) J(B).
Again, the argument for this resides in the fact that recognition trees with
high fitness are likely to produce offspring of at least equal (and possibly
higher) fitness. This selection procedure is usually implemented by using a
"roulette model" , i.e., one views the recognition trees as being distributed
over a roulette, each one occupying a sector of surface proportional to its
fitness.
( 4 .3) Crossover. The crossover operator is the analogue for recognition
trees of the usual crossover for strings. For any node (labeled by a pattern)
p in a recognition tree A, one denotes by nA(P) the width of the subtree
of A with top p. If two recognition trees A and A' are selected (through
reproduction, for example ), then one selects in A randomly a non-terminal
node. If nA1(p1 ) i= nA(P) for ali nodes p' in A', crossover <loes nothing.
Otherwise, it selects p' randomly with nA (p') = nA (p). One then interchanges
in A and A' the subtrees Ap and A~, with top p resp. p' and thus obtains two
new recognition trees B and B'.
103
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
So, in a picture, starting from:
yields
PI
+¿ \.-
/* P2
+¿ \.-
* p3
+¿ \.-
*
PI
+¿ \.-
*
* p~ +
+¿ \.-
p; *
+¿ \.-
* *
+
p~
+¿ \.-
p~ *
+¿ \.-
p; *
+¿ \.-
* *
p~
+¿ \.-
P2 *
+¿ \.-
* p3
+¿ \.-
* *
The encoding Enc(B) of B is found from that of A by exchanging the block
defined by pin Enc(A) by the block de:fined by p' in Enc(A').
Example: If Ene( A) = (PI* (p2 * (p3 * *))) and Ene( A') = (p~ (p~(p; ** )* )* ),
then (if crossover is performed with respect to p2 and p~) we obtain new
recognition trees B and B' with Enc(B) = (PI *(P~(P;**)*)) and Enc(B') =
(p~ (P2 * (p3 * *)) *).
( 4.4) Switch. The switch operator for recognition trees may be viewed as
a global analogue of the mutation operator on strings. It depends on the
choice of a node pin a recognition tree A. If pis a terminal node, the switch
operator <loes nothing. Otherwise, it interchanges the left and right edges
starting in p, in order to produce a new recogniton tree B.
104
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
In a picture:
P1 p~
+/'\.- +/'\.-
P2 * ::::} p~ *
+ / \. - +/ '\.-
* p3 p; *
+/ '\.- +/'\.- ,
* * * *
The encoding Ene(B) of B may be found from that of A by interchanging
in Ene(A) the second and third component in the block determined by p.
Example : If A is encoded by Ene(A) = (p1(p2 * (p3 * *))*) then (if switch
is applied in P2), we find for B the encoding Ene(B) = (p1(P2(p3 * *)*)*).
( 4.5) Translocation. The translocation operator for recognition trees may
be viewed a an "auto-crossover" operator; it should be compared to linear inversion
[3) and translocation [8) for the ordinary genetic algorithm on strings.
It depenás on the choice of a node pin a recognition tree A. If pis a terminal
node, then the translocation operator does nothing. Otherwise, it chooses
randomly a terminal node in A which does not belong to the subtree Ap of A
with top p and it interchanges this terminal node and Ap, and thus producing
a new recognition tree B.
In a picture:
PI
+/'\.-
P2 p3
+/ '\.-+/ '\.-
* * * *
::::}
p~
+/ '\.-
* p3
+/ '\.-
P2 *
+/ '\.-
* *
The encoding Ene( B) of B may be found from that of A by interchanging
in Ene( A) sorne * and the block determined by p.
Exam ple : If Ene( A) = (p1 (p2 * *) (p3 * *)) and if we apply translocation
as in the previous example, then we obtain a new recognition tree B with
Ene(B) = (P1 * (p3(p2 * *)*)).
105
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
( 4.6) Micro-operators. The main purpose of the previous operators is to
modify the global structure of recognition trees, but they do not influence
their contents ( the pat terns w hich la bel their no des). In order to remedy
this, one introduces rpicro~operators like micro-crossover and micro-mutation
which act directly on the nodes of the recognition trees in A(O) and its successive
generations. We refer to [9, 10] for a detailed treatment of these. Let
us just mention here that micro-crossover is essentially ordinary crossover for
strings composed of {O, 1, O} ( = patterns) occurring at nodes of recognition
trees with high fitness. Micro-mutation works in a similar way.
It may be proved that the relevance of these operators is higher for recognition
trees with low width. In the general case, they should be applied with
bigger probability in the beginning of the algorithm, relaxing this frequency
gradually afterwards ( cf. simulated annealing).
( 4. 7) The algorithm.
• Start from a random population of N ( n-)recognition trees.
• Select, through reproduction, two recognition trees and apply
- crossover with probability Pe;
- switch with probability Ps;
- translocation with probability Pt;
- micro-operators with probability Pμc and Pμm·
• lterate (until a suitable population A(t) is obtained).
5. Recognition schemata
(5.1) Justas in the classical situation, where schemata are used to describe
populations of strings, the global structure of a population of n-recognition
trees may be studied by using socalled n-recognition schemata. Here, an nrecognition
schema is just an r-recognition tree H with r :S n. We say that
H represents an n-recognition tree A ( or that A satisfies H) and we denote
this by H --t A, if there exists an embedding of H into A. This means that
H may be recovered as a subtree of A, up to replacing sorne non-terminal
nodes in A by *·
106
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
For exarnple with A and H as below we have H -t A:
PI
+/ '\.-
A: * P2
Pi
+/ '\.-
H: * P2
+/ '\.- +/ '\.-
p3 * * * +'/ '\.-
* *
If H -t A, then it is also clear that
Ene( A) = ( . .. Eneo(H) .. . ),
where Eneo(H) is obtained from Enc(H) by (possibly) replacing sorne ter
·rninal syrnbols * by patterns.
( 5.2) If "rnost" representatives of sorne fixed recognition scherna H by
recognition trees in a population A(O) have high fitness, then this seerns
to irnply that the presence of the "pattern" H is the origin of the high fitness
of these representatives. More evidence for this arises if H tends to
"survive" through the successive generations generated by applying the genetic
algorithrn. So, for each positive integer t, !et A(H, t) be the set of ali
recognition trees A E A(t) with the property that H -t A and let us write
m(H, t) =I A(H, t) I· The efficiency of the genetic algorithrn and its effect on
H rnay be understood by considering the value of m(H, t) through successive
generations.
( 5.2.1) If only reproduction is applied, then, exactly as is the classical case,
we obtain
J(H)
m(H, t + 1) = m(H, t)¡·
It is thus clear that schemata with high fitnes will receive an increasing
nurnber of representatives. If we assurne, as a first approxirnatiop, that J( H)
constantly rernains above the average, say f(H) = aj with a > 1, then we
obtain m(H, t + 1) = am(H, t), so m(H, t) ,...., atm(H, O). The nurnber of
recognition trees in A(H, t) thus increases exponentially, in this case. One
rnay show sirnilarly that if H seores below average, then m(H, t) decreases
exponentially.
107
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
(5.2.2) If the other operators are also applied, then sorne straightforward
combinatoria! calculations ( see [9, 10] for full details) show that
r-2 r-1 r-2 r-1
m(H, t + 1) ~ m(H, t)(l - Pe n _ 2 )(1 - Ps n _ l )(1 - Pt n _ 2 )(1 - Pμ n _ l ),
where r = w(H) and pμ = Pμe + (n - l)Pμm· If we assume the above probabilities
to be low, we then, by omitting crossed products, get
J(H) r - 2 r - 1
m(H, t + 1) ~ m(H, t)-J - (1 - (Pe+ Pt)-- - (p.+ Pμ)--).
n-2 n-1
We thus obtain
(5.3) Theorem. (Schema Theorem) The genetic algorithm attributes to
recognition trees with low width and high fi.tness an exponentially increasing
number o[ representatives through the successive generations.
6. Concluding remarks
( 6.1) The previous techniques may al so be applied to strategy trees. These
only differ from recognition trees, in the fact that they do not necessarily have
a fixed number of terminal nodes. They obviously find their origin within
game theory, where they are used to model 2-player strategy games. The
results obtained in [7] show that genetic algorithms may be used on these
trees, to find optima) strategies, for example.
(6.2) In principie, visual information (bitmaps) may be encoded linearly.
For example, the 4 x 3 bitmap
1•
•
108
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
may be encoded as 100 110 011 100 (using rows) oras 110 101 100 010 (using
columns ). Although this works perfectly well if we want to apply the genetic
algorithm, we may increase its speed and efficiency by working directly with
a two-dimensional matrix instead of with linear encodings. Introducing suitable
genetic operators for these "two-dimensional strings", one may thus take
into account two-dimensional correlations (hardly visible in linear encodings)
and obtain the desired enhancement of the genetic algorithm. We refer to
[9, 10] for details.
References
[1] Bagley, J.D., The behaviour of adaptive systems which employ genetic
and correlation algorithms, ph.d. thesis, University of Michigan, 1967.
[2] Cavicchio, D.J., Adaptive search using simulated evolution, ph.d. thesis,
University of Michigan, Ann Arbor, 1970.
[3] Frantz, D.R., Non-linearities in genetic adaptive search, ph.d. thesis,
University of Michigan, Ann Arbor, 1972.
[4] Goldberg, D.E., Genetic algorithms in search, optimization and machine
learning, Addison-Wesley, 1989.
[5] Holland, J.H., Adaptation in natural and artificial systems, University
of Michigan Press, 1975.
[6] Krishnaiah, P.R. and L.N. Kamal, Classification, pattern recognition
and reduction of dimensionality, North Holland, 1982.
[7] Robeys, K., H. Van Hove and A. Verschoren, Genetic algorithms and
trees, part II: Strategy trees, Computers and Artificial Intelligence, submitted.
[8] Smith, S.F., A learning system based on genetic adaptive algorithms,
ph.d. thesis, University of Pittsburgh, 1980.
[9] Van Hove, H., Twodimensional genetic algorithms, ph.d. thesis, University
of Antwerp, in preparation.
109
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017
[10] Van Hove, H. and A. Verschoren,Genetic algorithms and trees, part /:
Recognition trees, Cornputers and Artificial Intelligence, submitted.
[11] Van Hove, H. and A.Verschoren, Two-dirnensional genetic algorithrns,
to appear.
[12] Van Hove, H. and A. Verschoren, Genetic algorithrns applied to recognition
problerns, University of Antwerp, RUCA, preprint, 1992.
Rec ibido : 30 de Diciembre 1993
110
© Del documento, de los autores. Digitalización realizada por ULPGC. Biblioteca Universitaria, 2017