One difficulty in evolving both weights and architectures for Artificial Neural Networks (ANNs) is the computational expenditure required to train the population of ANNs in each generation (cf. Whitley, 1995). Training sets can be large - on the order of 10^3 to 10^4 samples of the domain of interest - and training must be performed for each individual. Thus, for even modest population sizes, the computational expenditure is high. Furthermore, it is not clear (see Mitchell, 1996) that evolution of the ANN weights alone can outperform well-understood and computationally straightforward gradient descent methods.
What is less well understood for ANNs, however, is how to find the optimal architecture - widely regarded as a domain-dependent problem. It is here that GAs and GP are most promising (Gruau, 1994) as methods for seeking an optimal architecture. What is still required, however, is a means for rapidly training each ANN as it evolves.
Spall (1987, 1998) developed a rapid gradient descent method for multivariate optimization problems in which the gradient is computationally expensive to calculate or measure directly, or is noisy. Specifically, Spall's method - called Simultaneous Perturbation Stochastic Approximation (SPSA) - was developed to optimize models of systems for which limited experimental data were available. SPSA applies to any problem in which the loss function (the negative of fitness) is a continuous function of the variables to be optimized, and it can be shown to provide a factor-of-N reduction in computation over finite differences, where N is the number of variables being optimized. Spall also shows that the gradient estimate produced by SPSA is, to first order, as good as the actual gradient for optimization purposes.
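A minimal sketch may make the factor-of-N claim concrete: the gradient estimate below costs two loss evaluations regardless of dimension, where finite differences would cost 2N. The quadratic test loss and all parameter values are illustrative, not part of the proposal.

```python
import numpy as np

def spsa_gradient(loss, theta, c=0.1, rng=np.random.default_rng()):
    """Estimate the gradient of `loss` at `theta` from only two evaluations.

    Every component of theta is perturbed simultaneously by +/-c along a
    random Bernoulli direction delta, so the cost is independent of the
    dimension N (finite differences would need 2N evaluations).
    """
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # simultaneous perturbation
    d = (loss(theta + c * delta) - loss(theta - c * delta)) / (2 * c)
    return d / delta  # component-wise estimate; unbiased to first order

# Illustrative check: for L(w) = sum(w^2) the true gradient is 2*theta,
# and averaging many SPSA estimates approaches it.
theta = np.array([1.0, -2.0, 0.5])
estimate = spsa_gradient(lambda w: np.sum(w**2), theta)
```

Any single estimate is noisy; it is the average behavior over many steps that tracks the true gradient.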
In this project, we will evolve a population of feedforward ANNs for pattern recognition, using SPSA to reduce the number of training cases required. It is proposed that the more rapid training will enable a more accurate estimate of an ANN's fitness with less computational expenditure, thus allowing either more rapid evolution or larger population sizes for a better global search.
Genetic Algorithms (GAs) and Genetic Programming (GP) have been used extensively both to evolve neural network architectures and to evolve optimal connection weights for fixed architectures. Whitley (1995), however, notes that cases in which GA development of optimal weights outperforms gradient methods (e.g., backpropagation) are rare. Mitchell (1996) notes efforts to evolve NN weights, but there does not appear to be any compelling evidence that GAs can outperform backpropagation appropriately modified to avoid local optima (cf. Haykin, 1994).
Representation issues are important in evolving ANNs because of the growth of the genome as the ANN evolves. In some approaches, the NN phenotype is directly represented in the genotype, but this can limit the size and complexity of the phenotype, since the storage requirements for the weight matrix grow as N^2. Others note the highly compact representation of living genomes and take the more biologically plausible approach of representing a program (cf. Koza and Rice, 1991) or a set of rules for growing an NN - a recipe instead of a blueprint. Kitano (1990) provided a string-replacement system for evolving the weights of a Boolean neural network. Gruau (1993, 1994) uses GP to evolve a graph representation of neural network ontogeny in a method he calls Cellular Encoding, and claims that it can be used to evolve any ANN while encoding it in a maximally compact manner.
Statement of Hypothesis
I hypothesize that the use of SPSA, relative to backpropagation, will accelerate the evolution of a neural network architecture for learning perceptual patterns in an artificial environment. In both cases the starting point is a tabula rasa ANN with initially random weights.
The context for fitness and selection will be an artificial environment which consists of predators, food and mating opportunities. All three have perceptual attributes which comprise the domain of the ANN.
The project will use endogenous fitness evaluation - individuals reproduce if they perform well in the artificial environment by seeking food and mates and avoiding predators. A single attribute of the individual, which we will call energy, determines the individual's survival - negative energy signifies death and leaves an unfilled slot in the population. At birth, each individual is equipped with a reserve of energy, or "baby fat", Ebf, and must learn to increase it in order to survive long enough to join the mating pool. Acquiring food (by strongly seeking food in response to its perceptual attributes) adds an amount of energy EF, and failing to avoid a predator subtracts an amount EP. Seeking a mate gives the individual an opportunity to enter the mating pool the next time a slot opens, but also costs an amount EM. Every time step that passes in which food is not acquired results in a deduction EH.
Population size is stabilized by filling empty slots with new offspring from parents who have joined the mating pool by acquiring a mating opportunity through successfully recognizing the perceptual attributes of a mate. Mates are randomly selected from the pool. Entry into the pool requires forfeiture of one mating opportunity, but an individual may acquire new opportunities over time and may hold more than one. A highly fit individual will have a long life, will be good at recognizing mates and avoiding predators, and thus will enter numerous mating lotteries and reproduce more frequently.
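The energy bookkeeping described above can be sketched in a few lines. The parameter names follow Ebf, EF, EP, EM, and EH from the text; the numeric values are purely illustrative placeholders.

```python
# Illustrative values for the energy parameters named in the text:
# baby fat, food gain, predator loss, mating cost, hunger cost per step.
E_BF, E_F, E_P, E_M, E_H = 10.0, 3.0, 5.0, 2.0, 0.5

def step_energy(energy, ate_food, hit_predator, sought_mate):
    """Apply one time step of energy accounting; returns (energy, alive)."""
    if ate_food:
        energy += E_F        # acquiring food adds EF
    else:
        energy -= E_H        # every foodless time step costs EH
    if hit_predator:
        energy -= E_P        # failing to avoid a predator costs EP
    if sought_mate:
        energy -= E_M        # buying a mating opportunity costs EM
    return energy, energy >= 0.0  # negative energy signifies death

energy = E_BF                # a newborn starts with its "baby fat" reserve
energy, alive = step_energy(energy, ate_food=True, hit_predator=False, sought_mate=False)
```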
The Perceptual Space
We'll begin with a two-dimensional perceptual space (for example, two color attributes, say redness and blueness). The space is partitioned into food, predators, mates, and neutral areas; an example is shown in Figure 1. Note the two types of predators lurking near mates and food sources - particularly insidious predators may even overlap food or a mate. Each type of entity is described as a rectangle by specifying its upper-left corner, length, and height.
As an individual moves through time, it is presented with uniformly randomly selected perceptions. A single output neuron will signal seek (a positive activation) or avoid (a negative activation). If the individual correctly seeks food or a mate, then its probability of obtaining the food or acquiring a mating opportunity is equal to its activation level, -1 <= Ao <= 1. In the case of a predator, the probability of encountering that predator, with the concomitant large energy deduction, is (Ao + 1)/2 - thus the individual fares best with a strong avoid signal. Any effort to seek or avoid when the percept is neutral results in an energy deduction proportional to the activation level and the parameter EN.
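The rectangle encoding and the activation-to-probability mapping can be sketched as follows. The rectangles are invented examples, and the predator case is read here as (Ao + 1)/2, so that a strong avoid signal (Ao = -1) yields encounter probability zero.

```python
# Hypothetical 2-D perceptual space: each entity is a rectangle given by
# its upper-left corner (x, y_top), a length, and a height (illustrative).
FOOD = [(0.1, 0.9, 0.2, 0.2)]
PREDATORS = [(0.25, 0.95, 0.1, 0.1)]   # a predator lurking near the food

def in_rect(px, py, rect):
    """Is the percept (px, py) inside the rectangle?"""
    x, y_top, length, height = rect
    return x <= px <= x + length and y_top - height <= py <= y_top

def outcome_probability(percept_type, activation):
    """Map the output activation (-1 <= Ao <= 1) to an event probability."""
    if percept_type in ("food", "mate"):
        return max(activation, 0.0)      # seek: P(acquire) equals activation
    if percept_type == "predator":
        return (activation + 1.0) / 2.0  # strong avoid (Ao = -1) gives P = 0
    return 0.0                           # neutral percepts trigger no event
```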
The Learning Process
To use SPSA, the loss function must be a continuous function of the independent variables. Since fitness is measured endogenously, it is difficult to use fitness directly as the negative loss function. Instead, we use the square of the difference between the desired activation level and the actual activation as the loss:

L = (AD - Ao)^2

where AD is the desired activation and Ao is the actual activation.
SPSA performs small, simultaneous random perturbations of the entire weight vector to estimate the gradient of the loss function. Nominally it requires only two experiences to point toward a better weight set (statistically speaking), so it should quickly climb to the nearest hilltop in weight space. Some of the population may well climb to local optima, but a weak local optimum should result in shorter lifetimes and fewer matings, since members that climb better hilltops will perform better.
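One full training step under the squared-error loss might look like the sketch below: two forward passes per update, each with the whole weight vector perturbed. The single tanh output neuron, input vector, step sizes, and target activation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(w, x):
    """A single tanh output neuron, so the activation satisfies -1 < Ao < 1."""
    return np.tanh(w @ x)

def spsa_train_step(w, x, a_desired, a=0.05, c=0.05):
    """One SPSA weight update from just two forward passes.

    The loss is the squared difference between desired and actual
    activation, L(w) = (AD - Ao)^2; step sizes a and c are illustrative.
    """
    loss = lambda w_: (a_desired - forward(w_, x)) ** 2
    delta = rng.choice([-1.0, 1.0], size=w.shape)  # simultaneous perturbation
    ghat = (loss(w + c * delta) - loss(w - c * delta)) / (2 * c) / delta
    return w - a * ghat  # descend the estimated gradient

# Repeated steps drive the activation toward the desired value.
w = np.zeros(4)
x = np.array([1.0, 0.5, -0.5, 1.0])
for _ in range(1000):
    w = spsa_train_step(w, x, a_desired=0.9)
```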
Representation is one of the most difficult issues to grapple with in this project. Evolving a straightforward blueprint - a direct encoding of the ANN - is easier to program and understand, but is expensive in terms of genome size for larger networks. A more compact representation like Gruau's Cellular Encoding (Gruau, 1994) promises networks of unlimited size through the encoding of subnetworks, but is more difficult to code and debug.
I am investigating the use of a Lindenmayer system (Prusinkiewicz and Lindenmayer, 1990) with three-dimensional turtle geometry to grow a structure that terminates in the two-dimensional weight matrix. This promises to be relatively simple to encode and program. Other types of parametric L-systems are also being pursued as representation schemes. One essential property required is the ability to encode connection strengths, even if no more precisely than specifying whether a connection is inhibitory or excitatory.
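For orientation, the parallel-rewriting core of an L-system is tiny. The rules below are the classic algae example from Lindenmayer, not the proposed genome; in the proposal the final string would instead be interpreted with turtle geometry, with symbols marking, for example, excitatory versus inhibitory connections.

```python
def expand(axiom, rules, generations):
    """Apply L-system production rules in parallel for a number of generations.

    Symbols with no rule are copied unchanged (the identity production).
    """
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(ch, ch) for ch in s)  # rewrite every symbol at once
    return s

# Lindenmayer's algae example: A -> AB, B -> A.
rules = {"A": "AB", "B": "A"}
```

The genome would then be the rule set itself, which can stay small even as the grown structure - and hence the weight matrix - becomes large.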
Initializing the Population
It is important to initialize the population with individuals of sufficient complexity and possessing sufficient baby fat to be able to survive and reproduce. Excessively simple networks might not be able to learn the difference between food and predators, for example, and would quickly become extinct, since none would ever enter the mating pool.
One way to directly measure how a population is evolving is to gather statistics on individual and population lifetimes. Does the population remain stable or go extinct, given the selection pressures at hand? How long lived are the longest-lived individuals?
As the population evolves, individuals should live longer lives, but this means fewer reproductions in the fixed population size and a larger mating pool, so evolution should slow. If selection pressures are increased (for example, by increasing EP), or if the environment changes (food sources move around or shrink), we would expect evolution to accelerate.
Time permitting, I intend to attempt some of the variations on the basic theme described below.
Exploitation of additional perceptual attributes
If an additional perceptual attribute is added, will the population evolve the ability to exploit it? For example, it might allow individuals to more clearly distinguish predators from food. The degree of exploitation should depend on the selection pressure - if the overlap between food and predators is already small, the additional expenditure of growing the extra neurons may not be justified.
Simulated coevolution of predators
Over time, the predators become more deadly, simulated by gradually increasing EP. This increases selection pressure and may jolt the population out of local optima.
Actual coevolution of predators
Evolve the population(s) of predators with similar rules. This should generate an arms race, accelerating the evolution of the subject population.
Evolution of mating attraction
Where in perceptual space a mate appears is permitted to evolve, with each individual having its own appearance. The resulting sexual competition increases selection pressure and could result in interesting strategies.
Balakrishnan, K. and Honavar, V. (1995), Evolutionary Design of Neural Architectures: A Preliminary Taxonomy and Guide to the Literature. Artificial Intelligence Research Group, Iowa State University, CS Technical Report 95-01.
Gruau, F. and Whitley, D. (1993), The Cellular Development of Neural Networks: The Interaction of Learning and Evolution.
Gruau, Frederic (1994), Neural Network Synthesis using Cellular Encoding and Genetic Algorithm, PhD Thesis, L'École Normale Supérieure de Lyon, January 1994.
Haykin, Simon (1994), Neural Networks - A Comprehensive Foundation. Prentice Hall, 1994.
Kitano, H. (1990), Designing Neural Networks Using Genetic Algorithms with Graph Generation System. Complex Systems, 4:461-476.
Koza, J. and Rice, J. (1991), Genetic generation of both the weights and architecture for a neural network. IJCNN-91, International Joint Conference on Neural Networks, pages 397-404, 1991.
Miglino, O., Nolfi, S., Parisi, D., Discontinuity in Evolution: how different levels of organization imply pre-adaptation. Technical Report, Institute of Psychology, National Research Council, Rome, Italy.
Mitchell, Melanie (1996), An Introduction to Genetic Algorithms, chapter 2. MIT Press, 1996.
Prusinkiewicz, Przemyslaw, and Lindenmayer Aristid, The Algorithmic Beauty of Plants, Second Edition, Springer, 1990.
Spall, J. (1987), A stochastic approximation technique for generating maximum likelihood parameter estimates, Proceedings of the American Control Conference, 1987, 1161-1167 (first paper on SPSA).
Spall, J. (1998), Implementation of the Simultaneous Perturbation Algorithm for Stochastic Optimization, IEEE Transactions on Aerospace and Electronic Systems, 34, 817-823, 1998 (guidelines for practical implementation).
Whitley, D. (1995), Genetic Algorithms and Neural Networks, in Genetic Algorithms in Engineering and Computer Science, J. Periaux and G. Winter, eds., John Wiley and Sons, 1995.