**Introduction**

One difficulty in evolving both weights and architectures for Artificial Neural Nets (ANNs) is the computational expenditure required to train the population of ANNs in each generation.

What is less well understood for ANNs, however, is how to find the optimal architecture, which is widely regarded as a domain-dependent problem. It is here that GAs or GP are most promising (Gruau 1994) as methods for seeking an optimum architecture. What is still required, however, is a means of rapidly training each ANN as it evolves.

Spall (1987, 1998) developed a rapid gradient-descent method for multivariate optimization of problems in which the gradient is computationally expensive to calculate or measure directly, or is noisy. Specifically, Spall's method, called Simultaneous Perturbation Stochastic Approximation (SPSA), was developed to optimize models of systems for which limited experimental data was available. SPSA applies to all problems in which the Loss function (the negative of fitness) is a continuous function of the variables to be optimized, and can be shown to provide a factor-of-N reduction in computation over Finite Differences, where N is the number of variables being optimized. Spall also shows that the gradient estimate produced by SPSA is, to first order, as good as the actual gradient for optimization purposes.
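As an illustration of that factor-of-N saving, a minimal SPSA gradient estimate can be sketched as follows; the function names and the perturbation size `c` are illustrative, not from Spall's papers:

```python
import random

def spsa_gradient(loss, theta, c=0.1, rng=random.Random(0)):
    """Estimate the gradient of `loss` at `theta` from just two evaluations.

    Every component is perturbed simultaneously by a random +/-c step
    (a symmetric Bernoulli draw), so the cost is independent of the
    number of variables N -- the factor-of-N saving over Finite
    Differences noted above.
    """
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    plus = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = loss(plus) - loss(minus)            # the only two loss evaluations
    return [diff / (2 * c * d) for d in delta]

# Toy check: for the quadratic loss sum(t^2) the true gradient is 2*theta,
# and the SPSA estimate matches it in expectation over perturbation draws.
theta = [1.0, -2.0, 0.5]
quadratic = lambda t: sum(x * x for x in t)
g_hat = spsa_gradient(quadratic, theta)
```

A single two-evaluation estimate is noisy; averaged over many perturbation draws it is unbiased to first order, which is what makes it usable inside a stochastic-approximation loop.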

In this project, we will evolve a population of feedforward ANNs for pattern recognition, using SPSA to reduce the number of training cases required. It is proposed that the more rapid training will enable a more accurate estimate of each ANN's fitness with less computational expenditure, thus allowing either more rapid evolution or larger population sizes for a better global search.

Representation issues are important in evolving ANNs because of the growth of the genome as the ANN evolves. In some approaches, the ANN phenotype is directly represented in the genotype, but this can limit the size and complexity of the phenotype, since the storage requirements for the weight matrix grow as N², where N is the number of neurons.

Population size is stabilized by filling empty slots with new offspring from parents who have joined the mating pool by acquiring a mating opportunity through successfully recognizing the perceptual attributes of a mate. Mates will be randomly selected from the pool. Entry into the pool requires forfeiture of one mating opportunity, but individuals will have the opportunity to acquire new opportunities, and may have acquired more than one. A highly fit individual will have a long life and will be good at recognizing mates and avoiding predators, and thus will enter into numerous mating lotteries and will reproduce more frequently.

As an individual moves through time, it is presented with uniformly randomly selected percepts. A single output neuron signals "seek" (a positive activation) or "avoid" (a negative activation). If the individual correctly seeks food or a mate, then its probability of obtaining the food or acquiring a mating opportunity is equal to its activation level, -1 ≤ A_{o} ≤ 1. In the case of a predator, the probability of encountering that predator is (A_{o} + 1)/2, with the concomitant large energy deduction; thus the individual fares best with a strong "avoid" signal. Any effort to seek or avoid when the percept is neutral results in an energy deduction proportional to the activation level and the parameter E_{N}.
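The interaction rules above can be sketched as follows. The percept labels, the E_N value, and the predator cost are hypothetical placeholders, and the predator encounter probability is read as (A_o + 1)/2 so that a strong "avoid" gives probability zero:

```python
import random

E_N = 0.1            # hypothetical penalty scale for acting on a neutral percept
PREDATOR_COST = 5.0  # hypothetical "large energy deduction"

def outcome(percept, activation, rng=random.Random(0)):
    """Apply the interaction rules to one percept.

    `activation` is the single output neuron's value, -1 <= A_o <= 1.
    Returns (success, energy_cost); names and constants are illustrative.
    """
    if percept in ("food", "mate"):
        # Correct "seek": success probability equals the activation level,
        # so a negative (avoid) activation never succeeds.
        success = activation > 0 and rng.random() < activation
        return success, 0.0
    if percept == "predator":
        # Encounter probability (A_o + 1) / 2: a strong "avoid"
        # (A_o = -1) gives probability 0, a strong "seek" gives 1.
        encountered = rng.random() < (activation + 1) / 2
        return False, PREDATOR_COST if encountered else 0.0
    # Neutral percept: any action costs energy in proportion to |A_o|.
    return False, E_N * abs(activation)
```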

**The Learning Process**

To use SPSA, it is required that the Loss function be a
continuous function of the independent variables. Since fitness is
measured endogenously, it is difficult to directly use fitness as the
-Loss function. Instead, we use the square of the difference between
the desired activation level and the actual activation as the
loss:

Loss = (A_{d} - A_{o})²

where A_{d} is the desired activation level and A_{o} is the actual activation.

SPSA performs small, simultaneous random perturbations of the entire weight vector to estimate the gradient of the loss function. Nominally it requires only two experiences to point, statistically speaking, to a better weight set. It should quickly climb to the nearest hilltop in weight space. Some of the population may well climb to local optima, but for a weak local optimum this should result in shorter lifetimes and fewer matings, since members that climb better hilltops will perform better.
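A minimal sketch of one such two-evaluation update, assuming a single tanh output neuron and the squared-error loss between desired and actual activation (the gain values `a` and `c` are illustrative, not tuned values from this project):

```python
import math
import random

def activation(w, x):
    # A single tanh output neuron, bounded in (-1, 1) like A_o above.
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def spsa_step(w, x, desired, a=0.05, c=0.1, rng=random.Random(1)):
    """One SPSA weight update: two forward passes, whatever the size of w."""
    loss = lambda v: (desired - activation(v, x)) ** 2
    delta = [rng.choice((-1.0, 1.0)) for _ in w]
    diff = (loss([wi + c * d for wi, d in zip(w, delta)])
            - loss([wi - c * d for wi, d in zip(w, delta)]))
    # Move every weight against the estimated gradient simultaneously.
    return [wi - a * diff / (2 * c * d) for wi, d in zip(w, delta)]

# Drive the neuron toward a strong "avoid" (-1) for one fixed percept.
w, x, target = [0.2, -0.1, 0.3], [1.0, 0.5, -0.5], -1.0
for step in range(200):
    w = spsa_step(w, x, target, rng=random.Random(step))
```

Each `spsa_step` costs two forward passes regardless of how many weights the evolved architecture contains, which is the property that makes per-individual training cheap during evolution.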

I am investigating the use of a Lindenmayer system (Prusinkiewicz and Lindenmayer, 1990) with three-dimensional turtle geometry to grow a structure that terminates in the two-dimensional weight matrix. This promises to be relatively simple to encode and program. Other types of parametric L-systems are also being pursued as representation schemes. One essential property is the ability to encode connection strengths, even if no more precisely than specifying whether a connection is inhibitory or excitatory.
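As a toy illustration of the string-rewriting half of such a scheme (the rules and symbols below are invented for the example, not the actual encoding under investigation; a real version would add parameters and a 3-D turtle interpretation):

```python
# Hypothetical D0L-system sketch: each pass rewrites every symbol in
# parallel.  '+' / '-' stand in for excitatory / inhibitory connections,
# the one encoding property the text above calls essential.
RULES = {
    "A": "[+B][-B]",  # a node sprouts one excitatory and one inhibitory branch
    "B": "A",         # branch tips become nodes again on the next pass
}

def rewrite(axiom, rules, generations):
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

grown = rewrite("A", RULES, 3)  # a bracketed tree a turtle could interpret
```

A real encoding would attach parametric modules carrying connection strengths and let a 3-D turtle place the terminal symbols into the weight matrix.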

As the population evolves, individuals should live longer lives, but this means fewer reproductions in the fixed population size, and a larger mating pool, so evolution should slow. If selection pressures are increased (for example, by increasing E

Over time, the predators become more deadly, simulated by gradually increasing E

Gruau, F. and Whitley, D. (1993), “The Cellular Development of Neural Networks: the Interaction of Learning and Evolution.”

Gruau, Frederic (1994), “Neural Network Synthesis using Cellular Encoding and Genetic Algorithm,” PhD Thesis, L’Ecole Normale Superieure De Lyon, January 1994.

Haykin, Simon (1994), Neural Networks: A Comprehensive Foundation, Macmillan, New York.

Kitano, H. (1990), “Designing Neural Networks Using Genetic Algorithms with Graph Generation System,” Complex Systems, 4:461-476.

Koza, J. and Rice, J. (1991), “Genetic generation of both the weights and architecture for a neural network,” IJCNN-91, International Joint Conference on Neural Networks, pages 397-404, 1991.

Miglino, O., Nolfi, S., Parisi, D., “Discontinuity in Evolution: how different levels of organization imply pre-adaptation.” Technical Report, Institute of Psychology, National Research Council, Rome, Italy.

Mitchell, Melanie (1996), An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA.

Prusinkiewicz, Przemyslaw, and Lindenmayer, Aristid (1990), The Algorithmic Beauty of Plants, Springer-Verlag, New York.

Spall J. (1987), “A stochastic approximation technique for generating maximum likelihood parameter estimates,” Proceedings of the American Control Conference, 1987, 1161-1167 (first paper on SPSA).

Spall J. (1998), “Implementation of the Simultaneous Perturbation Algorithm for Stochastic Optimization,” IEEE Transactions on Aerospace and Electronic Systems, 34, 817-823, 1998 (guidelines for practical implementation).

Whitley, D., “Genetic Algorithms and Neural Networks,” in