This paper was written in 1997 or ealier. A postnote written Oct 2003 follws at end of this paper.

Below are some of the preliminary results of the project I have been working on. I've managed to program a GA engine that accepts these basic data types ... strings, integers, floating-points, and complex number values. This gives one the freedom to structure problems in a universal way without worrying about internal GA implementation details. I have decided to pursue this subject from the ground up to see how far I can get without researching current efforts in the field. My attitude is to see if I can bring any fresh creativity into the problem with my own explorations.

If this file is too wordy you can cut to the chase at the end, where I discuss actual test runs of my program.

Genetic Algorhythm Project

My starting hypothesis is that one might define any computable problem as a vector in 'attribute' or 'feature' space. By representing each attribute as a mutable gene entity and linking genes into chromosomal like chains, we can represent a feature vector as a chromosome.

A population of chromosomes may then be crossover-mated, mutated,and tested as feature vectors to see which ones best represent a fitsolution to the given problem. By creatively repeating this cycle and discarding the less fit members, the population as a whole may evolve.

This technique of imitating nature gives interesting insights, but one must remember that this is not a simulation; it is an inspirational technique only. To preserve problem semantics it is necessary that crossovers occur only between identical gene types and within gene boundaries. A gene in my representation links an internal mutable representation to one and only one attribute or feature of a problem. This must be preserved. A chromosome can have any number of different types of genes of any length but the chromosomal template or structure must be inviolate as that is the essence of the problem representation.

The internal implementation is based on simple byte-encoded character strings. One byte is the minimal mutable element. A simple gene may consist of any number of elements. Complex genes are encoded as composites and object-oriented derivative classes which fall on the branches of an inheritance tree. All base GA functionality is coded using polymorphism or virtual functions.

Mating occurs via a crossover technique which preserves positional order of elements at all times. This is necessary to assure the semantics and syntax of representation as well as allowing convergence toward a solution. I feel this was a crucial design insight. Along the way I also solved a thorny problem in dynamic type binding. One can use virtual functions to treat any runtime derived object as a basetype and ignore 'type' problems. The trouble is that one can create virtual functions for destructors and any other functionality desired, but one can not in C++, write virtual constructors! How could a compiler write code to create an object which it doesn't know about?

I needed this ability because I wanted to let the client process create any new geno-types desired and feed them to the GAS engine, without having to rewire the internal GA code. You see, to create a population I needed to have some structural template to make copies of.The stucture cannot be pre-determined, but must be encoded at runtime. My clever solution ( I pat myself here ) was to create a virtual replicator function. Each gene can and must provide a facility for dynamically creating a copy of itself and handing it over to any callingprocess(function). Once I did this, my implementation was severed from interface, and I was free ( I hope ) to experiment with various genetic models without having to modify existing code.

As it stands now, I have a working protype that accepts input either as basic number types or as full-blown geno-types. Clients can derive their own derivative class genes from the set of base classes I designed and feed them to the GA also. This should give all the flexibility needed for experimentation.

Another interesting feature that I built in is that genes are constructed with an address link to any of standard native machine types -char, int, double, complex. The values at these linked addresses are automatically updated as each gene changes. One can thus spontaneously structure any GA problem as one would any ordinary programming challenge,i.e. as some set of variables. These variables will be automatically updated for the client so that it can track their values as they converge to a solution or not. No extra effort is needed. This is a'2-fer' since such access is needed when defining fitness functions. The programming effort is thus simplified into :

1) Define a set of variables which will represent some problem.

2) Define a function for testing the variables and returning some error or solution-fitness level.

3) Feed the variables and function to the GA constructor and let 'er rip.

4) Tinker with GA population parameters to enhance convergence. Population parameters include number of chromosomes, percentor degree of mutatations and number of cycles. I would also like totinker with crossover parameters and various mating strategies suchas associative selectivity, competitive mating, random mating, altruism, etc. Any ideas or suggestions would be appreciated.

XXXXXXXXXXXXX FIRST RESULTS XXXXXXXXXXXXXX

My first test consisted of three problem variables: an integer,a floating-point and a complex number. For simplicity of debugging,I setup a simple fitness function which evaluted the variables and compared them to fixed target values. This is a trivial problem, mathematically speaking but gave me a way to trim tab the GA code.

Initial population size was only 8 chromosomes or units. Mutation was set at 10%, and cycles at 50. As it turned out, these weren't bad starting guesses. The fitness level did indeed tend to improve from 0.0 up to about 0.4 , at which point it sort of oscillated about that range. I toyed with the mutation level and found that too much had the effect of wiping out previous gains, while too little had too little effect. Values from 3% to 6% seemed to do best.

Then I got the idea of preserving the most fit 1 or 2 units from mutation while mutating the rest. This stopped the oscillations and allowed for steady improvement. By uping the cycles and population size, I finally managed to achieve fitness levels of 0.99998. The catch of course, is that one desires small rapidly evolving populations,if real problems are to be attacked. I'm guessing that a sliding scale of mutation percentages might have interesting results. For example, if we have a bad egg in the bunch we might risk letting it experiment more.

The mating strategy already discards the least half of the whole population. Only the upper half are allowed to mate. Mating occurs via sequential order or decree and not by choice or random. Mutation occurs after mating and then the population is sorted by fitness and a new cycle begins. Other mating strategies might prove better.

Once I get some fundamental clues to optimum behaviour, I should be able to automate the parameter changes and make the GA anadaptive system which seeks convergence yet avoids cul' de sacs. At this stage I find the project a qualified success. I've overcome the major programming hurdles. I went through six model design shifts before homing in on this one. Implementation details and debugging can be quite tricky, but now I'm free to play a little with the engine and tune it.

One last kink looms large in this sort of thing, however.The larger question is how do you structure interesting problems, and what kinds of problems are best suited for the GA approach? There'sa dandy Catch-22 also. In a nut-shell, how the devil do you evaluatethe fitness of some answer if you don't already know what the correct answer is? Anybody got any clues to this one? If you've ever read any articles on GAs, this is the one apsect they allways gloss over!

Where the bit hits the bucket ... ahhhh there's the rub. Rasputin

//POSTNOTE OCT 2003

I next experimented with the classic 'knapsack problem' and got excellent results. The algorithm converged rapidly to an optimum solution. I discovered, however, that the fitness function is the crucial factor in all GAs. Small changes here can have major effects on convergence rate.

It is my contention that GAs are ideally suited for optimization problems because they avoid the traps and sink-holes that traditional 'hill climbing' algorithms run into. This is because the GA is able to scatter-shot test non-local pieces of the whole solution space with each evolution. They dont get trapped in local valleys. One downside is that convergence can take a while sometimes, but more important, I do not believe a GA can in and of itself add any useful new information to a system other than that which is encoded in the fitness function. I am unsure if one can program in fitness functions which themselves evolve via the GA mechanism. It seems to me that one has to inevitably hard-wire a fitness test somewhere. Now, one can hide this functionality imaginatively in a dynamic interactive enviroment but I still maintain that the rules one uses to set up the interacting entities in this enviroment will in turn be hardwired.

The above limitation makes me think that GAs can be very useful in an AI project but to achieve AI itself will require integration with other techniques perhaps.