Friday, February 24, 2012

The NexGen Algorithms


Modern evolutionary algorithms do not learn. They can be adaptive during the search for a solution, but when the run is over they forget everything, and if we re-run the problem the algorithm starts its work from scratch. This is normal for algorithms, but if we look at living beings we'll see that they do not forget everything that catastrophically! Yes, a single experience can be fruitless, but after trying to solve some problem hundreds and thousands of times, new knowledge and experience eventually emerge. This is the way people learn to walk, speak, handle a spoon and many other things. As people get older, their past experience helps them learn new things more easily (if these new things correlate with that experience).
On the other hand, there are incremental learning algorithms, which have been more or less actively developed for the last two decades. These algorithms can learn in complexifying and/or dynamic environments, which makes them more adaptive, and they do not suffer as much from 'after-run amnesia'. There are certain problems, though, like 'catastrophic forgetting', when after several updates the algorithm totally forgets what it learned initially and starts behaving in a new, unexpected way. Well, that is normal for humans too: it's as if a person who learned English in childhood forgot that first language after living in France for years and speaking only French. I believe that if catastrophic forgetting is not very fast, then it is not a major problem. The larger issue is that incremental learning algorithms are usually applicable to a single domain only. That is, they do learn something and collect past experience, but this experience is almost useless for solving other problems. So 'incremental learning' could be called something like 'temporal adaptation': good in its way, but still far from a true learning algorithm. And, yes, incremental learning is usually applied to machine learning, not to optimization.
The ultimate goal here may be the creation of algorithms which simply learn from any problem and extract general abstract regularities and rules to improve their performance on future tasks. This is closely related to the meta-learning concept, and I believe it can be done if some external trainable control system is attached to the algorithm to guide its behaviour. This is a very ambitious task, connected with the creation of highly adaptive algorithms, and probably one of the milestones to overcome on the way to true AI, since some implicit method 'to learn how to learn' is already built in by evolution inside living organisms (animals can improve their skills!) and is essential for survival and for the emergence of highly organized intelligence.
There's a well-known meta-system transition theory by V. Turchin, which states that a system enters a new level of development when a meta-level control system appears. In existing evolutionary algorithms some origins of such systems can already be found, like rules for adaptation or heuristics such as: if we keep moving in the same direction, then increase the step size (a minimal sketch of this idea follows below); or try to cluster variables to understand their relations for proper optimization. The next big step here is the possible unification of these heuristics and the addition of memory to EAs, so that the algorithm could remember which problems it has already solved and which rules were useful or useless. With this memory it might be possible to recognize common parts between problems (like, 'hmm, I solved something like this before, so I should probably apply the corresponding set of optimization rules'). That may be my fantasy, and there are a lot of open questions (how to store information about problem features, how to compare problems of different dimensionalities, etc.), but this is a real task for scientists, one which could probably boost the development of algorithms. That's why NexGen :)
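To make the step-size heuristic concrete, here is a minimal Python sketch in the spirit of Rechenberg-style success-based step-size adaptation for a (1+1) evolution strategy. The function names, constants, and the grow/shrink factors are my own illustrative choices, not part of any specific algorithm discussed here:

```python
import random

def one_plus_one_es(f, x, step=0.5, iters=1000):
    """Minimize f with a (1+1)-ES whose step size adapts to success:
    repeated improvements ('moving in the same direction') grow the
    step, failures shrink it -- a 1/5-success-rule flavour."""
    fx = f(x)
    for _ in range(iters):
        candidate = [xi + random.gauss(0.0, step) for xi in x]
        fc = f(candidate)
        if fc < fx:              # success: keep going, be bolder
            x, fx = candidate, fc
            step *= 1.5
        else:                    # failure: pull the step size back
            step *= 0.9
    return x, fx

# Toy usage: minimize a 3-dimensional sphere function.
best, value = one_plus_one_es(lambda v: sum(t * t for t in v),
                              [random.uniform(-5, 5) for _ in range(3)])
```

The point is only that the adaptation rule sits above the search itself: a tiny meta-level controller in Turchin's sense.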

Tuesday, February 21, 2012

Neuroevolution, or to Randomness and Back Again

Most researchers who develop neuroevolutionary algorithms involving evolution of the ANN structure concentrate on creating 'smart' algorithms out of the box. This means that the algorithm should have some predefined constraints to prevent too fast growth of the number of nodes and connections. It is also possible to track node activity, counting each node's degree or the total signal translated through it. Knowing these activities can help to decide, for example, which nodes to connect ('active' ones) and which to remove ('lazy' ones).
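For illustration, a minimal sketch of such activity bookkeeping might look like the following; the scoring formula (degree plus accumulated signal) and all names are my assumptions, not taken from any particular published algorithm:

```python
from collections import defaultdict

def node_activity(connections, signals):
    """Score nodes by degree plus total absolute relayed signal.
    'connections' is a list of (src, dst) pairs; 'signals' maps
    node -> accumulated |output| over an evaluation episode."""
    degree = defaultdict(int)
    for src, dst in connections:
        degree[src] += 1
        degree[dst] += 1
    return {n: degree[n] + signals.get(n, 0.0) for n in degree}

def pick_candidates(activity):
    """The most active node is a candidate endpoint for a new
    connection, the laziest one a candidate for removal."""
    ranked = sorted(activity, key=activity.get)
    return ranked[-1], ranked[0]   # (active, lazy)
```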
In one of my earlier posts there was an interesting observation: the average number of hidden units tends to stabilize over time if there are rules deciding which structure-modifying operation to apply to the ANN. Those rules depended on the current ANN topology.
The interesting question is whether we could observe this 'auto'-stabilization with no ad-hoc rules guiding which operation to use to mutate the ANN.
I made a series of experiments with one of the simplest NE algorithms for evolving both the weights and the structure of an ANN (including activations).
The following operators were used:

  • Add Node with Default Activation (Sigmoid). Adds an isolated node. This operation has no immediate effect on the individual's fitness (a neutral operation), but in some subsequent generation this node can become advantageous. Note also that even adding connections to this node does not necessarily lead to a fitness change if there is no other node in the network that accepts the new node's output signal.
  • Remove Random Node. Removes a random node regardless of the number of its connections, even if the ANN structure becomes meaningless because input signals are no longer translated to the ANN outputs. Input and output nodes cannot be removed.
  • Add Random Connection. Adds a connection with a random weight between two randomly chosen nodes.
  • Remove Random Connection. Removes a random connection from the network. If some node becomes isolated as a result, this node is removed ('dies').
  • Modify Weight. Changes the weight of a randomly chosen connection. The weight delta is drawn from a uniform distribution.
  • Modify Activation. Changes the activation function of a random node. Cannot change the activation of input nodes.
It's easy to see that these operators use no heuristic rules and behave randomly. To make the algorithm even more random, the selection of operators is also made at random: when an ANN undergoes mutation and the algorithm has to decide which mutation operator to apply, a random operator is picked. A minimal sketch of the whole scheme follows.
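Here is a minimal Python sketch of this operator set and the uniform selection scheme. The genome encoding (a dict of nodes and weighted connections), the mutation deltas, and the helper names are my assumptions; the original experiments may have encoded things differently:

```python
import random

# Toy genome: 'nodes' maps node id -> activation name, 'conns' maps
# (src, dst) -> weight, 'inputs' holds input ids and 'io' the protected
# input/output ids. This encoding is my own illustration.
ACTIVATIONS = ['sigmoid', 'tanh', 'gauss']

def add_node(net):
    # Isolated node with the default activation; fitness-neutral.
    net['nodes'][max(net['nodes']) + 1] = 'sigmoid'

def remove_node(net):
    # Remove a random hidden node together with all its connections.
    hidden = [n for n in net['nodes'] if n not in net['io']]
    if hidden:
        n = random.choice(hidden)
        del net['nodes'][n]
        net['conns'] = {e: w for e, w in net['conns'].items() if n not in e}

def add_connection(net):
    ids = list(net['nodes'])
    net['conns'][(random.choice(ids), random.choice(ids))] = random.uniform(-1, 1)

def remove_connection(net):
    # (The post's version also prunes nodes left isolated by this.)
    if net['conns']:
        del net['conns'][random.choice(list(net['conns']))]

def modify_weight(net):
    if net['conns']:
        net['conns'][random.choice(list(net['conns']))] += random.uniform(-0.5, 0.5)

def modify_activation(net):
    eligible = [n for n in net['nodes'] if n not in net['inputs']]
    if eligible:
        net['nodes'][random.choice(eligible)] = random.choice(ACTIVATIONS)

OPERATORS = [add_node, remove_node, add_connection,
             remove_connection, modify_weight, modify_activation]

def mutate(net):
    random.choice(OPERATORS)(net)   # uniform choice, no heuristics
```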
I've used quite a bunch of test problems:
  • XOR
  • Artificial Ant
  • 1- and 2-Pole Balancing
  • 7 problems from the Proben1 set (cancer1, card1, diabetes1, glass1, heart1, horse1, thyroid1)
And here are some results. Below are plots showing the averaged change of the average number of nodes and connections for the different problems.



It's easy to see that for most problems the change in the number of nodes is step-like and that the number of hidden nodes grows slowly.
So even for an NE algorithm with the simplest operators and a purely random operator-choice scheme, the number of hidden nodes seems to remain controllable. Thus we can conclude that ANN structure growth during evolution may not be a major problem. Of course, if the rules for applying the operators change, different results can be obtained.
One more interesting question is whether such a random NE algorithm was successful at solving the test problems. The answer is: partially.

Here are the solved problems (test-set classification errors in brackets):
  • 1-Pole Balancing (100k ticks equilibrium)
  • Artificial Ant (ate all 89 pellets)
  • card1 (13.95%)
  • heart1 (21.30%)
  • horse1 (31.87%)
And the unsolved problems:
  • 2-Pole Balancing
  • cancer1 (37.36%)
  • diabetes1 (36.46%)
  • glass1 (69.81%)
  • thyroid1 (97.78%)
  • XOR (66.06%)
The results for the unsolved problems seem rather devastating (up to 97.78% error), but I doubt that everything was that bad. It was a very simple NE algorithm, and a lot should be done to make it work well. Moreover, it is already sort of a miracle that this algorithm was able to solve several problems while working almost at random, with only evolution for guidance.
One of the problems with the current algorithm is that for some tasks the resulting ANN always fires only one output, while all other outputs stay silent. Such a network gives correct answers from time to time, but it is just like a stopped clock that shows the correct time twice a day. So a possible solution is to add a regularization term to the fitness function to encourage diversity of the ANN outputs (a hypothetical sketch is given below). But that is work for future research, and there are some other interesting results which I'll present a bit later.
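For instance, a hypothetical regularizer of this kind could penalize networks whose outputs barely vary across test cases; the exact form below (an exponential of negative per-output variance) and the weighting constant are purely illustrative, not the term I would necessarily end up using:

```python
import math

def diversity_penalty(outputs_over_cases):
    """Penalty for the 'stopped clock' failure mode: outputs that
    barely vary across test cases. 'outputs_over_cases' is a list of
    per-case output vectors; near-constant columns raise the penalty."""
    num_outputs = len(outputs_over_cases[0])
    penalty = 0.0
    for j in range(num_outputs):
        column = [case[j] for case in outputs_over_cases]
        mean = sum(column) / len(column)
        var = sum((v - mean) ** 2 for v in column) / len(column)
        penalty += math.exp(-var)   # variance 0 -> penalty 1 for this output
    return penalty / num_outputs

def regularized_fitness(raw_fitness, outputs_over_cases, lam=0.1):
    # Higher is better; lam trades raw accuracy against output diversity.
    return raw_fitness - lam * diversity_penalty(outputs_over_cases)
```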