Tuesday, February 21, 2012

Neuroevolution, or To Randomness and Back Again

Most researchers who develop neuroevolutionary algorithms with evolving ANN structure concentrate on making the algorithm 'smart' out of the box. This means the algorithm has some predefined constraints that prevent the number of nodes and connections from growing too fast. It is also possible to track each node's activity, e.g. by counting its degree or the total signal transmitted through it. Knowing these activities helps to decide, for example, which nodes to connect ('active' ones) and which to remove ('lazy' ones).
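As a minimal sketch of this activity-tracking idea (illustrative code of my own, assuming a simple list-of-connections representation of the network, not any particular algorithm's implementation):

    from collections import defaultdict

    def node_activity(connections, node_outputs):
        # connections: list of (src, dst, weight); node_outputs: {node: output}.
        # A node's activity is the total absolute signal it transmits onward.
        activity = defaultdict(float)
        for src, dst, weight in connections:
            activity[src] += abs(node_outputs.get(src, 0.0) * weight)
        return activity

    # High-activity nodes are candidates for new connections ('active' ones),
    # near-zero-activity nodes are candidates for removal ('lazy' ones).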
In one of my earlier posts there was an interesting observation: the average number of hidden units tends to stabilize over time if there are rules for deciding which structure-modifying operation to apply, and those rules depend on the current ANN topology.
The interesting question is whether this 'auto'-stabilization can still be observed when there are no ad-hoc rules guiding which mutation operation to use.
I ran a series of experiments with one of the simplest NE algorithms that evolves both the weights and the structure of an ANN (including activation functions).
The following operators were used:

  • Add Node with Default Activation (Sigmoid). Adds an isolated node. This operation has no immediate effect on the individual's fitness (a neutral operation), but in some later generation the node may prove advantageous. Note also that even adding connections to this node does not necessarily change fitness: if no other node in the network accepts the new node's output signal, the behavior stays the same.
  • Remove Random Node. Removes a random node regardless of how many connections it has, even if the ANN structure becomes meaningless (i.e. input signals no longer reach the ANN outputs). Input and output nodes cannot be removed.
  • Add Random Connection. Adds a connection with a random weight between two randomly chosen nodes.
  • Remove Random Connection. Removes a random connection from the network. If some node becomes isolated as a result, that node is removed as well ('dies').
  • Modify Weight. Changes the weight of a randomly chosen connection. The weight delta is drawn from a uniform distribution.
  • Modify Activation. Changes the activation function of a random node. The activation of input nodes cannot be changed.
It's easy to see that these operators use no heuristic rules and behave randomly. To make the algorithm even more random, the selection of operators is itself made at random: when an ANN undergoes mutation and the algorithm has to decide on a mutation operator, an operator is simply picked uniformly at random, as sketched below.
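Here is a minimal sketch of such a scheme (my own illustrative code over a toy dict-based network with integer node ids, not the actual implementation; for brevity it omits the 'dying node' cleanup after connection removal):

    import random

    def add_node(net):
        # Add an isolated hidden node with the default (sigmoid) activation.
        net["activations"][max(net["activations"]) + 1] = "sigmoid"

    def remove_node(net):
        # Remove a random hidden node; input/output nodes are protected.
        hidden = [n for n in net["activations"] if n not in net["protected"]]
        if hidden:
            n = random.choice(hidden)
            del net["activations"][n]
            net["connections"] = [c for c in net["connections"] if n not in c[:2]]

    def add_connection(net):
        # Connect two randomly chosen nodes with a random weight.
        nodes = list(net["activations"])
        net["connections"].append(
            (random.choice(nodes), random.choice(nodes), random.uniform(-1, 1)))

    def remove_connection(net):
        # Remove a random connection (isolated-node cleanup omitted here).
        if net["connections"]:
            net["connections"].remove(random.choice(net["connections"]))

    def modify_weight(net):
        # Add a uniformly distributed delta to a random connection weight.
        if net["connections"]:
            i = random.randrange(len(net["connections"]))
            src, dst, w = net["connections"][i]
            net["connections"][i] = (src, dst, w + random.uniform(-0.5, 0.5))

    def modify_activation(net):
        # Change the activation of a random non-input node.
        nodes = [n for n in net["activations"] if n not in net["inputs"]]
        if nodes:
            net["activations"][random.choice(nodes)] = random.choice(
                ["sigmoid", "tanh", "linear"])

    OPERATORS = [add_node, remove_node, add_connection,
                 remove_connection, modify_weight, modify_activation]

    def mutate(net):
        # No heuristics: the operator itself is chosen uniformly at random.
        random.choice(OPERATORS)(net)

Here 'inputs' and 'protected' are assumed to hold the input node ids and the input-plus-output node ids respectively, so the protected nodes are never removed and input activations are never changed.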
I've used quite a bunch of test problems:
  • XOR
  • Artificial Ant
  • 1- and 2-Pole Balancing
  • 7 problems from the Proben1 set (cancer1, card1, diabetes1, glass1, heart1, horse1, thyroid1)
And here are some results. Below are plots with the averaged change of the average number of nodes and connections for the different problems.

[Plots: average number of nodes and connections over generations, one panel per test problem]

It's easy to see that in most problems the change in the number of nodes is step-like and that the number of hidden nodes grows slowly.
So even for an NE algorithm with the simplest operators and a purely random operator-choice scheme, the number of hidden nodes appears to stay under control. Thus uncontrolled growth of the ANN structure during evolution may not be a major problem. Of course, if the rules for applying the operators change, different results can be obtained.
One more interesting question is whether such a random NE algorithm was successful at solving the test problems. The answer is: partially.

Here are the solved problems (test-set classification errors are given in parentheses):
  • 1-Pole Balancing (100k ticks equilibrium)
  • Artificial Ant (ate all 89 pellets)
  • card1 (13.95%)
  • heart1 (21.30%)
  • horse1 (31.87%)
And the unsolved problems:
  • 2-Poles
  • cancer1 (37.36%)
  • diabetes1 (36.46%)
  • glass1 (69.81%)
  • thyroid1 (97.78%)
  • XOR (66.06%)
The results for the unsolved problems look rather devastating (up to 97.78% errors), but I doubt everything was really that bad. This was a very simple NE algorithm, and a lot should be done to make it work well. Moreover, it is already something of a miracle that an algorithm acting almost entirely at random, guided only by evolution, was able to solve several problems at all.
One of the problems with the current algorithm is that for some tasks the resulting ANN always fires with only one output while all other outputs stay silent. Such a network gives correct answers from time to time, but it's just like a stopped clock that shows the correct time twice a day. A possible solution is to add a regularization term to the fitness function to encourage diversity of the ANN outputs (a rough sketch of such a term is below). But that is work for future research, and there are some other interesting results which I'll present a bit later.
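As a rough illustration (my own assumption of what such a term could look like, not something these experiments used), the penalty could grow as the variance of the output nodes over a batch of examples shrinks:

    def diversity_penalty(batch_outputs, lam=0.1):
        # batch_outputs: list of output vectors, one per evaluated example.
        # The average per-output variance across the batch measures how much
        # the outputs actually move; a network that always fires the same
        # output has near-zero variance and gets the largest penalty (~lam).
        n, n_out = len(batch_outputs), len(batch_outputs[0])
        means = [sum(row[j] for row in batch_outputs) / n for j in range(n_out)]
        variance = sum(
            sum((row[j] - means[j]) ** 2 for row in batch_outputs) / n
            for j in range(n_out)) / n_out
        return lam / (1.0 + variance)

    # regularized_fitness = raw_fitness - diversity_penalty(batch_outputs)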
