Monday, July 5, 2010

Some thoughts on recognizing systems

Here are some thoughts, just to organize my ideas for possible future research and to clarify some problems and questions. The general problem is building a classifier (recognizing) system that could evolve and learn, ideally acquiring new knowledge from various sources. This is a well-known and very challenging problem that has not been solved yet, though many people have proposed their own solutions (like CEC, Phaeco, Adaptive Resonance Theory, ontologies, large-scale knowledge-based systems, biology-inspired approaches, etc.). They are all good to some extent, but each of them also misses something, which makes them non-universal. I do not claim that I can propose my own solution (and quite possibly I'm not the person who will), but here are some thoughts and classification sketches that are useful to think about.

===========================================
All learning systems can be divided into 2 major classes:
1. Supervised - there is a "teacher" that tells the System what output it should yield for a given input.
2. Unsupervised - the System itself decides what to do with the input data, looking for possible regularities. After training, such Systems can be used either "as is" (like Kohonen's maps) or their output can be further processed, e.g. mapped into a class space as in Echo-State Networks or Hawkins networks.
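To make the distinction concrete, here is a toy sketch (not from the post; plain Python with made-up 1-D data and hypothetical function names): the supervised learner is handed each point's class by the "teacher", while the unsupervised one has to discover the two clusters on its own.

```python
def nearest_centroid_train(points, labels):
    """Supervised: a 'teacher' supplies the desired label for every input."""
    centroids = {}
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        centroids[lab] = sum(members) / len(members)
    return centroids

def classify(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))

def kmeans_1d(points, k=2, iters=20):
    """Unsupervised: the system itself looks for regularities (clusters)."""
    centers = [min(points), max(points)]  # crude initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

pts = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2]

# Supervised: labels are given by the teacher.
cents = nearest_centroid_train(pts, ['A', 'A', 'A', 'B', 'B', 'B'])
print(classify(cents, 0.25))   # -> 'A'

# Unsupervised: only the raw points; the two clusters are discovered.
print(sorted(kmeans_1d(pts)))  # two centers, near 0.2 and 5.1
```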

===========================================
Input data in general falls into one (or a combination) of the following types:
1. Permutation-invariant, i.e. if we swap some values in the object's description, the object's class remains the same. This is typical for image-analysis problems, where rotating or moving the object in the image should not change the recognition result. This corresponds to the case of non-ordered feature vectors. Sometimes this type of data does not demand full-scale permutation invariance: for example, when recognizing handwritten text, its rotation or panning should not affect the recognition, but swapping pixels at random can distort the input critically.
2. Not permutation-invariant. Just the opposite case, when the order of feature-vector components is significant, which is often met in the traditional setting of the classification problem, speech recognition, time-series processing, etc.
3. Scale-invariant. This is a somewhat tricky one. The input vector size remains the same, but its spatial and/or temporal resolution changes. Examples are recognizing letters and digits from their images, or recognizing sped-up or slowed-down speech. A System able to deal with this type of data should somehow adapt to the characteristic scale of the input. In image processing this is sometimes achieved by using the local maxima of the Laplacian-of-Gaussian or Difference-of-Gaussian.

Note that input data can be either of types (1 and/or 3) or of type 2; that is, types (1+3) and 2 are mutually exclusive, and processing them simultaneously demands either different subsystems or some kind of switching (with adaptive recognition of which type of data is currently presented at the input).
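To make the Difference-of-Gaussian idea from item 3 concrete, here is a minimal 1-D sketch (not from the post; plain Python, toy data, my own function names): blur a signal at a ladder of scales and pick the scale at which the DoG response at a point of interest is strongest. A wider blob then selects a larger characteristic scale.

```python
import math

def gauss_kernel(sigma):
    """Normalized Gaussian kernel, truncated at ~4 sigma."""
    r = int(4 * sigma) + 1
    k = [math.exp(-x * x / (2 * sigma * sigma)) for x in range(-r, r + 1)]
    s = sum(k)
    return [v / s for v in k]

def smooth(signal, sigma):
    """Convolve the signal with a Gaussian (zeros outside the borders)."""
    k = gauss_kernel(sigma)
    r = len(k) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(k):
            idx = i + j - r
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def characteristic_scale(signal, center, scales):
    """Pick the scale whose DoG response at `center` is strongest."""
    blurred = [smooth(signal, s) for s in scales]
    dog = [abs(blurred[i + 1][center] - blurred[i][center])
           for i in range(len(scales) - 1)]
    return scales[max(range(len(dog)), key=dog.__getitem__)]

def bump(width, n=200, c=100):
    """A Gaussian blob of the given width, centered at c."""
    return [math.exp(-(x - c) ** 2 / (2 * width ** 2)) for x in range(n)]

scales = [1.5 * 1.4 ** i for i in range(10)]  # geometric scale ladder
print(characteristic_scale(bump(3.0), 100, scales))
print(characteristic_scale(bump(9.0), 100, scales))  # larger for the wider blob
```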

===========================================
Most modern recognizing systems are just mere numerical windmills: they process numerical data using ad hoc plus experimental assumptions like "if the variance changes significantly, then the object under consideration has changed its state" or "it's enough to utilize information about line orientations to recognize the image category". But (almost?) no system has a "semantic layer" that could consider interconnections between different classes and notions, organize them into sentences and statements, etc., and that could affect the result of recognition. I believe that all researchers feel that we've got to use semantic information as an important element of the recognition process, but no good solution has been proposed yet. The problem here is twofold:
- it's extremely difficult to build such a system, because we do not have enough knowledge of what cognitive features are and how they can be modeled at full scale;
- most successful systems rely on mathematics, so we will (almost?) definitely use numbers. Yes, there are approaches that take the fuzziness of the data into consideration, but such systems are still strictly governed by numbers, which define their vital parameters.

The big question is whether we can numerically approximate the human brain (or whatever is used for thinking, storing memories, setting goals and wishes, creating emotions, and so on). There have been a lot of arguments on this theme, but the answer is unknown.

===========================================
I believe that the importance of multi-layered recognizing systems has been acknowledged over the past several years. There are "deep learning" neural-network architectures (like those used by Yann LeCun and Geoffrey Hinton), Hawkins networks, and many good systems with a multi-component organization and separate stages of processing and recognition. However, such systems are still non-universal in terms of input data types (see the text above) and the problems they solve, and really important questions remain unanswered:
- How many layers should be used for different problems? And, from the point of view of layer count, how many different problems exist?
- How should these layers be formed?
- What components should they include?
- Is there a set of universal components (like nucleotides in DNA or subatomic particles)? And how finely are they "granulated" (how small are they)?
- How should we connect different layers?
- How do we organize a "natural" hierarchy so that a multi-layered architecture could itself be considered a single layer?
- In the brain, spatial relations and bio-chemical and physiological processes play a very important role. How can we utilize our knowledge about them?
- How can we build such multi-layered systems in a fully automatic, unified manner?
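On the "hierarchy as a single layer" question, one familiar answer from software design is composition: if every layer exposes the same interface, then a stack of layers is itself a layer and can be nested inside a bigger architecture. A minimal sketch (my own notation, not from the post; the two stages are purely hypothetical):

```python
from typing import Callable, List

# A layer maps one representation (a list of numbers) to another.
Layer = Callable[[list], list]

def stack(layers: List[Layer]) -> Layer:
    """Compose layers; the result has the same interface as a single layer."""
    def composed(x: list) -> list:
        for layer in layers:
            x = layer(x)
        return x
    return composed

# Two hypothetical processing stages.
def edges(x):    # stage 1: local differences (a crude "edge detector")
    return [b - a for a, b in zip(x, x[1:])]

def rectify(x):  # stage 2: keep only positive responses
    return [max(0, v) for v in x]

pipeline = stack([edges, rectify])   # two layers ...
outer = stack([pipeline, rectify])   # ... reused as one layer inside another
print(pipeline([1, 3, 2, 5]))        # -> [2, 0, 3]
```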

Since there is no system or approach that solves these problems automatically, we all have "hands to kiss and babies to shake" ((c) LeChuck), so let's work :)
