Classification in Almost Empty Spaces
Bob Duin, Delft Technical University, The Netherlands

Advances in sensor technology are opening up a growing number of application areas and making more data available, at higher resolution and in a growing diversity of measurement modes. Automatic interpretation is therefore needed more and more. The design of pattern recognition systems, which originally relied on expert knowledge, consequently has to be based more often on raw data. Issues in the design of training schemes based on such data will be discussed. Traditional pattern recognition systems suffer from the curse of dimensionality, or overtraining: their performance eventually deteriorates when the data is represented at a higher resolution, by more features, in a higher-dimensional space. Character recognition based directly on 32 x 32 images is much more demanding for the training procedure (in terms of time, space and, in particular, training examples) than recognition based on 10 good features. The reason is that the 1024-dimensional representation space spanned by the pixels is almost empty, whereas a feature space built from 10 features may be reasonably well filled by training examples. Some possibilities for solving this problem are: (1) the automatic detection of subspaces, (2) the training of robust (regularized) classifiers, (3) the combination of classifiers (because a small dataset cannot reveal which one is good), (4) the use of different representations, e.g. dissimilarity-based ones, that make better use of the natural connectivity between neighbouring pixels. Examples in the recognition of images and spectra show that such approaches may help to overcome the problem of building pattern classifiers in almost empty spaces.
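The contrast between the 1024-dimensional pixel space and a 10-dimensional feature space can be made concrete with a rough counting argument. The sketch below is only an illustration, not part of the talk: it assumes each dimension is coarsely split into two bins, and it uses a hypothetical training set of 10,000 examples. Since every example falls into exactly one grid cell, the number of examples divided by the number of cells is an upper bound on the fraction of cells that can contain any data at all. The function name `log10_max_occupancy` and its parameters are made up for this example.

```python
import math


def log10_max_occupancy(n_samples: int, n_dims: int, bins_per_dim: int = 2) -> float:
    """Return log10 of an upper bound on the fraction of occupied grid cells.

    Each of the n_dims dimensions is split into bins_per_dim bins (an
    assumption made for illustration), giving bins_per_dim ** n_dims cells.
    n_samples points can occupy at most n_samples of them, so the occupied
    fraction is at most min(1, n_samples / n_cells). Logarithms are used
    because the cell count is astronomically large for many dimensions.
    """
    if n_samples < 1 or n_dims < 1 or bins_per_dim < 2:
        raise ValueError("need n_samples >= 1, n_dims >= 1, bins_per_dim >= 2")
    log10_cells = n_dims * math.log10(bins_per_dim)
    log10_fraction = math.log10(n_samples) - log10_cells
    # A fraction cannot exceed 1, so its log10 cannot exceed 0.
    return min(0.0, log10_fraction)


n_train = 10_000  # hypothetical training set size, not taken from the talk

pixel_space = log10_max_occupancy(n_train, 32 * 32)  # 1024 pixel dimensions
feature_space = log10_max_occupancy(n_train, 10)     # 10 good features

print(f"pixel space:   at most 10^{pixel_space:.1f} of the cells hold data")
print(f"feature space: at most 10^{feature_space:.1f} of the cells hold data")
```

Under these assumptions the bound for the 10-feature space is 1 (every cell could be reached), while the bound for the pixel space lies more than 300 orders of magnitude below 1. The bound says nothing about how the data is actually distributed; it only shows that no realistic number of training examples can fill the pixel space.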