Cluster-Based Feature Extraction and Data Fusion in the Wavelet Domain

Johannes R. Sveinsson, Magnus Orn Ulfarsson, and Jon Atli Benediktsson
Department of Electrical and Computer Engineering, University of Iceland, Hjardarhagi 2-6, Reykjavik, IS-107, Iceland
E-mail: sveinsso@hi.is, mou@hi.is, and benedikt@hi.is

This work was supported in part by the Research Fund of the University of Iceland and the Icelandic Research Council.
ABSTRACT

This paper concentrates on linear feature extraction methods for neural network classifiers. The considered feature extraction method is based on discrete wavelet transforms (DWTs) and a cluster-based procedure, i.e., cluster-based feature extraction of the wavelet coefficients of remote sensing and geographic data is considered. The cluster-based feature extraction is a preprocessing routine that computes feature-vectors to group the wavelet coefficients in an unsupervised way. These feature-vectors are then used as a mask or a filter for the selection of representative wavelet coefficients that are used to train the neural network classifiers. In experiments, the proposed feature extraction methods performed well in neural network classifications of multisource remote sensing and geographic data.

1. INTRODUCTION

The selection of variables is a key problem in pattern recognition and is termed feature selection or feature extraction [1]. However, few feature extraction algorithms are available for neural networks [2]. Feature extraction can thus be used to transform the input data and, in some way, find the best input representation for neural networks. For high-dimensional data, large neural networks (with many inputs and a large number of hidden neurons) are often used. The training time of a large neural network can be very long. Also, for high-dimensional data the curse of dimensionality or the Hughes phenomenon [1] may occur. Hence, it is necessary to reduce the input dimensionality for the neural network in order to obtain a smaller network which performs well both in terms of training and test classification accuracies. This leads to the importance of feature extraction for neural networks, that is, to find the best representation of the input data in a lower-dimensional space where the representation does
not lead to a significant decrease in overall classification accuracy as compared to the one obtained in the original feature space.

In this paper, a linear feature extraction method based on cluster-based feature extraction of the wavelet coefficients for neural network classifiers is discussed and applied in classification of multisource remote sensing and geographic data. The method is an extension of a method proposed by Pittner and Kamarthi [3].

2. FEATURE EXTRACTION
Here we concentrate on linear feature extraction methods for neural networks and then leave the classification task to the neural networks.

2.1. Wavelets
The discrete wavelet transform (DWT) [4] provides a transformation of a signal from the time domain to the scale-frequency domain. The DWT is computed on several levels with different time/scale-frequency resolutions. As each level of the transformation is calculated, there is a decrease in temporal resolution and a corresponding increase in scale-frequency resolution. The full DWT of a time domain signal $x(t)$ in $L^2$ (finite energy) can be represented in terms of shifted versions of a scaling function $\phi(t)$ and shifted and dilated versions of a so-called mother wavelet function $\psi(t)$. The connection between the scaling and wavelet functions at different scales is given by the two-scale equations
$$\phi(t) = \sum_{k \in \mathbb{Z}} h(k)\,\phi(2t - k) \quad \text{and} \quad \psi(t) = \sum_{k \in \mathbb{Z}} g(k)\,\phi(2t - k), \tag{1}$$
where $h(k)$ and $g(k)$ are the finite low-pass and high-pass impulse responses for the DWT, respectively. The representation of the DWT can be written as
$$x(t) = \sum_{k} u_{j_0,k}\,\phi_{j_0,k}(t) + \sum_{j=1}^{j_0} \sum_{k} w_{j,k}\,\psi_{j,k}(t), \tag{2}$$
where $w_{j,k}$, $j \le j_0$, are the wavelet coefficients and $u_{j_0,k}$ are the scaling coefficients. These coefficients are given by the inner product in $L^2$, i.e.,
$$w_{j,k} = \langle x(t), \psi_{j,k}(t) \rangle \quad \text{and} \quad u_{j,k} = \langle x(t), \phi_{j,k}(t) \rangle. \tag{3}$$
Here $\phi_{j,k}(t) = 2^{-j/2}\phi(2^{-j}t - k)$ is a family of scaling functions and $\psi_{j,k}(t) = 2^{-j/2}\psi(2^{-j}t - k)$ a family of wavelet functions, and with the right choice of these mother functions the families form an orthogonal basis for the signal space. The wavelet coefficients $w_{j,k}$ are then a measure of the signal content around time $2^j k$ and scale-frequency $2^{-j} f_0$, and the scaling coefficients $u_{j,k}$ represent the local mean around time $2^j k$. The DWT can be implemented by tree filter banks. Each stage of the tree structure then consists of a low-pass filter, $h(k)$, and a high-pass filter, $g(k)$, each followed by down-sampling by 2. Every time down-sampling is performed, the signal length is reduced by half. The scale propagation is obtained by passing the output of the low-pass branch through the same process of filtering and down-sampling. Thus, the DWT has a natural interpretation in terms of a tree structure (filter bank) in the time-scale/frequency domain. The wavelet coefficients $w_{j,k}$ are the outputs of the high-pass branches of the tree structure and the scaling coefficients $u_{j_0,k}$ are the output of the low-pass branch of the last stage of the DWT.
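To make the tree structure concrete, the following is a minimal sketch of the iterated filtering and down-sampling described above, assuming the Haar filter pair as the choice of $h(k)$ and $g(k)$; the function name and example signal are illustrative, not from the paper.

```python
import numpy as np

def dwt_filter_bank(x, levels):
    """One-dimensional DWT computed as a tree filter bank: at each stage the
    signal is low-pass and high-pass filtered, then down-sampled by 2."""
    h = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low-pass impulse response (Haar, assumed)
    g = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass impulse response (Haar, assumed)
    approx = np.asarray(x, dtype=float)
    detail = []
    for _ in range(levels):
        high = np.convolve(approx, g)[1::2]    # wavelet coefficients w_{j,k} at this level
        approx = np.convolve(approx, h)[1::2]  # low-pass branch feeds the next stage
        detail.append(high)
    return detail, approx                      # w_{1,.}, ..., w_{j0,.} and u_{j0,.}

# Each down-sampling halves the signal length: 8 -> 4 -> 2 -> 1.
w, u = dwt_filter_bank([4, 6, 10, 12, 8, 6, 5, 5], levels=3)
```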
Table 1: Training and Test Samples for Information Classes in the Experiment on the Anderson River data.
2.2. Cluster-Based Feature Extraction
In this section, a preprocessing routine is described that computes feature-vectors to group the wavelet coefficients that are going to be used to train neural network classifiers. This method is an extension of a feature extraction method proposed in [3]. Assume that we have $l$ "representative" signals, $x_i$, of length $N = 2^n$. The DWT is computed for all the $l$ representative signals, $x_i$. The wavelet coefficients and scaling coefficients are represented by $w_{j,k}$ and $u_{j_0,k}$, where $j = 1, 2, \dots, j_0$, respectively. Next, the DWT coefficients are arranged into a matrix $B = [b_{j,k}]$ in the following form:
$$b_{j,k} = \begin{cases} |w_{j,k}| & \text{for } j = 1, \dots, j_0 \text{ and } 1 \le k \le N/2^j, \\ 0 & \text{for } j = 1, \dots, j_0 \text{ and } N/2^j < k \le N/2. \end{cases}$$
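Continuing the sketch, the matrix $B$ can be assembled from the coefficient magnitudes as in the definition above. This reuses dwt_filter_bank from the previous sketch; the zero-padding of the shorter rows follows the reconstructed definition, and the helper name is again an illustrative assumption rather than the paper's notation.

```python
import numpy as np

def coefficient_matrix(x, levels):
    """Arrange |w_{j,k}| into B = [b_{j,k}]: row j holds the level-j wavelet
    coefficient magnitudes, zero-padded on the right to length N/2."""
    detail, _ = dwt_filter_bank(x, levels)
    width = len(x) // 2                  # longest row, level j = 1, has N/2 entries
    B = np.zeros((levels, width))
    for j, w_j in enumerate(detail):
        B[j, :len(w_j)] = np.abs(w_j)    # b_{j,k} = |w_{j,k}|; remaining entries stay 0
    return B

B = coefficient_matrix([4, 6, 10, 12, 8, 6, 5, 5], levels=3)
print(B.shape)  # (3, 4): j_0 = 3 rows, N/2 = 4 columns
```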