Top Document: comp.ai.neural-nets FAQ, Part 2 of 7: Learning
Previous Document: What is GRNN?
Next Document: Help! My NN won't learn! What should I do?

Unsupervised learning allegedly involves no target values. In fact, for most varieties of unsupervised learning, the targets are the same as the inputs (Sarle 1994). In other words, unsupervised learning usually performs the same task as an auto-associative network, compressing the information from the inputs (Deco and Obradovic 1996). Unsupervised learning is very useful for data visualization (Ripley 1996), although the NN literature generally ignores this application.

Unsupervised competitive learning is used in a wide variety of fields under a wide variety of names, the most common of which is "cluster analysis" (for more information on cluster analysis, including software, see the Classification Society of North America's web site at http://www.pitt.edu/~csna/). The main form of competitive learning in the NN literature is vector quantization (VQ, also called a "Kohonen network", although Kohonen invented several other types of networks as well; see "How many kinds of Kohonen networks exist?" for more references on VQ). Kosko (1992) and Hecht-Nielsen (1990) review neural approaches to VQ, while the textbook by Gersho and Gray (1992) covers the area from the perspective of signal processing. In statistics, VQ has been called "principal point analysis" (Flury, 1990, 1993; Tarpey et al., 1994) but is more frequently encountered in the guise of k-means clustering. In VQ, each of the competitive units corresponds to a cluster center (also called a codebook vector), and the error function is the sum of squared Euclidean distances between each training case and the nearest center. Often, each training case is normalized to a Euclidean length of one, which allows distances to be simplified to inner products.
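The VQ error function just described can be sketched in a few lines of NumPy; this is a minimal illustration with made-up toy data, not any particular VQ implementation:

```python
import numpy as np

def vq_error(X, centers):
    """Sum of squared Euclidean distances from each training case
    to its nearest cluster center (codebook vector)."""
    # dists[i, j] = squared distance from case i to center j
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # each case is charged only for its nearest center
    return dists.min(axis=1).sum()

# Toy data: two tight clusters with centers at their means.
# Each case lies 0.05 from its center, so the error is 4 * 0.05**2 = 0.01.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers = np.array([[0.05, 0.0], [5.05, 5.0]])
print(vq_error(X, centers))
```

Note that if every case x and center c is normalized to unit length, then ||x - c||^2 = 2 - 2 x·c, so finding the nearest center is the same as finding the center with the largest inner product, which is the simplification mentioned above.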
The more general error function based on distances is the same error function used in k-means clustering, one of the most common types of cluster analysis (Max 1960; MacQueen 1967; Anderberg 1973; Hartigan 1975; Hartigan and Wong 1979; Linde, Buzo, and Gray 1980; Lloyd 1982). The k-means model is an approximation to the normal mixture model (McLachlan and Basford 1988), assuming that the mixture components (clusters) all have spherical covariance matrices and equal sampling probabilities. Normal mixtures have found a variety of uses in neural networks (e.g., Bishop 1995). Balakrishnan, Cooper, Jacob, and Lewis (1994) found that k-means algorithms used as normal-mixture approximations recover cluster membership more accurately than Kohonen algorithms.

Hebbian learning is the other most common variety of unsupervised learning (Hertz, Krogh, and Palmer 1991). Hebbian learning minimizes the same error function as an auto-associative network with a linear hidden layer, trained by least squares, and is therefore a form of dimensionality reduction. This error function is equivalent to the sum of squared distances between each training case and a linear subspace of the input space (with distances measured perpendicularly), and is minimized by the leading principal components (Pearson 1901; Hotelling 1933; Rao 1964; Jolliffe 1986; Jackson 1991; Diamantaras and Kung 1996). There are variations of Hebbian learning that explicitly produce the principal components (Hertz, Krogh, and Palmer 1991; Karhunen 1994; Deco and Obradovic 1996; Diamantaras and Kung 1996).

Perhaps the most novel form of unsupervised learning in the NN literature is Kohonen's self-organizing (feature) map (SOM; Kohonen 1995). SOMs combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid (see "How many kinds of Kohonen networks exist?" for more explanation).
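The claim that the leading principal components minimize the perpendicular sum-of-squares error can be checked numerically. The sketch below (using NumPy and arbitrary random data) computes the squared distances from centered data to the best-fitting k-dimensional subspace and confirms that the residual equals the sum of the discarded squared singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X -= X.mean(axis=0)          # center the data, as PCA assumes

# Leading k principal components via the singular value decomposition
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:k].T                 # orthonormal basis of the best k-dim subspace

# Sum of squared perpendicular distances from each case to the subspace
proj = X @ V @ V.T           # projection of each case onto the subspace
resid = ((X - proj) ** 2).sum()

# This residual equals the total sum of squares not captured by the
# subspace, i.e. the sum of the discarded squared singular values.
assert np.isclose(resid, (s[k:] ** 2).sum())
```

Any other k-dimensional subspace would give a larger residual, which is the sense in which the principal components minimize this error function.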
But Kohonen's original SOM algorithm does not optimize an "energy" function (Erwin et al., 1992; Kohonen 1995, pp. 126, 237). The SOM algorithm involves a trade-off between the accuracy of the quantization and the smoothness of the topological mapping, but there is no explicit combination of these two properties into an energy function. Hence Kohonen's SOM is not simply an information-compression method like most other unsupervised learning networks. Neither does Kohonen's SOM have a clear interpretation as a density estimation method. Convergence of Kohonen's SOM algorithm is allegedly demonstrated by Yin and Allinson (1995), but their "proof" assumes the neighborhood size becomes zero, in which case the algorithm reduces to VQ and no longer has topological ordering properties (Kohonen 1995, p. 111). The best explanation of what a Kohonen SOM learns seems to be provided by the connection between SOMs and principal curves and surfaces explained by Mulier and Cherkassky (1995) and Ritter, Martinetz, and Schulten (1992). For further explanation, see "How many kinds of Kohonen networks exist?"

A variety of energy functions for SOMs have been proposed (e.g., Luttrell, 1994), some of which show a connection between SOMs and multidimensional scaling (Goodhill and Sejnowski 1997). There are also other approaches to SOMs that have clearer theoretical justification, using mixture models with Bayesian priors or constraints (Utsugi, 1996, 1997; Bishop, Svensén, and Williams, 1997).

For additional references on cluster analysis, see ftp://ftp.sas.com/pub/neural/clus_bib.txt.

References:

Anderberg, M.R. (1973), Cluster Analysis for Applications, New York: Academic Press.

Balakrishnan, P.V., Cooper, M.C., Jacob, V.S., and Lewis, P.A. (1994), "A study of the classification capabilities of neural networks using unsupervised learning: A comparison with k-means clustering," Psychometrika, 59, 509-525.

Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press.

Bishop, C.M., Svensén, M., and Williams, C.K.I. (1997), "GTM: A principled alternative to the self-organizing map," in Mozer, M.C., Jordan, M.I., and Petsche, T. (eds.), Advances in Neural Information Processing Systems 9, Cambridge, MA: The MIT Press, pp. 354-360. Also see http://www.ncrg.aston.ac.uk/GTM/

Deco, G. and Obradovic, D. (1996), An Information-Theoretic Approach to Neural Computing, NY: Springer-Verlag.

Diamantaras, K.I., and Kung, S.Y. (1996), Principal Component Neural Networks: Theory and Applications, NY: Wiley.

Erwin, E., Obermayer, K., and Schulten, K. (1992), "Self-organizing maps: Ordering, convergence properties and energy functions," Biological Cybernetics, 67, 47-55.

Flury, B. (1990), "Principal points," Biometrika, 77, 33-41.

Flury, B. (1993), "Estimation of principal points," Applied Statistics, 42, 139-151.

Gersho, A. and Gray, R.M. (1992), Vector Quantization and Signal Compression, Boston: Kluwer Academic Publishers.

Goodhill, G.J., and Sejnowski, T.J. (1997), "A unifying objective function for topographic mappings," Neural Computation, 9, 1291-1303.

Hartigan, J.A. (1975), Clustering Algorithms, NY: Wiley.

Hartigan, J.A., and Wong, M.A. (1979), "Algorithm AS136: A k-means clustering algorithm," Applied Statistics, 28, 100-108.

Hecht-Nielsen, R. (1990), Neurocomputing, Reading, MA: Addison-Wesley.

Hertz, J., Krogh, A., and Palmer, R. (1991), Introduction to the Theory of Neural Computation, Redwood City, CA: Addison-Wesley.

Hotelling, H. (1933), "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, 24, 417-441, 498-520.

Ismail, M.A., and Kamel, M.S. (1989), "Multidimensional data clustering utilizing hybrid search strategies," Pattern Recognition, 22, 75-89.

Jackson, J.E. (1991), A User's Guide to Principal Components, NY: Wiley.

Jolliffe, I.T. (1986), Principal Component Analysis, Springer-Verlag.

Karhunen, J. (1994), "Stability of Oja's PCA subspace rule," Neural Computation, 6, 739-747.

Kohonen, T. (1995/1997), Self-Organizing Maps, Berlin: Springer-Verlag.

Kosko, B. (1992), Neural Networks and Fuzzy Systems, Englewood Cliffs, NJ: Prentice-Hall.

Linde, Y., Buzo, A., and Gray, R. (1980), "An algorithm for vector quantizer design," IEEE Transactions on Communications, 28, 84-95.

Lloyd, S. (1982), "Least squares quantization in PCM," IEEE Transactions on Information Theory, 28, 129-137.

Luttrell, S.P. (1994), "A Bayesian analysis of self-organizing maps," Neural Computation, 6, 767-794.

MacQueen, J.B. (1967), "Some methods for classification and analysis of multivariate observations," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 281-297.

Max, J. (1960), "Quantizing for minimum distortion," IEEE Transactions on Information Theory, 6, 7-12.

McLachlan, G.J. and Basford, K.E. (1988), Mixture Models, NY: Marcel Dekker.

Mulier, F. and Cherkassky, V. (1995), "Self-organization as an iterative kernel smoothing process," Neural Computation, 7, 1165-1177.

Pearson, K. (1901), "On lines and planes of closest fit to systems of points in space," Phil. Mag., 2(6), 559-572.

Rao, C.R. (1964), "The use and interpretation of principal component analysis in applied research," Sankhya A, 26, 329-358.

Ripley, B.D. (1996), Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press.

Ritter, H., Martinetz, T., and Schulten, K. (1992), Neural Computation and Self-Organizing Maps: An Introduction, Reading, MA: Addison-Wesley.

Sarle, W.S. (1994), "Neural networks and statistical models," Proceedings of the Nineteenth Annual SAS Users Group International Conference, Cary, NC: SAS Institute Inc., pp. 1538-1550, ftp://ftp.sas.com/pub/neural/neural1.ps.

Tarpey, T., Luning, L., and Flury, B. (1994), "Principal points and self-consistent points of elliptical distributions," Annals of Statistics, ?.

Utsugi, A. (1996), "Topology selection for self-organizing maps," Network: Computation in Neural Systems, 7, 727-740, http://www.aist.go.jp/NIBH/~b0616/Lab/index-e.html

Utsugi, A. (1997), "Hyperparameter selection for self-organizing maps," Neural Computation, 9, 623-635, available on-line at http://www.aist.go.jp/NIBH/~b0616/Lab/index-e.html

Yin, H. and Allinson, N.M. (1995), "On the distribution and convergence of feature space in self-organizing maps," Neural Computation, 7, 1178-1187.

Zeger, K., Vaisey, J., and Gersho, A. (1992), "Globally optimal vector quantizer design by stochastic relaxation," IEEE Transactions on Signal Processing, 40, 310-322.

Send corrections/additions to the FAQ Maintainer: saswss@unx.sas.com (Warren Sarle)
Last Update March 27 2014 @ 02:11 PM