Top Document: comp.ai.neural-nets FAQ, Part 2 of 7: Learning
Previous Document: What is GRNN?
Next Document: Help! My NN won't learn! What should I do?

Unsupervised learning allegedly involves no target values. In fact, for most varieties of unsupervised learning, the targets are the same as the inputs (Sarle 1994). In other words, unsupervised learning usually performs the same task as an auto-associative network, compressing the information from the inputs (Deco and Obradovic 1996). Unsupervised learning is very useful for data visualization (Ripley 1996), although the NN literature generally ignores this application.

Unsupervised competitive learning is used in a wide variety of fields under a wide variety of names, the most common of which is "cluster analysis" (for more information on cluster analysis, including software, see the Classification Society of North America's web site at http://www.pitt.edu/~csna/). The main form of competitive learning in the NN literature is vector quantization (VQ, also called a "Kohonen network", although Kohonen invented several other types of networks as well; see "How many kinds of Kohonen networks exist?" for more references on VQ). Kosko (1992) and Hecht-Nielsen (1990) review neural approaches to VQ, while the textbook by Gersho and Gray (1992) covers the area from the perspective of signal processing. In statistics, VQ has been called "principal point analysis" (Flury, 1990, 1993; Tarpey et al., 1994) but is more frequently encountered in the guise of k-means clustering. In VQ, each of the competitive units corresponds to a cluster center (also called a codebook vector), and the error function is the sum of squared Euclidean distances between each training case and the nearest center. Often, each training case is normalized to a Euclidean length of one, which allows distances to be simplified to inner products.
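The VQ error function just described can be sketched in a few lines of NumPy; this is a minimal illustration with made-up toy data, not any particular VQ implementation:

```python
import numpy as np

def vq_error(X, centers):
    """Sum of squared Euclidean distances from each training case
    to its nearest cluster center (codebook vector)."""
    # dists[i, j] = squared distance from case i to center j
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # each case is charged only for its nearest center
    return dists.min(axis=1).sum()

# Toy data: two tight clusters with centers at their means.
# Each case lies 0.05 from its center, so the error is 4 * 0.05**2 = 0.01.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers = np.array([[0.05, 0.0], [5.05, 5.0]])
print(vq_error(X, centers))
```

Note that if every case x and center c is normalized to unit length, then ||x - c||^2 = 2 - 2 x·c, so finding the nearest center is the same as finding the center with the largest inner product, which is the simplification mentioned above.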
The more general error function based on distances is the same error function used in k-means clustering, one of the most common types of cluster analysis (Max 1960; MacQueen 1967; Anderberg 1973; Hartigan 1975; Hartigan and Wong 1979; Linde, Buzo, and Gray 1980; Lloyd 1982). The k-means model is an approximation to the normal mixture model (McLachlan and Basford 1988), assuming that the mixture components (clusters) all have spherical covariance matrices and equal sampling probabilities. Normal mixtures have found a variety of uses in neural networks (e.g., Bishop 1995). Balakrishnan, Cooper, Jacob, and Lewis (1994) found that k-means algorithms used as normal-mixture approximations recover cluster membership more accurately than Kohonen algorithms.

Hebbian learning is the other most common variety of unsupervised learning (Hertz, Krogh, and Palmer 1991). Hebbian learning minimizes the same error function as an auto-associative network with a linear hidden layer, trained by least squares, and is therefore a form of dimensionality reduction. This error function is equivalent to the sum of squared distances between each training case and a linear subspace of the input space (with distances measured perpendicularly), and is minimized by the leading principal components (Pearson 1901; Hotelling 1933; Rao 1964; Jolliffe 1986; Jackson 1991; Diamantaras and Kung 1996). There are variations of Hebbian learning that explicitly produce the principal components (Hertz, Krogh, and Palmer 1991; Karhunen 1994; Deco and Obradovic 1996; Diamantaras and Kung 1996).

Perhaps the most novel form of unsupervised learning in the NN literature is Kohonen's self-organizing (feature) map (SOM; Kohonen 1995). SOMs combine competitive learning with dimensionality reduction by smoothing the clusters with respect to an a priori grid (see "How many kinds of Kohonen networks exist?" for more explanation).
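The claim that the leading principal components minimize the perpendicular sum-of-squares error can be checked numerically. The sketch below (using NumPy and arbitrary random data) computes the squared distances from centered data to the best-fitting k-dimensional subspace and confirms that the residual equals the sum of the discarded squared singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X -= X.mean(axis=0)          # center the data, as PCA assumes

# Leading k principal components via the singular value decomposition
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:k].T                 # orthonormal basis of the best k-dim subspace

# Sum of squared perpendicular distances from each case to the subspace
proj = X @ V @ V.T           # projection of each case onto the subspace
resid = ((X - proj) ** 2).sum()

# This residual equals the total sum of squares not captured by the
# subspace, i.e. the sum of the discarded squared singular values.
assert np.isclose(resid, (s[k:] ** 2).sum())
```

Any other k-dimensional subspace would give a larger residual, which is the sense in which the principal components minimize this error function.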
But Kohonen's original SOM algorithm does not optimize an "energy" function (Erwin et al., 1992; Kohonen 1995, pp. 126, 237). The SOM algorithm involves a trade-off between the accuracy of the quantization and the smoothness of the topological mapping, but there is no explicit combination of these two properties into an energy function. Hence Kohonen's SOM is not simply an information-compression method like most other unsupervised learning networks. Neither does Kohonen's SOM have a clear interpretation as a density estimation method. Convergence of Kohonen's SOM algorithm is allegedly demonstrated by Yin and Allinson (1995), but their "proof" assumes the neighborhood size becomes zero, in which case the algorithm reduces to VQ and no longer has topological ordering properties (Kohonen 1995, p. 111). The best explanation of what a Kohonen SOM learns seems to be provided by the connection between SOMs and principal curves and surfaces explained by Mulier and Cherkassky (1995) and Ritter, Martinetz, and Schulten (1992). For further explanation, see "How many kinds of Kohonen networks exist?"

A variety of energy functions for SOMs have been proposed (e.g., Luttrell, 1994), some of which show a connection between SOMs and multidimensional scaling (Goodhill and Sejnowski 1997). There are also other approaches to SOMs that have clearer theoretical justification, using mixture models with Bayesian priors or constraints (Utsugi, 1996, 1997; Bishop, Svensén, and Williams, 1997).

For additional references on cluster analysis, see ftp://ftp.sas.com/pub/neural/clus_bib.txt.

References:

Anderberg, M.R. (1973), Cluster Analysis for Applications, New York: Academic Press.

Balakrishnan, P.V., Cooper, M.C., Jacob, V.S., and Lewis, P.A. (1994), "A study of the classification capabilities of neural networks using unsupervised learning: A comparison with k-means clustering," Psychometrika, 59, 509-525.

Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press.

Bishop, C.M., Svensén, M., and Williams, C.K.I. (1997), "GTM: A principled alternative to the self-organizing map," in Mozer, M.C., Jordan, M.I., and Petsche, T. (eds.), Advances in Neural Information Processing Systems 9, Cambridge, MA: The MIT Press, pp. 354-360. Also see http://www.ncrg.aston.ac.uk/GTM/

Deco, G. and Obradovic, D. (1996), An Information-Theoretic Approach to Neural Computing, NY: Springer-Verlag.

Diamantaras, K.I., and Kung, S.Y. (1996), Principal Component Neural Networks: Theory and Applications, NY: Wiley.

Erwin, E., Obermayer, K., and Schulten, K. (1992), "Self-organizing maps: Ordering, convergence properties and energy functions," Biological Cybernetics, 67, 47-55.

Flury, B. (1990), "Principal points," Biometrika, 77, 33-41.

Flury, B. (1993), "Estimation of principal points," Applied Statistics, 42, 139-151.

Gersho, A. and Gray, R.M. (1992), Vector Quantization and Signal Compression, Boston: Kluwer Academic Publishers.

Goodhill, G.J., and Sejnowski, T.J. (1997), "A unifying objective function for topographic mappings," Neural Computation, 9, 1291-1303.

Hartigan, J.A. (1975), Clustering Algorithms, NY: Wiley.

Hartigan, J.A., and Wong, M.A. (1979), "Algorithm AS136: A k-means clustering algorithm," Applied Statistics, 28, 100-108.

Hecht-Nielsen, R. (1990), Neurocomputing, Reading, MA: Addison-Wesley.

Hertz, J., Krogh, A., and Palmer, R. (1991), Introduction to the Theory of Neural Computation, Redwood City, CA: Addison-Wesley.

Hotelling, H. (1933), "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, 24, 417-441, 498-520.

Ismail, M.A., and Kamel, M.S. (1989), "Multidimensional data clustering utilizing hybrid search strategies," Pattern Recognition, 22, 75-89.

Jackson, J.E. (1991), A User's Guide to Principal Components, NY: Wiley.

Jolliffe, I.T. (1986), Principal Component Analysis, Springer-Verlag.

Karhunen, J. (1994), "Stability of Oja's PCA subspace rule," Neural Computation, 6, 739-747.

Kohonen, T. (1995/1997), Self-Organizing Maps, Berlin: Springer-Verlag.

Kosko, B. (1992), Neural Networks and Fuzzy Systems, Englewood Cliffs, NJ: Prentice-Hall.

Linde, Y., Buzo, A., and Gray, R. (1980), "An algorithm for vector quantizer design," IEEE Transactions on Communications, 28, 84-95.

Lloyd, S. (1982), "Least squares quantization in PCM," IEEE Transactions on Information Theory, 28, 129-137.

Luttrell, S.P. (1994), "A Bayesian analysis of self-organizing maps," Neural Computation, 6, 767-794.

MacQueen, J.B. (1967), "Some methods for classification and analysis of multivariate observations," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 281-297.

Max, J. (1960), "Quantizing for minimum distortion," IEEE Transactions on Information Theory, 6, 7-12.

McLachlan, G.J. and Basford, K.E. (1988), Mixture Models, NY: Marcel Dekker.

Mulier, F. and Cherkassky, V. (1995), "Self-organization as an iterative kernel smoothing process," Neural Computation, 7, 1165-1177.

Pearson, K. (1901), "On lines and planes of closest fit to systems of points in space," Phil. Mag., 2(6), 559-572.

Rao, C.R. (1964), "The use and interpretation of principal component analysis in applied research," Sankhya A, 26, 329-358.

Ripley, B.D. (1996), Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press.

Ritter, H., Martinetz, T., and Schulten, K. (1992), Neural Computation and Self-Organizing Maps: An Introduction, Reading, MA: Addison-Wesley.

Sarle, W.S. (1994), "Neural networks and statistical models," Proceedings of the Nineteenth Annual SAS Users Group International Conference, Cary, NC: SAS Institute Inc., pp. 1538-1550, ftp://ftp.sas.com/pub/neural/neural1.ps.

Tarpey, T., Luning, L., and Flury, B. (1994), "Principal points and self-consistent points of elliptical distributions," Annals of Statistics, ?.

Utsugi, A. (1996), "Topology selection for self-organizing maps," Network: Computation in Neural Systems, 7, 727-740, http://www.aist.go.jp/NIBH/~b0616/Lab/index-e.html

Utsugi, A. (1997), "Hyperparameter selection for self-organizing maps," Neural Computation, 9, 623-635, available on-line at http://www.aist.go.jp/NIBH/~b0616/Lab/index-e.html

Yin, H. and Allinson, N.M. (1995), "On the distribution and convergence of feature space in self-organizing maps," Neural Computation, 7, 1178-1187.

Zeger, K., Vaisey, J., and Gersho, A. (1992), "Globally optimal vector quantizer design by stochastic relaxation," IEEE Transactions on Signal Processing, 40, 310-322.

Send corrections/additions to the FAQ Maintainer: saswss@unx.sas.com (Warren Sarle)
Last Update March 27 2014 @ 02:11 PM