comp.ai.neural-nets FAQ, Part 1 of 7: Introduction
Section - What can you do with an NN and what not?

In principle, NNs can compute any computable function, i.e., they can do
everything a normal digital computer can do (Valiant, 1988; Siegelmann and
Sontag, 1999; Orponen, 2000; Sima and Orponen, 2001), or perhaps even more,
under some assumptions of doubtful practicality (see Siegelmann, 1998, but
also Hadley, 1999). 
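You can see the link to ordinary digital computers at the level of logic gates. A single threshold unit can compute NAND, and NAND gates are enough to build any Boolean circuit. The sketch below builds XOR from four such units. The function names and the particular weights (-1, -1) and bias 1.5 are illustrative choices. It covers only fixed-size combinational logic. The cited results about arbitrary computable functions need more than this, for example recurrence, so that a network can run for an unbounded number of steps.

```python
def threshold_unit(weights, bias, inputs):
    """McCulloch-Pitts style unit: output 1 if bias + weighted sum is positive, else 0."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > 0 else 0


def nand(a, b):
    # weights (-1, -1) and bias 1.5: only a = b = 1 pushes the sum below zero
    return threshold_unit((-1.0, -1.0), 1.5, (a, b))


def xor(a, b):
    # XOR wired purely from NAND units (a standard four-gate circuit)
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "NAND:", nand(a, b), "XOR:", xor(a, b))
```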

Practical applications of NNs most often employ supervised learning. For
supervised learning, you must provide training data that includes both the
input and the desired result (the target value). After successful training,
you can present input data alone to the NN (that is, input data without the
desired result), and the NN will compute an output value that approximates
the desired result. However, for training to be successful, you may need
lots of training data and lots of computer time to do the training. In many
applications, such as image and text processing, you will have to do a lot
of work to select appropriate input data and to code the data as numeric
values. 
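Here is a minimal sketch of the supervised workflow just described. It uses the simplest possible network: one linear output unit and no hidden layer. A categorical input (the hypothetical colours "red", "green", "blue") is coded as numbers with 1-of-C coding. The network is trained on (input, target) pairs by batch gradient descent, and afterwards it is given the input alone. The data, names, learning rate and epoch count are all made up for illustration.

```python
CATEGORIES = ["red", "green", "blue"]                   # hypothetical categorical input
TRAIN = [("red", 1.0), ("green", 2.0), ("blue", 3.0)]   # (input, target) pairs


def one_hot(category):
    """Code a categorical value as numbers using 1-of-C coding."""
    return [1.0 if c == category else 0.0 for c in CATEGORIES]


def predict(weights, bias, category):
    """Output of a single linear unit; it needs only the input, not the target."""
    x = one_hot(category)
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train(epochs=1000, lr=0.1):
    """Batch gradient descent on half the mean squared error over the training pairs."""
    weights, bias = [0.0] * len(CATEGORIES), 0.0
    n = len(TRAIN)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for category, target in TRAIN:
            err = predict(weights, bias, category) - target
            for i, xi in enumerate(one_hot(category)):
                grad_w[i] += err * xi / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


weights, bias = train()
# After training, present an input alone (no target) and read off the output.
print(round(predict(weights, bias, "green"), 3))  # 2.0
```

Real applications need far more data and a nonlinear network. The point here is only the separation between training (inputs plus targets) and use (inputs only).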

In practice, NNs are especially useful for classification and function
approximation/mapping problems which are tolerant of some imprecision, which
have lots of training data available, but to which hard and fast rules (such
as those that might be used in an expert system) cannot easily be applied.
Almost any finite-dimensional vector function on a compact set can be
approximated to arbitrary precision by feedforward NNs (which are the type
most often used in practical applications) if you have enough data and
enough computing resources. 
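The approximation claim can be made concrete for one simple case. The sketch below assumes f = sin on the compact interval [0, pi]. It hand-picks weights for a single-hidden-layer network of steep logistic units so that the output follows a staircase tracking f. More hidden units give finer steps and a smaller maximum error. The weights are constructed, not learned. The names and the slope factor 40 are illustrative choices.

```python
import math


def logistic(z):
    # numerically stable logistic function (avoids overflow in math.exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_net(f, a, b, hidden):
    """Hand-pick weights of a one-hidden-layer net approximating f on [a, b]."""
    step = (b - a) / hidden
    knots = [a + j * step for j in range(hidden + 1)]
    slope = 40.0 / step  # steep hidden units act almost like threshold units
    centers = [(knots[j - 1] + knots[j]) / 2 for j in range(1, hidden + 1)]
    out_w = [f(knots[j]) - f(knots[j - 1]) for j in range(1, hidden + 1)]  # step heights
    out_b = f(knots[0])

    def net(x):
        return out_b + sum(v * logistic(slope * (x - c)) for v, c in zip(out_w, centers))

    return net


def max_error(f, net, a, b, points=1001):
    """Largest |f(x) - net(x)| over an evenly spaced grid on [a, b]."""
    xs = [a + (b - a) * i / (points - 1) for i in range(points)]
    return max(abs(f(x) - net(x)) for x in xs)


for hidden in (10, 50, 200):
    net = build_net(math.sin, 0.0, math.pi, hidden)
    print(hidden, "hidden units, max error", round(max_error(math.sin, net, 0.0, math.pi), 4))
```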

To be somewhat more precise, feedforward networks with a single hidden layer
and trained by least-squares are statistically consistent estimators of
arbitrary square-integrable regression functions under certain
practically-satisfiable assumptions regarding sampling, target noise, number
of hidden units, size of weights, and form of hidden-unit activation
function (White, 1990). Such networks can also be trained as statistically
consistent estimators of derivatives of regression functions (White and
Gallant, 1992) and quantiles of the conditional noise distribution (White,
1992a). Feedforward networks with a single hidden layer using threshold or
sigmoid activation functions are universally consistent estimators of binary
classifications (Faragó and Lugosi, 1993; Lugosi and Zeger, 1995; Devroye,
Györfi, and Lugosi, 1996) under similar assumptions. Note that these results
are stronger than the universal approximation theorems that merely show the
existence of weights for arbitrarily accurate approximations, without
demonstrating that such weights can be obtained by learning. 
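For the quantile result, one standard way to make a network estimate a conditional quantile is to replace squared error with the check (or "pinball") loss used in quantile regression. We assume that mechanism here. The toy sketch below shows only the simplest case, a constant prediction. On the sample 1, ..., 10, the constant that minimises the mean check loss at level tau = 0.85 is a 0.85-quantile of the sample, namely 9. The data, tau and function names are illustrative.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for quantile level tau in (0, 1); residual = target - prediction."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def mean_pinball(c, ys, tau):
    return sum(pinball_loss(y - c, tau) for y in ys) / len(ys)


ys = [float(v) for v in range(1, 11)]  # toy sample 1, 2, ..., 10
tau = 0.85
# 90% of the values are <= 9 and 80% are < 9, so 9 is the unique 0.85-quantile.
best = min(ys, key=lambda c: mean_pinball(c, ys, tau))
print(best)  # 9.0
```

In a network, the constant c would be replaced by the network output for each input, and the same loss would be minimised over the weights.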

Unfortunately, the above consistency results depend on one impractical
assumption: that the networks are trained by an error (L_p error or
misclassification rate) minimization technique that comes arbitrarily close
to the global minimum. Such minimization is computationally intractable
except in small or simple problems (Blum and Rivest, 1989; Judd, 1990). In
practice, however, you can usually get good results without doing a
full-blown global optimization; e.g., using multiple (say, 10 to 1000)
random weight initializations is usually sufficient. 
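A minimal sketch of the multiple-random-starts idea follows. It assumes a 2-2-1 network (tanh hidden units, linear output) trained on XOR by plain per-case gradient descent. It trains from several random initializations and keeps the run with the lowest training error. The network size, learning rate, epoch count, seed and the choice of 10 starts are illustrative, not recommendations.

```python
import math
import random

# XOR: a classic toy problem on which gradient descent from a single
# random start can stall in a poor local minimum.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
H = 2  # number of hidden units (illustrative choice)


def init_weights(rng):
    # w1[j] = [bias, weight for x1, weight for x2] of hidden unit j
    # w2 = [output bias, v_1, ..., v_H]
    w1 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(H)]
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(H + 1)]
    return w1, w2


def forward(w1, w2, x):
    h = [math.tanh(w[0] + w[1] * x[0] + w[2] * x[1]) for w in w1]
    y = w2[0] + sum(v * hj for v, hj in zip(w2[1:], h))  # linear output unit
    return h, y


def sse(w1, w2):
    total = 0.0
    for x, t in DATA:
        d = forward(w1, w2, x)[1] - t
        total += d * d
    return total


def train(rng, epochs=2000, lr=0.05):
    """One training run from one random start (per-case gradient descent)."""
    w1, w2 = init_weights(rng)
    for _ in range(epochs):
        for x, t in DATA:
            h, y = forward(w1, w2, x)
            err = y - t  # derivative of 0.5 * (y - t)**2 with respect to y
            # backpropagate using the output weights before they are updated
            deltas = [err * w2[j + 1] * (1.0 - h[j] ** 2) for j in range(H)]
            w2[0] -= lr * err
            for j in range(H):
                w2[j + 1] -= lr * err * h[j]
                w1[j][0] -= lr * deltas[j]
                w1[j][1] -= lr * deltas[j] * x[0]
                w1[j][2] -= lr * deltas[j] * x[1]
    return w1, w2, sse(w1, w2)


def multistart(n_starts=10, seed=0):
    """Train from n_starts random initializations and keep the lowest-error run."""
    rng = random.Random(seed)
    runs = [train(rng) for _ in range(n_starts)]
    errors = [run[2] for run in runs]
    best = min(runs, key=lambda run: run[2])
    return best, errors


best, errors = multistart()
print("training SSE per start:", [round(e, 4) for e in errors])
print("kept run has SSE", round(best[2], 4))
```

In practice you would usually compare runs on validation data rather than training error. Training error is used here only to keep the sketch short.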

One example of a function that a typical neural net cannot learn is Y=1/X
on the open interval (0,1). An open interval is not a compact set. With any
bounded output activation function, the error will get arbitrarily large as
the input approaches zero. Of course, you could make the output activation
function a reciprocal function and easily get a perfect fit, but neural
networks are most often used in situations where you do not have enough
prior knowledge to set the activation function in such a clever way. There
are also many other important problems that are so difficult that a neural
network will be unable to learn them without memorizing the entire training
set, such as: 

 o Predicting random or pseudo-random numbers. 
 o Factoring large integers. 
 o Determining whether a large integer is prime or composite. 
 o Decrypting anything encrypted by a good algorithm. 
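Returning to the Y=1/X example, the argument does not depend on the training method. If a network's output is confined to [-B, B] by a bounded output activation, then no choice of weights can make the error at x smaller than 1/x - B. That bound grows without limit as x approaches zero. The sketch below computes it for an assumed B = 1000 (for example, a logistic output unit rescaled to (0, 1000)). For comparison, it also shows a hypothetical network with a reciprocal output activation that fits exactly. All names and numbers are illustrative.

```python
def min_possible_error(x: float, bound: float) -> float:
    """Smallest |1/x - f(x)| attainable by ANY model whose output obeys |f(x)| <= bound."""
    return max(0.0, 1.0 / x - bound)


def reciprocal_output_net(x: float, w: float = 1.0) -> float:
    """Hypothetical net: output activation 1/z applied to z = w*x; w = 1 fits 1/x exactly."""
    return 1.0 / (w * x)


BOUND = 1000.0  # hypothetical bound on the output range
for x in (1e-2, 1e-4, 1e-6, 1e-8):
    print(x, min_possible_error(x, BOUND), abs(1.0 / x - reciprocal_output_net(x)))
```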

And it is important to understand that there are no methods for training NNs
that can magically create information that is not contained in the training
data. 

Feedforward NNs are restricted to finite-dimensional input and output
spaces. Recurrent NNs can in theory process arbitrarily long strings of
numbers or symbols. But training recurrent NNs has posed much more serious
practical difficulties than training feedforward networks. NNs are, at least
today, difficult to apply successfully to problems that concern manipulation
of symbols and rules, but much research is being done. 
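The difference in input handling can be seen in a few lines. A feedforward unit has one weight per input, so the length of its input vector is fixed when the network is built. A recurrent unit reuses the same weights at every time step and carries a state forward, so it accepts sequences of any length. The sketch below assumes a single tanh unit in each case. The names and weight values are hypothetical.

```python
import math
from typing import Sequence


def feedforward_unit(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """A feedforward unit has one weight per input, so len(x) is fixed in advance."""
    if len(x) != len(weights):
        raise ValueError("expected exactly %d inputs, got %d" % (len(weights), len(x)))
    return math.tanh(bias + sum(w * xi for w, xi in zip(weights, x)))


def rnn_final_state(w_in: float, w_rec: float, bias: float,
                    seq: Sequence[float], h0: float = 0.0) -> float:
    """One recurrent unit: h_t = tanh(bias + w_in * x_t + w_rec * h_{t-1}).

    The same three weights are reused at every step, so seq may have any length.
    """
    h = h0
    for x_t in seq:
        h = math.tanh(bias + w_in * x_t + w_rec * h)
    return h


print(feedforward_unit([0.5, -0.3, 0.8], 0.1, [1.0, 2.0, 3.0]))
print(rnn_final_state(0.5, 0.5, 0.0, [1.0] * 5))
print(rnn_final_state(0.5, 0.5, 0.0, [1.0] * 1000))
```

Running the recurrence through many steps is also where the training difficulties mentioned above arise, since gradients must be propagated back through every step.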

There have been attempts to pack recursive structures into
finite-dimensional real vectors (Blair, 1997; Pollack, 1990; Chalmers, 1990;
Chrisman, 1991; Plate, 1994; Hammerton, 1998). Obviously, finite precision
limits how far the recursion can go (Hadley, 1999). The practicality of such
methods is open to debate. 
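Here is a small demonstration of the precision limit. It is not RAAM or any of the cited methods. It is a simplified binary variant of the fractional (Cantor-like) stack encodings used in theoretical work on the computational power of recurrent networks. A stack of bits is stored as one real number x = 0.b1 b2 b3 ... in binary. Pushing a bit shifts the old contents one place to the right, and popping reads the leading digit. With IEEE double precision (53-bit significand), a few dozen bits survive a push/pop round trip, but 200 do not. The function names are hypothetical.

```python
from typing import List, Tuple


def push(x: float, bit: int) -> float:
    """Put `bit` on top of a stack stored as the binary fraction 0.b1 b2 b3 ..."""
    return (bit + x) / 2.0


def pop(x: float) -> Tuple[int, float]:
    """Read the top bit (the first binary digit) and shift the rest up."""
    bit = 1 if x >= 0.5 else 0
    return bit, 2.0 * x - bit


def round_trip(bits: List[int]) -> bool:
    """Push all bits, pop them again, and check that nothing was lost."""
    x = 0.0
    for b in bits:
        x = push(x, b)
    popped = []
    for _ in bits:
        b, x = pop(x)
        popped.append(b)
    return popped == bits[::-1]


def alternating(n: int) -> List[int]:
    return [i % 2 for i in range(n)]


print(round_trip(alternating(30)))   # True: 30 bits fit in a 53-bit significand
print(round_trip(alternating(200)))  # False: the deepest bits were rounded away
```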

As for simulating human consciousness and emotion, that's still in the realm
of science fiction. Consciousness is still one of the world's great
mysteries. Artificial NNs may be useful for modeling some aspects of or
prerequisites for consciousness, such as perception and cognition, but ANNs
provide no insight so far into what Chalmers (1996, p. xi) calls the "hard
problem": 

   Many books and articles on consciousness have appeared in the past
   few years, and one might think we are making progress. But on a
   closer look, most of this work leaves the hardest problems about
   consciousness untouched. Often, such work addresses what might be
   called the "easy problems" of consciousness: How does the brain
   process environmental stimulation? How does it integrate information?
   How do we produce reports on internal states? These are important
   questions, but to answer them is not to solve the hard problem: Why
   is all this processing accompanied by an experienced inner life? 

For more information on consciousness, see the on-line journal Psyche at 
http://psyche.cs.monash.edu.au/index.html. 

For examples of specific applications of NNs, see "What are some applications
of NNs?" 

References: 

   Blair, A.D. (1997), "Scaling Up RAAMs," Brandeis University Computer
   Science Technical Report CS-97-192, 
   http://www.demo.cs.brandeis.edu/papers/long.html#sur97 

   Blum, A., and Rivest, R.L. (1989), "Training a 3-node neural network is
   NP-complete," in Touretzky, D.S. (ed.), Advances in Neural Information
   Processing Systems 1, San Mateo, CA: Morgan Kaufmann, 494-501. 

   Chalmers, D.J. (1990), "Syntactic Transformations on Distributed
   Representations," Connection Science, 2, 53-62, 
   http://ling.ucsc.edu/~chalmers/papers/transformations.ps 

   Chalmers, D.J. (1996), The Conscious Mind: In Search of a Fundamental
   Theory, NY: Oxford University Press. 

   Chrisman, L. (1991), "Learning Recursive Distributed Representations for
   Holistic Computation," Connection Science, 3, 345-366, 
   ftp://reports.adm.cs.cmu.edu/usr/anon/1991/ 

   Collier, R. (1994), "An historical overview of natural language
   processing systems that learn," Artificial Intelligence Review, 8(1),
   ??-??. 

   Devroye, L., Györfi, L., and Lugosi, G. (1996), A Probabilistic Theory of
   Pattern Recognition, NY: Springer. 

   Faragó, A. and Lugosi, G. (1993), "Strong Universal Consistency of Neural
   Network Classifiers," IEEE Transactions on Information Theory, 39,
   1146-1151. 

   Hadley, R.F. (1999), "Cognition and the computational power of
   connectionist networks," http://www.cs.sfu.ca/~hadley/online.html 

   Hammerton, J.A. (1998), "Holistic Computation: Reconstructing a muddled
   concept," Connection Science, 10, 3-19, 
   http://www.tardis.ed.ac.uk/~james/CNLP/holcomp.ps.gz 

   Judd, J.S. (1990), Neural Network Design and the Complexity of
   Learning, Cambridge, MA: The MIT Press. 

   Lugosi, G., and Zeger, K. (1995), "Nonparametric Estimation via Empirical
   Risk Minimization," IEEE Transactions on Information Theory, 41, 677-678.

   Orponen, P. (2000), "An overview of the computational power of recurrent
   neural networks," Finnish AI Conference, Helsinki, 
   http://www.math.jyu.fi/~orponen/papers/rnncomp.ps 

   Plate, T.A. (1994), Distributed Representations and Nested
   Compositional Structure, Ph.D. Thesis, University of Toronto, 
   ftp://ftp.cs.utoronto.ca/pub/tap/ 

   Pollack, J. B. (1990), "Recursive Distributed Representations,"
   Artificial Intelligence, 46(1), 77-105, 
   http://www.demo.cs.brandeis.edu/papers/long.html#raam 

   Siegelmann, H.T. (1998), Neural Networks and Analog Computation:
   Beyond the Turing Limit, Boston: Birkhäuser, ISBN 0-8176-3949-7, 
   http://iew3.technion.ac.il:8080/Home/Users/iehava/book/ 

   Siegelmann, H.T., and Sontag, E.D. (1999), "Turing Computability with
   Neural Networks," Applied Mathematics Letters, 4, 77-80. 

   Sima, J., and Orponen, P. (2001), "Computing with continuous-time
   Liapunov systems," 33rd ACM STOC, 
   http://www.math.jyu.fi/~orponen/papers/liapcomp.ps 

   Valiant, L. (1988), "Functionality in Neural Nets," Learning and
   Knowledge Acquisition, Proc. AAAI, 629-634. 

   White, H. (1990), "Connectionist Nonparametric Regression: Multilayer
   Feedforward Networks Can Learn Arbitrary Mappings," Neural Networks, 3,
   535-550. Reprinted in White (1992b). 

   White, H. (1992a), "Nonparametric Estimation of Conditional Quantiles
   Using Neural Networks," in Page, C. and Le Page, R. (eds.), Proceedings
   of the 23rd Symposium on the Interface: Computing Science and Statistics,
   Alexandria, VA: American Statistical Association, pp. 190-199. Reprinted
   in White (1992b). 

   White, H. (1992b), Artificial Neural Networks: Approximation and
   Learning Theory, Blackwell. 

   White, H., and Gallant, A.R. (1992), "On Learning the Derivatives of an
   Unknown Mapping with Multilayer Feedforward Networks," Neural Networks,
   5, 129-138. Reprinted in White (1992b).

Send corrections/additions to the FAQ Maintainer:
saswss@unx.sas.com (Warren Sarle)

Last Update March 27 2014 @ 02:11 PM
