comp.ai.neural-nets FAQ, Part 1 of 7: Introduction

In principle, NNs can compute any computable function, i.e., they can do everything a normal digital computer can do (Valiant, 1988; Siegelmann and Sontag, 1999; Orponen, 2000; Sima and Orponen, 2001), or perhaps even more, under some assumptions of doubtful practicality (see Siegelmann, 1998, but also Hadley, 1999).

Practical applications of NNs most often employ supervised learning. For supervised learning, you must provide training data that includes both the input and the desired result (the target value). After successful training, you can present input data alone to the NN (that is, input data without the desired result), and the NN will compute an output value that approximates the desired result. However, for training to be successful, you may need lots of training data and lots of computer time to do the training. In many applications, such as image and text processing, you will have to do a lot of work to select appropriate input data and to code the data as numeric values.

In practice, NNs are especially useful for classification and function approximation/mapping problems which are tolerant of some imprecision, which have lots of training data available, but to which hard and fast rules (such as those that might be used in an expert system) cannot easily be applied. Almost any finite-dimensional vector function on a compact set can be approximated to arbitrary precision by feedforward NNs (which are the type most often used in practical applications) if you have enough data and enough computing resources.
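As a concrete illustration of supervised learning on a compact set, here is a minimal sketch in Python (an illustration, not a recommended implementation). It trains a feedforward net with one hidden layer of tanh units by plain batch gradient descent on least-squares error, then presents an input alone to get an output. The target function sin(pi*x), the interval [-1, 1], the network size, the learning rate, and all function names (init_params, forward, mse, train) are assumptions chosen only for this example.

```python
import math
import random


def init_params(n_hidden, seed=0):
    """Random starting weights for a 1-input, 1-output net with one hidden layer."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # input-to-hidden weights
    b1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # hidden biases
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]  # hidden-to-output weights
    b2 = 0.0                                                # output bias
    return w1, b1, w2, b2


def forward(params, x):
    """Return (hidden activations, network output) for a single input x."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return hidden, b2 + sum(v * h for v, h in zip(w2, hidden))


def mse(params, xs, ys):
    """Mean squared error of the net over the data set."""
    return sum((forward(params, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(params, xs, ys, epochs=1000, lr=0.05):
    """Batch gradient descent on half the mean squared error; returns new params."""
    w1, b1, w2, b2 = list(params[0]), list(params[1]), list(params[2]), params[3]
    n, n_hidden = len(xs), len(w1)
    for _ in range(epochs):
        gw1, gb1, gw2, gb2 = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, y in zip(xs, ys):
            hidden, out = forward((w1, b1, w2, b2), x)
            err = out - y                           # derivative of 0.5*err**2 w.r.t. out
            gb2 += err
            for j, h in enumerate(hidden):
                gw2[j] += err * h
                back = err * w2[j] * (1.0 - h * h)  # backpropagate through tanh
                gw1[j] += back * x
                gb1[j] += back
        for j in range(n_hidden):
            w1[j] -= lr * gw1[j] / n
            b1[j] -= lr * gb1[j] / n
            w2[j] -= lr * gw2[j] / n
        b2 -= lr * gb2 / n
    return w1, b1, w2, b2


# Training data: inputs on the compact interval [-1, 1] with their target values.
xs = [i / 20.0 for i in range(-20, 21)]
ys = [math.sin(math.pi * x) for x in xs]

start = init_params(n_hidden=6, seed=0)
trained = train(start, xs, ys)
print("MSE before training:", mse(start, xs, ys))
print("MSE after training: ", mse(trained, xs, ys))

# After training, present an input alone (no target) and read off the output.
x_new = 0.33
print("net output:", forward(trained, x_new)[1], "target:", math.sin(math.pi * x_new))
```

Gradient descent from a single random start is used here only for brevity. It is a local method, so how closely the output matches the target depends on the starting weights, the number of hidden units, and the amount of training.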
To be somewhat more precise, feedforward networks with a single hidden layer and trained by least squares are statistically consistent estimators of arbitrary square-integrable regression functions under certain practically-satisfiable assumptions regarding sampling, target noise, number of hidden units, size of weights, and form of hidden-unit activation function (White, 1990). Such networks can also be trained as statistically consistent estimators of derivatives of regression functions (White and Gallant, 1992) and quantiles of the conditional noise distribution (White, 1992a). Feedforward networks with a single hidden layer using threshold or sigmoid activation functions are universally consistent estimators of binary classifications (Faragó and Lugosi, 1993; Lugosi and Zeger, 1995; Devroye, Györfi, and Lugosi, 1996) under similar assumptions. Note that these results are stronger than the universal approximation theorems, which merely show that weights giving arbitrarily accurate approximations exist, without demonstrating that such weights can be obtained by learning.

Unfortunately, the above consistency results depend on one impractical assumption: that the networks are trained by an error (L_p error or misclassification rate) minimization technique that comes arbitrarily close to the global minimum. Such minimization is computationally intractable except in small or simple problems (Blum and Rivest, 1989; Judd, 1990). In practice, however, you can usually get good results without doing a full-blown global optimization; e.g., using multiple (say, 10 to 1000) random weight initializations is usually sufficient.

One example of a function that a typical neural net cannot learn is Y=1/X on the open interval (0,1). An open interval is not a compact set. With any bounded output activation function, the error will get arbitrarily large as the input approaches zero.
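The 1/X example can be checked with a few lines of arithmetic. The sketch below (Python, with a hypothetical helper name min_error) assumes only that the output activation is bounded, so that |output| <= bound; for the logistic or tanh function the bound is 1. Whatever the weights are, the error at X is then at least 1/X - bound, which grows without limit as X approaches zero.

```python
def min_error(x, bound):
    """Smallest possible |1/x - out| over all outputs with |out| <= bound, for x > 0."""
    return max(0.0, 1.0 / x - bound)


# Powers of two keep the arithmetic exact in floating point.
for k in (1, 4, 10, 20):
    x = 2.0 ** -k
    print(f"X = 2^-{k:<2}  1/X = {1.0 / x:>9.0f}  "
          f"error >= {min_error(x, 1.0):>9.0f} (bound 1), "
          f">= {min_error(x, 1000.0):>9.0f} (bound 1000)")
```

Raising the bound (for example, by rescaling the output) only moves the point at which the error starts to grow; it does not remove the problem on the open interval.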
Of course, you could make the output activation function a reciprocal function and easily get a perfect fit, but neural networks are most often used in situations where you do not have enough prior knowledge to set the activation function in such a clever way. There are also many other important problems that are so difficult that a neural network will be unable to learn them without memorizing the entire training set, such as:

   o Predicting random or pseudo-random numbers.
   o Factoring large integers.
   o Determining whether a large integer is prime or composite.
   o Decrypting anything encrypted by a good algorithm.

And it is important to understand that there are no methods for training NNs that can magically create information that is not contained in the training data.

Feedforward NNs are restricted to finite-dimensional input and output spaces. Recurrent NNs can in theory process arbitrarily long strings of numbers or symbols. But training recurrent NNs has posed much more serious practical difficulties than training feedforward networks. NNs are, at least today, difficult to apply successfully to problems that concern manipulation of symbols and rules, but much research is being done.

There have been attempts to pack recursive structures into finite-dimensional real vectors (Blair, 1997; Pollack, 1990; Chalmers, 1990; Chrisman, 1991; Plate, 1994; Hammerton, 1998). Obviously, finite precision limits how far the recursion can go (Hadley, 1999). The practicality of such methods is open to debate.

As for simulating human consciousness and emotion, that's still in the realm of science fiction. Consciousness is still one of the world's great mysteries. Artificial NNs may be useful for modeling some aspects of or prerequisites for consciousness, such as perception and cognition, but ANNs provide no insight so far into what Chalmers (1996, p.
xi) calls the "hard problem":

   Many books and articles on consciousness have appeared in the past few years, and one might think we are making progress. But on a closer look, most of this work leaves the hardest problems about consciousness untouched. Often, such work addresses what might be called the "easy problems" of consciousness: How does the brain process environmental stimulation? How does it integrate information? How do we produce reports on internal states? These are important questions, but to answer them is not to solve the hard problem: Why is all this processing accompanied by an experienced inner life?

For more information on consciousness, see the on-line journal Psyche at http://psyche.cs.monash.edu.au/index.html.

For examples of specific applications of NNs, see "What are some applications of NNs?"

References:

   Blair, A.D. (1997), "Scaling Up RAAMs," Brandeis University Computer Science Technical Report CS-97-192, http://www.demo.cs.brandeis.edu/papers/long.html#sur97

   Blum, A., and Rivest, R.L. (1989), "Training a 3-node neural network is NP-complete," in Touretzky, D.S. (ed.), Advances in Neural Information Processing Systems 1, San Mateo, CA: Morgan Kaufmann, 494-501.

   Chalmers, D.J. (1990), "Syntactic Transformations on Distributed Representations," Connection Science, 2, 53-62, http://ling.ucsc.edu/~chalmers/papers/transformations.ps

   Chalmers, D.J. (1996), The Conscious Mind: In Search of a Fundamental Theory, NY: Oxford University Press.

   Chrisman, L. (1991), "Learning Recursive Distributed Representations for Holistic Computation," Connection Science, 3, 345-366, ftp://reports.adm.cs.cmu.edu/usr/anon/1991/

   Collier, R. (1994), "An historical overview of natural language processing systems that learn," Artificial Intelligence Review, 8(1), ??-??.

   Devroye, L., Györfi, L., and Lugosi, G. (1996), A Probabilistic Theory of Pattern Recognition, NY: Springer.

   Faragó, A. and Lugosi, G. (1993), "Strong Universal Consistency of Neural Network Classifiers," IEEE Transactions on Information Theory, 39, 1146-1151.

   Hadley, R.F. (1999), "Cognition and the computational power of connectionist networks," http://www.cs.sfu.ca/~hadley/online.html

   Hammerton, J.A. (1998), "Holistic Computation: Reconstructing a muddled concept," Connection Science, 10, 3-19, http://www.tardis.ed.ac.uk/~james/CNLP/holcomp.ps.gz

   Judd, J.S. (1990), Neural Network Design and the Complexity of Learning, Cambridge, MA: The MIT Press.

   Lugosi, G., and Zeger, K. (1995), "Nonparametric Estimation via Empirical Risk Minimization," IEEE Transactions on Information Theory, 41, 677-678.

   Orponen, P. (2000), "An overview of the computational power of recurrent neural networks," Finnish AI Conference, Helsinki, http://www.math.jyu.fi/~orponen/papers/rnncomp.ps

   Plate, T.A. (1994), Distributed Representations and Nested Compositional Structure, Ph.D. Thesis, University of Toronto, ftp://ftp.cs.utoronto.ca/pub/tap/

   Pollack, J.B. (1990), "Recursive Distributed Representations," Artificial Intelligence, 46, 1, 77-105, http://www.demo.cs.brandeis.edu/papers/long.html#raam

   Siegelmann, H.T. (1998), Neural Networks and Analog Computation: Beyond the Turing Limit, Boston: Birkhäuser, ISBN 0-8176-3949-7, http://iew3.technion.ac.il:8080/Home/Users/iehava/book/

   Siegelmann, H.T., and Sontag, E.D. (1999), "Turing Computability with Neural Networks," Applied Mathematics Letters, 4, 77-80.

   Sima, J., and Orponen, P. (2001), "Computing with continuous-time Liapunov systems," 33rd ACM STOC, http://www.math.jyu.fi/~orponen/papers/liapcomp.ps

   Valiant, L. (1988), "Functionality in Neural Nets," Learning and Knowledge Acquisition, Proc. AAAI, 629-634.

   White, H. (1990), "Connectionist Nonparametric Regression: Multilayer Feedforward Networks Can Learn Arbitrary Mappings," Neural Networks, 3, 535-550. Reprinted in White (1992b).

   White, H. (1992a), "Nonparametric Estimation of Conditional Quantiles Using Neural Networks," in Page, C. and Le Page, R. (eds.), Proceedings of the 23rd Symposium on the Interface: Computing Science and Statistics, Alexandria, VA: American Statistical Association, pp. 190-199. Reprinted in White (1992b).

   White, H. (1992b), Artificial Neural Networks: Approximation and Learning Theory, Blackwell.

   White, H., and Gallant, A.R. (1992), "On Learning the Derivatives of an Unknown Mapping with Multilayer Feedforward Networks," Neural Networks, 5, 129-138. Reprinted in White (1992b).

Send corrections/additions to the FAQ Maintainer: saswss@unx.sas.com (Warren Sarle)
Last Update March 27 2014 @ 02:11 PM