## Help with Neural Networks

A place to discuss the science of computers and programs, from algorithms to computability.

Formal proofs preferred.

Moderators: phlip, Moderators General, Prelates

### Help with Neural Networks

ohaiplzhelpzkthx

Could anyone please give me a hand with understanding Neural Networks? I'm supposed to code some stuff that uses them for my final exams, but so far I haven't been able to understand them clearly. I know what they are (or at least I think I do), I know what they're said to do (pattern recognition), and I can write them down as code. I've even seen some examples that seem to work, I just can't understand HOW the hell they work. I mean, they're freaky - you start changing the weight values and suddenly your network is able to tell apart a '4' and a '5', what the f*ck?

If anyone has a basic, simple explanation of these that helps me understand the concept, please? Anything will do, a book, a webpage, a quote...
(and I'd be very thankful if the code could be in Java or plain english).
[-]>+++++++++[<+++++++++++>-]<.+++++.---..>+++[<+++++++>-]<.[-]>+++++++++[<+++++++++++>-]<-.>++++[<+++++>-]<-.---.-----------.--.+++++++++++++.[-]>++++++[<++++++++++>-]<+++.
dam 255 char limit

gabrielkfl

Posts: 7
Joined: Sat Oct 10, 2009 7:02 pm UTC

### Re: Help with Neural Networks

http://lmgtfy.com/?q=neural+networks

skelterjohn

Posts: 32
Joined: Mon Oct 01, 2007 4:17 am UTC

### Re: Help with Neural Networks

basically, the weights of a neuron form a line (or hyperplane) through your input space, and the activation function tells you which side of the line you're on (and how far from the line, if you're using a sigmoid function).

if you have a bunch of neurons organized in layers, the first layer chops up your input space into sections and "transforms" your input into a new space, then each layer transforms that space into another one, etc.
ONE PART CLASS, ONE PART WHISKEY, TWO PARTS GUN! SERVE NEAT!

necroforest

Posts: 194
Joined: Tue Apr 10, 2007 3:46 pm UTC

### Re: Help with Neural Networks

necroforest wrote:basically, the weights of a neuron form a line (or hyperplane) through your input space, and the activation function tells you which side of the line you're on (and how far from the line, if you're using a sigmoid function).

if you have a bunch of neurons organized in layers, the first layer chops up your input space into sections and "transforms" your input into a new space, then each layer transforms that space into another one, etc.

This explains the part "how do feed-forward neural networks interpret patterns". To put it just slightly more precisely:
For a start, you must realise that nodes in connectionist networks (that name is somewhat more accurate than "neural networks", but it means essentially the same thing) basically only say "yes" or "no". Like "the pixel is on" or "the pixel is off". Often a node is allowed to say how much "yes" or "no", as with a sigmoid activation function, but it's still basically just "yes" or "no". So when "the activation function tells you which side of the line you're on", as necroforest put it, the node in question is really telling you whether the input is on the "positive side" of the (hyper)plane, yes or no.

Suppose you have a set of input nodes that all link to one output node, with no layers in between. We call such a network a perceptron. A perceptron can tell (through the output value of the output node) how far a given input vector is from the linear separator, just as necroforest said. A different way to say this is that perceptrons can (always and only) represent functions that are linearly separable.
Functions like "left or right", "majority", "AND" and "inclusive OR" are linearly separable. "XOR" is not linearly separable, so it cannot be represented by a perceptron.
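In code (with hand-picked weights, purely for illustration - nothing here is learned), a perceptron and the linearly separable gates look like this:

```java
// A single perceptron: weighted sum of the inputs plus a bias,
// passed through a step activation ("yes" or "no").
// The weights below are chosen by hand to illustrate linear
// separability; they are not learned.
public class Perceptron {
    static int fire(double[] w, double bias, int[] in) {
        double sum = bias;
        for (int i = 0; i < in.length; i++)
            sum += w[i] * in[i];
        return sum >= 0 ? 1 : 0;   // step activation
    }

    public static void main(String[] args) {
        double[] w = {1, 1};
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++) {
                int[] in = {a, b};
                // AND: bias -1.5 puts only (1,1) on the positive side.
                // OR:  bias -0.5 puts everything except (0,0) there.
                System.out.printf("AND(%d,%d)=%d  OR(%d,%d)=%d%n",
                        a, b, fire(w, -1.5, in),
                        a, b, fire(w, -0.5, in));
            }
        // No single weight vector and bias can reproduce XOR:
        // no line separates {(0,1),(1,0)} from {(0,0),(1,1)}.
    }
}
```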

Now, take a feed-forward network with any number of layers and any number of nodes in each of the layers. You could consider each node in the network that is not in the input layer, together with all nodes from the previous layer (= all nodes that link to it), as a perceptron. So for example if we have 6 nodes in the input layer and 3 nodes in the next layer, then we have 3 perceptrons that each process information from the 6 input nodes. Each perceptron can represent a different linearly separable function (and that's most probably the case, because otherwise you could do with fewer perceptrons). So we are able to cut up our input space into 8 sections (why?) and determine in which one our input vector resides.
We repeat this pattern with our 3 intermediate nodes and the layer after that. The output nodes of our old perceptrons have now become variables in a new input space. So the new perceptrons are functions over a space of partition locations in the original input space. This is rather abstract, but fortunately you don't really need to grasp it, because some smart people figured out the consequence quite long ago, also for networks with even more layers:
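To make the layering concrete: XOR, which no single perceptron can represent, falls out of two layers of them. The weights are again hand-picked for illustration, not learned:

```java
// XOR built from three perceptrons: a hidden layer computing OR and
// NAND, and an output node computing AND of those two. The first
// layer transforms the input space so that the second layer's
// problem becomes linearly separable.
public class XorNet {
    static int fire(double[] w, double bias, int[] in) {
        double sum = bias;
        for (int i = 0; i < in.length; i++)
            sum += w[i] * in[i];
        return sum >= 0 ? 1 : 0;   // step activation
    }

    static int xor(int a, int b) {
        int[] in = {a, b};
        int or   = fire(new double[]{1, 1},   -0.5, in);  // a OR b
        int nand = fire(new double[]{-1, -1},  1.5, in);  // NOT (a AND b)
        return fire(new double[]{1, 1}, -1.5, new int[]{or, nand}); // AND
    }

    public static void main(String[] args) {
        for (int a = 0; a <= 1; a++)
            for (int b = 0; b <= 1; b++)
                System.out.printf("XOR(%d,%d) = %d%n", a, b, xor(a, b));
    }
}
```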

While perceptrons can only represent linearly separable functions, feed-forward networks with a hidden layer can represent any continuous function, and networks with two hidden layers can even represent any discontinuous function.
(EDIT: the part about networks with two hidden layers and discontinuous functions turned out to be incorrect. See my later post below for details.)

However, maybe this wasn't exactly the question the OP meant to ask.
gabrielkfl wrote:I've even seen some examples that seem to work, I just can't understand HOW the hell they work. I mean, they're freaky - you start changing the weight values and suddently your network is able to tell apart a '4' and a '5', what the f*ck?

I think this boils down to the question "how is it possible that we can train a network to represent some function when starting out with a completely random weight pattern?". The answer is this: when we try an input on a network, we can see how far the output is from what it should have been. We adjust the weights slightly in a direction that would yield a better output. When we repeat this very often with many different examples, we will eventually have nudged the network enough to more or less do what we want.

Again, the idea of a perceptron can help to understand this procedure. The weight-adjusting procedure of back-propagation really works on a per-perceptron basis, from the output side of the network towards the input side. In a single perceptron, it is easy to tell how the weights can be adjusted to better reflect the function that you target. From the old weights in your perceptron, you can see how "wrong" the outputs from the perceptrons of the previous layer were, and you adjust those proportionally to their amount of "wrongness".
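Here is the "nudge the weights slightly" idea on a single perceptron - the classic perceptron learning rule, which is the per-perceptron core that back-propagation extends through hidden layers. The learning rate and the random seed are arbitrary choices for illustration:

```java
import java.util.Random;

// Training a perceptron to represent OR, starting from random weights.
// Each time the output is wrong, every weight is nudged slightly in
// the direction that would have made the output better.
public class DeltaRule {
    static final int[][] INPUTS  = {{0,0},{0,1},{1,0},{1,1}};
    static final int[]   TARGETS = {0, 1, 1, 1};     // the OR function

    // p = {w0, w1, bias}
    static int predict(double[] p, int a, int b) {
        return (p[2] + p[0] * a + p[1] * b) >= 0 ? 1 : 0;
    }

    static double[] train() {
        Random rng = new Random(42);                 // arbitrary seed
        double[] p = {rng.nextDouble(), rng.nextDouble(), rng.nextDouble()};
        double rate = 0.1;                           // small nudges
        for (int epoch = 0; epoch < 100; epoch++)
            for (int i = 0; i < INPUTS.length; i++) {
                int error = TARGETS[i] - predict(p, INPUTS[i][0], INPUTS[i][1]);
                // Nudge each weight proportionally to its input and the error.
                p[0] += rate * error * INPUTS[i][0];
                p[1] += rate * error * INPUTS[i][1];
                p[2] += rate * error;                // the bias gets nudged too
            }
        return p;
    }

    public static void main(String[] args) {
        double[] p = train();
        for (int[] in : INPUTS)
            System.out.printf("OR(%d,%d) -> %d%n",
                    in[0], in[1], predict(p, in[0], in[1]));
    }
}
```

Because OR is linearly separable, this loop is guaranteed to converge; for the multi-layer case the same shape of update is applied layer by layer, with the "error" of a hidden node computed from the errors of the nodes it feeds into.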

You should look again at the code that you already know, and now see it in the light of perceptrons. I think it should help!
Last edited by Jplus on Tue Feb 21, 2012 12:09 am UTC, edited 1 time in total.
Feel free to call me Julian. J+ is just an abbreviation.
coding and xkcd combined

Jplus

Posts: 1301
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Help with Neural Networks

skelterjohn wrote:http://lmgtfy.com/?q=neural+networks

You thought I didn't try that already? I even got a couple books, but no good.

Jplus wrote:I think this boils down to the question "how is it possible that we can train a network to represent some function when starting out with a completely random weight pattern?". The answer is this: when we try an input on a network, we can see how far the output is from what it should have been. We adjust the weights slightly in a direction that would yield a better output. When we repeat this very often with many different examples, we will eventually have nudged the network enough to more or less do what we want.

Yes, that was my question. I've seen definitions similar to this before, the point was just "why it does what it does". How is it possible for a neural network to identify a pattern based on numeric (weight) values? What is that "ideal number of neurons" and how do I figure it out? Why does each neuron have a different weight if they all get the same inputs? Things like that. I don't need the code, I want to understand the logic that makes the whole thing work.

But that was kind of helpful either way. I think I'm starting to see it a bit more clearly now. Thanks

gabrielkfl

Posts: 7
Joined: Sat Oct 10, 2009 7:02 pm UTC

### Re: Help with Neural Networks

gabrielkfl wrote:How is it possible for a neural network to identify a pattern based on numeric (weight) values?

If the weights are mostly zero, with some large positive and negative values thrown in, they effectively will encode a boolean logic circuit. Making them fractional allows learning by gradually adjusting them and tuning to the most typical input while responding in mostly the same way to small deviations.
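That point can be sketched like so - a sigmoid node with large weights (made-up numbers, just for illustration) acts like an AND gate, yet a slightly noisy "true" input still gets nearly the same answer:

```java
// A sigmoid node with large weights behaves like a boolean AND gate,
// but degrades gracefully: small deviations from the typical input
// barely change the output.
public class FuzzyGate {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    static double and(double a, double b) {
        // Large weights + negative bias ~ a boolean AND gate.
        return sigmoid(10 * a + 10 * b - 15);
    }

    public static void main(String[] args) {
        System.out.println(and(1.0, 1.0));   // close to 1
        System.out.println(and(0.95, 0.9));  // a noisy "true" is still close to 1
        System.out.println(and(1.0, 0.0));   // close to 0
    }
}
```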
samk

Posts: 54
Joined: Mon Feb 09, 2009 12:33 pm UTC

### Re: Help with Neural Networks

samk wrote:
gabrielkfl wrote:How is it possible for a neural network to identify a pattern based on numeric (weight) values?

If the weights are mostly zero, with some large positive and negative values thrown in, they effectively will encode a boolean logic circuit. Making them fractional allows learning by gradually adjusting them and tuning to the most typical input while responding in mostly the same way to small deviations.

I got that already. I know how they work. I just don't know how come they work.

gabrielkfl

Posts: 7
Joined: Sat Oct 10, 2009 7:02 pm UTC

### Re: Help with Neural Networks

They work because of the training they go through. Because each node's weight/firing power is adjusted over a training set, the neural net gets "tuned" to a state where the correct neurons are triggered by the correct amount of weight.

Imagine you see an animal. One way to envision your thought process of identifying said animal is to go through a list of its attributes. You might see it flying, which will immediately trigger "bird" in your mind. However, if it isn't flying, then you start going through what you can tell about it: "does it have wings?" yes, "is it brown?" yes, "does it have a beak?" yes - it must be a bird. Now, each of the attributes on its own might not be enough to trigger a positive bird ID for you (except maybe the flying one). It is only after assigning appropriate weight to each property that you identify it.

For the training, it is much akin to changing each attribute and having the NN see the differences. It might see "it is brown", "it has a beak", "it isn't flying", "it doesn't have wings" and guess 0.78 that it is a bird. Your response will be to slap the system and say "No, it's a platypus", to which it places a higher value on the wing node and flying node, and lower values on the beak and brown nodes (based on how wrong it was: it should change the values based on how far off the final conclusion was, but not all the way).

That is, essentially, how a single-layered NN works. Multilayered NNs are much the same, they just abstract the weights a little further.
cogman

Posts: 114
Joined: Sun May 18, 2008 2:17 pm UTC

### Re: Help with Neural Networks

cogman wrote:For the training, it is much akin to changing each attribute and having the NN see the differences. It might see "it is brown", "it has a beak", "it isn't flying", "it doesn't have wings" and guess 0.78 that it is a bird. Your response will be to slap the system and say "No, it's a platypus", to which it places a higher value on the wing node and flying node, and lower values on the beak and brown nodes (based on how wrong it was: it should change the values based on how far off the final conclusion was, but not all the way).

Oh! I got it now.

Thank you thank you thank you thank you thanks...

(though I still don't know how to figure that "ideal number of neurons" thing, or the reason for multi-layered NNs)

gabrielkfl

Posts: 7
Joined: Sat Oct 10, 2009 7:02 pm UTC

### Re: Help with Neural Networks

My compliments to cogman for the elegant explanation.

As cogman said, the reason for multiple layers is that you can abstract more. With a single-layer network you can typically only answer yes/no questions (yes, it's a bird; no, it's not a platypus; yes, it might also be a bat -- *SLAP*).
If you add a hidden layer, you can calculate continuous functions like z = x^2 + y^2 (and many, many other things). If you add another hidden layer, you can calculate pretty much anything. The only problem is that the training gets harder for every layer that you add.

As far as I know there is no theory for figuring out the ideal number of nodes in each hidden layer for a given task. Usually you just guess, then try whether your network learns faster and better if you use more nodes, and if it doesn't, see how many nodes you can leave out without losing training speed and accuracy. The reason for the latter is that it reduces the risk of overfitting.
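That guess-and-check recipe is really just a loop. Here trainAndScore is a stand-in for "train a net with this many hidden nodes and measure validation accuracy" - the curve inside it is entirely made up, purely to show the selection logic (accuracy rises with more nodes, then overfitting pulls it back down):

```java
// Pick a hidden-layer size by trial: double the node count while the
// (hypothetical) validation score keeps improving, stop when it doesn't.
public class HiddenSizeSearch {
    static double trainAndScore(int hiddenNodes) {
        // Made-up accuracy curve: rises towards 1, then an
        // overfitting penalty kicks in past ~12 nodes.
        double fit = 1.0 - Math.exp(-hiddenNodes / 4.0);
        double penalty = 0.04 * Math.max(0, hiddenNodes - 12);
        return fit - penalty;
    }

    static int search() {
        int best = 1;
        double bestScore = trainAndScore(best);
        for (int n = 2; n <= 32; n *= 2) {
            double score = trainAndScore(n);
            System.out.printf("hidden=%d  score=%.3f%n", n, score);
            if (score > bestScore + 1e-3) { bestScore = score; best = n; }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("chosen hidden size: " + search());
    }
}
```

In real use you would of course re-run the training at each size and measure accuracy on held-out data instead of calling a fake curve.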

Jplus

Posts: 1301
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified

### Re: Help with Neural Networks

Jplus wrote:As cogman said, the reason for multiple layers is that you can abstract more. With a single-layer network you can typically only answer yes/no questions (yes, it's a bird; no, it's not a platypus; yes, it might also be a bat -- *SLAP*).
If you add a hidden layer, you can calculate continuous functions like z = x^2 + y^2 (and many, many other things). If you add another hidden layer, you can calculate pretty much anything. The only problem is that the training gets harder for every layer that you add.

Uhum... got it.

Jplus wrote:As far as I know there is no theory for figuring out the ideal number of nodes in each hidden layer for a given task. Usually you just guess, then try whether your network learns faster and better if you use more nodes, and if it doesn't, see how many nodes you can leave out without losing training speed and accuracy. The reason for the latter is that it reduces the risk of overfitting.

Ah, so I just keep trying until I get an optimal result?
Hehe, strange.

Alright, I think that's it.
Thanks everyone, you really helped me out here.

gabrielkfl

Posts: 7
Joined: Sat Oct 10, 2009 7:02 pm UTC

### Re: Help with Neural Networks

I have run into some algorithms for determining good MLP configurations for a given training set without just doing a huge search ...

The only one which comes to mind is "Neural network design using Voronoi diagrams". Pretty old, so if you follow citations you can probably find other algorithms as well.
Pinky's Brain

Posts: 177
Joined: Mon Apr 28, 2008 11:46 pm UTC

### Re: Help with Neural Networks

Jplus wrote:While perceptrons can only represent linearly separable functions, feed-forward networks with a hidden layer can represent any continuous function, and networks with two hidden layers can even represent any discontinuous function.

Somebody sent me a PM asking where I got this, particularly the claim that a network with two hidden layers can compute any discontinuous function.

The short answer is that I got it from Russell & Norvig, Artificial Intelligence: A Modern Approach, Second Edition (2003), Ch. 20.5, subsection "Multilayer feed-forward neural networks".

The slightly longer answer is that the second paragraph of that subsection contains the claim, and it has a footnote that reads as follows:
The proof is complex, but the main point is that the required number of hidden units grows exponentially with the number of inputs. For example, 2^n/n hidden units are needed to encode all Boolean functions of n inputs.

Initially no reference is given to the work where this was actually proven, but fortunately I found it in the "Bibliographical and Historical Notes" section at the end of the chapter. It was Cybenko (1988, 1989) who proved that two hidden layers are enough for any function and that one layer is enough for any continuous function. Full references:

Cybenko, G. (1988). Continuous valued neural networks with two hidden layers are sufficient. Technical Report, Department of Computer Science, Tufts University, Medford, Massachusetts.

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303-314.

EDIT: the person who PMed me contacted Cybenko and found that he's actually been misquoted by Russell and Norvig in the second edition. He proved that neural networks with one hidden layer can represent any continuous function, but not that an additional layer would enable them to represent any discontinuous function. He did derive some other interesting results in the 1989 paper, including approximations of discontinuous decision functions in networks with one hidden layer.

Jplus

Posts: 1301
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: classified