## Am I back propagating correctly?


Spambot5546
Posts: 1466
Joined: Thu Apr 29, 2010 7:34 pm UTC

### Am I back propagating correctly?

I decided to finally try to make a back propagation neural net. I did some research online and found a couple of helpful guides. Using those two as a basis, I coded something up that would hopefully be able to do an XOR operation via neural net.

I've had little success, however. Near as I can tell the neural processing is working correctly, but the back propagation isn't. It seems that it always wants the answer to be .5, rather than trying to actually xor the two binary digits. I know it's asking a lot to hope that someone would be willing to go through my neural processor and find what I've fucked up, but I've been beating my head against this particular wall all weekend, and I've reached my wits' end.

```java
package trainer;

import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.util.Random;

public class Trainer {

    public static float fOfX(float x) {
        return (float)(1/(1+Math.exp(x*-1)));
    }

    public static float fPrimeOfX(float x) {
        return fOfX(x) * (1-fOfX(x));
    }

    public static float[][] procNode(float[] in, float[][][] net) {
        float sum;
        int i;
        net[0] = new float[in.length][];
        float[][] out = new float[net.length][];
        for (i = 1; i < net.length; i++) {
            out[i] = new float[net[i].length];
        }
        out[0] = new float[in.length];

        // assign content to input layer
        for (i = 0; i < in.length; i++)
            out[0][i] = in[i]; // output_from_neuron(i,j) Jth neuron in Ith Layer

        // assign output(activation) value
        // to each neuron using sigmoid func
        for (i = 1; i < net.length; i++) {
            // For each layer
            for (int j = 0; j < net[i].length; j++) {
                // For each neuron in current layer
                sum = 0.0f;
                for (int k = 0; k < net[i-1].length; k++) {
                    // For input from each neuron in preceding layer
                    sum += out[i-1][k]*net[i][j][k]; // Apply weight to inputs and add to sum
                }
                //sum+=net[i][j][net[i-1].length]; // Apply bias
                out[i][j] = fOfX(sum); // Apply sigmoid function
            }
        }

        return out;
    }

    public static float[][][] backProp(float[][] actual, float[] tgt, float[][][] net, float[] input) {
        float sum;
        int i;
        float[][] delta = new float[net.length][];
        float[][] out = new float[net.length][];
        float[][][] newNet = new float[net.length][][];
        for (i = 0; i < net.length; i++) {
            delta[i] = new float[net[i].length];
            out[i] = new float[net[i].length];
            if (i > 0) {
                newNet[i] = new float[net[i].length][];
                for (int j = 0; j < net[i].length; j++) {
                    newNet[i][j] = new float[net[i][j].length];
                }
            }
        }

        // find delta for output layer
        for (i = 0; i < net[net.length-1].length; i++) {
            delta[net.length-1][i] = actual[net.length-1][i]*(1-actual[net.length-1][i])*(tgt[i]-actual[net.length-1][i]);
        }

        // find delta for hidden layers
        for (i = net.length-2; i > 0; i--) {
            for (int j = 0; j < net[i].length; j++) {
                sum = 0.0f;
                for (int k = 0; k < net[i+1].length; k++) {
                    sum += delta[i+1][k]*net[i+1][k][j];
                }
                delta[i][j] = actual[i][j]*(1-actual[i][j])*sum;
            }
        }

        float alpha = 0.0f;
        // apply momentum ( does nothing if alpha=0 )
        for (i = 1; i < net.length; i++) {
            for (int j = 0; j < net[i].length; j++) {
                for (int k = 0; k < net[i-1].length; k++) {
                    newNet[i][j][k] += alpha*net[i][j][k];
                }
                newNet[i][j][net[i-1].length] += alpha*net[i][j][net[i-1].length];
            }
        }

        float beta = 1.0f;
        // adjust weights using steepest descent
        for (i = 1; i < net.length; i++) {
            for (int j = 0; j < net[i].length; j++) {
                for (int k = 0; k < net[i-1].length; k++) {
                    float change = beta*delta[i][j]*actual[i-1][k];
                    newNet[i][j][k] = net[i][j][k]+change;
                }
                float change = beta*delta[i][j];
                newNet[i][j][net[i-1].length] += net[i][j][net[i-1].length] + change;
            }
        }

        return newNet;
    }

    public static float[][][] randomizeInitialNet(int input, int output, int hiddenAmtMin, int hiddenAmtMax, int minHidden, int maxHidden) {
        Random rand = new Random();
        int numLayers = rand.nextInt(hiddenAmtMax-hiddenAmtMin) + hiddenAmtMin + 2;
        int[] layerNodes = new int[numLayers];
        for (int i = 1; i < numLayers-1; i++) {
            layerNodes[i] = rand.nextInt(hiddenAmtMax-hiddenAmtMin)+hiddenAmtMin;
        }
        layerNodes[0] = input;
        layerNodes[numLayers-1] = output;
        float[][][] net = new float[numLayers][][];

        net[0] = new float[layerNodes[0]][];
        for (int i = 1; i < numLayers; i++) {
            net[i] = new float[layerNodes[i]][];
            for (int j = 0; j < layerNodes[i]; j++) {
                net[i][j] = new float[layerNodes[i-1]+1];
            }
        }

        for (int i = 1; i < numLayers; i++) {
            for (int j = 0; j < layerNodes[i]; j++) {
                for (int k = 0; k < layerNodes[i-1]+1; k++) {
                    net[i][j][k] = rand.nextFloat()*2-1;
                }
            }
        }
        return net;
    }

    public static String brainToString(float[][][] output) {
        String out = new String();
        for (int i = 0; i < output.length; i++) {
            for (int j = 0; j < output[i].length; j++) {
                for (int k = 0; k < output[i][j].length; k++) {
                    out = out + String.valueOf(output[i][j][k]);
                    if (k < output[i][j].length-1) {
                        out = out + ";";
                    }
                }
                if (j < output[i].length-1) {
                    out = out + ",";
                }
            }
            if (i < output.length-1) {
                out = out + "~";
            }
        }
        return out;
    }

    public static void main(String[] psvm) throws IOException {
        float[][][] net = randomizeInitialNet(2, 1, 5, 6, 4, 5);
        Random rand = new Random();
        float[][] inputs = {{0,0}, {1,0}, {0,1}, {1,1}};
        float[][] validOuts = {{0}, {1}, {1}, {0}};
        for (int i = 0; i < 50000; i++) {
//            float[] nums = {rand.nextFloat()*64.0f, rand.nextFloat()*64.0f};
//            float[] exp = {((int)nums[0]^(int)nums[1])/64.0f};

            float[][] out = procNode(inputs[i%inputs.length], net);
            net = backProp(out, validOuts[i%inputs.length], net, inputs[i%inputs.length]);

            //if (i%999 == 0)
            System.out.println(i + ": " + validOuts[i%inputs.length][0] + "/" + out[out.length-1][0] +
                    ": " + validOuts[i%inputs.length][0]/out[out.length-1][0]);
        }
        System.out.println("fail");
    }
}
```
"It is bitter – bitter", he answered,
"But I like it
Because it is bitter,
And because it is my heart."

Jplus
Posts: 1721
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

### Re: Am I back propagating correctly?

Having once successfully implemented a back-propagating neural net myself, I'm willing to take a look at your code. However, I need a quiet moment and my own code at hand, so it might take more than a week before I get back to you.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

Spambot5546
Posts: 1466
Joined: Thu Apr 29, 2010 7:34 pm UTC

### Re: be I back propagating correctly?

I think someone changed my thread title...

Nope, seeing that I wrote "I think" and a filter changed it to "I reckon" it seems my thread title was victim to some new word filter.

But anyway, thanks for the help. I was losing hope on anyone responding to this. Would you be willing to share the source code for your back propagation? I might could compare it to mine and find what I'm doing wrong.
"It is bitter – bitter", he answered,
"But I like it
Because it is bitter,
And because it is my heart."

Xanthir
My HERO!!!
Posts: 5426
Joined: Tue Feb 20, 2007 12:49 am UTC

### Re: be ic back propagating correctly?

Check the perma-threads. It's M-O-D-M-A-D-N-E-S-S, which means lots and lots of f-i-l-t-e-r-s for a week or so.
(defun fibs (n &optional (a 1) (b 1)) (take n (unfold '+ a b)))

Spambot5546
Posts: 1466
Joined: Thu Apr 29, 2010 7:34 pm UTC

### Re: Am I back propagating correctly?

Okay, so, there's a very real possibility I'm about to look very stupid, but I have to ask this question. What happens if there are too many layers and/or nodes in a neural net?

I ask because, just for funsies, I reduced the number of hidden layers from 4 to 2, and reduced the nodes per hidden layer from 4 to 2, and suddenly I'm getting exactly the results I want.
"It is bitter – bitter", he answered,
"But I like it
Because it is bitter,
And because it is my heart."

Xanthir
My HERO!!!
Posts: 5426
Joined: Tue Feb 20, 2007 12:49 am UTC

### Re: Am I back propagating correctly?

Nothing in particular will happen. More layers/nodes can more easily handle complex functions, but are harder to train, is all.

Divinas
Posts: 57
Joined: Wed Aug 26, 2009 7:04 am UTC

### Re: Am I back propagating correctly?

Actually, that's not exactly true. Having too many layers and nodes makes your neural net more prone to overfitting. That is most probably not why you are getting the wrong results in your particular case, though (disclaimer: I haven't looked at your code).

Spambot5546
Posts: 1466
Joined: Thu Apr 29, 2010 7:34 pm UTC

### Re: Am I back propagating correctly?

Well, I don't think it was the only problem. While doing some work last night I found that rather than adding delta to the previous weight, I was actually setting the new weight to delta. That ended up not fixing it, though, it just made it go back and forth around .5 rather than asymptotically approaching .5.
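The difference between the two update rules mentioned here can be sketched as follows. This is a minimal illustration, not the code from the first post: the class and method names (`WeightUpdate`, `overwrite`, `accumulate`) are made up for the example, and `beta` stands in for the learning rate.

```java
// Hypothetical helper, for illustration only.
final class WeightUpdate {

    /** The bug described above: the old weight is thrown away. */
    static float overwrite(float oldWeight, float beta, float delta, float input) {
        return beta * delta * input;
    }

    /** Steepest-descent step: the change is added to the old weight. */
    static float accumulate(float oldWeight, float beta, float delta, float input) {
        return oldWeight + beta * delta * input;
    }

    public static void main(String[] args) {
        float w = 0.5f;
        System.out.println(overwrite(w, 1.0f, 0.25f, 1.0f));  // old weight is lost
        System.out.println(accumulate(w, 1.0f, 0.25f, 1.0f)); // old weight plus the step
    }
}
```

With `overwrite`, every training step forgets what the previous steps learned, so only `accumulate` lets the weights drift toward a solution over many examples.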

Interestingly, as I've been experimenting I've found that reducing the number of layers hasn't completely fixed things. Sometimes I still get nets that will zero in on .5 rather than on the desired output. Some of them only do this for certain input sets. I had one that was giving me good output for (0, 0), (1, 0), and (1, 1), but thought the correct output for (0, 1) was .5. Since I'm using a randomized net that changes every time, I suspect that there are some that are completely or partially "untrainable" for some reason.

To get an idea of what my output is supposed to look like, I downloaded Encog which, from what I've seen googling around, seems to be just about the state of the art in neural nets. The Hello World example program worked marvelously, so I decided to give it some more complicated problems to solve. I got a pretty good result with multiplication, so I moved on to something that would require more than two inputs and more than one output.

The specific problem I decided on was: given the direction and magnitude of two vectors, find the direction and magnitude of their sum. Well, imagine my surprise when my output ended up the same regardless of the input, just like the problem I was having initially. Is this just a thing with neural nets? I'm tinkering to see if any combination of test cases and hidden layer arrangements works. No dice thus far. Here's the code, if you're curious. Just plug it into a main function.

```java
/**
 * The input necessary for XOR.
 */
Random rand = new Random();

double[][] XOR_INPUT = new double[150][];
double[][] XOR_IDEAL = new double[150][];

for (int i = 0; i < XOR_INPUT.length; i++) {
    double a = rand.nextDouble();
    double b = rand.nextDouble();
    XOR_INPUT[i] = new double[4];
    double x1 = a * Math.cos(b);
    double y1 = a * Math.sin(b);
    XOR_INPUT[i][0] = a;
    XOR_INPUT[i][1] = b;
    a = rand.nextDouble();
    b = rand.nextDouble();
    XOR_INPUT[i][2] = a;
    XOR_INPUT[i][3] = b;
    double x2 = a * Math.cos(b);
    double y2 = a * Math.sin(b);
    double finX = x1 + x2;
    double finY = y1+y2;
    double mag = Math.pow(finX*finX+finY*finY, .5);
    double dir = Math.atan(finY/finX);
    XOR_IDEAL[i] = new double[2];
    XOR_IDEAL[i][0] = mag;
    XOR_IDEAL[i][1] = dir;
}

/**
 * The ideal data necessary for XOR.
 */
// create a neural network, without using a factory
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null,true,4));
network.addLayer(new BasicLayer(new ActivationSigmoid(),true,1));
network.addLayer(new BasicLayer(new ActivationSigmoid(),false,2));
network.getStructure().finalizeStructure();
network.reset();

// create training data
MLDataSet trainingSet = new BasicMLDataSet(XOR_INPUT, XOR_IDEAL);

// train the neural network
final ResilientPropagation train = new ResilientPropagation(network, trainingSet);

int epoch = 1;

do {
    train.iteration();
    System.out.println("Epoch #" + epoch + " Error:" + train.getError());
    epoch++;
} while(train.getError() > 0.00000000000000000001 && epoch < 20000);
train.finishTraining();

// test the neural network
System.out.println("Neural Network Results:");
for(MLDataPair pair: trainingSet ) {
    final MLData output = network.compute(pair.getInput());
    System.out.println(pair.getInput().getData(0) + "," + pair.getInput().getData(1)
            + ", actual=(" + output.getData(0) + ", " + output.getData(1) + ") ideal=("
            + pair.getIdeal().getData(0) + ", " + pair.getIdeal().getData(1) + ")");
}
Encog.getInstance().shutdown();
```
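For reference, the target computation in the snippet above can be written more compactly with `Math.hypot` and `Math.atan2`. This is only a sketch: `VectorSum`, `polarSum`, and `scaleMagnitude` are names invented here, and the rescaling step is an assumption rather than something the thread prescribes. It is included because a sigmoid output always lies strictly between 0 and 1, while the sum of two vectors with magnitudes below 1 can have a magnitude of up to 2, so an unscaled magnitude target may be out of reach for a sigmoid output layer.

```java
// Hypothetical helper, for illustration only.
final class VectorSum {

    /** Returns {magnitude, direction in radians} of the sum of two polar vectors. */
    static double[] polarSum(double mag1, double dir1, double mag2, double dir2) {
        double x = mag1 * Math.cos(dir1) + mag2 * Math.cos(dir2);
        double y = mag1 * Math.sin(dir1) + mag2 * Math.sin(dir2);
        // atan2 keeps the quadrant information that atan(y / x) loses.
        return new double[] { Math.hypot(x, y), Math.atan2(y, x) };
    }

    /** Assumed rescaling: maps a magnitude in [0, maxMagnitude] into [0, 1]. */
    static double scaleMagnitude(double magnitude, double maxMagnitude) {
        return magnitude / maxMagnitude;
    }

    public static void main(String[] args) {
        double[] sum = polarSum(1.0, 0.0, 1.0, Math.PI / 2);
        System.out.println("magnitude=" + sum[0] + " direction=" + sum[1]);
        System.out.println("scaled magnitude=" + scaleMagnitude(sum[0], 2.0));
    }
}
```

For the inputs generated in the snippet (angles between 0 and 1 radian) `Math.atan` and `Math.atan2` agree; the difference only shows up once the summed vector can point into other quadrants.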
"It is bitter – bitter", he answered,
"But I like it
Because it is bitter,
And because it is my heart."

Jplus
Posts: 1721
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

### Re: Am I back propagating correctly?

I finally found the time to have a look at your code. As far as I can tell from close reading, you're calculating the deltas correctly and you're also updating the weights correctly (except that the block "apply momentum" seems to make no sense to me and should be removed since you set alpha=0.0 anyway). The only way to improve upon the back-propagation itself could be to try lower values of beta, for example 0.3.

I did find several other things that you could improve upon, however:
• You should use a stopping criterion to determine whether to continue training. For example, stop training when the total error (sum of squares) of the network goes below a given limit. Encog does that. Training a fixed number of epochs will be overkill for some sets and insufficient for others. 12500 epochs (50000 sessions for XOR) is insanely many; XOR should be able to fit much faster. By the way, in your Encog example you're setting the limit incredibly low; I don't see why you wouldn't tolerate an error of 0.01.
• You should not repeat your training data in a fixed order, because that allows your network to walk around in circles. The order of the data should be randomized on every new epoch.
• When you initialize the weights of the network, try to avoid anything close to zero. Setting a weight (close) to zero initially basically means that you've already pruned the connection before you know whether you want that. Use an absolute value of at least 1 for each weight, and use absolute values that are greater than 1 as well. For my own implementation I used integer weights between -8 and +8, excluding zero.
• Randomizing the number of layers and the number of nodes in each layer makes no sense. You want to have control over those things. For XOR, one hidden layer is the maximum (also the minimum), with just two nodes. You may find that three hidden nodes fits faster, but that should be your limit. In general, one hidden layer is nearly always enough (including but not limited to all continuous functions) and going higher than two hidden layers is something you only do if you really know what you're doing.
• Coding nitpick: you defined the fPrimeOfX function, but you're not using it anywhere. It would be better to have a fPrimeOfF function which simply returns f*(1-f) and use that to factor out all of your actual[i][j]*(1-actual[i][j]) snippets.
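The first two bullets (stop on an error limit, reshuffle the training data every epoch) could be combined roughly like this. It is a sketch under assumptions, not anyone's actual code from this thread and not how Encog does it internally: `EpochLoop`, `Model`, `HalfStepModel`, and all parameter names are invented for the example, and `HalfStepModel` is only a toy stand-in for a real network.

```java
import java.util.Random;

// Hypothetical training loop, for illustration only.
final class EpochLoop {

    /** Minimal stand-in for a trainable network (assumed interface). */
    interface Model {
        double[] predict(double[] input);
        void learn(double[] input, double[] target);
    }

    /** Toy model: ignores its input and moves one value halfway to the target. */
    static final class HalfStepModel implements Model {
        private double value = 0.0;
        public double[] predict(double[] input) { return new double[] { value }; }
        public void learn(double[] input, double[] target) { value += 0.5 * (target[0] - value); }
    }

    /** Fisher-Yates shuffle of the indices 0..n-1. */
    static int[] shuffledIndices(int n, Random rng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }

    /** Sum of squared errors over the whole training set. */
    static double totalError(Model m, double[][] inputs, double[][] targets) {
        double sum = 0.0;
        for (int p = 0; p < inputs.length; p++) {
            double[] out = m.predict(inputs[p]);
            for (int o = 0; o < out.length; o++) {
                double diff = targets[p][o] - out[o];
                sum += diff * diff;
            }
        }
        return sum;
    }

    /** Trains until the total error drops below errorLimit or maxEpochs is hit; returns epochs used. */
    static int train(Model m, double[][] inputs, double[][] targets,
                     double errorLimit, int maxEpochs, Random rng) {
        int epoch = 0;
        while (epoch < maxEpochs && totalError(m, inputs, targets) >= errorLimit) {
            // New random presentation order on every epoch.
            for (int p : shuffledIndices(inputs.length, rng)) {
                m.learn(inputs[p], targets[p]);
            }
            epoch++;
        }
        return epoch;
    }

    public static void main(String[] args) {
        int epochs = train(new HalfStepModel(), new double[][] {{0}}, new double[][] {{1}},
                           0.01, 100, new Random());
        System.out.println("epochs used: " + epochs);
    }
}
```

The `maxEpochs` cap is kept as a safety net, so a run that never reaches the error limit still terminates.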
As for your last post: yes, in general neural nets aren't that easy to train. The backwards propagation updating is essentially a hillclimbing algorithm, so depending on the initial weights it might get stuck in a local maximum and never reach a state that you consider good enough. You might need to try several times before you get what you want. To add insult to injury, training becomes explosively harder (especially slower but also more prone to over-fitting) with every additional hidden layer. This is why you should always use as few hidden layers as possible and try to avoid anything with more than two hidden layers. Fortunately the power of a neural net also increases explosively with every hidden layer, so as I said before you're unlikely to ever need more than two.
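The "try several times" advice amounts to random restarts. A minimal sketch, assuming each training run starts from fresh random weights and reports its final error: `Restarts`, `bestOf`, and the `trainOnce` callback are hypothetical names, not part of Encog or of the code earlier in the thread.

```java
import java.util.function.DoubleSupplier;

// Hypothetical restart wrapper, for illustration only.
final class Restarts {

    /**
     * Calls trainOnce (one full training run from new random weights, returning
     * its final error) until a run gets below errorLimit or maxTries runs have
     * been made. Returns the lowest error seen.
     */
    static double bestOf(DoubleSupplier trainOnce, double errorLimit, int maxTries) {
        double best = Double.POSITIVE_INFINITY;
        for (int attempt = 0; attempt < maxTries && best >= errorLimit; attempt++) {
            best = Math.min(best, trainOnce.getAsDouble());
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in for real training runs: pretend each run ends at a random error.
        java.util.Random rng = new java.util.Random();
        double best = bestOf(rng::nextDouble, 0.01, 50);
        System.out.println("best error: " + best);
    }
}
```

In a real program you would also want to keep the weights of the best run, not just its error.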
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

Naurgul
Posts: 623
Joined: Mon Jun 16, 2008 10:50 am UTC
Location: Amsterdam, The Netherlands

### Re: Am I back propagating correctly?

About the number of hidden layers, there is a theoretical result (the universal approximation theorem) showing that one hidden layer is enough to approximate any continuous function, provided it has enough nodes. Obviously, that doesn't necessarily mean that one layer is always optimal: sometimes having more layers will help convergence.

As Jplus said, you should randomise the initial weights for each run but you should decide the structure of the network (number of hidden layers, number of nodes per hidden layer) yourself.
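To make the "one hidden layer with two nodes is enough for XOR" point concrete, here is a hand-wired 2-2-1 sigmoid network. The weights were picked by hand for illustration (one hidden node acts roughly like OR, the other like NAND, and the output node like AND); they are not trained and not taken from anyone's code in this thread, and the class name `HandWiredXor` is made up.

```java
// Hypothetical hand-wired network, for illustration only.
final class HandWiredXor {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Forward pass through two hidden nodes and one output node, each with a bias. */
    static double xor(double a, double b) {
        double or   = sigmoid( 20 * a + 20 * b - 10); // close to 1 unless both inputs are 0
        double nand = sigmoid(-20 * a - 20 * b + 30); // close to 1 unless both inputs are 1
        return sigmoid(20 * or + 20 * nand - 30);     // close to 1 only if both hidden nodes fire
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
        for (double[] in : inputs) {
            System.out.println(in[0] + " xor " + in[1] + " -> " + xor(in[0], in[1]));
        }
    }
}
```

Note that every node here relies on its bias term; without biases a sigmoid node outputs exactly .5 whenever all of its inputs are 0.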
Praised be the nightmare, which reveals to us that we have the power to create hell.

Spambot5546
Posts: 1466
Joined: Thu Apr 29, 2010 7:34 pm UTC

### Re: Am I back propagating correctly?

Thanks, man. I kind of forgot about this thread, but I actually figured out what my problems were a week or so ago. The reason it was always converging on one wrong value was because I had two inputs in my training set that were identical, but mapping to different outputs. I had simply made a typo when setting up the net for training, and assumed that the problem must be with the training logic and not in main.
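A training set like that can be caught before any training happens. If the same input appears with targets 0 and 1, the squared error (0 - y)^2 + (1 - y)^2 is smallest at y = .5, which would explain an output stuck near .5 for that input. The check below is only a sketch: `TrainingSetCheck` and `firstConflict` are invented names, and the example data in `main` is a made-up typo, not the actual one from the post.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sanity check, for illustration only.
final class TrainingSetCheck {

    /**
     * Returns the index of the first row whose input repeats an earlier row
     * but whose target differs from that earlier row, or -1 if there is none.
     */
    static int firstConflict(float[][] inputs, float[][] targets) {
        Map<String, Integer> firstSeen = new HashMap<>();
        for (int i = 0; i < inputs.length; i++) {
            String key = Arrays.toString(inputs[i]);
            Integer earlier = firstSeen.putIfAbsent(key, i);
            if (earlier != null && !Arrays.equals(targets[earlier], targets[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up typo: the last row should have been {1, 1}.
        float[][] inputs = {{0, 0}, {1, 0}, {0, 1}, {1, 0}};
        float[][] targets = {{0}, {1}, {1}, {0}};
        System.out.println("first conflicting row: " + firstConflict(inputs, targets));
    }
}
```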

I have since managed to model several systems by back propagation, and with Encog I've tried RPROP a couple of times, too.

Anyway, thanks for the help!
"It is bitter – bitter", he answered,
"But I like it
Because it is bitter,
And because it is my heart."

Jplus
Posts: 1721
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

### Re: Am I back propagating correctly?

I would have appreciated it if you had posted that a week ago, because I just spent a few hours close-reading your code as well as my own, and consulting my textbook.

Good to know that you solved the problem, though.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)

Spambot5546
Posts: 1466
Joined: Thu Apr 29, 2010 7:34 pm UTC

### Re: Am I back propagating correctly?

At least we all learned something, right?
"It is bitter – bitter", he answered,
"But I like it
Because it is bitter,
And because it is my heart."

Jplus
Posts: 1721
Joined: Wed Apr 21, 2010 12:29 pm UTC
Location: Netherlands

### Re: Am I back propagating correctly?

Yes. I learned (or was reminded) that misbehaviour of software may be caused by the inputs rather than by the program.
"There are only two hard problems in computer science: cache coherence, naming things, and off-by-one errors." (Phil Karlton and Leon Bambrick)

coding and xkcd combined

(Julian/Julian's)