Dr. Mark Humphrys

School of Computing. Dublin City University.


Initialising weights

For the network to work, it is crucial that the hidden nodes do different things. They cannot all be the same. They must specialise in different aspects of the input-output mapping. All we need to do is start them off randomly different to get this process going.

So what should we start weights and thresholds at?




Small weights are best

Note from the graph of the sigmoid function that at large positive or negative summed x, the slope dy/dx is very small.

dy/dx = y(1-y), and at either end, one of these terms is near zero.

Hence for large absolute xk, the slope dy/dx is near zero, and so the weight change is near zero too.

Large absolute summed x (caused by large absolute weights) causes only a small change in weights, and hence slow learning.

Small weights give fast learning. All things being equal, small weights tend to put us in the middle of the sigmoid curve, the area of rapid change.

Small weights and fast learning are what we want at the start, when we know nothing.
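
To see how flat the curve gets, here is a small C++ sketch (my own illustration, not part of the course's sample code) that prints the sigmoid and its slope y(1-y) at a few values of summed x:

  #include <cstdio>
  #include <cmath>

  // Sigmoid and its slope y(1-y): the slope collapses towards zero at large |x|.
  double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

  int main()
  {
    double xs[] = { -10, -5, -1, 0, 1, 5, 10 };
    for (double x : xs)
    {
      double y = sigmoid(x);
      printf("x = %6.1f   y = %8.6f   slope y(1-y) = %8.6f\n", x, y, y * (1 - y));
    }
    return 0;
  }

At x = 0 the slope is 0.25, its maximum. At x = plus or minus 10 it has fallen below 0.0001, so weight changes driven by it are tiny.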

  


All zero weights are bad

OK, so very large weights are bad and we should have small weights to start. How about zero?

Let all weights and thresholds start at zero.

Short version: All the hidden nodes start identical, they compute identical outputs, they receive identical weight updates, and so they stay identical forever. The symmetry is never broken and the hidden nodes never specialise.

  
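To make this concrete, here is a minimal 1-input, 2-hidden, 1-output sketch using the standard sigmoid backprop update rules (an illustration only, with an arbitrary learning rate and exemplar, not the course's sample code). Watch the two hidden nodes: they get exactly the same updates at every step, so they never become different.

  #include <cstdio>
  #include <cmath>

  double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

  int main()
  {
    const double eta = 0.5;            // learning rate (arbitrary)
    double w1[2] = { 0, 0 };           // input -> hidden weights, all zero
    double t1[2] = { 0, 0 };           // hidden thresholds
    double w2[2] = { 0, 0 };           // hidden -> output weights
    double t2    = 0;                  // output threshold
    double x = 1.0, target = 1.0;      // one exemplar (arbitrary)

    for (int step = 1; step <= 3; step++)
    {
      // forward pass
      double yh[2], sum = 0;
      for (int j = 0; j < 2; j++) { yh[j] = sigmoid(x * w1[j] - t1[j]); sum += yh[j] * w2[j]; }
      double yo = sigmoid(sum - t2);

      // backward pass: both hidden nodes get identical deltas, hence identical updates
      double dout = (target - yo) * yo * (1 - yo);
      for (int j = 0; j < 2; j++)
      {
        double dh = yh[j] * (1 - yh[j]) * dout * w2[j];
        w2[j] += eta * dout * yh[j];
        t1[j] -= eta * dh;
        w1[j] += eta * dh * x;
      }
      t2 -= eta * dout;

      printf("step %d:  w1 = (%g, %g)   w2 = (%g, %g)\n", step, w1[0], w1[1], w2[0], w2[1]);
    }
    return 0;
  }

Because node 0 and node 1 receive identical updates at every step, they remain copies of each other forever. Starting at zero (or at any identical values) never breaks the symmetry.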


All weights the same are bad

Following from the above, identical nodes are a bad idea, even if weights are non-zero.

Multiple identical hidden nodes are useless: you can achieve the same effect with one hidden node and different weights. Consider the following.

Q. A neural network has 1 input node, n hidden nodes, and 1 output node. The weights on the input layer are all the same: wij = W1. The thresholds of the hidden nodes are all the same: tj = T. The weights on the output layer are all the same: wjk = W2.
This network is equivalent to a network with 1 input node, 1 hidden node, and 1 output node, where (wij, tj, wjk) = what?
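
One way to see the answer is to check a candidate numerically. The sketch below is my own illustration: the values n = 3, W1, T, W2 and the input range are arbitrary, and I assume a sigmoid output node with threshold 0 (which does not affect the equivalence). It compares the n-hidden-node network against a single hidden node whose output weight is scaled up to n times W2:

  #include <cstdio>
  #include <cmath>

  double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

  int main()
  {
    const int    n  = 3;                          // number of identical hidden nodes (arbitrary)
    const double W1 = 0.4, T = 0.2, W2 = 0.7;     // arbitrary example values

    for (double x = -2; x <= 2; x += 1)
    {
      // network with n identical hidden nodes
      double sum = 0;
      for (int j = 0; j < n; j++) sum += W2 * sigmoid(W1 * x - T);
      double out_n = sigmoid(sum);

      // candidate equivalent network: 1 hidden node, weights (W1, T, n*W2)
      double out_1 = sigmoid((n * W2) * sigmoid(W1 * x - T));

      printf("x = %4.1f   n-node net: %8.6f   1-node net: %8.6f\n", x, out_n, out_1);
    }
    return 0;
  }

The two columns agree for every input: n identical hidden nodes each feeding W2 times y into the output is the same as one hidden node feeding n times W2 times y.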




Initial weights should be random, diverse, small

So now we have our strategy to initialise weights:
Initial weights should be:
  1. random
  2. different
  3. small in absolute size (plus or minus)
See C++ Sample code for initialisation.

How small is "small"?
Like many other things to do with the neural network, we may need to experiment.
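
If the sample code is not to hand, a minimal sketch of such an initialisation might look like the following (my own illustration, not the course's sample code; the layer sizes and the range of plus or minus 0.5 are arbitrary assumptions, and the range is exactly the kind of thing to experiment with):

  #include <cstdlib>
  #include <ctime>

  // Initialise all weights and thresholds to small random values, each one different.
  // (Sketch only: layer sizes and the +/- 0.5 range are arbitrary assumptions.)

  const int NUMIN = 2, NUMHID = 3, NUMOUT = 1;

  double w_lower[NUMIN][NUMHID];               // input  -> hidden weights
  double w_upper[NUMHID][NUMOUT];              // hidden -> output weights
  double t_hidden[NUMHID], t_output[NUMOUT];   // thresholds

  // random double in [-range, range]
  double small_random(double range)
  {
    return range * (2.0 * rand() / RAND_MAX - 1.0);
  }

  void init_weights(double range)
  {
    for (int i = 0; i < NUMIN; i++)
      for (int j = 0; j < NUMHID; j++)
        w_lower[i][j] = small_random(range);

    for (int j = 0; j < NUMHID; j++)
    {
      t_hidden[j] = small_random(range);
      for (int k = 0; k < NUMOUT; k++)
        w_upper[j][k] = small_random(range);
    }

    for (int k = 0; k < NUMOUT; k++)
      t_output[k] = small_random(range);
  }

  int main()
  {
    srand((unsigned) time(0));   // different random weights on each run
    init_weights(0.5);           // small in absolute size, plus or minus
    return 0;
  }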

  


Stopping large weights developing

If the exemplars give the correct outputs as 0 or 1, and we are using the sigmoid function, then very large weights will develop. The sigmoid only approaches 0 and 1 asymptotically, so we can't actually get an output of 0 or 1 without at least one weight going to plus or minus infinity, and learning keeps pushing the weights larger and larger in pursuit of those targets.

One way to stop this is for exemplar outputs to be 0.1 to 0.9, rather than 0 to 1.
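
A simple way to do that is a linear remapping of the targets (a sketch; the endpoints 0.1 and 0.9 are just the ones suggested above, and any range safely away from the asymptotes would do):

  // Rescale an exemplar output from the range 0 to 1 into the range 0.1 to 0.9,
  // so the sigmoid can actually hit the target with finite weights.
  double rescale_target(double t)
  {
    return 0.1 + 0.8 * t;      // 0 -> 0.1,  1 -> 0.9
  }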


