Machine Learning: A Mechanical Approach to Learning

A Guide to the Mathematics Behind ANNs

Dhruv Mangtani
6 min read · Jun 17, 2017

Java Code: https://github.com/1dividedby0/Synthesized-Children

Image source: http://hmaus.org/

Imagine a newborn child; fresh out of its mother’s womb, it doesn’t recognize the nurse’s hands as human, let alone as hands at all. However, these hands are eventually assigned mental labels, usually by the time the child is two months old. The child’s brain consists of, among other things, networks of neurons attached to each other via synapses. Every interaction with the world leads to a reaction from the brain: each neuron triggers the next, passing information throughout the body at lightning speed. If optically received images could be interpreted in this way, machines could identify patterns and make decisions using a similar system. In the early 1940s, researchers showed that, given a set of objects (inputs) and their labels (outputs), a machine could be trained to recognize them within a couple of days, and, by the mid-2000s, within a few seconds. Those early researchers were the first to model an Artificial Neural Network computationally.

Understanding the ANN (Artificial Neural Net)

An ANN is formed by multiple interconnected layers of neurons. Each neuron in one layer holds a synaptic connection with every neuron in the next layer.

There can be more than one hidden layer and more than one output neuron.

The bottom or first layer is the only one that directly receives inputs or parameters. The topmost and last layer outputs a result or label. Each layer receives the previous layer’s outputs as its inputs. Because we only specify the dimensions of the neural net and never its internal contents, the hidden layers are often considered part of a “black box”: no one really knows what values and functions end up defined there. Each synaptic connection holds its own “weight.” ANNs are supervised learning models, so the training data consists of input values along with the outputs the net should predict for them. This way, we can measure how our net is performing by comparing the predicted output to the actual output while training, and the ANN will automatically reduce that error the next time it runs through its loop.
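As a rough sketch (in Java, like the linked code, though the class and field names here are hypothetical and the 2-2-1 layer sizes are chosen purely for illustration), the training data and net dimensions might be represented like this:

```java
// A minimal, illustrative representation of supervised training data and net dimensions.
// Names and the 2-2-1 layer sizes are assumptions for this sketch, not the linked repository's code.
public class TrainingExample {
    double[] inputs;          // values fed to the first (input) layer
    double[] expectedOutputs; // the labels the last (output) layer should produce

    TrainingExample(double[] inputs, double[] expectedOutputs) {
        this.inputs = inputs;
        this.expectedOutputs = expectedOutputs;
    }

    public static void main(String[] args) {
        // We only specify the dimensions of the net; everything inside the "black box" is learned.
        int[] layerSizes = {2, 2, 1}; // input, hidden, output

        // Each example pairs input values with the output the net should predict for them.
        TrainingExample example = new TrainingExample(new double[]{1, 1}, new double[]{0});
        System.out.println(layerSizes.length + " layers; expected output: " + example.expectedOutputs[0]);
    }
}
```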

Predicting or Recognizing Using an ANN

E.g., 0.8 is the weight for the synapse connecting the first input neuron to the first hidden neuron. Source: https://stevenmiller888.github.io/mind-how-to-build-a-neural-network/

To start training our net to recognize or predict probabilities or images (e.g., identifying an animal), we must “forward propagate” through the neural net. First, the input layer is filled. If the input were a greyscale image, each neuron would represent a pixel. In this case, the inputs are boolean values, and we are trying to predict another boolean. The first time we forward propagate, the weights are filled with random values drawn from a Gaussian distribution. We then multiply each weight by the value of the neuron attached at the base of its synapse. The input value for every neuron in the next layer is simply the sum of the products (found in the last step) of the synapses attached to it. For example, to find the input of the first hidden-layer neuron, we take the weights of the synapses attached to it, multiply them by the neurons at the bases of those synapses, and sum the products: 0.8 × 1 + 0.2 × 1 = 1. Finally, we apply an “activation function” to the sum. We use an activation function so the net can model more complex, non-linear patterns. In this case, we are dealing with a logistic/probability prediction, so we will use the sigmoid function, as it always results in a value between 0 and 1: Sig(1) ≈ 0.731.
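Written out, the logistic sigmoid named in the caption below, and its value at the example input of 1, is:

```latex
\mathrm{Sig}(x) = \frac{1}{1 + e^{-x}}, \qquad
\mathrm{Sig}(1) = \frac{1}{1 + e^{-1}} \approx 0.731
```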

Logistic Sigmoid Activation Function
Forward Propagated with Random Weights
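Putting that forward pass into code, a minimal sketch might look like the following. The 2-2-1 layer sizes and the weight values other than 0.8 and 0.2 are assumptions made up for illustration, not values from the linked tutorial or repository:

```java
// A minimal sketch of forward propagation for a small 2-2-1 network.
// Names and most weight values are illustrative assumptions.
public class ForwardPass {
    // Sigmoid activation: squashes any real number into (0, 1)
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, 1.0};                         // boolean-style inputs from the example
        double[][] inputToHidden = {{0.8, 0.4}, {0.2, 0.9}};  // inputToHidden[i][j]: input i -> hidden j
        double[] hiddenToOutput = {0.3, 0.5};                 // hidden j -> the single output neuron

        // Each hidden neuron's input is the weighted sum of the neurons feeding into it,
        // e.g. hidden neuron 0 receives 0.8*1 + 0.2*1 = 1, and sigmoid(1) ≈ 0.731.
        double[] hidden = new double[2];
        for (int j = 0; j < hidden.length; j++) {
            double sum = 0.0;
            for (int i = 0; i < inputs.length; i++) {
                sum += inputs[i] * inputToHidden[i][j];
            }
            hidden[j] = sigmoid(sum);
        }

        // The output neuron repeats the same weighted-sum-then-activate step.
        double outputSum = 0.0;
        for (int j = 0; j < hidden.length; j++) {
            outputSum += hidden[j] * hiddenToOutput[j];
        }
        System.out.println("Predicted probability: " + sigmoid(outputSum));
    }
}
```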

If we are training, we check how well our net just performed by comparing the predicted outputs to the ones in the training data. In this case, the training data output is 0, but our prediction is 0.77. This is normal because we initialized the net with random weights. Now, the net moves on to back propagation in order to refine the weights.

Training or Teaching an ANN

After testing the ANN against training data with our random weights, we use a process called “back propagation.” To put it simply, it involves moving backwards through a neural net (like the one above) in order to adjust the weight of every synapse. We start at the output layer and calculate the error using a cost function. In this case, we will use a logistic cost function J, as our net is intended for classification:
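Written out, the standard logistic (cross-entropy) cost over m training examples is:

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
\Big[\, y^{(i)} \log h_\theta\!\big(x^{(i)}\big)
      + \big(1 - y^{(i)}\big) \log\!\big(1 - h_\theta\!\big(x^{(i)}\big)\big) \Big]
```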

y is the given training data output and the hypothesis h𝜃(x) is the predicted output. Notice how we use the logistic regression cost function for classification with probabilities.

To refine the weights of the attached synapses, we use the gradient descent algorithm. Gradient descent gradually increments or decrements each weight until the weights minimize the cost, or error, of the net:
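Written out, the gradient descent update applied to each weight 𝜃j (repeated until convergence) is:

```latex
\theta_j := \theta_j - a \, \frac{\partial}{\partial \theta_j} J(\theta)
```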

Convergence is defined as the point where the cost function J is minimized.

In the update rule above, “a” is the learning rate. A larger learning rate means larger incremental or decremental “steps” when editing the weights. If it is too large, the algorithm can skip past the optimal value for a synaptic weight; if it is too small, the algorithm will take too long to converge on a global optimum. The second part of the term is the derivative of the cost function. It too affects the size of each step, but it varies depending on the logistic cost J(𝜃) and the current weight 𝜃j.

Afterwards, we forward propagate again with the new weights to see if the error was reduced. The process is repeated until the cost converges to a minimum.
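As a simplified illustration of that loop, the sketch below trains a single sigmoid neuron (plain logistic regression) with gradient descent, rather than a full multi-layer net, so the derivative of the logistic cost can be written in one line; the toy data, names, and hyperparameter values are all assumptions:

```java
// A simplified sketch of the train-until-convergence loop for one sigmoid neuron.
// A full multi-layer net back propagates this same kind of gradient through every layer.
public class GradientDescentLoop {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}}; // toy boolean training inputs
        double[] labels = {0, 0, 0, 1};                       // toy boolean training outputs (AND)
        double[] weights = {0.1, -0.2};                       // small arbitrary starting weights
        double bias = 0.0;
        double learningRate = 0.5;                            // "a" in the update rule above

        for (int epoch = 0; epoch < 5000; epoch++) {
            for (int i = 0; i < inputs.length; i++) {
                // Forward propagate: weighted sum, then sigmoid activation
                double sum = bias;
                for (int j = 0; j < weights.length; j++) {
                    sum += weights[j] * inputs[i][j];
                }
                double prediction = sigmoid(sum);

                // For the logistic cost, dJ/d(weight_j) on one example is (prediction - label) * input_j
                double error = prediction - labels[i];
                for (int j = 0; j < weights.length; j++) {
                    weights[j] -= learningRate * error * inputs[i][j]; // theta_j := theta_j - a * dJ/dtheta_j
                }
                bias -= learningRate * error;
            }
        }

        // After training, the predictions should sit close to the labels
        for (int i = 0; i < inputs.length; i++) {
            double sum = bias;
            for (int j = 0; j < weights.length; j++) {
                sum += weights[j] * inputs[i][j];
            }
            System.out.printf("input %d,%d -> %.3f%n",
                    (int) inputs[i][0], (int) inputs[i][1], sigmoid(sum));
        }
    }
}
```

Run as-is, it should print predictions near 0 for the first three inputs and near 1 for the last, mirroring the forward-propagate, compare, adjust, repeat cycle described above.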

Applications

ANNs like this can be, and have been, used in numerous ways. For example, a neural net can predict stock market ticks or identify new planets. If reversed, they can even produce art or poetry by taking classifications as inputs and outputting images’ pixels; apps like Prisma have made this commercially available. Keep in mind that the algorithms shown here can be used with multiple classification outputs. For example, given an input layer of an image’s pixels, we can have multiple output neurons in the final layer: one for the probability of a dog being in the picture, one for the dog’s food, and one for an intruder stealing the dog’s food. However, these models are still prone to error even with extensive optimization.
