This post is a theoretical and lightly mathematical explanation of how artificial neural networks work. Try to understand as much as you can; in the next tutorial I will walk you through a step-by-step implementation of a neural network.
A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons. The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today. Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power. A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.
Figure 1. Biological neural network
Our brain can be considered as a highly complex, non-linear and parallel information-processing system. Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks, both data and its processing are global rather than local. Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.
An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain. The neurons are connected by weighted links passing signals from one neuron to another. The output signal is transmitted through the neuron’s outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network. The architecture of an artificial neural network is very similar to that of the neural networks found in our brain. As you can see, one neuron can have many inputs but only one output, and every connection has a weight associated with it. In Figure 3, w1, w2 and w3 are the input weights of the neuron.
Figure 2. Architecture of a typical artificial neural network
Figure 3. The neuron as a simple computing element
The neuron computes the weighted sum of the input signals and compares the result with a threshold value, Θ. If the net input is less than the threshold, the neuron output is –1. But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains a value +1. Here X is the net weighted input computed by the neuron (its raw value before thresholding) and Y is the neuron's output, which is the class to be predicted. This type of activation function is called a sign function.
For example, if the net input is 0.56 and the threshold is 0.5, then since 0.56 > 0.5 the predicted class is +1.
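The neuron uses the following transfer or activation function:
$$X = \sum_{i=1}^{n} x_i w_i, \qquad Y = \begin{cases} +1 & \text{if } X \ge \Theta \\ -1 & \text{if } X < \Theta \end{cases}$$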
General Meaning of activation function
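The neuron described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `neuron_output` and the input and weight values are made up for this example, chosen so that the net input comes out as 0.56, as in the example above.

```python
def neuron_output(inputs, weights, theta):
    """Weighted sum of the inputs followed by a sign activation.

    Returns +1 if the net input reaches the threshold theta, otherwise -1.
    """
    net_input = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net_input >= theta else -1


# Hypothetical inputs and weights: 1.0*0.4 + 0.5*0.3 + 0.2*0.05 = 0.56
inputs = [1.0, 0.5, 0.2]
weights = [0.4, 0.3, 0.05]

fires = neuron_output(inputs, weights, theta=0.5)       # 0.56 >= 0.5
stays_off = neuron_output(inputs, weights, theta=0.6)   # 0.56 <  0.6
```

With a threshold of 0.5 the neuron fires (+1); raising the threshold to 0.6 makes the same net input fall short, so the output is –1.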
Besides the sign function, there are other activation functions, as shown below:
Figure 4. Activation functions of a neuron
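Since the figure is not reproduced here, the sketch below shows four commonly used activation functions. Which functions the original figure contains is an assumption on my part; step, sign, sigmoid and linear are simply the usual candidates. The threshold is taken as already subtracted from the net input, so each function compares against zero.

```python
import math


def step(x):
    """Hard limiter with outputs 0 or 1."""
    return 1 if x >= 0 else 0


def sign(x):
    """Hard limiter with outputs -1 or +1."""
    return 1 if x >= 0 else -1


def sigmoid(x):
    """Smooth function that squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(x):
    """Output equals the net input."""
    return x


# Compare the four functions on a few sample net inputs.
for net_input in (-2.0, 0.0, 2.0):
    print(net_input, step(net_input), sign(net_input),
          round(sigmoid(net_input), 3), linear(net_input))
```

The hard limiters (step and sign) are typical for classification decisions, while the sigmoid is the usual choice when the network is trained with back-propagation, because it is differentiable.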
In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron. The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter (activation used).
Figure 5. Single-layer two-input perceptron
The operation of Rosenblatt’s perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter (a type of activation function). The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and –1 if it is negative. The aim of the perceptron is to classify inputs, $x_1, x_2, \ldots, x_n$, into one of two classes, say $A_1$ and $A_2$. In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function:
$$\sum_{i=1}^{n} x_i w_i - \Theta = 0$$
To understand the figure below, let me walk you through an example. Let's say we have to predict a house price that depends on two variables, $x_1$ and $x_2$; then the results lie in a two-dimensional plane (Figure 6A), just like a 2D graph. When we add one more factor, $x_3$, the search space becomes three-dimensional (Figure 6B). As we go on adding more and more variables (features), we will be able to conquer more complex spaces, and such solutions can handle non-linearity well. We will also see an example of the impact of having a smaller network try to solve a bigger problem.
Figure 6. Linear separability in the perceptrons
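To make linear separability concrete, here is a small sketch that searches a coarse grid of weights and thresholds for a two-input perceptron. The AND and XOR datasets, the grid and the helper names are my own illustrative choices, not something from the figure. A failed grid search is not a proof in itself, but XOR is a well-known example of a problem that no single straight line can separate, so a single perceptron is too small a network for it.

```python
from itertools import product


def classify(x, weights, theta):
    """Sign-activated perceptron: +1 on or above the hyperplane, -1 below it."""
    net_input = sum(xi * wi for xi, wi in zip(x, weights))
    return 1 if net_input >= theta else -1


def find_separator(samples, grid):
    """Return the first (w1, w2, theta) on the grid that classifies every
    sample correctly, or None if no grid point does."""
    for w1, w2, theta in product(grid, repeat=3):
        if all(classify(x, (w1, w2), theta) == label for x, label in samples):
            return (w1, w2, theta)
    return None


GRID = [i * 0.5 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

and_separator = find_separator(AND, GRID)  # some line separates AND
xor_separator = find_separator(XOR, GRID)  # None: no single line works
```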
How does the perceptron learn its classification tasks? This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain the output consistent with the training examples.
What actually is back-propagation? The network computes its output pattern, and if there is an error, in other words a difference between the actual and desired output patterns, the weights are adjusted to reduce this error. In a back-propagation neural network, the learning algorithm has two phases. First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer. Second, if this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.
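The two phases can be sketched for a tiny network as follows. Everything specific here is an assumption made for illustration: a network with two inputs, two hidden neurons and one output neuron, sigmoid activations, no thresholds, hand-picked starting weights and a learning rate of 0.5. The function names `forward` and `backward` are likewise hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Phase 1: propagate the input pattern layer by layer to the output."""
    hidden = [sigmoid(sum(xi * wi for xi, wi in zip(x, ws))) for ws in w_hidden]
    y = sigmoid(sum(h * w for h, w in zip(hidden, w_out)))
    return hidden, y


def backward(x, hidden, y, y_desired, w_hidden, w_out, alpha):
    """Phase 2: propagate the error backwards and adjust the weights."""
    error = y_desired - y
    delta_out = error * y * (1.0 - y)  # error gradient at the output neuron
    # Hidden-layer gradients use the output weights from before the update.
    delta_hidden = [h * (1.0 - h) * delta_out * w for h, w in zip(hidden, w_out)]
    new_w_out = [w + alpha * h * delta_out for h, w in zip(hidden, w_out)]
    new_w_hidden = [[w + alpha * xi * d for xi, w in zip(x, ws)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out, error


# One training example, repeated a few times (all values are made up).
x, y_desired = [1.0, 1.0], 1.0
w_hidden = [[0.5, -0.4], [0.3, 0.2]]
w_out = [-0.3, 0.1]

errors = []
for _ in range(50):
    hidden, y = forward(x, w_hidden, w_out)
    w_hidden, w_out, error = backward(x, hidden, y, y_desired,
                                      w_hidden, w_out, alpha=0.5)
    errors.append(abs(error))
```

After repeating the two phases, the last entry of `errors` is smaller than the first, which is exactly the "weights are adjusted to reduce this error" behaviour described above.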
If at iteration $p$ the actual output is $Y(p)$ and the desired output is $Y_d(p)$, then the error is given by:
$$e(p) = Y_d(p) - Y(p)$$
where $p = 1, 2, 3, \ldots$ Iteration $p$ here refers to the $p$th training example presented to the perceptron. If the error, $e(p)$, is positive, we need to increase the perceptron output $Y(p)$, but if it is negative, we need to decrease $Y(p)$.
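The weights are updated to decrease the error by adding a correction $\Delta w_i$ to each weight:
$$w_i(p+1) = w_i(p) + \Delta w_i(p), \qquad \Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)$$
where $p = 1, 2, 3, \ldots$ and $\alpha$ is the learning rate, a positive constant less than unity. The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.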
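Putting the pieces together, here is a minimal sketch of a perceptron training loop that uses the error and the learning rule described above. A few things are my own assumptions rather than part of the text: the logical AND task as training data, the fixed random seed, the learning rate of 0.1, and the choice to learn the threshold as if it were a weight on a constant input of –1.

```python
import random


def predict(x, weights, theta):
    """Sign-activated perceptron output: +1 or -1."""
    net_input = sum(xi * wi for xi, wi in zip(x, weights))
    return 1 if net_input >= theta else -1


def train_perceptron(samples, alpha=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Initial weights and threshold drawn at random from [-0.5, 0.5].
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    theta = rng.uniform(-0.5, 0.5)

    for _ in range(max_epochs):
        mistakes = 0
        for x, y_desired in samples:
            error = y_desired - predict(x, weights, theta)  # e = Yd - Y
            if error != 0:
                mistakes += 1
                # Learning rule: w_i <- w_i + alpha * x_i * e
                weights = [w + alpha * xi * error for w, xi in zip(weights, x)]
                # Assumption: threshold treated as a weight on a constant input -1.
                theta += alpha * (-1) * error
        if mistakes == 0:  # every example classified correctly
            break
    return weights, theta


# Logical AND with classes -1 and +1 (illustrative data).
AND_SAMPLES = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]

weights, theta = train_perceptron(AND_SAMPLES)
predictions = [predict(x, weights, theta) for x, _ in AND_SAMPLES]
```

Because AND is linearly separable, the loop stops once all four examples are classified correctly, and `predictions` then matches the desired outputs.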
I recommend reading this paper if you are looking for a more mathematical and theoretically formalized version. This was the shortest possible theory of how neural networks work. In the upcoming tutorial we will see how to practically implement the artificial neural network algorithm.