With new devices in daily life, buzzwords like machine learning, artificial intelligence, and neural networks have become ubiquitous. Despite their overuse in popular culture, neural networks are valuable additions to any algorithm focused on making informed decisions. Despite seeming quite complex, their core procedures are crucial for future machines. Neural networks are a vital component of a machine learning algorithm.
Neural networks often work similarly to human brains– they are made up of individual nodes which make their own decisions and pass on their results to the other nodes surrounding them. Neural networks use “weighted” decisions, which are decisions based on the importance of a specific event. If the sum of the events exceeds a threshold, data is passed on to the next node; otherwise, no data is passed. The event would be multiplied by a specific weight corresponding to its importance. For example, to pass the threshold of 9 for “Can I go to the mall today?”, the node may take in two inputs: “GPA” (number) with a weight of 2 and “are chores done?"""""""" (true/false) with a weight of 3. If the child's GPA was 3.5 and they finished their chores, the weighted sum would be (3.5*2)+(1*3) = 10, which is greater than 9 so the child can go to the mall. However, if their GPA was 4.0 but they didn't finish their chores, the weighted sum would be (4.0*2)+(0*3) = 8, which doesn't exceed 9, so the child cannot go to the mall. Of course, these are straightforward examples, while real neural network nodes may take in hundreds of inputs with varying weights. Other models may even have biases, which adjust the final sum to make the result more accurate and realistic.
However, neural networks have more than one node; they are often made up of multiple layers of nodes. The first layer is the input
layer, and as the name suggests, it takes inputs and transfers them to the hidden layers, which do all of the computation. The hidden layers take the weighted sum of the inputs referenced above and add any biases present. When computing large amounts of inputs, they are often stored in a vector, represented as numbers or symbols in a column. The inputs are then multiplied by a series of weights, represented by a matrix of numbers or symbols. The result would be another matrix of values passed onto the hidden layer. If only one hidden layer existed, the result would be multiplied by more weights and then passed on to the output. However, most neural networks have multiple hidden layers, so the result would be passed on to other hidden layers, provided that the results exceed the given threshold value.
Neural networks can make decisions every day, but whether their decision is correct is different. Neural networks need to be able to learn based on their past mistakes or successes. This is calculated through the weights and biases of the network. Let's say a given neural network has 10 possible outputs (i.e., the numbers 0-9) from trying to identify a certain hand-written digit (e.g., 3). The neural network should return a confidence value close to 1 (the possible confidence values range from 0-1) for the number 3 and a confidence value closer to 0 for all the other numbers. However, the neural network is inaccurate and must be fixed if it returns varied results for each number. This is where the "cost function" comes into play. The cost function calculates the sum of the squares of the differences between the expected value and the resulting value. When the cost function's result is high, the neural network is far from its expected outputs and needs to be corrected. The goal of fixing the neural network is to minimize this cost function.
For a parabolic function with one input and one output, finding the minimum is as "easy” as finding where its derivative equals zero. However, some neural networks may have thousands of inputs, so it's" not always so simple. Gradient descent becomes helpful here. Gradient descent is an algorithm that calculates the instantaneous rate of change (the derivative at a specific point) and decides whether to step left or right on the graph. If the slope at that point is negative, a new point is checked to the right of that older point. Likewise, if the slope at that point is positive, a new point is checked to the left of that older point. The distance between the new and older points is proportional to the derivative value at the old point to eliminate overshooting the actual minimum value. Repeatedly running this algorithm at different random points on the graph will find multiple minima, which can then be compared to themselves to find the true global minimum of the function.
Though these algorithms are quite complex, they will start to seem a little easier with practice. With programming, and any other new topics, one of the best ways to learn it is to try it hands-on. Start with simpler algorithms and work your way up to higher-level ones until you reach neural networks. Programming can boost your experience if you're heading into that field and improve your critical thinking skills, which can be used for everyday life