__Article Sections__

__(1) Introduction To Artificial Intelligence (AI)__

Artificial Intelligence (AI) is a field of Computer Science that overlaps heavily with Data Science. Machine Learning (ML) is the branch of AI in which developers train models on data. The models utilise mathematical algorithms so that, once trained, they can automate the predictive analysis of novel qualitative or quantitative data, completing tasks that they have "learned" through simulated task training. Machine Learning models differ in how they are organised and in which mathematical algorithms they utilise; both determine the calculatory potential, and therefore the purpose, of a model. These models can be built in electronic hardware or written as software in a programming language.

The results of AI Machine Learning research have been applied to the fields of Healthcare, Technology, Financial Analysis, Business Operations, Manufacturing, Scientific Research and other research areas. The most advanced areas of Machine Learning research include Computer Vision, Speech Recognition, Natural Language Processing (NLP) and Time Series analysis.

Computer Vision enables machine learning models to process and classify images and video and to determine the features present within them, effectively allowing AI developers to attach eyes to the applications that they create. Facial recognition models, self-driving car models and medical radiological imaging analysis are examples of AI applications that utilise computer vision.

Speech Recognition involves training machine learning models to learn the association between spoken language, including accented speech, and the written words that correspond to it. Speech recognition has enabled the automation of typing for businesses, the automation of customer care communication, voice activated authorisation and NLP development.

Natural Language Processing involves training machine learning models to comprehend spoken language and written text in a manner that reflects human comprehension of their meaning, intent and emotional context. NLP models can be observed in applications such as digital personal assistants (Siri or Alexa), chatbots such as ChatGPT, machine translation such as Google Translate and healthcare analytics models such as IBM Watson.

Time Series analysis involves training machine learning models on the changes in data values over time, so that the models can extrapolate from past observations and produce evidence based predictions about the future patterns that the data will follow. Time Series machine learning models can be observed in weather forecasts, traffic forecasts, bioinformatics in Healthcare, demand and sales forecasting and stock market forecasting.

__(2) Introduction To Machine Learning (ML)__

Different types of Machine Learning models include:

Neural Networks

Regression Analysis Models

Cluster Analysis Models

Gaussian processes

Decision Trees/Random Forests

Support Vector Machines

The different types of Machine Learning mathematical algorithms employed in the above models are used to complete three main tasks which include:

1. Classification of data patterns and sequences.

2. Function approximation and Regression analysis.

3. Data Processing through filtering and clustering.

Classification algorithms include Linear Classifiers, Naive Bayes, Logistic Regression and K Nearest Neighbours.

Regression algorithms include Linear Regression, Lasso Regression, Multivariate Regression and Logistic Regression (which, despite its name, predicts class probabilities and is mostly used for classification).

Clustering algorithms include K Means Clustering, Fuzzy C-Means Clustering, Expectation Maximisation and Hierarchical Clustering.

Machine learning uses statistics, probability and calculus to develop these models and their algorithms, through the primary data training methods of Supervised Learning and Unsupervised Learning or the trial and error training method of Reinforcement Learning. Throughout the learning process the models update the parameters within their algorithms and store those changes in their hardware or software memory. The ultimate goal of altering and improving the parameters is to maximise the accuracy of the model's data processing.

Machine Learning developers train Supervised or Unsupervised models by feeding datasets into the algorithms. In Supervised Learning the datasets contain inputs paired with desired outputs (the outputs are known as labels or supervisory signals). The purpose of using Supervised Learning is to make the algorithm learn the association between the inputs and their labels. For example, when the trained algorithm is asked to label novel but similar data that was not part of its training dataset, it should do so with a high success rate.
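As a minimal sketch of this idea, the Python example below "trains" a nearest neighbour classifier (the K Nearest Neighbours approach with K = 1) simply by storing labelled inputs, then labels novel inputs by their closest training example. The dataset, its measurements and the label names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy dataset: (height_cm, weight_kg) inputs paired with labels.
training_inputs = [(150.0, 50.0), (160.0, 55.0), (180.0, 85.0), (190.0, 95.0)]
training_labels = ["small", "small", "large", "large"]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(novel_input):
    """Return the label of the training input closest to novel_input."""
    distances = [squared_distance(novel_input, x) for x in training_inputs]
    nearest_index = distances.index(min(distances))
    return training_labels[nearest_index]


# Novel inputs that were not part of the training dataset.
print(predict((155.0, 52.0)))
print(predict((185.0, 90.0)))
```

The model never sees the novel inputs during training; it labels them only through the input to label association stored from the training dataset.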

In Unsupervised Learning the algorithm is trained with a dataset that contains only inputs and no outputs or labels. Unsupervised Learning involves getting the algorithm to automatically identify different patterns and content within the dataset and then cluster the dataset inputs according to their similarities. This means the algorithm itself effectively labels the dataset, according to the differences that it detects between the patterns and other variables within the dataset. For example, this allows the algorithm to assign novel but similar data, not within the training dataset, to the cluster that it most resembles.
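A minimal sketch of clustering, assuming one-dimensional data and K Means Clustering with two hand-picked starting centres (both assumptions made for illustration), could look like this in Python:

```python
def assign(value, centres):
    """Index of the centre nearest to value."""
    return min(range(len(centres)), key=lambda i: abs(value - centres[i]))


def k_means_1d(values, centres, iterations=10):
    """Cluster 1-D values around the given starting centres (K Means)."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        clusters = [[] for _ in centres]
        for value in values:
            clusters[assign(value, centres)].append(value)
        # Update step: move each centre to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centres[i] = sum(members) / len(members)
    return centres, clusters


# Unlabelled inputs: no desired outputs are supplied.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centres, clusters = k_means_1d(values, centres=[0.0, 5.0])
print(centres, clusters)

# A novel value is assigned to whichever learned cluster it is closest to.
print(assign(8.0, centres))
```

No labels are given at any point; the two groups emerge only from the distances between the inputs.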

Reinforcement Learning is another method of training algorithms, but it does not involve feeding labelled datasets into an algorithm. It is used when the correct output for each input cannot be specified in advance. Instead, the algorithm learns through trial and error which action to take in an environment. Correct actions are reinforced with a reward signal, which establishes the desired response to a given task.
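The trial and error principle can be sketched very roughly in Python. The environment, its three actions and their reward values below are all hypothetical, and the sketch leaves out most of what practical Reinforcement Learning involves (environment states, delayed rewards, balancing exploration against exploitation).

```python
# Hypothetical environment: three possible actions with fixed rewards that
# the learner cannot see in advance.
HIDDEN_REWARDS = {"left": 0.0, "stay": 0.2, "right": 1.0}


def environment(action):
    """Return the reward that the environment gives for an action."""
    return HIDDEN_REWARDS[action]


def learn_by_trial_and_error(actions, rounds=3):
    """Try every action repeatedly and keep its average observed reward."""
    totals = {action: 0.0 for action in actions}
    counts = {action: 0 for action in actions}
    for _ in range(rounds):
        for action in actions:
            totals[action] += environment(action)   # the reward reinforces the action
            counts[action] += 1
    return {action: totals[action] / counts[action] for action in actions}


estimates = learn_by_trial_and_error(["left", "stay", "right"])
best_action = max(estimates, key=estimates.get)
print(best_action)
```

The learner is never told which action is correct; it discovers the preferred action only from the rewards that its own attempts produce.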

Before the Machine Learning process can take place, the datasets need to be translated into numerical values according to numerical scales that create a computer readable map of the dataset and its labels. For example, datasets containing images or video are mapped into 24-bit pixel colour values, and datasets containing books or articles can be mapped by assigning a number to each word or character. The outputs of the mathematical algorithms in each hidden layer neuron can also be mapped by the dataset numerical scale. Numerical translation enables developers to (1) run the dataset through the model's mathematical algorithms to obtain the model's predictive output, (2) calculate the error between the ground truth labels (the output according to the dataset) and the predictive training output produced by the model's algorithms, and (3) adjust the deficient parameters within the algorithms to reduce the calculation error present in the model's output.
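As a rough illustration of this numerical translation, the Python sketch below packs one pixel's red, green and blue channels into a single 24-bit number and maps the words of a sentence to integers. The function names and the word numbering scheme (order of first appearance) are assumptions made for this example; real pipelines use a variety of encodings.

```python
def pack_rgb(red, green, blue):
    """Pack three 8-bit colour channels into one 24-bit pixel number."""
    return (red << 16) | (green << 8) | blue


def unpack_rgb(pixel):
    """Recover the three 8-bit channels from a 24-bit pixel number."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


def build_vocabulary(words):
    """Map each distinct word to an integer, in order of first appearance."""
    vocabulary = {}
    for word in words:
        if word not in vocabulary:
            vocabulary[word] = len(vocabulary)
    return vocabulary


orange_pixel = pack_rgb(255, 128, 0)
print(orange_pixel, unpack_rgb(orange_pixel))

sentence = "the cat sat on the mat".split()
vocabulary = build_vocabulary(sentence)
encoded = [vocabulary[word] for word in sentence]
print(encoded)
```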

__(3) Introduction To Neural Networks (NN)__

Neural Networks are the Machine Learning models that have become the most advanced field of AI in terms of their applicational use in Healthcare, Technology, Business, Manufacturing, Scientific Research and other spheres. Neural Networks are composed of a series of connected artificial neurons and layers similar to the layout of neurons in animal and human brains. The Neural Networks can consist of these artificial neuron layers, interconnective artificial synapses, neuron weights, biases, and mathematical functions.

Different types of Neural Networks include:

Perceptron

Feed Forward Neural Network

Multilayer Perceptron

Convolutional Neural Network

Radial Basis Function Neural Network

Generative Adversarial Network

Recurrent Neural Network

LSTM - Long Short-Term Memory Network

Sequence to Sequence Models

Modular Neural Network

To repeat, these ML Neural Network types are differentiated according to network structure and according to which machine learning mathematical algorithms are used within the networks. Both affect network calculatory potential and therefore network purpose. The possible variables in network structure and algorithms that separate and define neural network model type include:

(1) The directional capabilities of data in Neural Networks.

(2) The number/type of neuron layers in the networks.

(3) The amount/type of neurons in the neuron layers.

(4) The sequences in which different layers are placed.

(5) The summation functions, activation functions, filter functions being utilised in each neuron layer and output layer neurons.

Neural Networks can be separated into different sections consisting of the visible input layer, the layers of artificial neurons and the visible output layer. A basic neural network consists of 3 layers: the single visible input layer, 1 hidden layer of neurons and the single visible output layer. A Deep Learning neural network will typically contain between 4 and 150 layers, which include the single visible input layer, multiple hidden neuron layers and the single visible output layer. The term "hidden" describes the multiple central layers that distill the datasets; their outputs are intermediate values that appear neither in the network's visible inputs nor in its final visible outputs.

The visible input layer is where the unprocessed dataset is first entered into the neural network. This unprocessed dataset then becomes the inputs for the first hidden layer of neurons which will begin to differentiate, categorise and describe the data. The dataset then cycles through the multiple hidden layers of neurons that consist of divided layers of the artificial neurons electronically connected through the artificial synapses. Each layer of neurons performs layer specific analysis and transformations of the data outputs from the immediately previous layer.

This means that each neuron layer within a neural network is defined by an individualised data processing task: the particular uniform mathematical functions that the neurons within that layer perform on that layer's data inputs. The combined result of the successive layer transformations then flows from the last hidden layer of neurons to the visible output layer, which performs the last mathematical function and produces the neural network's final output, its intended predictive analysis of the dataset inputs. The final output production can be followed by visible output error testing, which in turn enables network optimisation through backpropagation.

A hidden layer neuron normally takes input vector values (from the previous layer's neurons), which are entered into a summation function along with synaptic weight values and a bias weight value. The result of this summation function is then fed into an activation function, which produces the neuron's output vector value, and this in turn is fed into the next layer of neurons. The diagram below is an illustration of the algorithm processing within a neuron.

X = Input Vector values, W = Synapse Weight values, W4 = Bias Neuron Weight, X4 = Input vector from Bias Neuron, T = The Summation Function, H = The Activation Function, Z = The activated output vectors produced by the neuron.
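Using the symbols from the legend above, the processing inside one neuron can be sketched in Python as follows. The numeric values are hypothetical, and the Logistic Sigmoid is assumed as the activation function H purely for illustration.

```python
import math


def summation(inputs, weights, bias_input, bias_weight):
    """T: weighted sum of the inputs plus the bias term X4 * W4."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias_input * bias_weight


def activation(t):
    """H: Logistic Sigmoid, which squashes T into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


X = [0.5, -1.0, 2.0]   # hypothetical input vector values X1..X3
W = [0.4, 0.3, 0.1]    # hypothetical synapse weight values W1..W3
X4, W4 = 1.0, 0.1      # bias neuron input (always 1) and bias neuron weight

T = summation(X, W, X4, W4)
Z = activation(T)      # the activated output passed to the next layer
print(T, Z)
```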

Within the neural network, an individual neuron's output vector value represents how the outputs from the preceding layer of neurons have answered the dataset processing question posed by that individual neuron. For example, a neuron's output can be binary, where 0 = no and 1 = yes to a question that further classifies and filters the original visible input layer dataset into a new refinement, which then becomes the input for the next hidden layer of neurons to refine further. The matching of the mathematical algorithms used within neurons to precise analytical purposes is informed by biological neuroscience, but also by mathematical and scientific research that maximises statistical efficiency. Examples of how binary numerical neuron outputs could correspond to dataset distillation can be viewed here: __https://www.ibm.com/topics/neural-networks__.

With regard to the direction that data can follow within Neural Networks, there are three main possibilities: Forward Propagation, Backpropagation and Non-Linear data cycling. Forward Propagation directs dataset inputs forward through the Neural Network layers, from visible input layer to visible output layer, to obtain predictive output calculations. Backpropagation passes the error measured at the visible output layer backwards through the layers, towards the visible input layer, in order to adjust the weights and biases, reduce error and optimise the network. The Non-Linear motion of data through Deep Learning Neural Network layers, such as the feedback loops of Recurrent Neural Networks, is carried out to replicate the structural activity and therefore calculative capability of human neurons.

The most used type of neuron layer in Neural Networks is the feed forward fully connected neuron layer. Each neuron in such a layer receives an input from every neuron output in the previous layer, and its own output in turn becomes an input for every neuron in the next layer. Convolutional Neural Network layers are the second most prominent type of neuron layer, but this neural network type will be discussed in the next feature article detailing types of Deep Learning models. Modular Neural Network layers and Recurrent Neural Network layers are two other types of neural network layer that stand alone with regard to their design.
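A fully connected layer can be sketched in a few lines of Python. This is an illustrative sketch only: the weight values are hypothetical, the Logistic Sigmoid is assumed as the activation function, and `weights[j]` is taken to hold the weights flowing into neuron j of the layer.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fully_connected_layer(inputs, weights, bias_weight):
    """Forward pass through one fully connected layer.

    weights[j][i] is the weight on the connection from previous-layer
    neuron i to neuron j of this layer; bias_weight is shared by the layer.
    """
    outputs = []
    for neuron_weights in weights:
        # Every neuron receives every output of the previous layer.
        t = sum(x * w for x, w in zip(inputs, neuron_weights)) + 1.0 * bias_weight
        outputs.append(sigmoid(t))
    return outputs


previous_outputs = [1.0, 0.0, -1.0]
layer_weights = [
    [0.5, 0.5, 0.5],   # weights into neuron 1
    [1.0, 0.0, 1.0],   # weights into neuron 2
    [2.0, 0.0, 0.0],   # weights into neuron 3
]
layer_outputs = fully_connected_layer(previous_outputs, layer_weights, bias_weight=0.0)
print(layer_outputs)
```

The returned list would then serve as the input list for the next fully connected layer.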

In order to illustrate in more detail how a Neural Network operates we will observe the mechanical structure and operation of the Multilayer Perceptron which is a Deep Learning Neural Network.

__(4) The Multilayer Perceptron__

The Multilayer Perceptron consists of a fully connected Deep Learning network that is capable of both Forward Propagation and Backpropagation, which enables developers to review and optimise the parameters of the network's algorithms. You can observe an illustration of a Multilayer Perceptron below.

I1,I2,I3 = The data input points in the input layer.

A1,A2,A3 = The neurons of Layer A, which is the first hidden layer of neurons.

E1,E2,E3 = The neurons of Layer E, which is the second hidden layer of neurons.

O1,O2,O3 = The output neurons of the output layer, which produce the MLP's final predictive analysis of the dataset inputs.

X Values = The output vector values produced by neurons or input layer data entry points.

W values = The synaptic weight values of connections between neuron to neuron or data input points to layer A neurons.

B1 = The Bias neuron for the entire network. This vector value is always 1.

B1X values = The B1 output vector values that are fed into the distinct neuron layers.

B1W values = The synaptic weight values of connections between the B1 neuron and distinct neuron layers.

* Notice the difference between number 0 and letter O for the Output Layer neurons O1,O2,O3.*

In order to be as concise but informed as possible we will follow the top track (I1,A1,E1,O1) in the forward propagation operation, culminating in the O1 neuron's predictive output. We will then conduct the Backpropagation/Optimisation from the output layer to the input layer. The progressive steps of this Multilayer Perceptron include:

__(5) The Multilayer Perceptron Steps__


__Step 1: Calculate the Summation Function of the A1 neuron.__

The Summation/Transfer Function for each neuron is used to obtain the value needed to calculate the Activation Function, which in turn produces an individual neuron's final output. The Summation function for individual neurons in Neural Networks is usually a linear operation that expresses the data in a linear format. The Summation function that is chosen for a neuron depends upon the data input requirements of that same neuron's activation function and also the overall data analysis trajectory of the neural network. The most used Summation functions in fully connected forward propagation Neural Networks are weighted sums that take the form of linear regression algorithms. These linear regression algorithms usually require three types of values in varying quantities: (1) previous neuron output vectors, (2) synaptic weight values, (3) bias weight values.

A basic example of a Linear Regression Summation Function would be:

$$T = X(W) + B$$

Where T is the summation function output, X is an input vector value, W is a weight value and B is the bias value.

However, an example of a Multiple Linear Regression algorithm being used as a Summation Function in one individual neuron, where the previous layer contained 3 neurons (or 3 input layer dataset entry points), would be the following:

$$T = X1(W1) + X2(W2) + X3(W3) + B$$

Where B = X4(W4)

Where T is the Summation Function output, X1/X2/X3 are input vectors from 3 neurons (or input layer points) in the previous layer, W1/W2/W3 are the synaptic weight values between the 3 neurons (or input layer points) and the present neuron, and B (assumed value of X4(W4)) is the bias value. When the Linear Regression algorithm output T is calculated, the resulting T value is used in that neuron's Activation Function H to obtain an output for that neuron.

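Therefore our Summation Function for the A1 neuron will be the following:

$$\mathrm{A1T} = \mathrm{I1X1}(\mathrm{I1W1}) + \mathrm{I2X1}(\mathrm{I2W1}) + \mathrm{I3X1}(\mathrm{I3W1}) + \mathrm{B1X1}(\mathrm{B1W1})$$

Here the numeral after X or W is read as identifying the receiving neuron or layer, so I1X1 and I1W1 are the output vector and synaptic weight on the connection from I1 to A1, and B1X1(B1W1) is the bias value for Layer A.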

A numerical weight value is assigned to each synaptic neuron connection in a neural network. This means that if we take one neuron in the first of two consecutive layers of neurons, that individual neuron will have a different weight value on its connection to each neuron in the next layer. In simpler terms, the weight values measure the strength of the synaptic connections between every pair of neurons in two consecutive layers. Analysing the strength of these synaptic connections reveals which previous layer neuron outputs are most influential in producing an accurate output from a given next layer neuron. These weight values can be fixed throughout the network training process, but preferably they are randomised when first designing the network, with the final weight values being calculated through the iterative Backpropagation process in order to reduce error in the neural network's predictive analysis capability.

Bias values are additional weights that give the network more freedom to shift the output values calculated inside each neuron. A neural network can employ a singular bias neuron, located outside the main network, whose output vector (B1X) is always equal to 1, while the bias neuron's individual synaptic weight connections to the various neuron layers can take arbitrary values. Each neuron layer has one attributed bias weight value which is applied to all neurons in that layer. These bias neuron weights are multiplied by the B1X output vector value of 1, thus creating the bias values for the neuron layer mathematical functions. Without the bias values, the adjustment capability of the network's analysis is reduced, as the graph of the summation function against its inputs would always pass through the origin (0,0). When the negative of the bias weight value is used in neuron functions, it is referred to as the Threshold. Along with the regular neuron weight calculations, Backpropagation functions assist in determining the optimum bias neuron weight values that will contribute to decreasing the neural network Error function output.
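The effect of the bias value can be seen in a short Python sketch. The single input, the weight of 2.0 and the bias weight of 0.5 are hypothetical values chosen for illustration.

```python
def summation(x, weight, bias_weight=0.0, bias_input=1.0):
    """Weighted input plus the bias value (bias_input is always 1)."""
    return x * weight + bias_input * bias_weight


inputs = (-1.0, 0.0, 1.0)
weight = 2.0
bias_weight = 0.5

# Without a bias the summation is forced through the origin: T = 0 when x = 0.
without_bias = [summation(x, weight) for x in inputs]

# With a bias the whole line is shifted up by the bias value.
with_bias = [summation(x, weight, bias_weight) for x in inputs]
print(without_bias, with_bias)

# The Threshold is the negative of the bias weight: asking whether T >= 0
# is the same as asking whether the weighted input reaches the Threshold.
threshold = -bias_weight
same_decision = all(
    (summation(x, weight, bias_weight) >= 0.0) == (x * weight >= threshold)
    for x in inputs
)
print(same_decision)
```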

__Step 2: Calculate the Activation Function of the A1 Neuron.__

Activation Functions are usually preceded by the Summation functions and are the second and last set of mathematical functions applied to data values within individual neurons in order to obtain the output produced by those neurons. There are a number of different types of activation functions, both linear and non-linear, that can be applied in neuron layers. Non-linear activation functions can re-express the data in a multi-dimensional format which is more effective for recognising patterns and complexity within datasets. The type of Activation function chosen for a layer of neurons depends on what data transformations need to be performed on that neuron layer's inputs in order to further the neural network's overarching data analysis aim. Depending on the function chosen, the outputs can be binary (either 1 or 0), can be confined to a fixed interval, or can range from -∞ to ∞.

A basic example of a non-linear Activation function would be a Logistic Sigmoid Activation function:

$$H(T) = \frac{1}{1 + E^{-T}}$$

Where H is the activation function, T is the summation function output and E is the constant 2.71828 (Euler's number, rounded to five decimal places).

Therefore the Activation Function for the A1 neuron will be:

$$\mathrm{A1H}(\mathrm{A1T}) = \frac{1}{1 + E^{-\mathrm{A1T}}}$$

Where A1H is the activation function of the A1 neuron, A1T is the summation function output and E is the constant 2.71828.

When we have calculated the Activation Function A1H(A1T), the resulting value becomes the output vector values of the A1 neuron (A1X1, A1X2, A1X3), which are forwarded to the Layer E neurons of the MLP to serve as their input vectors.

The process that we have gone through in the above for Step 1 and Step 2 can be repeated from Step 3 to and including Step 6. However as previously mentioned hypothetically there may be different Summation or Activation Functions used in the Layer E neurons or the Output layer neurons as each individual layer of neural networks perform particular mathematical algorithm transformations on the dataset in order to obtain the combined predictive analysis of a neural network. The extent of the mathematical process involved in solving the different Summation Functions or Activation Functions available does not differ extensively in magnitude. Different Activation Functions could include;

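The ReLU Activation Function:

$$H(T) = \max(0, T)$$

The Swish Activation Function (in its basic form, where T is multiplied by its own Logistic Sigmoid output):

$$H(T) = \frac{T}{1 + E^{-T}}$$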
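For comparison, the Logistic Sigmoid, ReLU and Swish functions can be written as short Python functions. This is a sketch using the standard definitions; the sample T values are arbitrary.

```python
import math


def sigmoid(t):
    """Logistic Sigmoid: output always lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-t))


def relu(t):
    """ReLU: passes positive values through and replaces negatives with 0."""
    return max(0.0, t)


def swish(t):
    """Swish (basic form): T multiplied by its own sigmoid output."""
    return t * sigmoid(t)


for t in (-2.0, 0.0, 2.0):
    print(t, sigmoid(t), relu(t), swish(t))
```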

We will presume, for the sake of concisely demonstrating the full forward propagation process of the MLP, that the Summation Functions and Activation Functions remain the same throughout our Multilayer Perceptron, including for the MLP's final outputs from the Output Layer.

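__Step 3: Calculate the Summation Function of the E1 neuron.__

Following the pattern of Step 1, with the Layer A outputs as input vectors and B1X2(B1W2) as the bias value for Layer E:

$$\mathrm{E1T} = \mathrm{A1X1}(\mathrm{A1W1}) + \mathrm{A2X1}(\mathrm{A2W1}) + \mathrm{A3X1}(\mathrm{A3W1}) + \mathrm{B1X2}(\mathrm{B1W2})$$

__Step 4: Calculate the Activation Function of the E1 neuron.__

$$\mathrm{E1H}(\mathrm{E1T}) = \frac{1}{1 + E^{-\mathrm{E1T}}}$$

__Step 5: Calculate the Summation Function of the O1 neuron.__

With the Layer E outputs as input vectors and B1X3(B1W3) as the bias value for the Output Layer:

$$\mathrm{O1T} = \mathrm{E1X1}(\mathrm{E1W1}) + \mathrm{E2X1}(\mathrm{E2W1}) + \mathrm{E3X1}(\mathrm{E3W1}) + \mathrm{B1X3}(\mathrm{B1W3})$$

__Step 6: Calculate the Activation Function (MLP output) of the O1 neuron.__

$$\mathrm{O1H}(\mathrm{O1T}) = \frac{1}{1 + E^{-\mathrm{O1T}}}$$

The resulting value is O1X, the predictive output of the O1 neuron.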
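Steps 1 to 6 can be sketched as one forward propagation routine in Python. The weight values, the dictionary keys and the function names below are hypothetical and chosen only for illustration; the sketch assumes the same weighted sum and Logistic Sigmoid in every layer, with one bias weight per layer multiplied by the bias neuron output of 1.

```python
import math


def sigmoid(t):
    """Logistic Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-t))


def layer_forward(inputs, weights, bias_weight):
    """Summation then activation for every neuron in one layer.

    weights[i][j] is the weight from neuron i of the previous layer to
    neuron j of this layer. The bias neuron output B1X is always 1.
    """
    outputs = []
    for j in range(len(weights[0])):
        t = sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        t += 1.0 * bias_weight
        outputs.append(sigmoid(t))
    return outputs


def forward_propagation(inputs, network):
    """Input layer -> Layer A -> Layer E -> Output layer."""
    layer_a = layer_forward(inputs, network["input_to_a"], network["b1w1"])
    layer_e = layer_forward(layer_a, network["a_to_e"], network["b1w2"])
    layer_o = layer_forward(layer_e, network["e_to_o"], network["b1w3"])
    return layer_a, layer_e, layer_o


# Hypothetical weights, chosen only for illustration.
network = {
    "input_to_a": [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2], [-0.4, 0.1, 0.3]],
    "a_to_e": [[0.1, 0.3, -0.5], [0.2, -0.4, 0.6], [0.7, 0.1, -0.2]],
    "e_to_o": [[-0.3, 0.2, 0.5], [0.4, -0.6, 0.1], [0.2, 0.3, -0.1]],
    "b1w1": 0.1,
    "b1w2": -0.2,
    "b1w3": 0.05,
}
inputs = [0.5, 0.1, -0.3]   # I1, I2, I3
layer_a, layer_e, layer_o = forward_propagation(inputs, network)
print("O1X =", layer_o[0])
```

The first element of each returned list corresponds to the top track (A1, E1, O1) followed in the steps above.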

__Step 7: Calculate the Cost Function (Mean Squared Error Function) to obtain the MLP error value.__

The Cost function is calculated to determine the average MLP network error, which is the numerical difference between (1) the Neural Network's predicted visible output results after forward propagation of the data inputs through the network's layers and (2) what the predicted output values are supposed to be, as detailed in the training dataset labels (Ground Truth labels). The end goal of this mathematical algorithm is to reduce this discrepancy in order to design a Neural Network that is as accurate as possible in its predictive extrapolation ability. Cost functions calculate the average error across the observations being evaluated, whereas Loss functions calculate the error for a single observation. The result of this Cost Function algorithm is used in the Backpropagation optimisation algorithms. There are a number of different Cost functions that can be used to calculate average error in neural networks.

We will be using the Mean Squared Error function which can be observed below:

$$V = \frac{1}{N} \sum (R1 - R2)^2$$

Where V is the network error, N is the number of observations, Σ = a symbol for the sum, R1 are the actual dataset outputs or ground truth labels, R2 are the observations implied by predictive outputs from the MLP output layer. O1X = R2a, O2X = R2b, O3X = R2c.

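In order to calculate the Cost Function we must first separate the different observations and subtract the MLP predictive outputs from the actual dataset outputs.

$$(\mathrm{R1a} - \mathrm{R2a}), \quad (\mathrm{R1b} - \mathrm{R2b}), \quad (\mathrm{R1c} - \mathrm{R2c})$$

We must then square the results of these subtractions while still keeping the different observations separate.

$$(\mathrm{R1a} - \mathrm{R2a})^2, \quad (\mathrm{R1b} - \mathrm{R2b})^2, \quad (\mathrm{R1c} - \mathrm{R2c})^2$$

The next step of the Cost Function involves adding the results from the above squaring and then dividing the result of that sum by the number of observations produced by the output layer, which is N = 3 in this MLP.

$$V = \frac{(\mathrm{R1a} - \mathrm{R2a})^2 + (\mathrm{R1b} - \mathrm{R2b})^2 + (\mathrm{R1c} - \mathrm{R2c})^2}{3}$$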

The result from the above formula will produce a value for V which is the result of the Cost function of the MLP. The value for V is a numerical representation of the error present in the predictive outputs of the MLP algorithms relative to the Ground Truth labels.
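The Mean Squared Error calculation can be sketched in Python as follows. The ground truth labels and predicted outputs are hypothetical values chosen for illustration.

```python
def mean_squared_error(ground_truth, predictions):
    """V: the average of the squared differences between R1 and R2."""
    # Subtract each predictive output from its ground truth label.
    differences = [r1 - r2 for r1, r2 in zip(ground_truth, predictions)]
    # Square each difference, keeping the observations separate.
    squares = [d ** 2 for d in differences]
    # Add the squares and divide by the number of observations N.
    return sum(squares) / len(squares)


R1 = [1.0, 0.0, 0.0]   # hypothetical ground truth labels
R2 = [0.8, 0.3, 0.1]   # hypothetical outputs O1X, O2X, O3X
V = mean_squared_error(R1, R2)
print(V)
```

Here the squared differences are approximately 0.04, 0.09 and 0.01, so V is approximately 0.14 / 3.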

__Step 8: Begin Backpropagation: Calculate the Partial Derivatives of the MLP error value with respect to the weights and bias values.__

To repeat, Backpropagation is the neural network optimisation process and is carried out to reduce the error present in the neural network's ability to accurately interpret data. Over repeated cycles, this process automatically and gradually modifies the individual neuron weight and bias values and therefore their effect on neuron algorithm calculations. Backpropagation begins by using the Calculus Chain Rule to find the partial derivative of the Error function result with respect to a particular network weight or bias. This operation is carried out backwards through the network in order to determine the numerical effect, and therefore significance, that a singular weight or bias value exerts upon the prediction error of the network. When the partial derivative of the MLP error (Cost Function output V) with respect to a weight or bias has been calculated through the chain rule, the result is a numerical representation of the influence that the weight or bias in question has on the network error value.

Again for the purpose of being concise, due to the extent of the MLP model, we will only demonstrate the process of calculating the routes from the MLP output error back to one weight, as opposed to all the different weights/biases individually. We will use a singular example of I1W1 as the intended weight for the partial derivative function in order to examine the effect I1W1 has on the MLP error. If you observe the MLP diagram there are 9 possible routes from the outputs to I1W1, which means we will need to include all these routes in the function. This component operation would normally be conducted for every weight and bias value in networks performing backpropagation optimisation.

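The partial derivative function can be observed below:

$$\frac{\partial V}{\partial \mathrm{I1W1}} = \sum_{j=1}^{3} \sum_{k=1}^{3} \frac{\partial V}{\partial \mathrm{O}j\mathrm{X}} \cdot \frac{\partial \mathrm{O}j\mathrm{X}}{\partial \mathrm{O}j\mathrm{T}} \cdot \frac{\partial \mathrm{O}j\mathrm{T}}{\partial \mathrm{E}k\mathrm{X}} \cdot \frac{\partial \mathrm{E}k\mathrm{X}}{\partial \mathrm{E}k\mathrm{T}} \cdot \frac{\partial \mathrm{E}k\mathrm{T}}{\partial \mathrm{A1X}} \cdot \frac{\partial \mathrm{A1X}}{\partial \mathrm{A1T}} \cdot \frac{\partial \mathrm{A1T}}{\partial \mathrm{I1W1}}$$

Each of the nine (j, k) pairs is one route from an output neuron Oj, through a Layer E neuron Ek and the A1 neuron, back to I1W1, and the nine route terms are added together. For example, the first term (j = 1, k = 1) is the top track:

$$\frac{\partial V}{\partial \mathrm{O1X}} \cdot \frac{\partial \mathrm{O1X}}{\partial \mathrm{O1T}} \cdot \frac{\partial \mathrm{O1T}}{\partial \mathrm{E1X}} \cdot \frac{\partial \mathrm{E1X}}{\partial \mathrm{E1T}} \cdot \frac{\partial \mathrm{E1T}}{\partial \mathrm{A1X}} \cdot \frac{\partial \mathrm{A1X}}{\partial \mathrm{A1T}} \cdot \frac{\partial \mathrm{A1T}}{\partial \mathrm{I1W1}}$$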
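The sum over the nine routes can be sketched in Python and checked against a numerical estimate. This is an illustrative sketch under stated assumptions: a 3-3-3-3 network using a weighted sum and the Logistic Sigmoid in every layer, the Mean Squared Error as the Cost function, I1W1 taken to be the weight from I1 to A1, and hypothetical weight, input and label values.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def forward(x, w_ia, w_ae, w_eo, biases):
    """w[i][j] is the weight from neuron i of one layer to neuron j of the next."""
    def layer(inputs, w, b):
        return [sigmoid(sum(inputs[i] * w[i][j] for i in range(3)) + b)
                for j in range(3)]
    a = layer(x, w_ia, biases[0])
    e = layer(a, w_ae, biases[1])
    o = layer(e, w_eo, biases[2])
    return a, e, o


def cost(o, labels):
    """Mean Squared Error between the labels and the outputs."""
    return sum((r1 - r2) ** 2 for r1, r2 in zip(labels, o)) / len(o)


def gradient_i1w1(x, labels, w_ia, w_ae, w_eo, biases):
    """Sum the nine route products from the outputs back to I1W1."""
    a, e, o = forward(x, w_ia, w_ae, w_eo, biases)
    total = 0.0
    for j in range(3):          # output neuron O(j+1)
        for k in range(3):      # Layer E neuron E(k+1)
            dv_do = -2.0 * (labels[j] - o[j]) / 3.0   # dV / dOjX
            do_dt = o[j] * (1.0 - o[j])               # sigmoid derivative
            dt_de = w_eo[k][j]                        # dOjT / dEkX
            de_dt = e[k] * (1.0 - e[k])               # sigmoid derivative
            dt_da = w_ae[0][k]                        # dEkT / dA1X
            da_dt = a[0] * (1.0 - a[0])               # sigmoid derivative
            dt_dw = x[0]                              # dA1T / dI1W1
            total += dv_do * do_dt * dt_de * de_dt * dt_da * da_dt * dt_dw
    return total


def numerical_gradient_i1w1(x, labels, w_ia, w_ae, w_eo, biases, h=1e-5):
    """Central finite difference on I1W1, used as an independent check."""
    plus = [row[:] for row in w_ia]
    minus = [row[:] for row in w_ia]
    plus[0][0] += h
    minus[0][0] -= h
    v_plus = cost(forward(x, plus, w_ae, w_eo, biases)[2], labels)
    v_minus = cost(forward(x, minus, w_ae, w_eo, biases)[2], labels)
    return (v_plus - v_minus) / (2.0 * h)


# Hypothetical inputs, labels and weights.
x = [0.5, 0.1, -0.3]
labels = [1.0, 0.0, 0.0]
w_ia = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2], [-0.4, 0.1, 0.3]]
w_ae = [[0.1, 0.3, -0.5], [0.2, -0.4, 0.6], [0.7, 0.1, -0.2]]
w_eo = [[-0.3, 0.2, 0.5], [0.4, -0.6, 0.1], [0.2, 0.3, -0.1]]
biases = [0.1, -0.2, 0.05]

analytic = gradient_i1w1(x, labels, w_ia, w_ae, w_eo, biases)
numeric = numerical_gradient_i1w1(x, labels, w_ia, w_ae, w_eo, biases)
print(analytic, numeric)
```

The two printed values should agree closely, which suggests that the nine route products together account for the whole influence of I1W1 on the error.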

__Step 9: Calculate the Gradient Descent Function values for the new weights and bias values.__

The result of the partial derivative function is then used in the Stochastic Gradient Descent Function (there are different gradient based optimisation algorithms that can be used), which takes the error effect value that the partial derivative function produced for a weight or bias and recalculates a new value for that weight or bias. The new weight/bias value is then reinserted into a repeat forward propagation of the neural network in order to improve the mathematical algorithm calculations within that neuron layer, thus improving the accuracy of the overall network output. The learning rate in the formula is a configurable value normally between 0.00 and 1.00.

The Gradient Descent Function can be observed below:

$$\mathrm{I1W1b} = \mathrm{I1W1} - l \cdot \frac{\partial V}{\partial \mathrm{I1W1}}$$

Where I1W1b = the new weight value, I1W1 = the old weight value, l = the learning rate which is multiplied by the partial derivative function output.
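The Gradient Descent Function can be sketched in Python as a single update step. The starting weight, partial derivative value and learning rate are hypothetical, and the second half uses a simple stand-in error function, V(w) = (w - 3)^2, only to show repeated updates moving a weight towards the value with the lowest error.

```python
def gradient_descent_step(old_weight, partial_derivative, learning_rate):
    """New weight = old weight - learning rate * partial derivative."""
    return old_weight - learning_rate * partial_derivative


# One update with hypothetical values.
I1W1 = 0.2
I1W1b = gradient_descent_step(I1W1, partial_derivative=0.5, learning_rate=0.1)
print(I1W1b)

# Repeated updates on a stand-in error function V(w) = (w - 3)^2,
# whose derivative is 2 * (w - 3) and whose minimum is at w = 3.
w = 0.0
for _ in range(50):
    w = gradient_descent_step(w, 2.0 * (w - 3.0), learning_rate=0.1)
print(w)
```

A positive partial derivative lowers the weight and a negative one raises it, so each step moves the weight in the direction that reduces the error.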

__Step 10: After optimising the MLP network by readjusting the parameters within the neuron algorithms, restart a forward propagation cycle of the MLP.__

__(6) Final Review__

Forward Propagation in the Machine Learning process consists of neural networks processing different types of datasets that have been re-expressed as sets of numbers according to numerical scales that map and translate the datasets. The neural networks use different formations, and varying mathematical algorithms within those formations, in order to automate the data examination required for the networks to comprehend and identify that data. The numerically translated datasets enter through the input layer and are subject to different mathematical operations conducted in each separate layer of neurons. These mathematical operations, which can also be mapped out by the dataset numerical scale, progressively classify, filter and dissect the datasets. The final output layer contains the last mathematical operations, which produce the neural network's predictive computation of the dataset.

Backpropagation in the Machine Learning process involves calculating the error present in the neural network's dataset extrapolation and then readjusting the network's mathematical algorithms to remove as much of that calculation error as possible. The process of backpropagation is conducted iteratively by determining the effect of a particular weight or bias on network error through the partial derivative function and then readjusting that weight or bias value with the Gradient Descent function. After a cycle of Backpropagation is completed, Forward Propagation is restarted, with this forward and backward process being repeated until the minimum possible amount of numerical error exists between the neural network's predictive computation output and the Ground Truth labels (actual outputs).
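The repeated forward and backward cycle can be sketched in Python. To stay short, the sketch assumes a single Logistic Sigmoid neuron with one weight and one bias weight rather than the full MLP, along with a hypothetical two-sample dataset, a learning rate of 0.5 and 1000 cycles.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def forward(x, weight, bias_weight):
    """Forward propagation through one neuron (bias neuron output is 1)."""
    return sigmoid(x * weight + 1.0 * bias_weight)


def cost(samples, weight, bias_weight):
    """Mean Squared Error over all (input, label) samples."""
    errors = [(label - forward(x, weight, bias_weight)) ** 2 for x, label in samples]
    return sum(errors) / len(errors)


def train(samples, learning_rate=0.5, cycles=1000):
    weight, bias_weight = 0.0, 0.0
    for _ in range(cycles):
        grad_w, grad_b = 0.0, 0.0
        for x, label in samples:
            output = forward(x, weight, bias_weight)
            # Chain rule: dV/dOutput * dOutput/dT, then dT/dW = x and dT/dB = 1.
            delta = -2.0 * (label - output) * output * (1.0 - output) / len(samples)
            grad_w += delta * x
            grad_b += delta * 1.0
        # Gradient descent: readjust the weight and bias, then repeat.
        weight -= learning_rate * grad_w
        bias_weight -= learning_rate * grad_b
    return weight, bias_weight


samples = [(0.0, 0.0), (1.0, 1.0)]   # hypothetical (input, ground truth label) pairs
initial_cost = cost(samples, 0.0, 0.0)
weight, bias_weight = train(samples)
final_cost = cost(samples, weight, bias_weight)
predictions = [forward(x, weight, bias_weight) for x, _ in samples]
print(initial_cost, final_cost, predictions)
```

In this sketch the cost after training is lower than the starting cost, and the two predictions move towards their labels of 0 and 1.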

The entirety of the Machine Learning process is carried out in order to train a neural network/ML model so that the network's algorithms accurately produce the correct predictive examinations of a training dataset where the correct examinations are already known. This training process is conducted so that when completely novel but similar datasets (not from the training dataset) that require a similar analytical resolution are input into the neural network, the network's algorithms have "learned" (through the above training process) how to automatically produce an accurate predictive interpretation of those novel data inputs. Considering present commercial and medical deployment, this Machine Learning process, and the automated functionality of the models it produces, is the reasoning behind describing these systems with the term Artificial Intelligence, where Artificial Intelligence equates to a human level ability to solve tasks.

