Friday, April 13, 2018

A Blazing Fast Introduction to Machine Learning

Introduction

In what follows I'm going to talk about Artificial Intelligence (AI), Machine Learning (ML) and Artificial Neural Networks (ANNs). More to the point, "What are they good for?" in practical terms. If you are wondering what you can use them for, the answer is "most everything". From light to radio, from radio to sound, from graphics to games, from design to medicine, the list goes on. Here are some working concepts.

Artificial neurons, the building blocks of ANNs, are an abstraction of biological neurons. The first thing we notice is that biological neurons use "many inputs to many outputs" connectivity. So in a mathematical sense they are not classic functions, because functions have only one output.
Anatomy of a Biological Neuron

Artificial neurons have a "many inputs to one output" connectivity. So they are functions. Functions can have many inputs, provided they only have one output.


Artificial Neuron aka 'Perceptron'

This apparent shortcoming is remedied by connecting the output of a single artificial neuron to the inputs of as many other artificial neurons as we want. This happens in subsequent or "hidden" layers, restoring their power by forking or duplicating their outputs. It seems forgivable to think of biological neurons as working this way too, though future research may reexamine this assumption. The Perceptron link records its invention in 1958, when it was envisioned as a machine rather than a software entity. This pivoting between software and hardware continues today, as special-purpose processors are being developed to speed up machine learning computations.
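To make the "many inputs, one output" idea concrete, here is a minimal sketch of a single artificial neuron in Python with NumPy. The weights, bias and inputs are made-up values, just for illustration:

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    # Many inputs, one output: a weighted sum passed
    # through an activation function.
    return sigmoid(np.dot(weights, inputs) + bias)

# Illustrative values only.
x = np.array([0.5, -1.2, 3.0])   # three inputs
w = np.array([0.4, 0.6, -0.1])   # one weight per input
print(neuron(x, w, bias=0.1))    # a single output
```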


Neural Network with Four Layers


Another thing we notice about the artificial neuron is that the magnitude of its output is clamped to some maximum value. So if you are in space staring at the sun, your brain doesn't fry because of the neural output; it fries because you are standing next to the sun. How peaceful.

I would be remiss here if I didn't mention that until recently, programming languages implemented the concept of functions in a similar way: many inputs were allowed, but only one output was returned per function call.



Python, which has become the de facto language of AI, allows one to return many outputs from a procedure call, thus implementing many-to-many relations. This is extremely convenient, amplifying the expressive power of the language considerably.
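For example, a Python function can hand back several values at once (under the hood, a tuple), which the caller unpacks directly. A toy sketch, with an arbitrary ReLU activation thrown in:

```python
def forward(x, w, b):
    # Compute a weighted sum, then activate it.
    z = w * x + b
    a = max(0.0, z)   # ReLU: clamp negative values to zero
    return z, a       # two "outputs" packed into one tuple

z, a = forward(x=2.0, w=0.5, b=-0.3)  # unpack both outputs at once
print(z, a)
```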

There are many details one must attend to in programming neural nets. These include the number of layers, the interconnection topology, the learning rate and the activation function, such as the S-shaped sigmoid function shown above at the tail of the artificial neuron. Activation functions come in many flavors. There are also cost or loss functions that enable us to evaluate how well a neuron is performing given the weights of each of its inputs. These cost functions come in linear, quadratic and logarithmic forms, the latter of which has the mystical name "Cross Entropy". Remember, if you want to know more about something you can always google it.
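As a rough sketch, here is what the sigmoid activation and the logarithmic "Cross Entropy" loss look like in NumPy; the labels and scores below are toy values, not a real network:

```python
import numpy as np

def sigmoid(z):
    # The S-shaped activation at the tail of the artificial neuron.
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Low when predicted probabilities agree with the true labels.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])            # toy labels
y_pred = sigmoid(np.array([2.0, -1.0, 0.5]))  # toy raw scores, squashed
print(cross_entropy(y_true, y_pred))
```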

Great strides have been made in neural networks by adjusting the input weights using "Gradient Descent" algorithms, which attempt to find combinations of input weights that maximize the effectiveness of each neuron. A neuron has many inputs to consider - many things shouting at it simultaneously - and its job is to figure out who to listen to, who to ignore, and by how much. The corrections to these weights are "Back Propagated" using the Chain Rule from our dear friend Calculus. This is repeated until the ensemble of neurons as a whole is functioning at its best as a group. The act of getting this to happen is called "Training the Neural Network". You can think of it as taking the Neural Network to school. So the bad news is, robots in the future will have to go to school. The good news is that once a single robot is trained, a whole fleet can be trained for the cost of a download. This is wonderful and scary but I digress.
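Here is a bare-bones sketch of the whole loop: gradient descent training a single sigmoid neuron on made-up data. The data, learning rate and epoch count are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                            # 100 samples, 3 inputs each
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # toy target labels

w = np.zeros(3)   # one weight per input: who to listen to, and by how much
b = 0.0
lr = 0.1          # the learning rate

for epoch in range(200):
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))     # sigmoid activation
    # Gradient of the cross-entropy loss w.r.t. w and b (the Chain Rule at work).
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    # Step downhill: the "descent" in gradient descent.
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # the trained weights: school is out
```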

TensorFlow Playground

Before we go any further, you must visit TensorFlow Playground. It is a magical place and you will learn more in ten minutes spent there than doing almost anything else. If you feel uneasy, do what I do: just start pushing buttons willy-nilly until things start making sense. You will be surprised how fast they do, because your neurons are learning about their neurons, and it's peachy keen.


TensorFlow Playground

Types of Neural Networks

CNN - Convolutional Neural Networks

Convolutional Neural Networks are stacks of neuron layers that can classify spatial features in images. They are useful for recognition problems, such as handwriting recognition and translation.


Typical CNN
CNNs can also be used to recognize objects in an image, such as those in the CIFAR database. In this case the input to the CNN is an image and the output is a label such as "truck", "cat" or "airplane".


CIFAR Database
MNIST is a famous database of carefully curated handwritten digit samples, used to train networks that subsequently recognize handwriting.

MNIST Training and Recognition
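To give a sense of how compact this is in code, here is a minimal convolutional network for MNIST, sketched with TensorFlow's Keras API; the layer sizes and epoch count are typical choices, not a prescription:

```python
import tensorflow as tf

# Load the MNIST digits, scale pixels to [0, 1], add a channel axis.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
```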
CNNs can also be used to transfer the style of one image to another, as in Google's Deep Dream Generator.


Deep Dream Generator

RNN - Recurrent Neural Networks

Just as Convolutional Neural Networks can be used to process and recognize images in novel ways, Recurrent Neural Networks can be used to process signals that vary over time. This can be used to predict prices or crop production, or even to make music. Recurrent Neural Networks use feedback, connecting their outputs back into their inputs in that deeply cosmic Jimi Hendrix sort of way. They can be unwound in time, and when this is done they take on the appearance of a digital filter.
Unwinding an RNN in Time
RNNs can be used to predict the next most likely word in a sentence. They can also continue patterns seen in periodic functions.


Predicting Periodic Functions with an RNN
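A recurrent cell is just a loop that feeds its own output back in. Here is a tiny NumPy sketch of unwinding one through time; the weights are random and the input series is made up, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
W_x = rng.normal(size=(4, 1))   # input-to-hidden weights
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden weights: the feedback path
h = np.zeros((4, 1))            # hidden state starts empty

series = [0.0, 0.5, 1.0, 0.5, 0.0]   # a toy time series, one value per step

for x_t in series:
    # Each step mixes the new input with the previous state:
    # the output fed back into the input.
    h = np.tanh(W_x * x_t + W_h @ h)
    print(h.ravel())
```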


AE - Autoencoders

Autoencoders are a unique topology in the world of neural networks because their output layer is the same size as their input layer. They are designed to reproduce their input at the output layer, which makes them useful for unsupervised learning. They can be used for principal component analysis (PCA) and dimensionality reduction, a form of data compression.
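A minimal Keras sketch makes the shape of the idea clear: squeeze the input through a narrow bottleneck, then try to reproduce it. The layer sizes here are arbitrary assumptions:

```python
import tensorflow as tf

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(32, activation="relu"),      # bottleneck: the compressed code
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(784, activation="sigmoid"),  # same size as the input
])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Unsupervised: the target is the input itself.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
autoencoder.fit(x_train, x_train, epochs=3)
```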

RL - Reinforcement Learning

With Reinforcement Learning, a neural net is trained subject to rewards, both positive and negative, until the desired behavior is encoded in the net. Training can take a long time, but this technique is very useful for teaching robots adaptive tasks like walking and obstacle avoidance. This style of machine learning is one of the most intuitive and easiest to relate to.


Components of a Reinforcement Learning System
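The reward-driven update is easy to sketch without any neural net at all. Here is tabular Q-learning on a made-up five-state corridor; the environment, rewards and hyperparameters are all assumptions for illustration:

```python
import numpy as np

n_states, n_actions = 5, 2              # a tiny corridor: actions are left (0) and right (1)
Q = np.zeros((n_states, n_actions))     # learned value of each state/action pair
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
rng = np.random.default_rng(2)

def step(state, action):
    # Positive reward only for reaching the rightmost state.
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == n_states - 1 else 0.0)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Mostly exploit what we know; occasionally explore.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        # The reinforcement: nudge Q toward reward plus discounted future value.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q)   # after training, "go right" dominates in every state
```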

GANs - Generative Adversarial Networks

GANs are useful for unsupervised learning, an echelon above routine categorization tasks. They typically have two parts, a Generator and a Discriminator. The Generator creates an output, often an image, and the Discriminator decides whether the image is plausible according to its training. In the MNIST example below, the gist of the program is, "Draw something that looks like a number". In an interesting limitation, the program does not know the value of the number, only that the image looks like a number. Of course, it would be a quick trip to a trained CNN to get the number recognized.
GAN Instructed to "Draw Something That Looks Like A Number"
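The two-player tug-of-war is easier to see in code than in prose. Here is a skeletal sketch of one GAN training step in Keras; the layer sizes, noise dimension and flattened 28x28 image shape are assumptions in the MNIST spirit:

```python
import tensorflow as tf

generator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(784, activation="sigmoid"),  # a fake 28x28 image, flattened
])
discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),    # real (1) or fake (0)?
])

loss_fn = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_images):
    noise = tf.random.normal([tf.shape(real_images)[0], 64])
    # Train the Discriminator: real images should score 1, fakes 0.
    with tf.GradientTape() as tape:
        fakes = generator(noise)
        real_pred = discriminator(real_images)
        fake_pred = discriminator(fakes)
        d_loss = (loss_fn(tf.ones_like(real_pred), real_pred)
                  + loss_fn(tf.zeros_like(fake_pred), fake_pred))
    d_opt.apply_gradients(zip(tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    # Train the Generator: try to fool the Discriminator into saying "real".
    with tf.GradientTape() as tape:
        fake_pred = discriminator(generator(noise))
        g_loss = loss_fn(tf.ones_like(fake_pred), fake_pred)
    g_opt.apply_gradients(zip(tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
```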

Conclusion

This short note details several approaches to, and applications of, machine learning. I hope you found it interesting. For more information, just follow the links above.