
Saturday, January 19, 2019

Computing and the Future 3 - Algorithms and Prediction


Underfitting and Overfitting the Future


An example of a Python Jupyter Notebook running in the MyBinder environment is "Fit - Polynomial Under and Overfitting in Machine Learning", written by the author. Because the source code is present in the notebook, it can be adapted to search for polynomial laws that predict trends such as those modeled by Berleant et al. in "Moore’s Law, Wright’s Law and the Countdown to Exponential Space". MyBinder enables web publication of Python Jupyter Notebooks. There is a little overhead (yes, waiting) to run these remotely, but it is not punitive compared to the utility obtained. The user can develop in a high-performance personal environment for speed, privacy and convenience, then deploy the result in a public setting for review and the general edification of the many.

MyBinder Deployed Notebooks
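
Here is a minimal sketch of the idea (not the notebook's actual code): fit noisy samples of a trend with polynomials of increasing degree and compare the error on held-out points, which exposes both underfitting and overfitting. The trend, noise level and degrees are illustrative choices.

# Sketch of polynomial under- and overfitting on a noisy trend.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)   # "true" trend plus noise

train, test = slice(0, None, 2), slice(1, None, 2)              # alternate samples for train/test
for degree in (1, 3, 9):
    coeffs = np.polyfit(x[train], y[train], degree)             # least-squares polynomial fit
    rmse = np.sqrt(np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2))
    print(f"degree {degree}: held-out RMSE = {rmse:.3f}")       # degree 1 underfits; high degrees chase the noise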


Taylor Series Expansion for Future Prediction

The Taylor Series approximation of a function is a mathematical way of forecasting the future behavior of a spatial or temporal curve using a polynomial and the derivatives of the function. With each additional term the approximation improves, and the expansion is best in a local neighborhood of the function. Here is an animated example I did for John Conway. In this example we are trying to approximate a cyclic future, represented by the red sine wave, with polynomials of ever higher degree, starting with a green line, then a blue cubic, a purple quintic and so forth. The more terms we take, the better our forecast is, but our horizon of foresight remains limited. Notice that if we had chosen to represent our future using cyclic functions like sines and cosines, our predictions could be perfect, provided we sampled at the Nyquist rate or better.
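
A minimal sketch of what the animation computes, using the standard expansion of sin(x) about zero (the degrees and sample points are illustrative):

# Taylor polynomials of sin(x) about x = 0: accurate near the expansion point,
# drifting away from the true curve as we look further out.
import math

def taylor_sin(x, terms):
    # Sum of the first `terms` nonzero Taylor terms: x - x^3/3! + x^5/5! - ...
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(terms))

for x in (0.5, 2.0, 5.0):
    approx = [round(taylor_sin(x, n), 4) for n in (1, 2, 3, 6)]
    print(f"x={x}: true={math.sin(x):.4f}, degrees 1/3/5/11 -> {approx}")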




The Taylor series introduces the notion of a "basis function". The world looks like the sum of the primitive functions that one uses to model it. In the example above we are trying to model, or approximate, a periodic function (the sine wave) with a polynomial, which is not periodic. This phenomenon appears in machine learning as well. If a given feature (aka basis function) is not present in the training set, that feature will not be properly recognized in the test set.

Identifying Fallacies in Machine Learning


Consider training an AI to recognize different breeds of dogs. Then show that same AI a cat. It will find the dog that is closest in "dog space" to the given cat, but it still won't be a cat. If the basis function one chooses does not mimic the property you want to detect or approximate, it is unlikely to be successful. This behavior can be seen in the TensorFlow Neural Network Playground mentioned previously. This is an important principle that helps us cut through deceitful glamour, false mystique and unrealistic expectations of machine learning. It is so fundamental that we could place it in the list "Fallacies of Machine Learning", filed under the header "If all you have is a hammer, everything looks like a nail". See "Thirty-Two Common Fallacies Explained", written by the author.

In conclusion, we discover that "the basis functions should have behaviors/ingredients such that their combination can approximate the complexity of the behavior of the composite system", be they cats or sine waves. Basis functions occur in the finite-element analysis of structures, which asks, "Does the bridge collapse?". They appear in computational geometry as rational B-Splines, rational because regular B-Splines cannot represent a circle. Periodic basis functions appear in Fourier analysis, as in the wildly successful Discrete Fourier Transform (DFT) and audio graphic equalizers, and in Wavelet Transforms, a class of discrete, discontinuous basis functions. The richness of the basis functions we choose strictly limits our accuracy in prediction, temporally (the future) or spatially (the space we operate in).
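
For instance, a signal built from periodic ingredients is captured by just a few coefficients when projected onto the DFT's periodic basis functions; here is a small NumPy sketch (the frequencies are arbitrary):

# The DFT writes a signal as a sum of periodic (complex exponential) basis functions.
import numpy as np

n = 64
t = np.arange(n) / n
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)

spectrum = np.fft.rfft(signal)                           # projection onto the periodic basis
peaks = np.argsort(np.abs(spectrum))[-2:]                # the two dominant frequency bins
print("dominant bins:", sorted(int(k) for k in peaks))   # -> [3, 7]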

Genetic Algorithms

Predicting the future can be made to look like an optimization problem where we search a space of parallel possibilities for the most likely candidate possibility.
Sometimes the space that we want to search for an optimal solution is too large to enumerate and evaluate every candidate collection of parameters. In this circumstance we can use grid sampling methods, either regularly or adaptively spaced, refining our estimates by sampling those regions which vary the most.


We can use Monte Carlo random methods, and we can use Genetic Algorithms to "breed" ever better possible solutions. For sufficiently complex problems, none of these methods is guaranteed to produce the optimal solution. When sampling with grids, we are not guaranteed that our grid points will land on the optimal set, although we do have guidelines, like the Nyquist sampling theorem, which says that if we sample at more than twice the highest frequency present in a waveform, then we can reconstruct that waveform exactly. If we sample more coarsely than that, we get unwanted "aliases" of our original signal.
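
A small demonstration of aliasing (the 9 Hz tone and 10 Hz sampling rate are illustrative choices): sampled below its Nyquist rate, the tone is indistinguishable from a 1 Hz alias.

# A 9 Hz sine sampled at 10 samples/s (below the 18 Hz Nyquist rate) produces the
# same samples as a folded 1 Hz alias.
import numpy as np

fs = 10.0                                   # sampling rate, too coarse for a 9 Hz tone
t = np.arange(0, 1, 1 / fs)
true_tone = np.sin(2 * np.pi * 9 * t)       # the real 9 Hz signal
alias_tone = np.sin(2 * np.pi * -1 * t)     # its alias, folded about fs/2
print(np.allclose(true_tone, alias_tone))   # True: the samples cannot tell them apart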

An example of spatial aliasing is the "jaggies" of a computer display:


An example of temporal (time) aliasing is the apparent reversal of a propeller's spin when filmed with a motion picture camera:


But sometimes the future we are trying to predict is not periodic and "history is not repeating itself". But forget all that; I mentioned it to tell you about genetic algorithms. These are explained here by Marek Obitko, who developed one of the clearest platforms for demonstrating them. Unfortunately, Java Applets no longer work in popular browsers due to the discontinuance of NPAPI support. A world-of-producers-and-consumers demonstration is here.
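
In the spirit of those demonstrations, here is a minimal genetic algorithm sketch; the all-ones target, population size and 2% mutation rate are illustrative toy choices, not Obitko's code.

# Evolve a bit string toward a target by selection, single-point crossover and mutation.
import random

TARGET = [1] * 20

def fitness(genome):
    return sum(a == b for a, b in zip(genome, TARGET))

def breed(a, b):
    cut = random.randrange(len(a))                               # single-point crossover
    child = a[:cut] + b[cut:]
    return [bit ^ (random.random() < 0.02) for bit in child]     # 2% per-bit mutation

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)                   # rank by fitness
    if fitness(population[0]) == len(TARGET):
        break
    parents = population[:10]                                    # truncation selection
    population = parents + [breed(random.choice(parents), random.choice(parents))
                            for _ in range(20)]
print("generation:", generation, "best fitness:", fitness(population[0]))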




Approximation and Refinement of Prediction


Sometimes we want to make an initial guess, and then refine this guess. An intuitive model for this is Chaikin's algorithm:


In this case we have some expected approximation of the future, represented by a control polygon with few vertices. As we refine the polygon by recursively chopping off corners, we end up with a smooth curve or surface.
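
A small sketch of the corner-cutting step: each edge of the control polygon is replaced by points at its 1/4 and 3/4 marks, and repeating this converges to a smooth (quadratic B-spline) curve. The control polygon below is arbitrary.

# Chaikin corner cutting on an open control polygon.
def chaikin(points, iterations=3):
    for _ in range(iterations):
        refined = []
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            refined.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))  # 1/4 point
            refined.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))  # 3/4 point
        points = refined
    return points

control = [(0, 0), (1, 2), (3, 3), (4, 0)]          # coarse first guess at the curve
print(len(chaikin(control)), "points after 3 corner-cutting passes")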

Iterated Systems

These are my favorite, so much so that I've written a book on them. They are truly "future-based" equations that assume only that the future evolves from the past according to some set of steps. Because of their similarity to fractals, I like the chaotic nature of the representation. If we want to model a chaotic future, we need a chaotic underlying basis function.
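
A classic sketch of such a system is the logistic map, chosen here purely for illustration: the future value comes only from applying the same step to the past value, and for r = 3.9 the trajectory is chaotic.

# Iterating one simple step produces a chaotic trajectory.
def logistic_step(x, r=3.9):       # r = 3.9 lies in the map's chaotic regime
    return r * x * (1 - x)

x = 0.2
trajectory = []
for _ in range(10):
    x = logistic_step(x)
    trajectory.append(round(x, 4))
print(trajectory)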



When the coefficients of the iterated system have a component of "noise" or randomness, we can simulate an envelope of possible futures. Take, for example, predicting the landing spot of an arrow. Random factors such as the wind, temperature, humidity, precipitation and gravitational acceleration (which varies with elevation) can all affect the final outcome. My final project may draw from this area.
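
A toy sketch of that envelope, with entirely made-up drag numbers standing in for wind and the other random factors:

# Noise in the step's coefficients turns one predicted landing spot into a spread.
import random

def simulate_flight(steps=50):
    x, v = 0.0, 1.0
    for _ in range(steps):
        drag = 0.02 + random.gauss(0, 0.005)   # hypothetical noisy drag coefficient
        v *= (1 - drag)
        x += v
    return x

landings = [simulate_flight() for _ in range(1000)]
print("envelope of futures: min %.1f, max %.1f" % (min(landings), max(landings)))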

Virtualizations

There are some virtualizations that are so effective they have become working principles in science and engineering. An example of one of these is the principle of virtual work, which is used to derive strain energy relationships in structural mechanics to enable the prediction of whether bridges will collapse at some point in the future. The amazing thing about the principle of virtual work is that a load is put on the bridge and then the displacements of various points on the bridge are imagined, even though the bridge is not really moving at all. The degree of this virtual displacement is used to calculate the strain in each element of the bridge or structure. If in any member element the strain exceeds the strength of the material, that member fails, and the collapse can be predicted and avoided by strengthening just the weak member.
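
A sketch of the unit-load form of virtual work for a truss, where a joint displacement is the sum of n*N*L/(A*E) over the members (N is the real member force, n the force from an imagined unit load); the member forces, lengths and areas below are made up for illustration.

# Virtual-work deflection estimate plus a simple strength check on each member.
E = 200e9                      # Young's modulus of steel, Pa
yield_stress = 250e6           # assumed material strength, Pa

# (real force N [N], virtual force n [-], length L [m], area A [m^2]) per member
members = [
    (50e3, 0.8, 3.0, 4e-4),
    (-30e3, -0.6, 4.0, 4e-4),
    (20e3, 0.5, 5.0, 3e-4),
]

deflection = sum(n * N * L / (A * E) for N, n, L, A in members)
print(f"virtual-work deflection: {deflection * 1000:.2f} mm")
for i, (N, n, L, A) in enumerate(members):
    if abs(N) / A > yield_stress:               # stress beyond strength: this member fails
        print(f"member {i} would fail; strengthen it to prevent collapse")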

Another virtualization that is timely is the use of complex numbers in wave functions, as in acoustics and quantum computing. (Imagine a sound machine that simulates quantum superposition!) When two complex waves meet, each contains imaginary components that do not physically exist, yet when they combine additively or multiplicatively they can produce real outcomes. This is spooky and interesting.
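
A small sketch of that spookiness: two complex waves whose imaginary parts are never observed combine into a perfectly real intensity (the frequencies and phases here are arbitrary).

# Two complex waves in antiphase sum to (near) zero real intensity: destructive interference.
import numpy as np

t = np.linspace(0, 1, 8, endpoint=False)
wave_a = np.exp(2j * np.pi * 2 * t)                   # complex wave at frequency 2
wave_b = np.exp(2j * np.pi * 2 * t + 1j * np.pi)      # same frequency, opposite phase
intensity = np.abs(wave_a + wave_b) ** 2              # a real, observable quantity
print(np.round(intensity, 12))                        # all (near) zero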

Other virtualizations include:


All of these can be used in computer simulations of complex systems like the stock market, terrorism and cancer.

Robert H. Cannon Jr. of Caltech, in his 1967 book "Dynamics of Physical Systems", discusses the convolution integral in control theory, which incorporates the past state of the system to continuously influence the current and future states. This book, written at the height of the Apollo era, was amazingly ahead of its time in codifying control system analysis techniques using the Laplace transform, a complex-valued transform similar to the Fourier transform. Kalman filtering can be applied to complex systems where the state of the system is changing rapidly.
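
A discrete sketch of the convolution idea (the impulse response and input pulse are invented for illustration): the output now is a weighted sum of all past inputs.

# y[n] = sum_k h[k] * u[n - k]: the past keeps influencing the present.
import numpy as np

h = 0.5 ** np.arange(10)            # impulse response of a simple decaying system
u = np.zeros(30)
u[:5] = 1.0                         # input: a short pulse that ended long ago

y = np.convolve(u, h)[:30]          # discrete convolution of input with impulse response
print(np.round(y[:12], 3))          # the pulse's influence decays but persists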

The Stop Button Problem

The Stop Button Problem in Artificial General Intelligence (AGI) is a fascinating study in what happens when what the robot wants is different from what the person wants. The video "The Stop Button Problem" by Rob Miles describes the problem in detail.

The "I, Robot" problems mentioned in the ICF course in April 2013, based on Isaac Asimov's "Three Laws of Robotics", discuss this as well.

Miles proposes a solution to this problem using Cooperative Inverse Reinforcement Learning.



The take home message is, "Make sure that humans and robots want the same thing or there will be trouble."

Sentiment Analysis

Clive Humby, a UK mathematician, has said that "data is the new oil". Andrew Ng, a leading ML researcher, makes the statement that "AI is like electricity", compounding this information-as-{utility, power, energy} metaphor. I have used the phrase "Hot and Cold Running Knowledge" to describe the situation we currently find ourselves in.


(Image: from DreamGrow)


Social networks like Twitter, Facebook and Instagram are fertile fields for harvesting user sentiment. User sentiment affects purchasing decisions, voting decisions, and market prices for instruments such as stocks, bonds and futures.

Machine Learning is increasingly being used to scrape these social networks and determine sentiment, in a kind of superpredictive Delphi method.
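
A toy lexicon-based scorer, just to make the idea concrete; real systems train models on large volumes of scraped text, and the word lists here are invented.

# Count positive minus negative words in each post.
import re

POSITIVE = {"love", "great", "up", "buy", "win"}
NEGATIVE = {"hate", "crash", "down", "sell", "fear"}

def sentiment(post):
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = ["Love this stock, buy buy buy", "Markets crash, fear everywhere, sell now"]
print([sentiment(p) for p in posts])    # [4, -3]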


Handy's Law in Geotagging

Mentioned in the course notes for ICF January 2014, "Handy's Law" states that "pixels/dollar doubles yearly". Consider Nokia's new five-camera design.
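
A quick compounding check of what yearly doubling implies; the starting pixels-per-dollar figure is made up, and only the growth rate comes from the law.

# Pixels per dollar under Handy's Law, doubling each year from an illustrative baseline.
base_year, base_pixels_per_dollar = 2014, 1.0e5
for year in range(2014, 2020):
    pixels_per_dollar = base_pixels_per_dollar * 2 ** (year - base_year)
    print(year, f"{pixels_per_dollar:,.0f} pixels per dollar")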



The question in my mind is this: is it a tip of the hat to superfluous excess, like the tail-fin race of the fifties, a tech/fashion bubble of yesteryear, or does it represent a true increase in utility?

Computational Medicine

This is a separate essay.


