
Thursday, April 04, 2019

Computing and the Future HW 9 - Spoil Sports of the Prediction Game

Q1) Find videos on youtube about each of the spoilsports. Give
  • the web addresses of the videos
  • your commentary on them
  • how good they are.

5/5 The Observer Effect
This video is entitled, "The Quantum Experiment That Broke Reality". It is found when searching for, "The Observer Effect". It a PBS-produced 13 minute piece that starts by reviewing the double slit experiment, but extends the kinds of particles that are used, from photons, to electrons, to buckyballs with 60 atoms each. It reviews the work of Niels Bohr and Werner Heisenberg. It captures the notion of the wave function more completely, in ordinary language, and speaks to how the wave function itself appears to examine, "every possible path", as Feynman often speaks of. It raises the question, "How does the wave function know where it should land so as to complete the interference pattern when delivered one particle at a time. The narrator provides "The Copenhagen Interpretation" that is is not until the position of the particle is detected (measured) that its location is decided (determined). Until that moment, the particle lives in a superposition of possible states that include every possibility. This measurement requires the presence of an Observer in the form of a measuring instrument which can have an effect on the outcome, as we saw in the Dr. Quantum Double Slit Experiment Video.

5/5 The Heisenberg Uncertainty Principle (HUP)
This video is entitled "Understanding the Uncertainty Principle with Quantum Fourier Series. It is also a PBS-produced 15 minute piece. I chose it because Fourier series was the first mathematical method used to codify Heisenberg's principle. The narrator develops an analogous uncertainty principle for sound waves and introduces the notion that momentum is a generalization of the notion of frequency. This affected me greatly.  This can be demonstrated beyond the scope of the video as follows: Say you want to know the momentum of a photon of a certain color, that is how much pressure the photon will exert when it reflects off of a mirror. To find the momentum of a photon you multiply its color (its frequency) by a constant. That constant is Planck's constant divided by the speed of light which is also a constant. The Born rule tells us the probability that we will find that the particle, the wave function we are looking for in a specific position. But in fact it only gives us a range of probabilities. 
 

I have been looking for a long time for a way to represent common calculations as interval arithmetic. That is, instead of adding two numbers we are uncertain of, we instead add two ranges, or intervals, that include the numbers and account for our uncertainty. So instead of adding exact numbers we add intervals in which the numbers are certain to lie. Here's a quick example:

Let's say we want to add 2 and 4, but we aren't sure that two is exactly 2 or that four is exactly 4; each could be off by 1 in either direction. We would say that [1,3] + [3,5] is the actual situation we are trying to represent. Now in the case that both numbers were at the minimum of their possible values the answer would be 1 + 3, or just 4. In the case that both numbers were at the maximum of their possible values the answer would be 3 + 5, or just 8. We would then write the result as [4,8] and be certain that we were correct. To collapse this notation to ordinary numbers we would just average the extrema, 4 and 8, to obtain 6, which correctly answers the original problem. Now let's try multiplication of the two numbers in a similar way. We repeat the process above using multiplication everywhere addition was used before to obtain [3,15]. A problem pops up with multiplication in that we can no longer average the values at the endpoints of the interval to produce a correct, 'collapsed' calculation, because the multiplication of intervals (or errors) behaves differently than the summation of same. Computing the product turns out to be the min of all possible products on the left and the max of all possible products on the right, or [min(3,5,9,15), max(3,5,9,15)], which is just [3,15]. But now collapsing does not work out to the average (which is 9, not 8) because the errors multiplied instead of added!
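Here is a minimal Python sketch of the interval addition and multiplication just described; the helper functions are my own naming, not a standard library:

```python
# Interval arithmetic: represent an uncertain number as a (lo, hi) pair.
def interval_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def interval_mul(a, b):
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

two  = (1, 3)   # "2, give or take 1"
four = (3, 5)   # "4, give or take 1"

print(interval_add(two, four))   # (4, 8)  -> midpoint 6, matching 2 + 4
print(interval_mul(two, four))   # (3, 15) -> midpoint 9, NOT 2 * 4 = 8
```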

It is important to remember that HUP is a product rule stating that the product of our uncertainty in position and momentum must be greater than or equal to some constant.


We can use the delta 𝝙 character to imply differencing or the σ character to imply the standard deviation. Let's try using interval arithmetic as an estimation method for the uncertainty principle. In that case we can restate the uncertainty principle as:

[x1, x2] · [p1, p2] ≥ h/4π

where x and p represent the position and momentum of the particle before and after some kind of measurement event. The 1 subscript implies 'before the event' and the 2 subscript implies 'after the event'. Now since photons have no mass we have to use de Broglie's equation for the momentum of a photon, which multiplies its frequency (color) by a composite constant. The composite constant is Planck's constant divided by the speed of light. Assuming k = h/c we write:

[x1, x2] · [k·f1, k·f2] ≥ h/4π
Referring to the frequency or color of light directly is a little clunky, so we can use the wavelength λ instead, which removes the need for a composite constant since p = h/λ. I find terahertz frequency numbers harder to remember. Remembering the color of light in nanometers is very easy - deep red is roughly 800 nm and blue is roughly 400 nm. In this case we have:

[x1, x2] · [h/λ1, h/λ2] ≥ h/4π
As we discovered above, interval products require we compute the interval:

[min(x1·h/λ1, x1·h/λ2, x2·h/λ1, x2·h/λ2), max(x1·h/λ1, x1·h/λ2, x2·h/λ1, x2·h/λ2)] ≥ h/4π

If we do this for an example problem we notice that all the p values contain a factor of h, which conveniently cancels out of both sides. This is easier than lugging h along, given its minuscule magnitude.

For our example problem let's assume that the wavelength, and therefore the momentum, of the photon did not change at all. What uncertainty in the position of the photon would this confer on us? First we make lambda the same before and after the event (λ1 = λ2 = λ), and with h cancelled this becomes:

[min(x1, x2)/λ, max(x1, x2)/λ] ≥ 1/4π
Since lambda is a common factor, we can multiply both sides by it to move it from the interval to the right-hand side:

[min(x1, x2), max(x1, x2)] ≥ λ/4π
At this moment, with no loss of generality, we can assume that x1 is less than x2, which enables us to resolve the max and min functions and obtain our final general purpose result that accounts for photon momentum even though photons are massless:

[x1, x2] ≥ λ/4π
Now let's plug in some numbers and ask: what uncertainty in the position of the photon is conferred on us when we are exactly certain of the momentum? Notice that both sides of this equation have units of meters. This allows us to express the result in the non-dimensional format:

𝝙x / λ ≥ 1/4π
This is a very exciting result! It tells us that no matter what the wavelength is, the uncertainty in position is at least 1/4π of that wavelength, which is about 1/12, which is about 8 percent. In other words, we can know the position of a photon no better than to within about 8 percent of its wavelength, no matter how careful our measurement is.
This provides a quick back-of-the-envelope method for classroom use.
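To check the arithmetic, here is a minimal Python sketch of the estimate above; the 650 nm wavelength is just an illustrative choice, since the non-dimensional result does not depend on it:

```python
import math

# Photon of known color: the momentum is exactly h/lambda before and after,
# so the entire burden of the uncertainty principle falls on position.
h = 6.626e-34          # Planck's constant, J*s
wavelength = 650e-9    # illustrative red photon, 650 nm

min_dx = wavelength / (4 * math.pi)   # smallest allowed position uncertainty
print(min_dx)                         # ~5.2e-8 m
print(min_dx / wavelength)            # ~0.0796, i.e. about 8 percent
```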

This 8 percent number is quite important to resolve the following question I had:
I wanted to know the relationship between the Uncertainty Principle and the Nyquist sampling theorem. The latter states: "We must sample a periodic signal at least twice as often as the highest frequency appearing in the signal" if we want to reproduce that signal. Harry Nyquist formulated this principle in the context of communication theory, but it reminds me very much of HUP and made me wonder if there isn't some fundamental connection between the two. But Nyquist is on the order of twice, or half, depending on how you slice it, while my HUP photon calculation is on the order of 8 percent, so that tends to disconnect them as fundamental ideas.
The Nyquist principle, which is also a great 'spoilsport' in its own right, explains why we get different results depending on how finely we sample a signal: aliasing. You may be familiar with spatial aliasing as "the jaggies" on old-style computer displays. Temporal aliasing occurs when we sample at less than the Nyquist rate in time, resulting in propellers, wagon wheels and helicopter rotors appearing to rotate backwards or even stand still. This suggests we can use the sampling rate to appear to travel in time by producing an alias that resembles the actual phenomenon. Dr. Quantum's double-slit experiment video discusses how the presence of an observer causes a degeneration in the results, and I was curious as to whether or not this is a sampling-rate, and therefore a Nyquist, issue. As we can now see, they don't appear to be related.
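A minimal numpy sketch of temporal aliasing, assuming a 9 Hz 'rotor' sampled at only 10 Hz, well below its 18 Hz Nyquist rate, so that it masquerades as a 1 Hz signal:

```python
import numpy as np

f_signal = 9.0    # true rotation frequency, Hz
f_sample = 10.0   # sampling rate, Hz -- below the Nyquist rate of 2 * 9 = 18 Hz

t = np.arange(0, 2, 1.0 / f_sample)        # two seconds of sample times
samples = np.cos(2 * np.pi * f_signal * t)

# The samples are indistinguishable from a 1 Hz signal (the alias):
alias = np.cos(2 * np.pi * 1.0 * t)
print(np.allclose(samples, alias))         # True
```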
5/5 Quantum Tunneling
This video is entitled, "Is Quantum Tunneling Faster Than Light". It is found when searching for the phrase, "Quantum Tunneling". It is also a PBS-produced piece that is 11 minutes in length. It is in the same series as the video cited in The Observer Effect above. In this video the narrator lays out the fact that during quantum tunneling, a photon can appear to move faster than light when compared to an identical untunneling counterpart. But, this faster than light movement is confined to the uncertainty in force for that position and momentum. This means below the level of uncertainty in position and momentum ANYTHING can be happening, but above that, normal quantum rules apply... if you can call quantum weirdness normal.

5/5 The Butterfly Effect
The instructor was kind enough to let us view this video, which I had recently seen independently, in class. The most important line in the video is that "minuscule disturbances neither increase nor decrease the frequency of occurrence of various weather events [...] the most they may do is modify the sequence in which these events occur." I made a passing reference to this critical but little-known aspect of the Butterfly Effect. This little-known and little-understood point was made clear in just 13 minutes of high quality computer graphic presentation.
4/5 External Perturbations 
For me the term 'External Perturbations' translates to 'Unwanted Experimental Noise'. In the electronics context there are several sources of noise. These are covered in the CalTech video entitled "Physics of Shot Noise, Burst Noise, Flicker Noise", a lecture given by Ali Hajimiri. He did not discuss Johnson-Nyquist thermal noise, which was characterized in 1928. The level of mathematical aesthetic was highly refined - something I've come to expect from CalTech content. I would have given this video five stars, but the sound quality was a bit off, and the lecture wasn't from prepared slides, which would have shortened it somewhat, as in the PBS presentations. Hajimiri talks about the transit time of electrons and the intermittent arrival of charge being the source of shot noise. This reminds me of the intermittent change in forces that causes Brownian motion, which we discussed in class. Wires and resistors do not manifest shot noise, but p-n junctions, capacitors and other gapped devices do. You are literally measuring quanta of charge in gapped devices, from tunneling sorts of arguments. Burst or 'popcorn' noise comes from the trapping of electrons and their subsequent release due to crystal imperfections. Next Hajimiri discusses Flicker or 1/f noise, which has entire conferences dedicated to its study. At SIGGRAPH in the eighties I attended a seminar on 1/f noise in the context of fractals by Richard Voss, a student of Benoit Mandelbrot, after whom the Mandelbrot fractals are named. It was fun to revisit this topic. It gets the name 1/f because the log of its power spectral density curve rolls off linearly with the log of the frequency. It is also called 'pink noise'. Hajimiri also talks about power laws: if we plot event count on the y axis and size of the event on the x axis, there are lots of 'noise' events in nature that follow this rule. For example, there are many small magnitude earthquakes for every larger earthquake. So nature lets lots of small noise events happen for every big noise event that happens. Stock market changes also follow a 1/f noise curve. Hajimiri won the Feynman prize for teaching and has over 100 patents. A complete list of his lectures is here.
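For the curious, here is a minimal numpy sketch of one common way to synthesize 1/f (pink) noise by shaping white noise in the frequency domain; it illustrates the spectral roll-off only, and is not anything taken from Hajimiri's lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2**16

# Start from white noise, viewed in the frequency domain...
spectrum = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n)

# ...and scale each component by 1/sqrt(f) so that POWER (amplitude squared)
# falls off as 1/f. Skip the DC bin to avoid dividing by zero.
spectrum[1:] /= np.sqrt(freqs[1:])
spectrum[0] = 0.0
pink = np.fft.irfft(spectrum, n)

# On a log-log plot the power spectral density of `pink` now rolls off with
# slope -1, which is the defining signature of pink (1/f) noise.
```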


5/5 Existentialism
This video, Existentialism: Crash Course Philosophy #16, was 9 minutes long and less formal than the previous PBS and CalTech videos. It was a commercial, produced ironically by PBS, for several existential points of view from philosophers, beginning with Plato and Aristotle and their notion of 'Essence', defined as the thing that makes a thing what it is. A person's purpose derives from their Essence, as in, born to do a certain thing. This gives rise to the dogma of Essentialism. The narrator then ventures to Nietzsche and Nihilism, which is the dogma that life is ultimately meaningless. Then we get to Jean-Paul Sartre, who returns us to the question of Essence with a chicken vs. egg approach of Existence vs. Essence: it's our job to figure out what that one thing is, as Billy Crystal hears from Jack Palance in the movie 'City Slickers'. He then leads us to theistic Existentialism, Kierkegaard and teleology, defined as the question of whether the world was or was not created for a reason, and the Absurd, defined as the search for Answers in an Answerless world. The narrator then returns to Sartre and the fact that we are painfully free, so free that we must construct our own moral code in the absence of any standard code to live by, since there is no absolute authority. This in turn gives rise to living authentically. The video wraps up by quoting the French philosopher and author Albert Camus, who said, "The literal meaning of life is whatever you're doing that keeps you from killing yourself." This is my new favorite quote.

5/5 The Care Horizon
This video topic did not respond well to a search, so I used "The Time Value of Money" instead, since that derives directly from the lecture on that topic. I found a 3 minute video narrated by a fast-talking economist. Its title was "Time Value of Money - Macroeconomics 4.3". The narrator opens with a delay discounting question: "Would you rather have $100 now or $200 at some time in the future?" He then breaks down exactly what he means by "at some time in the future" and provides interest rate equations for both the future value of the money, and the present value of the money compared to its value in the future. It was quick, mathematical and useful so I'm giving it five stars.
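The two equations the narrator presents reduce to a few lines of Python; the 10 percent rate and 5 year horizon below are illustrative assumptions, not figures from the video:

```python
def future_value(present, rate, years):
    """What money today is worth later, at a compound interest rate."""
    return present * (1 + rate) ** years

def present_value(future, rate, years):
    """What money promised later is worth today (delay discounting)."""
    return future / (1 + rate) ** years

print(future_value(100, 0.10, 5))    # $100 now grows to ~$161 in 5 years
print(present_value(200, 0.10, 5))   # $200 in 5 years is worth ~$124 today
```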

Q2) Discuss how each of the "spoilsports of prediction" applies or does not apply to your project topic.


My project has two parts, artificial neural networks (ANNs) and transcranial magnetic stimulation of real neural networks (rNNs), which I am simulating using an anatomically accurate model. I will answer each of the seven topics for both the ANNs and rNNs. Since both are neural, the answer will sometimes be the same.

The Observer Effect
The Observer Effect in ANNs is very interesting. Documenting each of the millions of decisions a deep-layer neural net makes is cost prohibitive. This fact leads to "The Explainability Problem in AI", which I have discussed at length in previous assignments. In short, it is not currently possible for an ANN to explain itself and still be computed in reasonable time.
The Observer Effect in rNNs can be seen with performers, actors, thespians and speakers. When they know the people in the audience they have a different response than when the audience is anonymous. We also see the Observer Effect in candid camera situations, where people become aware they are being recorded and behave differently.

The Heisenberg Uncertainty Principle (HUP)
The manifestation of HUP in ANNs is less pronounced since an ANN is a completely digital system. However, ANNs do not give the same answer every time, because the style of computing is inherently nondeterministic. This takes some getting used to when working with Machine Learning systems, especially if one has come from a deterministic computing background. For simple models it is possible to seed the random number generators such that they create reproducible results, but in general one can get significantly different answers from run to run with ANNs.
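For the simple-model case mentioned above, seeding can be as little as fixing the generators the training code draws from; a minimal sketch (framework-level seeds work similarly, though bit-for-bit reproducibility is still not guaranteed on parallel hardware):

```python
import random
import numpy as np

SEED = 42               # any fixed value; 42 is just a convention

random.seed(SEED)       # Python's built-in generator
np.random.seed(SEED)    # NumPy's legacy global generator

# Runs that draw only from these generators now produce identical "random"
# initial weights, shuffles, and dropout masks from run to run.
```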
The manifestation of HUP in rNNs is a critical issue. In Transcranial Magnetic Stimulation one cannot excite individual neurons, because the uncertainty in position of the magnetic field is much larger than the neuron itself. Thus only tracts of neurons can be stimulated. In the case of my project I am trying to use moving permanent magnets instead of the electromagnetic pulses emanated by TMS coils and capacitor discharges. My approach is similarly encumbered by the fact that the magnets are much larger than the neurons and are also in motion.

Quantum Tunneling
The effect of Quantum Tunneling in ANNs is that of being a facilitating principle. The Machine Learning codes are running on computers that actually use tunneling transistors to enable the hardware to work.
The effect of Quantum Tunneling in rNNs is by analogy. If a neuron is stimulated often enough, at high enough frequency, with sufficient potential, it fires according to its activation function. Tunneling and neural firing are both threshold phenomena. 
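The threshold analogy can be made concrete with a toy activation function; a minimal sketch, where the hard step and the sigmoid are generic textbook forms rather than the specific activations in my project:

```python
import math

def step(x, threshold=0.0):
    """Hard threshold: no output at all until the input crosses the line."""
    return 1.0 if x >= threshold else 0.0

def sigmoid(x):
    """Smooth, differentiable approximation of the same threshold idea."""
    return 1.0 / (1.0 + math.exp(-x))

for stimulus in (-2.0, -0.1, 0.1, 2.0):
    print(stimulus, step(stimulus), round(sigmoid(stimulus), 3))
```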

The Butterfly Effect
I translate the Butterfly Effect in the neural network case to ask whether small changes in neural state can produce large differences in output. This depends on the architecture, topology and structure of the deep-layer neural net, whether artificial or real. There are certainly ANNs whose architecture is such that the Butterfly Effect is in play, and similarly for rNNs belonging to real people. But ANNs and rNNs can also be built such that small inputs do not cause large changes in output. So in summary this spoilsport is architecture dependent. There are mellow people, and mellow ANNs, who don't fly off the handle or change their computation at the slightest provocation. A good counterexample is driverless vehicle ANNs, which must respond in real time to small changes in input if that input feature represents a danger in the environment.

External Perturbations 
I translate External Perturbations in the ANN case to ask whether the presence of environmental noise can affect the outcome of the experiment, or, in the digital electronic case, whether the noise sources described above affect the outcome. The first comment is that the advent of the digital revolution immediately eliminated the noise that limited the complexity of analog computing systems. Given that the hardware was functioning properly, all decisions were mapped into the Boolean set {0, 1} with no intervening continuous maybes from shot noise, popcorn noise, thermal noise or 1/f noise. Floating point numbers were then constructed from sequences of Boolean digits, preserving the reproducibility of the outcome. This simple representational decision gave rise to the entire digital revolution. Now we must ask if an earthquake or a power outage or a flood can affect a computer running an artificial neural network. Again this is a threshold phenomenon. Either the machine is running properly and we can count on reproducible results, or it is not; there is little in-between.
We can ask if External Perturbations affect the rNN case and the answer is certainly and in the analog sense, continuously. People are affected by noise and the TMS equipment can be affected by noise and external perturbations leading to comments like, "It worked better when the rotor was closer to the subject."  

Existentialism
It would seem the quest for meaning is one of the motivating factors of the Artificial Intelligence revolution to start with. We design ANN systems modeled after ourselves both to save labor and as a method of self-reflection.
With respect to rNNs we are the neural networks that are searching for meaning and having the existential crisis in the first place! It makes sense that we would want to probe our minds and ask, if, and to what extent, magnetic fields might be used to engage them.

The Care Horizon
The time value of ANNs can be seen in the application of Deep Learning to skin cancer detection. A 1 mm deep skin cancer lesion can be excised with no harm. A 4 mm deep skin cancer can spread cancer through the entire body and kill the patient. The time value of early detection in skin cancer is thus a matter of life and death.  Skin cancer and melanoma detection is one of the recent areas where great strides have been made. 
The time value of rNNs - how would we phrase that? Would we rather have a learned brain now, or be twice as learned in four years? This, it seems, is what education is all about. Will transcranial magnetic stimulation affect education? That remains to be seen.

Q3) Take a favorite topic (your project topic is a good one). Discuss where it may be in 5 years, 10, 20, 50, 100, 200, 1,000, 10,000, and millions of years.


To make the process of answering this question more deterministic, more plausible and more interesting I used:

  • Moore's Law for cost
  • Gilder's Law for Bandwidth
  • Metcalfe's Law for Network Benefit

For our purposes here these laws are:

  • Moore's Law states that every 1.5 years, computers halve in price.
  • Gilder's Law states that bandwidth doubles every 0.5 years.
  • Metcalfe's Law states that network benefit goes as the square of the number of users.

To populate the spreadsheet for a machine learning problem I contrived an example where there was a GPU-month of training time, which is typical. I used figures from Amazon Web Services for Training, Memory and Prediction Costs. I assumed the resulting system would receive a usage of 1000 predictions per month. The first observation from the simulation is that the 10,000 and one-million year event horizons are undecidable numerically. I used a current world population of 7.7 billion and an annual population growth of 7 percent. One interesting outcome is that the current hourly cost of training a machine learning system is about the same as that of a domain expert in the same discipline. This may bode poorly for domain experts who make their living consulting in the future!
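A minimal sketch of the kind of projection the spreadsheet performs, using the three laws as stated above; the $1,000 training cost and 100 Mbit/s baseline bandwidth are illustrative assumptions, while the 7.7 billion users and 7 percent growth are the figures used above:

```python
COST_HALVING_YEARS = 1.5        # Moore's Law, as used here
BANDWIDTH_DOUBLING_YEARS = 0.5  # Gilder's Law, as used here

def project(years, cost0=1000.0, bandwidth0=100.0, users0=7.7e9, growth=0.07):
    cost = cost0 * 0.5 ** (years / COST_HALVING_YEARS)          # training cost, $
    bandwidth = bandwidth0 * 2 ** (years / BANDWIDTH_DOUBLING_YEARS)  # Mbit/s
    users = users0 * (1 + growth) ** years
    benefit = users ** 2                                          # Metcalfe's Law
    return cost, bandwidth, users, benefit

for horizon in (5, 10, 20, 50, 100, 200):
    cost, bw, users, benefit = project(horizon)
    print(f"{horizon:>4} yr  cost ${cost:.2e}  bandwidth {bw:.2e} Mbit/s  "
          f"users {users:.2e}  benefit {benefit:.2e}")

# Pushed to the 10,000 and one-million year horizons these exponentials
# overflow double-precision floating point, which is one way to see the
# numerical undecidability noted above.
```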





One thing these laws tell us is how much faster to expect the training system to run. But because these services are already vended to us on concurrent platforms at Google, Amazon, Apple and Microsoft, there seems to be little value in predicting how fast running a query or prediction will be. The real cost is in training the model. If the model has already been trained, the answer is effectively instant. Training can take from hours to weeks depending on the complexity of the ANN and the amount of data in the training set.

Finally, notice that none of the three laws includes saturation to a maximum, or peaking followed by roll-off. To model these effects with effective prognostication would be a time-intensive effort with limited benefit, since we do not know the saturation or roll-off figures. If we were to attempt a more sophisticated model, we would need to know how many people the Earth's landmass and oceans can support. For the long term there would have to be an accounting of sizeable colonies on the Moon, Mars, Mercury, Europa, and Enceladus as well, not to mention those articulated in the Asimov short story.

Q4) Read Asimov's "The Last Question" 
Critique and comment.
Asimov called this story "one of my favorites". It was written in the year of my birth, 1956. In several iterations he goes about predicting large mainframe computers, server farms, networking, wireless communication, and the personal computing revolution. A question asked at each generation of computer development is, "Can entropy be reversed?" The answer from the automaton is the same at every incarnation but the last one: "THERE IS INSUFFICIENT DATA FOR A MEANINGFUL ANSWER."
I found the story to be entertaining and a fundamental search for meaning. Since the version I listened to was read by Asimov himself, it contained interesting verbal idiosyncrasies of his own upbringing that reminded me of the time I've spent with rabbis and distinguished Jewish people, including Holocaust survivors. Subtle banter, argumentation and then the sounds of children playing punctuate the story and carry it along.
The story constitutes a prophetic piece on the future of computing, with an accuracy even Nostradamus would envy. So I began to be interested in what Asimov got wrong in the story, as opposed to the amazement I had at what he got right. This is interesting to me since the story is highly predictive and there is no direct way Asimov, writing 62 years ago, could have known how things were going to turn out.
One of the things he got wrong was the 'A' in the acronym AC, which stood for Analog Computing. Computing went digital instead of analog. He spoke of the clicking of relays, which were digital, then observed that those relays were replaced by switching transistors, which were still new in 1956, then by subatomic particles. He got the tendency for rapid population growth correct, and the tendency for humans to look for new quarters whenever this happened.
Another thing he may have gotten wrong was interstellar travel being commonplace, since the speed of light still seems to place that firmly out of reach. But who knows? There's still time for wormholes...

Q5) 
Write up 250 words for the equivalent amount of work [on your project] explain specifically what you did.
This week was spent recovering, successfully, from a hardware failure that fried one of the boards that I was using to control the lighting of the rotor. This necessitated a revision of the power supplies so that everything in the vTMS unit could be operated at 12-24 VDC. Of particular difficulty was rebuilding the LED panels. This was necessary so the unit could operate at low (and safer) DC voltages instead of operating at 170 volts - which during a test blew a fuse and turned off all the lights. A load limiting resistor was installed after computing its optimal value so that the LEDs are not damaged by excessive current. Now the vTMS unit turns at variable speeds and the LED lighting panel behind the rotor brightens and dims accordingly. This meant many hours soldering controls and configuring the enclosures that hold the rotors, the motors, the lighting panels, the variable speed control, the on-off switch, and the direction reversing switch. Extensive modeling of all the parts was done so that the clearances could be optimized. It is a tight fit to get everything in the enclosure, to the point that final assembly is facilitated by a tongue depressor blade. Once assembled, all the parts have adequate clearance. It would not have been possible to use off-the-shelf enclosures without modeling the components to determine their sizes and positions. The final wiring is the most difficult step and, along with final assembly, can best be described as building a ship in a bottle. Not shown here are the three controls at the back of the unit. A revised wiring diagram is also being developed. In addition to this work, a formula for the gelatin brain has been developed, along with an experimental schedule to determine the optimum concentrations and electrical properties of the ingredients.
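For reference, a minimal sketch of the load-limiting resistor calculation mentioned above; the supply voltage, LED forward voltage, LED count and target current below are illustrative assumptions, not the actual vTMS values:

```python
# Ohm's law sizing for a series current-limiting resistor.
supply_v = 12.0         # supply voltage, volts (assumed)
led_forward_v = 3.0     # forward voltage per LED, volts (assumed)
leds_in_series = 3      # LEDs per string (assumed)
target_current = 0.020  # desired current, amps (20 mA, assumed)

voltage_drop = supply_v - leds_in_series * led_forward_v
resistance = voltage_drop / target_current
power = voltage_drop * target_current

print(f"R = {resistance:.0f} ohms, dissipating {power * 1000:.0f} mW")
# -> R = 150 ohms, dissipating 60 mW; round up to the next standard value.
```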

  • Prepare an outline for your paper (if you are doing a paper). You can use this outline later to help organize your report or other product, presentation, etc. If not doing a report, do something of equivalent effort instead and describe. 
The outline for my paper is identical to the outline for my presentation, which is presented below. Each slide will generate the verbiage to be included in my final report.

Q6) Provide an outline for your presentation:

Twenty Slides

Part One: TensorFlow Playground and Basis Functions for Prediction


  • Slide 0: A Blazing Fast Introduction to Machine Learning
  • Slide 1: The "Discovered Algorithm" Slide
  • Slide 2: Hyperparameters
  • Slide 3: Explainability
  • Slide 4: Basis Functions: Explicit Functions: definition and example
  • Slide 5: Basis Functions: Implicit Functions: definition and example
  • Slide 6: Basis Functions:  Parametric Functions: definition and example
  • Slide 7: Basis Functions: Iterated Functions: definition and example
  • Slide 8: Basis Functions: Chaotic Functions: definition and example
  • Slide 9: Do not ask a Cat AI what kind of Dog This Is
  • Slide 10: The Existing TensorFlow Playground and a Measurement
  • Slide 11: Modifications to TensorFlow Playground
  • Slide 12: The Modified TensorFlow Playground and a Measurement

Part Two: Transcranial Magnetic Stimulation with Permanent Magnets


  • Slide 0: The Question: Can we perceive a changing magnetic field?
  • Slide 1: The Follow Up: Can Permanent Magnets Produce Perception?
  • Slide 2: Rationale: Why Two Rotors
  • Slide 3: The Caltech Video
  • Slide 4: A Blazing Fast Introduction to TMS
  • Slide 5: My first TMS experience
  • Slide 6: My second TMS experience in the clinic.
  • Slide 7: Building the vTMS† and vBrain†
  • Slide 8: Demonstration of the vTMS†

Q7) Grad students only: continue with the book you obtained. Read the next 20 pages. State the book title, author, and page numbers you have read. Discuss the pages, explaining what you agree with, disagree with, and how your views compare with those of other reviewers on Amazon or elsewhere.

Reviewed This Week: 

  • Chapter 20 - The Teeming Cities of Mars
  • Chapter 21 - Big Ice
I have moved this answer to my ongoing review of the book, "The Human Race to the Future", a single curated document that is here.

This document has become rather large, so I have developed a tool to ease converting it into a form compatible with the blog. There are now 320 review notes for the first 21 chapters.

The Amazon Kindle reader has some note-taking shortcomings. One is that the notes I make are written out from the Kindle reader without chapter headings or page numbers. Kindle instead uses the abstraction of Locations, which are invariant across whichever mobile or desktop device is being used to read the book. So the chapter headings have to be preserved manually. Another shortcoming is that the output of the notes I make is in HTML, which is different in style from the Blogger HTML. There are also a number of punctuation errors in the HTML, entirely due to the crappy Kindle HTML translator, that I correct using the new script. Below is a snapshot of the Unix bash script that saves some labor, but not all, in making the Book Review changes compatible with Google Blogger. I historically avoid perl like the plague, but I was forced to use it because of a text processing issue called 'non-greedy matching', which I won't bore you with.
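The actual tool is a bash script with an embedded perl one-liner. Purely as an illustration of the non-greedy matching issue, and not the real script or the real Blogger fixes, here is the same idea in Python with a made-up Kindle-style line:

```python
import re

kindle_html = "<span class=bold>Chapter 21</span> - <span class=bold>Big Ice</span>"

# Greedy matching swallows everything between the FIRST opening tag and the
# LAST closing bracket, mangling the line:
greedy = re.sub(r"<span.*>", "", kindle_html)

# Non-greedy (.*?) stops at the nearest closing bracket, which is what the
# cleanup needs:
lazy = re.sub(r"<span.*?>", "", kindle_html).replace("</span>", "")

print(repr(greedy))   # ''  (everything swallowed)
print(repr(lazy))     # 'Chapter 21 - Big Ice'
```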



