Read Latex

Monday, June 29, 2009

An Excerpt from Ham Radio Field Day 2009

Because of an exhausting 50 mile bike ride in the hot sun, I couldn't make it to Field Day on Saturday. I woke up late on Sunday, hoping to make some kind of belated appearance.

Just for fun, I started my HamTrack system at 9:47 am - a mashup of Google Earth, CW Skimmer, and C++ programs, glued together with some Unix tools, sed, grep, awk, along with the usual database fiddling and geolocating.

It is an end-to-end automated signal tracking system that translates RF Morse code into pins on a map. So I left it running and headed over to the real Field Day, where, after catching up with my buds, I managed an impressive 2 contacts 15 minutes before the end of the event at 1 PM.

When I got home I discovered that 308 stations made 917 calls while I was gone, illustrated as pins on the map below. As in the 24-hour case (previous post), pins are colored by frequency: red for 6.9 MHz, blue for 7.1 MHz, and spectral coloring in between. My pin, AE5CC, is arbitrarily assigned red so I can find it in the sea of pins.
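For the curious, here is the flavor of the glue code involved. This is a minimal sketch of my own for this post, not the actual HamTrack source: it maps a spot's frequency onto the red-to-blue ramp described above and prints one Google Earth placemark. The function name and the sample call sign and coordinates are made up for illustration.

/* Sketch only, not the actual HamTrack code: map a spot's frequency onto a
   red-to-blue ramp and print one KML placemark for Google Earth.  The call
   sign, latitude and longitude would come from the CW Skimmer log and the
   geolocation step described above. */
#include <stdio.h>

static void kml_pin(const char *call, double lat, double lon, double freq_mhz)
{
    double t = (freq_mhz - 6.9) / (7.1 - 6.9);   /* 0 at 6.9 MHz, 1 at 7.1 MHz */
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    int red  = (int)(255.0 * (1.0 - t));
    int blue = (int)(255.0 * t);

    printf("<Placemark><name>%s</name>\n", call);
    printf("  <Style><IconStyle><color>ff%02x00%02x</color></IconStyle></Style>\n",
           blue, red);                           /* KML colors are aabbggrr */
    printf("  <Point><coordinates>%f,%f,0</coordinates></Point>\n", lon, lat);
    printf("</Placemark>\n");
}

int main(void)
{
    kml_pin("AE5CC", 34.75, -92.33, 6.90);       /* example spot, arbitrary data */
    return 0;
}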

You will need the Google Earth browser plug-in to view the interactive map, and it takes a few seconds to load the data - about the time it takes to read this. If you don't use Google Earth, you're missing the best thing since sliced bread. - AE5CC

Monday, June 15, 2009

An Extreme Soft Radio Adventure - 24 Hrs @ 7 MHz


After some antenna simulations using 4Nec2 (by Arie Voors) I wrapped a wire around my townhouse to create a loop HF antenna. I was curious if it was working and how the actual propagation pattern compared to my predictions. So I left my software defined radio, a Softrock 6.2 (by Tony Parks and Bill Tracey), running for 24 hours. It turned out to be quite an adventure!
Results: 1138 stations made 4907 calls, illustrated as pins on the map below. The pins are colored by frequency: red for 6.9 MHz, blue for 7.1 MHz, and spectral coloring in between.

Mouse over the map to see calls from the Island of Midway to Puerto Rico in longitude, from Alaska to Florida in latitude.

You will need the Google Earth browser plug-in to view the map, and it takes a few seconds to load the data - about the time it takes to read this. If you don't use Google Earth, there is an image at the bottom of the page. - AE5CC


Sunday, February 08, 2009

Printing Circuits


While talking about building circuits, my very talented friend said to me:

“For me, soldering is a way to turn money into smoke.”

If you think about it, soldering is a very primitive activity. It is a cauldron of molten metal from the Middle Ages whose sole alchemy is making a single connection. A connection known for toxicity and burns. Toxicity, because until recently solder was full of toxic lead, and some are talking about returning to lead because of the whiskers that form with lead-free (RoHS-compliant) solder. Burns? Burns on both hands from soldering accidents over the years. None of which confer the ability to keep from getting burned again. Solder for regular people melts at around 360 degrees F. The iron is over 500 degrees F. A soldering iron left unattended can burn down the house.

Most metals conduct with low resistance. You can, if you try hard enough, get other things to conduct, like certain plastics, but they are never as good.

We are children of the integrated circuit, invented by Jack Kilby at Texas Instruments in 1958. From then on printed circuits took off. Now you can get almost any circuit you can think of for FREE in something called a sample program. You ask the vendor and they send sample chips for free. Then you make something, and if it is a hit, they make back their money because you buy the reel of 15,000.

So we regular people could stand on the shoulders of giants but for one obstruction and that obstruction is soldering.

Soldering makes a metal-to-metal connection, nothing more than a bit of
metal-to-metal logic. Now is the time to retire that connection to the Middle Ages.

That would have happened but for one little bug, one little fly in the ointment and that fly is capacitance.

Most of the time, capacitance is your friend. Want to smooth out the bumps in the road from power or switching? Install a capacitor. But when you want things to happen fast, capacitance puts on the brakes.

This bad kind of capacitance is aptly called parasitic. The solution is to use short or, if possible, non-existent wires.

One might say, “Why not cancel that capacitance with a little inductance? After all, they ARE opposites, aren’t they?” Well, adding inductance to capacitance only throws gas on the fire when it comes to slowing things down, because of one itty-bitty formula. Like its friend, E = mc², that formula is absolutely magical. It is (drum roll please):

f = 1 / (2π √(LC))

Where L is inductance and C is capacitance. If you want to go fast, you have to make L and C small. The smaller you are, the faster you go until you aren’t there. Like Dylan said, "I'm glad I'm not who I am!" You’re traveling at the speed of light, trading inductance for capacitance, in waves that shoot through space.
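As a quick numeric illustration (a sketch of my own, not part of the original post), halving both L and C doubles the frequency, which is the "smaller is faster" point above:

/* Quick numeric check of f = 1 / (2*pi*sqrt(L*C)). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double PI = 3.14159265358979;
    double L = 1.0e-6;     /* 1 microhenry   */
    double C = 100.0e-12;  /* 100 picofarads */
    double f = 1.0 / (2.0 * PI * sqrt(L * C));
    printf("f = %.1f MHz\n", f / 1.0e6);   /* about 15.9 MHz */
    return 0;
}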

So what do I want already?

I want to be able to sit at my computer and design a circuit that uses a whiz-bang IC and then I want to click PRINT and have the circuit pattern ready to go. Then I want to GLUE the IC and other parts down to the metallic pattern printed on paper, fire it up and watch it blink or glow or do whatever it needs to.

And I don’t want the whole process of printing and gluing and firing it up to take more than 5 minutes, because that is really about as long as I can stand to wait.

And if I had that, I could make all kinds of things, like radios, and robots and glasses to see Dimension-N, just by printing and gluing and firing them up.

And I don’t want this just for me or my friend, I want it for everybody. I don’t want to build a million dollar lab and make a hundred thousand dollar machine.

I want a hundred dollar machine and two dollar glue.

Then I can live in a world that we can make a better and more interesting place.

Wednesday, July 09, 2008

Sunlight at the End of the Tunnel


Here is the performance of a certain solar cell company as seen from Google Finance. Solar is up 1157% over a 34-month period. "Solar has tipped", to quote Malcolm Gladwell. I have put this chart at the end before, but nobody reads that far anymore.




Sometimes I like to look at what the market is doing to see what collective wisdom is currently in force. This lets me see if what I’m thinking is what the rest of the world is thinking. I would just hate to be out of step. Here is a graph of fossil energy stocks for the past 34 months. Up about 50% on average. Not quite like solar.




Energy is in blue. US industry is in red. The value of energy is rising faster than the dollar or the value of industry. There are 334 energy companies. Over half of them, 186, are worth a billion dollars or more. One company, Exxon, is worth half a trillion dollars. I took the time to add up their values. Their combined value is 4.13 trillion dollars. If you gave 4 million people a million dollars each, what might they do? At least be quiet, right?

Now let’s look at what the market says about coal.
Maybe it will say, "Say No to Coal", and we can take our yard signs down.



Nope. The market says coal is increasing in value more rapidly than gold. Here is gold over the same period:





There are 22 coal companies with a combined value of 103 billion dollars. One company, Peabody Energy, is worth 21 billion dollars. Guess we better leave the yard signs up. But there is one piece of good news. Solar cells are rising faster than gold or coal or most anything, over 1000% in the last two years. Don't you wish you had been in that?


Tuesday, July 08, 2008

PC Security Checklist


As more activities migrate to personal computers, system security becomes a greater concern. Threats to PC security include viruses, Trojans, worms, phishing schemes, buried processes and distracting scams. This note is Wintel-centric but much of it applies to Mac and Linux boxes as well. It addresses five categories of personal computing security.

A) Physical and Site Security - Routers and Locks

The web connection coming into your house is just another sewer pipe. Treat it accordingly. Use a router, lock it down.

1) Avoid connecting your DSL or cable modem directly to your computer. Instead, isolate your IP address by placing a router between you and the outside world. This also gives you additional ports that you can control access to and from. A router makes it difficult for an outsider to see your IP address (your internet phone number) or your MAC address (your hardware's unique identifier).

2) Install your router where you can see it. Control physical access to it.

3) Change your router name and password to something besides admin, admin.

4) Change your router IP address to something other than 192.168.1.1. Your browser will remember the new address. The router can be restored to its default address with a factory reset, but not without physical access.

5) The internet is NOT ham radio. Goodwill, Character and Integrity do not apply as in the licensed arts. Use 128-bit WEP or better encryption. Any device that connects to my router (the internet equivalent of a repeater) must have permission.

B) Soft Security - Anti-virus Software

You can do everything right and still get infected.

1) Install good anti-virus software. I currently use McAfee because it comes free with my Scottrade account and I can run three legal copies of it on other computers in the household. I have used Norton, but it costs too much, expires frequently and hogs system resources. I really like the free AVG software. It is excellent and they don't try to elbow out everything else. Computer Associates gives you a free trial and then makes uninstalling a total nightmare. This goes for several other packages. If a vendor doesn't provide a clean uninstaller, don't use them, because THEY are a virus.

2) Use firewall software. Insert exceptions for required sites and services like Echolink.

C) Email Security

Scan inbound and outbound email and attachments using anti-virus software.
1) Don't open attachments from people you don't know.

2) Google gmail allows you to report items as spam. Use it.

3) Report fraud and phishing emails to their respective agencies, including the ISP, PayPal, eBay, FBI and Attorney General. Some eBay frauds have been really authentic looking. Check for spoofed URLs before responding.

4) Keep a primary email account, and route all other email accounts to and
from it. This is for convenience as much as security.

D) Browser Security - Plug-ins and Spyware

Try Firefox 3.0 or later. It is multi-platform, open source, and accountable.

1) McAfee red-lights troublesome web destinations, including bad ham radio destinations, which are rare. I average 490 searches a month, so this is quite handy. Other products also do this. Do not let anyone or anything obstruct your access to good information.

2) Don't use products (e.g. Real Video) that monopolize services such as video display and attempt to be the end-all. If you give them your name and address you will get on "some list". Some lists go everywhere. Some programs will leave background processes running to report back to the mother ship. Besides invading your privacy, these make browsing and computing slow. AT&T Yahoo DSL is notorious for filling your PC with wasteful market-driven processes. They have destroyed the quality of many a newcomer's experience by marketing them to death. Too many choices.

3) Use Google SafeSearch to avoid sites that are a frequent source of viruses. Otherwise your computer will get sick. It's karma.

4) Use Microsoft AntiSpyware. Forced by their own losses to develop this product, it works and it's free. It is fairly lightweight, process-wise. Enable the auto-download, but require them to ask permission to install. Keep track of what they are adding to or subtracting from your computer. Their track record requires them to be supervised.

5) Avoid illegal download sites for music, videos, or software. Your computer will get sick. More karma. Why steal? You will have to make a list like Earl.

E) Kid Security

"Little eyes, watch what you see..."

1) Put kid computers in a public place like the kitchen.

2) Check your kids' browser history, chat, IM, and Skype often.

3) Facebook trumps MySpace, but not by much. Check online friends and memberships often.

Conclusion

We live in the age of hot and cold running knowledge. Anything that obstructs access to this knowledge is a loss of freedom.

We also live in the wild west of the information age. Forewarned is forearmed.


L. Van Warren - AE5CC

Wednesday, April 09, 2008

Antenna Gain


Gain patterns can be drawn for microphones, radio antennas and light reflecting from surfaces. They are both informative and beautiful.
The following images show the gain of a certain "wideband" herringbone antenna as frequency increases. Gain is simply the sensitivity of the antenna to a signal in a given direction.

When you tune a radio, you are selecting which frequency you want to listen to. But your antenna has to cooperate by being sensitive to the frequency of that station, its location, and how the signal bounces off the sky, land, water, trees, mountains and buildings.

So to begin we tune to 1.0 megahertz on our radio dial. In the pictures that follow we will increase the frequency on our radio dial by a factor of ten with each click. That makes for pretty big jumps. I hope to animate the in-betweens soon. There are so many variables that one must decide what to show first. In the meantime here is a keyframe warm-up starting at the promised 1 MHz. Captions are below the images.
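Patterns like these come straight out of formulas. As a warm-up far simpler than the herringbone, here is a sketch of my own (an idealized half-wave dipole in free space, not the antenna simulated above) that prints the classic normalized power pattern G(θ) = [cos((π/2)·cos θ) / sin θ]²:

/* Sketch: normalized power pattern of an ideal half-wave dipole,
   printed every 10 degrees from the antenna axis.  Illustrative only;
   the herringbone patterns below came from 4Nec2, not from this. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double PI = 3.14159265358979;
    for (int deg = 10; deg <= 170; deg += 10) {
        double th = deg * PI / 180.0;
        double g  = pow(cos(PI / 2.0 * cos(th)) / sin(th), 2.0);
        printf("%3d deg  gain = %.3f\n", deg, g);   /* 1.000 broadside, near 0 off the ends */
    }
    return 0;
}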

You Say Tomato
1 MHz - Radially symmetric pattern, more gain at top than bottom.

I Say Potato

10 MHz - More gain at the ends than the middle.


The Edges of Lambda

100 MHz - Nature is more beautiful than I can imagine.



Butterfly Spectacular

1000 MHz - Think about this next time you tune a radio.

The last picture is around the frequency of cellphones and some cordless phones. But their antennas actually have blobby radiation patterns like the first example. Can you think why that might be so?



Friday, March 28, 2008

Copy Number Variation: The Next Big Thing


Copy number variation (CNV) is an important issue in genetics.

It has a beautiful mathematical notation suggestive of text processing:

Huntington’s chorea produces dementia that does not appear until middle age. It is caused by the presence of too many CAG repeats. What was I saying? Oh yes:

CAG repeats involve three of the four DNA bases, Cytosine, Adenine and Guanine:


If there are too many CAGs in succession on the short arm of chromosome 4, that individual will develop Huntington’s. Period. Unlike Huntington's, which is caused by repeats, there are other diseases caused by single point mutations. Recent genome studies have focused on these mutations, called SNPs and pronounced “snips”, which stands for Single Nucleotide Polymorphisms. This is a fancy word for one letter of DNA being substituted for another. A bug in the code, as it were.
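To make the text-processing flavor concrete, here is a small sketch of my own (the sequence is a made-up fragment, not real genome data) that counts the longest run of back-to-back CAG triplets, which is essentially the quantity a Huntington's repeat test has to measure:

/* Sketch: find the longest run of consecutive CAG triplets in a sequence.
   Illustrative only; real genome data is far messier than this string. */
#include <stdio.h>
#include <string.h>

static int longest_cag_run(const char *dna)
{
    int best = 0;
    size_t n = strlen(dna);
    for (size_t i = 0; i + 3 <= n; i++) {
        int run = 0;
        size_t j = i;
        while (j + 3 <= n && strncmp(dna + j, "CAG", 3) == 0) {
            run++;
            j += 3;
        }
        if (run > best) best = run;
    }
    return best;
}

int main(void)
{
    const char *seq = "TTCAGCAGCAGCAGGAATTCAGCAG";          /* made-up fragment */
    printf("longest CAG run: %d\n", longest_cag_run(seq));  /* prints 4 */
    return 0;
}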

If you live in a world like I do, where the internet is a connected series of pipes, here is how to cure someone of sickle cell anemia, a notable SNP-caused disease.


In a text editor, like “vi”, edit chromosome11.txt:


1) Find the line containing the beta-globin gene.

:/beta-globin

2) Code for glutamate instead of valine.

:s/GTG/GAG/

3) Save changes and exit file.

:wq

This single SNP is responsible for all human suffering in sickle cell anemia, but it also confers protection against malaria, so there is an upside.

Hemoglobin is the protein in red cells that enables oxygen transport from the lungs to the rest of the body. It tiles in four-unit pillows called hemoglobin tetramers.

When the DNA recipe/gene that codes for hemoglobin is altered by a single letter, the hemoglobin forms rigid rods, polymerizing like plastic, which you can show is what people are made of.

This causes the red cells to look like a tent with a pole sticking in the wrong place. These red cells get stuck in capillaries and cause great suffering. But many diseases caused by SNPs have already been identified as such. SNPs were the “low-hanging fruit” of discovery.

In the long term Copy Number Variation will turn out to be the next big thing, the next frontier. It is already yielding results. You heard it here first.

NOTES:
1 - Click here for more about CNV and on the images for background.
2 - One can observe an easier-to-parse notation for the first figure: ABCD, AB2CD, AB3CD, ABC4D3(CD), A(CB)D, the last being “inversion”.
3 - Database of Genomic Variants
4 - Visigene

Thursday, November 08, 2007

That 20-minute delay.

Temporal correlation between Apple and market siblings.
(black bars show synchronization)


That little issue of the 20-minute delay in stock prices is a very interesting one.

I am interested in time, maps and relationships. My computer clocks are synchronized with network clocks that use atomic standards to keep accurate time.

The markets slid yesterday. I checked them first thing today to see how they would respond at opening. I was surprised to find that market reports that I thought were real time are not.

finance.google.com offers an interactive market graph.
As of 9:21 Eastern time, no data.

scottrade.com offers market data to subscribers, which I am,
As of 9:25, no data.

excite.com offers a thumbnail survey that claims a delay,
As of 9:30 Eastern time, no data. Curious.

Is there a twilight between market change, and the rest of the world finding out about it?

Does this delay provide an advantage to the real time trader?

The image above is from an exhaustive numerical analysis to determine the connectivity of various stocks.

Electronic signals travel at nearly the speed of light.

An electronic signal, such as a stock price, can circle the earth about 7.5 times each second.

A fast video gamer can react in less than a tenth of a second.

A computer can react in about a microsecond.

I would wager, that until that 20 minute delay is removed, some people with access to real-time signals and computers will continue to profit exorbitantly off the losses of others.

In trading terms, a half hour is an eternity. It is the chasm between rich and poor, just and unjust, leading and following.


Apple vs. AMD tracking over a year.

Friday, September 14, 2007

Forte! The Loudest HF Radio Source



I finally got my program going for visualizing celestial radio sources. It was one of those stay-up-all-night-'cause-it's-too-exciting-to-sleep kinda deals. The push came at the end of a week-long programming binge. Now it's rehab time.


Here are some preliminary results, seen from several points of view. In these images, the size of the blob is the loudness of the source. The color of the blob is the spectral index. Purply blobs are blue-shifted "hot" sources. Orange-red blobs are red-shifted and thus cooler, in the radiant sense.


To give you a sense of this, here is Andromeda, by its lonesome:

Here it is hanging with its radio friends:


Some friends are louder than others:

Turns out this source is near a couple of black holes in the catalog.
One is "feasting" the other is "spewing". Quite the cosmic party.



I find that very exciting...

You can too. The Google Earth kml file may be downloaded here.

- Van

L. Van Warren MS CS, AE
web wdv.com
FCC AE License AE5CC

"Slow is Fast and Fast is Slow"

Wednesday, September 12, 2007

Third Rock: Get Your Galactic Freak On

Made some more progress getting the Ukrainian RF data loaded into Google Earth.
Many of the glitches are gone and I am getting better coregistration.
Here is an example of a “possible” optical correlation with a radio source.
In Galaxy Zoo this would be classified as an elliptical galaxy. Perhaps it isn’t!


I still have to size the radio-emitting sources by their flux, which is a little harder due to the grammar, but is coming into reach. There are a few really loud sources. I can’t wait to get my freak on them. But at 5:30 am, it’s probably time to go to sleep! This is so exciting, it is literally making me sick to my stomach. I am having a lot of anxiety with each step. This is very powerful stuff. Leverage. And it is so new. No eyes have ever seen this massive correlation.
Don’t know who to tell really, so you guys are it! I will release a layer soon and you can load it into Google Earth yourself!

All the red spots below are radio sources.



All the best,

- Van

L. Van Warren MS CS, AE
web wdv.com
FCC AE License AE5CC
Slow is Fast and Fast is Slow

Tuesday, September 11, 2007

Second Look: Visualizing Radio Sources

This evening's work consisted of rendering the sources as point emitters, instead of envelopes.

Here is the sky before and after visualizing radio sources:

Before:




and After:


The dark band appears to be a gap in the data.

Next tasks are to render sources by their strength or apparent radio brightness, and clear up gaps in the data.

- Van

Sunday, September 09, 2007

First Light: Windows To A New View of Space

There is a saying in telescope construction: “First Light”.

It is at that moment when one finds out if all the work will have a reward. It can be so much work. Just now, while a retrospective of Pavarotti ran on CBS, my radio telescope mapping program got its first light. I am visualizing the Ukrainian UTR-2 Radio Source Catalog.

Here are two pictures showing the sky as we used to see it, and the radio sky as we can now see it.
This is very rough, and maybe even wrong, but it shows that all the pieces can talk to each other, and that is the hard part.

Here is what the sky looked like before:


Here is “after” the first light. Each red window is a radio source. I promise to improve this, but it’s a start. There are 12,000 observations from over 2000 galaxies, quasars and galaxy clusters emitting on frequencies you can hear on a shortwave radio (10 - 25 MHz). Hopefully we can do that too!


It is so exciting! Thanks to all the astronomers in the Ukraine and the engineers at Google who made my work visualizing the UTR-2 catalog possible.



Rosette Nebula with Ukraine 10-20 MHz Radio Windows

Saturday, September 08, 2007

Zero Point: Visualizing 16.7 MHz Celestial Radio Sources:


This evening was spent processing the Ukraine Radio Telescope Database and extracting celestial coordinates, spectra and signal strengths for 16.7 MHz celestial RF emitters. Results were obtained for the following numbers of objects:

Quasars: 118
Galaxies: 142
Galaxy Clusters: 15

Subtotal: 275

Other Grade A Emitters: 749 // high quality measurements
Other Grade B Emitters: 596 // medium quality
Other Grade C Emitters: 510 // lower quality


Subtotal: 1855


Grand Total: 2130 celestial radio sources

This required many computational operations, data verification and conversion passes. Scripts were written that convert the raw catalog into a consistent, normalized and usable form.

Now that the database is ready, I intend to plot these radio sources in Google Sky so we can “see” their distribution and galactic neighborhoods. M31, Andromeda, is my focal point, but I’m very excited to have many sources to consider. Color=Frequency, Size=PointingRolloff, Brightness=Flux. It will take me a few days to produce the KML channel. Unix makes cleaning and sifting through the data much easier. This will help us know where to point our software radio antennas. Wish me luck.
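For the plotting step, the main wrinkle is that Google Sky expects celestial coordinates folded into ordinary longitude and latitude fields. Here is a minimal sketch of my own, assuming the documented sky-KML convention of longitude = right ascension in degrees minus 180 and latitude = declination:

/* Sketch: emit one Google Sky placemark for a radio source.
   Assumes the sky-KML convention longitude = RA(deg) - 180, latitude = Dec. */
#include <stdio.h>

static void sky_placemark(const char *name, double ra_deg, double dec_deg)
{
    double lon = ra_deg - 180.0;     /* fold RA into the -180..180 range */
    double lat = dec_deg;
    printf("<Placemark><name>%s</name>\n", name);
    printf("  <Point><coordinates>%f,%f,0</coordinates></Point>\n", lon, lat);
    printf("</Placemark>\n");
}

int main(void)
{
    /* M31 (Andromeda): RA about 10.68 degrees, Dec about +41.27 degrees */
    sky_placemark("M31", 10.68, 41.27);
    return 0;
}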

Saturday, May 12, 2007

A Switched Variable Capacitor

Lately I have renewed my interest in wideband radio. The question is, what is possible now?

Turns out, it’s all in the amplifier.

Texas Instruments makes a very interesting wideband radio frequency (RF) amplifier. It is called the THS3202. There is a saying in radio, “DC to daylight”, that describes this kind of part. The 3202 goes from 100 kHz to 2 GHz, not quite daylight, but pretty close.


THS3202 Wideband RF Amplifier -- Texas Instruments

That is sixty-five times more bandwidth than the Hammarlund of the previous article. That’s a lot. The 3202 weighs much less than a gram. That’s a little. You can see it in the center of the board above.

Here is a circuit showing the THS3202 in action:



3202 Amplifier in Action -- Bruce Carter TI

Finding the Needle in the Haystack

There are three ways of teasing a given signal out of the haystack of broadcasting and noise.

1) Resonate incoming RF through LC tank circuit to select a specific signal.

2) Multiply all incoming RF by a frequency you choose and generate.

3) Bandpass filter all incoming RF to amplify the desired frequencies and quench the rest.

Most radios use methods 1 and 2, heterodyning, to demodulate the incoming signal. Here we focus on method 3, homodyning, made possible by advances in amplifier technology...

Filtering

Trump, Stitt and Bishop of Texas Instruments have created a program called FilterPro™ that enables the accurate specification of filter banks for a wide range of frequencies. Low-Pass, High-Pass and Band-Pass filters can be designed using Butterworth, Bessel, Chebyshev and other common filter paradigms. FilterPro™ allows cascading stages of bandpass filtering that use identical components, except for the filter capacitors. As a quick example, consider the following two-pole filter centered at 500 MHz:

Two Pole Bandpass Filter created using FilterPro

An iterated use of the program produces the following range of component values in a five-pole bandpass filter:


Capacitor Values Required for Wideband Filter Tuner


Five stages of the filter shown above enable a very wideband filter to be created that can serve as a radio tuner. From the graph we can see that the bandpass filter not only filters, it amplifies the signal of interest by 40 decibels (dB), or a factor of 10,000.

A Switched Variable Capacitor

“Back in the day” variable capacitors with consecutive ganged stages were used to address the tuning problem. They looked like this:


Air Variable Capacitor from Wikipedia

Modern versions of such capacitors have a range of capacitance from 15 pF to 384 pF. Reaching 15,000 pF would take about 39 of them in parallel, so synchronizing ten instances of 15,000 pF each would require roughly 390 such units.

The trouble is that for each section of our five pole bandpass filter, we required two capacitors whose values can range from 1 pF to 15000 pF, that is, five decades of variation. A coordinated five pole array requires ten wideband variable capacitors operating in near-perfect unison. Accomplishing this with combinations of mechanically ganged variable capacitors is not practical.

This paper explores the creation of a variable wideband capacitor constructed using arrays of surface mount capacitors and solid state relays.

Novacap Capacitors and Teledyne RF Relay

The questions are:

· What topology of array?

· What form of switching?

A useful design would require low losses: clearly the equivalent series resistance (ESR) of the switched capacitor should be as low as possible to maintain the idealized properties of a capacitor.

Two solutions come to mind. Both give rise to interesting mathematical analysis. We will start with the simplest first.

Point of Clarification

Note: The term “array” will be used to talk about the ensemble of capacitors used to create a single variable wideband capacitor. The term “bank” will be used to talk about a synchronized set of such variable wideband capacitors.

A Simple Decade Array

Consider an abacus-like arrangement:

A Switched Variable Capacitor from 0 – 99,999 pF with 1 pF Steps

This capacitor would require five columns, one for each of the five decades. Each column would require 9 capacitors to represent the digits in each decade. The result would be a variable capacitor of 0 – 99,999 pF in 1 pF steps. Now obviously lead capacitance would make absolute values below 5 pF difficult to obtain, but increments of 1 pF could be realized with this device. This array requires 45 capacitors and 45 switches to accomplish the task. With surface mount technology it would be the size of a postage stamp.

The question is finding those values of capacitance.
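Finding which capacitors to switch in is just a digit-by-digit decomposition of the target value. A sketch of that bookkeeping (my own illustration, ignoring lead capacitance and tolerance):

/* Sketch: decompose a target capacitance in pF into decade digits.
   Digit d in the 10^k column means switching in d units of that column
   (or one capacitor of value d * 10^k pF, depending on the layout). */
#include <stdio.h>

int main(void)
{
    int target_pf = 4702;                         /* example value */
    int decade[5] = {10000, 1000, 100, 10, 1};
    int remaining = target_pf;

    for (int i = 0; i < 5; i++) {
        int digit = remaining / decade[i];
        remaining -= digit * decade[i];
        printf("%5d pF column: switch in %d\n", decade[i], digit);
    }
    return 0;
}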

Monday, April 16, 2007

My Favorite Radio



-- photo courtesy Universal Radio

This is my favorite radio ever. It is the Hammarlund HQ-140-X shortwave receiver. It was made between 1953 and 1955, one year before I was born. When I was about thirteen I received one of these radios as part of my ham-radio pursuits of the time.

I remember listening to Canadian, South American, English and European broadcasts on it late at night when the reception "on the skip" was good due to ionospheric conditions. I heard Al Green on the AM band, telegraph coding both manual and automated, and strange warbling signals that are quite memorable. I have always wondered if these were pulsars or binary stars, but perhaps that is a bit fanciful.

Sunday, April 15, 2007

Notes on Scanning

I wanted to make a few notes about scanning transparencies and prints.

Epson makes a scanner called the V750. It's pretty cool. It has a built-in light box for transparencies in the lid.


In addition to the usual fare, you can scan transparent positives and negatives up to 8" x 10" at resolutions up to 6400 dpi.

At 6400 dpi, each square inch of the image is 41 Megabytes.

For a 4" x 5" Hasselblad positive or negative transparency this works out to about 819 Megabytes per image. Just remember "4x5 is a gigabyte".
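The arithmetic behind that rule of thumb is easy to automate. A quick sketch of my own, assuming 8 bits per pixel, which is what the 41 megabytes per square inch figure above implies:

/* Sketch: uncompressed scan size = width_in * height_in * dpi^2 * bytes_per_pixel. */
#include <stdio.h>

int main(void)
{
    double width_in = 4.0, height_in = 5.0;   /* 4" x 5" transparency */
    double dpi = 6400.0;
    double bytes_per_pixel = 1.0;             /* 8-bit greyscale assumption */

    double bytes = width_in * height_in * dpi * dpi * bytes_per_pixel;
    printf("%.0f MB\n", bytes / 1.0e6);       /* about 819 MB, the figure quoted above */
    return 0;
}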

If you have many film products to scan you may want to purchase an additional external hard drive that you can throw in the car and drive to your print fulfillment vendor. A 400 Gig USB 2.0 drive is $139 at this writing, but you might want to go bigger. Uploading gigabyte images is tedious at best.


Preparing Your Machine and Workspace

You will want to max out your machine with RAM. RAM is cheap compared to your time.

You will want to make sure that your machine supports USB 2.0. Dedicate a table to the scanner so that your workflow is convenient, especially if you are working in a team environment. People need room to move and think. You might also want to have a light box and a film loupe so you can review your materials prior to scanning.


A can of compressed air is useful for blowing the dust and lint off of film materials. I highly recommend running a dust remover like the Oreck or Multi-Tech in the room where you are handling film. Also wear inexpensive cotton gloves. Fingerprints contain acids that will deteriorate your materials long-term.


Connecting the Scanner

The vendor provides disks with various drivers and utility software. Plug the scanner into a UPS power strip to protect its delicate circuitry and connect the USB to the back of your machine where it will be out of the way.

Running the Scanner

After the scanner is connected and configured correctly, run some scans at LOW resolution. There is usually a preview mode but keeping the "dpi" low will let you get your feet wet without using large amounts of scanning time or disk space. After you get your workflow down, you can crank up the resolution. I suggest 400 dpi just to see how things are looking. You can also rehearse the corrections you want to make in brightness, contrast, etc without running into space or time problems.

After you are happy with your process, double the dpi to 800 and make sure everything is OK. You can continue doubling until you reach the peak resolution of the scanner; 6400 dpi is four doublings from 400. Each doubling roughly quadruples the time and disk space, so plan on the full-resolution scans taking a few hundred times as much time and disk space as your early fast runs, which should take less than a minute.

Monday, October 23, 2006

Numerical Gold: A Certain Series

Consider the series:

1, 1, 2, 3, 12, 20, 300, 525, 1960, 49392, 1481760, 5821200, 164656800, 336370320, …

Do you, offhand, recognize the generating function? This is a special series.

These are the inverses of the last solution component xₙ for the Hilbert matrices of order 1, 2, …, 14.

These matrices are notorious for being ill-conditioned; they are solved here symbolically using Maxima and rational arithmetic.

Maxima notebook is below.

The solution is to the n x n linear algebra problem [H]x = b.

To reproduce these numbers, or larger ones, one can edit the file with a text editor and then drag it into the Maxima window. The last number required 2 minutes to generate on my new dual-core. To change the order of the Hilbert matrix, just change the order variable at the top of the file. It is currently 4. So a point of curiosity really. The numbers become extremely expensive to find, growing at least as the cube of n times a large constant. So to me they are a kind of symbolic gold. The 100th number for example, is probably not knowable at the current time, but that is speculation on my part. You may notice some functional relationship that allows their simple generation thus “Cracking the Hilbert Code”.

For example

12 = 4 * 4 - 4
20 = 5 * 5 - 5
300 = 20 * 20 - 100
525 = 25 * 25 – 100
and so on.

Ref: The Code

load("eigen");                                          /* provides columnvector */
order : 4;                                              /* order of the Hilbert matrix */
X : columnvector(makelist(concat(x,i), i, 1, order));   /* unknowns x1 .. xn */
h[i,j] := 1/(i + j - 1);                                /* Hilbert matrix entries */
Unity[i,j] := 1;                                        /* right-hand side of ones */
A : genmatrix(h, order, order);
A . X;
B : genmatrix(Unity, 1, order);
A . X = B;
Ap : triangularize(A);                                  /* row-echelon form of A */
Ap . X = B;
App : invert(Ap);
App . B;

Friday, May 19, 2006

A Number Theory Problem and the Flexible Base Machine



Summary

A common problem in number theory is addressed using two methods of brute force search. The first search method is implemented as a ‘C’ program. The second search is performed in a spreadsheet. A generalized form of counting is developed to enable generation of the range indices for the second search method. This generalized form of counting takes the form of a flexible base machine.

The flexible base machine is then used to generate the indices for a Cartesian product search. In the process, two observations are made, and an answer to a question posed by James Watson is suggested. The number problem is similar to bin-packing problems. Progressive abbreviation, chunking, and grouping are related ideas.

Preamble

I want to bring a couple of ideas to the front before we embark on our little number problem journey. Try not to let their simplicity distract you. Suppose the following pattern is observed:

{ 101 blah blah blah 101 blah blah blah … }

Let the shorter token A abbreviate 101 in the pattern above. Substitution produces:

{ A blah blah blah A blah blah blah … }

Then let the shorter token B abbreviate blah blah blah in the above pattern to produce:

{ A B A B … }

Continuing this abbreviation progressively with C abbreviating A B yields:

{ C C C … }

and we have progressively abbreviated to saturation. Changing gears a little, consider a number in base 10, spoken as "one hundred and one", written as 101, with an optional subscript indicating the base.

This number is the dot product of two vectors, a basis vector and a coefficient vector. Reading right to left the basis vector is:

{ …, 100, 10, 1}

which can also be rewritten:

{ …, 10², 10¹, 10⁰}

and the coefficient vector is:

{ …, 1, 0, 1}

the dot product is:

1 × 10² + 0 × 10¹ + 1 × 10⁰ = 101₁₀

where again, the subscript indicates the base.

We can run the same process again in base two, changing the basis vector to base 2:

{ …, 4, 2, 1}

which can also be rewritten:

{ …, 2², 2¹, 2⁰}

and the coefficient vector is:

{ …, 1, 0, 1}

the dot product is:

1 × 2² + 0 × 2¹ + 1 × 2⁰ = 101₂ = 5₁₀

where the subscript indicates the base and superscripts indicate powers of the basis coefficients.

Now consider the process in base Q:

{ …, Q², Q, 1}

which can also be rewritten:

{ …, Q², Q¹, Q⁰}

and the coefficient vector is:

{ …, 1, 0, 1}

the dot product is:

1 × Q² + 0 × Q¹ + 1 × Q⁰ = Q² + 1

where the result is independent of base.
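In code, "a number is the dot product of a basis vector and a coefficient vector" is a single loop, and it works the same whether the basis is {100, 10, 1}, {4, 2, 1}, or something stranger. A small sketch of my own:

/* Sketch: value of a coefficient vector against an arbitrary basis vector. */
#include <stdio.h>

static int dot(const int *basis, const int *coeff, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += basis[i] * coeff[i];
    return sum;
}

int main(void)
{
    int base10[] = {100, 10, 1};
    int base2[]  = {4, 2, 1};
    int coeff[]  = {1, 0, 1};

    printf("%d\n", dot(base10, coeff, 3));   /* 101 read in base 10 */
    printf("%d\n", dot(base2,  coeff, 3));   /* 101 read in base 2 is 5 */
    return 0;
}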

What happens when Q is -2 or the imaginary number i?

But first, the problem at hand.

Introduction

My daughter and I are at a high school math fair. A seemingly simple problem is written on a file card. The problem is:

Choosing from {16, 17, 23, 24, 29}

Show how you can make a sum of exactly 100.

There are two parts to this development. An elementary part and a subtle part.

Elementary Part:

There are several interesting things about this problem:

a) To proceed, the solver must infer that multiple instances of a given integer are required. What enables this to be obvious to one and opaque to another? Instinct? Training? Pattern Recognition?

b) How much time is required to find all solutions by hand?

c) If trial and error is used, how many trial solutions exist?

d) How many actual solutions exist?

e) Do rearrangements exist that make the problem easier to solve? Recall the famous incident when Gauss was asked by his teacher to add the first 100 integers. He did so by pairing 1 with 100, 2 with 99, etc., recognizing fifty such pairs. This pairing transformed a long addition into a short multiplication: 50 x 101 = 5050. The insight was not merely numerical. It was pattern recognition and viewpoint.

Development

As a professional mental defective with an ‘f’ and no ‘t’, I imagined solutions of the form:

1) a x + b y + c z + d w + e v = 100

Reversing the list, the specific problem would look something like:

2) a 29 + b 24 + c 23 + d 17 + e 16 = 100

Now to enumerate trial solutions in an orderly way, the coefficients a thru e must be bounded from above and below. To first order, trial values of a would vary from 0 to k such that

3) a 29 <= 100

4) k = floor(100/x) = floor(100/29) = 3

where we agree that floor rounds down to the nearest integer. We might limit a further by considering the likely participation of other integers solutions from our set. That is not done here, but I suspect most mathematicians do this instinctively.

So we replace the coefficients a thru e in equation 2 by the intervals computed using the method of equations 3 and 4. Representing the interval for each coefficient a thru e in square braces [0-k], we obtain:

5) [0-3] 29 + [0-4] 24 + [0-4] 23 + [0-5] 17 + [0-6] 16 = 100

The number of trial solutions is size of the five-dimensional Cartesian product. The first estimate of this number was:

6) 3 × 4 × 4 × 5 × 6 = 1440

but the answer is defective, consistent with the ‘f’ above. Checking the interval [0-k] requires k+1 values, as any ‘C’ programmer will quickly tell you. Adding one to each upper bound makes nearly a threefold difference in the product. The non-defective answer is:

7) 4 × 5 × 5 × 6 × 7 = 4200

and we have answered question c. There are 4200 trial solutions.

The amount of time required to generate all trial solutions is estimated as follows:

Assume these calculations are performed on an infix calculator at the rate of one keystroke per second. Trial solutions of equation 2 require four adds and five multiplies each. Each add requires six keystrokes and each multiply five, because all adds involve pairs of two-digit numbers and all multiplies involve one-digit numbers times two-digit numbers. The ‘+’, ‘x’ and ‘=’ operators require one keystroke each. Adding it all up:

8) time per add: 6 seconds

9) time per multiply: 5 seconds

10) time per trial solution: 4 × 6 + 5 × 5 = 49 seconds

11) time for trial solutions: 49 × 4200 = 205,800 sec (57 hours 10 minutes)

Fifty-seven hours requires the patience of David Blaine who at this moment has been suspended in a spherical tank of water for nearly a week. My first impulse was to throw this problem into a spreadsheet to see how it looked. David Blaine looks unexpectedly round due to the spherical nature of the tank. Using a spreadsheet also led to unexpected observations, which motivated this busy little development. Idea! How about a quick ‘C’ program to find all the solutions? The program took 5 minutes to write, 5 minutes to debug and 0.03 seconds to run. That is 343 times faster than doing it by hand for the first run and 6.9 million times faster for subsequent runs. A computer makes one at least 343 times more productive than a magician floating in a tank of water, clever though he is. For completeness, the ‘C’ code for finding the solutions follows:

#include <stdio.h>

int main(void)
{
    int iteration = 0;
    int i, j, k, l, m;
    int a[5] = { 3,  4,  4,  5,  6};   /* loop bounds for each coefficient */
    int x[5] = {29, 24, 23, 17, 16};   /* the integers to choose from */

    for (i = 0; i < a[0]; i++)
      for (j = 0; j < a[1]; j++)
        for (k = 0; k < a[2]; k++)
          for (l = 0; l < a[3]; l++)
            for (m = 0; m < a[4]; m++)
            {
                if ((i*x[0] + j*x[1] + k*x[2] + l*x[3] + m*x[4]) == 100)
                    printf("%4d: %2d %2d + %2d %2d + %2d %2d + %2d %2d + %2d %2d = 100\n",
                           iteration, i, x[0], j, x[1], k, x[2], l, x[3], m, x[4]);
                iteration++;
            }
    return 0;
}

This answers question d.

The run yielded three solutions, where iterations are numbered from zero:

iteration 26: 0 × 29 + 0 × 24 + 0 × 23 + 4 × 17 + 2 × 16 = 100

iteration 513: 1 × 29 + 0 × 24 + 1 × 23 + 0 × 17 + 3 × 16 = 100

iteration 750: 1 × 29 + 2 × 24 + 1 × 23 + 0 × 17 + 0 × 16 = 100

Placing this in tabular form:

Solution   Obtained on Iteration   Value              Equiv. Hand Cranking
0          26                      {0, 0, 0, 4, 2}    24 min: 45 s
1          513                     {1, 0, 1, 0, 3}    7 hr: 50 min: 15 s
2          750                     {1, 2, 1, 0, 0}    11 hr: 27 min: 30 s


Reversing the coefficients finds the first solution more quickly, a useful trick to remember.

Solution   Obtained on Iteration   Value              Equiv. Hand Cranking
0          20                      {0, 0, 1, 2, 1}    18 min: 20 s
1          673                     {2, 4, 0, 0, 0}    10 hr: 16 min: 55 s
2          734                     {3, 0, 1, 0, 1}    11 hr: 12 min: 50 s



These solutions occupy a five-dimensional discrete counting space, from a vector point of view.

We have answered question b. This leaves questions a and e. Question a dealt with how different people solve the same problem and question e dealt with rearrangements of the problem that speed solution. Note that even though one finds all solutions within eleven hours and change, one must still evaluate all the trial solutions to prove that no others exist.

There is a certain irony to note. I spent about as much time with this problem as that required by manual brute force search. However, subsequent problems of this type would be solved faster because the pattern is known. This last remark addresses question a. There are two people in this solution, the one before the pattern is recognized and the one after.

Rearrangements

Refreshing our mental image of the problem, we look for rearrangements.

Choosing from {29, 24, 23, 17, 16}, show how you can make a sum of exactly 100.

Question a implies a follow-up. What mental refresh rate is required by mathematicians vs. non-mathematicians? Are numbers stored visually, aurally, linguistically, some other way, or all of the above? When James Watson, the co-discoverer of DNA’s double helix, was asked by newsman Charlie Rose what question he would like to know the answer to, Watson replied, “Where [in the brain] is the number four stored?” In the meantime, question e asks if there are rearrangements that make the math fair problem more tractable. We have:

11) a × 29 + b × 24 + c × 23 + d × 17 + e × 16 = 100

In this instance of the problem, three of the given integers are odd and two are even, a condition that would vary in any generalization. Exploiting this idea nonetheless, we expand to:

12) a ( 20 + 9) + b (20 + 4) + c (20 + 3) + d (10 + 7) + e (10 + 6) = 100

There is no factoring (obvious to me) that simplifies this equation, so its solution retains its brute force search nature. We live in an age where brute force search is enabled by computers. This has been exploited in symbolic manipulation. It is exploited here also.

Subtle Part:

Finding the solution using a ‘C’ program is simpler and faster than using a spreadsheet. Looping through the Cartesian product makes setting the bounds on the search intervals easy. However, it was in building the machinery to handle the spreadsheet version that two interesting observations were made. Is this because the intermediate steps were explicitly visualized?

As developed in the preamble, it is common practice to represent numbers using placeholder notation. Placeholders are just linear combinations of successive powers of a given base. So ordinary counting bears a subtle and interesting similarity to our math fair problem.

13) In base 10, the number 101 means 1 × 10² + 0 × 10¹ + 1 × 10⁰ = 101.

14) In base 2, the number 101 means 1 × 2² + 0 × 2¹ + 1 × 2⁰ = 5.

The same argument works for any base. The original problem was solved by searching linear combinations of integers that satisfied the given equation. Behold, counting numbers can be represented as linear combinations of a more general set of numbers than successive powers.

Basis and Coefficient Vector Make a Number System

In both the original problem, and in the representation of any counting
number, there are two unstated vectors at play.

One is the basis vector {100, 10, 1} that specifies the decades of the number.

The other is the coefficient vector {1, 0, 1} that defines the values present in each decade.

We could write the number 101 as 1:0:1, but conventional placeholder notation ensures that a single digit owns each decade that contributes to the number. To release the restriction we use colons, a:b:c:d, to represent each coefficient of a multi-digit number in a flexible base format.

This discussion is more easily demonstrated with a spreadsheet. And to that we must briefly turn. With the spreadsheet it is possible to construct working solutions to number theory problems and find the general rules that embody them. Poor man’s induction if you will, that will reach its conclusion in the appendix. Consider the following figure.

Figure 1 - Spreadsheet


The spreadsheet must index through decades of trial values in an orderly fashion, starting with the least significant rightmost trial values and incrementing left values when we complete a trip through a decade and roll over to the next value to the left. This will produce an unexpected reward.

Next, a three-decade counter is constructed, which works in any base. For example, base two:


Figure 2 - Spreadsheet

A base 10 version requires only a simple change of basis from {4, 2, 1} to {100, 10, 1}.

Figure 3 - Spreadsheet


The spreadsheet has been split to account for its size. The split is shown by the horizontal bar. We are almost ready to make our observation. Here are the recurrence formulas the sheet uses:


Figure 4 - Spreadsheet Formula Detail


The second argument of the FLOOR function is the significance, the multiple to which the value is rounded down (here 1, so the results are whole numbers).
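Outside the spreadsheet, the same recurrence is a short loop: divide by the largest basis element, take the floor, subtract, and move right. A sketch of my own that mirrors the FLOOR formulas above:

/* Sketch: expand an integer into coefficients against a flexible basis,
   most significant element first, using the same floor-and-subtract
   recurrence as the spreadsheet formulas shown above. */
#include <stdio.h>

static void expand(int value, const int *basis, int *coeff, int n)
{
    int remaining = value;
    for (int i = 0; i < n; i++) {
        coeff[i] = remaining / basis[i];   /* integer division is the floor */
        remaining -= coeff[i] * basis[i];
    }
}

int main(void)
{
    int basis[] = {23, 16, 1};             /* the flexible basis used below */
    int coeff[3];

    expand(57, basis, coeff, 3);
    printf("57 = %d:%d:%d\n", coeff[0], coeff[1], coeff[2]);   /* prints 2:0:11 */
    return 0;
}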

The Observation

Returning to our Figure 3 spreadsheet, we are not obligated to provide a basis vector whose numbers are successive powers. We could choose numbers from our original problem:


Figure 5 - Spreadsheet


Linear combinations make the representation of any integer possible with just one restriction.

Using the flexible base, we can write 57 against the basis vector {23, 16, 1}: that number is just 2:0:11, which expands to 2 × 23 + 0 × 16 + 11 × 1 = 46 + 0 + 11 = 57. The parity sum at the top shows that the target number and the sum are always the same for all trial solutions examined.

It is strange and beautiful to expand a base represented by a scalar to a base represented by a vector of numbers that have no relationship except being different in size. In all counting there is the process of progressive abbreviation; we are always using a base that is represented by a vector, whether the base is {100, 10, 1} or {23, 16, 1}. Progressive abbreviation is a powerful and reversible concept. It is linguistic, mathematical, and instinctive. Is progressive abbreviation a mechanism by which concepts are stored and linked in the brain?

To address Watson’s question above in terms of another more testable question:

Is the number four stored as a progressive abbreviation, chosen by each person that links to an expanding neural network of related notions?

If such a progressive abbreviation is removed, are the links to the concept broken with an aphasia of four being the result?

An interesting restriction must be in place for the flexible base to work. There must be a 1 in the low-ordered place, or it becomes impossible to represent all numbers. Said another way, absent 1, closure in the counting system is lost. Trading for that loss, we immediately obtain the power to approximate periodic functions. This has interesting properties and applications, as suggested below. Absent 1 in the basis set, the sum minus target sequence is a generator that undulates with unique patterns that depend on the basis vector. These patterns can be used to approximate discrete functions whose variation undulates in a similar way. Thus we have an observation:


Closure of a counting system can be traded for approximation of periodic functions.

Figure 6 - Spreadsheet

Absent 1 in the basis vector, the linear combinations do not give full coverage and the representation cannot express all integers. Here is one interesting pattern of non-closure:


Figure 6a - Spreadsheet Graph – Absent One, the Counter Generates a Discrete Function


Similar arguments can be made for real numbers.

Figure 7 - Spreadsheet

In base 10 and base 2 number systems, we have a convention for the basis vector - the base itself, which gives the rule for creating the basis vector. So “base 10” is the progressive abbreviation for a basis vector of {…, 100, 10, 1}, and these in turn progressively abbreviate the common counting numbers. We are used to this. Counting numbers are just the coefficient vector written without delimiters and with the assumption of a basis vector. A second observation is that:

A basis and coefficient vector pair implements a number system more general than a conventional number system.

Thus counting is a form of progressive abbreviation. Those possessing the instinctive ability to abbreviate, that is, to compress and expand linguistic and symbolic relationships, gave us the ability to count and all kinds of mathematical opportunity. This further amplifies the answer to question a above. It may be that concepts of number all originated in the act of abbreviation, historically speaking.

Now we will use the decade counters we generate to solve the original number theory problem in a non-looping construct. This non-looping construct is independent of base.

The original math fair problem is now cast in the non-looping spreadsheet construct. This brings together the naive and subtle developments using the flexible base machine. To accomplish this, a basis vector must be constructed that is six elements long. The recurrence relationships are similar to those above, but grow more complicated with each additional element, as shown in the Appendix. Analysis of the recurrence relationships yields the pattern necessary for basis vectors of any length. The recurrence relationships are constructed in a long form and then abbreviated by substitution. This process is shown for the five-element case in Figures 8 and 9. Figure 8 is a transitional construct.


Figure 8 - Spreadsheet


Figure 9 - Spreadsheet Excerpt – Increasing Complexity


A six element version of the flexible base machine will be used to solve the original problem below. We needed a flexible base decade counter and one has been constructed - a bizarre and useful device similar to mixed-base appliances such as analog clocks.

Figure 10 - Spreadsheet Using Flexible Base to Set Bounds

We use a modified form of the decade machine to provide the multipliers that search the space and the same solutions are obtained, but at different iterations:

iteration 30: {1, 2, 1, 0, 0}


iteration 1095: {0, 0, 0, 4, 2}


iteration 1512: {1, 0, 1, 0, 3}


Understanding why the iteration numbers differ between the ‘C’ program and the spreadsheet will be left as an informative exercise for the reader. The answer is given after the Appendix.

Application - Data Compression and Encoding

The sequences that occur absent 1 in the flexible base form unique patterns. When these patterns correspond to the value of a signal, such as intensity or volume, they can be used to encode the pattern of a picture or a sound. The basis vector need be transmitted only as often as a change of basis is necessary to represent the image optimally.

This turns the encoding of an image into a search for the optimal basis and coefficient vector that represents the image. If done in short runs, this need not be overly expensive computationally. One can in fact define the image interpolant by the length of the coefficient and basis vectors. Note that they are always the same length.

One would expect the encoding step to be more expensive than the decoding step provided the basis vectors were published as part of the image data. The basis vectors form the key necessary to unlock the image. This could have applications in selling digital image or sound data. The data is free. You pay for the basis vector key. The technique could be applied recursively so that even the key itself could be represented as a basis vector + coefficient vector + data stream. This might be useful for compartmentalized security where portions of a document are at different levels of classification - one document, many keys. Keys would be distributed to users according to their privilege or classification category. Multi-level security has been a nagging problem which this could solve rather easily.

Application - Encryption

One useful encryption technique is to publish the number (coefficient vector without delimiters except for colons as noted above) without publishing the basis vector.

Breaking the code then becomes a task in searching for the basis vector that creates a recognizable message. Search time grows exponentially with basis vector size.

One tip-off to the contents of the basis vector would be the presence of colons. Colons surrounding a number give an upper bound to the size of that entry in the basis vector.

Application – Progressive Abbreviation of DNA

DNA consists of coding and non-coding parts. These parts can be labeled by a simpler basis set associated with function, and the genome can be compacted by the progressive abbreviation technique illustrated above. It is this project which I hope to do next. It is a tribute to Watson. He uncovered the double-helix structure of DNA and then posed a question that led us to the next step.

A spreadsheet that demonstrates all calculations is available from the author. Just leave a comment requesting it.

Appendix – Building the Recurrence Relations for the Six-Fold Case

This will be done without further explanation, by induction from the fivefold case, which was built from the threefold case.

G8=FLOOR($B8/G$6,1) // Slot 6

I8=FLOOR(($B8-G8*G$6)/I$6,1) // Slot 5

K8=FLOOR((($B8-G8*G$6)-I8*I$6)/K$6,1) // Slot 4

M8=FLOOR(((($B8-G8*G$6)-I8*I$6)-K8*K$6)/M$6,1) // Slot 3

O8=FLOOR((((($B8-G8*G$6)-I8*I$6)-K8*K$6)-M8*M$6)/O$6,1) // Slot 2

Q8=FLOOR(((((($B8-G8*G$6)-I8*I$6)-K8*K$6)-M8*M$6)-O8*O$6)/Q$6,1) // Slot 1

Answer to Question: The loop order in the ‘C’ program searches right to left.