Copy number variation (CNV) is an important issue in genetics.
It has a beautiful mathematical notation, suggestive of text processing.
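Taken literally, the text-processing view treats a stretch of chromosome as a string of segments. The sketch below is a minimal Python illustration under that assumption. Each segment is one letter, the reference region is the hypothetical string "ABCD", and the helpers `delete`, `duplicate` and `invert` are names made up for this example. It works at the segment level only. At the level of individual bases, an inversion also complements the bases, which this sketch ignores.

```python
# Hypothetical model: a region is a string of one-letter segments, e.g. "ABCD".
# Segment-level only; real CNV calls are made against genome coordinates.

def delete(region: str, seg: str) -> str:
    """Deletion: the first occurrence of the segment is lost."""
    return region.replace(seg, "", 1)

def duplicate(region: str, seg: str) -> str:
    """Tandem duplication: the segment is copied once, next to itself."""
    return region.replace(seg, seg * 2, 1)

def invert(region: str, seg: str) -> str:
    """Inversion: the run of segments is reversed in place (no complementing)."""
    return region.replace(seg, seg[::-1], 1)

reference = "ABCD"
print(delete(reference, "B"))      # ACD
print(duplicate(reference, "B"))   # ABBCD
print(invert(reference, "BC"))     # ACBD
```

Applying `duplicate` twice to B gives ABBBCD, a region with three copies of segment B.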
Huntington’s chorea produces dementia that does not appear until middle age. It is caused by the presence of too many CAG repeats. What was I saying? Oh yes:
CAG repeats involve three of the four DNA bases: Cytosine, Adenine and Guanine.
If there are too many CAGs in succession (40 or more) on the short arm of chromosome 4, that individual will develop Huntington’s. Period. Other diseases are caused not by repeats but by single point mutations. Recent genome studies have focused on these variants, called SNPs, short for Single Nucleotide Polymorphisms and pronounced “snips”. This is a fancy term for one letter of DNA being substituted for another. A bug in the code, as it were.
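Checking for the Huntington’s expansion is itself a small text-processing job. You find the longest uninterrupted run of CAG and compare it with the commonly cited cutoffs: 26 or fewer is normal, 27 to 35 intermediate, 36 to 39 reduced penetrance, and 40 or more full penetrance. This is a minimal Python sketch, not a clinical assay. It assumes the input is a plain string of bases already taken from the repeat region of the HTT gene. The function names and the test sequence are hypothetical.

```python
import re

def longest_cag_run(seq: str) -> int:
    """Number of repeats in the longest uninterrupted run of CAG."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def hd_category(repeats: int) -> str:
    """Map a repeat count onto commonly cited Huntington's cutoffs."""
    if repeats <= 26:
        return "normal"
    if repeats <= 35:
        return "intermediate"
    if repeats <= 39:
        return "reduced penetrance"
    return "full penetrance"

# Hypothetical test sequence: 42 CAGs flanked by arbitrary bases.
example = "GCC" + "CAG" * 42 + "CCG"
n = longest_cag_run(example)
print(n, hd_category(n))  # 42 full penetrance
```

`re.findall` returns non-overlapping matches, but that loses nothing here. Two CAG occurrences can never overlap each other, so a run in one reading frame cannot hide a longer run in another.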
If you live in a world like I do, where the internet is a connected series of pipes, here is how to cure someone of sickle cell anemia, a notable SNP-caused disease.
In a text editor, like “vi”, edit chromosome11.txt:
1) Find the line containing the beta-globin gene.
:/beta-globin
2) Code for glutamate instead of valine.
:s/GTG/GAG/
3) Save the changes and exit.
:wq
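An editor’s `:s` command replaces the first match on the line, and that need not be the codon carrying the mutation. Another valine codon, GTG, can occur earlier in the same reading frame. The Python sketch below edits the codon by its position instead. The 24-base fragment is an illustrative stand-in, not the actual beta-globin gene sequence. It encodes the first eight amino acids of the sickle beta chain (Val-His-Leu-Thr-Pro-Val-Glu-Lys), numbered from valine as in the usual convention, so the mutated codon is number 6. `CODONS` holds only the few codons this fragment needs, and `translate` and `set_codon` are hypothetical helper names.

```python
# A few entries of the standard genetic code (DNA codons, coding strand).
CODONS = {
    "GTG": "Val", "CAC": "His", "CTG": "Leu", "ACT": "Thr",
    "CCT": "Pro", "GAG": "Glu", "AAG": "Lys",
}

def translate(seq: str) -> list[str]:
    """Translate an in-frame sequence using the partial table above."""
    return [CODONS[seq[i:i + 3]] for i in range(0, len(seq) - len(seq) % 3, 3)]

def set_codon(seq: str, position: int, codon: str) -> str:
    """Replace the codon at a 1-based position in the reading frame."""
    start = 3 * (position - 1)
    return seq[:start] + codon + seq[start + 3:]

# Illustrative fragment, NOT the real HBB sequence.
sickle = "GTGCACCTGACTCCTGTGGAGAAG"

naive = sickle.replace("GTG", "GAG", 1)  # like :s/GTG/GAG/, first match only
fixed = set_codon(sickle, 6, "GAG")      # edit codon 6 specifically

print(translate(naive))  # ['Glu', 'His', 'Leu', 'Thr', 'Pro', 'Val', 'Glu', 'Lys']
print(translate(fixed))  # ['Val', 'His', 'Leu', 'Thr', 'Pro', 'Glu', 'Glu', 'Lys']
```

The naive substitution changes codon 1 and leaves the sickle valine at position 6 in place. Addressing the codon by position fixes the right one.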
This single SNP is responsible for all the suffering of sickle cell anemia. But in carriers who inherit only one copy (sickle cell trait), it also confers protection against malaria, so there is an upside.
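At its simplest, spotting a SNP like this one means comparing two aligned sequences letter by letter. This minimal Python sketch uses an illustrative fragment, not the real gene sequence. It assumes both sequences are already aligned to the same length. Real variant calling must also handle alignment, sequencing errors, and insertions or deletions, all of which this ignores. `snp_positions` is a hypothetical name.

```python
def snp_positions(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, ref base, alt base) for every mismatch."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]

# Illustrative fragments (NOT the real HBB sequence), differing in codon 6.
normal = "GTGCACCTGACTCCTGAGGAGAAG"
sickle = "GTGCACCTGACTCCTGTGGAGAAG"
print(snp_positions(normal, sickle))  # [(17, 'A', 'T')]
```

The single A to T change sits in the middle base of codon 6, turning GAG (glutamate) into GTG (valine).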
Hemoglobin is the protein in red cells that carries oxygen from the lungs to the rest of the body. Each hemoglobin molecule is a tetramer: four subunits (two alpha and two beta globin chains) packed together like four pillows.
When the gene (the DNA recipe) for the beta chain of hemoglobin is altered by a single letter, the hemoglobin, once it gives up its oxygen, forms rigid rods, polymerizing like plastic (which, you can show, is what people are made of).
This makes the red cells look like a tent with a pole sticking out in the wrong place. These red cells get stuck in capillaries and cause great suffering. Many diseases caused by SNPs have already been identified; SNPs were the “low-hanging fruit” of discovery.
In the long term, copy number variation will turn out to be the next big thing, the next frontier. It is already yielding results. You heard it here first.
NOTES:
1 - Click here for more about CNV and on the images for background.
2 - An easier-to-parse notation for the first figure: ABCD, AB2CD, AB3CD, ABC4D3(CD), A(CB)D, the last being an “inversion”.
3 - Database of Genomic Variants
4 - Visigene
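The compact notation in note 2 is easy to expand mechanically, which is what makes it suggestive of text processing. This minimal Python sketch handles only single-letter segments followed by an optional copy count, as in AB2CD and AB3CD. It rejects the parenthesized forms rather than guess at them. `expand` and `copy_numbers` are hypothetical names. Counts are per chromosome as drawn; in a diploid genome the baseline copy number of a segment is two.

```python
import re
from collections import Counter

def expand(notation: str) -> list[str]:
    """Expand e.g. 'AB2CD' into ['A', 'B', 'B', 'C', 'D'].

    Only single capital-letter segments with an optional copy count are
    handled; anything else, such as 'A(CB)D', raises ValueError.
    """
    tokens = re.findall(r"([A-Z])(\d*)", notation)
    # Reject input containing characters the pattern did not consume.
    if "".join(seg + count for seg, count in tokens) != notation:
        raise ValueError(f"unsupported notation: {notation!r}")
    return [seg for seg, count in tokens for _ in range(int(count) if count else 1)]

def copy_numbers(notation: str) -> Counter:
    """Number of copies of each segment in the expanded region."""
    return Counter(expand(notation))

print(expand("AB2CD"))             # ['A', 'B', 'B', 'C', 'D']
print(copy_numbers("AB3CD")["B"])  # 3
```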