Read Latex

Showing posts with label Genetics. Show all posts

Wednesday, July 20, 2011

Regarding "Informational Derivation of Quantum Theory" by Chiribella et al.



Regarding "Informational derivation of quantum theory", Chiribella et al., Physical Review A 84, 012311 (2011):

I was provoked to cast Information Compression as an Iterated Function.

So, given a string S of length L, find an iterated function F that produces a new string S’ of length L’, where L’ < L.
One might also require an iterated function G = F^-1 that takes S’ and produces S.
If such a G exists, then F would be called a lossless or invertible iterated compression function.


S = "The rain in Spain stays mainly in the plain."
L = 45 (counting the trailing newline)

F1 = { s/ain/1/g }

S1 = "The r1 in Sp1 stays m1ly in the pl1."
L1 = 37

F2 = { s/he/2/g }

S2 = "T2 r1 in Sp1 stays m1ly in t2 pl1."
L2 = 35

F()  = F2(F1())
S' = F(S)
L' < L since 35 < 45

G1 = { s/1/ain/g }

G2 = { s/2/he/g }

G = G1(G2())
S = G(S')

F is invertible, iterated and compresses.
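
The substitutions above can be sketched in C. This is a minimal illustration, not the code used for the post: the helper names replace_all, compress and decompress are hypothetical, and the sketch assumes S contains no literal 1 or 2, since otherwise G could not tell a substituted token from an original character.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Replace every occurrence of `from` in `s` with `to`, like sed's
 * s/from/to/g. Returns a newly allocated string; the caller frees it.
 * replace_all is a hypothetical helper name, not from the post. */
char *replace_all(const char *s, const char *from, const char *to)
{
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t count = 0;
    const char *p;
    const char *hit;

    /* An empty pattern would never advance; just return a copy. */
    if (from_len == 0) {
        char *copy = malloc(strlen(s) + 1);
        if (copy != NULL)
            strcpy(copy, s);
        return copy;
    }

    /* First pass: count matches so the result can be sized exactly. */
    for (p = s; (p = strstr(p, from)) != NULL; p += from_len)
        count++;

    char *out = malloc(strlen(s) - count * from_len + count * to_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text between matches, then the replacement. */
    char *w = out;
    p = s;
    while ((hit = strstr(p, from)) != NULL) {
        memcpy(w, p, (size_t)(hit - p));
        w += hit - p;
        memcpy(w, to, to_len);
        w += to_len;
        p = hit + from_len;
    }
    strcpy(w, p);
    return out;
}

/* F: apply F1 = s/ain/1/g, then F2 = s/he/2/g. */
char *compress(const char *s)
{
    char *s1 = replace_all(s, "ain", "1");
    if (s1 == NULL)
        return NULL;
    char *s2 = replace_all(s1, "he", "2");
    free(s1);
    return s2;
}

/* G: undo F2 with s/2/he/g, then undo F1 with s/1/ain/g. */
char *decompress(const char *s)
{
    char *t = replace_all(s, "2", "he");
    if (t == NULL)
        return NULL;
    char *u = replace_all(t, "1", "ain");
    free(t);
    return u;
}
```

Calling compress on the example sentence (with its trailing newline) gives the 35-character S2, and decompress on that result gives back S.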

Sidebar 1: F can be packaged with S:

Sp = F + S = /ain/1/he/2/"The rain in Spain stays mainly in the plain."

Next Step: Find an Iterated Function that performs Information Expansion

So, given a string S of length L, find an iterated function F that produces a new string S’ of length L’, where L’ > L.
One might also require an iterated function G = F^-1 that takes S’ and produces S.
If such a G exists, then F would be called a lossless or invertible iterated expansion function.

S = "The rain in Spain stays mainly in the plain."
L = 45

F = { s/[a-zA-Z]/[%03d]/ }

S1 = "084he rain in Spain stays mainly in the plain."
L1 = 47

S2 = "084104e rain in Spain stays mainly in the plain."
L2 = 49

S' = "0841041010321140971051100321051100320831120971051100
     3211511609712111503210909710511010812103210511003211
     6104101032112108097105110046010"

L' > L since 135 > 45

G = { s/[%03d]/[a-zA-Z]/ }
S = G(S')

F is invertible, iterated and expands.
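
The post gives C source for a single pass of F but none for G. A minimal sketch of the fully iterated expansion and its inverse follows, under stated assumptions: the names expand and collapse are hypothetical, and expand encodes every byte, whereas the post's program passes digits through unchanged. The two agree on the example sentence, which contains no digits.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Encode every byte of `s` as its three-digit decimal code.
 * Returns a newly allocated string; the caller frees it.
 * Assumption: digits are encoded too, unlike the post's program. */
char *expand(const char *s)
{
    size_t n = strlen(s);
    char *out = malloc(3 * n + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        sprintf(out + 3 * i, "%03d", (unsigned char)s[i]);
    out[3 * n] = '\0';   /* covers the empty-string case */
    return out;
}

/* G: turn each three-digit group back into one byte.
 * Returns NULL if the input is not a whole number of valid groups. */
char *collapse(const char *s)
{
    size_t n = strlen(s);
    if (n % 3 != 0)
        return NULL;
    char *out = malloc(n / 3 + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < n; i += 3) {
        if (!isdigit((unsigned char)s[i]) ||
            !isdigit((unsigned char)s[i + 1]) ||
            !isdigit((unsigned char)s[i + 2])) {
            free(out);
            return NULL;
        }
        int code = (s[i] - '0') * 100 + (s[i + 1] - '0') * 10 + (s[i + 2] - '0');
        /* Code 0 would end the C string early; codes above 255 are not bytes. */
        if (code == 0 || code > 255) {
            free(out);
            return NULL;
        }
        out[i / 3] = (char)code;
    }
    out[n / 3] = '\0';
    return out;
}
```

Because every group has a fixed width of three digits, collapse needs no separators; drop or insert a single digit and every later group is misread.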

Sidebar 2: F can be packaged with S:

Sp = F + S = /[a-zA-Z]/[%03d]/"The rain in Spain stays mainly in the plain."

In this context we call F the "transcription apparatus".


Next Step: Show that the cost of mutagenesis is higher in the compressed case.


This is trivial and will be left as an exercise to the reader.

Source for F in Expansion Section:

#include <stdio.h>

/* convert any number to itself and any letter to a number */
/* hint add a comma to each phrase and watch intronic explosion */
/* hint remove the commas and watch loss of reading frame */
/* hint add triplets and watch need for reading frame */

int isNumber(int c)
{
    return ( ('0' <= c) && (c <= '9') );
}

void echo(int c)
{
    printf("%c", c);
}

void number(int c)
{
    printf("%03d", c);
}

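int main(void)
{
    int c;

    /* Pass leading digits through; expand the first non-digit and stop. */
    while( (c = getchar()) != EOF )
    {
        if( isNumber(c) )
        {
            echo(c);
        }
        else
        {
            number(c);
            break;
        }
    }

    /* Copy the rest of the input unchanged. */
    while( (c = getchar()) != EOF )
    {
        echo(c);
    }

    return 0;
}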

Friday, March 28, 2008

Copy Number Variation: The Next Big Thing


Copy number variation (CNV) is an important issue in genetics.

It has a beautiful mathematical notation suggestive of text processing:

Huntington’s chorea produces dementia that does not appear till middle age. It is caused by the presence of too many CAG repeats. What was I saying? Oh yes:

CAG repeats involve three of the four DNA bases, Cytosine, Adenine and Guanine:


If there are too many CAG’s in succession on the short arm of chromosome 4, that individual will develop Huntington’s. Period. Unlike Huntington's, which is caused by repeats, other diseases are caused by single point mutations. Recent genome studies have focused on these mutations, called SNP’s (pronounced “snips”), which stands for Single Nucleotide Polymorphisms. This is a fancy term for one letter of DNA being substituted for another. A bug in the code, as it were.

If you live in a world like I do, where the internet is a connected series of pipes, here is how to cure someone of sickle cell anemia, a notable SNP-caused disease.


In a text editor, like “vi”, edit chromosome11.txt:


1) Find the line containing the beta-globin gene.

:/beta-globin

2) Code for glutamate (GAG) instead of valine (GTG).

:s/GTG/GAG/

3) Save changes and exit file.

:wq
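
A plain :s command changes the first match on the line, wherever it falls. A minimal sketch of a more careful edit follows, assuming the sequence is held as an in-memory string and the reading frame starts at its first letter. Both are assumptions for illustration, as is the function name substitute_codon.

```c
#include <stddef.h>
#include <string.h>

/* Replace the codon at zero-based index `codon` with `to`, but only if
 * it currently equals `from`. The reading frame is assumed to start at
 * seq[0]. Both `from` and `to` must be exactly three letters.
 * Returns 1 if the edit was made, 0 otherwise.
 * substitute_codon is a hypothetical name, not from the post. */
int substitute_codon(char *seq, size_t codon, const char *from, const char *to)
{
    size_t start = 3 * codon;

    if (strlen(from) != 3 || strlen(to) != 3)
        return 0;                      /* not a codon-for-codon swap */
    if (strlen(seq) < start + 3)
        return 0;                      /* index past the end */
    if (strncmp(seq + start, from, 3) != 0)
        return 0;                      /* unexpected codon: leave it alone */

    memcpy(seq + start, to, 3);        /* same length, so no frame shift */
    return 1;
}
```

On a made-up sequence such as AAAGTGCCCGTGTTT, asking for codon 3 changes only the second GTG and leaves the first untouched, which a first-match substitution would not do.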

This single SNP is responsible for all human suffering in sickle cell anemia, but it also confers protection against malaria, so there is an upside.

Hemoglobin is the protein in red cells that enables oxygen transport from the lungs to the rest of the body. It tiles in four-unit pillows called hemoglobin tetramers.

When the DNA recipe/gene that codes for hemoglobin is altered by a single letter, the hemoglobin forms rigid rods, polymerizing like plastic, which you can show is what people are made of.

This causes the red cells to look like a tent with a pole sticking in the wrong place. These red cells get stuck in capillaries and cause great suffering. But many diseases caused by SNP's have already been identified as such. SNP’s were the “low-hanging fruit” of discovery.

In the long term, Copy Number Variation will turn out to be the next big thing, the next frontier. It is already yielding results. You heard it here first.

NOTES:
1 - Click here for more about CNV and on the images for background.
2 - One can observe an easier-to-parse notation for the first figure: ABCD, AB2CD, AB3CD, ABC4D3(CD), A(CB)D, the last being “inversion”.
3 - Database of Genomic Variants
4 - Visigene