Protein folding

In 1962 Max F. Perutz won the Nobel Prize in Chemistry for deducing the structure of haemoglobin from data collected with x-ray crystallography. Today virtual laboratories use supercomputers and computer grids to solve protein structure problems by working from first chemical principles There are nice discussions at Stanford's folding@home page and at the Blueprint project's protein folding page. Each of these sites offers you the chance to enlist your own home computer in the protein folding task. (Also see http://www.technologyreview.com/articles/waldrop0502.asp.)

Recent news events reminding us of mad cow disease point to the importance of folding properly: the disease seems to be the result of a protein that has folded in a wrong way.

Brian White, a colleague here, must teach his beginning biology students that changing a single amino acid in a protein can dramatically alter its shape, and hence its biologicial activity. He asked me if I could write a program to solve the structure problem quickly (in seconds or minutes on a PC) so that students could appreciate this phenomenon. Since that was unlikely, I looked for a simple solvable analogous problem that he could use along with some software he's developed (along with students in my software engineering classes). You can see the project at intro.bio.umb.edu/aipotu/.

We model an amino acid as a disk (or sphere) of fixed radius with a hydrophobic index that describes how much an acid wants (index < 0) or does not want (index > 0) to be exposed to its watery environment. We imagine that the disks can occupy the cells of a regular tessellation of the plane (or space). A chain of amino acids folds so as to occupy a sequence of cells each sharing an edge (or face) with the next, in such a way as to minimize the energy, measured as the total number of edges (or faces) of occupied cells exposed to the environment (unoccupied cells), weighted by the hydrophobic indices of the amino acids occupying the cells.

Here's a Java applet that allows you to experiment with that model - instructions follow if you need them.

To tell the program what you want it to do for you:

Finally, click "fold" to start the program. In time a new window will appear showing the folded chain. Acid color varies from deep red for strongly hydrophobic through yellow for neutral to to deep green for strongly hydrophilic. The polypeptide chain appears as a black line. On the three dimensional grids acids that fold to be neighbors are connected by white lines.

You will find some statistics in the message area.

Even a chain as small as 10 amino acids will allow you explore how small changes in a single hydrophobic index can dramatically affect the folded shape. You might enjoy trying to create various possible patterns. Polypeptide tangrams or polyominoes are a small step on the way toward designer drugs.

The sequences

	-0.96    0.11    0.14    0.68    0.96 
and
	-0.961   0.111   0.136   0.682   0.966 
fold differently. Can you see why? Can you produce the same effect changing just one of the hydrophobic indices?

Bogdan Calota is helping with the code and will help with the statistics.

You can run the application on your own computer by downloading folding.jar and invoking it it from any command prompt this way:

	java -cp folding.jar FoldingWindow
The source code is available too, if you want to play with that.

I conjecture that finding the absolute minimum is an NP-complete problem. (I'd love to find a proof.) The brute force backtracking search through all the candidate configurations is expensive. For the hexagonal plane grid with n=1,2,3,... the number of possibilities is 1, 6, 30, 138, 618, 2730, 11946, 51882, 224130, 964134, 4133166, ... - that's sequence A001334 in the On-Line Encyclopedia of Integer Sequences. The growth is rate is approximately 4n.

Here are pictures of minimum energy foldings for some random chains of length 17. Red acids are hydrophobic, green are hydrophilic. Unfortunately, computing each answer took about 12 hours on my PC.

For a powerpoint presentation of this material, visit clark.ppt.