BIOL 213 Genetics
Protein
Class is one place where we can check how things are going. Class will be useful to you to the extent that you PREPARE BEFOREHAND. Don't wait for the last minute! No matter how many times you've pulled it out in the past, it's not going to happen here. Break it down to daily chunks. Give yourself time to be confused.

Reading Assignment: pp. 345-354 (except Colinearity of Gene and Protein)

Outline:

A. What can protein do?
DNA is often depicted as the blueprint of the cell. A blueprint is something an architect refers to in building a structure. It contains a representation of the final shape, its dimensions, what's connected to what, and so forth. If you examine DNA, you will find none of this. The molecule has no knowledge of the cell's final shape, nor any other of the things that characterize blueprints. DNA merely lists the components that make up the proteins of a cell. But that is enough.

The weight of action, then, lies squarely on protein. Table 1 gives a synopsis of some functions performed by protein. At the top of the list is the catalysis of chemical reactions, as emphasized in the last section. The enzyme tyrosine hydroxylase, for example, catalyzes the conversion of tyrosine to L-DOPA, the precursor of the neurotransmitter dopamine.

Proteins are responsible for other functions besides catalysis. They are required for the transport of a variety of compounds through membranes or, in the case of hemoglobin, the transport of oxygen in solution. Protein also plays a passive, structural role, for example in connective tissue. There are many other roles for protein, and Table 1 could have been many times as big as it is.

Table 1. Some Biological Functions of Proteins

 FUNCTION              EXAMPLE
 Catalysis             Tyrosine Hydroxylase (hormone & neurotransmitter production)
 Binding: transport    Hemoglobin (oxygen transport)
 Binding: defense      Immunoglobulins (immune system)
 Binding: information  Insulin (hormone) & Insulin Receptor
 Mechanical Support    Collagen (connective tissue)
 Mechanical Work       Actin/Myosin (muscle contraction)

 

B. What are proteins?
The function of a protein is determined ultimately by its particular shape and structure. At its most basic level, the structure of a protein is simple. It has to be, otherwise DNA could not specify it. Understanding the structure of protein thus answers two profound questions:

How do proteins control the activities of a cell?

How do genes exert control over those activities?

In brief, a protein is a linear array of amino acids. If you grasp all that sentence has to say, then you've come a long way towards understanding protein. Notice the pattern in Figure 1c. A protein is a polymer of a unit repeated again and again. That unit consists of a carboxylic acid, connected to a carbon. The carbon is called the "alpha-carbon" because it's the closest one to the carboxylic acid group. An amino group is attached to the alpha-carbon. The subunits are thus alpha-amino acids. Amino acids differ from one another only in what else is connected to the alpha-carbon, represented in Figure 1a as a variable "R-group" (see also Griffiths, Fig. 12.3).

The synthesis of proteins is the process of combining alpha-amino acids in a linear chain, connecting alpha-amino groups to carboxylate groups (Figure 1a and 1b). The backbone of this chain is identical for all proteins. If the R groups were similarly invariable, then all proteins would be alike, and protein would be able to do only one thing, a not very interesting thing at that.


Figure 1. Protein as a polymer of alpha-amino acids. 1a. Structure of alpha-amino acid. "R" represents side group, as shown in Figure 2. 1b. Formation of dipeptide by joining two amino acids. 1c. Polypeptide chain composed of linked amino acids. The shapes represent the different R-groups, each with its own chemical properties.

Fortunately, the R groups vary from one amino acid to the next, amongst the 20 possibilities shown in Figure 2. This listing of the twenty major amino acids is a very good list to get to know, but not to memorize. If you go into biochemistry, you'll find that they will become etched into your brain without having to memorize them, and if you don't, there's probably no need to know the structures.

Some R groups of amino acids are acidic carboxylic acids, giving rise to negative charges at physiological pH. Aspartic acid is an example of an acidic amino acid. Some R-groups are basic, giving rise to positive charges at physiological pH. The charged amino acids interact strongly with water and so are hydrophilic. There are other R groups that interact strongly with water but are uncharged. For example, serine contains a hydroxyl group (an OH group), just like water does, and it's no surprise that serine is hydrophilic. There are also hydrophobic amino acids, like leucine, whose R-groups would tend to segregate away from water, because they interact less strongly with water than water does with itself.

There are many other properties in which the twenty amino acids differ from one another: some are bulky, some small; some are capable of donating electrons, others not; some are chemically reactive. And so forth. Each amino acid represents a different flavor, and the structure and properties of a protein are defined by the properties and order of its amino acids: its primary structure.

There are only twenty amino acids used to synthesize proteins, which limits what proteins are possible in nature. How constricting is this limitation? Consider the number of possible dipeptides (two amino acids joined together by a peptide bond). There are 20 possible amino acids in the first position and 20 possible amino acids in the second position. That makes 20^2 = 400 possible dipeptides. Similarly, there are 20^3 = 8000 possible tripeptides. Proteins range in size from a smallish 100 amino acids to 1000. The number of possible proteins in nature is therefore staggering!
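The counting argument is easy to check by machine. The following is a minimal sketch in Python; the function name `count_peptides` is made up for illustration, and it assumes only that each position in the chain can hold any one of the 20 amino acids.

```python
# Number of distinct linear peptides of a given length, assuming each
# position may hold any one of the 20 standard amino acids.
AMINO_ACIDS = 20

def count_peptides(length: int) -> int:
    """Return 20**length, the number of possible sequences of that length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return AMINO_ACIDS ** length

for n in (2, 3, 10):
    print(f"{n} amino acids: {count_peptides(n):,} possible sequences")
```

Python integers have no size limit, so the same function can be called with the lengths of real proteins.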

SQ10. What is a protein?

SQ11. Glycogen is a linear array of glucose. Why isn't glycogen as varied in its properties as protein?

SQ12. Find an amino acid with the following properties:

a. Small, negatively charged.

b. Large, has double bonds (and so can participate in electron transfer reactions), and has a free -OH group (and so can participate in hydrogen bonding).

SQ13. (= PS1.9) Peptide hormones can be as small as five amino acids in length, but most proteins have polypeptide chains ranging from 70 to 1000 amino acids. Suppose you have some reason to believe that a small protein, precisely 100 amino acids in length, can transform glucose into gold. You set out to synthesize every possible 100-amino acid protein until you find the one you want. How many such proteins may you have to go through? How long would it take?

C. Structure and basis for catalysis
Unfortunately, knowing merely that proteins are linear arrays of alpha-amino acids doesn't tell us how they can have the varied properties required of proteins in a living cell. In particular, it doesn't explain how proteins can act as catalysts. For this we have to see the protein in three dimensions. The protein hexokinase (Figure 3) is the enzyme that begins the degradation of glucose in the liver. If you were to see this molecule, the first thing you might notice is that the enzyme has a hole just the right size for glucose to fit into. The binding of glucose to the enzyme alters the enzyme in such a way that glucose cannot escape unless the enzyme again changes shape. This normally occurs only after the reaction catalyzed by the enzyme is complete. So glucose goes in and glucose 6-phosphate goes out.

The function of hexokinase is clearly tied up in its shape. How did the protein get to this shape? Fig. 12.5 and 12.7 (Griffiths) show how the amino acids may interact with their neighbors to form coils, called alpha-helices, or other structures. These local interactions lead to what is called the secondary structure of a protein. Note that myoglobin (Fig. 12.8, Griffiths) has short alpha-helices. This is typical of globular proteins, which include most enzymes. In contrast, proteins with long extended regions of secondary structure are fibrous and generally play a structural role. An example is the protein fibrin, which forms the protein network that makes up blood clots.

In some cases structures common to several proteins with similar functions have been identified. One example is the helix-turn-helix motif (Fig. 8-34 of Griffiths et al.), a stretch of about 20 amino acids consisting of two alpha-helices separated by a bend. Proteins that have this structure, with specific amino acids in key positions, are able to bind to DNA. One of the two alpha-helices fits nicely into the famous double helix of DNA (Figure 4). There are many such motifs known, and it is sometimes possible to guess the function of a protein simply by knowing its primary structure.

Amino acids may have more distant interactions with one another, giving rise to the tertiary structure of a protein, the folding of a polypeptide chain in three dimensions (Fig. 12.7 and 12.8, Griffiths). For example, the hydrophobic amino acids would tend to be sequestered in the middle of the protein, away from water, just as the hydrophobic chains of soap aggregate to minimize contact with water. Charged and other hydrophilic amino acids would tend to lie outside the protein. You can see this to some extent with hexokinase (Figure 3).

It may be, however, that no matter how the chain twists, no folding can keep patches of hydrophobic amino acids from appearing at the surface of the protein. What then? In some cases, further aggregation may occur between separate protein chains, so that in the end, the completely assembled protein consists of multiple chains held together by the interactions between them. Such proteins are said to have quaternary structure (Fig. 12.7, Griffiths). An example of this is the protein hemoglobin, the oxygen-carrying protein in blood (Fig. 12.9, Griffiths). It consists of four separate polypeptide chains that interact with each other. Separately, each subunit can bind oxygen, due in part to the oxygen-binding molecule, heme, which fits into a hole created by the tertiary structure. But the regulation of oxygen binding, essential to the functioning of hemoglobin in the body, is apparent only when four subunits aggregate together.

The positions of specific amino acids determine not only the shape of the protein but also its capacity for catalysis (Figure 5). The folding of chymotrypsin, a digestive enzyme that catalyzes the hydrolysis (breakdown) of ingested protein in the gut, creates a local region of the enzyme called the active site. The folding happens to place the 195th amino acid in the chain, serine, near a hole that has the shape of the amino acid phenylalanine. When a phenylalanine within a protein you eat finds its way into the phenylalanine-shaped hole of chymotrypsin, the amide bond adjacent to phenylalanine is positioned close enough to serine-195 that a chemical reaction takes place, breaking the amide bond. Once that occurs, the broken protein is released. The ability of chymotrypsin to do this depends upon the precise geometry of the active site: a serine must occur precisely at position 195, and the folding must place that serine in exactly the right position relative to the protein being digested.

SQ14. If the critical part of an enzyme is its active site, consisting typically of several amino acids, what's the use of the rest of the protein?

D. Targeting protein
Similar considerations govern the placement of protein. Figure 6 shows a cartoon of glycophorin, a protein that spans the membrane of red blood cells. You can see that most of the amino acids in the membrane-spanning region are hydrophobic, while the amino acids inside or outside the cell are generally hydrophilic. This arrangement of amino acids serves to anchor the protein in the membrane, because the hydrophilic amino acids would not be happy in the oily, lipid environment of the membrane, and the hydrophobic amino acids would not be happy outside that environment (or more accurately, the water wouldn't be happy to accommodate the weakly interacting hydrophobic residues). Note that some amino acids in the membrane are hydrophilic and some amino acids in the two aqueous compartments are hydrophobic. Why might that be?
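One common way to look for a candidate membrane-spanning region in a sequence is to average a hydrophobicity score over a sliding window. The sketch below uses the Kyte-Doolittle hydropathy scale, which is not part of these notes. The sequence `toy_protein` is invented for illustration (it is not the glycophorin sequence), and the window size and function name are likewise assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic), one-letter codes.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

WINDOW = 9  # residues averaged per window (an arbitrary choice for this sketch)

def hydropathy_profile(sequence, window=WINDOW):
    """Average hydropathy of each window of consecutive residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Invented sequence: hydrophilic ends flanking a hydrophobic middle stretch.
toy_protein = "SDEKRTSEQD" + "LIVALLIVFAILLAVIG" + "KRSDEEKTQR"

profile = hydropathy_profile(toy_protein)
# Index of the window with the highest average hydropathy.
best_start = max(range(len(profile)), key=profile.__getitem__)
print("most hydrophobic window starts at residue", best_start + 1)
print("window:", toy_protein[best_start:best_start + WINDOW])
```

A real membrane-spanning segment is usually judged with a longer window and a threshold rather than a single maximum; the point here is only that hydrophobic stretches stand out from the sequence alone.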

The cartoon of glycophorin raises more questions than it answers. The protein was surely made inside the cell... then how did those many hydrophilic amino acids pass through the hydrophobic environment of the membrane to get outside? Worse, what about the case of the protein hormone insulin, made within pancreatic cells and secreted into the circulatory system? Insulin must have hydrophilic amino acids on its exterior (since it's soluble in blood), so how did it completely cross the hydrophobic cell membrane?

Well, a cell could provide a hole in the membrane for the protein to pass through, but that simply replaces one problem with many: How can you make sure only the protein you want to leave can leave? How can you make sure that proteins supposed to leave the cell go through holes in the cell membrane and proteins bound for the mitochondria go through holes in the mitochondrial membrane? How come the cell's guts don't spill out the holes?

A blueprint would solve these problems, specifying for each protein where it's supposed to go. This is not the answer nature found. There are no blueprints, and the protein must contain within itself information specifying its ultimate location. Since proteins are nothing more than sequences of amino acids, something within the sequence must carry the information, and indeed this is the case.

Proteins that must pass through membranes have N-terminal amino acid sequences, called signal peptides, that function as routing slips. Transport proteins on certain membranes recognize the appropriate signal peptide and ferry the attached polypeptide chain through the membrane (Fig. 13-45, Griffiths). The signal peptide binds to the membrane protein and passes through an aqueous channel formed through the bilayer, dragging the rest of the protein with it. Once the signal peptide has initiated transfer across the membrane, it is cleaved off.

What is the nature of the amino acid sequence of a signal peptide that enables it to be recognized by the transport apparatus? Figure 7 shows the N-terminal amino acids of the precursor to bovine growth hormone. The cell export signal peptide consists of a string of hydrophobic amino acids preceded by polar amino acids. The exact amino acids don't seem to be important -- just the types. This signal peptide enables growth hormone made within pituitary cells to be secreted into the circulatory system, and any protein that begins with this pattern of amino acids would also be secreted.
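The idea that the types of amino acids matter, not their exact identities, can be made concrete with a toy pattern check. This is a sketch under stated assumptions, not how the cell or any real prediction tool does it: the set of residues counted as hydrophobic, the cutoffs `n_term` and `min_run`, and all three sequences are invented for illustration (none is the bovine growth hormone sequence of Figure 7).

```python
# Residues treated as hydrophobic in this toy classification (an assumption;
# textbooks draw the line in slightly different places).
HYDROPHOBIC = set("AVLIMFWC")

def residue_types(sequence):
    """Reduce a sequence to types: 'h' for hydrophobic, 'p' for everything else."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in sequence)

def looks_like_export_signal(sequence, n_term=30, min_run=8):
    """Toy test: polar residues followed by a long hydrophobic run near the N-terminus."""
    types = residue_types(sequence[:n_term])
    run_start = types.find("h" * min_run)
    # Require the run to exist and at least one polar residue ahead of it.
    return run_start > 0 and "p" in types[:run_start]

# Invented sequences. The first two differ residue by residue but share a pattern.
seq_a = "MKRSTLLLLAVLLALLVGSDEK"
seq_b = "MRKTSVIVAILFVLIAAGSDEK"
cytosolic = "MSDEKRTGSNQDEKRSTGNQ"

print(residue_types(seq_a))
print(residue_types(seq_b))
for name, seq in (("seq_a", seq_a), ("seq_b", seq_b), ("cytosolic", cytosolic)):
    print(name, looks_like_export_signal(seq))
```

The first two sequences share no more than a few residues, yet they reduce to the same string of types and both pass the check, while the third has no hydrophobic run and fails.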

Different membranes recognize different signal peptides. In this way a protein can be directed to the plasma membrane or to an organelle. A protein may even have multiple signals, if it must pass more than one membrane, or a stop transfer signal, if (like glycophorin) it is to only partially pass the membrane. The mechanism of transfer isn't of concern to us right now. The main point is that information regarding destination is encoded directly into the protein. This information ultimately comes from the gene.

SQ15. Describe the process by which glycophorin A presumably gained its proper position in the red blood cell membrane.

E. Alteration of Protein Structure and Function by Mutation
A protein's primary structure (the linear order of its amino acids) ultimately determines the shape of the protein, its function, and its location within or without the cell. The specific characteristics of a protein result from the interplay of the chemical properties of its component amino acids. These properties, particularly hydrophobicity, enable the protein to assemble itself into a structure that places reactive groups critical to protein function at their proper locations in space.

This is the connection between genetics and life. The centrality of the primary structure of protein is so critical to our understanding that I will restate the point from two directions: What is the nature of mutation? and How can we control protein function?

Most simple genetic mutations cause a change in an amino acid within a protein. What effect might that have? First, changing an amino acid at the active site of an enzyme could alter or destroy the catalytic properties of the enzyme. Second, mutation in an amino acid distant from the active site might nonetheless alter the three-dimensional structure and, for example, make amino acids within the active site too distant from one another to be effective. More specifically, a mutation might alter the secondary structure of a region, perhaps by inserting an amino acid that prevents an alpha-helix from forming. Alternatively, a mutation might prevent proper placement in the membrane by replacing a hydrophobic amino acid with a charged amino acid. The change in three-dimensional structure might be subtle, just making the structure more prone to falling apart at high temperature, for example. Replacing one amino acid with another might alter a motif that enables the protein to bind to DNA, or perform some other function. Finally, the mutation might affect a purely informational part of the protein, a signal sequence, so that the protein is improperly targeted. We will see that mutation occurs directly in DNA, not protein, but the ultimate effects of mutation are felt as aberrant protein.
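One of the cases above, replacing a hydrophobic amino acid with a charged one, can be sketched as a simple operation on a sequence. The code below is illustrative only: the side-chain classes are a coarse simplification, the function names are hypothetical, and the segment is an invented hydrophobic stretch rather than part of any real protein.

```python
# Coarse side-chain classes (a simplification; residues not listed here,
# such as glycine, histidine or tyrosine, are reported as "other").
SIDE_CHAIN_CLASS = {
    **dict.fromkeys("DE", "acidic (negative)"),
    **dict.fromkeys("KR", "basic (positive)"),
    **dict.fromkeys("STNQ", "polar uncharged"),
    **dict.fromkeys("AVLIMFW", "hydrophobic"),
}

def substitute(sequence, position, new_residue):
    """Return a copy of sequence with the residue at a 1-based position replaced."""
    index = position - 1
    if not 0 <= index < len(sequence):
        raise IndexError("position outside the sequence")
    return sequence[:index] + new_residue + sequence[index + 1:]

def describe_change(sequence, position, new_residue):
    """Summarize a substitution as old residue, position, new residue and classes."""
    old = sequence[position - 1]
    old_class = SIDE_CHAIN_CLASS.get(old, "other")
    new_class = SIDE_CHAIN_CLASS.get(new_residue, "other")
    return f"{old}{position}{new_residue}: {old_class} -> {new_class}"

# Invented hydrophobic segment, such as might span a membrane.
segment = "LIVALLIVFAILLAVIG"
mutant = substitute(segment, 5, "D")  # leucine at position 5 becomes aspartic acid

print(segment)
print(mutant)
print(describe_change(segment, 5, "D"))
```

The sequence changes at a single position, but the class of the side chain at that position changes completely, which is why one substitution can be enough to disrupt membrane placement.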

The importance of the primary structure of a protein can be restated in the following way: if you can specify a protein's amino acids, i.e. its primary structure, you can determine its properties and its capacity to catalyze biochemical reactions. For example, consider hexokinase once more (Figure 3). If you knew what amino acids to change, you might alter the enzyme so that it could no longer act on glucose but only on the larger sugar, sucrose. As a matter of fact, in principle, you could design a protein to catalyze virtually any energetically feasible reaction you could imagine -- make plastic from starch! Make AZT or other expensive drugs at a fraction of the current cost! We can already make proteins to order. The only reason these applications are presently out of reach is that we don't know how to predict the complete folding of a protein or its catalytic properties from the sequence of amino acids. Most proteins assemble themselves, but what is simple in nature is fiendishly difficult to predict. You can bet that there are a lot of people in laboratories trying to learn how to predict the three-dimensional structures of proteins from their primary structures. When this is achieved, you may expect a societal change comparable to what resulted from the transformation of 19th-century organic chemistry to 20th-century practice.

SQ16. Suppose a gene suffers a mutation and the enzyme encoded by it doesn't work. What kind of change in the amino acid sequence of the protein might account for this outcome?

SQ17. A patient exhibits signs of anemia. The red cell count is normal, as is the amount of hemoglobin, as judged by the binding of antibody directed against hemoglobin, but the binding of oxygen to whole cells is atypical (Fig. A below). You isolate hemoglobin from the patient and test binding of oxygen to the monomeric globin subunits. It is normal (Fig. B below). What mutation might account for these findings?