By: Lydia E. Kavraki
We have already convinced ourselves that most of the activities in living organisms are regulated by proteins. All proteins start out on a ribosome as a linear sequence of amino acids. This linear sequence must fold during and after synthesis so that the protein can take up its native conformation. Recall that the native conformation of a protein is a stable three-dimensional structure that strongly determines the protein's biological function. The native conformation is only marginally stable because it depends on the environment: modest changes in the environment can cause structural changes in the protein and thus affect its function. Proteins are adapted to the conditions inside the cell, so environmental conditions different from those in the cell can result in structural changes. When a protein loses its biological function as a result of a loss of three-dimensional structure, we say that the protein has undergone denaturation. Proteins can be denatured not only by heat but also by extremes of pH, since both conditions affect the weak interactions, including hydrogen bonds, that are mainly responsible for a protein's three-dimensional structure. It is important to understand that the denatured state does not necessarily equate with complete unfolding of the protein and randomization of its conformation. In fact, denatured proteins exist in a set of partially folded states that are currently poorly understood.
The Process of Folding
The folding pathway of a large polypeptide chain is very complicated, and not all the principles that guide the process have been worked out. However, several plausible models have been proposed to describe protein folding. One model views folding as a hierarchical process in which local secondary structures form first. Under this model, α helices and β sheets form first, and longer-range interactions between helices and sheets then produce super-secondary structures. This process continues until the entire polypeptide is folded. An alternative model describes folding as a spontaneous collapse of the polypeptide into a compact state, known as a molten globule. The actual folding process may well incorporate features of both models. Instead of following a single pathway, a population of peptide molecules may take a variety of routes.
Thermodynamically, the folding process can be viewed as a kind of free-energy funnel, in which the unfolded states are characterized by a high degree of conformational entropy and relatively high free energy. Loosely speaking, entropy is a measure of disorder: here, it counts the different conformational states that the protein can occupy, and the unfolded protein can occupy far more of them than the folded one. High free energy, in turn, indicates instability, which is also greater in the unfolded state. As folding proceeds, the narrowing of the funnel represents a decrease in the number of conformational states present. Local minima along the sides of the free-energy funnel represent semistable intermediates that can briefly slow the folding process, since it takes some time for the protein to escape from a local minimum. At the bottom of the funnel, also known as the global minimum, the ensemble of folding intermediates is reduced to a single conformation.
It is important to realize that although we often describe the free energy funnel as having one global minimum - that is, one native conformation - a protein can have a small set of native conformations, each one important for its biological function(s).
Free Energy Funnel
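The link between the number of conformational states and entropy can be made concrete with a small sketch. It assumes the Boltzmann relation S = R ln W for W equally likely conformations, and a purely hypothetical toy chain of 100 residues with 3 backbone states each; neither number comes from the text above, and real proteins are not this simple.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def conformational_entropy(num_states: int) -> float:
    """Molar entropy S = R * ln(W) for W equally likely conformations."""
    if num_states < 1:
        raise ValueError("num_states must be at least 1")
    return R * math.log(num_states)


# Hypothetical toy chain (an assumption for illustration only):
# 100 residues, each with 3 accessible backbone states.
unfolded_states = 3 ** 100
native_states = 1  # a single native conformation at the bottom of the funnel

s_unfolded = conformational_entropy(unfolded_states)
s_native = conformational_entropy(native_states)
delta_s = s_native - s_unfolded  # negative: entropy is lost on folding

print(f"S(unfolded) = {s_unfolded:.1f} J/(mol*K)")
print(f"S(native)   = {s_native:.1f} J/(mol*K)")
print(f"Delta S     = {delta_s:.1f} J/(mol*K)")
```

Under these assumptions the funnel narrows from an astronomically large number of states to one, so the conformational entropy drops to zero. For folding to be favorable overall, this loss has to be outweighed by other contributions to the free energy.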
Spontaneous or Assisted Folding?
It has been experimentally confirmed that not all proteins fold spontaneously in the cell. For many proteins, folding is facilitated by specialized proteins known as chaperones. Molecular chaperones are proteins that interact with partially folded or improperly folded polypeptides to facilitate correct folding pathways or to provide microenvironments in which folding can occur. Chaperones are not the only proteins that facilitate protein folding. Two enzymes, protein disulfide isomerase (PDI) and peptide prolyl cis-trans isomerase (PPI), catalyze isomerization reactions and are required for the folding pathways of a number of proteins.
Current Methods for Protein Structure Prediction
There are three major theoretical methods for predicting the structure of proteins: Comparative Modelling, Fold Recognition, and ab initio Prediction.
Comparative Modelling
Comparative modelling makes use of the fact that evolutionarily related proteins with similar sequences have similar structures. Sequence similarity is measured by the percentage of identical residues at each position based on an optimal structural superposition. The similarity of structures is very high in the so-called "core regions", which typically consist of secondary structure elements such as α helices and β sheets. Loop regions such as β turns connect these secondary structures. The process of building a comparative model is as follows. First, the sequence of a protein whose structure has been determined experimentally (the parent) is aligned with the sequence to be modelled (the target). This sequence alignment is used to construct an initial model by copying over main chain and side chain coordinates from the parent structure wherever the alignment identifies an equivalent residue. Side chains must be built for residues in the target that do not correspond to an identity in the alignment, and for residues whose side chain conformation is thought to differ between target and parent. Main chains must be built for insertions, for regions surrounding a deletion, and for other regions of suspected main chain variation. You can find a very good review of Comparative Modelling at Comparative Protein Modelling.
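As a rough illustration of the identity measure described above, the sketch below computes the percentage of identical residues between two sequences that have already been aligned. The function name, the gap character, and the choice to count only positions without a gap are assumptions made for this example; other conventions for the denominator exist, and the two short sequences are made up.

```python
def percent_identity(aligned_parent: str, aligned_target: str, gap: str = "-") -> float:
    """Percentage of identical residues over aligned positions without gaps."""
    if len(aligned_parent) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # positions where both sequences have a residue
    identical = 0  # positions where those residues are the same
    for parent_res, target_res in zip(aligned_parent, aligned_target):
        if parent_res == gap or target_res == gap:
            continue  # insertion or deletion: nothing to copy from the parent
        compared += 1
        if parent_res == target_res:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Made-up aligned sequences: one gap in the parent, one substitution (I -> L).
parent = "MKT-AYIAK"
target = "MKTLAYLAK"
identity = percent_identity(parent, target)
print(f"{identity:.1f}% identity")
```

In this toy alignment, the gapped position corresponds to an insertion in the target, where the main chain would have to be built, and the substituted position is one where the side chain would have to be rebuilt rather than copied.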
Fold Recognition (Threading)
Threading uses a database of known three-dimensional structures to match sequences of unknown structure with protein folds. This is accomplished through a scoring function that assesses the fit of a sequence to a given fold. These scoring functions are usually derived from a database of known structures and generally include pairwise atom contact and solvation terms. Threading is very similar to comparative modelling in that it compares a target sequence against a library of structural templates, producing a list of scores. The scores are then ranked, and the fold with the best score is assumed to be the one adopted by the sequence. The methods used to fit a sequence against a library of folds can be computationally very elaborate, such as those involving double dynamic programming, Gibbs sampling using a database of threading cores, branch and bound heuristics, or sequence alignment methods based on Hidden Markov Models. For an example scoring function used in Threading, please read An empirical energy function for threading protein sequence through the folding motif.
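The score-and-rank idea can be sketched with a deliberately simplified toy. Everything here is hypothetical: the two-letter burial pattern, the contact lists, the set of residues treated as hydrophobic, and the energy values of plus or minus 1 are illustrative assumptions, not a published scoring function. The sketch also skips the alignment step entirely by assuming the sequence and each template have the same length.

```python
# Residues treated as hydrophobic in this toy (an assumption for illustration).
HYDROPHOBIC = set("AVLIMFWC")


def thread_score(sequence: str, template: dict) -> float:
    """Toy energy of a sequence placed on a fold template (lower is better)."""
    burial = template["burial"]  # 'B' = buried position, 'E' = exposed position
    if len(sequence) != len(burial):
        raise ValueError("this sketch assumes sequence and template lengths match")

    energy = 0.0
    # Solvation term: reward hydrophobic residues in buried positions and
    # polar residues in exposed positions; penalize the opposite.
    for residue, environment in zip(sequence, burial):
        is_hydrophobic = residue in HYDROPHOBIC
        is_buried = environment == "B"
        energy += -1.0 if is_hydrophobic == is_buried else 1.0

    # Pairwise contact term: reward contacts between two hydrophobic residues.
    for i, j in template["contacts"]:
        if sequence[i] in HYDROPHOBIC and sequence[j] in HYDROPHOBIC:
            energy -= 1.0
    return energy


def rank_folds(sequence: str, library: dict) -> list:
    """Score the sequence against every template and sort, best score first."""
    scores = [(name, thread_score(sequence, tpl)) for name, tpl in library.items()]
    return sorted(scores, key=lambda item: item[1])


# Hypothetical two-fold library and a made-up six-residue sequence.
library = {
    "fold_a": {"burial": "BEBEBE", "contacts": [(0, 2), (2, 4)]},
    "fold_b": {"burial": "EBEBEB", "contacts": [(1, 3), (3, 5)]},
}
ranking = rank_folds("LKVEAD", library)
print(ranking)
```

Here fold_a places every hydrophobic residue in a buried position and brings them into contact, so it ranks first. A realistic method would also have to search over alignments of the sequence to each template, which is where the dynamic programming and sampling techniques mentioned above come in.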
Ab initio prediction
The ab initio approach is a mixture of science and engineering. The science is in understanding how the three-dimensional structure of proteins is attained. The engineering is in deducing the three-dimensional structure given the sequence. Ab initio prediction poses the major challenge of the folding problem, and it can be broken down into two components: devising a scoring function that can distinguish correct (native or native-like) structures from incorrect (non-native) ones, and devising a search method to explore the conformational space. In many ab initio methods, the two components are coupled, such that the search function drives, and is driven by, the scoring function to find native-like structures. Currently there is no reliable and general scoring function that can always drive a search to a native fold, and there is no reliable and general search method that can sample the conformational space adequately to guarantee a significant fraction of near-native structures (less than 3.0 angstroms RMSD from the experimental structure). Methods for ab initio prediction include Molecular Dynamics (MD) simulations of proteins, Monte Carlo (MC) simulations, which do not use forces but rather compare energies, and Genetic Algorithms, which try to improve on the sampling and convergence of MC approaches. For a more detailed discussion, please visit Ab initio protein structure modeling methods.
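The coupling of a scoring function with a search method, and the way Monte Carlo compares energies instead of using forces, can be illustrated with a minimal Metropolis sketch. The one-dimensional "rugged funnel" energy below is an invented toy, not a protein energy function, and the temperature, step size, starting point, and number of steps are arbitrary assumptions.

```python
import math
import random


def energy(x: float) -> float:
    """Toy rugged funnel: a broad bowl plus ripples; global minimum E(0) = 0."""
    return 0.1 * x * x + (1.0 - math.cos(5.0 * x))


def metropolis_search(start: float, steps: int, kt: float,
                      step_size: float, seed: int = 0):
    """Metropolis Monte Carlo: propose random moves and compare energies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    x, e = start, energy(start)
    best_x, best_e = x, e
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        delta = e_new - e
        # Always accept downhill moves; accept uphill moves with probability
        # exp(-delta / kt), which lets the walker escape local minima.
        if delta <= 0.0 or rng.random() < math.exp(-delta / kt):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


start_x = 8.0
best_x, best_e = metropolis_search(start=start_x, steps=20000, kt=0.5, step_size=0.5)
print(f"best x = {best_x:.3f}, best energy = {best_e:.3f}")
```

The ripples play the role of local minima on the sides of the funnel. In this one-dimensional toy the search space is tiny, whereas a real polypeptide has an enormous conformational space, which is why adequate sampling remains the hard part.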
Novel computational methods and large-scale distributed computing are being used by Folding@Home to simulate folding and to examine folding-related diseases. Please visit Folding@Home to learn more about this distributed computing project.
It is very important for proteins to achieve their native conformation, since failure to do so may seriously impair their biological function. Defects in protein folding may be the molecular cause of a range of human genetic disorders. For example, cystic fibrosis is caused by defects in a membrane-bound protein called cystic fibrosis transmembrane conductance regulator (CFTR). This protein serves as a channel for chloride ions. The most common cystic fibrosis-causing mutation is the deletion of a Phe residue at position 508 in CFTR, which causes improper folding of the protein. Many of the disease-related mutations in collagen also cause defective folding. A misfolded protein known as a prion appears to be the agent of a number of rare degenerative brain diseases in mammals, such as mad cow disease. Related diseases include kuru and Creutzfeldt-Jakob disease. These diseases are sometimes referred to as spongiform encephalopathies, so named because the brain becomes riddled with holes. The prion protein, in its normally folded form, is a normal constituent of brain tissue in all mammals; its function is not yet known. A complete understanding of prion diseases awaits new information about how prion protein affects brain function, as well as more detailed structural information about the protein. Improved understanding of protein folding may therefore lead to new therapies for cystic fibrosis, Creutzfeldt-Jakob disease, and many other diseases.
The folding funnel hypothesis is a specific version of the energy landscape theory of protein folding, which assumes that a protein's native state corresponds to its energetic minimum under the solution conditions usually encountered in cells. Although energy landscapes may be "rough", with many non-native local minima in which partially folded proteins can become trapped, the folding funnel hypothesis assumes that the native state is a deep energy minimum with steep walls, corresponding to a single well-defined tertiary structure.
The folding funnel hypothesis is closely related to the hydrophobic collapse hypothesis, under which the driving force for protein folding is provided by the energetic stabilization associated with the sequestration of hydrophobic amino acid side chains in the interior of the folded protein, and the corresponding isolation of electrostatically charged side chains on the solvent-accessible protein surface or in neutralizing salt bridges within the protein's core. The molten globule state predicted by the folding funnel theory as an ensemble of folding intermediates thus corresponds to a protein in which hydrophobic collapse has occurred but many native contacts, or close residue-residue interactions represented in the native state, have yet to form.
In the canonical depiction of the folding funnel, the depth of the well represents the energetic stabilization of the native state relative to the denatured state, and the width of the well represents the conformational entropy of the system. The surface outside the well is shown as relatively flat to represent the heterogeneity of the random coil state. The theory's name derives from an analogy between the shape of the well and a physical funnel, in which dispersed liquid is concentrated into a single narrow area.