There are various tools available which allow you to analyse single protein sequences.
Principle
The desire to predict a secondary or even tertiary structure from the amino acid sequence is known as the folding problem. Unfortunately, no solution is available at this point in time. Two approaches are in use:
To predict the structures of peptides or even proteins with no known homology to known proteins, it is necessary to use methods that assign parameters to amino acids and thereby make it possible to estimate a likely secondary structure. The programs in use frequently make three-state (or four-state) predictions, typically helix, sheet and coil, with turn as a possible fourth state.
All methods work along the lines of the following assumption: Based on an analysis of known protein structures, amino acids are classified into "classes" according to which of the three (or four) states described above they most frequently occur in. Statistical methods are applied to evaluate the variance of these occurrences over all positions of all proteins, and each amino acid is assigned numerical values expressing its probability of being found in each of the three (or four) states.
A window is applied (see the section on window analysis earlier in this chapter) to calculate a value that is significant for the given position and amino acid in the peptide to be predicted. By plotting the curves of the three (or four) states, it is possible to derive a prediction for the whole protein. Note that, because of the way these values are calculated, the states are not mutually exclusive, which implies considerable uncertainty.
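The window calculation described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the algorithm of any particular program: the propensity numbers in `TOY_PROPENSITY` are placeholder values for a handful of residues, and the names `window_average`, `state_curves` and `predict_states` are invented for this sketch. A real program would use a complete, published parameter table.

```python
# Illustrative placeholder propensities for a few residues only.
# These are NOT a published parameter set; unknown residues default to 1.0.
TOY_PROPENSITY = {
    "helix": {"A": 1.4, "E": 1.5, "L": 1.2, "V": 1.0, "G": 0.6, "P": 0.6},
    "sheet": {"A": 0.8, "E": 0.4, "L": 1.3, "V": 1.7, "G": 0.8, "P": 0.6},
    "coil":  {"A": 0.7, "E": 0.7, "L": 0.6, "V": 0.5, "G": 1.6, "P": 1.5},
}


def window_average(values, width):
    """Average each position over a centred window (truncated at the ends)."""
    half = width // 2
    averaged = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged


def state_curves(seq, width=5):
    """Return one smoothed curve per state; these are the curves one would plot."""
    curves = {}
    for state, table in TOY_PROPENSITY.items():
        raw = [table.get(aa, 1.0) for aa in seq.upper()]
        curves[state] = window_average(raw, width)
    return curves


def predict_states(seq, width=5):
    """Pick, per position, the state whose curve is highest (H, E or C)."""
    curves = state_curves(seq, width)
    codes = {"helix": "H", "sheet": "E", "coil": "C"}
    return "".join(
        codes[max(curves, key=lambda state: curves[state][i])]
        for i in range(len(seq))
    )
```

Because every state gets its own curve, two curves can be high at the same position, which is exactly the source of the uncertainty mentioned above.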
Chou and Fasman pioneered this approach many years ago, using the secondary and tertiary structures of proteins available at that time. The applicability of their method is therefore constrained to the kind of proteins available at that time (globular, soluble proteins). With the Chou and Fasman method, the precision is estimated to be 60-65% on average.
Robson and Garnier improved the precision to close to 65% on average by using a matrix of neighbour values rather than a single-window approach.
Eisenberg has used additional parameters such as "hydropathy".
In summary, secondary structure prediction from scratch is highly speculative and should not be the only method used to reach conclusions. Averages over aligned sequences, as described in later chapters of the BioCompanion, will give the best results, with an expected accuracy close to 70%.
Based on sequence homology with an existing, structurally known protein fragment, it is possible to use advanced computer graphics displays to build the peptide fragment covering the homology according to the known structure of the protein found in the database. The minimum sequence homology required for this kind of model building depends strongly on the individual peptide but should definitely be better than 30%, which means that at least 10 amino acids must be identical in a sequence of 30 amino acids. The model is built in three steps:
First, the sequence to be modelled is aligned to the sequence with the known structure coordinates. This procedure is described later and will make you familiar with the kind of replacements that will need to be made in the model. Be careful to avoid the "impossible": glycine residues are frequently required to adopt an unusual conformation, proline residues are either required in turn structures or will possibly break secondary structure elements when introduced, and disulphide bridges are extremely important in protein structure.
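Whether an alignment clears the identity threshold mentioned above can be checked with a short calculation. The following is a minimal sketch, assuming the two sequences are already aligned to equal length with "-" as the gap symbol; the function name `percent_identity` is hypothetical, and the choice to ignore gapped columns in the denominator is one convention among several.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over all ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared
```

For a 30-residue stretch, 10 identical positions give about 33%, just above the 30% mark, whereas 9 identical positions give exactly 30% and would not count as "better than 30%".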
Secondly, an advanced computer graphics program is used. Well-known representatives include, but are not limited to, the Insight II program package from Biosym Technologies, the Cerius program suite from Molecular Simulations, and the 'Whatif' program as created and used at EMBL, Heidelberg. The first two are high-end commercial software packages (since fall 1995, available from a single vendor) and require significant investment, whereas the 'Whatif' package is available at a negotiable price to academic researchers. All of these packages usually require a Silicon Graphics computer system with high-end graphics and a large monitor. If possible, a stereo viewing capability should be available.
Be aware of the complexity of these programs - this is not rocket science any longer. Replacing amino acid side chains might be an easy procedure on the display, but it requires considerable thought to arrive at a reasonable, biologically relevant model.
Last, once the initial model is completed, mathematical procedures will need to be applied to refine this model. This requires that so-called potentials are assigned to the atoms of your peptide, which means that the groups and residues are classified by chemical topology as belonging to a certain class of arrangement. For example, a peptide group will be semi-planar due to the hybridisation of the participating atoms. Subsequently, a semi-empirical force field is applied, and the structure is fitted to "ideal" angles, atom distances and bond geometry with techniques known as molecular dynamics and structure refinement. Molecular dynamics initiates movements of atoms by introducing a "temperature", which makes it possible to "shake" the molecule into a more ideal conformation.
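The idea of fitting a structure to "ideal" atom distances can be illustrated with a deliberately tiny sketch. It is not molecular dynamics and not one of the force fields used by the packages named above: it applies plain steepest-descent minimisation to a harmonic bond-length term only. The force constant, step size, ideal bond length and all function names are assumptions chosen for illustration.

```python
import math


def bond_energy_and_gradient(coords, bonds, k=100.0):
    """Harmonic bond energy E = sum k*(d - r0)^2 and its gradient per atom.

    coords: list of [x, y, z]; bonds: list of (i, j, r0) with ideal length r0.
    Assumes bonded atoms never sit at exactly the same position.
    """
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, r0 in bonds:
        delta = [a - b for a, b in zip(coords[i], coords[j])]
        dist = math.sqrt(sum(d * d for d in delta))
        energy += k * (dist - r0) ** 2
        scale = 2.0 * k * (dist - r0) / dist
        for axis in range(3):
            grad[i][axis] += scale * delta[axis]
            grad[j][axis] -= scale * delta[axis]
    return energy, grad


def minimise(coords, bonds, step=0.001, iterations=500):
    """Move every atom a small step against the gradient, repeatedly."""
    coords = [list(c) for c in coords]  # work on a copy
    for _ in range(iterations):
        _, grad = bond_energy_and_gradient(coords, bonds)
        for atom, g in zip(coords, grad):
            for axis in range(3):
                atom[axis] -= step * g[axis]
    return coords
```

Starting from two atoms 2.0 units apart with an ideal distance of 1.5, the minimiser pulls them together until the bond term vanishes. A real force field adds angle, torsion and non-bonded terms, and molecular dynamics additionally gives the atoms velocities so that the structure can escape shallow local minima.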
To do this step reliably, it is essential to know constraints that limit some of the three-dimensional properties of the protein or peptide. These constraints will be data from X-ray crystallography or NMR structure analysis. Without these constraints, model building is highly speculative. You should not start a study involving molecular dynamics refinement unless you have at least some constraints such as disulphides, maximal cross-section or other data.
A very serious and as yet unsolved problem is how to account for interactions of your model peptide with the environment. This environment is frequently "vacuum" - at least this is computationally the easiest approach. Be aware that this is not a very satisfying approximation. Water or membrane molecules are much better suited as interaction partners. However, supercomputing performance will be needed to run this analysis, and the results will be of only limited relevance if you did not use any constraints.
In summary, the molecular modelling approach to secondary structure prediction may be very useful and suggestive if small changes are to be applied to the sequence of a peptide whose structure is already known. The larger the deviation, i.e., the lower the similarity to a known structure, the less relevant the results will be. Experimental structure evaluation with X-ray and/or NMR techniques might be required for satisfactory models.
Programs for secondary structure prediction
Remember that the prediction of secondary structure without reasonable homology to three-dimensional data is rather unsafe. Programs which employ three-dimensional modelling techniques require special hardware (powerful computers) and dedicated software and are hence beyond the scope of the BioCompanion. The programs available to you in the desktop environment will typically be restricted to secondary structure prediction from scratch.
To display the secondary structure plots, you need a computer screen which is capable of displaying graphics. It is recommended that you have access to a colour graphics device if you want to run these programs. Remember to set the graphics environment correctly with setplot if you work with GCG locally. X-Windows setups must have the DISPLAY environment variable set correctly.
To display several measures of secondary structure, use
% pepplot
To generate a table of several measures (with a comparison of Garnier-Robson and Chou-Fasman predictions), use
% peptidestructure
The generated output file can be plotted "two-dimensionally", but for serious inspection the one-dimensional plotting is recommended (use the corresponding menu option):
% plotstructure
Assuming that the protein fragment adopts a helical structure, the program helicalwheel can be used. The program moments plots a three-dimensional map which displays moments of hydropathy as a function of the sequence and the rotational angle of the peptide bond (90 - 110 degrees is typical for helices; 0 or 180 degrees indicates a chance of beta sheet).
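How such a moment depends on the rotational angle can be sketched with the usual vector-sum definition of the hydrophobic moment. The example below is an illustration under assumptions, not the implementation of the moments program: it uses the Kyte-Doolittle hydropathy scale (the program may use a different scale), treats the angle as the rotation from one residue to the next, and the function names are invented for this sketch.

```python
import math

# Kyte-Doolittle hydropathy values (assumption: the GCG program may use another scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(window, angle_deg):
    """Length of the vector sum of hydropathies, residue n rotated by n*angle."""
    angle = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(n * angle)
                  for n, aa in enumerate(window.upper()))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(n * angle)
                  for n, aa in enumerate(window.upper()))
    return math.hypot(sin_sum, cos_sum)


def best_angle(window, angles=range(0, 181, 10)):
    """Angle (in degrees) at which the moment of this window is largest."""
    return max(angles, key=lambda a: hydrophobic_moment(window, a))
```

A strictly alternating hydrophobic/hydrophilic stretch such as VKVKVKVK has its largest moment at 180 degrees, the pattern expected for a strand with one hydrophobic face; an amphipathic helix would instead peak near 100 degrees.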
The programs peptidemap and peptidesort work like their DNA counterparts. The program peptidesort can also be used to advantage to determine the composition of a peptide. If the <SPACE BAR> is hit at the question "Which Enzymes?", no fragmentation is calculated. Instead, the composition is reported in much more detail than the composition program provides.
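What a composition report contains can be shown with a minimal sketch. It only counts residues and computes percentages; it does not reproduce the output of peptidesort or composition, and the function name is an assumption.

```python
from collections import Counter


def composition(seq):
    """Return {residue: (count, percent)} for all letters in the sequence."""
    residues = [aa for aa in seq.upper() if aa.isalpha()]
    counts = Counter(residues)
    total = len(residues)
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}
```

For example, `composition("AAGD")` reports two alanines (50%) and one each of aspartate and glycine (25% each).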
The isoelectric point of the denatured protein can be determined from the titration curve plotted by the program isoelectric.
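The relation between a titration curve and the isoelectric point can be sketched as follows: the net charge is computed from the Henderson-Hasselbalch equation for each ionisable group, and the isoelectric point is the pH at which this charge crosses zero. The pKa values below are approximate, illustrative numbers; published tables differ, and the values used by the isoelectric program are not known here. All names are hypothetical.

```python
# Approximate pKa values, for illustration only (assumption, not the GCG table).
PKA_POSITIVE = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq, ph):
    """Net charge of the fully denatured chain at the given pH."""
    seq = seq.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "N_term" else seq.count(group)
        charge += count / (1.0 + 10.0 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "C_term" else seq.count(group)
        charge -= count / (1.0 + 10.0 ** (pka - ph))
    return charge


def titration_curve(seq, step=0.5):
    """(pH, net charge) pairs from pH 0 to 14, i.e. the data behind the plot."""
    points = int(14 / step) + 1
    return [(i * step, net_charge(seq, i * step)) for i in range(points)]


def isoelectric_point(seq, tolerance=1e-4):
    """Bisection on pH: the charge falls monotonically as the pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tolerance:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

The sketch treats every cysteine as a free thiol, which fits the denatured state mentioned above but not a protein with intact disulphide bridges.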
Frequently, you might want to know where "acidic" or other regions of your protein sequence are located. As ambiguity symbols are not defined in the single-letter peptide alphabet, you might rewrite your sequence with the simplify program and use the window program in order to plot the result with statplot. The data for the simplify program are located in a file which you can get from the GCG program database with the command
% fetch simplify.txt
This file has a self-describing format and basically replaces each amino acid listed in the second column with the amino acid listed in the first column.
================================= Begin Exercise 7
Summary of single-sequence tools:
Translate the sequence GENEMBL:M19311 in the determined reading frame, perform a secondary structure prediction from scratch, and plot the acidic amino acids as a function of the sequence.
To use amino acid sequences, the computer needs a defined reading frame in the DNA sequence which allows translation into a peptide sequence. The translated amino acids are written into a peptide sequence.
The purpose of this exercise is to create the sequence M19311.pep and predict its secondary structure. Proceed as follows:
================================= End Exercise 7
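The role of the reading frame in the translation step of Exercise 7 can be illustrated outside GCG with a short sketch. It uses the standard genetic code and forward frames 1 to 3 only; it is not a replacement for the GCG translation program, and the function name and the "X" symbol for unrecognised codons are assumptions.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=1):
    """Translate a DNA string in forward reading frame 1, 2 or 3.

    Stop codons are written as '*', unrecognised codons as 'X'.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(frame - 1, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```

Shifting the frame by a single base changes every codon, which is why the reading frame has to be determined before a peptide sequence such as M19311.pep can be created.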
Subsection 8.6.1 Secondary Structure Prediction
Subsection 8.6.2 Visualisation of Secondary Structure
Subsection 8.6.3 Fragmentation
Subsection 8.6.4 Isoelectric Point
Subsection 8.6.5 Simplification of Protein Sequences
In simplify.txt, for example, the entry
D DEQN
converts all D, E, Q and N symbols to D. This might look biologically irrelevant, but it is a good approach to get all acidic amino acids to read "D", as these can now be plotted with the 'window/statplot' programs.
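The effect of such a replacement rule can be reproduced with a small sketch. It assumes that each rule consists of two whitespace-separated columns, the replacement symbol followed by the symbols to be replaced; the real layout of simplify.txt may contain further lines, and the function names are hypothetical.

```python
def parse_simplify_rules(text):
    """Build a {symbol: replacement} mapping from lines such as 'D DEQN'."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: anything else is a comment or header line
        replacement, members = fields
        for aa in members.upper():
            mapping[aa] = replacement.upper()
    return mapping


def simplify(seq, mapping):
    """Rewrite the sequence; symbols without a rule are left unchanged."""
    return "".join(mapping.get(aa, aa) for aa in seq.upper())
```

With the rule shown above, the sequence MDEQNK becomes MDDDDK, so a window count of "D" marks every position that was D, E, Q or N.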