Once a sequence search is completed, the question arises whether the
sequences found are also similar to one another. This can be determined
either automatically or manually, using programs that align the
sequences of interest.
If you painted a map from the result of your sequence search as described
earlier, it may be obvious that sequences usually share similarity only
in parts. The ends, or overhanging parts, of two sequences will therefore
be badly aligned because of their low similarity. Before alignments are
attempted, it is thus good practice to create sequence fragments of
approximately the same length, which makes the task easier for the
programs. If sequences are not specifically tailored for multiple
sequence alignment, programs might fail or produce unreliable alignments.
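
The tailoring step can be illustrated with a minimal Python sketch. It
assumes that the sequence search reported, for each hit, the 1-based
start and end of the similar region; the function name trim_to_region,
the optional flank parameter and the example sequences are hypothetical
and serve only as illustration.

```python
def trim_to_region(sequence, start, end, flank=0):
    """Return the 1-based, inclusive region start..end of a sequence.

    `flank` optionally keeps a few extra residues on either side,
    clipped to the sequence boundaries.
    """
    begin = max(start - 1 - flank, 0)
    stop = min(end + flank, len(sequence))
    return sequence[begin:stop]


# Hypothetical search hits: (name, full sequence, start, end of similar region)
hits = [
    ("seqA", "MKTAYIAKQRQISFVKSHFSRQ", 5, 14),
    ("seqB", "GGAYIAKQRQLSFVAAAAAAAAAAAA", 3, 12),
]

# Cut every hit down to its similar region so all fragments have
# approximately the same length before alignment is attempted.
fragments = {name: trim_to_region(seq, s, e) for name, seq, s, e in hits}

for name, fragment in fragments.items():
    print(name, len(fragment), fragment)
```

Both fragments now cover only the region reported as similar, so the
overhanging ends no longer take part in the alignment.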
The approach used for automatic sequence alignment can be described as
"clustering" of the most similar sequences. In a first step, the program
needs to find the sequence pair(s) that share the most obvious
similarity. To achieve this, each sequence is compared to every other
sequence, which results in n*(n-1)/2, or roughly (n*n)/2, comparisons if
we have n sequences to compare. As in rigorous sequence searching, each
comparison uses sequence comparison tables to compute the best possible
alignment and score it appropriately. (Note that the scores will not be
as desired if the sequences have not been tailored as mentioned above.)
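
The all-against-all comparison can be sketched in Python as follows.
Counting each unordered pair once and leaving out self-comparisons gives
n*(n-1)/2 comparisons, which approaches (n*n)/2 for large n. The sketch
is not the method of any particular program: it assumes a plain
match/mismatch score and a linear gap penalty as a stand-in for a real
sequence comparison table, and the names global_alignment_score and
all_pair_scores are hypothetical.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score (Needleman-Wunsch, linear gap penalty).

    Only the score is computed; no traceback is kept.
    """
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(previous[j - 1] + pair,   # align a[i] with b[j]
                               previous[j] + gap,        # gap in b
                               current[j - 1] + gap))    # gap in a
        previous = current
    return previous[-1]


def all_pair_scores(sequences):
    """Score every unordered pair of sequences exactly once."""
    return {
        (x, y): global_alignment_score(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    }


# Made-up example sequences
sequences = {"s1": "ACGT", "s2": "ACGT", "s3": "AGT", "s4": "TTTT"}
pair_scores = all_pair_scores(sequences)
best_pair = max(pair_scores, key=pair_scores.get)
print(len(pair_scores), "comparisons; best pair:", best_pair)
```

With four sequences this performs six comparisons, and the identical
pair s1/s2 is reported as the most similar.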
Once the comparison for each possible sequence pair has been completed,
the "best" candidates serve as nuclei, and additional sequences are
aligned to the already existing alignment. This works well with similar
proteins, but too many gaps, in particular at the DNA level, will most
probably not yield the desired result. The largest errors occur if
regions with low similarity are used as the "closest" set, as these make
it difficult to match additional sequences.
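
The order in which sequences join the growing alignment can be sketched
in Python. This is a deliberately simplified illustration with a single
nucleus: the best-scoring pair starts the cluster, and the next sequence
added is the one with the highest mean score against the members already
included. Using the mean score is an assumption made for this sketch,
real programs may group sequences differently, and the function name
joining_order and the scores are hypothetical.

```python
def joining_order(scores):
    """Return the order in which sequences join the alignment.

    `scores` maps frozenset({name_a, name_b}) to a pairwise similarity
    score and must contain every pair of sequences.
    """
    names = set().union(*scores)
    nucleus = max(scores, key=scores.get)        # most similar pair
    order = sorted(nucleus)
    remaining = sorted(names - nucleus)
    while remaining:
        # Pick the sequence most similar, on average, to the current cluster.
        nxt = max(
            remaining,
            key=lambda c: sum(scores[frozenset((c, m))] for m in order) / len(order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical pairwise scores for four sequences
example_scores = {
    frozenset(("x", "y")): 2,
    frozenset(("y", "z")): 9,
    frozenset(("x", "z")): 3,
    frozenset(("w", "x")): 7,
    frozenset(("w", "y")): 1,
    frozenset(("w", "z")): 1,
}
example_order = joining_order(example_scores)
print(example_order)
```

Here y and z form the nucleus, x is added next because it scores better
against both of them than w does, and w joins last.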
If problems are encountered because similarity cannot be determined well
enough automatically, either manual alignment is required or the
selection of sequences must be improved by tailoring or by omitting very
remotely related fragments.
The result of a multiple sequence alignment is a block of sequences
stacked neatly on top of each other. Programs exist which plot the degree
of similarity along the sequence coordinate. Other programs print or
paint the output nicely. The GCG programs also produce a figure which
schematically displays the level of similarity as a dendrogram. As
outlined below, the dendrogram, which illustrates sequence similarity,
must not be mistaken for a phylogenetic tree; it can, however, be used to
verify that the alignment proceeded as expected.
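
A similarity plot along the sequence coordinate can be approximated with
a short Python sketch. It assumes the alignment is available as a list of
equal-length strings with "-" marking gaps, and it measures similarity
simply as the fraction of sequences that share the most common residue
in each column. This measure, the smoothing window, and the names
column_identity and smooth are assumptions for illustration, not the
method used by any specific program.

```python
from collections import Counter


def column_identity(alignment):
    """Fraction of sequences sharing the most common non-gap residue per column."""
    profile = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        profile.append(top / len(column))
    return profile


def smooth(values, window=3):
    """Average each value with its neighbours (window clipped at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up alignment of four sequences
alignment = ["ACGT-A", "ACGTTA", "AC-TTA", "TCGATA"]
profile = column_identity(alignment)

# Crude text plot: one bar per alignment column
for position, value in enumerate(smooth(profile), start=1):
    print(f"{position:3d} {'#' * round(value * 20)}")
```

Columns where the bars drop mark regions of low similarity, which are the
first places to inspect when checking the alignment.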
Multiple Sequence Alignment is NOT the tool for you if you are working on
fragment assembly or shotgun sequencing. In order to align multiple
sequences reliably, the similarity amongst the members of the alignment
should extend along their entire length rather than being confined to
overlapping fragments.
Section 12.1: Principle of Multiple Sequence Alignment
Subsection 12.1.1 Prerequisites
Subsection 12.1.2 Finding the Best
Subsection 12.1.3 Grouping
Subsection 12.1.4 Result Evaluation
Subsection 12.1.5 Limitations