Chapter 12: Sequence Families



Section 12.1: Principle of Multiple Sequence Alignment

Once a sequence search is completed, the question arises whether the sequences found are also similar to one another. This can be examined either automatically or manually, using programs that align the sequences of interest.


Subsection 12.1.1

Prerequisites

If you drew a map from the results of your sequence search as described earlier, it may be obvious that sequences usually share similarity only in parts. This leaves the ends or overhanging parts of two sequences poorly aligned because of low similarity. Therefore, before an alignment is attempted, it is good practice to create sequence fragments of approximately the same length, which makes it easier for the programs to operate.

If sequences are not specifically tailored for multiple sequence alignment, programs might fail or report unreliable alignments.
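The tailoring step can be illustrated with a minimal sketch. It assumes that the search report gives the matched region of each sequence as 1-based, inclusive start and end positions; the function name `trim_to_region` and the optional `flank` parameter are hypothetical and not taken from any particular program.

```python
def trim_to_region(seq, start, end, flank=0):
    """Cut a sequence down to the region that matched in the search.

    start and end are assumed to be 1-based, inclusive coordinates,
    as commonly printed in search reports. flank optionally keeps a
    few extra residues on either side of the matched region.
    """
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates lie outside the sequence")
    lo = max(start - 1 - flank, 0)      # convert to 0-based, clamp at the start
    hi = min(end + flank, len(seq))     # clamp at the end of the sequence
    return seq[lo:hi]


# Hypothetical example: keep only positions 4 to 9 of a short sequence.
fragment = trim_to_region("AAACGTACGTTT", 4, 9)
print(fragment)
```

Applying the same cut to every hit yields fragments of roughly equal length, which is the form of input the alignment programs handle best.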


Subsection 12.1.2

Finding the Best

The approach used for automatic sequence alignment can be described as "clustering" of the most similar sequences. In a first step, the program needs to find the sequence pair(s) sharing the most obvious similarity. To achieve this, each sequence is compared with every other sequence, which results in n(n-1)/2 comparisons (roughly (n*n)/2) if we have n sequences to compare. As in rigorous sequence searching, each comparison uses sequence comparison tables to compute the best possible alignment and score it accordingly. (Note that the scores will not be as desired if the sequences have not been tailored as mentioned above.)
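The all-against-all comparison can be sketched as follows. This is only an illustration: it uses a toy scoring scheme (match, mismatch and gap values chosen arbitrarily) in place of a real comparison table, computes a global alignment score by dynamic programming, and the function names are hypothetical.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b (score only, no traceback).

    The match/mismatch/gap values are toy assumptions standing in for
    a real sequence comparison table and gap penalties.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def all_pair_scores(seqs):
    """Score every unordered pair once: n(n-1)/2 comparisons for n sequences."""
    return {
        (i, j): global_alignment_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }


def best_pair(seqs):
    """Indices of the pair with the highest score."""
    scores = all_pair_scores(seqs)
    return max(scores, key=scores.get)


seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "ACCAACGA"]
print(len(all_pair_scores(seqs)), "comparisons; best pair:", best_pair(seqs))
```

Because every pair requires a full dynamic-programming comparison, the cost grows quickly with both the number and the length of the sequences, which is another reason to trim the sequences beforehand.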


Subsection 12.1.3

Grouping

Once the comparison of every possible sequence pair has been completed, the "best" candidates serve as nuclei, and additional sequences are aligned to the already existing alignment. This works well with similar proteins, but too many gaps, particularly at the DNA level, will most probably not yield the desired result. The largest errors occur if regions of low similarity are used as the "closest" set, as these make it difficult to match additional sequences.

If problems are encountered because similarity cannot be determined well enough automatically, either manual alignment is required or the selection of sequences must be improved by tailoring or by omitting very remotely related fragments.
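The grouping idea can be sketched by computing only the order in which sequences would join the growing alignment. The sketch assumes a table of pairwise scores is already available (the numbers below are made up for illustration), starts from the best-scoring pair as the nucleus, and then repeatedly adds the sequence most similar to any member already included. It does not perform the alignment itself, and real programs may use a different joining rule.

```python
def join_order(n, scores):
    """Order in which n sequences would be added to the alignment.

    scores maps (i, j) with i < j to a pairwise similarity score.
    The best pair forms the nucleus; each following step adds the
    remaining sequence with the highest score to any member so far.
    """
    def score(i, j):
        return scores[(min(i, j), max(i, j))]

    order = list(max(scores, key=scores.get))   # nucleus: best-scoring pair
    remaining = set(range(n)) - set(order)
    while remaining:
        nxt = max(
            sorted(remaining),
            key=lambda k: max(score(k, m) for m in order),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical pairwise scores for four sequences (indices 0 to 3).
example_scores = {
    (0, 1): 6, (0, 2): -4, (0, 3): 2,
    (1, 2): -5, (1, 3): 4, (2, 3): -6,
}
print(join_order(4, example_scores))
```

A sequence that joins last with a very low score, such as index 2 in the example table, is a candidate for the omission suggested above.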


Subsection 12.1.4

Result Evaluation

The result of a multiple sequence alignment is a block of sequences displayed one above the other. Programs exist that plot the degree of similarity along the sequence coordinate. Other programs print or draw the output in a presentable form. The GCG programs also produce a figure that schematically displays the level of similarity as a dendrogram. As outlined below, the dendrogram, which illustrates sequence similarity, must not be mistaken for a phylogenetic tree; it can, however, be used to verify that the alignment proceeded as expected.
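A similarity profile along the alignment coordinate can be approximated with a very simple measure. The sketch below assumes the aligned sequences are given as equal-length strings with "-" as the gap character, and it reports, for each column, the fraction of rows carrying the most common residue. This is only one of several possible measures and is not the method of any specific plotting program.

```python
from collections import Counter


def column_conservation(aligned, gap="-"):
    """Fraction of rows sharing the most common residue in each column.

    aligned is a list of equal-length aligned sequences. Gap characters
    count towards the column height but never as the common residue.
    """
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        if not residues:
            profile.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        profile.append(top_count / len(column))
    return profile


# Hypothetical three-sequence alignment.
alignment = ["ACGT-A", "ACGTTA", "ACCT-A"]
for position, value in enumerate(column_conservation(alignment), start=1):
    print(position, round(value, 2))
```

Stretches of low values mark the regions where the alignment is least reliable and should be inspected by eye.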


Subsection 12.1.5

Limitations

Multiple sequence alignment is NOT the tool for you if you are working on fragment assembly or shotgun sequencing. To align multiple sequences reliably, the similarity among the members of the alignment should extend along their entire length rather than being confined to overlapping fragments.

