Rigorous searching applies the alignment methods used by programs like
bestfit to searching a sequence database. Such searches can be made more
useful still by using so-called profiles:
Once a sequence search has revealed homology to several sequences, it is
desirable to identify the shared regions of homology in a multiple sequence
alignment. The information contained in that alignment can then be reused
in further analyses and searches.
Remote similarities in the "twilight zone" are not always detected by
heuristic searching methods. Various algorithms implement the alignment
procedures known from pairwise alignment, but they require significantly
more resources.
The GCG program package currently features the profile search method of
Gribskov et al. Profile searching combines the benefit of comparison
matrices with the sequence-specific allowance of exchanges already used in
the pattern approach. However, substitutions in patterns follow a yes/no
scheme. To enhance sensitivity, the matrix values for a given exchange in a
profile are weighted according to the observed alignment.
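The difference between the yes/no scheme of a pattern and the weighted
scheme of a profile can be illustrated with a minimal sketch. It assumes a
gap-free toy alignment and a made-up comparison matrix (identity 2, same
group 1, otherwise -1); the function names are hypothetical, and position-
specific gap penalties, which real profiles carry, are left out. Each
profile value is the comparison score averaged over the residues observed
in that alignment column.

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Toy comparison matrix (an assumption for illustration, not a GCG matrix):
# identical residues score 2, residues in the same group 1, all others -1.
GROUPS = [set("ILVM"), set("KR"), set("DE"), set("ST")]


def compare(a, b):
    """Return the toy comparison score for exchanging residue a with b."""
    if a == b:
        return 2.0
    if any(a in group and b in group for group in GROUPS):
        return 1.0
    return -1.0


def pattern_match(column, residue):
    """Yes/no scheme: the residue is either observed in the column or not."""
    return residue in set(column)


def profile_column(column, alphabet=ALPHABET):
    """Weighted scheme: average the comparison score over observed residues."""
    counts = Counter(column)
    total = sum(counts.values())
    return {
        a: sum(n / total * compare(a, b) for b, n in counts.items())
        for a in alphabet
    }


def make_profile(alignment, alphabet=ALPHABET):
    """Build one score table per alignment column (gap-free input assumed)."""
    return [profile_column(column, alphabet) for column in zip(*alignment)]


def score(profile, segment):
    """Sum the position-specific scores of an ungapped segment."""
    return sum(column[residue] for column, residue in zip(profile, segment))


alignment = ["KIDE", "KVDE", "RLDD"]
profile = make_profile(alignment)
```

In this toy case the pattern rejects M at the second position because M
was never observed there, while the profile still gives it a positive
value because it is similar to the observed I, V and L.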
Profile searching is a complex method, and whether the extensive work is
justified depends strongly on the diversity of the input sequences. Please
make sure that you have read suitable introductory literature. The GCG
Program Reference Manual, for example, has a Profile Analysis Essay, which
you should read before you use the methods extensively.
The data used for profile searching must be in GCG format. Use reformat to
convert sequences, or see genmanual sequence_exchange for details.
For multiple sequence alignment, there are several possible file formats to
start with.
The program profilemake generates a profile from a set of aligned sequences
in msf format.
The program profilesearch uses a profile generated by profilemake and
produces a listing of the best-fitting sequences in a database. To align
these with the profile, the program profilesegments is required.
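The idea of ranking database sequences by how well they fit a profile can
be sketched as follows. This is a deliberately simplified, ungapped
illustration and not the algorithm of profilesearch, which allows gaps; the
toy profile, the database entries, the default score for unlisted residues
and all function names are assumptions made for the example.

```python
def best_window_score(profile, sequence, default=-1.0):
    """Best ungapped score of the profile over all windows of the sequence."""
    width = len(profile)
    if len(sequence) < width:
        return float("-inf")
    return max(
        sum(
            column.get(residue, default)
            for column, residue in zip(profile, sequence[start:start + width])
        )
        for start in range(len(sequence) - width + 1)
    )


def rank_database(profile, database):
    """Return (score, name) pairs sorted from best to worst fit."""
    scored = [
        (best_window_score(profile, seq), name) for name, seq in database.items()
    ]
    return sorted(scored, reverse=True)


# Hypothetical three-position profile: one score table per position.
toy_profile = [
    {"K": 2.0, "R": 1.0},
    {"I": 2.0, "V": 1.0, "L": 1.0},
    {"D": 2.0, "E": 1.0},
]

# Hypothetical miniature database.
toy_database = {
    "seq_a": "AAKIDAA",
    "seq_b": "GGRVEGG",
    "seq_c": "GGGGGGG",
}

ranking = rank_database(toy_profile, toy_database)
```

The resulting list plays the role of the listing of best-fitting sequences;
aligning each hit with the profile is a separate step.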
The program profilegap uses a profile generated by profilemake and compares
it to a sequence using an end-to-end alignment algorithm (as in gap).
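An end-to-end comparison of a profile with a sequence can be sketched with
the classic global alignment recurrence, using the profile's
position-specific values in place of a fixed comparison matrix. This is a
minimal sketch under stated assumptions, not the implementation of
profilegap: it uses a single linear gap penalty, a default score for
residues missing from a position, and a hypothetical toy profile.

```python
def global_profile_score(profile, sequence, gap=-2.0, default=-1.0):
    """Score of the best end-to-end alignment of a profile with a sequence."""
    rows, cols = len(profile), len(sequence)
    # dp[i][j] = best score aligning the first i profile positions
    # with the first j residues of the sequence.
    dp = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        dp[i][0] = i * gap
    for j in range(1, cols + 1):
        dp[0][j] = j * gap
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            match = dp[i - 1][j - 1] + profile[i - 1].get(sequence[j - 1], default)
            skip_profile_position = dp[i - 1][j] + gap
            skip_residue = dp[i][j - 1] + gap
            dp[i][j] = max(match, skip_profile_position, skip_residue)
    return dp[rows][cols]


# Hypothetical three-position profile: one score table per position.
toy_profile = [
    {"K": 2.0, "R": 1.0},
    {"I": 2.0, "V": 1.0, "L": 1.0},
    {"D": 2.0, "E": 1.0},
]

full_match = global_profile_score(toy_profile, "KID")
with_gap = global_profile_score(toy_profile, "KD")
```

Because the alignment must span both the whole profile and the whole
sequence, the shorter sequence "KD" pays one gap penalty for the profile
position it cannot cover.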
================================= Begin Exercise 13
Understand the benefit, scope and limitations of a rigorous searching method.
Generate a profile and show the difference in searching the alignment vs.
searching the consensus.
================================= End Exercise 13
Subsection 12.5.1 Principle
Subsection 12.5.2 Formats of Sequences
|                         | file     |
| type of file            | ending   | called as (example)
+-------------------------+----------+-------------------------
| normal sequence file    | .seq or  |
|                         | .pep     | my.seq
+-------------------------+----------+-------------------------
| file of sequence names  | .frg and |
| (from 'lineup', etc.)   | .fil     | @my.fil
+-------------------------+----------+-------------------------
| multiple sequence files |          |
| (from 'pileup')         | .msf     | my.msf{*}
Subsection 12.5.3 Profile Generation
Subsection 12.5.4 Profile Searching
Subsection 12.5.5 Profile Analysis