[ Previous chapter ][
This chapter ][ Next chapter ]
The requirement
to identify a single sequence in a database is very
much different from a
keyword
search.
Keywords are expected to match
exactly. This
type of keyword searching
has been described earlier .
Pattern searching programs
will match a pattern exactly
at single, defined positions
in a sequence.
A
sequence searching
program , however, is expected to report
and, most important,
The major problem of sequence searching, therefore, is to find a
reasonable definition
for the
similarity
of sequences. Application programs doing
"sequence
searching"
in
general imply that each entry of a sequence database is
compared to the query sequence sequentially,
and the result is a list
of database entries which are
similar
to the query
sequence. The receipt of this program type
reads as follows:
The use of so-called
rigorous
searching algorithms such as the Smith and
Watermann type of search
already known from the
'bestfit'
program , doing an accurate comparison.
These programs execute the central searching loop
in
a specifically tuned fashion, and run best on special hardware
such as the
MasPar
and
Bioccelerator
systems.
The use of
heuristic
searching algorithms which reduce the compute time
needed in the
central searching loop. Approximations in comparison are
based on the assumption
that sequence similarity can be expressed as
tightly accumulated regions of identical oligomers
or short peptides.
Either computed on-the-fly (
'fasta'
type of program)
or in a pre-processing step (
'blast'
program type), the heuristic searching
programs will run on smaller,
general-purpose machines and even PCs might occasionally be used.
Section 11.1: Tools for Sequence Searching
start program
initialise top-scoring list
for each entry in database \
{ |
compare database sequence with query sequence |
evaluate similarity as "score" |search loop
compare "score" to top-scoring list |across all
if "score" better than lowest entry |entries in
{ |the database
place entry in top-scoring list |
} |
} /
normalise top-scoring list
for each entry in top-scoring list \ evaluation
{ |of result
determine statistics |using all
align search sequence and database sequence |entries in
print out result |top-scoring
} / list
Depending on the algorithm or implementation, some of the
steps might be missing or
integrated in other steps.
Sophisticated sequence searching which performs extremely fast might
be based on
two approaches
as described below.
Subsection 11.1.1 Sensitive searching, using special computers
Subsection 11.1.2 Extremely fast searching, using approximations
[ previous chapter ],[
this chapter ][ next chapter ]
, [next page/section] , or [overview] , or [table of contents]