Frequently, a combination of methods yields more comprehensive results than a single search. Therefore, even if the first sequence search produces apparently satisfactory results, it is advisable to run all available methods. The following measures will also help.
Use the sequence editor seqed to create smaller sequences (100 bp or 30 AA), or cut out frequently occurring parts such as ALU I repeats. The following criteria might be used to split DNA sequences:
Determine the reading frame with the single-sequence analysis methods (e.g., frames), convert the DNA sequence to a protein with map followed by extractpeptide, and run the search at the protein level with tfasta instead of at the DNA level.
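The translation step can be illustrated with a minimal Python sketch. It is not a replacement for frames, map, or extractpeptide; it only shows what translating a DNA sequence in the three forward reading frames means. The function names `translate` and `forward_frames` are hypothetical, the standard genetic code is assumed, and the reverse strand is not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1, or 2).

    Stop codons are written as '*'; codons containing symbols other
    than A, C, G, T are written as 'X'. Trailing incomplete codons
    are ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return {frame: translate(dna, frame) for frame in range(3)}
```

A frame whose translation contains a long stretch without `*` is a candidate for the protein-level search.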
The default settings for the 'fasta' parameters "word size" and "list size" are listed under Subsections 11.4.3 and 11.4.4 below.
In case of doubt, you might use the 'bestfit' program with the randomise option. Make sure to request at least 200 randomisations to get a reasonable statistical distribution.
Alternatively, you might use the shuffle program to generate a new sequence of identical length and composition. Because the order of the symbols is different, a search with the shuffled sequence should return a markedly different group of hits than the search with the original sequence.
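The idea behind randomisation and shuffling can be sketched in Python. This is an illustration only and does not reproduce what bestfit or shuffle do internally. The names `shuffle_sequence` and `z_score` are hypothetical, and the alignment scores are assumed to come from whatever alignment program you run on the shuffled sequences. Expressing the observed score in standard deviations above the mean of the randomised scores is one common summary, not necessarily the one the programs report.

```python
import random
import statistics


def shuffle_sequence(seq, rng):
    """Return a sequence with the same length and composition as seq,
    but with the symbols in random order."""
    symbols = list(seq)
    rng.shuffle(symbols)
    return "".join(symbols)


def z_score(observed, random_scores):
    """Express the observed score in standard deviations above the
    mean of the scores obtained with randomised sequences."""
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    return (observed - mean) / sd


# Generate 200 shuffled versions of a (made-up) query sequence.
rng = random.Random(0)
query = "ACGTACGTAA"
shuffled = [shuffle_sequence(query, rng) for _ in range(200)]
```

Each shuffled sequence would then be aligned against the target, and the resulting scores passed to `z_score` together with the score of the original alignment.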
If you assume that your sequence is similar to a given group but failed to detect this with the selected search algorithm, consider running a "prototype" search and using the resulting list of sequences as a subset (see below).
The fastalert program, developed by F. Eggenberger at BioComputing Basel, is a network application that will do the statistical analysis for you.
Sequence similarity searches result in a list of sequences that are reported to be similar to the original. However, in contrast to a pattern search, query sequences might be of considerable length and may therefore show similarity to other sequences in several regions. The search output must therefore be inspected and classified by the coordinates of the query sequence. As no programs currently exist that perform this assignment automatically, the detected sequence features have to be mapped manually. This manual mapping might also be combined with the labelling of additional sequence features revealed by the methods of single-sequence analysis.
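Once the query coordinates of each hit have been read from the search output by hand, the classification by region can be sketched as follows. This is a hypothetical helper, not an existing program: the name `group_hits_by_region` and the `(name, query_start, query_end)` tuples are assumptions made for illustration, and the hits shown in the test data are invented.

```python
def group_hits_by_region(hits):
    """Group hits whose query coordinates overlap.

    hits is a list of (name, query_start, query_end) tuples.
    Returns a list of regions, each a dict with the merged start and
    end coordinates on the query and the names of the hits in it.
    """
    regions = []
    for name, start, end in sorted(hits, key=lambda h: (h[1], h[2])):
        if regions and start <= regions[-1]["end"]:
            # Overlaps the current region: extend it and add the hit.
            regions[-1]["end"] = max(regions[-1]["end"], end)
            regions[-1]["hits"].append(name)
        else:
            # No overlap: open a new region on the query sequence.
            regions.append({"start": start, "end": end, "hits": [name]})
    return regions
```

Each resulting region can then be labelled on the query sequence together with the features found by single-sequence analysis.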
If several hits are encountered in the result of a sequence search, close inspection of the individual hits is essential. It might sound trivial, but the title of a sequence listed in a search output does not allow the conclusion that the segment of similarity actually accounts for the function of the protein found. Rather, the annotation of the sequence has to be checked in order to confirm that the segment of similarity is relevant for protein function. To determine whether the similarity is accidental or meaningful, the seqed sequence editor might be used to partition the sequence of interest and to search with the detected region of similarity as a separate sequence. The following figure illustrates this technique schematically:
Subsection 11.4.2 Translate DNA
Subsection 11.4.3 Tuning of the 'fasta' Parameter "word size"
Default: 2 for proteins, 6 for DNA
To get different output, try: 1 for proteins, 3 (or even 1) for DNA
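Why a smaller word size changes the output can be illustrated with a simplified Python sketch. It only counts the words (k-tuples) two sequences have in common and is not the lookup procedure 'fasta' actually uses. The function names `words` and `shared_words` and the example sequences are hypothetical.

```python
def words(seq, k):
    """Return the set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_words(query, target, k):
    """Return the words of length k that occur in both sequences."""
    return words(query, k) & words(target, k)


query = "ACGTAC"
target = "ACGAAC"
# A shorter word size finds more common words between the sequences.
for k in (1, 3, 6):
    print(k, sorted(shared_words(query, target, k)))
```

With shorter words, more sequence pairs share at least one word, so more distant similarities can be picked up, at the cost of longer run times and more chance matches.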
Subsection 11.4.4 Tuning of the 'fasta' Parameter "list size"
Default: 40
To get longer lists, try: 100
Subsection 11.4.5 Statistical Analysis of Hits
Subsection 11.4.6 Mapping Result Data
Subsection 11.4.7 Analysis of Target Sequences
          Region of
          similarity
|------------------------------> query sequence
          ||||||
|------------------------------> database sequence
          :    :
          :    :     Redo the sequence search with the
          ------     isolated fragment of database sequence
This second sequence search should retrieve a pattern of hits similar to that of the original search if the homology was significant. Careful inspection might also help to identify this segment as a member of a sequence family, which can then be used to validate the originally found sequence.
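Isolating the fragment amounts to cutting the region of similarity out of the database sequence by its coordinates. A minimal Python sketch is given below; in practice seqed would be used for this step. The name `extract_fragment` is hypothetical, and the coordinates are assumed to be 1-based and inclusive, as they are usually reported in alignment output.

```python
def extract_fragment(sequence, start, end):
    """Return the fragment from position start to position end.

    Coordinates are 1-based and inclusive, so the first symbol of the
    sequence is position 1.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates outside the sequence")
    return sequence[start - 1:end]


# Cut positions 3 to 6 out of a (made-up) database sequence.
fragment = extract_fragment("ABCDEFGHIJ", 3, 6)
```

The fragment is then saved as a sequence of its own and used as the query for the second search.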