Section 11-4: Searching Strategies

Frequently, a combination of methods gives more comprehensive results than a single search. Therefore, even if the first run of a sequence search produces apparently satisfactory results, running all available methods is recommended. The following measures will also help.


Subsection 11.4.1

Tuning of your Sequence

Use the sequence editor seqed to create smaller sequences (100 bp, or 30 AA), or to cut out frequently occurring parts such as Alu I repeats. The following criteria might be used to split DNA sequences:


Subsection 11.4.2

Translate DNA

Determine the reading frame with the single-sequence analysis methods (e.g., frames), convert the DNA sequence to a protein with map followed by extractpeptide, and run the search at the protein level with tfasta instead of at the DNA level.
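To make the translation step concrete, the following is a minimal Python sketch of translating the three forward reading frames and picking the longest stop-free stretch. It is not the frames, map, or extractpeptide programs; the function names `translate` and `longest_open_frame` are hypothetical, the standard genetic code is assumed, and the reverse strand is not considered.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate one forward reading frame (0, 1 or 2).

    Codons containing unknown symbols are reported as 'X'; stops as '*'.
    """
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_frame(dna):
    """Return (frame, peptide) for the longest stop-free stretch.

    Only the three forward frames are examined (an assumption of this
    sketch); on ties the lowest frame number wins.
    """
    best = (0, "")
    for frame in range(3):
        for peptide in translate(dna, frame).split("*"):
            if len(peptide) > len(best[1]):
                best = (frame, peptide)
    return best
```

For example, `translate("AATGGCCTAA", 1)` reads the codons ATG, GCC, TAA from the second base onwards. The resulting peptide could then be used as the query for a protein-level search.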


Subsection 11.4.3

Tuning of the 'fasta' Parameter "word size"

Default settings are:

           2 for proteins
           6 for DNA

To get different output, try:

           1 for proteins
           3 for DNA (or even 1)
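Why the word size changes the output can be illustrated with a small Python sketch that counts the words (k-tuples) shared by two sequences. This is not the fasta algorithm itself, which does considerably more than word lookup; the function names and the two example sequences are hypothetical.

```python
def words(seq, k):
    """Return the set of all overlapping words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_words(query, target, k):
    """Return the words of length k that occur in both sequences."""
    return words(query, k) & words(target, k)


# Hypothetical example sequences.
query, target = "ACGTAC", "TTACGA"
for k in (6, 3, 1):
    print(k, sorted(shared_words(query, target, k)))
```

With a word size of 6 the two example sequences share no word, with 3 they share two, and with 1 every symbol matches. A smaller word size therefore tends to find more candidate matches, at the price of longer run times and more chance hits.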
  

  


Subsection 11.4.4

Tuning of the 'fasta' Parameter "list size"

Default setting:

           40

To get longer lists, try:

           100
  

  


Subsection 11.4.5

Statistics Analysis of Hits

In case of doubt, you might use the 'bestfit' program with the randomise option. Request at least 200 randomisations to get a reasonable statistical distribution. Alternatively, you might use the shuffle program to generate a new sequence with identical length and composition. Because the order of the symbols differs, a search with the shuffled sequence should return a markedly different group of hits than the search with the original sequence. Hits that persist after shuffling are more likely to reflect composition than sequence order.
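The idea behind the randomisation test can be sketched in Python as follows. This is only an illustration under stated assumptions, not the bestfit or shuffle programs: the scoring function is a plain Smith-Waterman local alignment with hypothetical match, mismatch, and gap values, and the names `local_score`, `shuffled`, and `shuffle_test` are invented for this example.

```python
import random
import statistics


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are assumptions chosen for illustration.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
        best = max(best, max(curr))
        prev = curr
    return best


def shuffled(seq, rng):
    """Return seq with its symbols in random order (same composition)."""
    symbols = list(seq)
    rng.shuffle(symbols)
    return "".join(symbols)


def shuffle_test(query, target, n=200, seed=0):
    """Compare the real score with the scores of n shuffled queries.

    Returns (real_score, mean, standard_deviation, z); z is None when
    all shuffled scores are identical.
    """
    rng = random.Random(seed)
    real = local_score(query, target)
    scores = [local_score(shuffled(query, rng), target) for _ in range(n)]
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    z = (real - mean) / sd if sd > 0 else None
    return real, mean, sd, z
```

A real score that lies many standard deviations above the mean of the shuffled scores suggests that the similarity depends on the order of the symbols and not only on composition. With fewer randomisations the mean and standard deviation become unreliable, which is why a minimum of 200 is suggested above.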

If you assume that your sequence is similar to a given group but failed to detect this with the selected search algorithm, you might consider running a "prototype" search and using the list of sequences as a subset (see below).

The fastalert program, developed by F. Eggenberger at BioComputing Basel, is a network application that does the statistical analysis for you.


Subsection 11.4.6

Mapping Result Data

Sequence similarity searches result in a list of sequences that are reported to be similar to the query. However, in contrast to a pattern search, query sequences may be of considerable length and may therefore show similarity to other sequences in several regions. The output of a sequence search therefore has to be classified by the coordinates of the query sequence. As no programs currently exist that allow an automatic assignment, the detected sequence features have to be mapped manually. This manual mapping can be combined with the labelling of additional sequence features revealed by single-sequence analysis.
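The bookkeeping part of this classification, sorting hits by the region of the query they cover, can be sketched in Python. The sketch assumes that the hit names and query coordinates have already been read off the search output by hand; the tuple layout and the function name `map_hits` are hypothetical and do not correspond to any program output format.

```python
def map_hits(hits):
    """Group hits whose ranges on the query sequence overlap.

    hits: iterable of (name, query_start, query_end), with 1-based
    inclusive coordinates on the query (an assumed input layout).
    Returns a list of (region_start, region_end, [names]) sorted by start.
    """
    regions = []
    for name, start, end in sorted(hits, key=lambda h: (h[1], h[2])):
        if regions and start <= regions[-1][1]:
            # The hit overlaps the current region: extend it.
            region = regions[-1]
            region[1] = max(region[1], end)
            region[2].append(name)
        else:
            # The hit starts beyond the current region: open a new one.
            regions.append([start, end, [name]])
    return [tuple(region) for region in regions]
```

Each resulting region can then be inspected and annotated separately. Deciding what a region means biologically remains a manual step.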


Subsection 11.4.7

Analysis of Target Sequences

If several hits are found in the result of a sequence search, close inspection of the actual hits is essential. It might sound trivial, but the title of a sequence listed in the search output does not allow the conclusion that the segment of similarity actually accounts for the function of the protein found. Rather, the annotation of the sequence has to be checked to confirm that the segment of similarity is relevant to protein function. To determine whether the similarity is accidental or meaningful, the seqed sequence editor can be used to cut out the region of interest and to search with the detected region of similarity as a separate sequence. The following figure illustrates this technique schematically:

 
                    
  
           Region of
           similarity
 |------------------------------>  query sequence
           ||||||
        |------------------------------>  database sequence
           :    :
           :    :  Redo the sequence search with the
           ------  isolated fragment of database sequence
  

  
This second sequence search should retrieve a pattern of hits similar to that of the original search if the homology was significant. Careful inspection might also identify this segment as a member of a sequence family, which can then be used to validate the originally found sequence.
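Isolating the fragment for the second search can be sketched in Python as follows. This is a stand-in for cutting the sequence in seqed, not a description of that program; the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and the FASTA-style output is one possible exchange format.

```python
def extract_fragment(sequence, start, end):
    """Return positions start..end (1-based, inclusive) of sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates outside the sequence")
    return sequence[start - 1:end]


def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA-style record with wrapped lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

For example, `extract_fragment("ACGTACGTAC", 3, 6)` returns the four symbols at positions 3 to 6, which can then be written out with `to_fasta` and submitted as a new query.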