Section 12-5: Profiles


Subsection 12.5.1

Principle

Rigorous searching applies the alignment methods used by programs like bestfit to sequence database searching. Its usefulness can be increased further by using so-called profiles: once a sequence search has revealed homologies to several sequences, it is desirable to identify shared regions of homology in a multiple sequence alignment. The information contained in the alignment can then be reused in further analyses and searches. Remote similarities in the "twilight zone" are not always detected by heuristic searching methods. Various algorithms implement alignment procedures known from pairwise alignments, but they require significantly more resources. The GCG program package currently features the profile search method of Gribskov et al.

Profile searching combines the benefit of comparison matrices with the sequence-specific allowance of exchanges already used in the pattern approach. However, substitutions in patterns follow a yes/no scheme. To enhance sensitivity, the matrix values for a given exchange in a profile are weighted according to the observed alignment.
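The difference between the yes/no scheme of a pattern and the weighted values of a profile can be illustrated with a minimal Python sketch. It is not the GCG implementation: the four-letter alphabet, the values in TOY_MATRIX and the function names are hypothetical, and the plain average over the column is an assumed, simplified weighting.

```python
# Toy comparison matrix over a four-letter alphabet.
# The values are hypothetical, not taken from any published matrix.
ALPHABET = "ILVD"
TOY_MATRIX = {
    ("I", "I"): 4, ("I", "L"): 2, ("I", "V"): 3, ("I", "D"): -3,
    ("L", "L"): 4, ("L", "V"): 1, ("L", "D"): -4,
    ("V", "V"): 4, ("V", "D"): -3,
    ("D", "D"): 6,
}


def matrix_score(a, b):
    """Look up a comparison value; the matrix is symmetric."""
    return TOY_MATRIX.get((a, b), TOY_MATRIX.get((b, a)))


def pattern_match(column, residue):
    """Pattern-style test: the residue is either allowed or not."""
    return residue in column


def profile_score(column, residue):
    """Profile-style score: matrix values averaged over the observed column."""
    return sum(matrix_score(residue, seen) for seen in column) / len(column)


# Residues observed at one position of a (hypothetical) alignment.
column = "IIIL"
for residue in ALPHABET:
    print(residue, pattern_match(column, residue), profile_score(column, residue))
```

In this toy column, V is rejected outright by the pattern test, yet it receives the same positive profile score as L because the matrix treats V as similar to the residues that were observed. D is rejected by the pattern and also scores negatively in the profile.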

Profile searching is a complex method, and whether the extensive work is justified depends heavily on the diversity of the input sequences. Please make sure that you have read suitable introductory literature. The GCG Program Reference Manual, for example, has a Profile Analysis Essay which you should read before you use the methods extensively.


Subsection 12.5.2

Formats of Sequences

The data used for profile searching must be in GCG format. Use reformat to convert sequences, or see genmanual sequence_exchange for details. For multiple sequence alignment, there are several possible file formats to start with.

 

  
| type of file            | file ending   | called as (example)
+-------------------------+---------------+---------------------
| normal sequence file    | .seq or .pep  | my.seq
+-------------------------+---------------+---------------------
| file of sequence names  | .frg and .fil | @my.fil
| (from 'lineup', etc.)   |               |
+-------------------------+---------------+---------------------
| multiple sequence files | .msf          | my.msf{*}
| (from 'pileup')         |               |
  


Subsection 12.5.3

Profile Generation

The program profilemake generates a profile from a set of aligned sequences in msf format.
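As a rough illustration of the information a profile is derived from, the following Python sketch counts the residues in each column of a small alignment and derives a consensus. It is not what profilemake does internally: the real program additionally weights comparison matrix values and sets gap penalties. The function names and the example alignment are hypothetical, and the alignment is given as plain strings rather than parsed from an msf file.

```python
from collections import Counter

GAP_CHARS = ".-~"  # characters treated as gaps in this sketch (assumption)


def column_frequencies(aligned):
    """Return one {residue: fraction} dict per alignment column, ignoring gaps."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    table = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned if seq[i] not in GAP_CHARS)
        total = sum(counts.values())
        table.append({res: n / total for res, n in counts.items()} if total else {})
    return table


def consensus(table):
    """Most frequent residue per column; '.' for columns that hold only gaps."""
    return "".join(max(col, key=col.get) if col else "." for col in table)


# Hypothetical four-sequence alignment; '.' marks a gap.
aligned = ["ACDE", "ACDQ", "A.DE", "GCDE"]
table = column_frequencies(aligned)
for position, column in enumerate(table, start=1):
    print(position, column)
print("consensus:", consensus(table))
```

The consensus keeps only the most frequent residue of each column, whereas the frequency table also records the minority residues (G in the first column, Q in the last). That retained variation is what distinguishes searching with the alignment from searching with the consensus.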


Subsection 12.5.4

Profile Searching

The program profilesearch uses a profile generated by profilemake and produces a listing of the best-fitting sequences in a database. To align these sequences with the profile, the program profilesegments is required.
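The idea of ranking database sequences by how well they fit a profile can be sketched in Python as follows. This is a deliberately simplified stand-in, not the profilesearch algorithm: it scores only ungapped windows, whereas the real program allows gaps. The profile values, the default score for unlisted residues, the sequence names and the function names are all hypothetical.

```python
def best_window_score(profile, sequence, default=-1.0):
    """Highest ungapped score of the profile against any window of the sequence."""
    width = len(profile)
    if len(sequence) < width:
        return float("-inf")  # sequence too short to hold the profile
    return max(
        sum(col.get(res, default) for col, res in zip(profile, sequence[s:s + width]))
        for s in range(len(sequence) - width + 1)
    )


def rank_database(profile, database):
    """Return (score, name) pairs sorted from best to worst fit."""
    scored = [(best_window_score(profile, seq), name) for name, seq in database.items()]
    return sorted(scored, reverse=True)


# Hypothetical three-position profile: one {residue: score} dict per position.
profile = [{"A": 3.0, "G": 1.0}, {"C": 4.0}, {"D": 4.0}]
# Hypothetical miniature "database".
database = {"seq1": "TTACDTT", "seq2": "GCDAAA", "seq3": "TTTTTT"}

for score, name in rank_database(profile, database):
    print(name, score)
```

The output is a listing ordered by score, which mirrors the kind of result described above. Producing the actual alignments of the hits with the profile is a separate step.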


Subsection 12.5.5

Profile Analysis

The program profilegap uses a profile generated by profilemake and compares it to a sequence using an end-to-end alignment algorithm (gap).
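An end-to-end comparison of a sequence with a profile can be sketched as a Needleman-Wunsch style dynamic programme in which the match score depends on the profile position. The Python below is a minimal sketch under stated assumptions, not the profilegap implementation: it uses a single linear gap penalty for every position, and the profile values, the default score and the function name are hypothetical.

```python
def align_to_profile(profile, sequence, gap=-2.0, default=-1.0):
    """Best end-to-end alignment score of a sequence against a profile."""
    rows, cols = len(profile), len(sequence)
    # score[i][j]: best score aligning the first i profile positions
    # with the first j residues of the sequence.
    score = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        score[i][0] = i * gap  # leading gaps are penalised (end-to-end)
    for j in range(1, cols + 1):
        score[0][j] = j * gap
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            match = score[i - 1][j - 1] + profile[i - 1].get(sequence[j - 1], default)
            skip_profile = score[i - 1][j] + gap   # profile position left unmatched
            skip_residue = score[i][j - 1] + gap   # sequence residue left unmatched
            score[i][j] = max(match, skip_profile, skip_residue)
    return score[rows][cols]


# Hypothetical four-position profile: one {residue: score} dict per position.
profile = [{"A": 3.0, "G": 1.0}, {"C": 4.0}, {"D": 4.0}, {"E": 3.0, "Q": 2.0}]

print(align_to_profile(profile, "ACDE"))  # every position matched
print(align_to_profile(profile, "ADE"))   # one profile position must be skipped
```

Because the whole profile and the whole sequence must be covered, the shorter sequence pays one gap penalty for the profile position it cannot match. A local method would instead report only the best-matching region.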

================================= Begin Exercise 13

Understand the benefit, scope and limitations of a rigorous searching method. Generate a profile and show the difference in searching the alignment vs. searching the consensus.

================================= End Exercise 13

