findpatterns searches databases (e.g., genembl:*), a file of sequence
names (e.g., @my.fil, or my.msf{*}, see later), or single sequences
(e.g., my.seq) for patterns. The patterns are reported with exact
matches, as shown below. If databases are searched, the nomonitor option
is recommended.
Databases available at the local site usually include:
1) The definition of GENEMBL varies by site. Depending on the location,
it is either GENBANK plus an exclusion set of EMBL data not found in
GENBANK, or vice versa. If your site is connected to a network that is
used to update data periodically, the GENEMBL set may also include daily
updates.
2), 3) The definitions vary. XEMBL, EM_NEW, EMBL_DAILY, GB_NEW, XSWISS,
SW_NEW, PIR4, etc. are names that denote the character of the
preliminary entries. Depending on your site and/or affiliation, entries
that are not yet found in either the EMBL or GENBANK update set may show
up in the corresponding other set as a so-called "exclusion set". At
other sites, GB_NEW and EM_NEW may contain all entries of GENBANK and
EMBL, respectively, which can cause duplications.
NOTE: The term GENEMBLPLUS, introduced in GCG version 8.1, is equivalent
to GENEMBL, which was used before version 8.1.
% findpatterns -nomon
% findpatterns -mismatch=4
PROSITE is the protein site database from A. Bairoch. It can be searched
with
% motifs
If the full text of the abstract is required, it can also be searched
with
% motifs -reference
The normal PROSITE search for a pattern does not include "frequently"
found patterns such as glycosylation sites. If you want those to be
shown as well, use
% motifs -frequent
The SRS system allows you to search for annotation items in the PROSITE
database effectively. After a search in PROSITE or PROSITEDOC, any
resulting hit can be linked into any other sequence database. Similarly,
any EMBL or SWISSPROT entry can be linked into the PROSITE database
within navigation mode. Alternatively, a whole set can be linked with
[X] Expression
and then something like
SQ1 > PROSITE
As a result of research projects, other protein pattern databases have
been produced and are available in various ways, such as BLOCKS, SBASE
and PRODOM. BLOCKS and PRODOM are usually not installed for protein
searching within GCG programs.
================================= Begin Exercise 11
Generate a schematic comparison (using the compare and dotplot programs)
with your peptide and use parameters like window 30, stringency 15.
Measure the positions where "diagonals" indicate local homology:

Run the bestfit program on the fragments determined above and write down
the aligned sequences here (6 to 10 might be sufficient, fewer than 4
are too few):

Create a pattern and search for it with the findpatterns program.
Compare the pattern and the search output with the entry you found in
the first exercise.
================================= End Exercise 11
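
The window and stringency parameters used in the exercise can be
illustrated with a small sketch. It slides a window of fixed length
along both sequences and records a hit wherever the number of identical
positions reaches the stringency. This is a simplification for
illustration only: it counts identities and reports window start
positions, which is an assumption and not a description of how the
compare program scores windows or places its dots. The function name
window_hits is hypothetical.

```python
def window_hits(seq_a, seq_b, window=30, stringency=15):
    """Return (pos_a, pos_b, score) for every window pair whose
    identity count is at least `stringency`. Positions are 1-based
    window starts. Identity counting is an assumed, simplified score."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count positions where both windows carry the same symbol.
            score = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if score >= stringency:
                hits.append((i + 1, j + 1, score))
    return hits


if __name__ == "__main__":
    # Tiny example with a short window so the result is easy to check.
    for hit in window_hits("ABCDEF", "XXCDEF", window=4, stringency=4):
        print(hit)
```

Runs of hits with steadily increasing positions in both sequences
correspond to the "diagonals" that the exercise asks you to measure.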
Subsection 10.3.1 The 'findpatterns' Program
Database name            GCG name        Contents
----------------------------------------------------------------
EMBL + Updates,
GENBANK + Updates
(GB as exclusion set)    GENEMBLPLUS:    all DNA databases (1)
SWISSPROT                SWISSPROT:      most proteins
PIR International        PIR:            most proteins
NEW entries of EMBL      EM_NEW:         EMBL new entries (2)
GENBANK updates          GB_NEW:         GENBANK new entries (3)
FINDPATTERNS identifies sequences with short pattern queries like
GAATTC or YRYRYRYR. You can define the patterns ambiguously and
allow mismatches. You can provide the patterns in a file or simply
type them in from the terminal.
FINDPATTERNS in what sequence(s) ? genembl:*
Enter patterns individually, one per line. End the list with a blank line.
Pattern 1: G(D,E)(X){0,2}R(D,L)
Pattern 2:
What should I call the output file (* FindPatterns.Find *) ? <RETURN>
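
To make the meaning of the example pattern concrete, the following
sketch translates it into a Python regular expression. It handles only
the constructs that appear in the example, under these assumptions:
parentheses enclose a comma-separated list of single-letter
alternatives, X stands for any symbol, and {m,n} is a repeat count. The
function names are hypothetical, and the full findpatterns pattern
syntax is not covered.

```python
import re


def pattern_to_regex(pattern):
    """Translate the subset of pattern syntax used in the example
    (alternatives in parentheses, X as wildcard, {m,n} repeats)
    into a Python regular expression string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "(":
            close = pattern.index(")", i)
            choices = pattern[i + 1:close].split(",")
            # (X) is a wildcard; anything else becomes a character class.
            out.append("." if choices == ["X"] else "[" + "".join(choices) + "]")
            i = close + 1
        elif ch == "{":
            close = pattern.index("}", i)
            out.append(pattern[i:close + 1])  # repeat counts carry over as-is
            i = close + 1
        elif ch == "X":
            out.append(".")
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


def find_matches(pattern, sequence):
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(pattern_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    print(pattern_to_regex("G(D,E)(X){0,2}R(D,L)"))
    print(find_matches("G(D,E)(X){0,2}R(D,L)", "MKGDARDLL"))
```

Read this way, the pattern asks for G, then D or E, then up to two
arbitrary residues, then R, then D or L. Note that re.finditer reports
non-overlapping matches only, which may differ from what a dedicated
pattern search program reports.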
The data can also be searched using the mismatch option, which allows a
pre-defined number of mismatches. Depending on the question asked, the
output can be fairly voluminous.
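
The idea behind ambiguous patterns and a mismatch limit can be sketched
as follows. The code compares a nucleotide pattern written with the
standard IUPAC ambiguity codes (Y = C or T, R = A or G, and so on)
against every position of a sequence and keeps the positions where the
number of mismatches stays within the limit. It is a minimal
illustration, not the algorithm used by findpatterns; the function name
find_with_mismatches is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def find_with_mismatches(pattern, sequence, max_mismatches=0):
    """Return (1-based start, matched text, mismatch count) for every
    position where the pattern fits with at most `max_mismatches`.
    Sequence symbols other than A, C, G, T count as mismatches unless
    the pattern symbol's IUPAC set contains them."""
    pattern = pattern.upper()
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        mismatches = sum(
            1 for p, s in zip(pattern, window) if s not in IUPAC[p]
        )
        if mismatches <= max_mismatches:
            hits.append((start + 1, window, mismatches))
    return hits


if __name__ == "__main__":
    seq = "TTGAATTCAAGACTTC"
    print(find_with_mismatches("GAATTC", seq))                    # exact only
    print(find_with_mismatches("GAATTC", seq, max_mismatches=1))  # one allowed
```

Each additional allowed mismatch admits more positions as hits, which is
why the output grows quickly when a high mismatch value is used on a
whole database.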
Subsection 10.3.2 A PROSITE Database Searching Program
Subsection 10.3.3 Other Pattern Motif Databases
|Vertical:from-to | Horizontal:from-to| weak/strong/other
|-----------------+-------------------+------------------
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13
|----+----+----+----+----+----+----+----+----+----+----+----+----
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |