The following sections describe GCG software as well as additional software that may not be part of your installation.
Databases available at the local site usually include:
1) The definition of GENEMBL can vary. Depending on the site, it is either GENBANK plus an exclusion set of EMBL data not found in GENBANK, or vice versa. If your site is connected to a network that delivers updates on a periodic basis, the GENEMBL set may also include daily updates.
2),3) The definitions vary. XEMBL, EM_NEW, EMBL_DAILY, GB_NEW, XSWISS, SW_NEW, PIR4, etc. are names that denote the preliminary character of the entries. Depending on your site and/or affiliation, entries that are not yet found in either the EMBL or the GENBANK update set may show up in the corresponding other set as a so-called "exclusion set". At other sites, GB_NEW and EM_NEW may contain all entries of GENBANK and EMBL, respectively, which can cause duplications.
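The idea of an exclusion set, and the duplication that arises without one, can be illustrated with a minimal Python sketch. It assumes that entries can be compared by accession number; the accession numbers and the function name `exclusion_set` are made up for illustration and do not come from GCG.

```python
# Hypothetical accession numbers standing in for two databases.
genbank = {"M10000", "M10001", "X50000"}
embl = {"X50000", "X50001", "M10001"}


def exclusion_set(primary, other):
    """Return the entries of `other` that are not already in `primary`."""
    return other - primary


# GENBANK plus the EMBL exclusion set: every entry appears exactly once.
embl_only = exclusion_set(genbank, embl)
genembl = genbank | embl_only

# Simply concatenating both complete databases yields duplicates.
naive = list(genbank) + list(embl)
duplicates = len(naive) - len(set(naive))

print(sorted(embl_only))
print(len(genembl), "unique entries;", duplicates, "duplicates if both sets are used whole")
```

Searching a set built this way visits each entry once, whereas searching two complete sets side by side reports shared entries twice.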
NOTE: The term GENEMBLPLUS, introduced in GCG version 8.1, is equivalent to GENEMBL, which was used before version 8.1.
The program 'lookup', introduced in GCG version 8.1, is GCG's implementation of the SRS software. In contrast to the original package, it searches only sequence databases.
'lookup' has several levels of menus. The first one presents a list of sequence libraries that you can select for searching. The next level provides the following option list:
Keep in mind that the searches are very fast but may return a large number of entries. Therefore, the program does not exit after the query has been launched with <CTRL><D>, but offers a third menu:
NOTE:
1) Since the 'lookup' program requires significant resources, it may not be supported at the local site.
2) The 'lookup' program may generate lists that cannot be saved because of a shortage of disk space. All lists generated are stored in files. If your query was not selective enough, these lists will become rather big.
3) Keep in mind that the lists are not updated automatically. As sequence databases grow very fast, you have to repeat your search periodically if you use the lists for other purposes (e.g., sequence searching).
Consecutive Searches
Consecutive searches are queries that search not an entire database but a list file created earlier. 'lookup' can search these lists efficiently if the following syntax is used:
$ lookup -infile=@lookup.list
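Because saved lists are not refreshed automatically, it can help to check the age of a list file before reusing it in a consecutive search. The following Python sketch only assembles the command line shown above and flags old lists. The 30-day threshold and the helper names `is_stale` and `consecutive_lookup_argv` are assumptions made for illustration, not part of GCG.

```python
import os
import time

MAX_AGE_DAYS = 30  # assumed threshold; choose one that suits your site
SECONDS_PER_DAY = 86400


def is_stale(mtime, now, max_age_days=MAX_AGE_DAYS):
    """Return True if a file last modified at `mtime` is older than the threshold."""
    return (now - mtime) > max_age_days * SECONDS_PER_DAY


def consecutive_lookup_argv(list_file):
    """Build the argument list for a consecutive search on an earlier list file."""
    return ["lookup", "-infile=@" + list_file]


list_file = "lookup.list"
if os.path.exists(list_file) and is_stale(os.path.getmtime(list_file), time.time()):
    print("Warning:", list_file, "is old; consider repeating the original query.")
print(" ".join(consecutive_lookup_argv(list_file)))
```

The sketch prints the command instead of running it, so it can be tried on a machine without a GCG installation.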
The program 'stringsearch' is not as fast as 'lookup' and may not be optimally suited for your purpose. (Use this program only if you work with a GCG version older than version 8.1.) 'stringsearch' has two menu options:
The program identifies entries by searching the sequence documentation for keywords such as 'globin' or 'human'. Example:
% stringsearch
The ATLAS program is the successor of the XQS program, which replaced the NAQ and PSQ programs. None of this software is made or distributed by GCG. These programs were created at Protein Identification Resource International (PIR) and can be obtained from either office of PIR. ATLAS has an extensive capability to search databases by author, entry name, accession number, or feature, to name just a few.
The SRS program was written by Thure Etzold, EMBL. The full SRS software is neither made nor currently distributed by GCG. The program was created at EMBL and can be obtained from the author or by 'anonymous ftp'. SRS has an extensive capability to search databases by author, entry name, accession number, or feature, to name just a few. In addition, links provided in one database can be followed to reach the corresponding entry in another database; e.g., an EMBL entry can immediately be viewed as a SWISSPROT entry, provided that an equivalent exists.
NOTE:
To run the SRS program, your terminal must support "vt100" emulation so that the text is displayed properly.
% srs
Once started, select [U] for query and [S] for sequence, and you will get the mask needed to compose a query. The [S] field opens when you press the <SPACE BAR> and enables you to select the databases.
At the time of this writing, VMS and UNIX versions of SRS were available, which are expected to have the same functionality. The new release of SRS, version 4.x, supports an ASCII interface if the srscurs extensions are installed. 'srscurs' can be run via HASSLE. There is also a very powerful command line interface called 'getz'. This program is included in GCG release 8.1, where it is called lookup, and is available from the author (T. Etzold) on request.
At the time of this writing, a networked version of SRS was in preparation. SRS is also accessible via the World Wide Web.
The ENTREZ program was written by the programming staff at the NCBI. This software is neither made nor distributed by GCG. The program was created at NCBI and can be obtained as a CD-ROM distribution or by 'anonymous ftp'. ENTREZ runs on Mac/PCs, VMS, and UNIX. The latter two require an X-Windows interface.
The big advantage of ENTREZ is the inclusion of a subset of MEDLINE, covering the abstracts of entries submitted to the sequence databases. ENTREZ requires specific data sets, which can be purchased on CD-ROM.
Database name            GCG name        Contents
----------------------------------------------------------------
EMBL + Updates
GENBANK + Updates
(GB as exclusion set)    GENEMBLPLUS:    all DNA databases (1)
SWISSPROT                SWISSPROT:      most proteins
PIR International        PIR:            most proteins
NEW entries of EMBL      EM_NEW:         EMBL new entries (2)
GENBANK updates          GB_NEW:         GENBANK new entries (3)
Subsection 6.2.1 Using the GCG Software: 'lookup'
Complete the query form below:
All text:
Definition:
Author:
Keyword:
Sequence name:
Accession number:
Organism:
Reference:
Title:
Feature:
On or after (dd-mmm-yy): On or before (dd-mmm-yy):
Shortest sequence length: Longest sequence length:
Inter-field operator: AND Form of output list: Whole Entries
There are several types of fields you can fill out:
17110 entries were found.
Do you wish to:
1) write out this list to a file
2) preview the results
3) refine the query
4) choose different libraries
q) quit
Please choose one (* 1 *):
If you select option 1, a file is created that can be used by other programs. Option 2 displays the description of the sequence, and option 3 allows you to refine the query.
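Refining a query usually means filling in more fields of the form. With the inter-field operator set to AND, an entry is reported only if every filled-in field matches. The Python sketch below illustrates that logic on made-up entries. The entry names, the field names, and the case-insensitive substring matching are assumptions for illustration and do not describe how 'lookup' is implemented.

```python
# Hypothetical entries; real database records carry many more fields.
entries = [
    {"name": "HSGLOBA", "organism": "Homo sapiens", "keyword": "globin", "length": 1200},
    {"name": "MMGLOBB", "organism": "Mus musculus", "keyword": "globin", "length": 900},
    {"name": "HSACTIN", "organism": "Homo sapiens", "keyword": "actin", "length": 1500},
]


def matches(entry, query):
    """AND semantics: every field present in the query must match the entry."""
    for field, wanted in query.items():
        if field == "min_length":
            if entry["length"] < wanted:
                return False
        elif field == "max_length":
            if entry["length"] > wanted:
                return False
        elif wanted.lower() not in entry.get(field, "").lower():
            return False
    return True


def search(query):
    """Return the names of all entries that satisfy the query."""
    return [e["name"] for e in entries if matches(e, query)]


broad = search({"keyword": "globin"})
refined = search({"keyword": "globin", "organism": "homo"})
short_only = search({"keyword": "globin", "max_length": 1000})
print(broad, refined, short_only)
```

Each additional field can only shrink the result list, which is why refining an unselective query is the usual way to keep the output list manageable.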
Subsection 6.2.2 Using the GCG Software: 'stringsearch'
STRINGSEARCH through what sequence(s) (* GenEMBL:* *) ?
Do you want to search through:
A) definitions
B) complete sequence records
Please choose one (* A *):
Search for what text patterns ? bluescript
What should I call the output file (* genembl.strings *) ?
...
*** Em_syn:ARBLSKP ***
pBluescript SK(+) vector DNA, phagemid excised from lambda ZAP 2,958bp
...
Sequences searched: 69842
Sequences with matches: 8
Patterns sought: bluescript
Output file: genembl.strings
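In the session above, the pattern 'bluescript' finds an entry whose definition reads 'pBluescript', which suggests a case-insensitive substring match on the definition line. A minimal Python sketch of such a search follows. The function `string_search` and the second entry are hypothetical, and the real program may match differently.

```python
# Definition lines keyed by entry name. The first entry is taken from the
# session above; the second is a made-up example that should not match.
definitions = {
    "Em_syn:ARBLSKP": "pBluescript SK(+) vector DNA, phagemid excised from lambda ZAP 2,958bp",
    "Em_xx:EXAMPLE1": "hypothetical globin gene, complete cds",
}


def string_search(defs, pattern):
    """Return (name, definition) pairs whose definition contains the pattern,
    ignoring upper and lower case."""
    needle = pattern.lower()
    return [(name, text) for name, text in defs.items() if needle in text.lower()]


hits = string_search(definitions, "bluescript")
for name, text in hits:
    print("*** " + name + " ***")
    print(text)
print("Sequences searched:", len(definitions))
print("Sequences with matches:", len(hits))
```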
NOTE:
Subsection 6.2.3 Find Sequences in the Databases with ATLAS
ATLAS of PROTEIN and GENOMIC SEQUENCE
Version 1.40, June 1992
(C) Copyright 1992 National Biomedical Research Foundation
National Biomedical Research Foundation
3900 Reservoir Road, NW
Washington, DC 20007-2195 USA
Tel: 202-687-2121
FAX: 202-687-1662
E-Mail: PIRMAIL@GUNBRF.BITNET
We do not currently support the ATLAS software in Basel.
Subsection 6.2.4 Find Sequences in the Databases with SRS
Subsection 6.2.5 Find Sequences in the Databases with ENTREZ