NOTE:
To use the programs described below, you must be familiar with single-sequence editors and file handling.
If you start from scratch, use the command
% lineup
The screen will ask for the name of the sequence group.
Add a New Sequence
Move to the command line with <CTRL><D>, give the command 'new', and type the name of the new sequence.
Move to the command line with <CTRL><D>, give the command 'get', and type the name of the existing sequence.
The sequence given is either a sequence in your own directory, created with commands from the GCG package, or a sequence from the database. You might need to use the import functions of the GCG package.
The 'lineup' editor works similarly to the 'seqed' program discussed earlier for single-sequence input. However, because multiple sequences are shown as several lines, the <CURSOR-UP> and <CURSOR-DOWN> keys move between the different sequences in the alignment. The period (.) key inserts gaps.
CAUTION: If you have a key mapping file in the current directory, such as one used for advanced work with the 'seqed' program, the period might be missing from it and therefore does not work in 'lineup'.
SOLUTION: Delete the file or add the period to it.
Consensus Calculation
One of the sequences (the one at line 0) is special: it can hold a consensus sequence that is updated automatically whenever gaps are inserted or sequences are shifted. To activate this mechanism, move to the command line (<CTRL><D>) and type auto.
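To illustrate what an automatically updated consensus line computes, here is a minimal Python sketch of a majority-rule consensus. The rule itself is an assumption made for illustration; the text does not say which rule lineup applies. Gaps are the period character, as in lineup, and a column that holds only gaps, or whose two most frequent residues tie, gets a gap.

```python
from collections import Counter

GAP = "."  # the gap character that lineup inserts with the period key


def consensus(aligned: list[str], gap: str = GAP) -> str:
    """Majority-rule consensus of equal-length aligned sequences (illustrative rule).

    Each column's consensus is its most common residue, ignoring gaps.
    Columns with only gaps, or with a tie between the top two residues,
    get the gap character.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(c.upper() for c in column if c != gap)
        ranked = counts.most_common(2)
        if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
            result.append(gap)
        else:
            result.append(ranked[0][0])
    return "".join(result)
```

For example, consensus(["MADQ.T", "MADQLT", "MSDQLS"]) returns "MADQLT": the gap in the first sequence is outvoted by the two L residues.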
Get Help
Move to the command line (<CTRL><D>), and type help.
Exit
Move to the command line (<CTRL><D>), and type exit.
If you used the lineup program earlier, it will have created a so-called file of sequence names (FOSN, extension *.fil) and numerous fragments that represent your sequences in their lined-up form. For example, to reload an alignment of the group eco, call the lineup program with
% lineup eco
If you used another program to produce a multiple sequence alignment (e.g., the program pileup), the alignment may be in multiple sequence format (MSF, extension *.msf). To use 'lineup' on a file called eco.msf, call it as
% lineup -msf eco
You can convert each of the two formats into the other with the command reformat. Consult the corresponding genhelp section to learn how to convert MSF to FOSN format and FOSN to MSF format.
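If you want to inspect an MSF file outside GCG, a minimal Python reader could look like the sketch below. It assumes the usual MSF layout: header lines containing 'Name:' entries, a '//' separator line, and then blocks in which each line starts with a sequence name followed by groups of residues separated by spaces. Checksums and weights are ignored; this is an illustration, not GCG's own reader.

```python
def parse_msf(text: str) -> dict[str, str]:
    """Return {name: aligned sequence} from the text of an MSF file.

    Assumed layout: 'Name:' lines in the header, a '//' separator,
    then blocks of lines that start with a sequence name.
    """
    header, sep, body = text.partition("//")
    if not sep:
        raise ValueError("no '//' separator found")
    # Collect the sequence names announced in the header.
    names = []
    for line in header.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "Name:":
            names.append(tokens[1])
    chunks: dict[str, list[str]] = {name: [] for name in names}
    for line in body.splitlines():
        tokens = line.split()
        # Blank lines and position-number lines are skipped: only lines
        # whose first token is a known name carry residues.
        if tokens and tokens[0] in chunks:
            chunks[tokens[0]].extend(tokens[1:])
    return {name: "".join(parts) for name, parts in chunks.items()}
```

Gaps stay in the returned strings as periods, so the result is still aligned column by column.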
If you need to name more than one sequence, you can use asterisks (*) as "wildcards" (see the section on file handling for the use of wildcards in file names). The GCG programs, however, also allow you to write a file that contains only file names rather than the files themselves. To create such a list file, call the system editor (see the section on editing), enter a line with two periods ("..") as the first line, and then enter the file names of all the sequences you are interested in. You can mix your own sequences with names from the database. Refer to the section on Lists for details on GCG list files.
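Because a list file is plain text, it can also be written by a script. The Python sketch below assumes only the simple layout described above: a line with two periods followed by one sequence name per line. Anything else a GCG list file may contain is not handled, and the name mylocal.seq used in the example is hypothetical.

```python
from pathlib import Path


def write_list_file(path: str, sequence_names: list[str]) -> None:
    """Write a '..' line followed by one sequence name per line."""
    names = [name.strip() for name in sequence_names if name.strip()]
    Path(path).write_text("\n".join([".."] + names) + "\n")


def read_list_file(path: str) -> list[str]:
    """Return the non-blank names after the '..' line; earlier lines are skipped."""
    names, after_marker = [], False
    for line in Path(path).read_text().splitlines():
        if not after_marker:
            after_marker = line.strip() == ".."
            continue
        if line.strip():
            names.append(line.strip())
    return names
```

For example, write_list_file("my.fil", ["protein:mchu", "mylocal.seq"]) mixes a database entry with a (hypothetical) local file, and the result should be usable as @my.fil.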
As described above, you can use several GCG and GCG-like programs to produce a file of sequence names (also known as a list). Remember that the file should contain only sequence names after the two periods (the GCG programs take care of this automatically if used correctly).
The GCG program pileup can align many sequences, specified either as single files using wildcards (e.g., *.seq) or as a file of sequence names given as @my.fil (if the file my.fil contains the sequence names). This is usually called automated multiple sequence alignment. As an example, pileup is run here on the output of a findpatterns run (the complete session is shown in Subsection 12.2.5):
% pileup
The program pileup also writes a figure file (pileup.figure) that shows the results of the clustering process.
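The clustering step can be illustrated with UPGMA (unweighted pair-group method using arithmetic averages), which, as far as we know, is the clustering method the PileUp documentation names for its dendrogram. The Python sketch below is an illustration under that assumption, not PileUp's code. It works on distances (smaller means more similar); PileUp's pairwise scores are similarities, so they would first have to be converted, for example by subtracting them from a maximum score.

```python
def _key(x: int, y: int) -> tuple[int, int]:
    """Order a pair of cluster ids so each pair has one dictionary key."""
    return (x, y) if x < y else (y, x)


def upgma(names: list[str], dist: list[list[float]]):
    """Cluster with UPGMA; return (merge history, nested-tuple tree).

    Each step joins the two closest clusters. The distance from the new
    cluster to any other cluster is the size-weighted average of the
    distances from its two parts.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    d = {_key(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    history = []
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Average distances from the merged cluster to every remaining one.
        for c in clusters:
            d[_key(next_id, c)] = (size_a * d[_key(a, c)]
                                   + size_b * d[_key(b, c)]) / (size_a + size_b)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        history.append((tree_a, tree_b, height))  # distance at which they joined
        next_id += 1
    tree = next(iter(clusters.values()))[0] if clusters else None
    return history, tree
```

The order of the merges in the history is the order in which a progressive method would add sequences or groups to the growing alignment.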
NOTE: This visual representation of sequence similarity must not be used as a phylogenetic tree, because the branch lengths and the order of the sequences are based on sequence similarity, not on phylogenetic algorithms. For a coarse review of sequence relationships, however, the dendrogram can be useful.
Otherwise, use the programs distances and growtree, as described below. To visualise the dendrogram, remember that you need to define the plotting environment with setplot if you have not done so earlier, or work with the Wisconsin Package Interface (WPI). If necessary, set up the X-Windows environment correctly. Then issue the command
% figure pileup.figure
The program pretty generates an output file that shows the results of the automatic sequence alignment letter by letter. A variety of command-line parameters control how the alignment is displayed; to list them, use the -check option:
% pretty -check
It is important that you specify the multiple sequence alignment correctly, for example:
% pretty -cons -diff='.' pileup.msf{*}
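As we understand it, -diff='.' prints residues that agree with the consensus as periods, so that only the differences stand out. The display idea can be sketched in Python as follows; this is an illustration, not pretty's implementation.

```python
def show_differences(aligned: list[str], consensus: str, same: str = ".") -> list[str]:
    """Replace each residue that matches the consensus column with `same`.

    Note: with '.' as the marker, a gap and an agreeing residue look alike;
    pass a different `same` character to tell them apart.
    """
    rows = []
    for seq in aligned:
        if len(seq) != len(consensus):
            raise ValueError("sequence and consensus lengths differ")
        rows.append("".join(same if r.upper() == c.upper() else r
                            for r, c in zip(seq, consensus)))
    return rows
```

For example, show_differences(["MADQLT", "MSDQ.T"], "MADQLT") yields "......" and ".S....", leaving only the S visible.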
Sometimes the file name descriptors in the pretty output file are not needed. In this case, the replace program can replace the file names with spaces. To do this, create a text file (see the editing section for help on how to edit files) that contains two periods followed by the search string and the replacement string. With all default settings of pileup, such a file, here called my.replace, looks like the example in Subsection 12.2.7. The replacement is then done with
% replace pretty.pretty my.replace new.pretty
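For readers who want to see the search-and-replace idea outside GCG, the Python sketch below reads a replacement file laid out as in Subsection 12.2.7 (a '..' line, then a quoted search string and a quoted replacement string, each on its own line) and applies the pairs as plain text substitutions. Treating the strings literally is an assumption; how GCG's replace interprets them (for example the leading '@') is not covered here.

```python
def read_replace_pairs(spec: str) -> list[tuple[str, str]]:
    """Return (search, replacement) pairs from the quoted lines after '..'."""
    stripped = [line.strip() for line in spec.splitlines()]
    if ".." not in stripped:
        raise ValueError("no '..' line found")
    strings = []
    for item in stripped[stripped.index("..") + 1:]:
        # Each string sits on its own line between double quotes.
        if len(item) >= 2 and item[0] == item[-1] == '"':
            strings.append(item[1:-1])
    if len(strings) % 2:
        raise ValueError("search and replacement strings must come in pairs")
    return list(zip(strings[0::2], strings[1::2]))


def apply_replacements(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply each pair as a literal substitution, in file order (assumption)."""
    for search, replacement in pairs:
        text = text.replace(search, replacement)
    return text
```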
The program plotsimilarity slides a window along the sequence alignment and plots the similarity of the sequences. To learn more about the options of plotsimilarity, use the -check option. This program requires graphics and should only be used after the plotting environment has been defined (setplot). The sequence alignment should be specified as an MSF file, e.g., pileup.msf{*}.
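The sliding-window idea behind plotsimilarity can be sketched in Python. As a simplifying assumption, the sketch scores each column by the fraction of identical residue pairs instead of whatever scoring plotsimilarity uses, and the window size of 10 is an arbitrary illustrative default, not plotsimilarity's.

```python
from itertools import combinations


def column_similarity(column, gap: str = ".") -> float:
    """Fraction of residue pairs in one column that are identical (gaps skipped)."""
    residues = [c.upper() for c in column if c != gap]
    pairs = list(combinations(residues, 2))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def window_similarity(aligned: list[str], window: int = 10, gap: str = "."):
    """Average column similarity for every window position along the alignment.

    Returns (window centre as a 1-based position, average) tuples.
    """
    scores = [column_similarity(col, gap) for col in zip(*aligned)]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the alignment length")
    return [((start + 1) + (window - 1) / 2,
             sum(scores[start:start + window]) / window)
            for start in range(len(scores) - window + 1)]
```

Plotting the second value of each tuple against the first gives a curve in which conserved regions appear as peaks.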
The program 'distances' writes a text file with a matrix showing each-to-each comparison scores. It was changed in version 8 of the GCG software and is described below.
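The kind of each-to-each matrix that distances writes can be sketched in Python with uncorrected pairwise distances: the fraction of differing residues, skipping columns with a gap in either sequence. Which measures and corrections distances itself offers is not covered here; this is an assumption-based illustration only.

```python
def p_distance(a: str, b: str, gap: str = ".") -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x.upper(), y.upper()) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(names: list[str], aligned: list[str], gap: str = "."):
    """Return the matrix as nested lists plus a printable text table."""
    rows = [[p_distance(s, t, gap) for t in aligned] for s in aligned]
    width = max(len(n) for n in names)
    header = " " * width + "".join(f"{n:>8}" for n in names)
    body = [f"{name:<{width}}" + "".join(f"{v:8.3f}" for v in row)
            for name, row in zip(names, rows)]
    return rows, "\n".join([header] + body)
```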
Subsection 12.2.1 Manual Editing with the Multisequence Editor
Lineup of what sequence group ?
You are then asked to type in a name (try to use fewer than 9 characters, and only letters or digits). Use all-lowercase letters (in particular on UNIX). The screen will then open with a display similar to that of the normal sequence editor seqed.
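The naming advice above can be checked before starting lineup. The Python sketch below encodes exactly that recommendation (at most 8 characters, lowercase letters and digits only); it reflects the advice in the text, not a documented lineup restriction.

```python
import re

# Recommendation from the text: fewer than 9 characters,
# letters or digits only, all lowercase.
_GROUP_NAME = re.compile(r"[a-z0-9]{1,8}")


def is_good_group_name(name: str) -> bool:
    """True if `name` follows the recommended group-name conventions."""
    return _GROUP_NAME.fullmatch(name) is not None
```

For example, is_good_group_name("eco") is True, while "Eco" and "eco_1" are rejected.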
<CTRL><D>
: new
Create a NEW sequence (ten letter max.):
test1
Move to desired position with arrows. Press <RETURN> to select position.
<CURSOR-UP> <RETURN>
NEW sequence named test1 placed at 0,1
Include an Existing Sequence
<CTRL><D>
: get
GET what sequence:
protein:mchu
LINEUP get of protein:mchu
Begin (* 1 *) ? <RETURN>
End (* 149 *) ? <RETURN>
Reverse (* No *) ? <RETURN>
That Begins: MADQLTEEQI
and Ends: EEFQMMTAK
Is this what you want to include (* Yes *) ? <RETURN>
Move to desired position with arrows. Press <RETURN> to select position.
<CURSOR-UP> <RETURN>
Enter name (ten letter max.): Mchu <RETURN>
LINEUP get of "protein:mchu" from: 1 to: 149 ....
Navigation
Subsection 12.2.2 Manual Editing of Sequence Alignments
Subsection 12.2.3 Manual Editing of File of Sequence Names
Subsection 12.2.4 Automatic Creation of File of Sequence Names
Subsection 12.2.5 Automatic Generation of a Multiple Sequence Alignment
PILEUP creates a multiple sequence alignment from a group of
related sequences using progressive, pairwise alignments. It can
also plot a tree showing the clustering relationships used to
create the alignment.
PileUp of what sequences ? @findpatterns.find
1 MCHU 149 aa
2 MCRB 148 aa
3 MCRT 149 aa
4 MCBO 148 aa
What is the gap weight (* 3.00 *) ? <RETURN>
What is the gap length weight (* 0.10 *) ? <RETURN>
This program can display the clustering relationships graphically.
Do you want to:
A) Plot to a FIGURE file called "PileUp.Figure"
B) Plot graphics on HP7550 attached to /dev/tty
C) Suppress the plot
Please choose one (* A *): <RETURN>
The minimum density for a one-page plot is 4.0 sequences/100 platen units.
What density do you want (* 4.0 *) ? <RETURN>
What should I call the output file name (* pileup.msf *) ? <RETURN>
Determining pairwise similarity scores...
1 x 2 1.49
1 x 3 1.50
1 x 4 1.50
2 x 3 1.49
2 x 4 1.49
3 x 4 1.50
Aligning...
1 .......-.
2 .......-.
3 .......-.
FIGURE instructions are now being written into pileup.figure.
Total sequences: 4
Alignment length: 149
CPU time: 00.84
Output file:/biox/biocomputing/doelz/pileup.msf
(On VMS the output file name looks slightly different, but the procedure is identical on VMS and UNIX.)
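The gap weight and gap length weight asked for in the session combine into a penalty for each gap. Assuming the usual GCG scheme, in which a gap costs the gap weight plus the gap length weight times its length, the defaults shown above (3.00 and 0.10) can be explored with this Python sketch. Not penalizing gaps at the ends of a sequence is an additional assumption made here.

```python
import re


def gap_penalty(length: int, gap_weight: float = 3.0,
                gap_length_weight: float = 0.1) -> float:
    """Penalty for one gap: gap weight + gap length weight * length."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_weight + gap_length_weight * length


def total_gap_penalty(aligned_seq: str, gap: str = ".",
                      gap_weight: float = 3.0,
                      gap_length_weight: float = 0.1) -> float:
    """Sum the penalties of all internal gaps; end gaps are not counted (assumption)."""
    core = aligned_seq.strip(gap)
    runs = re.findall(re.escape(gap) + "+", core)
    return sum(gap_penalty(len(run), gap_weight, gap_length_weight) for run in runs)
```

With these defaults, one gap of 10 positions (3.0 + 1.0 = 4.0) costs less than two separate gaps of one position each (2 x 3.1 = 6.2), which is why the alignment tends to prefer fewer, longer gaps.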
Subsection 12.2.6 Display of the Dendrogram Generated by the 'pileup' program
Subsection 12.2.7 Presentation of the Alignment
The notation pileup.msf{*} is the correct way to specify a sequence alignment. To generate a print with a consensus sequence that shows only the differences, use pretty with the -cons and -diff='.' options, as in the example command given earlier in this chapter.
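To remove the file name descriptors with the replace program, the replacement file contains the following lines:
..
"@pileup.msf"
" "
where pileup.msf is the name of the MSF file. The replacement itself is done with the replace command given earlier in this chapter.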
Subsection 12.2.8 Graphic Presentation of Similarity in the Alignment
Subsection 12.2.9 Schematic Presentation Sequence Similarity