The analysis of a DNA sequence to estimate composition or codon region
required little auxiliary data. To detect possible cleavage sites in a
biological sequence, however, the known sites must be listed in a
database.
In contrast to codon usage tables, which are systematic and complete,
restriction enzyme tables have to cover many different sites,
including variants.
To define a pattern in the nucleotide alphabet, ambiguity symbols are a
good way to allow several different symbols at one position. Proteins,
however, need a different mechanism. The definition and properties of
patterns are described in a later section of the BioCompanion.
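As a minimal sketch of how ambiguity symbols widen a pattern, the following Python fragment expands a recognition site written with the standard IUPAC nucleotide codes into a regular expression and lists the match positions. The function names `site_to_regex` and `find_sites` and the test sequence are illustrative assumptions; this is not how the GCG programs are implemented.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

def site_to_regex(site):
    """Translate a site written with ambiguity symbols into a regex string."""
    parts = []
    for symbol in site.upper():
        bases = IUPAC[symbol]
        # A single base stays literal; several bases become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def find_sites(site, sequence):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # The lookahead makes each match zero-width, so overlapping sites are found.
    pattern = re.compile("(?=" + site_to_regex(site) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# GTYRAC is the HincII recognition site; the sequence below is made up.
print(site_to_regex("GTYRAC"))                      # GT[CT][AG]AC
print(find_sites("GTYRAC", "AAGTCGACTTGTTAACGG"))   # [2, 10]
```

Both GTCGAC and GTTAAC are found because Y stands for C or T and R for A or G. Only the given strand is searched here; a non-palindromic site would also need a search on the reverse complement.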
Briefly, the restriction enzyme cleavage sites are described in a
format called a pattern with the following properties:
NOTE: This type of program assumes that the cleavage site and the
binding site are extremely close to each other. Programs that use
patterns to describe restriction enzymes are NOT usable for other
purposes unless explicitly mentioned.
Limitations of the Pattern Approach in DNA Analysis
Patterns in DNA are mostly known by example. Very little is known about
their detailed properties (such as promoter requirements). Look at the
following example. The pattern language for a simple promoter, such as
The program prime can predict "good" primers from a given nucleotide
sequence. Note that the program only suggests multiple primers; the
user has to evaluate suitable positions from the output. The program
'prime' produces a text output and a graphic overview that is suitable
for identifying regions of good primers, since the top hits are usually
located in only two or three regions rather than being evenly dispersed
over the entire region of interest. The 'prime' program has some
limitations: it should not be used to predict primers for a target of
more than a certain length, and there is a certain maximum length for
each region.
Other software packages, specifically PC-type applications, might be
worth considering if you use primer prediction frequently.
Restriction enzymes cleave DNA sequences at certain positions. A
program that analyses such cleavage sites will therefore compare the
entire DNA input sequence against a database of enzymes and locate
matches between the DNA sequence and the binding site of each enzyme as
described in the database. The programs print the location of the
cleavage sites either schematically (an overview plot, as graphics) or
analytically (printed sequence and restriction enzyme cleavage sites).
The latter is the most detailed view, but it is overloaded with
information and occasionally too crowded. Therefore, it is possible to
exclude enzymes from the display even if they would theoretically
match. The criteria for this exclusion can be the following:
If the size of the fragments matters, programs are available which
will display the fragments sorted by size rather than by cleavage
position. For this purpose, it matters whether the sequence is circular
(such as a plasmid) or linear: a single cut in a circular sequence will
not result in a size difference.
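The difference between linear and circular sequences can be sketched in a few lines of Python. The function `fragment_sizes` and the numbers used are illustrative assumptions and do not reproduce the output of any GCG program.

```python
def fragment_sizes(length, cuts, circular=False):
    """Return fragment lengths, largest first, for the given cut positions.

    A cut position p (0 < p < length) means the sequence is cut
    after base p, counting bases from 1.
    """
    positions = sorted(set(cuts))
    if not positions:
        return [length]
    if circular:
        # Each fragment runs from one cut to the next; the last one
        # wraps around the origin back to the first cut.
        sizes = [b - a for a, b in zip(positions, positions[1:])]
        sizes.append(length - positions[-1] + positions[0])
    else:
        # The two ends of a linear sequence act as extra boundaries.
        bounds = [0] + positions + [length]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)

# Made-up 5000 bp sequence with one or two cuts.
print(fragment_sizes(5000, [1200]))                        # [3800, 1200]
print(fragment_sizes(5000, [1200], circular=True))         # [5000]
print(fragment_sizes(5000, [1200, 3000], circular=True))   # [3200, 1800]
```

A single cut splits the linear sequence into two fragments but merely linearizes the circular one, which keeps its full length.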
Last but not least, the GCG program package provides functionality for
drawing plasmids with their cleavage sites.
Useful options to 'map', 'mapsort', 'mapplot':
The program plasmidmap (*) reads a *.tick file generated by the mapsort
program used with the plasmid option. To get started, you might want to
fetch the example files and try it with these:
% fetch pgamma.*
% plasmidmap @pgamma.fil
Further information is available in printed form, and it is highly
recommended to review this documentation before you spend extended
periods of time with the programs. Alternatively, on-line help is
available. To get started, use the command
genhelp plasmidmap description
NOTE: In version 8.x of the package, GCG graphics cannot easily be
transferred into PC-type graphics. Encapsulated PostScript is an option
for high-quality prints, in combination with manual reinking if
required. Public domain and commercial software packages might suit the
purpose better than the 'plasmidmap' program from GCG. Before you
investigate these alternatives, however, please make sure that the
effort is worthwhile.
Subsection 8.4.1 Principle of Patterns
TATA box, about 30 to 300 less important base pairs, and the start codon
will read in a pattern language as
TATA(N){30,300}ATG
However, the ATG required by the pattern must be the start codon, not
just any methionine codon. Whether the result of a pattern analysis is
of use therefore depends very much on the input sequence used for the
comparison. Most genetically important elements are, unfortunately,
known only by example. If a general pattern is derived from these
examples, we risk that many matches of the pattern to an input sequence
are computationally correct but biologically irrelevant. The
straightforward application of patterns is therefore valid for
restriction mapping, but will be problematic for genetic motifs.
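The promoter pattern can be tried out with an ordinary regular expression. The sketch below assumes that N is written as the character class [ACGT]; the name `promoter_hits` and the test sequence are made up for illustration, and the GCG pattern syntax may differ in detail.

```python
import re

# TATA, then 30 to 300 arbitrary bases, then ATG.
# N is spelled out as [ACGT]; the trailing ? makes the spacer as short
# as possible, so the first ATG at a valid distance is reported.
PROMOTER = re.compile(r"TATA[ACGT]{30,300}?ATG")

def promoter_hits(sequence):
    """Return (start, end) pairs, 0-based and end-exclusive, for each match."""
    return [(m.start(), m.end()) for m in PROMOTER.finditer(sequence.upper())]

# Made-up sequence: TATA, a 40-base spacer without ATG, then ATG.
spacer = "GC" * 20
sequence = "CC" + "TATA" + spacer + "ATG" + "CCC"
print(promoter_hits(sequence))   # [(2, 49)]
```

The expression reports every ATG at a suitable distance from a TATA, whether or not it is a start codon. This is exactly the kind of match that is computationally correct but possibly irrelevant biologically.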
Subsection 8.4.2 Using the 'prime' Program to Predict Primers in a Pattern Approach
Subsection 8.4.3 Principle of Restriction Enzyme Mapping in a Pattern Approach
Subsection 8.4.4 Programs
-once                 only 1 cut in entire sequence
-sixbase              only sixbase cutters
-exclude=200,500      do not consider enzymes cutting between 200 and 500
-mincut=2             exclude enzymes cutting once or not at all
-maxcut=1             select only enzymes cutting once
mapplot only:
-noplot -out=my.txt   suppresses plot and creates text file my.txt instead
-double               doubles height of characters in graphics mode
mapsort only:
-plasmid              to create *.tick file as input to plasmidmap
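The effect of selecting enzymes by their number of cuts can be illustrated with a small Python sketch. The function `select_enzymes`, its parameters, and the cut positions are hypothetical; they only mimic the idea behind -mincut and -maxcut and are not the GCG implementation.

```python
def select_enzymes(cut_map, mincut=None, maxcut=None):
    """Keep enzymes whose number of cuts lies between mincut and maxcut."""
    selected = {}
    for enzyme, cuts in cut_map.items():
        n = len(cuts)
        if mincut is not None and n < mincut:
            continue  # too few cuts
        if maxcut is not None and n > maxcut:
            continue  # too many cuts
        selected[enzyme] = cuts
    return selected

# Made-up cut positions, for illustration only.
cut_map = {"EcoRI": [350], "BamHI": [], "HindIII": [120, 900, 1500]}

# At least two cuts: drops enzymes cutting once or not at all.
print(sorted(select_enzymes(cut_map, mincut=2)))             # ['HindIII']
# Exactly one cut.
print(sorted(select_enzymes(cut_map, mincut=1, maxcut=1)))   # ['EcoRI']
```

Filtering of this kind keeps a crowded display readable without changing the underlying matches.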
Plasmid Drawing