Section 8-4: Restriction Enzyme Mapping Programs

The analysis of a DNA sequence to estimate its composition or coding regions required little auxiliary data. To detect possible cleavage sites in a biological sequence, however, the known sites must be listed in a database. In contrast to codon usage tables, which are systematic and complete, restriction enzyme tables have to cover many different sites, including variants.


Subsection 8.4.1

Principle of Patterns

In the nucleotide alphabet, ambiguity symbols are a good way to define a pattern, because they allow several different bases at one position. Proteins, however, need a different mechanism. The definition and properties of patterns are described in a later section of the BioCompanion. Briefly, restriction enzyme cleavage sites are described in a format called a pattern with the following properties:

NOTE: This type of program assumes that the cleavage site and the binding site are extremely close to each other. Programs that use patterns to describe restriction enzymes are NOT usable for other purposes unless explicitly mentioned.
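To illustrate how ambiguity symbols let one pattern stand for several concrete sites, here is a minimal Python sketch. It is not taken from any GCG program; the function names `pattern_to_regex` and `find_sites` are made up for this example. It assumes the standard IUPAC nucleotide codes and uses the HincII recognition sequence GTYRAC as a sample pattern.

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def pattern_to_regex(pattern):
    """Translate an IUPAC pattern such as 'GTYRAC' into a regex string."""
    return "".join(IUPAC[symbol] for symbol in pattern.upper())


def find_sites(pattern, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # Wrapping the pattern in a lookahead makes overlapping matches visible.
    lookahead = re.compile("(?=" + pattern_to_regex(pattern) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]


# GTYRAC matches both GTTAAC and GTCGAC in this made-up sequence.
print(find_sites("GTYRAC", "AAGTTAACCGTCGACTT"))  # [2, 9]
```

The sketch only searches the given strand; a real mapping program would also have to handle the complementary strand and the position of the cut within the site.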

Limitations of the Pattern Approach in DNA Analysis

Patterns in DNA are mostly known by example; very little is known about their detailed properties (such as promoter requirements). Consider the following example. A simple promoter, described as

TATA box, about 30 to 300 less important base pairs, and the start codon

will read in a pattern language as

TATA(N){30,300}ATG
  

  
However, the ATG required by the pattern must be the start codon, not just any methionine codon. Whether a match is useful therefore depends very much on the input sequence that is compared with the pattern. Unfortunately, most genetically important elements are known only by example. If a general pattern is derived from these examples, many matches of the pattern to an input sequence risk being computationally correct but biologically irrelevant. The straightforward application of patterns is therefore valid for restriction mapping, but problematic for genetic motifs.
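The limitation can be made concrete with a small Python sketch. It is an illustration only, under the assumption that the input contains just the letters A, C, G and T; the function name `promoter_candidates` and the test sequence are invented for this example. For every TATA it lists each ATG that lies 30 to 300 bases downstream, which is exactly what the pattern TATA(N){30,300}ATG permits.

```python
import re


def promoter_candidates(sequence, min_gap=30, max_gap=300):
    """List (tata_start, atg_start) pairs allowed by TATA(N){min_gap,max_gap}ATG.

    Assumes the sequence contains only A, C, G and T, so every base
    in the gap satisfies the ambiguity symbol N.
    """
    seq = sequence.upper()
    atg_starts = [m.start() for m in re.finditer("(?=ATG)", seq)]
    hits = []
    for tata in re.finditer("(?=TATA)", seq):
        gap_begin = tata.start() + 4  # first base after the TATA box
        for atg in atg_starts:
            if min_gap <= atg - gap_begin <= max_gap:
                hits.append((tata.start(), atg))
    return hits


# Made-up sequence: one TATA box followed by two ATG triplets.
seq = "TATA" + "C" * 30 + "ATG" + "C" * 10 + "ATG"
print(promoter_candidates(seq))  # [(0, 34), (0, 47)]
```

Both hits satisfy the pattern, yet at most one of the two ATG triplets can be the real start codon. The pattern alone cannot tell them apart.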


Subsection 8.4.2

Using the 'prime' Program to Predict Primers in a Pattern Approach

The program 'prime' can predict "good" primers from a given nucleotide sequence. Note that the program only suggests multiple primers; the user has to evaluate suitable positions from the output. 'prime' produces a text output and a graphic overview that helps to identify regions of good primers, because the top hits are usually located in only two or three regions rather than being evenly dispersed over the entire region of interest. The program has some limitations: it should not be used for targets longer than a certain length, and there is a maximum length for each region.

Other software packages, in particular PC applications, might be worth considering if you predict primers frequently.


Subsection 8.4.3

Principle of Restriction Enzyme Mapping in a Pattern Approach

Restriction enzymes cleave DNA sequences at certain positions. A program that analyses such cleavage sites therefore compares the entire DNA input sequence against a database of enzymes and locates matches between the DNA sequence and the binding site of each enzyme as described in the database. The programs print the location of the cleavage sites either schematically (an overview plot, as graphics) or analytically (printed sequence with restriction enzyme cleavage sites). The latter is the most detailed view, but it is overloaded with information and occasionally too crowded. It is therefore possible to exclude enzymes from the display even if they would theoretically match. The criteria for this exclusion can be the following:

If the size of the fragments matters, programs are available that display the fragments sorted by size rather than by cleavage position. For this purpose, it matters whether the sequence is linear or circular (such as a plasmid): a single cut in a circular sequence linearizes it without changing its size.
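The difference between linear and circular sequences can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any GCG program; the function name `fragment_sizes` and the convention that a cut position counts the bases before the cut are assumptions made for this example.

```python
def fragment_sizes(cut_positions, seq_length, circular=False):
    """Return fragment lengths, largest first.

    A cut position p means the sequence is cut after its first p bases.
    """
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [seq_length]
    if circular:
        # n cuts on a circle give n fragments; the last one spans the origin.
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(seq_length - cuts[-1] + cuts[0])
    else:
        # n cuts on a linear sequence give n + 1 fragments.
        bounds = [0] + cuts + [seq_length]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    # Drop empty fragments that arise from a cut exactly at a sequence end.
    return sorted((s for s in sizes if s > 0), reverse=True)


print(fragment_sizes([1000], 3000))                 # [2000, 1000]
print(fragment_sizes([1000], 3000, circular=True))  # [3000]
```

With a single cut, the linear sequence yields two fragments, while the circular one yields a single fragment of unchanged length.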

Last but not least, the GCG program package can draw plasmids with their cleavage sites.


Subsection 8.4.4

Programs

Useful options to 'map', 'mapsort', 'mapplot':

-once                 only 1 cut in entire sequence
-sixbase              only sixbase cutters
-exclude=200,500      do not consider enzymes cutting between 200 and 500
-mincut=2             exclude enzymes cutting once or not at all
-maxcut=1             select only enzymes cutting once

mapplot only:

-noplot -out=my.txt   suppresses plot and creates text file my.txt instead
-double               doubles height of characters in graphics mode

mapsort only:

-plasmid              to create *.tick file as input to plasmidmap
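The selection criteria behind options such as -mincut, -maxcut and -exclude can be pictured with the following Python sketch. It is based only on the one-line descriptions of these options and is not GCG code; the function `select_enzymes`, its parameters and the cut positions in the example are all hypothetical.

```python
def select_enzymes(cuts_by_enzyme, mincut=None, maxcut=None, exclude=None):
    """Filter a mapping of enzyme name -> list of cut positions.

    mincut / maxcut: keep only enzymes whose number of cuts lies in this range.
    exclude: (low, high) range; drop enzymes that cut anywhere inside it.
    Enzymes that do not cut at all are always dropped (an assumption).
    """
    selected = {}
    for name, cuts in cuts_by_enzyme.items():
        if not cuts:
            continue
        if mincut is not None and len(cuts) < mincut:
            continue
        if maxcut is not None and len(cuts) > maxcut:
            continue
        if exclude is not None:
            low, high = exclude
            if any(low <= position <= high for position in cuts):
                continue
        selected[name] = sorted(cuts)
    return selected


# Made-up cut positions for illustration only.
cuts = {"EcoRI": [120], "BamHI": [250, 900], "HindIII": [], "PstI": [40, 610, 980]}
print(select_enzymes(cuts, maxcut=1))            # {'EcoRI': [120]}
print(select_enzymes(cuts, exclude=(200, 500)))  # {'EcoRI': [120], 'PstI': [40, 610, 980]}
```

In this reading, selecting single cutters corresponds to `mincut=1, maxcut=1`.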
  

  
Plasmid Drawing

The program plasmidmap (*) reads a *.tick file generated by the mapsort program run with the -plasmid option. To get started, you might want to fetch the example files and try it with them:

% fetch pgamma.*

% plasmidmap @pgamma.fil

Further information is available in printed form, and it is highly recommended to review this documentation before you spend a lot of time with the programs. Alternatively, on-line help is available. To get started, use the command genhelp plasmidmap description.

NOTE: In version 8.x of the GCG software package, the graphics cannot easily be transferred into PC graphics formats. Encapsulated PostScript is an option for high-quality prints, combined with manual re-inking if required. Public domain and commercial software packages might suit the purpose better than the 'plasmidmap' program from GCG. Before you investigate these alternatives, however, please make sure that it is worth the effort.

