Section 9.2: GCG's Implementation of Schematic Comparison

We will assume that all these setup operations have been successfully completed. Note that the methods described below are valid for both DNA and protein sequences.


Subsection 9.2.1

Comparison Calculation

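compare calculates the dots that dotplot displays later. It offers two different algorithms, and the "window/stringency" method is the default: a dot is recorded wherever two windows of a given length, one from each sequence, reach at least the stringency value in matches (for proteins, in similarity score). For very large sequences, you might try compare -word instead.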
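To make the two approaches concrete, here is a minimal Python sketch, not GCG's code. It assumes plain identity scoring (real protein comparisons use a scoring matrix), reports each dot at the 1-based start of its window (an assumption; the real program may place it elsewhere), and uses illustrative default values. The word method is modelled as an exact-word (k-mer) lookup, which is our assumption about the idea behind -word. The function names are hypothetical.

```python
from collections import defaultdict


def window_stringency_dots(seq_a, seq_b, window=21, stringency=14):
    """Sketch of the window/stringency idea with identity scoring.

    A dot (i, j) is recorded when the window of `window` residues starting
    at position i of seq_a and the window starting at position j of seq_b
    agree in at least `stringency` positions. Positions are 1-based window
    starts (an assumption of this sketch); defaults are illustrative only.
    """
    a, b = seq_a.upper(), seq_b.upper()
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            # Count identical residues between the two windows.
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= stringency:
                dots.append((i + 1, j + 1))
    return dots


def word_dots(seq_a, seq_b, word=6):
    """Sketch of a word-based comparison: a dot wherever a word of length `word` is shared exactly."""
    a, b = seq_a.upper(), seq_b.upper()
    # Index every word of seq_b by its (0-based) start positions.
    index = defaultdict(list)
    for j in range(len(b) - word + 1):
        index[b[j:j + word]].append(j)
    dots = []
    # Look up each word of seq_a instead of scoring every pair of windows.
    for i in range(len(a) - word + 1):
        for j in index.get(a[i:i + word], []):
            dots.append((i + 1, j + 1))
    return dots


print(window_stringency_dots("GATTACAGATTACA", "GATTACA", window=7, stringency=6))
# [(1, 1), (8, 1)]
```

With stringency equal to window, the window method keeps only exact matches and gives the same dots as the word method with word equal to window. The word method avoids scoring every pair of windows, which is why a word-based approach pays off on large sequences.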


Subsection 9.2.2

Display Program

dotplot displays the dots calculated by compare; the plot looks nicer if you add options such as those recommended below. For an overview of the available options, run dotplot with the -check option.

Recommendation for the 'dotplot' program:

You will get nicer figures with the following command-line options. If you use WPI, make sure the corresponding options are ticked accordingly.

dotplot -tickaxes -symbol=2 -font=3
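dotplot draws a graphic, which plain text cannot reproduce. Purely to show how the (i, j) points computed by compare map onto the two axes, here is a hypothetical text-mode stand-in. It is not related to dotplot's actual options: the ruler only loosely mimics tick marks, and the orientation (first sequence along the columns, second along the rows) is a choice of this sketch.

```python
def render_dotplot(dots, len_a, len_b, symbol="*", tick=10):
    """Text-mode stand-in for a dotplot (hypothetical helper).

    Columns are positions in the first sequence, rows are positions in the
    second; dots are 1-based (i, j) pairs. The top line is a ruler with '+'
    every `tick` positions, loosely mimicking axis tick marks.
    """
    ruler = "".join("+" if (x + 1) % tick == 0 else "-" for x in range(len_a))
    grid = [[" "] * len_a for _ in range(len_b)]
    for i, j in dots:
        grid[j - 1][i - 1] = symbol  # row = second sequence, column = first
    return [ruler] + ["".join(row) for row in grid]


for line in render_dotplot([(1, 1), (2, 2), (3, 3), (1, 3)], 4, 3, tick=2):
    print(line)
```

A run of dots along a diagonal marks a region where the two sequences agree; isolated dots are usually background noise.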

================================= Begin Exercise 8

Schematic pairwise DNA analysis: Compare two sequences using the 'dotplot' technique.

From the previous exercises, you should now have two DNA sequences: my1.seq, the sequence you typed in, and my2.seq, the reading-frame DNA sequence you extracted in the seqed exercise. Compare these two sequences.

To solve this problem, follow these steps:

================================= End Exercise 8


Subsection 9.2.3

Detection of Internal Repeats

It is important to know as much as possible about the sequence of interest. The dotplot described above can also be used to analyse a sequence internally, i.e., to compare a sequence against itself. The resulting dotplot is perfectly symmetrical about its main diagonal. The GCG implementation of the 'dotplot' program recognises that the sequences on both axes are identical and therefore plots only one half of the plot. You can force a full display with

% dotplot -all
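The symmetry is easy to see in code. The following minimal sketch (identity scoring, 1-based window starts, hypothetical names) scores only the pairs with j ≥ i, because the other half is a mirror image; with full=True it mirrors each off-diagonal dot, roughly what a full display shows. Whether GCG saves computation this way or only plots one half is not something we are claiming; the sketch just illustrates why half the plot carries all the information.

```python
def self_dots(seq, window=11, stringency=8, full=False):
    """Window/stringency self-comparison (identity scoring, 1-based window starts).

    A self-comparison is symmetrical about the main diagonal, so only pairs
    with j >= i are scored. With full=True, each off-diagonal dot is also
    mirrored below the diagonal to give the whole plot.
    """
    s = seq.upper()
    n = len(s) - window + 1
    dots = []
    for i in range(n):
        for j in range(i, n):  # upper triangle, including the diagonal
            matches = sum(s[i + k] == s[j + k] for k in range(window))
            if matches >= stringency:
                dots.append((i + 1, j + 1))
                if full and i != j:
                    dots.append((j + 1, i + 1))  # mirror image below the diagonal
    return dots


def repeats(dots):
    """Off-diagonal dots are the candidate internal repeats."""
    return [(i, j) for i, j in dots if i != j]


half = self_dots("ACGTTACGTT", window=5, stringency=5)
print(repeats(half))  # [(1, 6)]
```

The main diagonal is always present (every window matches itself), so it carries no information; internal repeats show up as diagonals parallel to it.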

Internal repeat analysis is particularly useful for detecting gene duplications or several occurrences of a functional protein motif.

================================= Begin Exercise 9

Internal repeat analysis: Analyse a single sequence using the 'dotplot' technique in order to find internal repeats.

From the previous exercises, you should have the database DNA sequence my2.seq and its translation, the peptide m19311.pep. Compare each of these sequences with itself, once at the DNA level and once at the protein level, and compare the results. Note that, particularly at the protein level, finding suitable window and stringency values can take considerable trial and error.

To solve this problem, follow these steps:

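Tip for the Superimposition of Dotplots: All GCG graphics programs let you set a plotting "density". Instead of accepting the default value, use a value divisible by three for the DNA comparison and the corresponding value, one third of it, for the protein comparison. The DNA and protein dotplots can then be superimposed, because each amino acid is encoded by three nucleotides.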
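The arithmetic behind the tip can be sketched as follows. This assumes that the density value sets how many residues fall on a unit length of the plot, so that DNA at density d and protein at density d/3 share one scale; treat that interpretation as an assumption and check the program documentation. All names here, including the reading-frame offset cds_start, are hypothetical.

```python
def protein_density_for(dna_density):
    """Protein density matching a DNA density, assuming one residue = three nucleotides."""
    if dna_density % 3 != 0:
        raise ValueError("choose a DNA density divisible by three")
    return dna_density // 3


def protein_to_dna(k, cds_start=1):
    """1-based nucleotide position of the first base of codon k.

    cds_start is the nucleotide where the translated reading frame begins
    in the DNA sequence (1 if the DNA starts with the first codon).
    """
    return cds_start + 3 * (k - 1)


def protein_dot_to_dna(i, j, cds_start=1):
    """Map a protein self-comparison dot onto DNA coordinates."""
    return (protein_to_dna(i, cds_start), protein_to_dna(j, cds_start))


print(protein_density_for(90))         # 30
print(protein_dot_to_dna(2, 5, 10))    # (13, 22)
```

If the translated region does not start at the first nucleotide of the DNA sequence, the protein plot is shifted by that offset, so the two plots line up only over the coding region.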

================================= End Exercise 9

