Section 8-5: Translation



Subsection 8.5.1

DNA to Protein

Two programs should be run, one after the other. The first determines the reading frame. If you already know the reading frame, or if you have run the corresponding analysis programs (frames or similar), you can proceed directly to the second program:

% translate

Note that you might want to reverse the sequence before translation. Alternatively, use the program map with the corresponding translation options, and afterwards extract the peptides from the output with

% extractpeptide
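As a rough illustration of what choosing a reading frame and translating involve, the following Python sketch translates a DNA string in all six frames with the standard genetic code. It is not GCG code and makes no claim about how translate or map work internally; the function names (reverse_complement, translate, six_frames) are invented for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in codon order TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2); unknown codons become X."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Translate the three forward frames and the three reverse-complement frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna, offset)
        frames["-%d" % (offset + 1)] = translate(rc, offset)
    return frames
```

Comparing the six results (for example, looking for the frame with the longest stretch free of "*") is one simple way to guess the reading frame when it is not known in advance.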

Translation of Genomic Sequences

The translation of genomic sequences requires that you know the intron/exon borders before running the program translate. Without this knowledge, the resulting sequences will be erroneous. Unfortunately, few programs are available to detect these sites, and where prediction is possible at all, its reliability is limited by the underlying computational models. The GCG program package does not currently support this type of prediction.
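To make the role of the intron/exon borders concrete, here is a minimal Python sketch that joins exons into a coding sequence once their coordinates are known. The function name splice_exons, the toy genomic string, and the coordinates are all made up for illustration; real borders have to come from annotation or experiment.

```python
def splice_exons(genomic, exons):
    """Join exon segments given 1-based, inclusive (start, end) coordinates."""
    return "".join(genomic[start - 1:end] for start, end in sorted(exons))


# Toy example: two exons separated by a 12-base intron (GT...AG).
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GAATAA"
exons = [(1, 6), (19, 24)]  # hypothetical coordinates
cds = splice_exons(genomic, exons)
```

Translating the unspliced genomic string instead of cds would read straight through the intron, which is the kind of erroneous result described above.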

Translation of Database Sequences

In the DNA sequence databases, entries of genetic origin will frequently cross-reference the protein sequence. This saves you a translation as you may use the protein sequence directly.

If this is not the case, or if you do not have the protein sequence database available locally, DNA sequences of genetic origin occasionally carry CDS features which describe the position of reading frames and the corresponding intron/exon boundaries. The translate program will allow you to translate them one after the other. Alternatively, the WWW browser of the SRS system will allow you to click on the peptide feature and translate the sequence automatically. To get this sequence into GCG format, use the mouse to highlight the sequence (and only the sequence). Next, copy the sequence into the paste buffer (use the <Edit> pull-down menu). Then, on the command line, give the following command (here, as an example, for the sequence my.seq):

% cat > my.seq

and then paste the contents into the file (again, using the <Edit> pull-down menu). What you have done is open a file with the cat command and write the pasted text into it. After the paste, the file is therefore still open. Close it by typing <CTRL><D>.

Next, you need to reformat the file to GCG format. As the file is plain text, the reformat program may complain about a missing ".." divider, but this should not matter.

NOTE:

1) You need to be sure that you copy only the sequence.

2) The WPI interface is not useful for this trick.

3) Check manually whether you succeeded (is M at position ?)

4) Make sure that no stop codons (indicated by "*") are present in your sequence.
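The checks in the notes above can also be scripted. The following Python sketch strips digits and whitespace from pasted text and reports the two problems mentioned. It assumes that the intended check in note 3 is for a leading methionine, which the text does not state explicitly; the function names are invented for this example.

```python
def clean_pasted_peptide(text):
    """Keep only letters and '*' from pasted text, dropping digits and whitespace."""
    return "".join(ch for ch in text.upper() if ch.isalpha() or ch == "*")


def check_peptide(seq):
    """Return a list of warnings for a cleaned peptide string."""
    warnings = []
    # Assumption: a complete translated CDS should begin with methionine (M).
    if not seq.startswith("M"):
        warnings.append("sequence does not start with M")
    if "*" in seq:
        warnings.append("stop codon '*' at position %d" % (seq.index("*") + 1))
    return warnings
```

An empty list from check_peptide means neither problem was found; it does not prove that only the sequence was copied.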

Translation of Mitochondrial Sequences

Be aware that translation requires a table which assigns an amino acid symbol to each individual codon. Some sequences, such as mitochondrial ones, follow other translation patterns. The GCG software offers these different tables. Refer to the genhelp section on the translate program.
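To show why the choice of table matters, this Python sketch translates the same DNA with the standard code and with the vertebrate mitochondrial code, which differs from the standard code in four codons (TGA, ATA, AGA, AGG). It is an illustration only and does not use or reproduce the GCG table files; the names STANDARD, VERT_MITO and translate are chosen for this example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in codon order TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): a for c, a in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code: TGA codes for Trp, ATA for Met, AGA and AGG are stops.
VERT_MITO = dict(STANDARD, TGA="W", ATA="M", AGA="*", AGG="*")


def translate(dna, table):
    """Translate dna in frame 1 with the given codon table; unknown codons become X."""
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


seq = "ATGATATGAAGA"
standard_peptide = translate(seq, STANDARD)   # "MI*R"
mito_peptide = translate(seq, VERT_MITO)      # "MMW*"
```

With the wrong table, the same twelve bases yield a premature stop and a different residue at every altered codon.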


Subsection 8.5.2

Protein to DNA

The translation from amino acid code to DNA requires a correct codon usage table. The default table might not be suited for detailed analysis. To get an organism-specific codon usage table, refer to the corresponding section of the BioCompanion, or compile your own from an existing sequence (or set of sequences) with the program

% codonfrequency

To use a specific table to translate a protein back into DNA, use

% backtranslate your.seq codon.file

e.g.,

% backtranslate hp7764.pep drosophila_high.cod

The second file name will be assumed to be the codon file. Examine the result using the methods described in the file handling section.
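The idea of back-translation with a codon usage table can be sketched in a few lines of Python. The version below simply picks the most frequent codon for each residue, which is one simple strategy and not necessarily what backtranslate does. The counts in CODON_USAGE are invented for illustration and are not taken from drosophila_high.cod or any real organism.

```python
# Hypothetical codon counts per amino acid (illustrative values only).
CODON_USAGE = {
    "M": {"ATG": 10},
    "K": {"AAA": 3, "AAG": 7},
    "F": {"TTT": 4, "TTC": 6},
    "*": {"TAA": 5, "TAG": 2, "TGA": 3},
}


def backtranslate(peptide, usage):
    """Return a DNA string using the most frequent codon for each residue."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in peptide.upper())


dna = backtranslate("MKF*", CODON_USAGE)
```

Because most amino acids have several codons, the result is only one of many DNA sequences that encode the same peptide, which is why an organism-specific table matters.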


Subsection 8.5.3

DNA to RNA and Vice Versa

The change of T to U and U to T can be done with the reformat program:

% reformat -DNA

or

% reformat -RNA

Similarly, the case of sequence characters can be changed with the reformat program by using the options tolower and toupper, respectively.
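For comparison, the same conversions on a plain sequence string take only a few lines of Python. This sketch works on raw strings, not on GCG-format files, and the function names are chosen for this example.

```python
def dna_to_rna(seq):
    """Replace T with U, preserving the case of each character."""
    return seq.translate(str.maketrans("Tt", "Uu"))


def rna_to_dna(seq):
    """Replace U with T, preserving the case of each character."""
    return seq.translate(str.maketrans("Uu", "Tt"))


rna = dna_to_rna("ATGcgt")   # "AUGcgu"
upper = rna.upper()          # "AUGCGU"
lower = rna.lower()          # "augcgu"
```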

If problems occur because of a wrong sequence type assignment, you need to reformat the sequence specifically with type 'NUCLEIC' or 'PROTEIN', respectively.

