BACKTRANSLATE

Table of Contents

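FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

CONSIDERATIONS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE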

FUNCTION

BackTranslate backtranslates an amino acid sequence into a nucleotide sequence. The output helps you identify areas with fewer ambiguities that might be candidates for synthetic probes.

DESCRIPTION

BackTranslate uses a translation table to backtranslate a protein sequence to the most probable or most ambiguous nucleic acid sequence. The output file can be used as input to other Accelrys GCG (GCG) programs.

If you choose one of the menu options that includes a table of back-translations (a or b), the program also uses a codon frequency table and writes the codons for each amino acid in order of their preference in that table. Below each codon list is a number between 0 and 1,000: the product of the probabilities of the most likely codons for the next four amino acids, multiplied by 1,000. The higher the number, the more likely it is that the next 12 nucleotides (four amino acids) contain preferred codons.

EXAMPLE

To make a back-translation of the ilvI protein showing all possible back-translations from amino acids one to six, using codon frequencies from the file ecohigh.cod, you would do the following:

% backtranslate

  BACKTRANSLATE what sequence ?  ilvhiaa.pep

                 Begin (* 1 *) ?

                 End (* 956 *) ?  6

  Would you like to see:

      a) table of back-translations and most probable sequence

      b) table of back-translations and most ambiguous sequence

      c) most probable sequence only

      d) most ambiguous sequence only

  Please choose one (* b *):

  Use what codon frequency file (* GenRunData:ecohigh.cod *) ?

  What should I call the output file (* ilvhiaa.seq *) ?

OUTPUT

Here is part of the output file:

!!NA_SEQUENCE 1.0

 BACKTRANSLATE of: : ilvhiaa.pep  check: 2165  from: 1  to: 6

E Coli. ilvI - ilvH (peptide)

 Using codon frequencies from: /package/share/9.0/gcgcore/data/rundata/ecohigh.cod

 CheckFile: 9032

Codon usage for enteric bacterial (highly expressed) genes 7/19/83

    Ser        Phe        Ser        Gln        Pro        Trp

  UCC 0.37   UUC 0.76   UCC 0.37   CAG 0.86   CCG 0.77   UGG 1.00

  UCU 0.34   UUU 0.24   UCU 0.34   CAA 0.14   CCA 0.15

  AGC 0.20              AGC 0.20              CCU 0.08

  UCG 0.04              UCG 0.04              CCC 0.00

  AGU 0.03              AGU 0.03

  UCA 0.02              UCA 0.02

  89         186        245        0          0          0

ilvhiaa.seq  Length: 18  September 30, 1998 17:08  Type: N  Check: 2929  ..

       1  WSNTTYWSNC ARCCNTGG
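
The row of numbers under the codon table can be reproduced from the top frequency in each column. The following Python sketch is only an illustration, not BackTranslate's own code. It assumes that each window starts at the residue in its own column, that the result is truncated to an integer, and that a window running past the end of the range scores 0. All three assumptions are inferred from the sample output above; the function name window_scores is hypothetical.

```python
from math import prod

# Highest codon frequency in each column of the sample output above
# (Ser, Phe, Ser, Gln, Pro, Trp).
top_freqs = [0.37, 0.76, 0.37, 0.86, 0.77, 1.00]


def window_scores(freqs, window=4):
    """Return one score per position: 1,000 times the product of the
    top codon frequencies for `window` consecutive residues.

    Assumption (inferred from the sample output, not documented):
    incomplete windows at the end of the range score 0, and the
    product is truncated rather than rounded.
    """
    scores = []
    for i in range(len(freqs)):
        chunk = freqs[i:i + window]
        if len(chunk) < window:
            scores.append(0)
        else:
            scores.append(int(prod(chunk) * 1000))
    return scores


print(window_scores(top_freqs))  # [89, 186, 245, 0, 0, 0]
```

The window argument plays the role of the -WINdow parameter described in the parameter reference; with the default of four it matches the numbers shown in the sample output.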

INPUT FILES

BackTranslate accepts a single protein sequence and a single codon frequency table as input. Look at the CodonFrequency program for information about how to create or modify a codon frequency file. If BackTranslate rejects your protein sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

Prime selects oligonucleotide primers for a template DNA sequence. The primers may be useful for the polymerase chain reaction (PCR) or for DNA sequencing. You can allow Prime to choose primers from the whole template or limit the choices to a particular set of primers listed in a file.

CodonFrequency tabulates codon usage from sequences or existing codon frequency tables. Composition counts trinucleotides from any set of sequences. The mapping programs can be run with -ALL to identify all potential restriction sites in back-translated sequences. If you run the mapping programs with -SILent, they identify potential restriction sites that can be created without changing the translation of the nucleic acid sequence.

RESTRICTIONS

No checking is done to see that your codon frequency table and your translation table agree. The most ambiguous back-translated sequence comes from the translation table. The most probable back-translated sequence comes from the codon frequency table. The table of codon choices also comes from the codon frequency table.
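
Because the two sequences come from independent tables, it can help to see the two lookups side by side. The Python sketch below is an illustration under stated assumptions: the dictionaries hold only the six residues from the example, the frequencies and ambiguous codons are copied from the sample output, and the names CODON_FREQ, AMBIGUOUS, most_probable, and most_ambiguous are hypothetical. It does not read real .cod or translate.txt files.

```python
# Illustrative subset only; a real codon frequency table covers every amino acid.
# Frequencies are copied from the ecohigh.cod excerpt in the sample output.
CODON_FREQ = {
    "Ser": {"UCC": 0.37, "UCU": 0.34, "AGC": 0.20,
            "UCG": 0.04, "AGU": 0.03, "UCA": 0.02},
    "Phe": {"UUC": 0.76, "UUU": 0.24},
    "Gln": {"CAG": 0.86, "CAA": 0.14},
    "Pro": {"CCG": 0.77, "CCA": 0.15, "CCU": 0.08, "CCC": 0.00},
    "Trp": {"UGG": 1.00},
}

# Ambiguous codons as they appear in the sample output sequence; in the
# program these come from the translation table, not the frequency table.
AMBIGUOUS = {"Ser": "WSN", "Phe": "TTY", "Gln": "CAR", "Pro": "CCN", "Trp": "TGG"}


def most_probable(peptide):
    """Pick the highest-frequency codon for each residue."""
    return "".join(max(CODON_FREQ[aa], key=CODON_FREQ[aa].get) for aa in peptide)


def most_ambiguous(peptide):
    """Look up the ambiguous codon for each residue."""
    return "".join(AMBIGUOUS[aa] for aa in peptide)


peptide = ["Ser", "Phe", "Ser", "Gln", "Pro", "Trp"]
print(most_probable(peptide))   # UCCUUCUCCCAGCCGUGG
print(most_ambiguous(peptide))  # WSNTTYWSNCARCCNTGG
```

Nothing in the sketch ties the two dictionaries together, which mirrors the restriction: if one table is edited and the other is not, the two outputs can disagree without any warning.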

CONSIDERATIONS

You should realize that the most ambiguous back-translation uses three IUB codes (see Appendix III) to represent each codon. These codes cannot correctly represent sets of codons where more than one of the bases is incompletely permuted. This is the case for the stop codons and for the residues with six synonymous codons. For instance, serine should back-translate into the codons TCT, TCC, TCA, TCG, AGT, or AGC. These can be represented precisely as either TCN or AGY. The codon shown by BackTranslate for serine is WSN, which has sixteen permutations, six of which are correct and ten of which are not!
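
The over-representation can be checked by expanding the ambiguous serine codon as it is written in the sample output (WSN). The Python sketch below is an illustration only. The IUB dictionary holds just the symbols needed here, and the names IUB, SERINE, and expand are hypothetical.

```python
from itertools import product

# Subset of the IUB ambiguity codes, enough for this example.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "Y": "CT", "R": "AG", "N": "ACGT",
}

# The six serine codons listed in the text above.
SERINE = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def expand(codon):
    """Return every unambiguous codon matched by an ambiguous codon."""
    return {"".join(bases) for bases in product(*(IUB[c] for c in codon))}


wsn = expand("WSN")
print(len(wsn))                # 16 codons in total
print(len(wsn & SERINE))       # 6 of them encode serine
print(sorted(wsn - SERINE))    # the codons that do not encode serine

# The two precise representations together cover exactly the serine codons.
print(expand("TCN") | expand("AGY") == SERINE)  # True
```

A probe designed from the ambiguous sequence therefore includes codons for other residues at each serine position; splitting serine into TCN and AGY avoids this at the cost of two separate sequences.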

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % backtranslate [-INfile1=]ilvhiaa.pep -Default

Prompted Parameters:

-BEGin=1 -END=6         sets the range of interest

-MENu=a                 menu for the type of output, where:

                          A=table of all back-translations and most probable sequence

                          B=table of all back-translations and most ambiguous sequence

                          C=most probable sequence only

                          D=most ambiguous sequence only

[-INfile2=]ecohigh.cod    specifies the codon frequency table

[-OUTfile=]ilvhiaa.seq    names the output file

Local Data Files:

-TRANSlate=translate.txt defines most ambiguous representation for each codon family

Optional Parameters:

-WINdow=4                 shows probability of the preferred codons for next 4 amino acids occurring together by chance

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

BackTranslate uses translate.txt to create the most ambiguous back-translation of your protein sequence. If the standard translation table does not apply to your sequence, you can provide an alternate file named translate.txt in your current working directory or use -TRANSlate=mycode.txt. Translation tables are discussed in more detail in Appendix VII.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-BEGin=1

Sets the beginning position for all input sequences.

-END=100

Sets the ending position for all input sequences.

-MENu=A

Indicates the type of output to create. -MENu=A produces a table of possible codons plus the most probable nucleic acid sequence. In order, the remaining three options are: a table of possible codons plus the most ambiguous sequence, the most probable sequence only, and the most ambiguous sequence only.

-INfile2=ecohigh.cod

Selects the codon frequency table to use when constructing the most probable sequence. You can select optional codon frequency tables to bias the results in favor of the codon usage in E. coli, human, Drosophila, maize, yeast, and some other classes of organisms.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

-WINdow=4

BackTranslate normally displays the probability of the preferred codons for the next four amino acids in the sequence, based on your codon frequency table. Use this parameter to set the number of codons used in the display to a number other than four.

Printed: May 27, 2005 11:42

Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ® and, the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.