TRANSLATE

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

CONSIDERATIONS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

Translate translates nucleotide sequences into peptide sequences.

DESCRIPTION

Translate creates a protein sequence by translating nucleic acid sequences that you specify. In addition to translating a single range of a given nucleotide sequence, it can either concatenate ranges into a single assembly before translating them or translate each range before assembling the results. The ranges can be of any length, come from either strand of a sequence, or even from more than one sequence file. Unlike most Accelrys GCG (GCG) programs, Translate lets you specify ranges as if the sequence were circular (extending past the end of the sequence and continuing at the beginning).
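The circular range convention can be illustrated with a short Python sketch. This is not Translate's own code: the function name circular_range is hypothetical, and the convention that an End value larger than the sequence length wraps around to position 1 is an assumption made for illustration.

```python
def circular_range(seq: str, begin: int, end: int) -> str:
    """Return positions begin..end (1-based, inclusive), treating seq as circular.

    Assumption: an end coordinate past len(seq) continues at position 1.
    """
    n = len(seq)
    if n == 0 or begin < 1 or end < begin:
        raise ValueError("need a non-empty sequence and 1 <= begin <= end")
    # Map each 1-based coordinate onto the sequence, wrapping with modulo.
    return "".join(seq[(pos - 1) % n] for pos in range(begin, end + 1))


# A 6-base sequence; the range 5..8 runs off the end and wraps to the start.
print(circular_range("ACGTAC", 5, 8))  # ACAC
```

A range that stays inside the sequence behaves like an ordinary linear range, so the same function covers both cases.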

Translate can be run either interactively or noninteractively. When you specify a single sequence to translate and -Default is not on the command line, it works interactively, prompting you for each segment to translate. To avoid being prompted, either use -Default on the command line or supply a multiple file specification as input with either a wildcard or a list file specification. (See the INPUT FILES topic below for more detailed information.)

Translate supports the IUB-IUPAC character set for the representation of nucleotide ambiguities. See Appendix III for a list of these characters.
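As an illustration of what nucleotide ambiguity means for translation, the Python sketch below expands an ambiguous codon into every concrete codon it could represent. The table holds the standard IUB-IUPAC codes; Appendix III remains the authoritative list for GCG. The name expand_codon is hypothetical, and the sketch does not show how Translate itself resolves ambiguous codons.

```python
from itertools import product

# Standard IUB-IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def expand_codon(codon: str) -> list[str]:
    """List every unambiguous codon that an ambiguous codon could stand for."""
    choices = [IUPAC[base] for base in codon.upper()]
    return ["".join(bases) for bases in product(*choices)]


print(expand_codon("GGN"))  # ['GGA', 'GGC', 'GGG', 'GGT']
```

All four expansions of GGN encode glycine in the standard genetic code, so an ambiguous codon does not always produce an ambiguous amino acid.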

EXAMPLE

Here is a session using Translate to translate the G-gamma gene in gamma.seq into the protein sequence for the human fetal beta globin G gamma:

 
 
% translate
 
 TRANSLATE from what sequence ?  gamma.seq
 
                  Begin (* 1 *) ?  2179
                End (* 11375 *) ?  2270
               Reverse (* No *) ?
 
 Range begins ATGGG and ends GGAAG.  Is this correct (* Yes *) ?
 
 That is done, now would you like to:
 
   A) Add another exon from this sequence
   B) Add another exon from a new sequence
 
   C) Translate and then add more genes from this sequence
   D) Translate and then add more genes from a new sequence
 
   W) Translate assembly and write everything into a file
 
 Please choose one (* W *):  a
 
                  Begin (* 1 *) ?  2393
                End (* 11375 *) ?  2615
               Reverse (* No *) ?
 
 Range begins GCTCC and ends TCAAG.  Is this correct (* Yes *) ?
 
 That is done, now would you like to:
 
   A) Add another exon from this sequence
   B) Add another exon from a new sequence
 
   C) Translate and then add more genes from this sequence
   D) Translate and then add more genes from a new sequence
 
   W) Translate assembly and write everything into a file
 
 Please choose one (* W *):  a
 
                  Begin (* 1 *) ?  3502
                End (* 11375 *) ?  3630
               Reverse (* No *) ?
 
 Range begins CTCCT and ends ACTGA.  Is this correct (* Yes *) ?
 
 That is done, now would you like to:
 
   A) Add another exon from this sequence
   B) Add another exon from a new sequence
 
   C) Translate and then add more genes from this sequence
   D) Translate and then add more genes from a new sequence
 
   W) Translate assembly and write everything into a file
 
 Please choose one (* W *):
 
 What should I call the output file (* gamma.pep *) ?  ggamma.pep
 
%

OUTPUT

Here is the output file ggamma.pep:

 
 
!!AA_SEQUENCE 1.0
TRANSLATE of: gamma.seq check: 6474 from: 2179 to: 2270
      and of: gamma.seq check: 6474 from: 2393 to: 2615
      and of: gamma.seq check: 6474 from: 3502 to: 3630
generated symbols 1 to: 148.
 
Human fetal beta globins G and A gamma
from Shen, Slightom and Smithies,  Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.
 
ggamma.pep  Length: 148  October 5, 1998 12:59  Type: P  Check: 6924  ..
 
       1  MGHFTEEDKA TITSLWGKVN VEDAGGETLG RLLVVYPWTQ RFFDSFGNLS
 
      51  SASAIMGNPK VKAHGKKVLT SLGDAIKHLD DLKGTFAQLS ELHCDKLHVD
 
     101  PENFKLLGNV LVTVLAIHFG KEFTPEVQAS WQKMVTGVAS ALSSRYH*

INPUT FILES

Translate accepts multiple (one or more) nucleotide sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*. If Translate rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

Single Sequence Input

If you specify a single sequence on the command line or in response to the first program prompt, and -Default is not on the command line, Translate prompts you for the sequence range and strand. After reading that range, the program lets you add further ranges from the same sequence or from a different one, and lets you choose whether to translate the ranges before or after assembling them.

Use -Default to translate sequences without being prompted, in accordance with any command-line parameters that are present.

Multiple Sequence Input

When you specify multiple sequences, Translate runs noninteractively. By default, Translate will translate each sequence separately and write out each translation to a separate sequence file without prompting you for the range and strand of each sequence. Use -ONEPEPtide to assemble all of the sequences first and then translate them into a single protein sequence.

If you use a list file to specify multiple sequences as input, you can add begin, end, and strand attributes for each sequence. You can use the join sequence attribute to selectively assemble some of the sequence entries in a list file together before translation. All sequences listed contiguously in the list file that share the same join attribute (i.e. share the same sequence name following the join token) are assembled together before translation and the translated sequence is given the name of the join attribute. All other sequences in the list file are translated separately. Here is an example of an input list file, hsp70dna.list, for Translate.

 
 
!!SEQUENCE_LIST 1.0
Example list file of 70kD heat shock coding sequences used as input
for TRANSLATE
  ..
gb_in:tchsp70    Begin:  302 End: 2341
gb_pl1:phhsp70g  Begin:  240 End:  453 Join: hsp70_petunia
gb_pl1:phhsp70g  Begin: 1076 End: 2817 Join: hsp70_petunia
gb_pr:humhsp70   Begin:  228 End:  968

Using this file as input, Translate writes three output files. The first output file contains a translation of the first sequence entry in this list file. The second output file, hsp70_petunia.pep, is a translation of an assembly of the next two sequence entries. The last output file contains a translation of the last sequence entry in this list file. For more information about list files, see "Using List Files" in Section 2, Using Sequence Files and Databases in the User's Guide.
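The grouping rule for join attributes can be sketched in Python. The sketch below is not GCG code: it assumes that entries start after the ".." divider line and that attributes appear as "Name: value" pairs, as in hsp70dna.list above. The function names parse_entries and group_by_join are hypothetical.

```python
import re

LIST_TEXT = """\
!!SEQUENCE_LIST 1.0
Example list file of 70kD heat shock coding sequences used as input
for TRANSLATE
  ..
gb_in:tchsp70    Begin:  302 End: 2341
gb_pl1:phhsp70g  Begin:  240 End:  453 Join: hsp70_petunia
gb_pl1:phhsp70g  Begin: 1076 End: 2817 Join: hsp70_petunia
gb_pr:humhsp70   Begin:  228 End:  968
"""


def parse_entries(text):
    """Read the sequence entries that follow the '..' divider line."""
    lines = text.splitlines()
    first = next(i for i, line in enumerate(lines) if line.strip() == "..") + 1
    entries = []
    for line in lines[first:]:
        fields = line.split(None, 1)
        if not fields:
            continue  # skip blank lines
        rest = fields[1] if len(fields) > 1 else ""
        attrs = {k.lower(): v for k, v in re.findall(r"(\w+):\s*(\S+)", rest)}
        entries.append({"name": fields[0], **attrs})
    return entries


def group_by_join(entries):
    """Merge contiguous entries sharing a join name; all others stay separate."""
    groups = []
    for entry in entries:
        join = entry.get("join")
        if join and groups and groups[-1]["join"] == join:
            groups[-1]["members"].append(entry)
        else:
            groups.append({"join": join, "members": [entry]})
    return groups


groups = group_by_join(parse_entries(LIST_TEXT))
for group in groups:
    total = sum(int(m["end"]) - int(m["begin"]) + 1 for m in group["members"])
    print(group["join"], [m["name"] for m in group["members"]], total, "nt")
```

Run on the example list file, the sketch yields three groups, which corresponds to the three output files described above. Only the joined group has a name taken from the list file (hsp70_petunia).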

RELATED PROGRAMS

SeqManip+, Map, and Map+ are related programs.

RESTRICTIONS

Unknown.

CONSIDERATIONS

Translate allows you to translate sequences where the reading frame is interrupted. This frame interruption commonly occurs in eukaryotic sequences containing introns. In the example above, a single codon is divided by the first intron. To accommodate frame interruption, Translate allows you to specify ranges to translate (exons) whose lengths are not multiples of three. Translate concatenates the nucleotide ranges that you define and translates them only at the moment you choose a menu item that starts with the word Translate.
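The arithmetic behind the split codon can be checked with a few lines of Python. The coordinates are the three ranges entered in the example session; the function name exon_phases is hypothetical.

```python
# The three ranges entered in the example session (1-based, inclusive).
EXONS = [(2179, 2270), (2393, 2615), (3502, 3630)]


def exon_phases(exons):
    """Return (length, leftover) per exon, where leftover is the number of
    bases of an incomplete codon carried past the end of that exon."""
    result = []
    running = 0
    for begin, end in exons:
        length = end - begin + 1
        running += length
        result.append((length, running % 3))
    return result


for (begin, end), (length, leftover) in zip(EXONS, exon_phases(EXONS)):
    print(f"{begin}-{end}: {length} nt, {leftover} base(s) carried over")

total = sum(length for length, _ in exon_phases(EXONS))
print(total, "nt =", total // 3, "codons")  # 444 nt = 148 codons
```

The first range is 92 bases long, which is 30 complete codons plus 2 bases, so the codon that begins there is completed by the first base of the second range. The assembly totals 444 bases, or 148 codons, which agrees with the 148 symbols (147 residues plus the stop symbol) reported in the output file.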

If you continue after translating an assembly, you are in effect building a new assembly (gene) and concatenating the protein sequence from the new gene onto the protein sequence you have already created.
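The difference between adding an exon to an assembly and translating before adding more can be sketched in Python. The codon table below is the standard genetic code. The function names are hypothetical, and two behaviors are assumptions made for the sketch rather than documented Translate behavior: an incomplete codon at the end of an assembly is dropped, and an unrecognized codon is written as X.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons; a trailing partial codon is dropped (assumption)."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def translate_genes(genes) -> str:
    """Each gene is a list of exon strings: exons are joined before translation,
    and the resulting peptides are then concatenated."""
    return "".join(translate("".join(exons)) for exons in genes)


# Two fragments treated as exons of one gene, then as two separate genes.
print(translate_genes([["ATGGC", "TTAA"]]))    # MA*
print(translate_genes([["ATGGC"], ["TTAA"]]))  # ML
```

In the first call the codon GCT spans the two fragments and survives assembly. In the second call each fragment is translated on its own, so the split codon is lost and the second fragment is read in a different frame.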

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % translate [-INfile=]@hsp70dna.list -Default
 
Prompted Parameters:
 
[-OUTfile=]hsp70.pep       names the output file (single sequences only)
 
Local Data Files:
 
-TRANSlate=translate.txt   contains the genetic code
 
Optional Parameters:
 
-BEGin=1 -END=100          sets the range of interest
-REVerse                   specifies the strand for each sequence
-ONEPEPtide                translates all concatenated DNA fragments into
                             a single peptide
-NOJOIN                    ignores all "join" sequence attributes
                             specified in a list file
-LIStfile[=translate.list] writes a list file of output sequence names
-RSF                       specifies RSF format for output file
-OPEn[=20]                 only translates open reading frames [minimum
                              peptide length]
-EXTension=.pep            sets the file name extension for output
                             sequence files
-NOMONitor                 suppresses the screen monitor
-NOSUMmary                 suppresses the summary at the end of the program

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in Appendix VII.
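To show why a nonstandard genetic code needs its own translation table, the Python sketch below applies the well-known vertebrate mitochondrial differences to the standard code and translates the same sequence with both. It works on an in-memory dictionary only. The layout of translate.txt is described in Appendix VII and is not reproduced or assumed here, and the function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Codons whose meaning differs in the vertebrate mitochondrial code.
VERTEBRATE_MITO_CHANGES = {"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def modified_table(base: dict, changes: dict) -> dict:
    """Return a copy of a codon table with selected codons reassigned."""
    table = dict(base)
    table.update(changes)
    return table


def translate(dna: str, table: dict) -> str:
    """Translate complete codons; unknown codons become X (assumption)."""
    dna = dna.upper()
    return "".join(table.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


MITO = modified_table(STANDARD, VERTEBRATE_MITO_CHANGES)
print(translate("ATGTGAATA", STANDARD))  # M*I
print(translate("ATGTGAATA", MITO))      # MWM
```

With the standard table the second codon is read as a stop; with the modified table it is read as tryptophan, so the choice of table changes both the residues and the apparent length of the reading frame.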

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

-BEGin=1

Sets the beginning position for all input sequences. When the beginning position is set from the command line, Translate ignores beginning positions specified for individual sequences in a list file.

-END=100

Sets the ending position for all input sequences. When the ending position is set from the command line, Translate ignores ending positions specified for sequences in a list file.

-REVerse

Sets the program to use the reverse strand for each input sequence. When -REVerse or -NOREVerse is on the command line, Translate ignores any strand designation for individual sequences in a list file.
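Using the reverse strand means reading the reverse complement of the sequence. A minimal Python sketch of that operation follows; it handles the standard IUB-IUPAC ambiguity codes. The function name reverse_complement is hypothetical, and the sketch says nothing about how Translate applies begin and end positions to the reverse strand.

```python
# Complement of each IUB-IUPAC nucleotide code (S, W, and N are their own complements).
COMPLEMENT = str.maketrans("ACGTRYMKSWHBVDN", "TGCAYRKMSWDVBHN")


def reverse_complement(dna: str) -> str:
    """Return the reverse strand of dna, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))  # GCAT
```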

-ONEPEPtide

Concatenates all input sequences together and then translates them all into a single protein sequence.

-NOJOIN

Sets Translate to ignore all join sequence attributes specified in the input list file. All nucleotide sequences specified in the list file are translated into separate output sequence files.

-LIStfile=translate.list

Writes a list file with the names of the output sequence files. This list file is suitable for input to other GCG programs that support list files (see Section 2, Using Sequence Files and Databases in the User's Guide). If you don't specify a file name, Translate makes one up using translate for the file name and .list for the file name extension.

-EXTension=.pep

This program normally creates output file names by using the original input file name for the base name and the program name for the name extension. Use this parameter to specify some other file name extension.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

Printed: May 27, 2005 14:56



Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio