CODONFREQUENCY


 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

FILES USED

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION


CodonFrequency tabulates codon usage from sequences and/or existing codon usage tables. The output file is correctly formatted for input to the CodonPreference, Correspond, and Frames programs.

DESCRIPTION


CodonFrequency counts codons and writes their frequencies into codon frequency tables. It counts the codons from ranges within sequences or existing codon frequency tables. The output table is a file with the sum of all the observations for each of the 64 possible codons. This file is suitable for input to other GCG programs, including BackTranslate, CodonPreference, and Correspond.

CodonFrequency supports the assembly of fragments from circular molecules by letting you define a range in the sequence that extends across the end and into the beginning of a molecule. The terminal bell rings when a circular range is chosen.
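As a rough illustration of a circular range, the sketch below extracts a range that runs off the end of a molecule and continues from position 1. It is not CodonFrequency's own code. The function name extract_range and the convention that a Begin greater than End signals a wrap-around are assumptions made for this example.

```python
def extract_range(seq: str, begin: int, end: int, circular: bool = True) -> str:
    """Return the 1-based inclusive range begin..end of seq.

    Assumption for this sketch: a range whose begin is greater than its end
    runs off the end of a circular molecule and continues from position 1.
    """
    if not (1 <= begin <= len(seq) and 1 <= end <= len(seq)):
        raise ValueError("range lies outside the sequence")
    if begin <= end:
        return seq[begin - 1:end]           # ordinary linear range
    if not circular:
        raise ValueError("begin is past end on a linear molecule")
    return seq[begin - 1:] + seq[:end]      # tail of molecule + head of molecule


plasmid = "GAAACCCGGGAT"                    # made-up 12-base circular molecule
print(extract_range(plasmid, 2, 4))         # AAA
print(extract_range(plasmid, 11, 1))        # ATG, spanning the end and the beginning
```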

To count codons from sequences, specify ranges until you have assembled a sequence you want to count. For each range, CodonFrequency shows you the starting and ending symbols to double check that you have chosen the range and strand accurately.

After choosing each range, you must decide if you would like to add another exon to the gene or count the codons in the gene you have assembled. It is critical that you count the codon frequencies from multi-exon genes after assembling all the ranges since intervening sequences often interrupt such genes within a codon, thus destroying the reading frame.
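The following sketch shows why the order matters. It uses a made-up sequence in which the first exon ends one base into a codon, and compares counting after assembly with counting each exon on its own. The helper names assemble and count_codons are hypothetical and only approximate what the program does.

```python
from collections import Counter


def assemble(seq: str, ranges: list[tuple[int, int]]) -> str:
    """Join 1-based inclusive (begin, end) ranges in the order given."""
    return "".join(seq[b - 1:e] for b, e in ranges)


def count_codons(coding: str) -> Counter:
    """Count non-overlapping triplets, starting at the first base."""
    coding = coding.upper()
    usable = len(coding) - len(coding) % 3      # ignore a trailing partial codon
    return Counter(coding[i:i + 3] for i in range(0, usable, 3))


# Made-up gene: exon 1 (7 bases) ends after the first base of a codon.
genomic = "ATGGCTG" + "gtaagtttcag" + "GTCCTTAA"
exons = [(1, 7), (19, 26)]

# Correct: assemble all ranges first, then count.
assembled = count_codons(assemble(genomic, exons))
print(sorted(assembled))                        # ['ATG', 'CCT', 'GCT', 'GGT', 'TAA']

# Incorrect: counting each exon separately restarts the frame in exon 2.
per_exon = sum((count_codons(assemble(genomic, [r])) for r in exons), Counter())
print(sorted(per_exon))                         # ['ATG', 'CTT', 'GCT', 'GTC']
```

The codon GGT, which is split by the intervening sequence, appears only in the assembled count.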

After CodonFrequency counts all the codons in your gene, you may specify another gene from the current sequence file or get other sequence files or codon frequency tables.

You can specify multiple sequences (such as a list file or a sequence specification using an asterisk (*) wildcard) to count codons from more than one sequence at a time. By default, CodonFrequency treats each sequence in a multiple sequence specification as a separate gene and counts it separately. If you add the -ONEPEPtide command-line parameter, all sequences in a multiple sequence specification are concatenated into a single sequence before the codons are counted. If you use a list file to specify multiple sequences, you can add Begin, End, and Strand sequence attributes to specify the range and strand for each sequence. For more information about list files, see "Using List Files" in Section 2, Using Sequence Files and Databases in the User's Guide.

After each sequence range is counted or each new codon frequency table is read, CodonFrequency asks if you want to write the data to a file. If you choose to do so, the program writes a file with the number of observations for each codon. In addition, CodonFrequency normalizes the codon observations to a frequency per thousand and to a fraction for each codon within its synonymous family.
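The two normalizations can be sketched as follows, using the glycine counts from the sample table in the OUTPUT topic. The total of 148 codons is derived from the three ranges in the EXAMPLE session (92 + 223 + 129 = 444 bases). The function name normalize_family is hypothetical, and the program's own rounding may differ from Python's.

```python
def normalize_family(family_counts: dict[str, int], total_codons: int) -> dict:
    """Map each codon to (frequency per thousand, fraction within its family)."""
    family_total = sum(family_counts.values())
    return {
        codon: (
            1000 * n / total_codons,                     # per thousand of all codons
            n / family_total if family_total else 0.0,   # share of the synonymous family
        )
        for codon, n in family_counts.items()
    }


gly = {"GGG": 0, "GGA": 6, "GGT": 1, "GGC": 6}   # counts from the sample output
total_codons = (92 + 223 + 129) // 3             # 148 codons in the assembled gene

for codon, (per_thousand, fraction) in normalize_family(gly, total_codons).items():
    print(f"Gly {codon} {gly[codon]:6.2f} {per_thousand:8.2f} {fraction:6.2f}")
```

For GGA this gives 40.54 per thousand and a fraction of 0.46, matching the sample table.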

EXAMPLE


Here is a session using CodonFrequency to generate a codon frequency table for the human fetal beta globin gene G-Gamma:

 
 
% codonfrequency
 
 You can add codon frequencies from either:
 
        E)xisting codon usage files
        S)equence files
 
 Please select one (* S *):
 
 Count codons from what sequence ?  gamma.seq
 
                  Begin (* 1 *) ?  2179
                End (* 11375 *) ?  2270
               Reverse (* No *) ?
 
 That begins ATGGG and ends GGAAG.  Is this correct (* Yes *) ?
 
 Get another exon from this gene (* No *) ?  y
 
                  Begin (* 1 *) ?  2393
                End (* 11375 *) ?  2615
               Reverse (* No *) ?
 
 That begins GCTCC and ends TCAAG.  Is this correct (* Yes *) ?
 
 Get another exon from this gene (* No *) ?  y
 
                  Begin (* 1 *) ?  3502
                End (* 11375 *) ?  3630
               Reverse (* No *) ?
 
 That begins CTCCT and ends ACTGA.  Is this correct (* Yes *) ?
 
 Get another exon from this gene (* No *) ?
 
 That's done, now would you like to:
 
     1) Get a new sequence input file
     2) Get a new codon table input file
     3) Specify another gene from this sequence file
 
     W)rite the frequencies to your output file
 
 Please choose one (* W *):
 
 What should I call the output file (* gamma.cod *) ?  ggammacod.cod
 
%

OUTPUT


Note that the multi-exon G-Gamma gene is assembled from three exons before the codons are counted! Here is part of the output file:

 
 
!!CODON_FREQUENCY 1.0
 
 CODONFREQUENCY  October 13, 1998 15:56
 
From          : gamma.seq  check: 6474  from: 2179  to: 2270
 continuing on: gamma.seq  check: 6474  from: 2393  to: 2615
 continuing on: gamma.seq  check: 6474  from: 3502  to: 3630
 
Human fetal beta globins G and A gamma
from Shen, Slightom and Smithies,  Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.
 
AmAcid  Codon      Number    /1000     Fraction   ..
 
Gly     GGG         0.00      0.00      0.00
Gly     GGA         6.00     40.54      0.46
Gly     GGT         1.00      6.76      0.08
Gly     GGC         6.00     40.54      0.46
 
Glu     GAG         4.00     27.03      0.50
Glu     GAA         4.00     27.03      0.50
Asp     GAT         5.00     33.78      0.62
Asp     GAC         3.00     20.27      0.38
 
////////////////////////////////////////////
 
Leu     CTG        12.00     81.08      0.71
Leu     CTA         0.00      0.00      0.00
Leu     CTT         0.00      0.00      0.00
Leu     CTC         3.00     20.27      0.18
 
Pro     CCG         0.00      0.00      0.00
Pro     CCA         1.00      6.76      0.25
Pro     CCT         2.00     13.51      0.50
Pro     CCC         1.00      6.76      0.25

INPUT FILES


CodonFrequency accepts as input existing codon usage files created either by a previous CodonFrequency session or by hand with a text editor (see the FILES USED topic for the format of this type of file). On the command line you can specify only a single file, with a parameter such as -CODonfile=ecohigh.cod. When you run the program interactively, you can use more than one file, but you must enter the file names one at a time when the program prompts for them.

CodonFrequency can also accept a single or multiple nucleotide sequence specification. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*. If CodonFrequency rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS


BackTranslate, CodonPreference, and Correspond require codon frequency tables, such as those written by CodonFrequency, as input.

RESTRICTIONS


No specific restrictions are known. CodonFrequency reads the third column of data in existing codon usage files. This column should contain raw observation counts, not normalized data (e.g., percentages), if you plan to add it to data that has not been normalized. For the file structure, see the example output above and the FILES USED topic below.
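A small sketch of why the third column must hold raw counts: summing two tables only makes sense when both use the same unit. The tables and the helper add_tables below are made up for illustration and do not reproduce the program's internals.

```python
from collections import Counter


def add_tables(*tables: dict[str, float]) -> Counter:
    """Sum the third-column values of several codon tables, codon by codon."""
    total: Counter = Counter()
    for table in tables:
        total.update(table)          # Counter.update adds values from a mapping
    return total


gene_a = {"GGA": 6, "GGC": 6, "GGT": 1}    # raw observation counts
gene_b = {"GGA": 2, "GGC": 10}             # raw observation counts
merged = add_tables(gene_a, gene_b)
print(merged["GGA"], merged["GGC"], merged["GGT"])   # 8 16 1

# If gene_b held percentages instead (hypothetical values), the sum would mix
# units and the resulting numbers would no longer be observation counts.
gene_b_percent = {"GGA": 16.7, "GGC": 83.3}
mixed = add_tables(gene_a, gene_b_percent)
```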

FILES USED


Existing codon usage files may be written by CodonFrequency or generated by hand. If you write a codon table yourself, it should be documented with text followed by a line with two adjacent periods. Below this heading and dividing line, write the data with the first three columns of information as in the output file shown above. The lines can be in any order and only codons whose use is greater than zero need be present. The spacing of the columns is not significant and blank lines are allowed. After creating a codon table with the first three columns of information, you should generate the complete codon usage table (five columns of information) by using the table you created as the input to CodonFrequency.
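The format rules above can be sketched as a small reader. This is an illustration only, not the parser CodonFrequency uses. The function name parse_codon_table is hypothetical, and the sketch assumes that the dividing line is the first line containing two adjacent periods and that any line with fewer than three fields can be skipped.

```python
def parse_codon_table(text: str) -> dict[str, float]:
    """Return {codon: number} from the first three columns below the '..' line."""
    counts: dict[str, float] = {}
    in_data = False
    for line in text.splitlines():
        if not in_data:
            in_data = ".." in line       # documentation ends at the dividing line
            continue
        fields = line.split()            # column spacing is not significant
        if len(fields) < 3:              # blank line or non-data line
            continue
        _amino_acid, codon, number = fields[:3]
        codon = codon.upper()
        counts[codon] = counts.get(codon, 0.0) + float(number)
    return counts


sample = """\
Hand-made table for a hypothetical gene

AmAcid  Codon  Number  ..

Pro  CCT  2
Gly  GGA  6
Gly GGC     6.00
"""
print(parse_codon_table(sample))   # {'CCT': 2.0, 'GGA': 6.0, 'GGC': 6.0}
```

Codons that are absent from the file are simply missing from the result, which matches the rule that only codons with a use greater than zero need be present.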

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % codonfrequency [-INfile=]@hsp70dna.list -Default
 
Prompted Parameters:
 
[-OUTfile1=]hsp70.cod     specifies the output file name
 
Local Data Files:
 
-TRANSlate=translate.txt  contains the genetic code
 
Optional Parameters:
 
-CODonfile=ecohigh.cod  specifies the input file of codon frequencies
-BEGin=1 -END=100       sets the range of interest for each sequence
                          (non-interactive mode only)
-REVerse                specifies the strand for each sequence
                          (non-interactive mode only)
-ONEPEPtide             concatenates all DNA fragments in a multiple
                          sequence specification before processing
-CONtinue               makes CODONFREQUENCY continue after you write
                          out the file
-NOMONitor              suppresses screen trace for each input sequence

LOCAL DATA FILES


The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in Appendix VII.

PARAMETER REFERENCE


You can set the parameters listed below from the command line.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

-CODonfile=ecohigh.cod

Specifies an existing codon frequency file to be used as input to CodonFrequency.

-BEGin=1

Sets the beginning position for all input sequences. When the beginning position is set from the command line, CodonFrequency ignores beginning positions specified for individual sequences in a list file. CodonFrequency recognizes -BEGin only when more than one input sequence is specified or when -Default is on the command line.

-END=100

Sets the ending position for all input sequences. When the ending position is set from the command line, CodonFrequency ignores ending positions specified for sequences in a list file. CodonFrequency recognizes -END only when more than one input sequence is specified or when -Default is on the command line.

-REVerse

Sets the program to use the reverse strand for each input sequence. When -REVerse or -NOREVerse is on the command line, CodonFrequency ignores any strand designation for individual sequences in a list file. CodonFrequency recognizes -REVerse and -NOREVerse only when more than one input sequence is specified or when -Default is on the command line.
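As a rough sketch of what reading the reverse strand involves, the example below takes a range on the forward strand and returns its reverse complement. It assumes that Begin and End are given as forward-strand positions and that only the bases A, C, G, and T occur. The function name reverse_strand is hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_strand(seq: str, begin: int, end: int) -> str:
    """Return the reverse complement of the 1-based inclusive range begin..end."""
    forward = seq[begin - 1:end]
    return forward.translate(COMPLEMENT)[::-1]


fragment = "TTGGCCATAA"                  # made-up forward strand
print(reverse_strand(fragment, 3, 8))    # ATGGCC
```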

-ONEPEPtide

Concatenates all input sequences in a multiple sequence specification into a single sequence before processing.

-CONtinue

Causes CodonFrequency to loop back to the beginning of the program after you write the output file. If you leave the file specification blank when the program loops back to the top, CodonFrequency stops.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

Printed: May 27, 2005  11:53 




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio