COMPTABLE

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

CONSIDERATIONS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

CompTable creates a scoring matrix using equivalences defined in a simplification scheme such as the one used for Simplify. (See Section 4, Using Data Files, in the User's Guide for more information.)

DESCRIPTION

Scientists comparing protein sequences sometimes want to consider similar amino acids as equivalent. Sequence simplification can be done either by changing the symbols in the sequences being compared (see Simplify) or, for programs that use scoring matrices, by creating a table that scores matches between the symbols you consider to be equivalent.
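The two approaches described above can be compared in a short Python sketch. This is an illustration only, not part of CompTable or Simplify: the function names simplify and equivalent and the two test sequences are hypothetical, and the groups are copied from the standard simplify.txt file shown under INPUT FILES.

```python
# Illustrative sketch only; helper names and sequences are hypothetical.
# Groups copied from the standard simplify.txt (replacement symbol: members).
GROUPS = {
    "A": "PAGST",
    "D": "QNEDBZ",
    "H": "HKR",
    "I": "LIVM",
    "F": "FYW",
    "C": "C",
}

# Map every member symbol to the symbol that replaces it.
REPRESENTATIVE = {m: rep for rep, members in GROUPS.items() for m in members}


def simplify(seq):
    """Approach 1: rewrite the sequence with the replacement symbols."""
    return "".join(REPRESENTATIVE.get(s, s) for s in seq.upper())


def equivalent(a, b):
    """Approach 2: keep the symbols and treat members of one group as a match."""
    return REPRESENTATIVE.get(a, a) == REPRESENTATIVE.get(b, b)


seq1 = "PKLFDC"
seq2 = "GRVWKC"

# Count matching positions both ways; the two counts should agree.
matches_by_rewriting = sum(x == y for x, y in zip(simplify(seq1), simplify(seq2)))
matches_by_table = sum(equivalent(x, y) for x, y in zip(seq1, seq2))

print(simplify(seq1), simplify(seq2))
print(matches_by_rewriting, matches_by_table)
```

Both counts come out the same because a table that scores equivalent symbols as matches encodes the same information as rewriting the sequences, without altering the sequence data.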

EXAMPLE

Here is a session using CompTable to make a scoring matrix with the standard simplification file used by Simplify (you can use Fetch to make a copy of simplify.txt and modify it to create the input file for CompTable):

 
% comptable
 
 COMPTABLE from what simplification file ?  simplify.txt
 
 What is the comparison match value (* 10 *) ?
 
 What is the comparison mismatch value (* -2 *) ?  0
 
 Are you creating a protein scoring matrix (* Yes *) ?
 
 What should I call the output file (* simplify.cmp *) ?
 
%

OUTPUT

Here is part of the output scoring matrix file:

 
!!AA_SCORING_MATRIX_RECT 1.0
 COMPTABLE of: simplify.txt  FileCheck: 327
 
A standard simplification used by SIMPLIFY and WORDSEARCH to simplify
peptide sequences.  The first line below means "for all of the P, A, G,
S, or T characters in the sequence, substitute A." The program COMPTABLE
can construct a symbol comparison table with the equivalences from this
file.
 
                     August 18, 1998 12:19   ..
 
{
GAP_CREATE 20
GAP_EXTEND 1
}
 
      A    B    C    D    E     F    G    H    I    J     K    L  ...  ..
A    10    0    0    0    0     0   10    0    0    0     0    0  ...
B     0   10    0   10   10     0    0    0    0    0     0    0  ...
C     0    0   10    0    0     0    0    0    0    0     0    0  ...
D     0   10    0   10   10     0    0    0    0    0     0    0  ...
E     0   10    0   10   10     0    0    0    0    0     0    0  ...

See Appendix VII for more information about scoring matrices.
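The rows shown in the excerpt can be reproduced with a short Python sketch. It is not CompTable's actual algorithm, only one plausible reading of the output: two symbols score the match value (10) when they belong to the same simplification group, and the mismatch value (0) otherwise. The function name build_matrix is hypothetical. The sketch also assumes that a symbol appearing in no group matches only itself and that the alphabet is A through Z; the excerpt does not confirm either point.

```python
import string

# Groups from the standard simplify.txt; scores from the example session.
GROUPS = ["PAGST", "QNEDBZ", "HKR", "LIVM", "FYW", "C"]
MATCH, MISMATCH = 10, 0


def build_matrix(groups, match, mismatch, alphabet=string.ascii_uppercase):
    """Return {(row, col): score} for every pair of symbols in the alphabet."""
    group_of = {s: i for i, members in enumerate(groups) for s in members}
    matrix = {}
    for row in alphabet:
        for col in alphabet:
            same_group = row in group_of and group_of.get(row) == group_of.get(col)
            # Assumption: ungrouped symbols still match themselves.
            matrix[row, col] = match if (row == col or same_group) else mismatch
    return matrix


matrix = build_matrix(GROUPS, MATCH, MISMATCH)

# Print the same rows and columns that appear in the excerpt above.
columns = "ABCDEFGHIJKL"
for row in "ABCDE":
    print(row, " ".join(f"{matrix[row, col]:4d}" for col in columns))
```

Under these assumptions the printed values agree with the five rows of the excerpt, for example A against G scores 10 because both belong to the PAGST group.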

INPUT FILES

CompTable accepts a simplification table file as input. Here is the input file for the example above:

 
!!SIMPLIFY 1.0
A standard simplification used by SIMPLIFY and WORDSEARCH to simplify
peptide sequences.  The first line below means "for all of the P, A, G,
S, or T characters in the sequence, substitute A." The program COMPTABLE
can construct a symbol comparison table with the equivalences from this
file.
 
10/7/84 ..
 
A PAGST
D QNEDBZ
H HKR
I LIVM
F FYW
C C
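If you want to inspect or check a simplification table before giving it to CompTable, the following Python sketch reads the file layout shown above. It rests on two assumptions drawn from the example rather than from a format specification: the documentation ends at the line that finishes with two periods (..), and each data line holds a replacement symbol followed by the symbols it replaces. The function name parse_simplification is hypothetical.

```python
# Text of the example input file, embedded so the sketch is self-contained.
SIMPLIFY_TXT = """\
!!SIMPLIFY 1.0
A standard simplification used by SIMPLIFY and WORDSEARCH to simplify
peptide sequences.  The first line below means "for all of the P, A, G,
S, or T characters in the sequence, substitute A." The program COMPTABLE
can construct a symbol comparison table with the equivalences from this
file.

10/7/84 ..

A PAGST
D QNEDBZ
H HKR
I LIVM
F FYW
C C
"""


def parse_simplification(text):
    """Return {replacement symbol: member symbols} from a simplification table."""
    groups = {}
    in_data = False
    for line in text.splitlines():
        if not in_data:
            # Assumption: everything up to the line ending in ".." is documentation.
            in_data = line.rstrip().endswith("..")
            continue
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            replacement, members = fields
            groups[replacement] = members
    return groups


groups = parse_simplification(SIMPLIFY_TXT)
for replacement, members in groups.items():
    print(replacement, members)
```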

RELATED PROGRAMS

Simplify rewrites a sequence file by applying the substitutions defined in a simplification table.

CONSIDERATIONS

CompTable calculates default gap creation and extension penalties that are appropriate for the type of scoring matrix you are creating (protein or nucleotide) and for the comparison match and mismatch values that you specify. It writes these penalties in the auxiliary data block of the output scoring matrix file. If you don't want to accept the default values, use -GAPweight and -LENgthweight to specify alternative gap penalties.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

COMPTABLE does not support complete command-line control.
 
Required Parameters:
 
-PROtein or -NUCleotide  specifies the type of the scoring matrix
 
Local Data Files:        None
 
Optional Parameters:
 
-GAPweight=50            sets the gap creation penalty
-LENgthweight=3          sets the gap extension penalty

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-PROtein or -NUCleotide


Specifies the type of scoring matrix that will be created.

-GAPweight


Specifies the default gap creation penalty associated with the scoring matrix. This penalty is written in the auxiliary data block in the output scoring matrix file. If you don't specify a default gap creation penalty with -GAPweight, the program calculates a reasonable default and writes it in the auxiliary data block. (See "Auxiliary Data Block: Setting Gap Creation and Extension Penalties" in Appendix VII for information about the auxiliary data block in scoring matrix files.)

-LENgthweight


Specifies the default gap extension penalty associated with the scoring matrix. This penalty is written in the auxiliary data block in the output scoring matrix file. If you don't specify a default gap extension penalty with -LENgthweight, the program calculates a reasonable default and writes it in the auxiliary data block. (See "Auxiliary Data Block: Setting Gap Creation and Extension Penalties" in Appendix VII for information about the auxiliary data block in scoring matrix files.)

Printed: May 27, 2005  11:58



Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio