SIMPLIFY


 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

INPUT FILES

RELATED PROGRAMS

SIMPLIFICATION FILE

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION


Simplify lets you reduce the number of symbols in a sequence. Such a simplification would allow you, for instance, to treat all hydrophobic amino acids as equivalent.

DESCRIPTION


Scientists searching for the basic design features of protein sequences believe that some amino acids may be functionally similar enough to be substituted for one another without radically changing the function of the protein. It can therefore be useful to treat such amino acids as equivalent when comparing peptide sequences. The simplifications below come from Dr. Miguel A. Jimenez-Montano, who worked with Dr. Hugo Martinez at the University of California, San Francisco, and is now at the Universidad de las Americas-Puebla (Mexico). You can define your own simplification by changing the local data file simplify.txt. The default simplifications in the public data file are:

 
 
           A  =  P,A,G,S,T    (neutral, weakly hydrophobic)
           D  =  Q,N,E,D,B,Z  (hydrophilic, acid, amide)
           H  =  H,K,R        (hydrophilic, basic)
           I  =  L,I,V,M      (hydrophobic)
           F  =  F,Y,W        (hydrophobic, aromatic)
           C  =  C            (cross-link forming)
           All other characters are unchanged.
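To make the substitution concrete, here is a minimal Python sketch of the default table. It is not Simplify's own code; the names `DEFAULT_CLASSES` and `simplify` are hypothetical, and the sketch assumes uppercase one-letter codes (lowercase letters, like any other unlisted character, pass through unchanged).

```python
# Hypothetical sketch of the default simplification (not GCG source code).
# Each key is the symbol written to the output; the value lists the symbols it replaces.
DEFAULT_CLASSES = {
    "A": "PAGST",   # neutral, weakly hydrophobic
    "D": "QNEDBZ",  # hydrophilic: acidic residues and their amides
    "H": "HKR",     # hydrophilic, basic
    "I": "LIVM",    # hydrophobic
    "F": "FYW",     # hydrophobic, aromatic
    "C": "C",       # cross-link forming
}

def simplify(seq, classes=DEFAULT_CLASSES):
    """Replace every member of a class with that class's representative symbol."""
    table = {member: rep for rep, members in classes.items() for member in members}
    # str.translate leaves characters that appear in no class unchanged.
    return seq.translate(str.maketrans(table))

print(simplify("MKTAYIAKQR"))  # IHAAFIAHDH
```

Using a translation table keeps the rule "all other characters are unchanged" automatic: only symbols listed in some class are rewritten.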

The simplify.txt file in the public data directory is only appropriate for simplifying peptide sequences. You must create your own simplify.txt file to define equivalences for nucleic acid simplifications.

EXAMPLE


Here is a session using Simplify to make a simplification of gzeinaa.pep:

 
 
% simplify
 
  SIMPLIFY what sequence(s) ?  gzeinaa.pep
 
               Begin (* 1 *) ?  18
             End (*   285 *) ?  243
 
  What should I call the output file (* gzeinaa.sim *) ?
 
%
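The Begin and End prompts above appear to select a range counted from 1 with both ends included (the defaults are 1 and 285, which suggests 285 is the sequence length). Under that assumption, this Python sketch shows which residues the session keeps; `select_range` is a hypothetical helper, not part of GCG.

```python
def select_range(seq, begin, end):
    """Return residues begin..end, numbered from 1, both ends included."""
    if not 1 <= begin <= end <= len(seq):
        raise ValueError(f"range {begin}..{end} is outside 1..{len(seq)}")
    return seq[begin - 1:end]

# Positions 18 through 243 of a 285-residue sequence give 243 - 18 + 1 = 226 residues.
print(len(select_range("A" * 285, 18, 243)))  # 226
```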

INPUT FILES


Simplify accepts a single sequence or multiple sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*. The function of Simplify depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
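As an illustration of the type check, here is a Python sketch that reads the Type: field from the last heading line. The helper name `sequence_type` and the sample heading lines are hypothetical; only the `Type: N` / `Type: P` convention comes from this manual.

```python
import re

def sequence_type(heading_lines):
    """Return "N", "P", or None, based on the last line of the text heading."""
    if not heading_lines:
        return None
    match = re.search(r"Type:\s*([NP])\b", heading_lines[-1])
    return match.group(1) if match else None

# Hypothetical heading; only the Type: field on the last line matters here.
heading = ["Some description of the sequence", "example.pep  Type: P  .."]
print(sequence_type(heading))  # P
```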

RELATED PROGRAMS


CompTable writes a scoring matrix based on the simplifications from a simplification file like simplify.txt. You can assign match and mismatch values.
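The idea behind such a table can be sketched in a few lines of Python: two symbols score the match value when they fall in the same equivalence class and the mismatch value otherwise. This is a hypothetical illustration of the principle, not CompTable's algorithm or output format; `comparison_table` and its default scores are assumptions.

```python
def comparison_table(classes, match=1.0, mismatch=0.0):
    """Score symbol pairs: `match` within an equivalence class, `mismatch` across classes."""
    rep_of = {member: rep for rep, members in classes.items() for member in members}
    symbols = sorted(rep_of)
    return {
        (a, b): match if rep_of[a] == rep_of[b] else mismatch
        for a in symbols
        for b in symbols
    }

table = comparison_table({"A": "PAGST", "H": "HKR"}, match=5, mismatch=-1)
print(table[("P", "S")], table[("P", "K")])  # 5 -1
```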

SIMPLIFICATION FILE


You can use Fetch to make a copy of simplify.txt in your own directory, and then modify it with an editor to suit your own needs. Here is the default version:

 
 
!!SIMPLIFY 1.0
A standard simplification used by SIMPLIFY and WORDSEARCH to simplify
peptide sequences.  The first line below means "for all of the P, A, G,
S, or T characters in the sequence, substitute A." The program COMPTABLE
can construct a symbol comparison table with the equivalences from this
file.
 
10/7/84 ..
 
A PAGST
D QNEDBZ
H HKR
I LIVM
F FYW
C C

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % simplify [-INfile=]ggamma.pep -Default
 
Prompted Parameters: (for a single sequence)
 
-BEGin=1 -END=444          sets the range of interest
[-OUTfile=]ggamma.sim      names the output file
 
Local Data Files:
 
-DATa=simplify.txt         specifies a file of equivalences
 
Optional Parameters:
 
-EXTension=.sim            sets the default output file name extension
-LIStfile[=simplify.list]  writes a list file of output sequence names
-NOMONitor                 suppresses the screen trace
 
The default simplification is as follows:
 
           A  =  P,A,G,S,T    (neutral, weakly hydrophobic)
           D  =  Q,N,E,D,B,Z  (hydrophilic, acid, amide)
           H  =  H,K,R        (hydrophilic, basic)
           I  =  L,I,V,M      (hydrophobic)
           F  =  F,Y,W        (hydrophobic, aromatic)
           C  =  C            (cross-link forming)
           All other characters are unchanged.
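The abbreviation rule described at the start of this summary can be expressed as a short check: what you type must include at least the capitalized letters of the parameter name and must otherwise be a prefix of the full name, ignoring case. This Python sketch assumes that longer prefixes are accepted (as with -Default in the minimal syntax above) and that any =value part is ignored; `required_prefix` and `matches_parameter` are hypothetical names.

```python
def required_prefix(name):
    """The leading capitalized letters of a parameter name, e.g. "BEG" for "BEGin"."""
    count = 0
    while count < len(name) and name[count].isupper():
        count += 1
    return name[:count]

def matches_parameter(typed, name):
    """True if `typed` (e.g. "-beg=18") is an accepted abbreviation of `name`."""
    key = typed.lstrip("-").split("=", 1)[0].upper()
    return len(key) >= len(required_prefix(name)) and name.upper().startswith(key)

print(matches_parameter("-beg=18", "BEGin"))     # True
print(matches_parameter("-be=18", "BEGin"))      # False: shorter than "BEG"
print(matches_parameter("-nomon", "NOMONitor"))  # True
```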

LOCAL DATA FILES


The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information, see Section 4, Using Data Files, in the User's Guide.
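The lookup order can be sketched in Python as follows. The helper `resolve_data_file` and its arguments are hypothetical, and the sketch assumes that a file named on the command line takes precedence over a same-named file in the working directory, which the text above does not state explicitly.

```python
from pathlib import Path

def resolve_data_file(name, public_dir, cli_override=None, cwd="."):
    """Pick the data file to read, using the assumed precedence described above."""
    if cli_override:                # a file named with -DATa on the command line
        return Path(cli_override)
    local = Path(cwd) / name
    if local.is_file():             # same name in the current working directory
        return local
    return Path(public_dir) / name  # otherwise fall back to the public copy
```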

Simplify reads the file simplify.txt to find the equivalences to apply. In each equivalence row, the first letter is the symbol that is substituted for every letter listed in the rest of the row.
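A minimal Python sketch of reading this format follows. It assumes, based on the default file shown under SIMPLIFICATION FILE, that the free-text header ends at the first line ending in two periods (..) and that each later non-blank row holds the substituted letter followed by the letters it replaces; `parse_simplify_file` is a hypothetical name.

```python
def parse_simplify_file(text):
    """Return {substituted_letter: letters_it_replaces} from simplify.txt-style text."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.rstrip().endswith(".."):  # assumed end-of-header marker
            body = lines[index + 1:]
            break
    else:
        raise ValueError('no ".." line ends the header')
    classes = {}
    for line in body:
        fields = line.split()
        if len(fields) >= 2:              # skip blank or incomplete rows
            classes[fields[0]] = "".join(fields[1:])
    return classes

sample = """!!SIMPLIFY 1.0
Header text.
10/7/84 ..
A PAGST
I LIVM
C C
"""
print(parse_simplify_file(sample))  # {'A': 'PAGST', 'I': 'LIVM', 'C': 'C'}
```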


PARAMETER REFERENCE


You can set the parameters listed below from the command line.

-EXTension=.sim

Sets the default output file name extension.

-LIStfile=simplify.list

Writes a list file with the names of the output sequence files. This list file is suitable for input to other Accelrys GCG (GCG) programs that support list files (see Section 2, Using Sequence Files and Databases, in the User's Guide). If you don't specify a file name, Simplify makes one up, using simplify for the file name and .list for the file name extension.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

Printed: May 27, 2005 14:43 




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio