PEPTIDESORT

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

CONSIDERATIONS

RESTRICTIONS

SELECTING ENZYMES

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

PeptideSort shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by position, putative molecular weight, and relative HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein.

DESCRIPTION

PeptideSort cuts a peptide sequence with any or all of the proteolytic enzymes and reagents listed in the public or local data file proenzall.dat. The peptides from each digest are sorted by position, weight, and retention time in a high-pressure liquid chromatograph at pH 2.1. For each peptide in each sorting, the following data are displayed: beginning and ending positions, molecular weight, HPLC retention at pH 2.1, HPLC retention at pH 7.4, charge, number of aromatic residues, number of acidic residues, number of basic residues, number of residues containing sulfur, number of hydrophilic residues, and number of hydrophobic residues. The content, isoelectric point, and molar extinction coefficient at 280 nm of each peptide are shown with the table of peptides sorted by position. The content can be displayed in the order of expected elution from an amino acid analyzer.
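
The three sort orders described above can be illustrated with a minimal Python sketch. It is not PeptideSort's own code; the `Peptide` type and `sort_digest` function are hypothetical names, and the rows are copied from the tryptic digest of gzeinaa.pep shown in the OUTPUT section of this manual.

```python
from typing import NamedTuple


class Peptide(NamedTuple):
    pos: int        # peptide number in sequence order
    start: int      # first residue of the peptide
    end: int        # last residue of the peptide
    mol_wt: float   # molecular weight
    ret_2_1: float  # estimated HPLC retention at pH 2.1
    ret_7_4: float  # estimated HPLC retention at pH 7.4


# Rows copied from the example output (tryptic digest of gzeinaa.pep).
PEPTIDES = [
    Peptide(1, 18, 65, 4991.8, 173.4, 167.6),
    Peptide(2, 66, 103, 4117.9, 160.0, 146.2),
    Peptide(3, 104, 156, 5919.7, 153.6, 115.4),
    Peptide(4, 157, 243, 9608.0, 364.4, 291.6),
]


def sort_digest(peptides, use_ph_7_4=False):
    """Return the peptides ordered by position, weight, and retention."""
    if use_ph_7_4:
        retention_key = lambda p: p.ret_7_4
    else:
        retention_key = lambda p: p.ret_2_1
    return {
        "position": sorted(peptides, key=lambda p: p.start),
        "weight": sorted(peptides, key=lambda p: p.mol_wt),
        "retention": sorted(peptides, key=retention_key),
    }


# Peptide numbers in each ordering.
orders = {name: [p.pos for p in rows]
          for name, rows in sort_digest(PEPTIDES).items()}
print(orders)
```

The `use_ph_7_4` argument mirrors the documented -7 parameter, which sorts on retention at pH 7.4 instead of pH 2.1.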

EXAMPLE

Here is a session using PeptideSort to sort the tryptic peptides from the corn storage protein sequence in the file gzeinaa.pep:

 
 
% peptidesort
 
  PEPTIDESORT of what sequence ?  gzeinaa.pep
 
              Begin (* 1 *) ?  18
            End (*   283 *) ?  243
 
 Select the enzymes:  Type nothing or "*" to get all enzymes. Type "?"
 for help on which enzymes are available and how to select them.
 
 
                             Enzyme (* * *):  trypsin
 
  Trypsin
 
  "TRYPSIN" selected 1 enzyme, new total: 1.  Enzyme:
 
  What should I call the output file (* gzeinaa.pepsort *) ?
 
%

OUTPUT

Here is the output file:

 
 
 PEPTIDESORT of: gzeinaa.pep  check: 2106  from: 18  to: 243
 
Corn Storage Protein Am. Ac. (19,000, genomic)
extracted from GZEIN.SEQ, checksum 2842, row a
 
 With Enzymes: TRYPSIN
 
                         October 5, 1998 10:42  ..
 
               Digest with: Trypsin.  Peptides Sorted by Position
 
 Pos  From     To   Mol Wt  Ret2.1  Ret7.4    Chg  Aro Acid Base Sulf Phil Phob
   1    18 -   65   4991.8   173.4   167.6    0.0    3    1    1    4   21   27
   A7,C2,E1,F1,G1,I3,L8,M2,N1,P7,Q2,R1,S8,T1,V1,Y2 Iso=6.11 Ext=2800
   2    66 -  103   4117.9   160.0   146.2    1.0    1    0    1    0   16   22
   A5,F1,G1,H1,I4,L10,N1,P3,Q7,R1,S3,V1 Iso=10.53 Ext=0
   3   104 -  156   5919.7   153.6   115.4    1.0    6    0    1    0   24   29
   A11,F3,L11,N3,P3,Q14,R1,S3,V1,Y3 Iso=9.50 Ext=3840
   4   157 -  243   9608.0   364.4   291.6   -1.0   11    1    0    0   38   49
   A12,D1,F8,G3,H2,I2,L18,N4,P8,Q16,S3,T4,V3,Y3 Iso=6.50 Ext=3840
 
               Digest with: Trypsin.  Peptides Sorted by Weight
 
 Pos  From     To   Mol Wt  Ret2.1  Ret7.4    Chg  Aro Acid Base Sulf Phil Phob
   2    66 -  103   4117.9   160.0   146.2    1.0    1    0    1    0   16   22
   1    18 -   65   4991.8   173.4   167.6    0.0    3    1    1    4   21   27
   3   104 -  156   5919.7   153.6   115.4    1.0    6    0    1    0   24   29
   4   157 -  243   9608.0   364.4   291.6   -1.0   11    1    0    0   38   49
 
               Digest with: Trypsin.  Peptides Sorted by Retention
 
 Pos  From     To   Mol Wt  Ret2.1  Ret7.4    Chg  Aro Acid Base Sulf Phil Phob
   3   104 -  156   5919.7   153.6   115.4    1.0    6    0    1    0   24   29
   2    66 -  103   4117.9   160.0   146.2    1.0    1    0    1    0   16   22
   1    18 -   65   4991.8   173.4   167.6    0.0    3    1    1    4   21   27
   4   157 -  243   9608.0   364.4   291.6   -1.0   11    1    0    0   38   49
 
     Summary for whole sequence:
 
Molecular weight =   24583.35     Residues =    226
Average Residue Weight = 108.776     Charged =    1
Isoelectric point =  8.12
Extinction coefficient =  10360
 
Residue           Number      Mole Percent     ..
 
A = Ala            35           15.487
B = Asx             0            0.000
C = Cys             2            0.885
D = Asp             1            0.442
E = Glu             1            0.442
F = Phe            13            5.752
G = Gly             5            2.212
H = His             3            1.327
I = Ile             9            3.982
K = Lys             0            0.000
L = Leu            47           20.796
M = Met             2            0.885
N = Asn             9            3.982
P = Pro            21            9.292
Q = Gln            39           17.257
R = Arg             3            1.327
S = Ser            17            7.522
T = Thr             5            2.212
V = Val             6            2.655
W = Trp             0            0.000
Y = Tyr             8            3.540
Z = Glx             0            0.000
 
A + G              40           17.699
S + T              22            9.735
D + E               2            0.885
D + E + N +  Q     50           22.124
H + K + R           6            2.655
D + E + H + K + R   8            3.540
I + L + M + V      64           28.319
F + W + Y          21            9.292
 
 Enzymes that do cut:
 
  Trypsin
 
 Enzymes that do not cut:
 
   NONE
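
The summary figures in the output above can be rechecked with simple arithmetic. The Python sketch below is only an illustration of how the mole percent and average residue weight columns relate to the residue counts; the variable and function names are hypothetical, and the counts and molecular weight are copied from the summary table.

```python
# Residue counts copied from the "Summary for whole sequence" table
# (the ambiguity codes B and Z are zero and omitted here).
COUNTS = {
    "A": 35, "C": 2, "D": 1, "E": 1, "F": 13, "G": 5, "H": 3, "I": 9,
    "K": 0, "L": 47, "M": 2, "N": 9, "P": 21, "Q": 39, "R": 3, "S": 17,
    "T": 5, "V": 6, "W": 0, "Y": 8,
}
MOLECULAR_WEIGHT = 24583.35  # from the summary

total_residues = sum(COUNTS.values())


def mole_percent(residues):
    """Mole percent of one residue or of a group such as 'ILMV'."""
    return 100.0 * sum(COUNTS[aa] for aa in residues) / total_residues


average_residue_weight = MOLECULAR_WEIGHT / total_residues

print(total_residues)                    # 226
print(round(mole_percent("A"), 3))       # 15.487
print(round(mole_percent("ILMV"), 3))    # 28.319
print(round(average_residue_weight, 3))  # 108.776
```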

INPUT FILES

PeptideSort accepts a single protein sequence as input. If PeptideSort rejects your protein sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

PeptideMap creates a peptide map with an output format similar to that of the DNA restriction maps. Isoelectric plots the charge as a function of pH for any peptide sequence.

CONSIDERATIONS

The algorithm used by PeptideSort to estimate HPLC retention times (Meek, Proc. Natl. Acad. Sci. USA 77; 1632 (1980)) is based on the assumption that the retention of a peptide correlates with its amino acid composition. This assumption holds for peptides of up to about 20 amino acids, but steric and conformational factors can affect the retention of longer peptides. Retention times calculated by PeptideSort for peptides longer than 20 amino acids should not be considered accurate.

The formula for estimating the retention time is the sum of the retention coefficients for the amino acids in the peptide, plus the coefficients for the end groups, plus a value t0, which is the time for elution of unretained compounds. The retention time reported by PeptideSort does not include the t0 value. You will have to determine this time for your HPLC system and add it to the reported times.
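
The formula above can be sketched in Python as follows. This is an illustration only: the coefficient values are HYPOTHETICAL placeholders chosen for easy arithmetic, not the values from Meek's paper or from aminoacid.dat, and the function name is an assumption. The sketch returns the value without t0 unless you pass one in, which matches how PeptideSort reports retention.

```python
from collections import Counter

# HYPOTHETICAL placeholder coefficients, NOT Meek's published values.
RESIDUE_COEFF = {"A": 0.5, "G": 0.0, "L": 10.0, "F": 13.0, "R": -2.0}
# HYPOTHETICAL placeholder coefficients for the two end groups.
END_GROUP_COEFF = {"amino": 1.0, "carboxyl": 2.0}


def estimated_retention(sequence, t0=0.0):
    """Sum of residue coefficients, plus end-group coefficients, plus t0."""
    counts = Counter(sequence.upper())
    residue_sum = sum(RESIDUE_COEFF[aa] * n for aa, n in counts.items())
    return residue_sum + sum(END_GROUP_COEFF.values()) + t0


reported = estimated_retention("GALFR")            # as reported, without t0
on_column = estimated_retention("GALFR", t0=2.5)   # with a measured t0 added
print(reported, on_column)
```

Because the estimate depends only on composition, any two peptides with the same residue counts get the same value in this sketch, regardless of residue order.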

Meek's paper does not report retention coefficients for cysteine, only for cystine. PeptideSort assumes that these are the same. Therefore, the estimated retention time for a peptide containing cysteines may be inaccurate.

The retention times reported by PeptideSort should be regarded as estimates, since the actual retention times can vary according to the elution conditions. Meek's retention coefficients were determined empirically using a linear gradient of acetonitrile, starting at 0% at 0 min and increasing to 60% at 80 min (0.75% per min). Increasing the gradient rate to 1.5% acetonitrile per min resulted in retention times that were 70 percent of those obtained at 0.75% per min. Decreasing the gradient rate to 0.5% per min resulted in retention times that were 120 percent of those obtained at 0.75% per min. Meek also noted minor differences in relative retention rates with columns made by different manufacturers.

RESTRICTIONS

A digest may not produce more than 1,000 peptides. If you choose all enzymes by typing * at the enzyme selection prompt and your protein sequence is over 500 residues long, there may be a great deal of output. Remember to delete the output file when you are finished looking at the data to free disk space.

SELECTING ENZYMES

The program presents you with an enzyme selection prompt that lets you enter enzymes individually or collectively. To get help with selecting enzymes, type a ? at the enzyme prompt. Here is what you see:

 
 
Select enzymes:
 
Type "*" to select all enzymes.
Type "**" to select all enzymes including isoschizomers.
Type individual names like "AluI" to select specific enzymes.
Type "?" to see this message and all available enzymes.
Type "??" to see the available enzymes AND their recognition sites.
Type "?A*" to see what enzymes start with "A."
Type "A*" to select all enzymes starting with "A."
Type parts of names like "Al*" to select all enzymes starting with "AL."
Type "~A*" to unselect all selected enzymes starting with "A."
Type "/*" to see what enzymes you have selected so far.
Type "#" to select no enzymes at all.
 
Press <Return> after each selection.
Press <Return> and nothing else to end your selections.
Spaces are allowed; upper and lower case are equivalent.
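
The selection rules in the help message can be approximated with a short Python sketch. It is not the program's implementation: the `apply_selection` function is a hypothetical name, the enzyme list is a HYPOTHETICAL stand-in for the contents of proenzall.dat (only Trypsin appears in this manual), and "#" is assumed to clear the selection. The help queries ("?", "??", "/*") and isoschizomer handling ("**") are not modeled.

```python
from fnmatch import fnmatchcase

# HYPOTHETICAL enzyme list; the real names come from proenzall.dat.
AVAILABLE = ["Chymotrypsin", "CnBr", "Thermolysin", "Trypsin"]


def apply_selection(selected, entry, available=AVAILABLE):
    """Apply one line typed at the enzyme prompt to the current selection."""
    # Spaces are allowed; upper and lower case are equivalent.
    pattern = entry.replace(" ", "").lower()
    if pattern == "#":
        return []  # assumption: "#" leaves no enzymes selected
    unselect = pattern.startswith("~")
    if unselect:
        pattern = pattern[1:]
    matches = [name for name in available
               if fnmatchcase(name.lower(), pattern)]
    if unselect:
        return [name for name in selected if name not in matches]
    # Add newly matched enzymes, keeping earlier selections in place.
    return selected + [name for name in matches if name not in selected]


selection = []
for typed in ["t*", "~th*", "CNBR"]:
    selection = apply_selection(selection, typed)
print(selection)
```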

We maintain our enzyme files with a semicolon (;) character in front of all but one member of a family of isoschizomers. (Isoschizomers are restriction endonucleases with the same recognition site.) The isoschizomers beginning with a semicolon are normally not displayed by our mapping programs unless you specifically select them by name or type "**" instead of "*" at the enzyme prompt.

There is more information on enzyme files in Appendix VII.

A command-line expression like -ENZymes=AluI,EcoRII would choose AluI and EcoRII and suppress interactive enzyme selection.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % peptidesort [-INfile=]gzeinaa.pep -Default
 
Prompted Parameters:
 
-BEGin=18 -END=243           sets the range of interest
-ENZymes=*[,...]             selects enzymes of interest
[-OUTfile=]gzeinaa.pepsort   names the output file
 
Local Data Files:
 
-DATa1=proenzall.dat     contains enzyme data
-DATa2=aminoacid.dat     contains amino acid data
-DATa3=isoelectric.dat   contains amino acid pK data
-DATa4=extinctcoef.dat   contains extinction coefficient data
 
Optional parameters:
 
-7                       sorts on HPLC retention at pH 7.4 instead of pH 2.1
-MINCuts=2               shows only enzymes that cut at least 2 times
-MAXCuts=4               shows only enzymes that cut no more than 4 times
-ELUtion[=dneqsghrtapyvmcilfkw]    sets the order of the composition display
-SHOwseq                 shows peptide fragments in the output
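
If you run PeptideSort from a script, the parameters above can be assembled programmatically. The Python sketch below only builds and prints a command line using parameters listed in this summary; the helper name `peptidesort_command` is hypothetical, the order of parameters is assumed not to matter, and -Default is assumed to suppress prompting for any value not given.

```python
import shlex


def peptidesort_command(infile, begin=None, end=None, enzymes=None,
                        outfile=None):
    """Assemble a PeptideSort command line as a list of arguments."""
    args = ["peptidesort", f"-INfile={infile}"]
    if begin is not None:
        args.append(f"-BEGin={begin}")
    if end is not None:
        args.append(f"-END={end}")
    if enzymes:
        args.append("-ENZymes=" + ",".join(enzymes))
    if outfile is not None:
        args.append(f"-OUTfile={outfile}")
    # Assumption: -Default suppresses prompts for values not given above.
    args.append("-Default")
    return args


command = peptidesort_command("gzeinaa.pep", begin=18, end=243,
                              enzymes=["trypsin"], outfile="gzeinaa.pepsort")
print(shlex.join(command))
```

`shlex.join` quotes any argument the shell would otherwise expand, so a value such as -ENZymes=* is printed in quotes.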

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

PeptideSort needs four data files that can be either local or public. proenzall.dat (see Appendix VII) contains information about the enzymes and proteolytic reagents. aminoacid.dat has information on the physical properties of the amino acids. isoelectric.dat contains pK values for the relevant amino acids. extinctcoef.dat contains extinction coefficient data for the amino acids.
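
The lookup order for a data file can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the program's code: the function name and directory arguments are hypothetical, and a file named on the command line is assumed to take precedence over a same-named file in the working directory, which this manual does not state explicitly.

```python
from pathlib import Path


def resolve_data_file(name, public_dir, cwd, command_line_path=None):
    """Pick the data file the way the text above describes.

    1. A file named with an expression like -DATa1=myfile.dat (assumed
       to win over the other two).
    2. A file with exactly the same name in the working directory.
    3. Otherwise, the copy in the public data directory.
    """
    if command_line_path is not None:
        return Path(command_line_path)
    local = Path(cwd) / name
    if local.is_file():
        return local
    return Path(public_dir) / name


# Hypothetical directories, for illustration only.
print(resolve_data_file("proenzall.dat", "/public/data", "."))
```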

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-ENZymes=*[,...]

Specifies the enzymes of interest.

-7

Causes PeptideSort to sort each digest on HPLC retention at pH 7.4 instead of on HPLC retention at pH 2.1 (default).

-MINCuts=n

Excludes enzymes that do not cut at least n times.

-MAXCuts=n

Excludes enzymes that cut more than n times.
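
Read together, -MINCuts and -MAXCuts describe an inclusive range of cut counts. A minimal Python sketch of that filter follows; the function name is hypothetical, and the cut counts other than Trypsin's three cuts (which give the four peptides in the example) are HYPOTHETICAL placeholders.

```python
def filter_by_cuts(cut_counts, min_cuts=None, max_cuts=None):
    """Keep enzymes whose cut count lies within [min_cuts, max_cuts]."""
    kept = {}
    for enzyme, cuts in cut_counts.items():
        if min_cuts is not None and cuts < min_cuts:
            continue  # does not cut at least min_cuts times
        if max_cuts is not None and cuts > max_cuts:
            continue  # cuts more than max_cuts times
        kept[enzyme] = cuts
    return kept


# EnzymeA and EnzymeB are placeholder names with made-up counts.
cut_counts = {"Trypsin": 3, "EnzymeA": 0, "EnzymeB": 7}
print(filter_by_cuts(cut_counts, min_cuts=2, max_cuts=4))
```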

-ELUtion=DNEQSGHRTAPYVMCILFKW

Sets the order for the composition data display. If you use the -ELUtion parameter without the optional value, the order is changed from alphabetical to DNE... as expected from the Waters analyzer.
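
The effect of -ELUtion on a composition line can be sketched in Python. This is an illustration rather than the program's code; the function name is hypothetical, and residues missing from the order string are assumed to be listed last. The composition string is that of peptide 1 in the example output.

```python
DEFAULT_ELUTION_ORDER = "DNEQSGHRTAPYVMCILFKW"


def reorder_composition(composition, order=DEFAULT_ELUTION_ORDER):
    """Reorder a composition such as 'A7,C2,E1' by the elution order."""
    rank = {aa: i for i, aa in enumerate(order.upper())}
    items = composition.split(",")
    # Assumption: residues not named in the order string sort last.
    items.sort(key=lambda item: (rank.get(item[0], len(rank)), item[0]))
    return ",".join(items)


peptide_1 = "A7,C2,E1,F1,G1,I3,L8,M2,N1,P7,Q2,R1,S8,T1,V1,Y2"
print(reorder_composition(peptide_1))
# N1,E1,Q2,S8,G1,R1,T1,A7,P7,Y2,V1,M2,C2,I3,L8,F1
```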

-SHOwseq

Displays the peptide fragments, sorted by position, in the table of cleavage products.

Printed: May 27, 2005 13:57



Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio