COILSCAN

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

ALGORITHM

CONSIDERATIONS

SUGGESTIONS

GRAPHICS

<CTRL>C

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

CoilScan locates coiled-coil segments in protein sequences.

DESCRIPTION

CoilScan uses the method of Lupas et al. to find coiled-coil segments in protein sequences by comparing each residue in the sequence to a weight matrix tabulated from known coiled-coil protein segments. A coiled-coil probability is calculated for each residue in the protein, and those segments whose probabilities meet or exceed the threshold probability you set are reported in the output. This prediction method works only for solvent-exposed coiled coils, particularly parallel and antiparallel two-stranded coiled coils and parallel three-stranded coiled coils.

EXAMPLE

Here is a session with CoilScan+ that was used to find coiled-coil segments in the GCN4 protein of Saccharomyces cerevisiae.

 
% coilscan
 
 CoilScan with what sequence(s) ? PIR:Rgbya2
 
                  Begin (* 1 *) ?
                End (*   281 *) ?
 
 Predict coiled-coil segments using what window size:
 
  a) 28
  b) 28 with weighting
  c) 21
  d) 21 with weighting
  e) 14
  f) 14 with weighting
 
 Please choose one (* a *):
 
 What minimum probability predicts a coiled-coil residue (* 0.50 *) ?
 
 This program can plot the coiled-coil probability at each sequence position.
 Do you want to:
 
     A) Plot to a FIGURE file called "rgbya2.figure"
     B) Plot graphics on COLORWORKSTATION attached to GCGGRAPHICS
     C) Suppress the plot
 
 Please choose one (* A *):
 
 What should I call the output file (* rgbya2.coilscan *) ?
 
 PIR1:RGBYA2                         found!
 FIGURE instructions are now being written into rgbya2.figure.
 
                Total sequences searched: 1
   Sequences with predicted coiled coils: 1
                          CPU time (sec): 0.42
 
%

OUTPUT

Here is the output file:

 
 
COILSCAN of: pir1:rgbya2 Check 5404 from: 1 to 281   September 28, 1998 15:02
 
P1;RGBYA2 - amino acid biosynthesis regulatory protein - yeast (Saccharomyces cerevisiae)
N;Alternate names: protein YEL009c
C;Species: Saccharomyces cerevisiae
C;Date: 27-Nov-1985 #sequence_revision 27-Nov-1985 #text_change 12-Dec-1997
C;Accession: A03605; S50450; A03604
R;Hinnebusch, A.G. . . .
 
 Window: 28
 Coiled-coil segments predicted at a minimum probability of 0.50
 Weight matrix: GenRunData:mtidkcoils.dat
 
   231 KRARNTEAARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER 281
      Probability: 1.00

The CoilScan output is a text file listing those protein segments in which every residue has a coiled-coil probability that meets or exceeds the threshold you set. The highest probability in each segment is reported below the sequence of that segment.
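The way segments are reported can be illustrated with a minimal Python sketch. It is not CoilScan's own code: the function name find_segments and the sample probabilities are hypothetical, and it assumes a segment is simply a run of consecutive residues whose probabilities meet or exceed the threshold, reported with 1-based positions as in the output above.

```python
def find_segments(probabilities, threshold=0.5):
    """Return (start, end, max_probability) for each run of consecutive
    residues whose probability meets or exceeds the threshold.
    Positions are 1-based and inclusive (an assumption for illustration)."""
    segments = []
    start = None  # 0-based index where the current run began, if any
    for i, p in enumerate(probabilities):
        if p >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at residue i - 1 (0-based), i.e. position i (1-based).
            segments.append((start + 1, i, max(probabilities[start:i])))
            start = None
    if start is not None:
        # The last run extends to the end of the sequence.
        segments.append((start + 1, len(probabilities), max(probabilities[start:])))
    return segments


# Made-up per-residue probabilities, for illustration only.
example = [0.1, 0.6, 0.9, 0.7, 0.2, 0.5, 0.5, 0.1]
print(find_segments(example))  # [(2, 4, 0.9), (6, 7, 0.5)]
```

Lowering the threshold can merge neighbouring runs into one longer segment; raising it trims or removes segments.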

If you specify multiple protein sequences as input (see the INPUT FILES topic), CoilScan writes a separate text output file for each input sequence. Each output file is named after the input sequence and given the .coilscan file name extension.

By default, CoilScan creates a plot of the coiled-coil probability at each position in the sequence. The threshold probability you set is drawn as a dashed horizontal line on the plot. You can use the plot to determine the coiled-coil probability for each residue and perhaps adjust the threshold probability used to predict coiled-coil regions.

[Figure: plot of coiled-coil probability as a function of sequence position from the example session]

If you specify multiple protein sequences as input (see the INPUT FILES topic) or you use either -BATch or -Default, the plot of probability as a function of sequence position for each sequence is written to its own Figure file. Each Figure file is named after the input sequence and given the .figure file name extension. You can then use the Figure program to display any of the residue probability plots on the supported graphics device of your choice.

If you use -TABle, the program writes a text output file with the coiled-coil probability calculated for each residue in the sequence. Along with the probability, the table output file lists the structural position in the coiled-coil heptad repeat (position a through g) that is associated with each residue prediction. Those residues whose probabilities meet or exceed the threshold you set are marked by asterisks (*) in the table output file.

If you specify multiple protein sequences as input (see the INPUT FILES topic), CoilScan writes a separate table output file for each input sequence. Each table file is named after the input sequence and given the .table file name extension.

INPUT FILES

The input to CoilScan is one or more protein sequences. If CoilScan rejects your protein sequence, turn to Appendix VI to see how to change or set the type of a sequence. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*.

If you use a list file to specify a group of input sequences, you can add begin and end sequence attributes to specify a range for each sequence to scan. If a sequence range is shorter than the window size selected with -WINdow or in response to the program prompt, that sequence is skipped.

RELATED PROGRAMS

Moment makes a contour plot of the helical hydrophobic moment of a peptide sequence.

Motifs looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds.

ProfileScan uses a database of profiles to find structural and sequence motifs in protein sequences.

HTHScan scans protein sequences for the presence of helix-turn-helix motifs, indicative of sequence-specific DNA-binding structures often associated with gene regulation.

SPScan scans protein sequences for the presence of secretory signal peptides (SPs).

PepPlot plots measures of protein secondary structure and hydrophobicity in parallel panels of the same plot.

PeptideStructure makes secondary structure predictions for a peptide sequence. The predictions include (in addition to alpha, beta, coil, and turn) measures for antigenicity, flexibility, hydrophobicity, and surface probability. PlotStructure displays the predictions graphically.

PlotStructure plots the measures of protein secondary structure in the output file from PeptideStructure. The measures can be shown on parallel panels of a graph or with a two-dimensional "squiggly" representation.

HTHScan+ scans protein sequences for the presence of helix-turn-helix motifs, indicative of sequence-specific DNA-binding structures often associated with gene regulation.

SPScan+ scans protein sequences for the presence of secretory signal peptides (SPs).

CoilScan+ locates coiled-coil segments in protein sequences.

ALGORITHM

Coiled coils are bundles of two or more alpha helices that are supercoiled together. Each alpha helix in a coiled coil is strongly amphipathic, and the pattern of hydrophilic and hydrophobic amino acids repeats every seven residues. If the seven positions in the coiled-coil heptad repeat are labeled a through g, positions a and d are hydrophobic and form the helix interface, while positions b, c, e, f, and g are hydrophilic and are solvent-exposed.
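The heptad labeling described above can be sketched in a few lines of Python. The helper names heptad_register and core_positions are hypothetical and serve only to illustrate how positions a through g repeat along a helix and which residues fall on the hydrophobic a and d positions.

```python
HEPTAD = "abcdefg"
HYDROPHOBIC_POSITIONS = {"a", "d"}  # the helix interface, per the text above


def heptad_register(length, first="a"):
    """Label `length` consecutive residues with heptad positions,
    starting from the position given for the first residue."""
    offset = HEPTAD.index(first)
    return "".join(HEPTAD[(offset + i) % 7] for i in range(length))


def core_positions(length, first="a"):
    """Return the 1-based residue numbers assigned to positions a or d."""
    register = heptad_register(length, first)
    return [i + 1 for i, pos in enumerate(register) if pos in HYDROPHOBIC_POSITIONS]


print(heptad_register(10))        # abcdefgabc
print(core_positions(10))         # [1, 4, 8]
print(heptad_register(9, "f"))    # fgabcdefg
```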

CoilScan uses the method described by Lupas, Van Dyke, and Stock (Science 252; 1162-1164 (1991)) and Lupas (In Methods in Enzymology 266; 513-525 (1996)) to predict coiled coils in protein sequences. The frequency of each amino acid at each position in the heptad repeat was tabulated from a database of coiled-coil sequences. Each amino acid frequency was divided by the corresponding frequency of that amino acid in all proteins encoded in GenBank to produce a weight matrix containing values for each amino acid at each position in the heptad repeat. If the weight matrix value for an amino acid at a position in the heptad repeat is greater than 1.0, that amino acid is found at a greater than random frequency at that position in a coiled coil. Conversely, if the value for an amino acid at a position in the heptad repeat is less than 1.0, that amino acid is found at a less than random frequency at that position.
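As a rough illustration of how such a weight matrix is built, the sketch below divides position-specific frequencies by background frequencies. The function name weight_matrix and all of the numbers are invented for illustration; they are not values from mtidkcoils.dat or from GenBank.

```python
def weight_matrix(coil_freq, background_freq):
    """Divide each position-specific frequency by the amino acid's
    background frequency.

    coil_freq:       {amino_acid: [7 frequencies, heptad positions a..g]}
    background_freq: {amino_acid: overall frequency in all proteins}
    """
    return {
        aa: [f / background_freq[aa] for f in freqs]
        for aa, freqs in coil_freq.items()
    }


# Invented frequencies, for illustration only.
coil_freq = {
    "L": [0.30, 0.05, 0.05, 0.30, 0.10, 0.05, 0.05],
    "G": [0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02],
}
background_freq = {"L": 0.10, "G": 0.08}

matrix = weight_matrix(coil_freq, background_freq)
for aa, row in matrix.items():
    print(aa, [round(v, 2) for v in row])
```

In this toy data, leucine scores above 1.0 at positions a and d (enriched relative to background) and glycine scores below 1.0 everywhere (depleted).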

The input sequence is compared to the weight matrix using a sliding window along the sequence. For example, using the default window length of 28, the first residue in the window is assumed to be the a position in the heptad repeat and all other residues in that window assume sequential repeat positions (i.e. bcdefgabcde...). The geometric average of all of the individual weight matrix values under the window is determined. Next, the first residue in the window is assumed to be in the b position in the heptad repeat, and the average window score is determined again. After determining an average window score for each of the seven heptad repeat frames, the maximum average window score is saved. The window is then shifted one residue along the sequence and the entire process is repeated in this new window. The final coiled-coil score for each residue is determined as the maximum of all of the average scores for all of the windows that include the residue.
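The scoring procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CoilScan implementation: the function names, the two-letter toy matrix, and the short window of 7 are hypothetical, and the sketch assumes every residue in the sequence has a positive entry in the matrix.

```python
import math


def window_score(window, matrix, frame):
    """Geometric mean of the weight-matrix values under one window.
    `frame` is the heptad index (0 = a ... 6 = g) assumed for the
    first residue; later residues take sequential positions."""
    log_sum = 0.0
    for i, aa in enumerate(window):
        log_sum += math.log(matrix[aa][(frame + i) % 7])
    return math.exp(log_sum / len(window))


def residue_scores(sequence, matrix, window=28):
    """For each residue, return the maximum score over all windows that
    include it, where each window's score is its best of the 7 frames.
    A sequence shorter than the window yields no windows (all zeros)."""
    n = len(sequence)
    scores = [0.0] * n
    for start in range(n - window + 1):
        segment = sequence[start:start + window]
        best = max(window_score(segment, matrix, frame) for frame in range(7))
        for i in range(start, start + window):
            scores[i] = max(scores[i], best)
    return scores


# Toy matrix (invented values): L favoured at a and d, K favoured elsewhere.
toy = {
    "L": [3.0, 0.5, 0.5, 3.0, 0.5, 0.5, 0.5],
    "K": [0.5, 1.5, 1.5, 0.5, 1.5, 1.5, 1.5],
}
scores = residue_scores("LKKLKKK" * 2, toy, window=7)
print([round(s, 3) for s in scores])
```

Summing logarithms and exponentiating once is a common way to compute a geometric mean without the overflow or underflow that multiplying many values directly can cause.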

After determining a coiled-coil score for each residue in the sequence, a coiled-coil probability is calculated for each residue using those scores. This probability is based on an empirically determined normal distribution of scores for residues in coiled-coil domains, a separate empirically determined normal distribution of scores for residues in all globular (non-coiled-coil) proteins, and an estimated 1:30 ratio of coiled-coil to globular residues in the protein sequences encoded in GenBank. Given any residue score, S, the probability, P(S), of a residue with that score being found in a coiled-coil segment, is given by:

 
 
 
    P(S) = G(CC)(S) / (30 G(g)(S) + G(CC)(S))
 
 

 

where G(CC)(S) is the value of the coiled-coil normal distribution at score S and G(g)(S) is the value of the globular normal distribution at score S.
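The probability calculation can be sketched in Python as shown below. The means and standard deviations in the example are placeholders chosen for illustration; the empirically determined distribution parameters that CoilScan uses are not given in this document, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Value of a normal (Gaussian) density at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def coil_probability(score, cc_mean, cc_sd, g_mean, g_sd, ratio=30.0):
    """P(S) = G(CC)(S) / (ratio * G(g)(S) + G(CC)(S)), where ratio is the
    estimated number of globular residues per coiled-coil residue."""
    g_cc = normal_pdf(score, cc_mean, cc_sd)
    g_g = normal_pdf(score, g_mean, g_sd)
    return g_cc / (ratio * g_g + g_cc)


# Placeholder distribution parameters (assumptions, not CoilScan's values).
CC_MEAN, CC_SD = 1.6, 0.2
G_MEAN, G_SD = 0.8, 0.2

for s in (0.8, 1.2, 1.6):
    print(s, round(coil_probability(s, CC_MEAN, CC_SD, G_MEAN, G_SD), 4))
```

Because of the 1:30 prior, a score at which the two densities are equal gives a probability of only 1/31; the coiled-coil density must be 30 times the globular density before the probability reaches 0.5.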

CONSIDERATIONS

The method used by CoilScan accurately predicts parallel and antiparallel two-stranded coiled coils and parallel three-stranded coiled coils. This program does not accurately predict antiparallel helical bundles containing three or more helices.

SUGGESTIONS

Since five of the seven residue positions in the coiled-coil heptad repeat are hydrophilic, the prediction method used by CoilScan may assign a high coiled-coil probability to highly charged sequences that do not contain coiled coils. To counter this problem, you can give the two hydrophobic residue positions in the heptad repeat the same combined weight as all five hydrophilic positions by choosing one of the weighted windows in response to the program prompt or by using -WEIght. If a weighted scan results in a probability that is 20-30% lower than that from an unweighted scan, the predicted sequence may simply be highly charged and not contain true coiled coils.
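The effect of weighting can be illustrated with a weighted geometric mean. The sketch below is an interpretation, not CoilScan's code: it assumes that "the same weight" means positions a and d each receive a weight of 2.5 so that together (2 x 2.5) they balance the five hydrophilic positions (5 x 1.0). The function names and the toy matrix values are hypothetical.

```python
import math


def position_weights(weighted):
    """Per-position weights for heptad positions a..g. With weighting,
    a and d get 2.5 each (assumed), balancing the other five positions."""
    if not weighted:
        return [1.0] * 7
    return [2.5 if i in (0, 3) else 1.0 for i in range(7)]


def weighted_window_score(window, matrix, frame, weighted=False):
    """Weighted geometric mean of the weight-matrix values under a window."""
    weights = position_weights(weighted)
    log_sum = 0.0
    weight_sum = 0.0
    for i, aa in enumerate(window):
        pos = (frame + i) % 7
        log_sum += weights[pos] * math.log(matrix[aa][pos])
        weight_sum += weights[pos]
    return math.exp(log_sum / weight_sum)


# Toy values (invented): a charged residue that is disfavoured at a and d.
toy = {"K": [0.5, 1.5, 1.5, 0.5, 1.5, 1.5, 1.5]}

plain = weighted_window_score("KKKKKKK", toy, frame=0)
heavy = weighted_window_score("KKKKKKK", toy, frame=0, weighted=True)
print(round(plain, 3), round(heavy, 3))
```

In this toy case the all-charged window scores above 1.0 without weighting but below 1.0 with it, which is the kind of drop the paragraph above suggests treating as a warning sign.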

Both the default weight matrix, mtidkcoils.dat, and the alternative weight matrix, GenMoreData:mtkcoils.dat, identify known coiled-coil segments at high probability. However, when one of the matrices incorrectly identifies a coiled-coil segment in a protein sequence lacking coiled coils, the other matrix may not misidentify the same segment. Therefore, if you scan a sequence separately using each weight matrix (see the LOCAL DATA FILES topic), and the probabilities obtained by the two scans differ by more than 20-30%, the predicted segment may represent a false positive prediction.
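The comparison suggested above amounts to simple arithmetic on two probabilities. The sketch below assumes the 20-30% figure is a relative difference measured against the larger of the two probabilities; the text does not say whether a relative or absolute difference is meant, and the function names and example values are hypothetical.

```python
def relative_difference(p1, p2):
    """Difference between two probabilities as a fraction of the larger one."""
    high = max(p1, p2)
    if high == 0.0:
        return 0.0
    return abs(p1 - p2) / high


def looks_suspect(p1, p2, tolerance=0.2):
    """Flag a prediction whose two scans disagree by more than `tolerance`
    (0.2 corresponds to the lower end of the 20-30% range)."""
    return relative_difference(p1, p2) > tolerance


# Made-up probabilities from two hypothetical scans of the same segment.
print(looks_suspect(0.95, 0.60))  # True
print(looks_suspect(0.95, 0.90))  # False
```

The same check applies to comparing a weighted scan against an unweighted one, as well as to comparing scans made with the two weight matrices.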

CoilScan rarely identifies coiled-coil segments that are longer than 30 residues in protein sequences that don't contain true coiled coils. Therefore, if CoilScan assigns a high coiled-coil probability to a segment that is longer than 30 residues, that prediction is likely to be correct.

As you decrease the window size for the scan, the resolution between the score distributions of coiled-coil and globular proteins also decreases. Therefore, you should initially use the largest window length (28) for predicting new coiled-coil segments. Once a prediction has been made, you can use the smaller window sizes to identify the ends of the coiled-coil segment with greater precision.

GRAPHICS

Accelrys GCG (GCG) must be configured for graphics before you run any program with graphics output! If the % setplot command is available in your installation, this is the easiest way to establish your graphics configuration, but you can also use commands like % postscript that correspond to the graphics languages GCG supports. See Section 5, Using Graphics in the User's Guide for more information about configuring your process for graphics.

<CTRL>C

If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C. The graphics device should stop plotting the current page and start plotting the next page. If the current page is the last page, plotters should put the pen away and graphic terminals should return to interactive mode.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % coilscan [-INfile=]pir:rgbya2 -Default
 
Prompted Parameters:
 
-BEGin=1 -END=281                sets the range of interest for each sequence
-WINdow=28                       sets window size for coiled-coil prediction (choices are 28, 21, and 14)
-WEIght                          gives increased weight to the 1st and 4th positions in the heptad repeat
-PROBability=0.5                 sets minimum probability to predict a coiled-coil segment
[-OUTfile1=]rgbya2.coilscan      specifies output file of coiled-coil predicted segments
 
Local Data Files:
 
-DATa=mtidkcoils.dat   assigns a file of amino acid coiled-coil propensities
 
Optional Parameters:
 
-TABle[=rgbya2.table]     writes a file of residue-by-residue coiled-coil probabilities
-NOPLOt                   suppresses the plot
-NOLINe                   suppresses plotting the threshold used for predicting coiled-coil segments
-RSF[=coilscan.rsf]       writes the predicted segments as features in an RSF file
-BATch                    submits program to the batch queue
-NOMONitor                suppresses the screen trace of program progress
-NOSUMmary                suppresses the screen summary
 
All GCG graphics programs accept these and other switches. See the Using Graphics section of the User's Guide for descriptions.
 
-FIGure[=filename]  stores plot in a file for later input to FIGURE
-FONT=3             draws all text on the plot using font 3
-COLor=1            draws entire plot with pen in stall 1
-SCAle=1.2          enlarges the plot by 20 percent (zoom in)
-XPAN=10.0          moves plot to the right 10 platen units (pan right)
-YPAN=10.0          moves plot up 10 platen units (pan up)
-PORtrait           rotates plot 90 degrees

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

CoilScan reads the preference values for each amino acid at each possible position in the coiled-coil heptad repeat from the weight matrix file mtidkcoils.dat. This matrix is based on a database of 26,965 residues from myosins, tropomyosins, intermediate filaments type I-V, desmosomal proteins, and kinesins. CoilScan can read an alternative weight matrix based on a database of 16,968 residues from myosins, tropomyosins, and intermediate filaments type I and II if you use -DATa=GenMoreData:mtkcoils.dat.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-WINdow=28

   Sets the window length for the coiled-coil predictions. The choices are 28, 21, and 14. See the ALGORITHM topic for more information about window length.

-WEIght

Gives greater weight to the first and fourth positions (positions a and d) in the heptad repeat in predicting coiled-coil protein segments. See the SUGGESTIONS topic for more information about giving greater weight to the first and fourth positions in the heptad repeat.

-PROBability=0.5

   Sets the minimum probability required to predict a coiled-coil segment in a protein sequence.

-TABle=rgbya2.table

Writes a file with the coiled-coil probability at each residue position. The output also lists the structural position in the heptad repeat (position a through g) corresponding to each residue prediction. Those residues whose coiled-coil probabilities meet or exceed the threshold you set are indicated with an asterisk (*).

-NOPLOt

   Suppresses the plot.

-NOLINe

   Suppresses the horizontal dashed line that marks the threshold used for predicting coiled-coil segments on the plot.

-RSF=coilscan.rsf

Writes an RSF (rich sequence format) file containing the input sequences annotated with features generated from the results of CoilScan. This RSF file is suitable for input to other GCG programs that support RSF files. In particular, you can use SeqLab to view this feature annotation graphically. If you don't specify a file name with this parameter, the program creates one using coilscan for the file basename and .rsf for the extension. For more information on RSF files, see "Using Rich Sequence Format (RSF) Files" in Section 2 of the User's Guide or "Rich Sequence Format (RSF) Files" in Appendix C of the SeqLab Guide.

-BATch

Submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-SUMmary

Writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

The parameters below apply to all GCG graphics programs. These and many others are described in detail in Section 5, Using Graphics of the User's Guide.

-FIGure=programname.figure

   Writes the plot as a text file of plotting instructions suitable for input to the Figure program instead of sending it to the device specified in your graphics configuration.

-FONT=3

   Draws all text characters on the plot using Font 3 (see Appendix I).

-COLor=1

   Draws the entire plot with the pen in stall 1.

The parameters below let you expand or reduce the plot (zoom), move it in either direction (pan), or rotate it 90 degrees (rotate).

-SCAle=1.2

Expands the plot by 20 percent by resetting the scaling factor (normally 1.0) to 1.2 (zoom in). You can expand the axes independently with -XSCAle and -YSCAle. Numbers less than 1.0 contract the plot (zoom out).

-XPAN=30.0

   Moves the plot to the right by 30 platen units (pan right).

-YPAN=30.0

   Moves the plot up by 30 platen units (pan up).

-PORtrait

Rotates the plot 90 degrees. Usually, plots are displayed with the horizontal axis longer than the vertical (landscape). Note that plots are reduced or enlarged, depending on the platen size, to fill the page.

Printed: May 27, 2005  11:55



Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio