COILSCAN+

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

ALGORITHM

CONSIDERATIONS

SUGGESTIONS

PROGRAM TERMINATION

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

CoilScan+ locates coiled-coil segments in protein sequences.

DESCRIPTION

Advantages of Plus “+” Programs:

-  Plus programs are enhanced to read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.

-  Plus programs remove the sequence length restriction of 350,000 bp.


If you do not need these features and want more interactivity, you may prefer to run the original version of the program.

CoilScan+ uses the method of Lupas et al. to find coiled-coil segments in protein sequences by comparing each residue in the sequence to a weight matrix tabulated from known coiled-coil protein segments. A coiled-coil probability is calculated for each residue in the protein, and those segments whose probabilities meet or exceed a threshold probability you set are reported in the output. This prediction method works only for solvent-exposed coiled coils, particularly for parallel and anti-parallel two-stranded coiled coils and for parallel three-stranded coiled coils.

EXAMPLE

Here is a session with CoilScan+ that was used to find coiled-coil segments in the amino acid biosynthesis regulatory protein RGBYA2 from yeast (Saccharomyces cerevisiae).

 
coilscan+
 
CoilScan+ locates coiled-coil segments in protein sequences.
 
coilscan+ with what sequence(s) ? rgbya2.pir1
Begin (* 1 *) ?
End (-1 for entire sequence) (* -1 *) ?
Predict coiled-coil segments using what window size. Valid Values are: 28, 21, 14 (* 28 *) ?
Increase weight on 1st and 4th positions (* flag *) ?
What minimum probability predicts a coiled-coil residue (* 0.5 *) ?
What should I call the output file (* <sequence_name>.coilscan+ *) ?
 
 
Analyzing sequence 'RGBYA2' from 'RGBYA2.pir1'
Processing results...
 
Input sequences processed                       : 1
Number of sequences with predicted Coiled Coils : 1
 
 
Results written to rgbya2.coilscan+
 

OUTPUT

Here is the output file:

 

 

*** SUMMARY ***

 

 

 
COILSCAN+ of: pir1:rgbya2 Check 5404 from: 1 to 281   September 28, 1998 15:02
 
P1;RGBYA2 - amino acid biosynthesis regulatory protein - yeast (Saccharomyces
 cerevisiae)
N;Alternate names: protein YEL009c
C;Species: Saccharomyces cerevisiae
C;Date: 27-Nov-1985 #sequence_revision 27-Nov-1985 #text_change 12-Dec-1997
C;Accession: A03605; S50450; A03604
R;Hinnebusch, A.G. . . .
 
 Window: 28
 Coiled-coil segments predicted at a minimum probability of 0.50
 Weight matrix: GenRunData:mtidkcoils.dat
 
   231 KRARNTEAARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER 281
      Probability: 1.00

The CoilScan+ output is a text file containing a list of those protein segments whose residue coiled-coil probabilities all meet or exceed the threshold you set. The highest probability in each segment is reported below the sequence of the coiled-coil segment.
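The way per-residue probabilities turn into reported segments can be illustrated with a short Python sketch. This is not CoilScan+ source code; the function name find_segments and the list-of-floats input are assumptions made for illustration only.

```python
def find_segments(probabilities, threshold=0.5):
    """Return (start, end, max_prob) for each run of residues whose
    probability meets or exceeds the threshold.

    Positions are 1-based and inclusive, like sequence coordinates.
    This is an illustrative helper, not part of CoilScan+.
    """
    segments = []
    start = None
    for i, p in enumerate(probabilities, start=1):
        if p >= threshold:
            if start is None:
                start = i  # a new segment begins here
        elif start is not None:
            # the segment ended at the previous residue
            segments.append((start, i - 1, max(probabilities[start - 1:i - 1])))
            start = None
    if start is not None:
        # the last segment runs to the end of the sequence
        segments.append((start, len(probabilities), max(probabilities[start - 1:])))
    return segments
```

Each returned tuple corresponds to one reported segment: its first and last residue and the highest probability found within it.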

If you specify multiple protein sequences as input (see the INPUT FILES topic), CoilScan+ writes a separate text output file for each input sequence. Each output file is named after the input sequence and given the .coilscan+ file name extension.

 

If you use -table, the program writes a text output file with the coiled-coil probability calculated for each residue in the sequence. Along with the probability, the table output file lists the structural position in the coiled-coil heptad repeat (position a through g) that is associated with each residue prediction. Those residues whose probabilities meet or exceed the threshold you set are marked by asterisks (*) in the table output file.

If you specify multiple protein sequences as input (see the INPUT FILES topic), CoilScan+ writes a single table output file for all input sequences.

INPUT FILES

The input to CoilScan+ is one or more protein sequences. If CoilScan+ rejects your protein sequence, turn to Appendix VI to see how to change or set the type of a sequence. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*.

If you use a list file to specify a group of input sequences, you can add begin and end sequence attributes to specify a range for each sequence to scan. If a sequence range is shorter than the window size selected with -window or in response to the program prompt, that sequence is skipped.

RELATED PROGRAMS

CoilScan locates coiled-coil segments in protein sequences.

Motifs looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds.

ProfileScan uses a database of profiles to find structural and sequence motifs in protein sequences.

HTHScan+ scans protein sequences for the presence of helix-turn-helix motifs, indicative of sequence-specific DNA-binding structures often associated with gene regulation.

SPScan+ scans protein sequences for the presence of secretory signal peptides (SPs).

HTHScan scans protein sequences for the presence of helix-turn-helix motifs, indicative of sequence-specific DNA-binding structures often associated with gene regulation.

SPScan scans protein sequences for the presence of secretory signal peptides (SPs).

 

ALGORITHM

Coiled coils are bundles of two or more alpha helices that are supercoiled together. Each alpha helix in a coiled coil is strongly amphipathic, and the pattern of hydrophilic and hydrophobic amino acids repeats every seven residues. If the seven positions in the coiled-coil heptad repeat are labeled a through g, positions a and d are hydrophobic and form the helix interface, while positions b, c, e, f, and g are hydrophilic and are solvent-exposed.

CoilScan+ uses the method described by Lupas, Van Dyke, and Stock (Science 252; 1162-1164 (1991)) and Lupas (In Methods in Enzymology 266; 513-525 (1996)) to predict coiled coils in protein sequences. The frequency of each amino acid at each position in the heptad repeat was tabulated from a database of coiled-coil sequences. Each amino acid frequency was divided by the corresponding frequency of that amino acid in all proteins encoded in GenBank to produce a weight matrix containing values for each amino acid at each position in the heptad repeat. If the weight matrix value for an amino acid at a position in the heptad repeat is greater than 1.0, that amino acid is found at a greater than random frequency at that position in a coiled coil. Conversely, if the value for an amino acid at a position in the heptad repeat is less than 1.0, that amino acid is found at a less than random frequency at that position.
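As a rough illustration of how such a weight matrix is derived, the Python sketch below divides the observed frequency of each amino acid at each heptad position by its background frequency. The function name weight_matrix and the dictionary layout are assumptions for this example; the real matrix values are those distributed in mtidkcoils.dat, not anything computed here.

```python
def weight_matrix(coil_counts, background_freq):
    """Build a toy weight matrix.

    coil_counts:     {heptad_position: {amino_acid: count}} tabulated from
                     coiled-coil sequences (hypothetical input layout).
    background_freq: {amino_acid: frequency} in proteins generally.

    Returns {heptad_position: {amino_acid: ratio}}; a ratio above 1.0 means
    the amino acid is over-represented at that position in coiled coils.
    """
    matrix = {}
    for pos, counts in coil_counts.items():
        total = sum(counts.values())
        matrix[pos] = {
            aa: (count / total) / background_freq[aa]
            for aa, count in counts.items()
        }
    return matrix
```

For example, if leucine made up 80% of the residues at position a in the coiled-coil database but 40% of residues in proteins overall (invented numbers), the matrix value for leucine at position a would be 2.0.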

The input sequence is compared to the weight matrix using a sliding window along the sequence. For example, using the default window length of 28, the first residue in the window is assumed to be in the a position in the heptad repeat and all other residues in that window assume sequential repeat positions (i.e., bcdefgabcde...). The geometric average of all of the individual weight matrix values under the window is determined. Next, the first residue in the window is assumed to be in the b position in the heptad repeat, and the average window score is determined again. After determining an average window score for each of the seven heptad repeat frames, the maximum average window score is saved. The window is then shifted one residue along the sequence and the entire process is repeated in this new window. The final coiled-coil score for each residue is determined as the maximum of all of the average scores for all of the windows that include the residue.
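The sliding-window procedure can be sketched in Python as follows. This is a simplified illustration, not the program's implementation: the names window_score and residue_scores are hypothetical, the matrix is assumed to be a dictionary of strictly positive values keyed by heptad position and amino acid, and no extra weighting of positions a and d is applied.

```python
import math

HEPTAD = "abcdefg"

def window_score(residues, frame, matrix):
    """Geometric mean of the matrix values under one window, with the
    first residue placed at heptad position HEPTAD[frame].

    Assumes every matrix value is positive (log of zero is undefined).
    """
    log_sum = 0.0
    for offset, aa in enumerate(residues):
        pos = HEPTAD[(frame + offset) % 7]
        log_sum += math.log(matrix[pos][aa])
    return math.exp(log_sum / len(residues))

def residue_scores(sequence, matrix, window=28):
    """Per-residue score: the maximum, over all windows that include the
    residue, of the best score among the seven heptad frames.

    Sequences shorter than the window get a score of 0.0 everywhere here;
    CoilScan+ itself skips such sequences.
    """
    scores = [0.0] * len(sequence)
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        best = max(window_score(segment, frame, matrix) for frame in range(7))
        for i in range(start, start + window):
            scores[i] = max(scores[i], best)
    return scores
```

Working in log space keeps the geometric mean numerically stable for long windows; the result is the same as multiplying the values and taking the window-length root.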

After determining a coiled-coil score for each residue in the sequence, a coiled-coil probability is calculated for each residue using those scores. This probability is based on an empirically determined normal distribution of scores for residues in coiled-coil domains, a separate empirically determined normal distribution of scores for residues in all globular (non-coiled-coil) proteins, and an estimated 1:30 ratio of coiled-coil to globular residues in the protein sequences encoded in GenBank. Given any residue score, S, the probability, P(S), of a residue with that score being found in a coiled-coil segment, is given by:

 
 
 
P(S) = G(CC)(S) / (30 G(g)(S) + G(CC)(S))
 
 

 

Where G(CC)(S) is the value of the coiled-coil normal distribution at score S and G(g)(S) is the value of the globular normal distribution at score S.
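The probability formula can be evaluated with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the means and standard deviations of the two normal distributions must be supplied by the caller, because the values CoilScan+ uses are not given in this document.

```python
import math

def normal_pdf(x, mean, sd):
    """Value of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def coil_probability(score, cc_mean, cc_sd, glob_mean, glob_sd, ratio=30.0):
    """P(S) = G(CC)(S) / (ratio * G(g)(S) + G(CC)(S)).

    ratio is the assumed number of globular residues per coiled-coil
    residue (30, from the 1:30 estimate). The distribution parameters
    are placeholders to be supplied by the caller.
    """
    g_cc = normal_pdf(score, cc_mean, cc_sd)
    g_g = normal_pdf(score, glob_mean, glob_sd)
    return g_cc / (ratio * g_g + g_cc)
```

Note that if the two distributions were identical, every residue would get a probability of 1/31, which is simply the prior implied by the 1:30 ratio. The probability rises above that only where the coiled-coil distribution is higher than the globular one at the observed score.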

CONSIDERATIONS

The method used by CoilScan+ accurately predicts parallel and anti-parallel two-stranded coiled coils and parallel three-stranded coiled coils. This program does not accurately predict anti-parallel helical bundles containing three or more helices.

SUGGESTIONS

Since five of the seven residue positions in the coiled-coil heptad repeat are hydrophilic, the prediction method used by CoilScan+ may assign a high coiled-coil probability to highly charged sequences that do not contain coiled coils. To counter this problem, you can give the two hydrophobic residue positions in the heptad repeat the same weight as all five hydrophilic positions by choosing the increased weighting in response to the program prompt or by using -weight. If a weighted scan results in a probability that is 20-30% lower than that from an unweighted scan, the predicted sequence may simply be highly charged and not contain true coiled coils.

Both the default weight matrix, mtidkcoils.dat, and the alternative weight matrix, Share_misc:mtkcoils.dat, identify known coiled-coil segments at high probability. However, when one of the matrices incorrectly identifies a coiled-coil segment in a protein sequence lacking coiled coils, the other matrix may not misidentify the same segment. Therefore, if you scan a sequence separately using each weight matrix (see the LOCAL DATA FILES topic), and the probabilities obtained by the two scans differ by more than 20-30%, the predicted segment may represent a false positive prediction.
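The two comparisons suggested above (weighted versus unweighted scans, and one weight matrix versus the other) come down to the same check, sketched here in Python. The function names are hypothetical. The sketch assumes that "20-30%" means a relative difference measured against the larger of the two probabilities, and it uses the lower bound (0.2) as the default cutoff; both are interpretations, not documented program behavior.

```python
def relative_difference(prob_a, prob_b):
    """Difference between two probabilities as a fraction of the larger one."""
    larger = max(prob_a, prob_b)
    if larger == 0:
        return 0.0  # both scans report zero probability
    return abs(prob_a - prob_b) / larger

def looks_suspect(prob_a, prob_b, cutoff=0.2):
    """True if the two scans disagree by at least the cutoff fraction,
    which may indicate a false positive prediction."""
    return relative_difference(prob_a, prob_b) >= cutoff
```

For example, a segment scoring 1.00 in an unweighted scan and 0.70 in a weighted scan differs by 30% and would be flagged for a closer look.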

CoilScan+ rarely identifies coiled-coil segments that are longer than 30 residues in protein sequences that don't contain true coiled coils. Therefore, if CoilScan+ assigns a high coiled-coil probability to a segment that is longer than 30 residues, that prediction is likely to be correct.

As you decrease the window size for the scan, the resolution between the score distributions of coiled-coil and globular proteins also decreases. Therefore, you should initially use the largest window length (28) for predicting new coiled-coil segments. Once a prediction has been made, you can use the smaller window sizes to identify the ends of the coiled-coil segment with greater precision.

 

PROGRAM TERMINATION

If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C. The graphics device should stop plotting the current page and start plotting the next page. If the current page is the last page, plotters should put the pen away and graphic terminals should return to interactive mode.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value specified, the default parameter value, and shortened forms of the parameter name (aliases). Programs with a plus in the name use either the full parameter name or a specified alias. If “Type” is “Boolean”, then the presence of the parameter on the command line indicates a true condition. A false condition must be stated as parameter=false.

 

CoilScan+ locates coiled-coil segments in protein sequences.


Minimal Syntax: % coilscan+ [-infile=]value [-outfile=]value -Default


Minimal Parameters (case-insensitive):

-infile         [Type: InFile / Default: EMPTY / Aliases: infile1 in]
                The name of the input file.

Prompted Parameters (case-insensitive):

-begin          [Type: Integer / Default: '1' / Aliases: beg]
                First base of interest for each query sequence.


-end            [Type: Integer / Default: '-1']
                Last base of interest for each query sequence.


-window         [Type: Integer / Default: '28']
                Sets window size for coiled-coil prediction. Valid Values are: 28, 21, 14.


-weight         [Type: Boolean / Default: 'flag' / Aliases: wei]
                Gives increased weight to the 1st and 4th positions in the heptad repeat.


-probability    [Type: Double / Default: '0.5' / Aliases: prob]
                Sets minimum probability to predict a coiled-coil segment.


-outfile        [Type: OutFile / Default: '<sequence_name>.coilscan+' /
                Aliases: out outfile1] Names the output file.

Optional Parameters (case-insensitive):

-check          [Type: Boolean / Default: 'false' / Aliases: che help]
                Prints out this usage message.


-default        [Type: Boolean / Default: 'false' / Aliases: d def]
                Specifies that sensible default values be used for all parameters where possible.


-documentation [Type: Boolean / Default: 'true' / Aliases: doc]
                Prints banner at program startup.


-quiet          [Type: Boolean / Default: 'false' / Aliases: qui]
                Tells application to print only a minimal amount of information.


-data           [Type: String / Default: 'SHARE_MISC:mtidkcoils.dat' /
                Aliases: dat] Assigns a file of amino acid coiled-coil propensities.


-table          [Type: OutFile / Default: EMPTY / Aliases: tab]
                Files residue-by-residue coiled-coil probabilities.


-seqout         [Type: OutFile / Default: EMPTY / Aliases: rsf]
                Annotated sequence output.


-monitor        [Type: Boolean / Default: 'true' / Aliases: mon]
                Displays screen trace of progress.


-batch          [Type: Boolean / Default: 'false']
                Allows to submit a job to a batch queue.


-summary        [Type: Boolean / Default: 'true' / Aliases: sum]
                Displays screen summary at end of the program.
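If you drive CoilScan+ from a script, the parameters in the summary above can be assembled programmatically. The Python sketch below only builds the argument list and does not run the program. The helper name build_coilscan_command is hypothetical; the parameter names are those listed in the summary, and whether the resulting command runs depends on your local GCG installation.

```python
def build_coilscan_command(infile, outfile=None, window=28, weight=False,
                           probability=0.5, table=None):
    """Assemble a coilscan+ argument list from documented parameters.

    Returns a list of strings suitable for passing to a process launcher.
    The window check mirrors the documented valid values (28, 21, 14).
    """
    if window not in (28, 21, 14):
        raise ValueError("window must be 28, 21, or 14")
    args = ["coilscan+", f"-infile={infile}"]
    if outfile:
        args.append(f"-outfile={outfile}")
    args.append(f"-window={window}")
    if weight:
        args.append("-weight")  # Boolean: presence means true
    args.append(f"-probability={probability}")
    if table:
        args.append(f"-table={table}")
    args.append("-default")  # accept defaults for anything not given
    return args
```

Calling build_coilscan_command("rgbya2.pir1", table="rgbya2.table") yields the arguments for a default scan that also writes the residue-by-residue table.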

 

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

CoilScan+ reads the preference values for each amino acid at each possible position in the coiled-coil heptad repeat from the weight matrix file mtidkcoils.dat. This matrix is based on a database of 26,965 residues from myosins, tropomyosins, intermediate filaments type I-V, desmosomal proteins, and kinesins. If you use -DATa=share_misc:mtkcoils.dat, CoilScan+ instead reads an alternative weight matrix based on a database of 16,968 residues from myosins, tropomyosins, and intermediate filaments type I and II.
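The lookup order for data files described at the start of this topic (a file named on the command line, then a file of the same name in the current working directory, then the public data directory) can be modeled with a short Python sketch. The function name resolve_data_file and its arguments are hypothetical, and the sketch is only a model of the documented order, not the program's actual lookup code.

```python
from pathlib import Path

def resolve_data_file(name, public_dir, explicit=None, cwd=None):
    """Return the path of the data file that would be read.

    explicit:   a file named on the command line (takes precedence).
    cwd:        working directory to check; defaults to the current one.
    public_dir: the public data directory used as the fallback.
    """
    if explicit is not None:
        return Path(explicit)
    local_dir = Path(cwd) if cwd is not None else Path.cwd()
    candidate = local_dir / name
    if candidate.is_file():
        return candidate  # a local copy overrides the public file
    return Path(public_dir) / name
```

This is why a modified copy of mtidkcoils.dat in your working directory changes the results of a scan even when you do not name it on the command line.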

PARAMETER REFERENCE

You can set the parameters listed below from the command line. Shortened forms of the parameter name, aliases, are shown, separated by commas.

-infile, -infile1, -in

The name of the input file.

-begin, -beg

First base of interest for each query sequence.

-end

Last base of interest for each query sequence.

-outfile, -outfile1, -out

Names the output file.

-seqout, -rsf

Annotated sequence output.


-window=28

Sets the window length for the coiled-coil predictions. The choices are 28, 21, and 14. See the ALGORITHM topic for more information about window length.

-weight, -wei

Gives greater weight to the first and fourth positions (positions a and d) in the heptad repeat in predicting coiled-coil protein segments. See the SUGGESTIONS topic for more information about giving greater weight to the first and fourth positions in the heptad repeat.

-check, -che, -help

Prints command line summary to the screen.

-default, -d, -def

Specifies that sensible default values be used for all parameters where possible.

-documentation, -doc

Prints banner at program startup.

-quiet, -qui

This parameter is not supported.

-data, -dat

Assigns weight matrix. The default weight matrix files are located in the $GCGROOT/share/misc folder.

-probability=0.5, -prob=0.5

Sets the minimum probability required to predict a coiled-coil segment in a protein sequence.

-table=rgbya2.table, -tab=rgbya2.table

Writes a file with the coiled-coil probability at each residue position. The output also lists the structural position in the heptad repeat (position a through g) corresponding to each residue prediction. Those residues whose coiled-coil probabilities meet or exceed the threshold you set are indicated with an asterisk (*).

-batch

Submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

-monitor, -mon

Program monitors its progress on your screen by displaying a screen trace of progress. However, when you use -default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-summary, -sum

Writes a summary of the program's completion to the screen. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -summary=false.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

 

Printed: May 27, 2005  11:56



Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ® and, the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio