SEQSTAT+

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RESTRICTIONS

COMMAND-LINE SUMMARY

PARAMETER REFERENCE


FUNCTION

SeqStat+ is a utility program that reads any number of input sequences and reports general statistics about them, such as total length, number of sequences, and average length. It also reports type-dependent information for protein or nucleotide sequences, such as G+C% content. The -fmtstr option lets you choose which results are reported.

DESCRIPTION

Advantages of Plus “+” Programs:

·        Plus programs can read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.

·        Plus programs remove the sequence length restriction of 350,000 bp.

If you do not need these features and want more interactivity, you may prefer to run the original version of the program.

 

SeqStat+ supports input formats such as BSML, FASTA, GenBank, SwissProt, and EMBL. In addition to regular files, it can read from STDIN and from GCG-format databases. When a directory is specified as input, SeqStat+ recursively processes every file within that directory.

 

EXAMPLE

% seqstat+

SeqStat+ reads in any number of sequences and prints general statistics about them, such as number of bases, average length, and percent GC.

seqstat+ of what sequence(s) ? gb_ba:a16stm2*

File                     #Bases     #Seq    AvgLen    MinLen  MaxLen    %GC
------------------------------------------------------------------------------
gb_ba:A16STM208          1349       1       1349.0    1349    1349      57.52%
gb_ba:A16STM210          1348       1       1348.0    1348    1348      57.86%
gb_ba:A16STM213          1349       1       1349.0    1349    1349      58.56%
gb_ba:A16STM214          1349       1       1349.0    1349    1349      56.56%
gb_ba:A16STM220          1372       1       1372.0    1372    1372      57.65%
gb_ba:A16STM226          1347       1       1347.0    1347    1347      58.57%
gb_ba:A16STM232          1352       1       1352.0    1352    1352      56.95%
gb_ba:A16STM262          1342       1       1342.0    1342    1342      55.51%
------------------------------------------------------------------------------
Total                    10808      8       1351.0    1342    1372      57.40%
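
The Total row can be cross-checked against the per-file rows. The short Python sketch below is an illustration only, not part of SeqStat+: it copies the #Bases and %GC values from the table and assumes that the overall %GC is the length-weighted mean of the per-file values. The variable names are hypothetical.

```python
# (entry name, #Bases, %GC) copied from the per-file rows of the example table.
rows = [
    ("A16STM208", 1349, 57.52),
    ("A16STM210", 1348, 57.86),
    ("A16STM213", 1349, 58.56),
    ("A16STM214", 1349, 56.56),
    ("A16STM220", 1372, 57.65),
    ("A16STM226", 1347, 58.57),
    ("A16STM232", 1352, 56.95),
    ("A16STM262", 1342, 55.51),
]

total_bases = sum(bases for _, bases, _ in rows)
num_seqs = len(rows)  # every entry in this example holds exactly one sequence
avg_len = total_bases / num_seqs
min_len = min(bases for _, bases, _ in rows)
max_len = max(bases for _, bases, _ in rows)

# Assumption: the overall %GC weights each entry by its length.
total_gc = sum(bases * gc for _, bases, gc in rows) / total_bases

print(total_bases, num_seqs, f"{avg_len:.1f}", min_len, max_len, f"{total_gc:.2f}%")
# prints: 10808 8 1351.0 1342 1372 57.40%
```

Under that assumption the recomputed values agree with the Total row shown above.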

 

 

OUTPUT

The output shown above is written to the console by default. If you include -outfile on the command line, the output is written to the specified file instead.

As SeqStat+ processes the input, it keeps track of a number of statistics. The statistics for each input file or sequence are kept and reported separately. In addition, the statistics for all sequences processed are accumulated and reported at the end of processing.
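
The bookkeeping described above can be sketched in Python as follows. This is a minimal illustration, not the SeqStat+ implementation: the Stats class, the summarize function, and the demo data are hypothetical, the sequences are assumed to be already parsed into plain strings, and %GC is assumed to be the count of G and C divided by the total length.

```python
from dataclasses import dataclass, field


@dataclass
class Stats:
    """Running statistics for one input file, or for the whole run."""
    lengths: list = field(default_factory=list)
    gc_count: int = 0

    def add(self, seq: str) -> None:
        seq = seq.upper()
        self.lengths.append(len(seq))
        self.gc_count += seq.count("G") + seq.count("C")

    def summary(self) -> dict:
        if not self.lengths:
            return {"bases": 0, "seqs": 0}
        bases = sum(self.lengths)
        return {
            "bases": bases,
            "seqs": len(self.lengths),
            "avg": bases / len(self.lengths),
            "min": min(self.lengths),
            "max": max(self.lengths),
            # Assumption: %GC = (G + C) / total length; 0.0 if there are no bases.
            "gc_pct": 100.0 * self.gc_count / bases if bases else 0.0,
        }


def summarize(files: dict) -> dict:
    """Return one summary per file name plus a final 'Total' entry."""
    total = Stats()
    report = {}
    for name, seqs in files.items():
        per_file = Stats()
        for seq in seqs:
            per_file.add(seq)  # statistics kept separately for this file
            total.add(seq)     # statistics accumulated over all input
        report[name] = per_file.summary()
    report["Total"] = total.summary()
    return report


# Hypothetical demo data: file name -> list of sequences.
demo = {"a.seq": ["GGCCAT", "ATAT"], "b.seq": ["GCGCNN"]}
report = summarize(demo)
for name, stats in report.items():
    print(name, stats)
```

Keeping one accumulator per file and one for the whole run means the totals are available as soon as the last sequence has been read, without a second pass over the input.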

 

The statistics displayed are controlled by a format string, which defines the fields that are printed and the order in which they appear. The available fields are listed under -fmtstr in the PARAMETER REFERENCE.

 

 

 

INPUT FILES

SeqStat+ accepts multiple (one or more) nucleotide or protein sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example Genbank:*. If SeqStat+ rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RESTRICTIONS

 

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value specified, the default parameter value, and shortened forms of the parameter name (aliases). Programs with a plus in the name accept either the full parameter name or a specified alias. If “Type” is “Boolean”, the presence of the parameter on the command line indicates a true condition; a false condition must be stated explicitly as parameter=false.

SeqStat+ reads in any number of sequences and prints general statistics about them, such as number of bases, average length, and percent GC.


Minimal Syntax: % seqstat+ [-infile=]value -Default


Minimal Parameters (case-insensitive):

-infile         [Type: List / Default: EMPTY / Aliases: infile1 in]
                Input file specification

Optional Parameters (case-insensitive):

-check          [Type: Boolean / Default: 'false' / Aliases: che help]
                Prints out this usage message
-default        [Type: Boolean / Default: 'false' / Aliases: d def]
                Specifies that sensible default values be used for all parameters where possible.
-documentation  [Type: Boolean / Default: 'true' / Aliases: doc]
                Prints banner at program startup
-quiet          [Type: Boolean / Default: 'false' / Aliases: qui]
                Tells application to print only a minimal amount of information
-outfile        [Type: OutFile / Default: '-' / Aliases: out]
                File to which statistics are written. A value of '-' means STDOUT.
-fmtstr         [Type: String / Default: 'FBSAIXG' / Aliases: fmt]
                Format string for statistics. Consists of one or more of the following characters:
                F: File name
                T: File type
                B: Number of bases
                S: Number of sequences
                A: Average sequence length
                I: Minimum sequence length
                X: Maximum sequence length
                G: G+C% content
                N: N% content

PARAMETER REFERENCE

You can set the parameters listed below from the command line. Shortened forms of the parameter name (aliases) are shown, separated by commas.

  -check, -che, -help          

 

Prints out the usage summary. 

 

  -default, -d, -def       

 

Specifies that sensible default values be used for all parameters where possible. 

 

  -documentation, -doc

 

Prints banner at program startup (default). Skip banner with:  -doc=false

 

  -quiet, -qui        

 

This parameter is not supported.

 

  -in[file]=filename..., -infile1, -in

 

This is a space-separated list of input files. If the list is the first item on the command line, the parameter tag can be omitted. If the input file name is "-", SeqStat+ reads from STDIN. This is a required parameter.

 

  -out[file]=filename, -out

 

Specifies that statistics should be written to filename. If filename is omitted or is "-", statistics are written to STDOUT. The format of the statistics is controlled by the -fmtstr option. The default value for -outfile is "-".
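
To show how the parameters fit together on one command line, the Python sketch below assembles an argument list from the flags documented on this page. It is an illustration under stated assumptions, not a tested recipe: build_command is a hypothetical helper, the file names are made up, and passing each input file as a separate leading argument is only one reading of "space-separated list". The sketch builds the list but does not run the program.

```python
def build_command(infiles, outfile=None, fmtstr=None, banner=True):
    """Assemble a seqstat+ argument list (hypothetical helper)."""
    argv = ["seqstat+"]
    # Input files come first, so the -infile= tag is omitted.
    argv.extend(infiles)
    if outfile is not None:
        argv.append(f"-outfile={outfile}")
    if fmtstr is not None:
        argv.append(f"-fmtstr={fmtstr}")
    if not banner:
        # A Boolean parameter is switched off by stating parameter=false.
        argv.append("-documentation=false")
    # Accept default values for everything not set explicitly.
    argv.append("-default")
    return argv


cmd = build_command(["gb_ba:a16stm2*"], outfile="stats.txt", fmtstr="FBSG")
print(cmd)
```

A list built this way could be handed to subprocess.run(cmd, check=True) on a system where seqstat+ is installed. Because no shell is involved, the asterisk in the sequence specification reaches the program unchanged.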

 

  -fmt[str]=optionletters, -fmt

 

This option controls what appears in the output and the order of appearance. optionletters lists the fields that are to be written. It consists of any of the following letters in any order:

 

·        f: File name

·        t: File format type

·        b: Number of bases

·        s: Number of sequences

·        i: Minimum sequence length

·        x: Maximum sequence length

·        a: Average sequence length

·        m: Median sequence length

·        g: G+C% content

·        n: N% content

 

The default value for -fmtstr is “FBSAIXG”, as given in the COMMAND-LINE SUMMARY; it produces the columns shown in the EXAMPLE section.
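
The way a format string selects and orders columns can be sketched in Python as follows. This is an illustration, not the SeqStat+ implementation. The headers for F, B, S, A, I, X, and G are taken from the EXAMPLE output; the headers for T, M, and N are invented for the sketch, and treating the letters as case-insensitive is an assumption. columns_for is a hypothetical function name.

```python
COLUMNS = {
    # Headers as they appear in the EXAMPLE output.
    "F": "File",
    "B": "#Bases",
    "S": "#Seq",
    "A": "AvgLen",
    "I": "MinLen",
    "X": "MaxLen",
    "G": "%GC",
    # Assumed headers: these fields are not shown in the EXAMPLE output.
    "T": "Type",
    "M": "MedLen",
    "N": "%N",
}


def columns_for(fmtstr: str) -> list:
    """Return column headers in the order given by the format string."""
    headers = []
    for letter in fmtstr:
        key = letter.upper()  # assumption: letters are case-insensitive
        if key not in COLUMNS:
            raise ValueError(f"unknown format letter: {letter!r}")
        headers.append(COLUMNS[key])
    return headers


print(columns_for("FBSAIXG"))
# prints: ['File', '#Bases', '#Seq', 'AvgLen', 'MinLen', 'MaxLen', '%GC']
```

With the string "FBSAIXG", the default listed in the COMMAND-LINE SUMMARY, the sketch reproduces the column order of the EXAMPLE table. A string such as "gf" would yield the %GC column followed by the file name.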

 

 

Printed: May 26, 2005 11:44


Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio