SeqStat+ is a utility program that reads any number of input sequences and reports general statistics about them, such as total length, number of sequences, and average length. It also reports additional information that depends on the sequence type (protein or nucleotide), such as G+C% content. The -fmtstr option lets you choose which results are reported.
Advantages of Plus “+” Programs:
· Plus programs can read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.
· Plus programs remove the sequence length restriction of 350,000 bp.
If you do not need these features and prefer more interactivity, you may want to run the original version of the program.
SeqStat+ supports input file formats including BSML, FASTA, GenBank, SwissProt, and EMBL. SeqStat+ can also read from STDIN and from GCG-format databases, as well as from regular files. When a directory is specified as input, SeqStat+ recursively processes all files within that directory.
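The recursive directory handling described above can be pictured with a short Python sketch. This is only an illustration of the behavior, not SeqStat+ code: the function name expand_inputs is hypothetical, and the sorted traversal order is an assumption made so that the result is predictable.

```python
import os


def expand_inputs(paths):
    """Expand input specifications into a flat list of file names.

    Assumptions (not taken from the SeqStat+ sources): directories are
    walked top-down in sorted order, and "-" (STDIN) is passed through.
    """
    files = []
    for path in paths:
        if path == "-":
            files.append(path)  # STDIN marker, nothing to expand
        elif os.path.isdir(path):
            for root, dirs, names in os.walk(path):
                dirs.sort()  # makes the traversal order deterministic
                for name in sorted(names):
                    files.append(os.path.join(root, name))
        else:
            files.append(path)  # regular file: keep as given
    return files
```

Calling expand_inputs(["project_dir", "extra.fasta"]) would return every file under project_dir, including those in subdirectories, followed by extra.fasta.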
% seqstat+
SeqStat+ reads in any number of sequences and prints general statistics about them, such as number of bases, average length, and percent GC.
seqstat+ of what sequence(s) ? gb_ba:a16stm2*
File                #Bases  #Seq   AvgLen  MinLen  MaxLen      %GC
------------------------------------------------------------------------------
gb_ba:A16STM208       1349     1   1349.0    1349    1349   57.52%
gb_ba:A16STM210       1348     1   1348.0    1348    1348   57.86%
gb_ba:A16STM213       1349     1   1349.0    1349    1349   58.56%
gb_ba:A16STM214       1349     1   1349.0    1349    1349   56.56%
gb_ba:A16STM220       1372     1   1372.0    1372    1372   57.65%
gb_ba:A16STM226       1347     1   1347.0    1347    1347   58.57%
gb_ba:A16STM232       1352     1   1352.0    1352    1352   56.95%
gb_ba:A16STM262       1342     1   1342.0    1342    1342   55.51%
------------------------------------------------------------------------------
Total                10808     8   1351.0    1342    1372   57.40%
The output shown above is written to the console by default. If you include -outfile on the command line, the output is written to the specified file instead.
As SeqStat+ processes the input, it keeps track of a number of statistics. The statistics for each input file or sequence are kept and reported separately. In addition, the statistics for all sequences processed are accumulated and reported at the end of processing.
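The per-input and overall bookkeeping described above can be sketched in Python as follows. This is a minimal illustration, not the SeqStat+ implementation: the function names are hypothetical, sequences are assumed to be plain strings, and G+C% is assumed to be the count of G and C characters divided by the total length.

```python
def summarize(name, seqs):
    """Compute the basic statistics for one group of sequences."""
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    # Assumption: G+C% counts G and C (either case) over the full length.
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    return {
        "file": name,
        "bases": total,
        "seqs": len(seqs),
        "avg": total / len(seqs) if seqs else 0.0,
        "min": min(lengths, default=0),
        "max": max(lengths, default=0),
        "gc": 100.0 * gc / total if total else 0.0,
    }


def report(inputs):
    """Return one row per input plus a final "Total" row over everything.

    `inputs` maps an input name to its list of sequence strings.
    """
    rows = [summarize(name, seqs) for name, seqs in inputs.items()]
    all_seqs = [s for seqs in inputs.values() for s in seqs]
    rows.append(summarize("Total", all_seqs))
    return rows
```

The "Total" row is computed from all sequences together rather than by averaging the per-input rows, so the overall average length and G+C% are weighted by sequence length.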
The statistics displayed are controlled by a format string, which defines which fields are printed and in what order. The available fields are listed under the -fmtstr parameter.
SeqStat+ accepts multiple (one or more) nucleotide or protein sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example Genbank:*. If SeqStat+ rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.
All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets also enclose the type of the parameter value, the default value, and any aliases (shortened forms of the parameter name). Programs with a plus in the name accept either the full parameter name or a specified alias. If “Type” is “Boolean”, the presence of the parameter on the command line indicates a true condition; a false condition must be stated explicitly as parameter=false (for example, -doc=false).
SeqStat+ reads in any number of sequences and prints general statistics about them, such as number of bases, average length, and percent GC.
Minimal Syntax: % seqstat+ [-infile=]value -Default
Minimal Parameters (case-insensitive):
-infile         [Type: List / Default: EMPTY / Aliases: infile1 in]
                Input file specification
Optional Parameters (case-insensitive):
-check          [Type: Boolean / Default: 'false' / Aliases: che help]
                Prints out this usage message
-default        [Type: Boolean / Default: 'false' / Aliases: d def]
                Specifies that sensible default values be used for all parameters where possible.
-documentation  [Type: Boolean / Default: 'true' / Aliases: doc]
                Prints banner at program startup
-quiet          [Type: Boolean / Default: 'false' / Aliases: qui]
                Tells application to print only a minimal amount of information
-outfile        [Type: OutFile / Default: '-' / Aliases: out]
                File to which statistics are written. A value of '-' means STDOUT.
-fmtstr         [Type: String / Default: 'FBSAIXG' / Aliases: fmt]
                Format string for statistics. Consists of one or more of the following characters:
                F: File name
                T: File type
                B: Number of bases
                S: Number of sequences
                A: Average sequence length
                I: Minimum sequence length
                X: Maximum sequence length
                G: G+C% content
                N: N% content
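The Boolean convention in the usage summary (presence of a parameter means true, parameter=false means false, names are case-insensitive, and aliases are accepted) can be illustrated with a small Python sketch. The defaults and aliases are copied from the summary; the function name parse_booleans and the treatment of any value other than "false" as true are assumptions made for illustration, not documented SeqStat+ behavior.

```python
# Defaults and aliases as listed in the usage summary.
DEFAULTS = {"check": False, "default": False, "documentation": True, "quiet": False}
ALIASES = {
    "che": "check", "help": "check",
    "d": "default", "def": "default",
    "doc": "documentation",
    "qui": "quiet",
}


def parse_booleans(argv):
    """Resolve the Boolean parameters found in a list of arguments."""
    values = dict(DEFAULTS)
    for arg in argv:
        if not arg.startswith("-") or arg == "-":
            continue  # positional value or the STDIN/STDOUT marker
        name, has_value, raw = arg[1:].partition("=")
        name = ALIASES.get(name.lower(), name.lower())
        if name in values:
            # Assumption: only an explicit "=false" switches a flag off.
            values[name] = (raw.lower() != "false") if has_value else True
    return values
```

For example, parse_booleans(["seqs.fasta", "-Default", "-doc=false"]) turns on default and turns off documentation, leaving check and quiet at their defaults.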
You can set the parameters listed below from the command line. Aliases (shortened forms of the parameter name) are shown after each name, separated by commas.
-check, -che, -help
Prints out the usage summary.
-default, -d, -def
Specifies that sensible default values be used for all parameters where possible.
-documentation, -doc
Prints banner at program startup (default). Skip banner with: -doc=false
-quiet, -qui
This parameter is not supported.
-in[file]=filename..., -infile1, -in
This is a space-separated list of input files. If it is the first item on the command line, the parameter tag can be omitted. If the input file name is “-”, SeqStat+ reads from STDIN. This is a required parameter.
-out[file]=filename, -out
Specifies that statistics should be written to filename. If filename is omitted or is “-”, statistics are written to STDOUT. The format of the statistics is controlled by the -fmtstr option. The default value for -outfile is “-”.
-fmt[str]=optionletters, -fmt
This option controls what appears in the output and the order in which it appears. optionletters lists the fields to be written. It consists of any of the following letters, in any order:
· f: File name
· t: File format type
· b: Number of bases
· s: Number of sequences
· i: Minimum sequence length
· x: Maximum sequence length
· a: Average sequence length
· m: Median sequence length
· g: G+C% content
· n: N% content
The default value for -fmtstr is “fbsaixg”, which matches the default shown in the usage summary and the columns of the example output above.
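How a format string selects and orders the output columns can be sketched in Python. This is an illustration only: the column titles for the file type and N% fields ("Type" and "%N") and the number formats are assumptions modeled on the example output, the names FIELDS, header, and format_row are hypothetical, and the sketch covers only the letters listed in the usage summary.

```python
# Letter -> (column title, format for the value).
FIELDS = {
    "f": ("File", "{}"),
    "t": ("Type", "{}"),        # title is an assumption
    "b": ("#Bases", "{}"),
    "s": ("#Seq", "{}"),
    "a": ("AvgLen", "{:.1f}"),
    "i": ("MinLen", "{}"),
    "x": ("MaxLen", "{}"),
    "g": ("%GC", "{:.2f}%"),
    "n": ("%N", "{:.2f}%"),     # title is an assumption
}


def header(fmtstr):
    """Column titles, in the order given by the format string."""
    return [FIELDS[c][0] for c in fmtstr.lower()]


def format_row(values, fmtstr):
    """Format one row; `values` maps field letters to raw values."""
    return [FIELDS[c][1].format(values[c]) for c in fmtstr.lower()]
```

With values taken from the first row of the example output, format_row(values, "FBSAIXG") yields the columns in the default order, while a string such as "xf" would print only the maximum length followed by the file name. In this sketch an unrecognized letter raises a KeyError.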
Printed: May 26, 2005 11:44
Technical Support: support-us@accelrys.com, support-japan@accelrys.com, or support-eu@accelrys.com
Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.
Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.