[Genhelp | Program Manual | User's Guide | Data Files | Databases | Release Notes ]
FormatDB+ combines any set of GCG sequences into a database that you can search with BLAST+.
Advantages of Plus “+” Programs:
Plus programs are enhanced to read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.
Plus programs remove the sequence length restriction of 350,000 bp.
If you do not need these features and wish to have more interactivity, you might wish to seek out and run the original program version.
BLAST+ can search only databases that have been
compressed into a special format. Such databases must be searched in their
entirety. FormatDB+ is provided to allow you to create a BLAST-searchable
database from a group of sequences that interest you.
FormatDB+ accepts any GCG multiple sequence specification as input and
creates the three or four output files necessary for BLAST+.
These files share a common base name (the database name) and must be kept
together in the same directory.
The output is written into your current working directory. If you want
your output written into another directory use the command-line parameter -DIRectory=/usr/user/HomeDir/seq/.
FormatDB+ converts the input sequences into GCG-compatible databases that can be used by alignment programs such as BLAST+.
formatdb+
of what sequence(s) ? pircat.list
What should I call the database ? calcium_pump
Working on AA sequence 'A42764.pir2'
Working on AA sequence 'A48849.pir2'
Working on AA sequence 'B31981.pir2'
Working on AA sequence 'PWBYR1.pir1'
Working on AA sequence 'S24359.pir2'
Working on AA sequence 'S71168.pir2'
Working on AA sequence 'b31981.pir2'
FormatDB+ writes its output files in your current working directory unless you redirect the output with the -directory parameter. In this example, the files are calcium_pump.psq, calcium_pump.psi, calcium_pump.psd, calcium_pump.pin, and calcium_pump.phr. A log file called formatdb.log is also created.
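Because the database files must stay together in one directory, it can be useful to confirm that none is missing after a run. The following Python sketch is not part of GCG. It assumes a protein database that uses the five file extensions from the calcium_pump example, and the function name missing_database_files is hypothetical.

```python
import os
import tempfile

# Extensions taken from the protein example (calcium_pump.psq, and so on).
# Assumption: other database types may use a different set of extensions.
PROTEIN_EXTENSIONS = ("psq", "psi", "psd", "pin", "phr")


def missing_database_files(directory, db_name, extensions=PROTEIN_EXTENSIONS):
    """Return the expected database file names that are absent from `directory`."""
    expected = [f"{db_name}.{ext}" for ext in extensions]
    return [name for name in expected
            if not os.path.isfile(os.path.join(directory, name))]


if __name__ == "__main__":
    # Demonstration with empty placeholder files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        for ext in ("psq", "pin", "phr"):
            open(os.path.join(tmp, f"calcium_pump.{ext}"), "w").close()
        print(missing_database_files(tmp, "calcium_pump"))
```

An empty result means every expected file was found in the directory that was checked.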
FormatDB+ accepts multiple sequences of the same type. You can specify
multiple sequences in a number of ways: by using a list file, for example @project.list; by
using an MSF or RSF file, for example project.msf{*}; or by using a
sequence specification with an asterisk (*) wildcard, for example Genbank:*.
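The three forms above can be told apart by their surface syntax alone. The following Python sketch illustrates that idea. It is not how GCG parses specifications, it recognizes only the three forms named above, and the function name describe_specification is hypothetical.

```python
def describe_specification(spec):
    """Classify a multiple sequence specification by its surface form.

    Only the three forms described in the text are recognized; anything
    else is reported as "other" rather than guessed at.
    """
    if spec.startswith("@"):
        return "list file"
    if spec.endswith("{*}"):
        return "MSF or RSF file"
    if "*" in spec:
        return "wildcard specification"
    return "other"


if __name__ == "__main__":
    for example in ("@project.list", "project.msf{*}", "Genbank:*"):
        print(example, "->", describe_specification(example))
```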
DataSet+ creates a GCG data library from any set of sequences in GCG format.
BLAST+ searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST+ can produce gapped alignments for the matches it finds.
DataSet creates a GCG data library from any set of sequences in GCG format.
BLAST searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST can produce gapped alignments for the matches it finds.
All the sequences compressed by FormatDB+ must be of the same type, that is, all nucleotide or all protein. The output files must be kept together in the same directory.
By default BLAST+ does
local searches by reading files from the directory whose logical name is
BLASTDB. Each database known to BLAST+ is named
in one of the three local data files: blast.rdbs, blast.ldbs, and blast.sdbs,
so if your BLAST-searchable database is in some other directory, you have to
name that directory as part of the search set specification to BLAST+. For
instance you could use a specification like /usr/user/burgess/seq/mydatabase
that includes both the directory name and the name of the BLAST-searchable
database (mydatabase in this example).
Normally, BLAST+ cannot
search databases which contain more than 2 billion sequence characters. If the
input database exceeds this size, use the VOLsize parameter to format it into one or more
volumes which can then be searched as a single entity.
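To get a feel for how many volumes a large database would be split into, the arithmetic is a ceiling division of the total character count by the per-volume limit. The Python sketch below is illustrative only. It assumes both quantities are expressed in the same unit, and the function name volumes_needed is hypothetical.

```python
def volumes_needed(total_characters, max_characters_per_volume):
    """Return how many volumes are needed to hold `total_characters`.

    A limit of 0 is treated as "single volume", matching the meaning of
    a volume size of 0 in this manual.
    """
    if total_characters < 0 or max_characters_per_volume < 0:
        raise ValueError("counts must not be negative")
    if max_characters_per_volume == 0:
        return 1
    # Ceiling division using integers only; an empty input still needs one volume.
    return max(1, -(-total_characters // max_characters_per_volume))


if __name__ == "__main__":
    # 2.5 billion characters with a limit of 500,000,000 per volume.
    print(volumes_needed(2_500_000_000, 500_000_000))
```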
All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value specified, the default parameter value, and shortened forms of the parameter name (aliases). Programs with a plus in the name use either the full parameter name or a specified alias. If “Type” is “Boolean”, then the presence of the parameter on the command line indicates a true condition. A false condition must be stated explicitly as parameter=false.
FormatDB+ creates a BLAST-searchable database from a set of input sequences.
Minimal Syntax: % formatdb+ [-infile=] value -Default
Minimal Parameters (case-insensitive):
-infile
[Type: InFile / Default: EMPTY / Aliases: infile1 in]
The name of the input sequence(s).
-outfile
[Type: OutFile / Default: EMPTY / Aliases: out outfile1]
Specified output database name.
Optional Parameters (case-insensitive):
-check
[Type: Boolean / Default: 'false' / Aliases: che help]
Prints out this usage message.
-default
[Type: Boolean / Default: 'false' / Aliases: d def]
Specifies that sensible default values be used for all parameters where
possible.
-documentation
[Type: Boolean / Default: 'true' / Aliases: doc]
Prints banner at program startup.
-quiet
[Type: Boolean / Default: 'false' / Aliases: qui]
Tells application to print only a minimal amount of information.
-directory
[Type: String / Default: EMPTY / Aliases: dir]
Writes output files into a directory other than the current directory.
-volsize
[Type: Integer / Default: '0' / Aliases: vol]
Sets maximum number of characters per database volume in millions (0
means single volume).
-parseseqid
[Type: Boolean / Default: 'true' / Aliases: par]
Create databases with additional parsed indices.
-executable
[Type: String / Default: 'extbin:formatdb_native' / Aliases: exe outfile1]
FormatDB executable to use.
You can set the parameters listed below from the command line. Shortened forms of the parameter name, aliases, are shown, separated by commas.
-infile, -infile1, -in
The name of the input sequence(s).
-outfile, -outfile1, -out
Specified output database name.
-directory=dirname, -dir
This parameter allows you to redirect the
output files written by FormatDB+ to a directory other than your current
working directory.
-check, -che, -help
Prints out this usage message.
-default, -def
Specifies that sensible default values be
used for all parameters where possible.
-documentation, -doc
Prints banner at program startup.
-quiet, -qui
This parameter is not supported.
-parseseqid, -par
Create databases with additional parsed indices.
-executable, -exe
FormatDB+ executable to use.
-volsize=500000000, -vol
Allows a database to be formatted as a series of volumes that are
automatically treated as one continuous database during BLAST+ searches. Using
multiple volumes permits you to format databases which contain more than 2
billion sequence characters. The optional numerical argument is used to specify
the maximum number of sequence characters per volume. If no numerical value is
specified, then a size limit of 500,000,000 characters is used for each volume.
If volsize=0 is specified, or if volsize is omitted, then
the database is formatted as a single volume.
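The command-line conventions in this section (a Boolean parameter is true when present, false only when written as parameter=false, and other parameters take the form -name=value) can be sketched as a small Python helper that assembles an argument list. This is an illustration, not a GCG tool. The function name build_formatdb_command is hypothetical, and the helper only builds the list; it does not run the program.

```python
def build_formatdb_command(infile, outfile, **options):
    """Return an argument list for formatdb+ using the documented conventions.

    True   -> bare parameter (presence means true)
    False  -> parameter=false
    other  -> parameter=value
    """
    argv = ["formatdb+", f"-infile={infile}", f"-outfile={outfile}"]
    for name, value in options.items():
        if value is True:
            argv.append(f"-{name}")
        elif value is False:
            argv.append(f"-{name}=false")
        else:
            argv.append(f"-{name}={value}")
    return argv


if __name__ == "__main__":
    command = build_formatdb_command(
        "@project.list",
        "calcium_pump",
        directory="/usr/user/HomeDir/seq/",
        parseseqid=False,
        default=True,
    )
    print(" ".join(command))
```

Only parameter names that appear in this section should be passed as options; the helper does not validate them.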
Printed: April 5, 2005 14:58
Technical Support: support-us@accelrys.com, support-japan@accelrys.com, or support-eu@accelrys.com
Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.
Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.