FORMATDB+

[Genhelp | Program Manual | User's Guide | Data Files | Databases | Release Notes ]

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

SPECIFYING DATABASES TO BLAST

SUGGESTIONS

COMMAND-LINE SUMMARY

PARAMETER REFERENCE


FUNCTION


FormatDB+ combines any set of GCG sequences into a database that you can search with BLAST+.

DESCRIPTION


Advantages of Plus “+” Programs:

 

- Plus programs are enhanced to read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.

- Plus programs remove the sequence length restriction of 350,000 bp.

 

If you do not need these features and want more interactivity, consider running the original version of the program.

BLAST+ can search only databases that have been compressed into a special format. Such databases must be searched in their entirety. FormatDB+ is provided to allow you to create a BLAST-searchable database from a group of sequences that interest you.

FormatDB+ accepts any GCG multiple sequence specification as input and creates the three or four output files necessary for BLAST+. These files share a common base name (the database name) and must be kept together in the same directory.

The output is written into your current working directory. If you want your output written into another directory, use the command-line parameter -DIRectory, for example -DIRectory=/usr/user/HomeDir/seq/.
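As an illustration of redirecting the output, the following Python sketch assembles (but does not run) a FormatDB+ command line. The parameter names -infile, -outfile, -directory, and -default are the ones listed in the COMMAND-LINE SUMMARY of this page; the function name and the file names are hypothetical, chosen only for the example.

```python
def build_formatdb_command(infile, outfile, directory=None):
    """Assemble an argument list for a formatdb+ run (nothing is executed)."""
    argv = ["formatdb+", f"-infile={infile}", f"-outfile={outfile}"]
    if directory is not None:
        # -directory redirects the output files away from the current directory.
        argv.append(f"-directory={directory}")
    # -default asks the program to use default values for all other parameters.
    argv.append("-default")
    return argv


command = build_formatdb_command(
    "@project.list", "calcium_pump", directory="/usr/user/HomeDir/seq/"
)
print(" ".join(command))
```

Keeping the arguments in a list, rather than one string, avoids quoting problems if the list is later handed to a process launcher.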

 

EXAMPLE


FormatDB+ converts the input sequences into GCG-compatible databases that alignment programs such as BLAST+ can use. In the session below, the sequences specified by pircat.list are combined into a database called calcium_pump.

formatdb+ of what sequence(s) ? pircat.list

What should I call the database ? calcium_pump

Working on AA sequence 'A42764.pir2'

Working on AA sequence 'A48849.pir2'

Working on AA sequence 'B31981.pir2'

Working on AA sequence 'PWBYR1.pir1'

Working on AA sequence 'S24359.pir2'

Working on AA sequence 'S71168.pir2'

Working on AA sequence 'b31981.pir2'

OUTPUT


FormatDB+ writes its database files into your current working directory unless you redirect the output with the -directory parameter. For the example above, these files are calcium_pump.psq, calcium_pump.psi, calcium_pump.psd, calcium_pump.pin, and calcium_pump.phr. A log file called formatdb.log is also created.
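Because the database files must stay together, it can be useful to check that none has gone missing after a run or a copy. The Python sketch below is one way to do that. The extension list is taken from the protein example on this page; it is an assumption that other database types or parameter settings produce the same set, so pass your own list if they differ. The function name is hypothetical.

```python
import tempfile
from pathlib import Path

# Extensions seen in the calcium_pump example; other runs may differ (assumption).
EXAMPLE_EXTENSIONS = ("psq", "psi", "psd", "pin", "phr")


def missing_database_files(directory, base_name, extensions=EXAMPLE_EXTENSIONS):
    """Return the expected database file names that are absent from directory."""
    folder = Path(directory)
    expected = [f"{base_name}.{ext}" for ext in extensions]
    return [name for name in expected if not (folder / name).is_file()]


# Demonstration with an incomplete set of empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for ext in ("psq", "pin", "phr"):
        (Path(tmp) / f"calcium_pump.{ext}").touch()
    print(missing_database_files(tmp, "calcium_pump"))
```

An empty list means every expected file is present in the directory.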

INPUT FILES


FormatDB+ accepts multiple sequences of the same type. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example Genbank:*.
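The three forms of specification described above can be told apart by their text alone. The Python sketch below does only that; it is not GCG's own parser, which certainly handles more cases, and the function name and the labels it returns are hypothetical.

```python
def describe_input_spec(spec):
    """Classify a sequence specification by the forms shown on this page."""
    if spec.startswith("@"):
        return "list file"
    if spec.endswith("{*}"):
        # Checked before the plain wildcard test because it also contains '*'.
        return "MSF or RSF file"
    if "*" in spec:
        return "wildcard specification"
    return "other"


for example in ("@project.list", "project.msf{*}", "Genbank:*"):
    print(example, "->", describe_input_spec(example))
```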

 

RELATED PROGRAMS


DataSet+ creates a GCG data library from any set of sequences in GCG format.

BLAST+ searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST+ can produce gapped alignments for the matches it finds.

DataSet creates a GCG data library from any set of sequences in GCG format.

BLAST searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST can produce gapped alignments for the matches it finds.

RESTRICTIONS


All the sequences compressed by FormatDB+ must be of the same type, that is, all nucleotide or all protein. The output files must be kept together in the same directory.
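A mixed input set can be caught before FormatDB+ is run. The Python sketch below guesses each sequence's type from its letter composition and refuses a mixed set. The heuristic (at least 90% of the letters drawn from A, C, G, T, U, N means nucleotide) is an assumption made for this example, not the rule FormatDB+ or GCG applies, and all names are hypothetical.

```python
# Letters treated as nucleotide codes by this heuristic (assumption).
NUCLEOTIDE_LETTERS = frozenset("ACGTUN")


def guess_sequence_type(sequence, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from letter composition (heuristic)."""
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    fraction = sum(ch in NUCLEOTIDE_LETTERS for ch in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def require_single_type(sequences):
    """Return the common type of all sequences, or raise if they are mixed."""
    if not sequences:
        raise ValueError("no sequences given")
    types = {name: guess_sequence_type(seq) for name, seq in sequences.items()}
    distinct = set(types.values())
    if len(distinct) > 1:
        raise ValueError(f"mixed sequence types: {types}")
    return distinct.pop()


print(require_single_type({"seq1": "ACGTTGCA", "seq2": "GGCCNNAT"}))
```

Very short or unusual sequences can fool a composition test, so treat the result as a warning aid rather than a guarantee.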

SPECIFYING DATABASES TO BLAST


By default, BLAST+ does local searches by reading files from the directory whose logical name is BLASTDB. Each database known to BLAST+ is named in one of three local data files: blast.rdbs, blast.ldbs, and blast.sdbs. If your BLAST-searchable database is in some other directory, you must name that directory as part of the search set specification to BLAST+. For instance, you could use a specification like /usr/user/burgess/seq/mydatabase, which includes both the directory name and the name of the BLAST-searchable database (mydatabase in this example).
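The lookup rule described above can be sketched in Python as follows. The sketch assumes that the BLASTDB logical name can be modelled as an environment variable, which may not match how GCG resolves logical names on your system, and it ignores the blast.rdbs, blast.ldbs, and blast.sdbs data files entirely. The function name is hypothetical.

```python
import os
import posixpath


def resolve_database(spec, environ=None):
    """Split a database specification into (directory, database name).

    A specification with a directory part is used as given; a bare name
    falls back to the BLASTDB setting (None if it is not defined).
    """
    env = os.environ if environ is None else environ
    directory, name = posixpath.split(spec)
    if directory:
        return directory, name
    return env.get("BLASTDB"), name


print(resolve_database("/usr/user/burgess/seq/mydatabase"))
```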

SUGGESTIONS


Normally, BLAST+ cannot search databases which contain more than 2 billion sequence characters. If the input database exceeds this size, use the VOLsize parameter to format it into one or more volumes which can then be searched as a single entity.

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value specified, the default parameter value, and the aliases (shortened forms of the parameter name). Programs with a plus in the name accept either the full parameter name or a specified alias. If “Type” is “Boolean”, the presence of the parameter on the command line indicates a true condition. A false condition must be stated as parameter=false.
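The Boolean convention described above can be sketched in Python as follows. The sketch assumes that a false value is written with the same leading dash as other parameters (for example -parseseqid=false); the function name is hypothetical.

```python
def render_parameter(name, value):
    """Format one command-line parameter following the Boolean convention."""
    if value is True:
        # Presence of the parameter alone indicates a true condition.
        return f"-{name}"
    if value is False:
        # A false condition must be stated explicitly.
        return f"-{name}=false"
    return f"-{name}={value}"


print(render_parameter("quiet", True))
print(render_parameter("parseseqid", False))
print(render_parameter("outfile", "calcium_pump"))
```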

FormatDB+ combines a set of sequences into a database that BLAST+ can search.


Minimal Syntax: % formatdb+ [-infile=] value -Default

Minimal Parameters (case-insensitive):

-infile         [Type: InFile / Default: EMPTY / Aliases: infile1 in]
                The name of the input sequence(s).

-outfile        [Type: OutFile / Default: EMPTY / Aliases: out outfile1]
                Specified output database name.

Optional Parameters (case-insensitive):

-check          [Type: Boolean / Default: 'false' / Aliases: che help]
                Prints out this usage message.

-default        [Type: Boolean / Default: 'false' / Aliases: d def]
                Specifies that sensible default values be used for all parameters where possible.

-documentation  [Type: Boolean / Default: 'true' / Aliases: doc]
                Prints banner at program startup.

-quiet          [Type: Boolean / Default: 'false' / Aliases: qui]
                Tells application to print only a minimal amount of information.

-directory      [Type: String / Default: EMPTY / Aliases: dir]
                Writes output files into a directory other than the current directory.

-volsize        [Type: Integer / Default: '0' / Aliases: vol]
                Sets the maximum number of characters per database volume, in millions (0 means a single volume).

-parseseqid     [Type: Boolean / Default: 'true' / Aliases: par]
                Create databases with additional parsed indices.

-executable     [Type: String / Default: 'extbin:formatdb_native' / Aliases: exe outfile1]
                FormatDB executable to use.

PARAMETER REFERENCE


You can set the parameters listed below from the command line. Shortened forms of each parameter name (aliases) are shown, separated by commas.

-infile, -infile1, -in

The name of the input sequence(s).

-outfile, -outfile1, -out

Specified output database name.

-directory=dirname, -dir

This parameter allows you to redirect the output files written by FormatDB+ to a directory other than your current working directory.

-check, -che, -help

Prints out this usage message.

-default, -def

Specifies that sensible default values be used for all parameters where possible.

-documentation, -doc

Prints banner at program startup.

-quiet, -qui

This parameter is not supported.

-parseseqid, -par

Create databases with additional parsed indices.

-executable, -exe

FormatDB+ executable to use.

-volsize=500000000, -vol

Allows a database to be formatted as a series of volumes that are automatically treated as one continuous database during BLAST+ searches. Using multiple volumes permits you to format databases that contain more than 2 billion sequence characters. The optional numerical argument specifies the maximum number of sequence characters per volume. If no numerical value is specified, a size limit of 500,000,000 characters is used for each volume. If volsize=0 is specified, or if volsize is omitted, the database is formatted as a single volume.
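To get a feel for how many volumes a given limit produces, the Python sketch below divides the total number of sequence characters by the per-volume limit and rounds up. It is a rough lower bound only: it assumes sequences pack perfectly into volumes, which the program may not do. Both arguments are plain character counts; the sketch does not decide how that count is written as a -volsize value. The function name is hypothetical.

```python
def estimate_volume_count(total_characters, volume_size):
    """Estimate the number of volumes for a database (lower bound).

    volume_size is the maximum number of characters per volume;
    0 means the database is kept as a single volume.
    """
    if total_characters < 0 or volume_size < 0:
        raise ValueError("sizes must not be negative")
    if volume_size == 0:
        return 1
    # Integer ceiling division; an empty database still needs one volume.
    return max(1, -(-total_characters // volume_size))


print(estimate_volume_count(2_300_000_000, 500_000_000))
```

For 2.3 billion characters and a limit of 500,000,000 per volume, the estimate is 5 volumes.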

 

Printed: April 5, 2005 14:58




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio