NETBLAST+

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INTERPRETING OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

CHOOSING SEARCH SETS

ALGORITHM

PRELIMINARIES

TURNING HITS INTO HSPs

GENERATING GAPPED EXTENSIONS

CONSIDERATIONS

SUGGESTIONS

FILTERING OUT LOW COMPLEXITY SEQUENCES

AMINO ACID SCORING

NUCLEOTIDE SCORING

ALTERNATIVE GENETIC CODES

NETWORK CONSIDERATIONS

COMMAND-LINE SUMMARY

CITING BLAST

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

NetBLAST+ searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. NetBLAST+ can search only databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.

DESCRIPTION

Advantages of Plus “+” Programs:

- Plus programs are enhanced to read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.

- Plus programs remove the 350,000 bp sequence length restriction.

If you do not need these features and want more interactivity, you may prefer to run the original (non-Plus) version of the program.

BLAST+, or Basic Local Alignment Search Tool, uses the method of Altschul et al. (J. Mol. Biol. 215; 403-410 (1990)) to search for similarities between a query sequence and all the sequences in a database. The query sequence and the database you want to search can be either protein or nucleic acid in any combination.

The Accelrys GCG (GCG) NetBLAST+ program is an interface to the BLAST+ service provided by NCBI, either through its web server at www.ncbi.nlm.nih.gov or through its email server, blast@ncbi.nlm.nih.gov.

At the time this document was written, the NCBI server was offering version 2 of BLAST+, which is described in Altschul et al. (Nucleic Acids Res. 25(17): 3389-3402 (1997)). Version 2 is known as "gapped BLAST" because it generates gapped alignments between query and database sequences. It also runs about three times as fast as the original version.

NetBLAST+ is very similar to BLAST+, but whereas BLAST+ searches local databases using the resources of your local server, NetBLAST+ performs only remote searches.

NetBLAST+ supports five different programs in the BLAST+ family:

BLASTP, Protein Query Searching a Protein Database

Each database sequence is compared to the query in a separate protein-protein pairwise comparison.

BLASTX, Nucleotide Query Searching a Protein Database

The query is translated in all six reading frames, and each of the six translation products is compared to each database sequence in a separate protein-protein pairwise comparison.

BLASTN, Nucleotide Query Searching a Nucleotide Database

Each database sequence is compared to the query in a separate nucleotide-nucleotide pairwise comparison.

TBLASTN, Protein Query Searching a Nucleotide Database

Each nucleotide database sequence is translated in all six reading frames, and each of the six translation products is compared to the query in a separate protein-protein pairwise comparison.

TBLASTX, Nucleotide Query Searching a Nucleotide Database

The query and each database sequence are both translated in all six reading frames, and each of the six query translations is compared with each of the six database translations, giving 36 pairwise comparisons among the 12 translation products. Because this program involves more computation than the others, the BLAST+ server at NCBI will not accept requests for searches of the Non-redundant (nr) database.
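To make the frame bookkeeping concrete, here is a minimal Python sketch of six-frame translation using the standard genetic code (NCBI translation table 1). The function names are illustrative and not part of GCG or NCBI software. The sketch ignores alternative genetic codes and IUPAC ambiguity codes, translating any codon it does not recognize as X.

```python
# Standard genetic code (NCBI table 1); codons are enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(seq: str) -> str:
    """Translate complete codons from the start of seq; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> list:
    """Return the three forward-strand translations, then the three reverse-complement ones."""
    seq = seq.upper().replace("U", "T")
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return [translate_frame(strand[offset:])
            for strand in (seq, reverse_complement)
            for offset in range(3)]


query_frames = six_frames("ATGGCCATTGTAATGGGCCGCTGA")
subject_frames = six_frames("ATGAAACGCATTAGCACCACC")
# TBLASTX-style bookkeeping: 12 translation products, 36 frame-by-frame comparisons.
comparisons = [(q, s) for q in query_frames for s in subject_frames]
print(query_frames[0], len(query_frames) + len(subject_frames), len(comparisons))
```

BLASTX needs only the query frames and TBLASTN only the database frames. TBLASTX needs both, which is why its cost grows so quickly.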

Normally, NetBLAST+ decides which BLAST+ program you want to use simply by looking at the type (protein or nucleic acid) of your query sequence and the database you have selected. In the case of nucleotide-nucleotide searches, there are two programs that can do the search. By default, BLASTN is used. To search using TBLASTX instead, use the -TBLASTX parameter.
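The selection rule amounts to a small lookup table. The Python sketch below is hypothetical: `choose_program` is not part of GCG. It uses the N/P type codes found in GCG sequence headings and treats the -TBLASTX parameter as a boolean flag.

```python
def choose_program(query_type: str, db_type: str, tblastx: bool = False) -> str:
    """Pick a BLAST+ program from the query and database types ('N' or 'P').

    Hypothetical helper that mirrors the rules described in this manual.
    """
    table = {
        ("P", "P"): "BLASTP",
        ("N", "P"): "BLASTX",
        ("N", "N"): "BLASTN",
        ("P", "N"): "TBLASTN",
    }
    try:
        program = table[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unknown sequence types: {query_type!r}, {db_type!r}") from None
    # Only nucleotide-nucleotide searches offer a choice; the flag is ignored otherwise.
    if program == "BLASTN" and tblastx:
        return "TBLASTX"
    return program
```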

NetBLAST+ can only search remote databases maintained by NCBI. Remote searches require almost no resources from your own computer. More importantly, the databases at NCBI are updated daily. Keep in mind, however, that using NetBLAST+ is a security risk if your sequence data are confidential.

BLAST+ is a statistically driven search method that finds regions of similarity between your query and database sequences and produces gapped alignments of these regions. Within these aligned regions, the sum of the scoring matrix values of their constituent symbol pairs is higher than some level that you would expect to occur by chance alone.

NetBLAST+ prompts you to set an expectation level for the entire search. By default this level is 10.0, which means that hits are reported only if they have a score that would be expected to occur purely by chance no more than 10 times in this particular search.
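The expectation level and the list-size limit both act as filters on the reported hits. The following Python sketch shows that reporting rule on the client side, using the defaults from the example session (10 and 500). The `Hit` record and `select_hits` function are hypothetical and do not reflect how the NCBI server is implemented.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str      # database sequence identifier
    bits: float    # bit score of the best alignment
    evalue: float  # expected number of chance hits scoring at least this well


def select_hits(hits, expect=10.0, limit=500):
    """Keep hits with E <= expect, best first (lowest E, then highest bits), at most `limit`."""
    kept = [h for h in hits if h.evalue <= expect]
    kept.sort(key=lambda h: (h.evalue, -h.bits))
    return kept[:limit]
```

In the example session, the answers 1 and 5 correspond to `select_hits(hits, expect=1, limit=5)`.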

EXAMPLE

Here is a session using NetBLAST+ to find sequences in the Non-redundant (nr) nucleotide database that are similar to the first 600 bases of an actinomycete 16S ribosomal RNA gene:

11:46~242> netblast+
 
BLAST+ searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST+ can produce gapped alignments for the matches it finds.
 
 
blast+ with what query sequence(s) ? gb_ba:a16stm210
Begin (* 1 *) ?
End (-1 for entire sequence) (* -1 *) ? 600
Search for query in what sequence database ? nr
Ignore hits expected to occur by chance more than n times (* 10 *) ? 1
Limit the number of sequences in my output to (* 500 *) ? 5
What should I call the output file (* <sequence_name>.blast+ *) ?
 
 
Results written to A16STM210.blast+
 
 

OUTPUT

 
BLASTN 2.2.10 [Oct-19-2004]
 
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.
RID: 1102573391-7144-145946567571.BLASTQ4
Query= A16STM210
         (600 letters)
 
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           2,787,982 sequences; 12,411,985,240 total letters
 
 
 
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value
 
emb|X92704.1|A16STM210 Uncultured Actinomycetales bacterium 16S ...  1189   0.0
emb|X92697.1|A16STM81 Uncultured Actinomycetales bacterium 16S r...   985   0.0
emb|X92709.1|A16STM232 Uncultured Actinomycetales bacterium 16S ...   952   0.0
emb|X92703.1|A16STM208 Uncultured Actinomycetales bacterium 16S ...   936   0.0
emb|X92706.1|A16STM214 Uncultured Actinomycetales bacterium 16S ...   922   0.0
 
ALIGNMENTS
>emb|X92704.1|A16STM210 Uncultured Actinomycetales bacterium 16S ribosomal RNA (clone
           TM210)
          Length = 1348
 
 Score = 1189 bits (600), Expect = 0.0
 Identities = 600/600 (100%)
 Strand = Plus / Plus
 
 
Query: 1   cgctggcggcgtgcctaacacatgcaagtcgaacgagattcagtcggtagcaataccgac 60
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1   cgctggcggcgtgcctaacacatgcaagtcgaacgagattcagtcggtagcaataccgac 60
 
 
Query: 61  gaagatctagtggtgaacgggtgagtagcacgtgagcaacctgccccgaagaccgggaca 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61  gaagatctagtggtgaacgggtgagtagcacgtgagcaacctgccccgaagaccgggaca 120
 
 
Query: 121 acaccgggaaaccggtgctaataccggatacccccatcagatcgcatggtttgatgagga 180
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 acaccgggaaaccggtgctaataccggatacccccatcagatcgcatggtttgatgagga 180
 
 
Query: 181 aatggattccgcttcgggaggggctcgcggcctatcagctagttggtgaggtaacggctc 240
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 aatggattccgcttcgggaggggctcgcggcctatcagctagttggtgaggtaacggctc 240
 
 
Query: 241 accaaggcatcgacgggtagctggtctgagaggacgatcagccacactgggactgagaca 300
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 accaaggcatcgacgggtagctggtctgagaggacgatcagccacactgggactgagaca 300
 
 
Query: 301 cggcccagactcctacgggaggcagcagtggggaatcttgcacaatgggcgaaagcctga 360
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 cggcccagactcctacgggaggcagcagtggggaatcttgcacaatgggcgaaagcctga 360
 
 
Query: 361 tgcagcaacgccgcgtgagggacgaaggctttctgagttgtaaacctcttacagcaggga 420
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 tgcagcaacgccgcgtgagggacgaaggctttctgagttgtaaacctcttacagcaggga 420
 
 
...
 
Query: 361 tgcagcaacgccgcgtgagggacgaaggctttctgagttgtaaacctcttacagcaggga 420
           |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||
Sbjct: 361 tgcagcaacgccgcgtgagggacgaaggctttctgagttgtaaacctctttcagcaggga 420
 
 
Query: 421 cgattatgacggtacctgcagaagaagccccggctaactacgtgccagcagccgcggtga 480
           ||||| |||||||||||||||||||||||||||| ||||||||||||||||||| ||| |
Sbjct: 421 cgattgtgacggtacctgcagaagaagccccggccaactacgtgccagcagccgtggtaa 480
 
 
Query: 481 tacgtagggggcgagcgttgtccggattcattgggcgtaaagagctcgtaggcggtttgg 540
           |||||||||||| |||||||||| |||| |||||||||||||||||||||||||| ||||
Sbjct: 481 tacgtagggggcaagcgttgtccagatttattgggcgtaaagagctcgtaggcggcttgg 540
 
 
Query: 541 taagtcggatgtgaaagccccaggcttaacctggagatgccactcgatactgccatggct 600
            |||||| |||||||| | ||||||| ||||||||||||||| | |||||||||||||||
Sbjct: 541 caagtcgaatgtgaaacctccaggctcaacctggagatgccatttgatactgccatggct 600
 
 
Lambda     K      H
    1.37    0.711     1.31
 
Gapped
Lambda     K      H
    1.37    0.711     1.31
 
Matrix: blastn matrix:1 -3
Gap Penalties: Existence: 5, Extension: 2
Number of Sequences: 2787982
Number of Hits to DB: 5,388,035
Number of extensions: 287159
Number of successful extensions: 59364
Number of sequences better than  1.0: 13926
Number of HSP's better than  1.0 without gapping: 13881
Number of HSP's gapped: 48889
Number of HSP's successfully gapped: 31398
Number of extra gapped extensions for HSPs above  1.0: 8568
Length of query: 600
Length of database: 12,411,985,240
Length adjustment: 22
Effective length of query: 578
Effective length of database: 12,350,649,636
Effective search space: 7138675489608
Effective search space used: 7138675489608
A: 0
X1: 11 (21.8 bits)
X2: 15 (30.0 bits)
X3: 25 (50.0 bits)
S1: 14 (25.0 bits)
S2: 22 (44.1 bits)
 
 
 
 

The listing above is part of the output from the search in the example session.

The output has four parts: 1) an introduction that tells where the search occurred and what database and query were compared; 2) a list of the sequences in the database containing HSPs (high-scoring segment pairs) whose scores were least likely to have occurred by chance; 3) a display of the alignments of the HSPs showing identical and similar residues; and 4) a list of the parameter settings used for the search.

By default, NetBLAST+ generates gapped alignments. If you restrict it to ungapped alignments, there will often be more than one segment pair associated with each database sequence.

The query sequence for this search was filtered. Filtering eliminates low-complexity regions, which commonly give spuriously high scores that reflect compositional bias rather than significant position-by-position alignment. By removing these potentially confounding matches (e.g., hits against proline-rich regions or poly-A tails) from BLAST+ reports, filtering leaves regions whose BLAST+ statistics reflect the specificity of their pairwise alignment.

INTERPRETING OUTPUT

Bit Score

Each aligned segment pair has a normalized score, expressed in bits. This score lets you estimate how large a search space you would have to examine before you would expect to find an HSP score as good as or better than this one by chance. If the bit score is 30, you would have to examine, on average, about 1 billion ($2^{30}$) independent segment pairs to find a score this good by chance. Each additional bit doubles the size of the search space. A bit score therefore represents a probability: for a bit score $S$, $1/2^{S}$ is the probability of finding such a segment pair by chance. Bit scores thus express a probability level for a sequence comparison that is independent of the size of the search.
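The relationship between a bit score and its chance probability is a power of two, as this short Python sketch shows. The function names are illustrative only.

```python
def chance_probability(bits: float) -> float:
    """Probability that one random segment pair scores at least this well: 1 / 2**bits."""
    return 2.0 ** -bits


def pairs_per_chance_hit(bits: float) -> float:
    """Average number of independent segment pairs examined per chance hit this good."""
    return 2.0 ** bits


for bits in (16, 30, 31):
    # Each additional bit doubles the number of pairs and halves the probability.
    print(f"{bits} bits: 1 in {pairs_per_chance_hit(bits):,.0f} "
          f"(p = {chance_probability(bits):.2e})")
```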

The size of the search space is proportional to the product of the query length and the total length of all sequences in the database. This product, referred to as N in Altschul's publications, is multiplied by a coefficient K to get the size of the search space. When searching protein databases with protein queries, K is about 0.13. NetBLAST+ uses estimates of K generated beforehand by random simulation (Altschul & Gish, Methods in Enzymology 266; 460-480 (1996)).

E Value

A statistical measure is associated with each pairwise comparison in the list and with each segment pair alignment. The E value shown in the list is the number of times you would expect to observe a score, or group of scores, as high as the observed one purely by chance in a search of a database this size. For E values much smaller than 1, this number is approximately the probability of such a chance observation.

An ideal search would find hits that go from extremely unlikely to ones whose best scores should have occurred by chance alone (that is, with probabilities approaching 1.0).
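E values and probabilities are closely related. In the Karlin-Altschul statistics that BLAST uses, chance HSPs are modeled as Poisson events. The probability of seeing at least one chance hit is therefore $P = 1 - e^{-E}$, which is close to $E$ when $E$ is small and approaches 1.0 as $E$ grows. The function name below is illustrative.

```python
import math


def evalue_to_pvalue(e: float) -> float:
    """Probability of at least one chance hit, assuming chance hits are Poisson with mean E."""
    return -math.expm1(-e)  # equals 1 - exp(-E), but stays accurate for very small E


for e in (1e-5, 0.01, 1.0, 10.0):
    print(e, round(evalue_to_pvalue(e), 5))
```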

N

If you request ungapped alignments from NetBLAST+, a third column of data, headed N, appears in your output. The number in that column indicates how many HSPs were involved in computing the statistics for the sequence. If the number is greater than 1, the scores of multiple HSPs were combined to produce the result. See the ALGORITHM topic for more information.

NetBLAST+ Parameters

At the end of the output is a listing of parameter settings along with some trace information about the search.
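Several values in the parameter listing of the example output are related by simple arithmetic:

- The effective query length is the query length minus the length adjustment.
- The effective database length is the database length minus one length adjustment per database sequence.
- The effective search space is the product of the effective query and database lengths.

The following Python lines reproduce the listed numbers. Treat this as an observation about this particular listing rather than a statement of how every BLAST version computes these quantities.

```python
# Values copied from the parameter listing in the example output.
query_length = 600
database_length = 12_411_985_240
number_of_sequences = 2_787_982
length_adjustment = 22

effective_query = query_length - length_adjustment
effective_database = database_length - number_of_sequences * length_adjustment
effective_search_space = effective_query * effective_database

# Compare with "Effective length of query", "Effective length of database",
# and "Effective search space" in the listing.
print(effective_query, effective_database, effective_search_space)
```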

INPUT FILES

NetBLAST+ accepts a single protein sequence or a single nucleic acid sequence as the query sequence. The function of NetBLAST+ depends on whether your input sequence is protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence is not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
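The type check can be sketched in a few lines of Python. The sketch assumes the last line of the heading is the GCG dividing line, which ends with two periods (".."), and it reads the Type: field from that line. The function name and the sample text are hypothetical. A free-text heading line that happens to end in ".." would confuse this simple version.

```python
import re

TYPE_PATTERN = re.compile(r"\bType:\s*([NP])\b")


def sequence_type(gcg_text: str) -> str:
    """Return 'N' or 'P' from the heading line that ends with '..'."""
    for line in gcg_text.splitlines():
        if line.rstrip().endswith(".."):
            match = TYPE_PATTERN.search(line)
            if match:
                return match.group(1)
            raise ValueError("dividing line has no Type: N or Type: P field")
    raise ValueError("no '..' dividing line found")


# Hypothetical single-sequence file used only for illustration.
sample = """Example heading text supplied by the user

 example.seq  Length: 12  Type: N  Check: 1234  ..

       1  ACGTACGTAC GT
"""
print(sequence_type(sample))
```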

RELATED PROGRAMS

NetBLAST searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. NetBLAST can search only databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.

BLAST+ searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST+ can produce gapped alignments for the matches it finds.

NetFetch+ retrieves sequences from NCBI listed in a NetBLAST+ output file. You can also use it to retrieve sequences individually by sequence name or accession number. The output of NetFetch+ is an RSF file.

FastA+ does a Pearson and Lipman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). For nucleotide searches, FastA+ may be more sensitive than BLAST+.

TFastA+ does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences. TFastA+ translates the nucleotide sequences in all six reading frames before performing the comparison. It is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?"

FastX does a Pearson and Lipman search for similarity between a nucleotide query sequence and a group of protein sequences, taking frameshifts into account. FastX translates both strands of the nucleic acid sequence before performing the comparison. It is designed to answer the question, "What implied protein sequences in my nucleic acid sequence are similar to sequences in a protein database?"

TFastX does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences, taking frameshifts into account. It is designed to be a replacement for TFastA+, and like TFastA+, it is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?"

SSearch does a rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). This may be the most sensitive method available for similarity searches. Compared to BLAST+ and FastA+, it can be very slow.

ProfileSearch uses a profile (representing a group of aligned sequences) as a query to search the database for new sequences with similarity to the group. The profile is created with the program ProfileMake.

MotifSearch uses a set of profiles (representing similarities within a family of sequences) as a query to either a) search a database for new sequences similar to the original family, or b) annotate the members of the original family with details of the matches between the profiles and each of the members. Normally, the profiles are created with the program MEME+.

FrameSearch searches a group of protein sequences for similarity to one or more nucleotide query sequences, or searches a group of nucleotide sequences for similarity to one or more protein query sequences. For each sequence comparison, the program finds an optimal alignment between the protein sequence and all possible codons on each strand of the nucleotide sequence. Optimal alignments may include reading frame shifts.

FindPatterns, StringSearch, and Names are other sequence identification programs.

BLAST searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST can produce gapped alignments for the matches it finds.

NetFetch retrieves sequences from NCBI listed in a NetBLAST output file. You can also use it to retrieve sequences individually by sequence name or accession number. The output of NetFetch is an RSF file.

FastA does a Pearson and Lipman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). For nucleotide searches, FastA may be more sensitive than BLAST.

TFastA does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences. TFastA translates the nucleotide sequences in all six reading frames before performing the comparison. It is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?"

FastX+ does a Pearson and Lipman search for similarity between a nucleotide query sequence and a group of protein sequences, taking frameshifts into account. FastX+ translates both strands of the nucleic acid sequence before performing the comparison. It is designed to answer the question, "What implied protein sequences in my nucleic acid sequence are similar to sequences in a protein database?"

TFastX+ does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences, taking frameshifts into account. It is designed to be a replacement for TFastA, and like TFastA, it is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?"

SSearch+ does a rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). This may be the most sensitive method available for similarity searches. Compared to BLAST and FastA, it can be very slow.

 

RESTRICTIONS

Searching remote databases opens up the possibility of unauthorized access to your query sequence. You should not use confidential query sequences for remote searches.

NetBLAST+ does not accept a conventional GCG sequence specification for the search set. You can search only databases known to NetBLAST+, maintained at NCBI, and each only in its entirety. You cannot restrict the range of the sequence used as the query.

You cannot select a list size or expectation threshold of more than 1,000. The output file can be quite large. If you run out of disk space you may have to delete one or more files before you can continue working.

Because of the way NetBLAST+ must estimate certain statistical parameters (see the ALGORITHM topic in this document), the number of scoring matrices available for use with NetBLAST+ is limited. Currently, valid choices for the -matrix parameter are BLOSUM62 (the default), BLOSUM45, BLOSUM80, PAM30, and PAM70.

Gap creation and gap extension penalties are supported in limited combinations depending upon which scoring matrix is in use. Specifying invalid gap penalties will cause the server to run the search using the default scoring matrix, BLOSUM62, with its default gap creation and extension penalties (11 and 1, respectively). The following table shows the allowed combinations for amino acids. The first values listed are the defaults for each scoring matrix.

 Scoring Matrix    Gap Opening Penalty    Gap Extension Penalty

   BLOSUM62                 11                       1
                             7                       2
                             8                       2
                             9                       2
                            10                       1
                            12                       1

   BLOSUM80                 10                       1
                             6                       2
                             7                       2
                             8                       2
                             9                       1
                            11                       1

   BLOSUM45                 14                       2
                            10                       3
                            11                       3
                            12                       3
                            13                       3
                            12                       2
                            13                       2
                            15                       2
                            16                       1
                            17                       1
                            18                       1
                            19                       1

    PAM30                    9                       1
                             3                       3
                             4                       3
                             5                       3
                             5                       2
                             6                       2
                             7                       2
                             8                       1
                            10                       1

    PAM70                   10                       1
                             4                       3
                             5                       3
                             6                       3
                             6                       2
                             7                       2
                             8                       2
                             9                       1
                            11                       1


BLASTN accepts positive integers for gap penalties, with default values of 5 and 2 for Gap Opening and Extension, respectively. If you specify an unacceptable value for either of these penalties, the corresponding default value is used.
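The gap-penalty rules above can be expressed as a small validation step. This Python sketch is a hypothetical client-side check that mirrors the behavior described in this section; the NCBI server applies its own rules. The table data are copied from the allowed combinations listed above. Because this manual does not say what happens with other matrices, the sketch simply rejects them.

```python
# (gap opening, gap extension) pairs from the table above; the first pair is the default.
ALLOWED_GAPS = {
    "BLOSUM62": [(11, 1), (7, 2), (8, 2), (9, 2), (10, 1), (12, 1)],
    "BLOSUM80": [(10, 1), (6, 2), (7, 2), (8, 2), (9, 1), (11, 1)],
    "BLOSUM45": [(14, 2), (10, 3), (11, 3), (12, 3), (13, 3), (12, 2),
                 (13, 2), (15, 2), (16, 1), (17, 1), (18, 1), (19, 1)],
    "PAM30": [(9, 1), (3, 3), (4, 3), (5, 3), (5, 2), (6, 2), (7, 2), (8, 1), (10, 1)],
    "PAM70": [(10, 1), (4, 3), (5, 3), (6, 3), (6, 2), (7, 2), (8, 2), (9, 1), (11, 1)],
}


def protein_gap_settings(matrix, opening=None, extension=None):
    """Return the (matrix, opening, extension) that a search would run with."""
    if matrix not in ALLOWED_GAPS:
        raise ValueError(f"unsupported matrix: {matrix}")
    default_opening, default_extension = ALLOWED_GAPS[matrix][0]
    if opening is None:
        opening = default_opening
    if extension is None:
        extension = default_extension
    if (opening, extension) in ALLOWED_GAPS[matrix]:
        return matrix, opening, extension
    # Invalid combination: fall back to BLOSUM62 with its default penalties, 11 and 1.
    return "BLOSUM62", 11, 1


def blastn_gap_settings(opening=None, extension=None):
    """BLASTN: each penalty must be a positive integer, otherwise its default (5 or 2) is used."""
    def acceptable(value):
        return isinstance(value, int) and value > 0
    return (opening if acceptable(opening) else 5,
            extension if acceptable(extension) else 2)
```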

Gapped alignments are not an option when running TBLASTX.

CHOOSING SEARCH SETS

To name a searchable database interactively, choose the number of the database of interest from the menu. On the command line, use a parameter like -infile2=nr to choose the name of the database you want to search.

Unfortunately, because some database names (for example, nr) appear in the menu as both nucleotide and protein databases, NetBLAST+ cannot be sure which you mean if you use a parameter like -infile2=nr on the command line. Therefore, if the database you want to search cannot be named unambiguously with the -infile2 parameter, add either -dbnucleotideonly or -dbproteinonly to the command line.
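The disambiguation rule can be sketched as follows. In this Python sketch, `resolve_database` and the `MENU` mapping are hypothetical. Only the fact that nr appears under both types comes from this manual. Rejecting both flags at once is a choice made by the sketch, not documented NetBLAST+ behavior.

```python
def resolve_database(name, menu, nucleotide_only=False, protein_only=False):
    """Return (name, type) for a database named with -infile2.

    `menu` maps each database name to the set of types ('N', 'P') under which it appears.
    """
    if nucleotide_only and protein_only:
        raise ValueError("use at most one of -dbnucleotideonly and -dbproteinonly")
    types = set(menu.get(name, ()))
    if nucleotide_only:
        types &= {"N"}
    elif protein_only:
        types &= {"P"}
    if not types:
        raise ValueError(f"{name!r} is not available with the requested type")
    if len(types) > 1:
        raise ValueError(f"{name!r} is ambiguous; add -dbnucleotideonly or -dbproteinonly")
    return name, types.pop()


MENU = {"nr": {"N", "P"}}  # hypothetical menu excerpt
```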

ALGORITHM

NetBLAST+ is a client for an implementation of gapped BLAST+ (Altschul et al., Nucleic Acids Research 25; 3389-3402 (1997)), a heuristic algorithm for searching protein and nucleic acid databases for similarities to query sequences.

This discussion uses BLASTP, which searches for similarities between protein queries and protein databases, as the prototype for BLAST+. However, the ideas apply directly to comparisons involving conceptual translations of query sequences and databases, and they extend to similarity searches between nucleic acid sequences as well.

BLAST+ compares a query sequence with a database sequence in two steps. It first locates two non-overlapping high-scoring word pairs on the same diagonal within a certain distance of each other. It then attempts to extend these putative "hits" into locally optimal alignments between the sequences being compared. A more detailed description is provided below.

PRELIMINARIES

NetBLAST+ uses a substitution matrix (such as the BLOSUM or PAM matrices) to assign a score to the alignment of any pair of amino acids. An aggregate score for an alignment segment can be computed by summing the scores of each amino acid pair in that segment. When given two sequences to compare, the original (ungapped) BLAST+ algorithm searches for arbitrary but equal length segments within each sequence that have a maximal aggregate score which meets or exceeds some threshold or cutoff score. NetBLAST+ looks for locally optimal alignments between the two sequences whose scores cannot be improved either by extending or trimming. Such locally optimal alignments are called "high-scoring segment pairs," or HSPs.
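For a single fixed pairing of positions (one diagonal), the highest-scoring segment pair can be found with the classic maximum-subarray scan (Kadane's algorithm). The segment it returns cannot be improved by extending or trimming it. This Python sketch uses the blastn +1/-3 match/mismatch scores shown in the example output ("Matrix: blastn matrix:1 -3") in place of a protein substitution matrix. The function name and sequences are illustrative.

```python
def maximal_segment_pair(query, subject, match=1, mismatch=-3):
    """Highest-scoring ungapped segment pair aligning query[i] with subject[i].

    Returns (score, start, end) with end exclusive.
    """
    best, best_start, best_end = 0, 0, 0
    running, start = 0, 0
    for i, (a, b) in enumerate(zip(query, subject)):
        pair_score = match if a == b else mismatch
        if running <= 0:
            # A non-positive prefix can only lower the total, so trim it and restart here.
            running, start = pair_score, i
        else:
            running += pair_score
        if running > best:
            best, best_start, best_end = running, start, i + 1
    return best, best_start, best_end


print(maximal_segment_pair("GGACGTACGTAA", "TTACGTACGTCC"))
```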

If you assume a simple protein model in which amino acids occur randomly at all positions, in proportion to the frequencies at which they are found in the database and query sequences, you can compute a normalized score (expressed in units called bits) from the nominal score of an HSP. Such normalized scores allow direct statistical comparison of results regardless of the scoring system used (see "Generating Gapped Extensions" for a caveat to this). Furthermore, the normalized score can be used to compute an expect value, or E-value, which is the number of distinct HSPs having at least that normalized score that are expected to occur by chance. This theory has not been proved for gapped local alignments and their associated scores, but there are indications that it remains valid (Altschul et al., 1997).
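In the standard Karlin-Altschul formulation, a raw score $S$ is normalized to a bit score $S' = (\lambda S - \ln K)/\ln 2$. The E-value is then $E = m'n' \cdot 2^{-S'}$, where $m'n'$ is the effective search space. This Python sketch applies those formulas to the numbers printed in the example output. It works in log space because the resulting E-value is far too small for a double-precision number.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def log10_evalue(bits, search_space):
    """log10 of E = search_space * 2**(-bits), computed in log space to avoid underflow."""
    return math.log10(search_space) - bits * math.log10(2)


# Lambda, K, raw score, and effective search space taken from the example BLASTN output.
bits = bit_score(600, lam=1.37, k=0.711)
print(round(bits, 1), round(log10_evalue(bits, 7_138_675_489_608)))
```

With the rounded Lambda printed in the listing, the result is about 1,186 bits, close to the reported 1189 bits. The small difference is presumably due to that rounding. The E-value is roughly $10^{-344}$, consistent with the 0.0 shown in the report.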

TURNING HITS INTO HSPs

The central idea of the BLAST+ algorithm is that any statistically significant alignment between two sequences is likely to contain a high-scoring pair of aligned words. A word is simply a sequence segment of specified length (usually 3 for protein sequences). BLAST+ begins its comparison of a query sequence to a database by scanning the database for words that score at least the threshold score T when aligned with some word within the query sequence. Any word pair satisfying this condition is called a hit. The diagonal of a hit involving words starting at positions (x, y) of the database and query sequences is defined as x-y. The distance between two hits on the same diagonal is defined as the difference between their first coordinates.
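The following Python sketch finds word hits and their diagonals. It is deliberately simple and uses two assumptions:

- It compares every word pair directly instead of using BLAST's precomputed lookup table of neighborhood words.
- It scores words with the +1/-3 nucleotide scheme. With threshold equal to the word length, a hit then requires an exact match, as BLASTN words do.

Function names are illustrative.

```python
def pair_score(a, b, match=1, mismatch=-3):
    return match if a == b else mismatch


def word_hits(query, database, w=3, threshold=3):
    """All (x, y, diagonal) where database[x:x+w] scores >= threshold against query[y:y+w]."""
    hits = []
    for x in range(len(database) - w + 1):
        for y in range(len(query) - w + 1):
            total = sum(pair_score(database[x + i], query[y + i]) for i in range(w))
            if total >= threshold:
                hits.append((x, y, x - y))  # diagonal defined as x - y
    return hits


print(word_hits("ACGTAC", "TTACGTACGG"))
```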

Once a hit is found, BLAST+ determines whether the hit lies within an alignment having an aggregate score high enough to be reported. It does this by extending the hit in both directions until the running alignment's score has dropped more than some quantity X below the maximum score yet attained. This extension step is quite costly, taking upwards of 90% of BLAST+'s execution time under most circumstances.
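The X-drop rule for ungapped extension can be sketched as follows. Starting from the word hit, the sketch walks outward along the diagonal in each direction. It keeps the best-scoring extent and stops once the running score falls more than `x_drop` below the best score seen. The function name, the +1/-3 scores, and the default `x_drop` value are assumptions made for illustration.

```python
def xdrop_ungapped(query, subject, x, y, w, x_drop=10, match=1, mismatch=-3):
    """Extend the hit subject[x:x+w] / query[y:y+w] along its diagonal.

    Returns (score, start_in_subject, end_in_subject) with end exclusive.
    """
    def s(i, j):
        return match if subject[i] == query[j] else mismatch

    seed = sum(s(x + k, y + k) for k in range(w))

    # Extend to the right of the word.
    best_right, running, right, k = 0, 0, 0, 0
    while x + w + k < len(subject) and y + w + k < len(query):
        running += s(x + w + k, y + w + k)
        k += 1
        if running > best_right:
            best_right, right = running, k
        elif running < best_right - x_drop:
            break  # dropped more than X below the best: stop

    # Extend to the left of the word.
    best_left, running, left, k = 0, 0, 0, 0
    while x - 1 - k >= 0 and y - 1 - k >= 0:
        running += s(x - 1 - k, y - 1 - k)
        k += 1
        if running > best_left:
            best_left, left = running, k
        elif running < best_left - x_drop:
            break

    return seed + best_left + best_right, x - left, x + w + right
```

A larger `x_drop` lets the extension cross short runs of mismatches to reach further matches, at the cost of more work per hit.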

In order to reduce the number of extensions it has to perform, BLAST+ takes advantage of the fact that an interesting HSP is typically much longer than a single hit. In fact, it is likely to contain multiple hits on the same diagonal within a relatively short distance of one another. Therefore, BLAST+ chooses a length A and invokes an ungapped extension if and only if two non-overlapping hits are found on the same diagonal within distance A of one another. (Any hit that overlaps the most recent one is ignored.)
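The two-hit rule can be sketched by remembering the most recent hit position on each diagonal. This Python sketch is a simplified, hypothetical version. Unlike BLAST itself, it does not also skip hits that fall inside a region that has already been extended.

```python
def two_hit_triggers(hits, w, a):
    """Return the (x, y) hits that trigger an ungapped extension under the two-hit rule.

    hits: (x, y) word-hit positions in the database and query. A hit triggers an
    extension when the most recent hit on the same diagonal (x - y) does not overlap
    it and lies no more than `a` positions before it.
    """
    last_on_diagonal = {}
    triggers = []
    for x, y in sorted(hits):
        d = x - y
        last = last_on_diagonal.get(d)
        if last is not None and x - last < w:
            continue  # overlaps the most recent hit on this diagonal: ignored
        if last is not None and x - last <= a:
            triggers.append((x, y))
        last_on_diagonal[d] = x
    return triggers


print(two_hit_triggers([(1, 3), (2, 0), (3, 1), (4, 2), (5, 3), (6, 0)], w=3, a=40))
```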

GENERATING GAPPED EXTENSIONS

Gapped extensions allow BLAST+ to maintain its sensitivity while tolerating a much higher chance of missing any single moderately scoring HSP. However, gapped extensions take about 500 times longer to execute than ungapped extensions. Therefore, BLAST+ triggers a gapped extension for an HSP only when its score exceeds a moderate score (Sg) specifically chosen so that no more than about one gapped extension is invoked per 50 database sequences.

To generate the gapped local alignment, BLAST+ uses a standard dynamic programming algorithm for pairwise sequence alignment. The algorithm traverses the cells of a path graph whose dimensions are the lengths of the two sequences being compared, performing a fixed amount of computation per cell. Starting from a single aligned pair of residues, called the seed, the dynamic programming proceeds both forward and backward through the path graph. It considers only those cells for which the optimal local alignment score falls no more than X below the best alignment score yet found. (This description is a generalization of BLAST+'s method for constructing HSPs.) The region of the path graph explored adapts to the alignment being produced.
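The following Python sketch illustrates the X-drop pruning rule for one direction of a gapped extension. It makes two simplifications, each an assumption made for brevity:

- It uses a single linear gap cost instead of separate gap-existence and gap-extension penalties.
- It scans whole rows rather than tracking the bounds of the live region.

Cells scoring more than `x_drop` below the best score so far are set to minus infinity, and the extension stops when a whole row has been pruned. BLAST runs such an extension both forward and backward from the seed. The function name is hypothetical.

```python
NEG_INF = float("-inf")


def xdrop_gapped_forward(a, b, match=1, mismatch=-3, gap=-2, x_drop=10):
    """Best score of a gapped extension of a and b that starts at their first residues."""
    best = 0
    prev = [0] + [NEG_INF] * len(b)
    for j in range(1, len(b) + 1):  # row 0: leading gaps in a
        value = prev[j - 1] + gap
        prev[j] = value if value >= best - x_drop else NEG_INF
    for i in range(1, len(a) + 1):
        cur = [NEG_INF] * (len(b) + 1)
        value = prev[0] + gap  # column 0: leading gaps in b
        cur[0] = value if value >= best - x_drop else NEG_INF
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            value = max(prev[j - 1] + pair, prev[j] + gap, cur[j - 1] + gap)
            if value >= best - x_drop:  # keep only cells within X of the best score
                cur[j] = value
                best = max(best, value)
        if all(v == NEG_INF for v in cur):
            break  # every cell in this row was pruned: the extension stops
        prev = cur
    return best
```

With a generous `x_drop`, the extension crosses the one-base insertion and scores 10. With `x_drop=1`, pruning stops it at the gap, and the score stays at 5.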

The seed for the dynamic programming is the central residue pair of the length-11 segment of the HSP having the highest alignment score. If the HSP itself is shorter than 11 residues in length, its central pair of residues is chosen.
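Seed selection can be written as a sliding-window maximum over the per-position scores of an HSP. In this Python sketch, the input is a list of the substitution scores of the HSP's aligned pairs. The function name is hypothetical. For an even-length HSP shorter than 11, the sketch picks the upper of the two middle pairs, a choice this manual does not specify.

```python
def gapped_seed(pair_scores, window=11):
    """Index, within the HSP, of the residue pair used to seed the gapped extension."""
    n = len(pair_scores)
    if n < window:
        return n // 2  # HSP shorter than the window: use its central pair
    current = sum(pair_scores[:window])
    best, best_start = current, 0
    for start in range(1, n - window + 1):
        # Slide the window one position: add the new right end, drop the old left end.
        current += pair_scores[start + window - 1] - pair_scores[start - 1]
        if current > best:
            best, best_start = current, start
    return best_start + window // 2  # central pair of the best length-11 segment
```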

The resulting gapped alignment is reported only if it has an E-value low enough to be of interest. For any alignment actually reported, BLAST+ performs a gapped extension that records "traceback" information (Sankoff and Kruskal, 1983) using a substantially larger X parameter than that employed during the search stage to increase the accuracy of the alignment.

Because BLAST+ produces gapped alignments only for those few database sequences likely to be related to the query, it cannot estimate the parameters necessary to compute normalized scores on the fly. Instead, BLAST+ must rely on estimates of these parameters generated beforehand by random simulation. For this reason, BLAST+ cannot use a scoring system for which no simulation has been performed and still produce accurate estimates of statistical significance.

CONSIDERATIONS

Bit Scores and the Size of the Search

Altschul has shown that for sequences that have diverged by a certain amount, there is an informativeness (or ability to discriminate between chance scores and significant scores) associated with each residue pair in the segment pair. This informativeness is the amount of information obtainable from each residue pair in a real alignment that can be used to distinguish the real alignment from a random one. This informativeness can be expressed in bits. The sum of the information available from each residue pair in a segment is the segment pair's score in bits. Such scores are intuitively understandable as the significance of a segment pair score. To express such a score as a fraction, divide 1 by 2 raised to the number of bits in the score. For example, if a segment pair has a bit score of 16, the corresponding fraction ($1/2^{16} = 1/65{,}536$) suggests that you should see a score this high by chance about once for every 65,000 independent segment pairs you examine.

For nucleotide sequences that have not diverged, there should be an informativeness of about 2 bits per nucleotide pair. For protein sequences that have not diverged, the informativeness should be slightly over 4 bits per amino acid pair. (The informativeness per pair goes down as the sequences diverge, and a segment pair score is maximally informative only when the score is calculated with a scoring matrix appropriate to the extent of divergence between the sequences.)
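For identical sequences, the information per aligned pair equals the Shannon entropy of the residue composition, which is largest when every residue is equally common. The sketch below computes that quantity; bits_per_identical_pair is a hypothetical name. The equal-frequency inputs are an assumption that gives the upper bound: 2 bits for nucleotides and about 4.32 bits for amino acids. Real amino acid compositions are uneven, so the actual value is somewhat lower, but still slightly over 4 bits.

```python
import math

def bits_per_identical_pair(frequencies):
    """Information (bits) per aligned identical pair, given residue frequencies.

    For an alignment of identical sequences this equals the Shannon entropy
    of the residue composition."""
    return -sum(p * math.log2(p) for p in frequencies if p > 0)

print(bits_per_identical_pair([0.25] * 4))             # 2.0
print(round(bits_per_identical_pair([0.05] * 20), 2))  # 4.32
```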

The bit scores are absolute, but the expectation of finding any particular score depends on the size of the search space. The number of places where a segment pair might originate is proportional to the product of the query length and the total length of all the sequences searched. This product is multiplied by a coefficient K to get the size of the search space. When searching protein databases with protein queries, K is approximately 0.13.

For a query sequence of length 300 searching a database of 12 million residues, the size of the search space would be 300 x 12,000,000 x 0.13 or 468,000,000. For a search this size, a score that only occurs once in every 65,000 potential segment pairs (that is, with a bit score of 16) would be expected to occur about 7,200 times by chance alone.
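The arithmetic of this example can be reproduced with a short sketch that follows the simplified search-space model of this topic; expected_chance_hits is a hypothetical name. Using the exact fraction 1/65,536 gives about 7,141 expected chance hits. The figure of 7,200 above comes from rounding to one in 65,000.

```python
def expected_chance_hits(query_len, db_residues, k, bit_score):
    """Expected number of chance segment pairs scoring at least bit_score,
    using search space = query length x database residues x K."""
    search_space = query_len * db_residues * k
    return search_space * 2.0 ** -bit_score

print(round(300 * 12_000_000 * 0.13))                           # 468000000
print(round(expected_chance_hits(300, 12_000_000, 0.13, 16)))  # 7141
```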

If the database being searched is highly redundant (as it might be if it contained several hundred homologous cytochromes), then the size of the search space calculated by these methods will overestimate the size of the real search space.

Using NetBLAST+ for Nucleotide Searches

By default NetBLAST+ ignores HSPs that do not contain a perfect match of at least 11 nucleotides (22 bits). This is stringent enough that many obviously significant relationships are not found.
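The exact-match requirement can be sketched as a test for a shared word of 11 identical bases. This is only the word-matching prerequisite, not a BLAST+ search. The name shares_exact_word is hypothetical, and the sketch compares one strand only, whereas NetBLAST+ searches both strands. The usage example shows why the default is stringent: two 20-base sequences that are 90% identical but never share more than 9 consecutive identical bases produce no hit.

```python
def shares_exact_word(query: str, subject: str, word_size: int = 11) -> bool:
    """True if query and subject share an identical run of word_size bases."""
    query, subject = query.upper(), subject.upper()
    # Every word of the subject, for constant-time membership tests.
    words = {subject[i:i + word_size]
             for i in range(len(subject) - word_size + 1)}
    return any(query[i:i + word_size] in words
               for i in range(len(query) - word_size + 1))

print(shares_exact_word("A" * 20, "AAAAAAAAAC" * 2))  # False
```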

The detection of distant relationships between proteins is easier than between nucleotide sequences, even if the nucleotide sequences have to be translated in all six frames to make the amino acid comparison. To give a rough magnitude to this generalization, it is possible to detect similarities in proteins that have diverged by 250 substitutions per 100 residues (250 PAM units) while nucleotide similarities become obscure at distances much greater than 50 substitutions per 100 nucleotides (50 DNA PAM units). Nonetheless, when the nucleotide sequences being compared do not code for proteins, you have no alternative but to search at the nucleotide level.

SUGGESTIONS

List Size Limit

A list size that is too small to display all the significant hits is a common problem. When this happens, a warning on the screen and in the output file shows the number of significant hits that are not shown in the list. To see the unlisted hits, you must run the search again with the list size limit set high enough to include everything significant. The list size cannot be set to more than 1,000, and the output can get very large at that setting. If you cannot display everything of interest with a list size limit of 1,000, see the topic FILTERING OUT LOW COMPLEXITY SEQUENCES.

Segment Pair Alignment Limit

NetBLAST+ displays alignments of segment pairs from the top 100 sequences in the list. You can adjust this limit with -alignments.

Batch Queue

Searching a large database with NetBLAST+ can take a long time, so you may want to run searches in the batch queue. You can specify that this program run at a later time in the batch queue by using -batch. Run this way, the program prompts you for all the required parameters and then automatically submits itself to the batch or at queue. For more information, see "Using the Batch Queue" in Section 3, Using Programs in the User's Guide.

Search Orientation

NetBLAST+ searches both strands of nucleic acid query sequences and database entries. It is no longer possible to limit the search to only one strand of the query or database sequences.

Relationship to FastA+

For protein database searches, BLAST+ and FastA+ have similar sensitivity, although the different algorithms employed make it possible, at least in principle, for FastA+ to find things that BLAST+ misses and vice versa. For nucleotide database searches with nucleotide query sequences, FastA+ may be more sensitive, since by default BLAST+ ignores segment pairs that do not contain a perfect match of at least 11 adjacent nucleotides (22 bits). This default misses many obviously significant relationships. If you are looking for nucleotide sequence homologs that do not code for proteins (that is, if your search cannot be done at the amino acid level), we suggest you use the FastA+ program instead of BLAST+.

FILTERING OUT LOW COMPLEXITY SEQUENCES

NetBLAST+ filters out regions of low complexity from query sequences by default. You can turn filtering off by setting -filter=false. Searches against a nucleotide database with nucleotide queries (blastn) employ the DUST filter program (Hancock and Armstrong, Comput. Appl. Biosci. 10: 67-70 (1994); Tatusov and Lipman, unpublished). All other searches employ the SEG filter program (Wootton and Federhen, Computers in Chemistry 17: 149-163 (1993); Wootton and Federhen, Methods in Enzymology 266: 554-571 (1996)). For a general discussion of the role of filtering in search strategies, see Altschul et al., Nature Genetics 6: 119-129 (1994).

Short repeats and low complexity sequences, such as glutamine-rich regions, confound most database searching methods. For NetBLAST+, the random model against which the significance of segment pair scores is evaluated assumes that at each position, each residue has a probability of occurring which is proportional to its composition in the database as a whole. Low complexity or highly repetitive sequences are inconsistent with this assumption.

Low complexity sequence found by the filter program is replaced with the letter N in nucleotide sequences and the letter X in amino acid sequences. Here is an example of a sequence aligned to a filtered copy of itself to show which parts are filtered out:

 
 
  1 MAAKIFCLIMXXXXXXXXXXXXIFPQCSQAPIASLLPPYLSPAMSSVCENPILLPYRIQQ 60
  1 MAAKIFCLIMLLGLSASAATASIFPQCSQAPIASLLPPYLSPAMSSVCENPILLPYRIQQ 60
 
 61 AIAAGIXXXXXXXXXXXXXXXXXXXXXXXXXXNIRXXXXXXXXXXXXXXYSQQQQFLPFN 120
 61 AIAAGILPLSPLFLQQSSALLQQLPLVHLLAQNIRAQQLQQLVLANLAAYSQQQQFLPFN 120
 
121 QXXXXXXXXXXXXXXXXPFSQLAAAYPRQFLPFNQLAALNSHAYVXXXXXXPFSQLAAVS 180
121 QLAALNSAAYLQQQQLLPFSQLAAAYPRQFLPFNQLAALNSHAYVQQQQLLPFSQLAAVS 180
 
181 PAAFLTQQQLLPFYLHTAPNVGTXXXXXXXXXXXXXXXTNPAAFYQQPIIGGALF 235
181 PAAFLTQQQLLPFYLHTAPNVGTLLQLQQLLPFDQLALTNPAAFYQQPIIGGALF 235
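The masking step itself can be illustrated with a simplified window-entropy filter. This is not SEG or DUST, which use more elaborate statistics. The function names, the window length of 12, and the entropy threshold of 2.2 bits are assumptions chosen for protein sequences. Every window whose Shannon entropy falls below the threshold has all of its residues replaced with X. For nucleotide sequences you would pass mask_char="N" and a much lower threshold, since four bases give at most 2 bits per position.

```python
import math
from collections import Counter

def window_entropy(segment: str) -> float:
    """Shannon entropy, in bits, of the residue composition of segment."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())

def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace every residue lying in a low-entropy window with mask_char."""
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else ch for ch, m in zip(seq, masked))

# A glutamine run flanked by more varied sequence.
masked = mask_low_complexity("MAAKIFCLIM" + "Q" * 14 + "NIRWDEGHSTVY")
```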

AMINO ACID SCORING

NetBLAST+ normally uses the BLOSUM62 scoring matrix from Henikoff and Henikoff (Proc. Natl. Acad. Sci. USA 89; 10915-10919 (1992)) whenever the sequences being compared are proteins (including cases where nucleotide databases or query sequences are translated into protein sequences before comparison). You can specify alternate scoring matrices BLOSUM45, BLOSUM80, PAM70 or PAM30 with -matrix, for example -matrix=PAM30.
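To make the use of a scoring matrix concrete, the sketch below scores an ungapped segment pair by summing matrix entries. Only the handful of BLOSUM62 entries needed for the example are included. A real search uses the full 20-by-20 matrix, or the one selected with -matrix. The names are hypothetical.

```python
# Excerpt of BLOSUM62: only the entries needed for the example below.
BLOSUM62_EXCERPT = {
    ("E", "E"): 5, ("L", "L"): 4,
    ("K", "R"): 2, ("I", "V"): 3, ("D", "E"): 2,
}

def pair_score(a: str, b: str, matrix: dict) -> int:
    """Look up a residue pair; the matrix is symmetric, so try both orders."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError if the pair is not in the excerpt

def segment_pair_score(seq1: str, seq2: str, matrix=BLOSUM62_EXCERPT) -> int:
    """Sum of matrix scores over an ungapped segment pair."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped segment pairs must have equal length")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))

print(segment_pair_score("KLIDE", "RLVEE"))  # 2 + 4 + 3 + 2 + 5 = 16
```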

The seminal paper on this subject is Stephen Altschul's "Amino acid substitution matrices from an information theoretic perspective" (J. Mol. Biol. 219; 555-565 (1991)). If you are new to this literature, an easier place to start reading might be Altschul et al., "Issues in searching molecular sequence databases" (Nature Genetics, 6; 119-129 (1994)).

NUCLEOTIDE SCORING

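There is no external scoring matrix for nucleotide-nucleotide searches (that is, searches where both the query and the database are nucleotide sequences and where you have not used -tblastx). Instead, each aligned pair of nucleotides is scored with the match reward set by -match or the mismatch penalty set by -mismatch.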
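Scoring such a nucleotide comparison reduces to adding the match reward for each identical pair and the mismatch penalty for each different pair. This sketch assumes the defaults listed for -match (1) and -mismatch (-3) in the COMMAND-LINE SUMMARY, ignores gaps, and uses the hypothetical name nucleotide_score.

```python
def nucleotide_score(seq1: str, seq2: str, match: int = 1, mismatch: int = -3) -> int:
    """Score an ungapped nucleotide segment pair with a reward and a penalty."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped segment pairs must have equal length")
    return sum(match if a == b else mismatch
               for a, b in zip(seq1.upper(), seq2.upper()))

print(nucleotide_score("ACGTACGTAA", "ACGTTCGTAA"))  # 9 matches, 1 mismatch: 6
```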

ALTERNATIVE GENETIC CODES

NetBLAST+ normally uses the standard genetic code if either the query or the database sequences require translation. If your query comes from a system where this genetic code is inappropriate, you can use -translate to select any of these alternative codes by the numbers given in the following table:

 
 
     1 Standard or Universal
     2 Vertebrate Mitochondrial
     3 Yeast Mitochondrial
     4 Mold, Protozoan, Coelenterate Mitochondrial and Mycoplasma/Spiroplasma
     5 Invertebrate Mitochondrial
     6 Ciliate Macronuclear
     7 [Do not use this index]
     8 [Do not use this index]
     9 Echinoderm Mitochondrial
    10 Alternate Ciliate (Euplotid) Macronuclear
    11 Bacterial
    12 Alternate Yeast Nuclear
    13 Ascidian Mitochondrial
    14 Flatworm Mitochondrial
    15 Alternate Ciliate (Blepharisma) Nuclear
    16 Chlorophycean Mitochondrial
    21 Trematode Mitochondrial

You cannot specify an alternate genetic code for the database sequences.

The numbering for each of these codes has changed from version 1.3 of BLAST+ (used in release 8.0 of GCG) and you can no longer use the numbers zero, seven, and eight!
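The effect of choosing a genetic code can be sketched with two of NCBI's published translation tables. They are written in NCBI's compact form: one amino acid letter per codon, with codons ordered TTT, TTC, TTA, TTG, TCT, and so on to GGG, and * for stop. Only codes 1 and 2 are included, and translate is a hypothetical helper. NetBLAST+ itself performs translation on the NCBI server. The example shows three codons that differ between the standard and vertebrate mitochondrial codes.

```python
from itertools import product

BASES = "TCAG"
# One letter per codon, codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
NCBI_TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # Standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # Vertebrate Mitochondrial
}

def codon_table(code: int) -> dict:
    """Map each of the 64 codons to its amino acid under the given code."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, NCBI_TABLES[code]))

def translate(dna: str, code: int = 1) -> str:
    """Translate complete codons of dna in frame 1."""
    table = codon_table(code)
    dna = dna.upper()
    return "".join(table[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

print(translate("ATGAGATGA", 1))  # MR*
print(translate("ATGAGATGA", 2))  # M*W
```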

NETWORK CONSIDERATIONS

There are a number of possible problems with client/server applications running over the Internet. You should try to find out if you are being charged for network communications and you should certainly worry about the security and integrity of your sequences. There is always a possibility that a server will become overloaded and that your search will take much longer than normal or that your output will be lost altogether because of a network or server computer glitch. If you are working in Europe there may be services available through EMBnet that are more appropriate or robust than the NCBI BLAST+ server.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value, the default parameter value, and shortened forms of the parameter name (aliases). Programs with a plus in the name accept either the full parameter name or one of its aliases. If “Type” is “Boolean”, the presence of the parameter on the command line indicates a true condition; a false condition must be stated as parameter=false.
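The Boolean convention can be sketched as follows. This is a hypothetical helper for illustration, not GCG's parser. It recognizes only the full parameter names passed in boolean_names, ignores aliases, and treats any value other than true or false as an error.

```python
def parse_boolean_parameters(argv, boolean_names):
    """Map each Boolean parameter found in argv to True or False.

    -name alone means true; -name=false (or -name=true) states the value
    explicitly. Names are compared case-insensitively."""
    values = {}
    for arg in argv:
        if not arg.startswith("-"):
            continue  # not a parameter, e.g. an input file name
        name, sep, value = arg[1:].partition("=")
        name = name.lower()
        if name not in boolean_names:
            continue
        if not sep:
            values[name] = True
        elif value.lower() in ("true", "false"):
            values[name] = value.lower() == "true"
        else:
            raise ValueError(f"-{name} expects true or false, got {value!r}")
    return values

print(parse_boolean_parameters(["-filter=false", "-GAPS", "in.seq"], {"filter", "gaps"}))
# {'filter': False, 'gaps': True}
```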

BLAST searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST can produce gapped alignments for the matches it finds.
 
 
Minimal Syntax: % blast+ [-infile=]value [-infile2=]value [-outfile=]value -Default
 
 
Minimal Parameters (case-insensitive):
 
-infile         [Type: List / Default: EMPTY / Aliases: infile1 in]
                Input file specification.
 
Prompted Parameters (case-insensitive):
 
-begin          [Type: Integer / Default: '1' / Aliases: beg]
                First base of interest in each query sequence.
 
-end            [Type: Integer / Default: '-1']
                Last base of interest in each query sequence.
 
-infile2        [Type: List / Default: EMPTY / Aliases: in2 db]
                Specifies database to search.
 
-expect         [Type: Double / Default: '10' / Aliases: exp]
                Ignores scores that would occur by chance more than n times.
 
-listsize       [Type: Integer / Default: '500' / Aliases: lis list]
                Sets maximum number of sequences listed in the output.
 
-outfile        [Type: OutFile / Default: '<sequence_name>.blast+' / Aliases: out outfile1]
                Names the output file. '-' for stdout.
 
Optional Parameters (case-insensitive):
 
-check          [Type: Boolean / Default: 'false' / Aliases: che help]
                Prints out this usage message.
 
-default        [Type: Boolean / Default: 'false' / Aliases: d def]
                Specifies that sensible default values be used for all parameters where possible.
 
-documentation  [Type: Boolean / Default: 'true' / Aliases: doc]
                Prints banner at program startup.
 
-quiet          [Type: Boolean / Default: 'false' / Aliases: qui]
                Tells application to print only a minimal amount of information.
 
-doclines       [Type: Integer / Default: EMPTY / Aliases: docl]
                Specifies number of documentation lines to copy.
 
-config         [Type: String / Default: '$GCGROOT/etc/blast/blastall.conf']
                Blast configuration file for the plugin.
 
-defaultnucdb   [Type: String / Default: 'genbank']
                Default nucleic database.
 
-defaultprotdb  [Type: String / Default: 'uniprot']
                Default protein database.
 
-format         [Type: String / Default: EMPTY / Aliases: fmt]
                Output format. Valid values are:
                list: Sequence list file of hits
                native: Native BLAST report
                xml: BLAST XML
 
-xml            [Type: Boolean / Default: 'false']
                Output BLAST XML format (same as -format=xml).
 
-native         [Type: Boolean / Default: 'false']
                Output native BLAST report format (same as -format=native)
 
-plugin         [Type: String / Default: 'libBlastAll.so']
                Blast plugin.
 
-algorithm      [Type: String / Default: EMPTY / Aliases: alg program prog]
                Blast algorithm.
 
-tblastx        [Type: Boolean / Default: 'false']
                If query and database are both nucleotide, translates both and does protein comparisons.
 
-dbreport       [Type: Boolean / Default: 'false' / Aliases: dbr dbs listdb listdbs dblist]
                Lists valid databases then exits.
 
-dbnucleotideonly  [Type: Boolean / Default: 'false' / Aliases: dbn]
                Searches only nucleic databases.
 
-dbproteinonly  [Type: Boolean / Default: 'false' / Aliases: dbp]
                Searches only protein databases.
 
-append         [Type: List / Default: EMPTY]
                Appends string to pass-through command line.
 
-alignments     [Type: Integer / Default: '250' / Aliases: ali align]
                Sets number of sequences for which to show alignments.
 
-chunksize      [Type: Integer / Default: '50' / Aliases: chunk]
                Sets number of sequences to submit in parallel, large values may run out of memory.
 
-wordsize       [Type: Integer / Default: '0' / Aliases: word]
                Sets word size (0 for default)
 
-match          [Type: Integer / Default: '1' / Aliases: mat]
                Sets nucleotide match reward.
 
-mismatch       [Type: Integer / Default: '-3' / Aliases: mis]
                Sets nucleotide mismatch penalty.
 
-matrix         [Type: String / Default: 'BLOSUM62' / Aliases: matr]
                Assigns the scoring matrix for proteins.
 
-gapweight      [Type: Integer / Default: '0' / Aliases: gap]
                Sets gap creation penalty (0 for default).
 
-lengthweight   [Type: Integer / Default: '0' / Aliases: len]
                Sets gap extension penalty (0 for default).
 
-hitextthreshold [Type: Integer / Default: '0' / Aliases: hitextthresh hitext]
                Sets minimum score to extend hits (0 for default).
 
-filter         [Type: Boolean / Default: 'true' / Aliases: fil]
                Enables filtering of low complexity segments out of query sequences.
 
-translate      [Type: Integer / Default: '1' / Aliases: trans]
                Names genetic code for translating query.
 
-dbtranslate    [Type: Integer / Default: '1' / Aliases: dbtrans]
                Names genetic code for translating database.
 
-effdbsize      [Type: Integer / Default: '0' / Aliases: eff]
                Sets effective database size (0 for real size)
 
-gaps           [Type: Boolean / Default: 'true']
                Enables gapped alignments.
 
-xdropoff       [Type: Integer / Default: '0' / Aliases: xdr]
                Sets X dropoff value for gapped alignments (0 for default)
 
-lowercasemask  [Type: Boolean / Default: 'false' / Aliases: low lower]
                Filters lower case characters in query sequence.
 
-hitwindow      [Type: Integer / Default: '40' / Aliases: hitw]
                Sets multiple hits window size (0 for single hit algorithm)
 
-besthits       [Type: Integer / Default: '0' / Aliases: bes best]
                Sets number of best hits to keep from a region (off by default, if used a value of 100 is recommended)
 
-megablast      [Type: Boolean / Default: 'false' / Aliases: mega]
                Uses MegaBLAST algorithm for search.
 
-processors     [Type: Integer / Default: '1' / Aliases: proc]
                Sets the number of processors to use.
 
-batch          [Type: Boolean / Default: 'false']
                Submits the job to a batch queue.
 
 

CITING BLAST

The original paper describing BLAST is Altschul, Stephen F., Gish, Warren, Miller, Webb, Myers, Eugene W., and Lipman, David J. (1990). Basic local alignment search tool. J. Mol. Biol. 215; 403-410. Gapped BLAST is described in Altschul, Stephen F., Madden, Thomas L., Schaffer, Alejandro A., Zhang, Jinghui, Zhang, Zheng, Miller, Webb, and Lipman, David J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17); 3389-3402.

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -data1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

           NetBLAST+ reads the file SHARE_ENERGY:NetBLAST+.rdbs, which lists the available remote databases in the menu.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. Shortened forms of the parameter name, aliases, are shown, separated by commas.

Following some of the optional parameters described below is a letter or short expression in parentheses. These are the names of the corresponding parameters at the bottom of your BLAST+ output.

-infile, -infile1, -in

Input file specification.

-begin, -beg

First base of interest in each query sequence.

-end

Last base of interest in each query sequence.

-infile2, -in2, -db

Specifies database to search.

-expect=10.0, -exp

This parameter lets you increase the number of hits in your output whose scores would be expected to occur by chance alone. Your setting for the number of chance hits you will tolerate is used to calculate a cutoff score (S), below which MSPs or multiple HSPs (S2) are ignored. BLAST+ calculates this cutoff score for you. Because nothing prevents many biologically significant but statistically insignificant segment pairs from being screened out, you may sometimes want to increase this parameter to have an opportunity to see them.
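Under the search-space model described in the topic Bit Scores and the Size of the Search, the expected number of chance hits at bit score S' is the search space multiplied by 2 to the power -S'. The cutoff for a given -expect value is therefore log2(search space / expect). The sketch below uses a hypothetical function name and the 468,000,000 search space from that topic's example. Raising -expect lowers the cutoff, which is why it lets weaker segment pairs through.

```python
import math

def cutoff_bit_score(search_space: float, expect: float = 10.0) -> float:
    """Bit score S' at which search_space * 2**(-S') equals expect."""
    return math.log2(search_space / expect)

print(round(cutoff_bit_score(468_000_000, 10), 1))    # 25.5
print(round(cutoff_bit_score(468_000_000, 1000), 1))  # 18.8
```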

-check, -che, -help

Prints out this usage message.

-default, -def, -d

Specifies that sensible default values be used for all parameters where possible.

-documentation, -doc

Prints banner at program startup.

-quiet, -qui

This parameter is not supported.

-doclines, -docl

Specifies number of documentation lines to copy.

-config

Blast configuration file for the plugin.

-defaultnucdb

Default nucleic database.

-defaultprotdb

Default protein database.

-format

Output format. Valid values are:
    list: Sequence list file of hits
    native: Native BLAST+ report
    xml: BLAST+ XML

-xml

Output BLAST+ XML format (same as -format=xml).

-native

Output native BLAST+ report format (same as -format=native).

-plugin

Blast plugin.

-algorithm, -alg, -program, -prog

Blast algorithm.

-dbreport, -dbr

Lists valid databases then exits.

-chunksize, -chunk

Sets the number of sequences to submit in parallel; large values may run out of memory.

-wordsize, -word

Sets word size (0 for default).

-match, -mat

Sets nucleotide match reward.

-mismatch, -mis

Sets nucleotide mismatch penalty.

-hitextthreshold, -hitext

Sets minimum score to extend hits (0 for default).

-filter, -fil

Enables filtering of low complexity segments out of query sequences.

-dbtranslate, -dbtrans

Names genetic code for translating database.

-effdbsize, -eff

Sets effective database size (0 for real size).

-gaps

Enables gapped alignments.

-xdropoff, -xdr

Sets X dropoff value for gapped alignments (0 for default).

-lowercasemask, -low

Filters lower case characters in query sequence.

-besthits, -best

Sets number of best hits to keep from a region (off by default; if used, a value of 100 is recommended).

-megablast, -mega

Uses MegaBLAST algorithm for search.

-processors, -proc

Sets the number of processors to use.

-listsize=250, -lis

Limits the number of short descriptions of matching sequences reported to any number in the range 0 through 1000, even if more sequences had scores above the expectation cutoff. The default is 250.

-alignments=100, -ali

Limits the number of pairwise alignments to any number in the range 0 through 1000, even if more sequences had scores above the expectation cutoff. The default value is 100.

The listsize and alignments limits are often too small to display everything significant. The number of sequences that would have been returned if unlimited is listed under the Run Parameters heading in the output under "Number of sequences better than." This truncation of your output is intentional, since the output from BLAST+ is usually very large and the first level of inference from most searches can be made from the most significant hits.

-matrix=BLOSUM62, -matr

Sets the amino acid substitution matrix to use. BLAST+ normally uses the BLOSUM62 amino acid substitution matrix from Henikoff and Henikoff for protein sequence comparisons (including all cases where nucleotide database or query sequences are translated before comparison). The other available options are BLOSUM45, BLOSUM80, PAM30, and PAM70. For more information, see the topic AMINO ACID SCORING.

-gapweight=11, -gap

Sets the penalty for adding a gap to the alignment. See the RESTRICTIONS topic for more information about setting the gap opening penalty.

-lengthweight=1, -len

Sets the penalty for lengthening an existing gap in the alignment. See the RESTRICTIONS topic for more information about setting the gap extension penalty.
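A gap opening penalty and a gap extension penalty together define an affine gap cost. This sketch assumes the NCBI convention, in which a gap of length k costs the opening penalty plus k times the extension penalty. With -gapweight=11 and -lengthweight=1, a single-residue gap therefore costs 12. Some other programs charge the extension penalty only for residues after the first. The name gap_cost is hypothetical.

```python
def gap_cost(length: int, gap_open: int = 11, gap_extend: int = 1) -> int:
    """Affine penalty for one gap: opening cost plus per-residue extension."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + gap_extend * length

print(gap_cost(1))  # 12
print(gap_cost(5))  # 16
```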

-tblastx

You can use this parameter when searching a nucleotide sequence database with a nucleotide query sequence. BLAST+ will then translate the query and every sequence in the database and examine all pairwise combinations to find similarities at the amino acid level.

Because such doubly translated searches require a lot of computing, NCBI currently prohibits TBLASTX searches of the nr database.

Gapped alignments are not available with TBLASTX searches.

-translate=1, -trans

When BLAST+ must translate a nucleotide query sequence, it uses the standard ("universal") genetic code. If your query comes from a system where this is inappropriate, you can select any of the alternative codes listed under the topic ALTERNATIVE GENETIC CODES.

-dbnucleotideonly, -dbn

Confines the menu to search sets containing nucleotide sequences.

-dbproteinonly, -dbp

Confines the menu to search sets containing protein sequences.

-hitwindow, -hitw

Sets multiple hits window size (0 for single hit algorithm).

-batch, -bat

Submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

-proxy="gateway.company.com:99/"

Specifies the host and port of a proxy server to use. This parameter causes the request to be sent through the proxy, which might be your company's firewall. Not all firewalls require proxy settings, so you should check with your network or system administrator before using this option. The complete URL for NCBI is passed in the GET or POST request. The syntax of the proxy specification is hostname:portnumber. If ":portnumber" is omitted, port 80 is assumed.
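The hostname:portnumber rule, including the default of port 80, can be sketched as below. The name parse_proxy is a hypothetical helper. It tolerates a trailing slash like the one in the example above and does not handle IPv6 addresses.

```python
def parse_proxy(spec: str, default_port: int = 80):
    """Split a hostname:portnumber proxy specification; port 80 if omitted."""
    spec = spec.strip().rstrip("/")
    host, sep, port = spec.rpartition(":")
    if not sep:
        return spec, default_port  # no ":portnumber" given
    return host, int(port)

print(parse_proxy("gateway.company.com:99/"))  # ('gateway.company.com', 99)
print(parse_proxy("gateway.company.com"))      # ('gateway.company.com', 80)
```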

-url=www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-blast_report

By default NetBLAST+ makes use of the HTTP server maintained by NCBI at the URL given above. Should there ever be a reason to change this URL, you may do so by specifying -URL.

-mail=blast@ncbi.nlm.nih.gov

You can send your query to NCBI's e-mail server instead by including this parameter. The HTTP and e-mail servers produce the same result; they differ only in the network mechanism with which they communicate.

The e-mail server may occasionally get backed up with requests. When this happens, repeat submissions of the same request only compound the problem. So that access to the computationally intensive BLASTX, TBLASTN, and TBLASTX programs can continue to be provided, please do not resubmit any search (not just BLASTX, TBLASTN, or TBLASTX requests) within 24 hours of the first submission if you have not received a response. A 24-hour wait should allow sufficient time for the queues to clear of other requests or for any underlying problem to be resolved.

-append="string1;string2...", -app

The GCG implementation of NetBLAST+ is what is known as a shell program. After collecting your input parameters, the shell calls the NCBI BLAST+ server and sends it a query in NCBI's e-mail query format. If you are familiar with that format, you can pass additional parameters to NCBI by using this parameter. Use semicolons to separate multiple command lines. Here is an example:

% netblast+ -append="HTML yes;PATH me@my.address.edu"
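Splitting an -append value into the separate command lines it stands for can be sketched as below. The name append_lines is a hypothetical helper. This document does not describe how the lines are then placed in the request sent to NCBI, so the sketch stops at splitting.

```python
def append_lines(append_value: str) -> list[str]:
    """Split a semicolon-separated -append value into individual command lines."""
    return [part.strip() for part in append_value.split(";") if part.strip()]

print(append_lines("HTML yes;PATH me@my.address.edu"))
# ['HTML yes', 'PATH me@my.address.edu']
```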

Please contact us if there are additional BLAST+ parameters that you would like to be available as native GCG parameters.

You can read the current version of the general BLAST+ documentation on the World Wide Web at http://www.ncbi.nlm.nih.gov/BLAST/blast_help.html. Additional documentation may be obtained by sending e-mail to blast@ncbi.nlm.nih.gov. The body of the message should contain just the single word "HELP".

Printed: May 27, 2005  13:50



Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio