NetFetch+


 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

CONSIDERATIONS

NETWORK CONSIDERATIONS

COMMAND-LINE SUMMARY

PARAMETER REFERENCE


FUNCTION


NetFetch+ retrieves sequences from NCBI listed in a NetBlast+ output file. You can also use it to retrieve sequences individually by sequence name or accession number. The output of NetFetch+ is an RSF file.

DESCRIPTION


Advantages of Plus “+” Programs:

 

*      Plus programs are enhanced to be able to read sequences in a variety of native formats such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML without conversion.

*      Plus programs remove the sequence length restriction of 350,000 bp.

 

If you do not need these features and prefer more interactivity, you may want to run the original version of the program instead.

NetFetch+ is an interface to the NetEntrez service provided by NCBI's web server at www.ncbi.nlm.nih.gov. It uses this server to perform remote retrievals. NetFetch+ reads the NetBlast+ output file, queries the NCBI web service, and returns the sequences in an RSF output file. You can also retrieve individual sequences with NetFetch+.

NetFetch+ can retrieve sequences only from the databases maintained at NCBI. Sometimes these databases and the databases searched with NetBlast+ differ, resulting in the total or partial failure of some requests. Remote searches require almost no resources from your own computer.

EXAMPLE


Here is a session using NetFetch+ to retrieve sequences listed in a NetBlast+ output file:

If the query input lists sequences from databases other than those at NCBI, for example UniProt as shown below, NetFetch+ returns a "No GIs found" message for each sequence it cannot find:
 
12:04~252> netfetch+ 17KD_RICAM.blast+
 
NetFetch+ retrieves sequences from NCBI listed in a NetBLAST+ output file. You can also use it to retrieve sequences individually by sequence name or accession number.
 
 
 
No GIs found for NCBI protein query 'UNI_SPROT:17KD_RICAM'
No GIs found for NCBI protein query 'UNI_TREMBL:AAR17678'
 
If the query input lists sequences that can be retrieved from NCBI databases, for example GenBank as shown below,
 
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value
 
emb|X92704.1|A16STM210 Uncultured Actinomycetales bacterium 16S ...  1189   0.0
emb|X92697.1|A16STM81 Uncultured Actinomycetales bacterium 16S r...   985   0.0
emb|X92709.1|A16STM232 Uncultured Actinomycetales bacterium 16S ...   952   0.0
emb|X92703.1|A16STM208 Uncultured Actinomycetales bacterium 16S ...   936   0.0
emb|X92706.1|A16STM214 Uncultured Actinomycetales bacterium 16S ...   922   0.0
 
then NetFetch+ retrieves all of the sequences shown above in RSF format.
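
The hit lines in a NetBlast+ output file carry the identifiers that are requested from NCBI. As a rough illustration only (this is not NetFetch+'s own parser), the Python sketch below pulls the accession numbers out of hit lines shaped like the sample above. The regular expression and the function name hit_accessions are assumptions made for this example.

```python
import re

# Matches a leading "db|accession.version|locus" identifier on a hit line.
# The pattern is an assumption based only on the sample lines in this section.
HIT_ID = re.compile(r"^(\w+)\|([A-Za-z0-9_]+)(?:\.(\d+))?\|(\S*)")


def hit_accessions(blast_text):
    """Return the accession numbers found on hit lines, in file order."""
    accessions = []
    for line in blast_text.splitlines():
        match = HIT_ID.match(line.strip())
        if match:
            accessions.append(match.group(2))
    return accessions


sample = """\
Sequences producing significant alignments:                      (bits) Value

emb|X92704.1|A16STM210 Uncultured Actinomycetales bacterium 16S ...  1189   0.0
emb|X92697.1|A16STM81 Uncultured Actinomycetales bacterium 16S r...   985   0.0
"""

print(hit_accessions(sample))
```

Lines that do not begin with a database tag and a vertical bar, such as the column headings, are skipped.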

OUTPUT


Part of the retrieved sequence file A16STM210.RSF is shown below:

 !!RICH_SEQUENCE 1.0
..
{
name  A16STM210
descrip    Uncultured Actinomycetales bacterium 16S ribosomal RNA (clone TM210).
type    DNA
longname  ncbi:a16stm210
checksum    2782
creation-date 12/09/2004 12:02:25
strand  1
feature 1 1348 4 square solid source
     source          1..1348
                     /organism="uncultured Actinomycetales bacterium"
                     /mol_type="genomic DNA"
                     /isolation_source="peat bog"
                     /db_xref="taxon:239730"
                     /clone="TM210"
                     /environmental_sample
                     /country="Germany"
feature 1 1348 4 square solid gene
     gene            1..1348
                     /gene="16S rRNA"
feature 1 1348 4 square solid rRNA
     rRNA            <1..>1348
                     /gene="16S rRNA"
                     /product="16S ribosomal RNA"
sequence
  CGCTGGCGGCGTGCCTAACACATGCAAGTCGAACGAGATTCAGTCGGTAGCAATACCGAC
  GAAGATCTAGTGGTGAACGGGTGAGTAGCACGTGAGCAACCTGCCCCGAAGACCGGGACA
  ACACCGGGAAACCGGTGCTAATACCGGATACCCCCATCAGATCGCATGGTTTGATGAGGA
  AATGGATTCCGCTTCGGGAGGGGCTCGCGGCCTATCAGCTAGTTGGTGAGGTAACGGCTC
  ACCAAGGCATCGACGGGTAGCTGGTCTGAGAGGACGATCAGCCACACTGGGACTGAGACA
  CGGCCCAGACTCCTACGGGAGGCAGCAGTGGGGAATCTTGCACAATGGGCGAAAGCCTGA
  TGCAGCAACGCCGCGTGAGGGACGAAGGCTTTCTGAGTTGTAAACCTCTTACAGCAGGGA
  CGATTATGACGGTACCTGCAGAAGAAGCCCCGGCTAACTACGTGCCAGCAGCCGCGGTGA
  TACGTAGGGGGCGAGCGTTGTCCGGATTCATTGGGCGTAAAGAGCTCGTAGGCGGTTTGG
  TAAGTCGGATGTGAAAGCCCCAGGCTTAACCTGGAGATGCCACTCGATACTGCCATGGCT
  AGAGTCCGGTAGGGGACCACGGAATTCCTGGTGTAGCGGTGAAATGCGCAGATATCAGGA
  GGAACACCAGTAGCGAAGGCGGTGGTCTGGGCCGGTACTGACGCTGAGGAGCGAAAGCGT
  GGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGTTGGACACTAGGTG
  TGGGGACCTATCGACGGTTTCCGTGCCGCAGCTAACGCATTAAGTGCCCCGCCTGGGGAG
  TACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGCGGAGCAT
  GTGGCTCAATTCGACGCAACGCGAAGAACCTTACCTGGGCTTGACATGTAGGTTAAGGCG
  TGGAGACACGCTGACCTTCGGGTCCTACACAGGTGGTGCATGGCTGTCGTCAGCTCGTGT
  CGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCCTATGTTGCCAGCGGG
  TAATGCCGGGGACTCGTAGGAAACTGCCGGGGTTAACTCGGAGGAAGGTGGGGATGACGT
  CAAGTCATCATGCCCCTTACGTCCAGGGCTGCACACATGCTACAATGGCCGGTACAAAGG
  GCTGCTATCCCGCGAGGGTGAGCGAATCCCATAAAGCCGGTCTCAGTTCGGATTGCAGTC
  TGCAACTCGACTGCATGAAGTCGGAGTCGCTAGTAATCCCGGATCAGCAACGCCGGGGTG
  AATACGTTCCCGGGCCTTGTACACACCG
}
 

Because NetFetch+ completes successfully if any of the requested sequences are returned, the output file may not contain all of the sequences that were requested.
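
Because a successful run does not guarantee a complete result, it can be worth checking the RSF file against the list you asked for. The following Python sketch is one possible way to do that. It assumes that each record is enclosed in braces and carries a "name" line, as in the sample above, and that you compare against names in that same form. The function names rsf_names and missing_sequences are hypothetical.

```python
def rsf_names(rsf_text):
    """Collect the value of each 'name' field found inside { ... } records."""
    names = []
    in_record = False
    for line in rsf_text.splitlines():
        stripped = line.strip()
        if stripped == "{":
            in_record = True
        elif stripped == "}":
            in_record = False
        elif in_record:
            fields = stripped.split()
            if len(fields) >= 2 and fields[0] == "name":
                names.append(fields[1])
    return names


def missing_sequences(requested, rsf_text):
    """Return the requested names that are absent from the RSF text."""
    found = {name.upper() for name in rsf_names(rsf_text)}
    return [name for name in requested if name.upper() not in found]


sample = """\
!!RICH_SEQUENCE 1.0
..
{
name  A16STM210
type    DNA
sequence
  CGCTGGCGGCGTGCCTAACACATGCAAGTCGAACGAGATTCAGTCGGTAGCAATACCGAC
}
"""

print(missing_sequences(["A16STM210", "A16STM81"], sample))
```

The comparison ignores case, which is an assumption of this sketch rather than a documented property of the RSF format.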

INPUT FILES


NetFetch+ accepts a NetBlast+ output file or the sequence name or accession number of a sequence. You can specify several sequences by placing a comma between sequence names or accession numbers.
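
As an illustration of the comma-separated form, the Python sketch below assembles a NetFetch+ command line for several accession numbers without running it. It assumes that the comma-separated list is passed as the value of -infile, following the minimal syntax in the command-line summary. The helper name netfetch_args is hypothetical.

```python
def netfetch_args(identifiers):
    """Build an argument list that requests several sequences at once."""
    ids = [item.strip() for item in identifiers if item.strip()]
    if not ids:
        raise ValueError("at least one sequence name or accession number is required")
    # Assumption: the comma-separated list is given as the -infile value.
    return ["netfetch+", "-infile=" + ",".join(ids), "-default"]


print(" ".join(netfetch_args(["X92704", "X92697", "X92709"])))
```

The resulting list can be handed to a process launcher such as Python's subprocess.run on a system where NetFetch+ is installed.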

RELATED PROGRAMS


NetBlast and NetBlast+ search for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. Both programs can search only databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.

Fetch copies GCG sequences or data files from the GCG database into your directory or displays them on your terminal screen.

NetFetch retrieves sequences from NCBI listed in a NetBlast output file. You can also use it to retrieve sequences individually by sequence name or accession number. The output of NetFetch is an RSF file.

 

RESTRICTIONS


NetFetch+ was designed specifically to search the NetEntrez server at NCBI. It is unlikely that it will work with other similar servers.

Searching remote databases opens up the possibility of unauthorized access to your query sequence. You should not use confidential query sequences for remote searches.

NetFetch+ does not accept a conventional GCG sequence specification as input. The input file is the NetBlast+ output file, not a GCG list file. Sequence specifications must be consistent with those allowed by the NCBI web server.

The NCBI databases searched by NetFetch+ may differ from the databases searched by NetBlast+, so not every sequence name listed in the NetBlast+ output file can be retrieved by NetFetch+. For example, when this document was written you could search the Alu database with NetBlast+, but that database was not available to the NetEntrez server at NCBI used by NetFetch+.

CONSIDERATIONS


Network bandwidth varies greatly from time to time and from site to site. You may want to retrieve sequences when the network is more likely to be quiet. However, be aware that waiting too long to fetch sequences may result in retrieval failures because sequences are sometimes replaced or deleted from the databases.

NetFetch+ retrieves all of the sequences into a single RSF file. Most Accelrys GCG (GCG) programs can read individual sequences directly from the RSF file. If you want to export a single sequence into a GCG single sequence file, use the program Reformat or SeqConv+.

NETWORK CONSIDERATIONS


There are a number of possible problems with client/server applications running over the Internet. You should determine whether you are charged for network communications, and note that the security and integrity of your sequences are at risk. A server may also become overloaded, so that your search takes much longer than normal, or your output may be lost altogether because of a network or server failure.

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value specified, the default parameter value, and any aliases (shortened forms of the parameter name). Programs with a plus in the name accept either the full parameter name or a specified alias. If "Type" is "Boolean", the presence of the parameter on the command line indicates a true condition. A false condition must be stated explicitly as parameter=false.

 NetFetch+ retrieves sequences from NCBI listed in a NetBlast+ output file. You can also use it to retrieve sequences individually by sequence name or accession number.
 
 
Minimal Syntax: % netfetch+ [-infile=]value -Default
 
Minimal Parameters (case-insensitive):
 
-infile         [Type: List / Default: EMPTY / Aliases: infile1 in] 
                Inputs file specification.
 
Optional Parameters (case-insensitive):
 
-check          [Type: Boolean / Default: 'false' / Aliases: che help]
                Prints out this usage message.
 
-default        [Type: Boolean / Default: 'false' / Aliases: d def]
                Specifies that sensible default values be used for all parameters where possible.
 
-documentation  [Type: Boolean / Default: 'true' / Aliases: doc]
                Prints banner at program startup.
 
-quiet          [Type: Boolean / Default: 'false' / Aliases: qui]
                Tells application to print only a minimal amount of information.
 
-outfile        [Type: OutFile / Default: EMPTY / Aliases: out]
                Destination to which the fetched file/sequence is written.
                This option only works when fetching a single file or sequence.
 
-outformat      [Type: String / Default: 'RSF' / Aliases: outfmt format]
                The desired output format for retrieved sequence data. Value
                should be one of the following: GB GENPEPT FSA EMBL SPT SW RSF
                MSF SSF BSML.
 
-top            [Type: Integer / Default: '0']
                Fetch only the top N sequences (0 for all).
 
-monitor        [Type: Boolean / Default: 'false' / Aliases: mon]
                Output progress messages.
 
-all            [Type: Boolean / Default: 'false']
                Fetch all sequences for a query.
 
-url            [Type: String / Default: 'www.ncbi.nlm.nih.gov']
                Sends HTTP request to alternative server.
 
-proxy          [Type: String / Default: EMPTY]
                Uses proxy server to send requests.
 
-proxyport      [Type: Integer / Default: '80' / Aliases: port]
                Uses proxy server on this port to send requests.
 
-type           [Type: String / Default: 's' / Aliases: typ]
                Specifies type of database to search (s = both, n = nucleic, p = protein)
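
The Boolean convention described above (bare parameter for true, parameter=false for false) can be sketched in Python as follows. This is only an illustration of the convention for scripts that assemble command lines. The function name render_parameters is hypothetical, and the sketch does not check that a parameter name is one NetFetch+ accepts.

```python
def render_parameters(params):
    """Turn a mapping of parameter names to values into command-line words."""
    rendered = []
    for name, value in params.items():
        if value is True:
            # A Boolean parameter that is present means "true".
            rendered.append("-" + name)
        elif value is False:
            # A false condition must be written out explicitly.
            rendered.append("-" + name + "=false")
        else:
            rendered.append("-" + name + "=" + str(value))
    return rendered


print(render_parameters({"monitor": True, "documentation": False, "top": 10}))
```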

PARAMETER REFERENCE


You can set the parameters listed below from the command line. Shortened forms of the parameter name, aliases, are shown, separated by commas.

-infile, -infile1, -in

 

Input file specification.

 

-top=10

Limits the retrieval to the top sequences. You specify how many sequences you want to retrieve, and NetFetch+ requests no more than that many. It always builds the request list from the sequences at the top of the list. If you specify more sequences than are listed in the input file, all of the sequences in the file are requested. If you specify zero or omit -top, all of the sequences in the input file are requested.
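
The selection rule for -top can be modeled in a few lines of Python. This is a sketch of the rule as described in this section, not code taken from NetFetch+. The function name select_top is hypothetical, and rejecting negative values is an assumption, since the text does not say how they are handled.

```python
def select_top(names, top=0):
    """Return the names that would be requested for a given -top value."""
    if top < 0:
        # Assumption: negative values are treated as an error.
        raise ValueError("top must be zero or a positive integer")
    if top == 0:
        # Zero (the default) means every sequence in the input file.
        return list(names)
    # Slicing past the end of the list simply returns the whole list.
    return list(names[:top])


hits = ["A16STM210", "A16STM81", "A16STM232"]
print(select_top(hits, 2))
print(select_top(hits, 10))
print(select_top(hits))
```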

-monitor, -mon

 

The program reports its progress by displaying a trace on your screen. However, when you use -default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor output appears in the log file.

 

-check, -che, -help

 

Prints out this usage message.

 

-default, -def, -d

 

Specifies that sensible default values be used for all parameters where possible.

 

-documentation, -doc

 

Prints banner at program startup.

 

-quiet, -qui

 

This parameter is not supported.

 

-outfile, -out

 

Destination to which the fetched file/sequence is written. This option only works when fetching a single file or sequence.

 

-outformat, -outfmt, -format

 

The desired output format for retrieved sequence data. Value should be one of the following: GB GENPEPT FSA EMBL SPT SW RSF MSF SSF BSML.

 

-all

 

Fetch all sequences for a query.

 

-proxyport, -port

 

Uses proxy server on this port to send requests.

 

-type, -typ

 

Specifies type of database to search (s = both, n = nucleic, p = protein)

 

-proxy="gateway.company.com:99/"

 

Specifies the host and port of a proxy server to use. This parameter causes the request to be sent through the proxy, which might be your company's firewall. Not all firewalls require proxy settings; therefore, you should check with your network or system administrator before using this option. The complete URL for NCBI is passed in the GET or POST request. The syntax of the proxy specification is hostname:portnumber. If ":portnumber" is omitted, port 80 is assumed.
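
To make the hostname:portnumber syntax concrete, here is a small Python sketch that splits a proxy specification into host and port and falls back to port 80 when no port is given. It only illustrates the syntax and is not how NetFetch+ itself parses the value. The function name parse_proxy is hypothetical, and tolerating a trailing slash is an assumption based on the example value shown above.

```python
def parse_proxy(spec, default_port=80):
    """Split 'hostname:portnumber' into (hostname, port)."""
    # Assumption: a trailing slash, as in the example value, is ignored.
    cleaned = spec.strip().rstrip("/")
    host, separator, port_text = cleaned.partition(":")
    if not host:
        raise ValueError("proxy specification has no hostname")
    port = int(port_text) if separator and port_text else default_port
    return host, port


print(parse_proxy("gateway.company.com:99/"))
print(parse_proxy("gateway.company.com"))
```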

 

-URL="www.blast.ncbi.nlm.nih.gov:80/"

 

Specifies the host, port, and command to use when making the request. You can specify the host only, in which case the default port and command are used. You must specify the host if you need to change the port or the command. Specifying the port is never necessary.

The syntax of the command assumes that a comma-separated list of sequence IDs will be concatenated to it before submission to NCBI. For example, if you specify:

 
        % netfetch+ -URL="www.blast.ncbi.nih.gov/htbin/Entrez/query?db=s&uid=" drome_gpdh

then NetFetch+ appends drome_gpdh to the command before submitting the request.
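
The concatenation described above can be sketched in Python as follows. The sketch only mirrors the description in this section: the sequence IDs are joined with commas and appended to the command. The function name build_request is hypothetical, and no request is actually sent.

```python
def build_request(command, identifiers):
    """Append a comma-separated list of sequence IDs to the command string."""
    return command + ",".join(identifiers)


# The command string from the example in this section.
command = "www.blast.ncbi.nih.gov/htbin/Entrez/query?db=s&uid="

print(build_request(command, ["drome_gpdh"]))
print(build_request(command, ["X92704", "X92697"]))
```

Because the IDs are appended directly, the command should end at the point where the ID list belongs, as the "uid=" at the end of the example does.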

 

You can read the current version of the NetEntrez documentation on the World Wide Web at http://www.ncbi.nlm.nih.gov/.

 

 

Printed: May 27, 2005  13:51




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio