NetFetch

[Genhelp | Program Manual | User's Guide | Data Files | Databases | Release Notes ]

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

CONSIDERATIONS

NETWORK CONSIDERATIONS

COMMAND-LINE SUMMARY

PARAMETER REFERENCE


FUNCTION


NetFetch retrieves from NCBI the sequences listed in a NetBLAST output file. You can also use it to retrieve individual sequences by sequence name or accession number. The output of NetFetch is an RSF file.

DESCRIPTION


NetFetch is an interface to the NetEntrez service provided by NCBI's web server at www.ncbi.nlm.nih.gov. It uses this server to perform remote retrievals. NetFetch reads the NetBLAST output file, queries the NCBI web service, and returns the sequences in an RSF output file. You can also retrieve individual sequences with NetFetch.

NetFetch can retrieve sequences only from the databases maintained at NCBI. Sometimes these databases and the databases searched with NetBLAST differ, resulting in the total or partial failure of some requests. Remote searches require almost no resources from your own computer.

EXAMPLE


Here is a session using NetFetch to retrieve sequences listed in a NetBLAST output file:

 
 
% netfetch
 
 NETFETCH what NCBI sequence or NetBLAST output file ?  zizm99.blastp
 
 What should I call the RSF output file (* zizm99.rsf *) ?
 
 NETFETCH complete with:
 
      Input: zizm99.blastp
     Output: zizm99.rsf
     Server: www.ncbi.nlm.nih.gov
  Requested: 25
   Returned: 25
 
%
 

OUTPUT


Below is part of the output from the example session:

 
 
!!RICH_SEQUENCE 1.0
 
NETFETCH of: zizm99.blastp  August 11, 1998 08:09
 
from server: www.ncbi.nlm.nih.gov
 
 25     Sequences Requested
 25     Sequences Returned
 
Sequences Requested
-----
sp|P04704|ZEA2_MAIZE   sp|P24449|ZEAC_MAIZE
gi|168691      pir||S47265
sp|P06674|ZEA3_MAIZE   gi|16073
sp|P24450|ZEAD_MAIZE   pir||S47266
gi|168693      pir||S07172
sp|P04705|ZEAB_MAIZE   gi|22523
gi|168701      sp|P04701|ZEAL_MAIZE
sp|P02859|ZEA1_MAIZE   prf||1107201C
sp|P06675|ZEA4_MAIZE   sp|P08416|ZEA5_MAIZE
prf||1107201B  sp|P04703|ZEA7_MAIZE
sp|P06676|ZEA8_MAIZE   sp|P06677|ZEA9_MAIZE
pir||S21969    prf||1107201G
sp|P04702|ZEA6_MAIZE
 
 ..
{
name  ZEA2_MAIZE
descrip    ZEIN-ALPHA PRECURSOR (19 KD) (CLONE ZG99).
type    PROTEIN
longname  Zea mays
sequence-ID  141598
checksum    745
creation-date  8/11/1998  8: 9:17
strand  1
comments
  LOCUS       141598        235 aa                              15-JUL-1998
  DEFINITION  ZEIN-ALPHA PRECURSOR (19 KD) (CLONE ZG99).
  ACCESSION   141598
  PID         g141598
 
///////////////////////////////////////////////////////////////////////////////
 

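Because NetFetch completes successfully if any of the requested sequences are returned, the output file may not contain all of the sequences that were requested. Compare the "Sequences Requested" and "Sequences Returned" counts in the output header to check for missing sequences.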
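The check on the requested and returned counts can be automated. The following Python sketch is not part of GCG; it assumes the RSF header has exactly the layout shown in the excerpt above (a count followed by "Sequences Requested" or "Sequences Returned", with the header ending at the ".." line). The function names read_counts and is_complete are hypothetical.

```python
import re

# Assumed header line layout, copied from the excerpt above:
#   " 25     Sequences Requested"
COUNT_LINE = re.compile(r"^\s*(\d+)\s+Sequences\s+(Requested|Returned)\s*$")


def read_counts(rsf_text):
    """Return (requested, returned) from the RSF header.

    A count that is not found in the header is returned as None.
    """
    counts = {"Requested": None, "Returned": None}
    for line in rsf_text.splitlines():
        if line.strip() == "..":
            break  # assumed end of the header, as in the excerpt
        match = COUNT_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts["Requested"], counts["Returned"]


def is_complete(rsf_text):
    """True only if both counts are present and equal."""
    requested, returned = read_counts(rsf_text)
    return requested is not None and requested == returned
```

To use it, read the RSF file into a string and pass it to is_complete; a False result suggests that some requested sequences were not returned.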

INPUT FILES


NetFetch accepts a NetBLAST output file or the sequence name or accession number of a sequence. You can specify several sequences by placing a comma between sequence names or accession numbers.

RELATED PROGRAMS


NetBLAST searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. NetBLAST can search only databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA. Fetch copies GCG sequences or data files from the GCG database into your directory or displays them on your terminal screen.

NetFetch+ retrieves sequences from NCBI listed in a NetBLAST+ output file. You can also use it to retrieve sequences individually by sequence name or accession number. The output of NetFetch+ is an RSF file.

 

 

RESTRICTIONS


NetFetch was designed specifically to search the NetEntrez server at NCBI. It is unlikely that it will work with other similar servers.

Searching remote databases opens up the possibility of unauthorized access to your query sequence. You should not use confidential query sequences for remote searches.

NetFetch does not accept a conventional GCG sequence specification for the input. The input file is the NetBLAST output file, not a GCG list file. Sequence specifications must be consistent with those allowed by the NCBI web server.

The NCBI databases searched by NetFetch may differ from the databases searched by NetBLAST, so NetFetch may not be able to retrieve every sequence name listed in the NetBLAST output file. For example, when this document was written you could search the Alu database with NetBLAST, but that database was not available to the NetEntrez server at NCBI used by NetFetch.

CONSIDERATIONS


Network bandwidth varies greatly from time to time and from site to site. You may want to retrieve sequences when the network is more likely to be quiet. However, be aware that waiting too long to fetch sequences may result in retrieval failures because sequences are sometimes replaced or deleted from the databases.

NetFetch retrieves all of the sequences into a single RSF file. Most Accelrys GCG (GCG) programs can read individual sequences directly from the RSF file. If you want to export a single sequence into a GCG single sequence file, use the program Reformat.

NETWORK CONSIDERATIONS


There are a number of possible problems with client/server applications running over the Internet. You should determine whether you are charged for network communications, and note that the security and integrity of your sequences are at risk. A server may also become overloaded, so that your search takes much longer than normal, or your output may be lost altogether because of a network or server computer glitch.

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % netfetch [-INfile1=]zizm99.blastp  -Default
 
Prompted Parameters:
 
-OUTfile1=name.rsf             specifies the output file name
 
Optional Parameters:
 
-TOP=10                        fetches only the top 10 sequences
-MONitor                       displays screen trace
-NOSUMmary                     suppresses the screen summary
-RAW                           saves the entire server response in a .raw file
-URL='"www.your.url/script="'  sends HTTP query to an alternate URL
                               rather than NCBI's Entrez server
-PROXY=gatekeeper.org          uses proxy server to send request
-TYPe=s                        specifies the type of database to search:
                                   s = both
                                   n = nucleotide
                                   p = protein
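
If you start NetFetch from a script, the command line can be assembled from the parameters summarized above. The following Python sketch is an illustration only, not part of GCG. The function build_netfetch_args and its argument names are hypothetical; the parameter spellings are copied from the summary, and only a few of the optional parameters are covered.

```python
def build_netfetch_args(source, outfile=None, top=None, db_type=None, monitor=False):
    """Assemble a netfetch command line as a list of arguments.

    source  : a NetBLAST output file, or sequence names / accession
              numbers separated by commas
    outfile : RSF output file name (-OUTfile1)
    top     : number of sequences to fetch from the top of the list (-TOP)
    db_type : "s", "n", or "p" (-TYPe)
    """
    if db_type is not None and db_type not in ("s", "n", "p"):
        raise ValueError("db_type must be 's', 'n', or 'p'")
    args = ["netfetch", "-INfile1=" + source]
    if outfile:
        args.append("-OUTfile1=" + outfile)
    if top:
        args.append("-TOP=%d" % top)
    if db_type:
        args.append("-TYPe=" + db_type)
    if monitor:
        args.append("-MONitor")
    # -Default suppresses prompting, as in the minimal syntax above.
    args.append("-Default")
    return args
```

On a system where netfetch is installed, the resulting list could be passed to subprocess.run from the Python standard library.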

PARAMETER REFERENCE


You can set the parameters listed below from the command line.

-TOP=10

Limits the retrieval to the top sequences. You specify how many sequences you want to retrieve and NetFetch will request no more than that many. It always builds the request list starting from the top of the list in the input file. If you specify more sequences than are listed in the input file, all of the sequences in the file will be requested. If you specify zero or omit -TOP, all of the sequences in the input file will be requested.
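
The selection rule described for -TOP can be written out as a short Python sketch. This is an illustration of the rule as stated here, not NetFetch's actual code; the function name select_top is hypothetical.

```python
def select_top(sequence_ids, top=0):
    """Return the IDs that would be requested for a given -TOP value.

    top == 0 (or -TOP omitted) requests every ID in the list.
    A value larger than the list length also requests every ID.
    """
    if top < 0:
        raise ValueError("top must be zero or a positive integer")
    if top == 0:
        return list(sequence_ids)
    # Slicing never runs past the end, so an oversized value is safe.
    return list(sequence_ids[:top])
```

For example, with the three IDs sp|P04704|ZEA2_MAIZE, sp|P24449|ZEAC_MAIZE, and gi|168691 in that order, a value of 2 selects the first two, and a value of 0 or 10 selects all three.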

-MONitor

Displays a screen trace of the program's progress. Messages indicate the status of the connection to NCBI, the retrieval, and the parsing of the result.

-SUMmary

Writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

-RAW

Saves the response as it comes back from NCBI in a .raw file. The file will have the same basename as the RSF file. This file will contain the entire response from NCBI including any error or informational messages.

-PROXY="gateway.company.com:99/"

Specifies the host and port of a proxy server to use. This parameter causes the request to be sent through the proxy, which might be your company's firewall. Not all firewalls require proxy settings; therefore, you should check with your network or system administrator before using this option. The complete URL for NCBI is passed in the GET or POST request. The syntax of the proxy specification is hostname:portnumber. If the ":portnumber" is omitted, port 80 is assumed.
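
The hostname:portnumber syntax and the port 80 default can be illustrated with a short Python sketch. This is not NetFetch's own parser. The function name parse_proxy is hypothetical, and ignoring a trailing slash is an assumption based on the example value shown above.

```python
def parse_proxy(spec, default_port=80):
    """Split a proxy specification into (hostname, port).

    Accepts "hostname:portnumber" or "hostname"; the port defaults
    to 80 when ":portnumber" is omitted.
    """
    # Assumption: a trailing "/" (as in the example value) is ignored.
    spec = spec.strip().rstrip("/")
    host, sep, port = spec.partition(":")
    if not host:
        raise ValueError("proxy specification has no hostname")
    return host, (int(port) if sep and port else default_port)
```

With this sketch, "gateway.company.com:99/" yields the host gateway.company.com and port 99, while "gatekeeper.org" yields port 80.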

-URL="www.blast.ncbi.nlm.nih.gov:80/"

Specifies the host, port, and command to use when making the request. You can specify the host only, in which case the default port and command are used. You must specify the host if you need to change the port or the command. Specifying the port is never necessary.

The syntax of the command assumes that a comma-separated list of sequence IDs will be concatenated to it before submission to NCBI. For example, if you specify:

 
% netfetch -URL="www.blast.ncbi.nih.gov/htbin/Entrez/query?db=s&uid="  drome_gpdh

The actual request made to NCBI will be equivalent to making the following request from a web browser:

http://www.blast.ncbi.nih.gov/htbin/Entrez/query?db=s&uid=drome_gpdh
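
The concatenation described above can be sketched in Python as follows. This only illustrates how the -URL value and the comma-separated list of sequence IDs combine into a browser-style request; it is not NetFetch's implementation, and the function name build_request_url is hypothetical.

```python
def build_request_url(url_param, sequence_ids):
    """Append a comma-separated ID list to the -URL value.

    url_param    : the value given to -URL, ending with the query
                   field that takes the IDs (for example "...&uid=")
    sequence_ids : list of sequence names or accession numbers
    """
    return "http://" + url_param + ",".join(sequence_ids)
```

Calling build_request_url with the -URL value from the example and the single ID drome_gpdh reproduces the browser request shown above. With several IDs, they are joined by commas after "uid=".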

You can read the current version of the NetEntrez documentation on the World Wide Web at http://www.ncbi.nlm.nih.gov/.

Printed: May 27, 2005  13:51




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio