Glossary


Accession Number

A unique identifying number consisting of a letter followed by five digits (for example, M13786) that is assigned to each entry in a database. Using accession numbers is the best method for specifying database entries in Accelrys GCG (GCG).

When a sequence is first entered into GenBank or UniProt, it is assigned a unique primary accession number. If that sequence is ever merged with another sequence, the accession number of the original sequence becomes a secondary accession number in the merged sequence.

For more information, see the release notes for the individual databases or see "Specifying Database Sequences by Accession Number" in the "Using Database Sequences" section of Section 2, Using Sequences.

 

Alias

A feature of UNIX shells that enables users to define abbreviations for commands and for program names (with their parameters). When an alias is used in a command line, the system substitutes the alias definition for the abbreviation (the alias). For example, the following command creates an alias called ls: % alias ls 'ls -l'. Whenever you use the ls command, the shell interprets the command as ls -l. That is, instead of just listing the files in a directory, it lists the files with additional attributes such as owner, date, and protections. For more information, see "Using Aliases" in the "For Advanced Users" section of Section 3, Using Programs.

 

At and Batch

UNIX operating system commands that enable users to submit scripts that run GCG programs and UNIX commands at a later time. These commands are useful for submitting programs to run at off-peak hours.

The ‘at’ queue is also known as the batch queue or script-execution queue. See -Batch.

 

-Batch

A command-line parameter available for some programs in GCG that enables you to run a program as a separate process in a batch queue. The program job may execute immediately as a separate process or may wait in a batch queue to execute at a later time. After you submit a job to batch, your terminal is free for other work. You can also log off the computer, and your job will still execute. For more information, see "Using the Batch Queue" in Section 3, Using Programs.

 

Checksum

An integer value given to each sequence file created with GCG editors or programs. Programs check this value each time a sequence file is used as input. It is a check against the corruption of the sequence data. For more information, see Section 2, Using Sequences.
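
As an illustration of how a check of this kind can work, the Python sketch below computes a position-weighted checksum over a sequence string and compares it with a stored value. The weighting scheme, the modulus of 10000, and the function names are assumptions chosen for this example. They are not taken from this manual and should not be read as the algorithm GCG programs actually use.

```python
def sequence_checksum(sequence: str) -> int:
    """Illustrative position-weighted checksum (assumed scheme, not GCG's documented one)."""
    total = 0
    for index, symbol in enumerate(sequence.upper()):
        # Weight each symbol by its position so that swapping two
        # different symbols changes the result.
        weight = (index % 57) + 1
        total += weight * ord(symbol)
    # Keep the result small enough to store as a short integer.
    return total % 10000


def is_intact(sequence: str, stored_checksum: int) -> bool:
    """Return True when the sequence still matches the checksum stored with it."""
    return sequence_checksum(sequence) == stored_checksum
```

Because each symbol is weighted by its position, a transposition such as ACGT becoming CAGT produces a different value, which is the kind of corruption a stored checksum is meant to reveal.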

 

Command-Line Control

A feature of the computer operating system that enables you to provide more information on the command line than just the name of the program or command. For example, in GCG, you can include the input filename, the output filename, and program parameters on the command line. For more information, see "Using Command-Line Control" in Section 3, Using Programs.

 

Database

A repository of sequence information established and maintained by the scientific community. Each database entry consists of a reference section and the sequence itself. Databases must be formatted specially for use with GCG.

A number of databases are available that allow you to search and retrieve sequence information, including GenBank, UniProt, and PIR. For more information, see "Using Database Sequences" in Section 2, Using Sequences.

GCG also allows you to create your own personal databases with the DataSet+ program. For more information, see "Using Personal Databases" in the "For Advanced Users" section of Section 2, Using Sequences.

 

Data Files

Files containing information essential for running GCG programs. For example, the Map program requires the data file Enzyme.Dat, which contains information about recognition sites of restriction enzymes.

Data files can be either default or local. Default data files are those files programs use unless you specify otherwise. That is, you are never required to supply a data file because a default data file is always available. Local data files are those data files you specify a program should use instead of the default.

GCG provides the default data files in the public directory with the logical name SHARE. For more information, see Section 4, Using Data Files.

 

Directory

A unit of organization for storing information on a computer. Within a directory, you can store subdirectories and files. Directories and subdirectories are analogous to drawers in a filing cabinet.

The top directory, or home directory, refers to the directory you log into. The current directory, or working directory, refers to the directory you are working in presently. For more information, see "Working with Directories" in Section 1, Getting Started.

 

File

A basic unit of storage on a computer--for example sequence information, the output of a program, or a memo to other individuals in your lab. Most GCG programs require one or more files as input and produce an output file of results. For more information, see "Working with Files" in Section 1, Getting Started.

 

File of Sequence Names

Replaced by the term list file. See List File.

 

FOSN

File of Sequence Names. This term has been replaced by the term list file. See List File.

 

Global Parameters

Optional command-line parameters that are available to all GCG programs that support command-line control. (To avoid repetition, these parameters are not displayed in the "Command-Line Summary" in online help or in the Program Manual.) For more information, see "Using Global Parameters" in the "Customizing Program Analyses" section of Section 3, Using Programs.

 

Global Switch

Commands you can issue after initializing GCG that affect every program run during that session. For example, the command nodoc suppresses the short banner that introduces each package program. Global switches also are used to initialize graphics for a GCG session. For more information, see "Using Global Switches" in the "Customizing Program Analyses" section of Section 3, Using Programs.

 

Graphics Configuration

Specifies how graphics output from GCG programs will display during a session. The configuration consists of the graphics language you want to use, the type of supported graphics device you want to display on (printer, plotter, or terminal screen), and the port or queue to which the device is attached. For more information, see "Initializing Your Graphics Configuration" in Section 5, Using Graphics. For more information on supported graphics languages and devices, see Appendix C, Graphics.

 

List File

A text file consisting of sequence specifications (one per line) which can include database sequences, single sequences in your own directories, multiple sequence specifications using wildcards, other list files, and/or MSF files. The items in a list must be preceded by a line ending with two periods (..).

Some GCG programs generate list files as output (for example StringSearch). Others (for example PileUp) accept list files as input if you precede the filename with an ‘at’ symbol (@), for example @hsp70.list. For more information, see "Using List Files" in Section 2, Using Sequences.
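
The layout described above can be read with a few lines of Python. This is a minimal sketch, assuming only what this entry states: everything up to and including the first line ending with two periods is treated as heading text, and each non-blank line after it is one sequence specification. The function name and the sample entries are hypothetical, and the sketch does not expand wildcards or follow nested list files.

```python
from typing import List


def read_list_entries(text: str) -> List[str]:
    """Return the sequence specifications that follow the '..' divider line."""
    entries: List[str] = []
    in_list = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not in_list:
            # Skip heading text until a line ending with two periods is seen.
            if line.endswith(".."):
                in_list = True
            continue
        if line:
            entries.append(line)
    return entries


# Hypothetical list file contents, for illustration only.
example = """Heat-shock sequences gathered for an alignment ..

GenBank:M13786
hsp70.seq
hsp*.seq
"""

specifications = read_list_entries(example)
```

A file with no line ending in two periods yields an empty list in this sketch, which mirrors the requirement that the items be preceded by such a line.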

 

Local Data File

See Data Files.

 

Logical Name

A shorthand name you can use for directories, databases, and filenames which reduces typing and is often easier to remember than full specifications. You can create your own GCG logical names using the program Name. See "Defining and Using Logical Names for Directories" in Section 1, Getting Started.

 

Login

An approved account on a computer that must be created by a system manager before you can log in and use the system. The login consists of a user name and (almost always) a password, the latter known only to the user. For more information, see "Logging On" in Section 1, Getting Started or see your system manager.

 

Metacharacter

A character that is interpreted by UNIX shells and GCG programs in a defined manner. The most common examples of metacharacters are the wildcards * and ?. The * wildcard metacharacter is interpreted to mean "any sequence of characters, including none" and the ? wildcard metacharacter is interpreted to mean "any one character." For example, in the command % ls *.seq, the *.seq is interpreted as any filename ending with the extension ".seq". In the example % ls hsp?.seq, the hsp?.seq is interpreted as the name of any file beginning with "hsp", followed by exactly one character, and ending with ".seq".

You can use wildcard metacharacters to specify databases or divisions of databases within GCG. For example, GenBank:* specifies all of the entries in the GenBank database; Ba:* specifies all of the bacterial entries in GenBank; and Uni:hsp* specifies all of the sequences in UniProt that begin with "hsp".
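
The two filename examples in this entry can be tried out with Python's standard fnmatch module, which approximates shell-style matching of * and ?. This is only a sketch: the filenames are hypothetical, and fnmatch differs from a real shell in some details (for example, it does not treat filenames beginning with a period specially).

```python
from fnmatch import fnmatchcase

# Hypothetical directory contents, for illustration only.
filenames = ["hsp1.seq", "hsp70.seq", "hsp.seq", "actin.seq", "hsp1.txt"]


def matching(pattern, names):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# '*' stands for any run of characters, including none.
star_matches = matching("*.seq", filenames)

# '?' stands for exactly one character, so hsp70.seq and hsp.seq are excluded.
question_matches = matching("hsp?.seq", filenames)
```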

Metafile

A device-independent graphics file created by including -FIGure=filename on the command line when you run a GCG graphics program. You then can use the Figure program to print, plot or display the file. For example, if you have configured your graphics for Tektronix emulation of a tek4014 terminal, the Figure program translates the metafile to Tektronix language and displays the information on the Tektronix terminal screen. If you change your graphics configuration to PostScript for a LaserWriter printer and rerun the Figure program, the metafile is translated to PostScript and prints on the LaserWriter. For more information, see "Saving Graphic Output to a File" in Section 5, Using Graphics.

 

MSF File

Files containing two or more sequences aligned together. For more information, see "Using Multiple Sequence Format (MSF) Files" in Section 2, Using Sequences.

 

Multiple Sequence Format File

See MSF File.

 

Parameter

Modifies the action of a UNIX or GCG command. Some parameters have values, which modify the parameter (for example -BEGin=100), but not all do (for example -BATch). See also unqualified parameter. For more information, see "Using Program Parameters" in Section 3, Using Programs.

 

Platen

The area used to print, plot, or display GCG graphics. The platen is defined on every supported graphics device as 100 vertical units (Y) by 150 horizontal units (X).

 

Platen Unit

The units used to define the platen for GCG graphics. There are 100 platen units vertically and 150 platen units horizontally in the platen.
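
Because the platen is the same 150 by 100 units on every device, a position given in platen units can be scaled to any output area. The Python sketch below shows that arithmetic. The device dimensions, the function name, and the assumption that both coordinate systems share the same origin and axis directions are illustrative only and do not describe how GCG drivers are implemented.

```python
PLATEN_WIDTH = 150.0   # horizontal platen units (X)
PLATEN_HEIGHT = 100.0  # vertical platen units (Y)


def platen_to_device(x, y, device_width, device_height):
    """Scale a platen coordinate to a device that is device_width by device_height units."""
    return (x / PLATEN_WIDTH * device_width,
            y / PLATEN_HEIGHT * device_height)


# The center of the platen on a hypothetical 1200 x 800 display area.
center = platen_to_device(75, 50, 1200, 800)
```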

 

Port

A port is a connection through which a separate device (such as a printer, plotter, or graphics terminal) may communicate with the computer. For more information, see "Connecting a Graphics Device to the Computer" in Appendix C, Graphics.

 

Primary Accession Number

The first number that appears in the accession number in a database entry. Using accession numbers is the best way to specify a sequence entry. See Accession Number.

 

Public Data File

A data file that resides in a public directory of GCG. See Data Files.

 

Qualifier

The first unit of a parameter. That is, a parameter can be defined as -qualifier=value. All qualifiers are preceded by a dash (-). Some qualifiers have values (for example -BEGin=100), but not all do (for example -BATch). A qualifier modifies the action of a UNIX or GCG command.
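
The -qualifier=value form can be taken apart mechanically. The following Python sketch splits a parameter into its qualifier and optional value. The function name is hypothetical, and the sketch does not attempt to model how GCG programs themselves interpret or abbreviate qualifiers.

```python
from typing import Optional, Tuple


def split_parameter(parameter: str) -> Tuple[str, Optional[str]]:
    """Split '-qualifier=value' into (qualifier, value); value is None when absent."""
    if not parameter.startswith("-"):
        raise ValueError("a qualifier must be preceded by a dash: %r" % parameter)
    # partition() returns an empty separator when no '=' is present.
    qualifier, separator, value = parameter[1:].partition("=")
    return qualifier, (value if separator else None)


with_value = split_parameter("-BEGin=100")
without_value = split_parameter("-BATch")
```

Text that does not begin with a dash, such as a bare filename, is rejected here because it has no qualifier.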

 

Queued Device

A device, usually a printer, that has a method (the queue) for controlling the number of "jobs" it submits to the computer. Each job that is sent to a queued device is sent to the computer in the order it was received. A system manager must set up a queued device. For more information, see "Defining Your Graphics Configuration" in Section 5, Using Graphics.

 

Rich Sequence Format File

See RSF File.

 

RSF File

Files containing one or more sequences that may or may not be related. In addition to the sequence data, each sequence can be richly annotated with creator/author, sequence weight, creation date, a one-line description, offset, and sequence features. For more information, see "Using Rich Sequence Format (RSF) Files" in Section 2, Using Sequences.

 

Scoring Matrix

A table of pairwise relationships between nucleotide symbols or amino acid symbols.

By default for nucleotides, the pairwise value for identities (for example guanine pairing with guanine) is greater than the value given for non-identity pairs (for example guanine pairing with adenine).

For amino acids, the greater the value for an amino acid pair, the more related or substitutable those amino acids are thought to be (for example valine pairing with isoleucine is given a higher value than valine pairing with tryptophan).

Scoring matrices are used by database searching and multiple sequence alignment programs. Default scoring matrices for each GCG program requiring one can be found in files in the directory with the logical name Share_Matrix. You can use the Fetch+ command to copy these files into your directory to customize them. For more information, see "Using a Special Kind of Data File: A Scoring Matrix" in Section 4, Using Data Files.
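
To make the idea of a table of pairwise values concrete, the Python sketch below builds a small nucleotide matrix in which identities score higher than non-identities and uses it to score two sequences position by position. The match and mismatch values, the function names, and the restriction to the symbols A, C, G, and T are assumptions for illustration. They are not the contents of any GCG scoring matrix file.

```python
def nucleotide_matrix(match=1, mismatch=0):
    """Build a table of pairwise values in which identities outscore non-identities."""
    symbols = "ACGT"
    return {(a, b): (match if a == b else mismatch)
            for a in symbols for b in symbols}


def score_ungapped(seq1, seq2, matrix):
    """Sum the pairwise values for two equal-length sequences, position by position."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return sum(matrix[(a, b)] for a, b in zip(seq1.upper(), seq2.upper()))


simple = nucleotide_matrix()
simple_score = score_ungapped("GATTACA", "GACTACA", simple)

# A hypothetical matrix that penalizes non-identities instead of ignoring them.
penalizing = nucleotide_matrix(match=5, mismatch=-4)
penalized_score = score_ungapped("GATTACA", "GACTACA", penalizing)
```

Changing only the values in the table changes the score of the same pair of sequences, which is why customizing a scoring matrix can alter the results of a search or alignment.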

 

Script-Execution Queue

The queue used by the ‘at’ and ‘batch’ UNIX commands. The script-execution queue is also known as the batch queue or at queue. See At and Batch.

 

Secondary Accession Number

Any non-primary accession number associated with a database entry. Secondary accession numbers usually indicate that the database entry has been merged or modified in some way. See Accession Number.

 

Shell Metacharacter

See Metacharacter.

 

Shell Script

A file that can be used to execute one or more UNIX or GCG commands. Shell scripts are automatically created each time you run a program with the -BATch parameter. The script contains all of the information you would communicate to the computer if you ran the program from the command line.

Other examples of files containing shell scripts are .login (csh) and .profile (ksh), which are executed automatically each time you log onto the computer. You also can create shell scripts to submit programs that do not support the -BATch parameter. For more information, see "Working with Shell Scripts" in Section 3, Using Programs.

 

Standard Error

The device to which a program or operating system normally sends error messages. Standard error is usually directed to either the terminal screen or to a file. For more information, see "Using Command-Line Redirection" in Section 1, Getting Started.

 

Standard Input

The device from which a program or operating system normally receives input. In most cases this is the terminal (that is, the program reads whatever you type from the keyboard). For more information, see "Using Command-Line Redirection" in Section 1, Getting Started.

 

Standard Output

The device to which a program or operating system normally sends its output. For UNIX commands this is usually the terminal. For GCG programs (other than graphics programs) this is usually a file. For more information, see "Using Command-Line Redirection" in Section 1, Getting Started.

 

Symbol Comparison Table

Replaced by the term scoring matrix. See Scoring Matrix.

 

Unqualified Parameter

Parameters that are not preceded by a qualifier. For example, it is not necessary to specify an input file from the command line with -INfile=; you can simply type the filename. If you use unqualified parameters, they must appear in the proper order on the command line. See also parameter.

 

Wildcard

A character, such as the asterisk (*) or question mark (?), that you can use in file specifications. You most often will use wildcards to specify multiple files. An asterisk (*) wildcard serves as an ambiguous replacement for a character or group of characters; the * means "any sequence of characters, including none." The question mark (?) wildcard means "any one character." See also metacharacter. For more information, see "Using Wildcards" in the "Working with Files" section of Section 1, Getting Started.

 




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio