GAPSHOW

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

NUMBERING

GRAPHICS

<CTRL>C

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

GapShow displays an alignment by making a graph that shows the distribution of similarities and gaps. The two input sequences should be aligned with either Gap or BestFit before they are given to GapShow for display.

DESCRIPTION

BestFit and Gap make optimal alignments of sequences by adding gaps to maximize the number of matches. They normally display the alignments, but they can also write the aligned sequences (with gaps inserted) into new sequence files. GapShow reads these files and plots the distribution of the differences or similarities in the alignment (see figure below).

The sequences are represented by horizontal lines, which have openings at points where there are gaps in either sequence. Regions of interest, such as coding regions, can be shown outside these lines (see the LOCAL DATA FILES topic). A large vertical line between the sequences can indicate either a difference or a similarity (see -BARs in the PARAMETER REFERENCE topic). If differences are being shown, gaps are also depicted with short vertical lines to convey that the gap is a difference between the two sequences. You can reproduce the original numbering by using command-line parameters (see the NUMBERING topic).

When run with the command-line parameter -OUTfile, GapShow writes a file like the paired output file from Gap and BestFit. The two sequences are displayed one above the other with vertical bars (|) marking the positions where the symbols are similar. Some people use GapShow with this parameter to see the differences between very similar sequences.
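
The column-by-column comparison behind the plot can be sketched in a few lines of Python. This is only an illustration, not GapShow's code: the function name classify_columns is hypothetical, plain identity stands in for the scoring-matrix test, and the gap characters are the ones named under -CONtinuous. The two strings are symbols 31 to 50 of the example alignment shown under the OUTPUT topic.

```python
GAP_CHARS = {".", "~"}  # gap characters named under -CONtinuous


def classify_columns(seq1, seq2):
    """Label each aligned column as 'match', 'diff', or 'gap'.

    Hypothetical helper: identity stands in for the scoring-matrix
    similarity test, and comparison stops at the shorter sequence.
    """
    labels = []
    for a, b in zip(seq1, seq2):
        if a in GAP_CHARS or b in GAP_CHARS:
            labels.append("gap")
        elif a.upper() == b.upper():
            labels.append("match")
        else:
            labels.append("diff")
    return labels


# Symbols 31-50 of hpr.gap and hpf.gap from the example output file.
top = "GTGTATGTGGGGCGGAGGGT"
bottom = "GTGTATGGGCAG....GTGT"
labels = classify_columns(top, bottom)

# With bars drawn for differences, gap columns count as differences too.
bar_columns = [i + 1 for i, label in enumerate(labels) if label != "match"]
print(bar_columns)
```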

EXAMPLE

Here is a session using GapShow to display the alignment of hpr.gap to hpf.gap both graphically and base-by-base:

 
 
% gapshow -OUTfile=hpr.pair
 
 GAPSHOW of what sequence 1 ?  hpr.gap
 
                  Begin (* 1 *) ?
                End (*  2982 *) ?
 
         to what sequence 2 ?  hpf.gap
 
                  Begin (* 1 *) ?
                End (*  2982 *) ?
 
  When your LaserWriter attached to tty07 is ready, press <Return>.
 
 Output file: "hpr.pair"
 
%

OUTPUT

Here is the plot from this session:

Here is part of the output file (called hpr.pair) from the example session:

 
 
 GAPSHOW of: hpr.gap  check: 7949  from: 1  to: 2982
 
 GAP of: hpr.seq  check: 8102  from: 1  to: 2966
 after alignment with: hpf.seq  check: 2624  from: 1  to: 2740
 Symbol comparison table: GenRunData:nwsgapdna.cmp  CompCheck: 8760
         Gap Weight:     50      Average Match: 10.000
      Length Weight:      3   Average Mismatch:  0.000
            Quality:  24426             Length:   2982 . . .
 
 TO: hpf.gap  check: 9311  from: 1  to: 2982
 
 GAP of: hpf.seq  check: 2624  from: 1  to: 2740
 after alignment with: hpr.seq  check: 8102  from: 1  to: 2966
 Symbol comparison table: GenRunData:nwsgapdna.cmp  CompCheck: 8760
         Gap Weight:     50      Average Match: 10.000
      Length Weight:      3   Average Mismatch:  0.000
            Quality:  24426             Length:   2982 . . .
 
        Match display thresholds for the alignment(s):
                    | =   5
                    : =   5
                    . =   1
 
                        September 24, 1998 12:12  ..
                  .         .         .         .         .
       1 AAGCTTGGTATGCTCAGAAGCAGCTAAAGCGTGTATGTGGGGCGGAGGGT 50
         ||||||||||||||||||||| ||||||| ||||||| |  |    | ||
       1 AAGCTTGGTATGCTCAGAAGCTGCTAAAGTGTGTATGGGCAG....GTGT 46
                  .         .         .         .         .
      51 GGGGGCAACTTCTTGGTCCTAGCACTTCCATATATTGATTTTCTTTTCTG 100
         |||||||| |||||||||||||||||||||||||| || |||||||||||
      47 GGGGGCAATTTCTTGGTCCTAGCACTTCCATATATCGACTTTCTTTTCTG 96
 
    ////////////////////////////////////////////////////////////
                  .         .         .         .         .
    2885 ACACCTGCTATGGCGATGCGGGCAGTGCCTTTGCCGTTCACGACCTGGAG 2934
         ||||||||||||||||||||||||||||||||||||||||||||||||||
    2659 ACACCTGCTATGGCGATGCGGGCAGTGCCTTTGCCGTTCACGACCTGGAG 2708
                  .         .         .
    2935 GAGGACACCTGGTACGCGGCTGGGATCTTAAG 2966
         |||||||||||||| ||| |||||||||||||
    2709 GAGGACACCTGGTATGCGACTGGGATCTTAAG 2740

INPUT FILES

GapShow accepts two aligned individual nucleotide sequences or protein sequences as input. The function of GapShow depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
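
As a rough illustration of the type check, the Python sketch below looks for Type: N or Type: P on the last heading line. It assumes that the heading ends at the first line ending in two periods (as the date line in the example output does); the function name sequence_type and the sample text, including its Length and Check values, are made up for the example.

```python
import re


def sequence_type(text):
    """Return 'N', 'P', or None for the text of a sequence file.

    Assumption: the heading ends at the first line that ends with '..',
    and the type marker sits on that line.
    """
    for line in text.splitlines():
        if line.rstrip().endswith(".."):
            found = re.search(r"Type:\s*([NP])\b", line)
            return found.group(1) if found else None
    return None


# Made-up file text for illustration only.
sample = (
    " GAP of: example.seq\n"
    "\n"
    " example.gap  Length: 8  Type: N  Check: 1234  ..\n"
    "\n"
    "       1 ACGT..AC\n"
)
print(sequence_type(sample))
```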

RELATED PROGRAMS

Pretty displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.

Gap uses the algorithm of Needleman and Wunsch to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps.

BestFit makes an optimal alignment of the best segment of similarity between two sequences. Optimal alignments are found by inserting gaps to maximize the number of matches using the local homology algorithm of Smith and Waterman. When run with the command-line parameter -OUT, BestFit and Gap write files with the aligned sequences (expanded by the addition of gaps).

Figure makes figures and posters by drawing graphics and text together. You can include output from other Accelrys GCG (GCG) graphics programs as part of a figure.

PileUp creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment.

LineUp is a screen editor for editing multiple sequence alignments. You can edit up to 30 sequences simultaneously. New sequences can be typed in by hand or added from existing sequence files. A consensus sequence identifies places where the sequences are in conflict.

PlotSimilarity plots the running average of the similarity among the sequences in a multiple sequence alignment.

RESTRICTIONS

GapShow does not make alignments; it displays them. When run with the command-line parameter -OUT, BestFit and Gap write files with the aligned sequences (expanded by the addition of gaps).

The GapShow display stops when the shorter sequence is exhausted. When two sequences are aligned with Gap or BestFit, both of the output sequences should be exactly the same length.

NUMBERING

GapShow normally numbers each sequence from one to the length of the sequence in the input file. There are parameters described below that can number either sequence, starting from any number, either upwards or downwards. If you use a marking file to mark regions of interest, the ranges in it must fall within the numbers actually displayed for a region to be marked.
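
The numbering rules can be sketched as follows. This is an illustration under stated assumptions, not GapShow's implementation: the function name position_numbers is hypothetical, gap columns are assumed to carry no number, and the default start for downward numbering follows the -REVNUM1 description in the PARAMETER REFERENCE topic (the sequence length, not counting gaps).

```python
GAP_CHARS = {".", "~"}  # gap characters named under -CONtinuous


def position_numbers(aligned, start=None, reverse=False):
    """Return a number for each symbol, or None for a gap column.

    start   : first number (like -NUM1); defaults to 1, or to the
              ungapped length when numbering downwards.
    reverse : number downwards (like -REVNUM1).
    """
    symbols = sum(1 for ch in aligned if ch not in GAP_CHARS)
    if start is None:
        start = symbols if reverse else 1
    step = -1 if reverse else 1
    numbers = []
    current = start
    for ch in aligned:
        if ch in GAP_CHARS:
            numbers.append(None)
        else:
            numbers.append(current)
            current += step
    return numbers


print(position_numbers("AC..GT"))                # default numbering
print(position_numbers("AC..GT", reverse=True))  # downwards from the length
print(position_numbers("ACG", start=-500))       # like -NUM2=-500
```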

GRAPHICS

GCG must be configured for graphics before you run any program with graphics output! If the % setplot command is available in your installation, this is the easiest way to establish your graphics configuration, but you can also use commands like % postscript that correspond to the graphics languages GCG supports. See Section 5, Using Graphics in the User's Guide for more information about configuring your process for graphics.

<CTRL>C

If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C. The graphics device should stop plotting the current page and start plotting the next page. If the current page is the last page, plotters should put the pen away and graphic terminals should return to interactive mode.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % gapshow [-INfile1=]hpr.gap [-INfile2=]hpf.gap -Default
 
Prompted Parameters:
 
-BEGin1=1 -END1=2982    sets the range of interest for sequence 1
-BEGin2=1 -END2=2982    sets the range of interest for sequence 2
 
Local Data Files:
 
-MATRix=swgapdna.cmp    assigns the scoring matrix for nucleic acids
-MATRix=blosum62.cmp    assigns the scoring matrix for proteins
-MARk1=hpr.mrk          defines regions of known interest on sequence 1
-MARk2=hpf.mrk          defines regions of known interest on sequence 2
 
Optional Parameters:
 
[-OUTfile1=]hpr.pair    makes a paired output file like the one from GAP
-PAIr=x,5,1             thresholds for displaying '|', ':', and '.'
-WIDth=50               the number of sequence symbols per line
-PAGe=60                adds a line with a form feed every 60 lines
-NOBIGGaps              suppresses abbreviation of large gaps with '.'s
-NOPLOt                 suppresses the plot
-BARs=d                 plots bars for (s)imilarities or (d)ifferences
-CONtinuous             represents the sequences as continuous (unbroken)
                          lines
-DENsity=3000           plots 3,000 sequence symbols per page (100 PU)
-NONUMbering            suppresses numbering of both sequences
-NOLABeling             suppresses labeling of both sequences
-NUM1=1352              starts numbering sequence 1 at 1352
-REVNUM1                numbers sequence 1 downwards (default is upwards)
-NUM2=-500              starts numbering sequence 2 at -500
-REVNUM2                numbers sequence 2 downwards (default is upwards)
 
All GCG graphics programs accept these and other switches. See the Using
Graphics section of the User's Guide for descriptions.
 
-FIGure[=filename]  stores plot in a file for later input to FIGURE
-FONT=3             draws all text on the plot using font 3
-COLor=1            draws entire plot with pen in stall 1
-SCAle=1.2          enlarges the plot by 20 percent (zoom in)
-XPAN=10.0          moves plot to the right 10 platen units (pan right)
-YPAN=10.0          moves plot up 10 platen units (pan up)
-PORtrait           rotates plot 90 degrees
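
The capitalization convention described at the top of this summary can be illustrated with a small Python check. It is a sketch only: the function name matches_parameter is hypothetical, names are given without the leading hyphen, parameter names that end in a digit are not handled, and the rule that any extra letters you type must continue the full name is an assumption rather than documented behavior.

```python
def matches_parameter(typed, full_name):
    """Return True if 'typed' is an acceptable spelling of 'full_name'.

    The leading capital letters of full_name are treated as required.
    Assumption: letters typed beyond them must continue the full name.
    Comparison ignores case.
    """
    required = ""
    for ch in full_name:
        if not ch.isupper():
            break
        required += ch
    typed_lower = typed.lower()
    full_lower = full_name.lower()
    return typed_lower.startswith(required.lower()) and full_lower.startswith(typed_lower)


print(matches_parameter("CON", "CONtinuous"))     # required letters only
print(matches_parameter("contin", "CONtinuous"))  # longer abbreviation
print(matches_parameter("CO", "CONtinuous"))      # too short
```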

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program's default scoring matrix in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Section 4, Using Data Files in the User's Guide.
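
The search order for a matrix named without a directory can be sketched in Python. The function name find_matrix is hypothetical, and the directories are passed in as plain paths because the way the logical names MyData, GenMoreData, and GenRunData resolve depends on your installation; the paths in the example call are placeholders.

```python
from pathlib import Path


def find_matrix(name, search_dirs):
    """Return the first existing file called 'name', or None.

    search_dirs should list, in order, the local directory and the
    directories behind MyData, GenMoreData, and GenRunData.
    A name that already includes a directory is used as given.
    """
    candidate = Path(name)
    if candidate.parent != Path("."):
        return candidate if candidate.is_file() else None
    for directory in search_dirs:
        path = Path(directory) / name
        if path.is_file():
            return path
    return None


# Placeholder directories; substitute the real locations on your system.
order = [".", "mydata", "genmoredata", "genrundata"]
print(find_matrix("mymatrix.cmp", order))
```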

GapShow uses a scoring matrix to define when a symbol in sequence one is similar to a symbol in sequence two. See Appendix VII for more information about scoring matrices. If two symbols have a comparison value greater than or equal to the average positive non-identical comparison value in the matrix, GapShow puts a bar in the text file and on the plot.

The scoring matrices used by GapShow are the same ones used by Gap and BestFit to make the alignments. If, in your directory, you have the matrix file swgapdna.cmp for nucleotide alignments made with BestFit, or nwsgapdna.cmp for nucleotide alignments made with Gap, or blosum62.cmp for protein alignments made with either BestFit or Gap, your local matrix file is used by GapShow instead of the public versions. GapShow uses swgapdna.cmp for nucleotide alignments if the first line of the first sequence file does not contain the string "GAP "; otherwise nwsgapdna.cmp is used.
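
The rule for choosing the default matrix amounts to a short decision, sketched here in Python. The function name default_matrix is hypothetical, and the second heading line in the example is a made-up stand-in for a file that was not written by Gap.

```python
def default_matrix(seq_type, first_line):
    """Pick the default matrix name from the sequence type ('N' or 'P')
    and the first line of the first sequence file."""
    if seq_type == "P":
        return "blosum62.cmp"
    # Nucleotide input: the string "GAP " marks an alignment made with Gap.
    if "GAP " in first_line:
        return "nwsgapdna.cmp"
    return "swgapdna.cmp"


print(default_matrix("N", " GAP of: hpr.seq  check: 8102  from: 1  to: 2966"))
print(default_matrix("N", " BESTFIT of: example.seq"))  # made-up heading line
print(default_matrix("P", " GAP of: example.pep"))      # made-up heading line
```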

Marking Files

If you are studying a sequence with known features, this program can mark the plot with small boxes showing the positions of these features. The presence of a file in your directory with the same name as your sequence and the filename extension .mrk causes the program to mark each range specified in the file. You can provide a marking file on the command line with an expression like -MARk=gamma.mrk. The file gamma.mrk contains information about the format of marking files. The figure for the example session shows marked regions; the files hpr.mrk and hpf.mrk were present when that figure was made. The coordinates from the marking file are interpreted in the numbering mode chosen.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-MATRix=mymatrix.cmp

Allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData.

For more information see the Local Scoring Matrices section.

-MARk=hpr.mrk

If you are studying a sequence with known features, this program can mark the plot with small boxes showing the positions of these features. The presence of a file in your directory with the same name as your sequence and the file name extension .mrk causes the program to mark each range specified in the file. The file gamma.mrk contains information about the format of marking files.

-OUTfile=hpr.pair

Makes a text output file showing the alignment in a format similar to the output files from Gap or BestFit. If you do not supply a name, GapShow creates a name for it using the name of the first input sequence with the file name extension .pair. The file from the example session is shown above under the OUTPUT topic.

These four parameters format the output in the text file.

-PAIr=4,2,1

The paired output file from this program displays sequence similarity by printing one of three characters between similar sequence symbols: a pipe character (|), a colon (:), or a period (.). Normally a pipe character is put between symbols that are the same, a colon is put between symbols whose comparison value is greater than or equal to the average positive non-identical comparison value in the scoring matrix, and a period is put between symbols whose comparison value is greater than or equal to 1.

You can change these match display thresholds from the command line. The three values associated with -PAIr are the display thresholds for the pipe character, colon, and period. When you give a number as the first value, the match display criterion for a pipe character changes from symbolic identity (the default) to that quantitative threshold: a pipe character is no longer inserted between identical symbols unless their comparison value is greater than or equal to the threshold. If you still want a pipe character to connect identical symbols, use x instead of a number as the first value. (See Appendix VII for more information about scoring matrices.)
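
The threshold logic above can be sketched in Python. This is an illustration, not the program's code: the function names are hypothetical, the matrix is represented as a dictionary keyed by symbol pairs, and the values in TOY_MATRIX are invented for the example rather than taken from any GCG matrix file.

```python
def average_positive_nonidentical(matrix):
    """Average of the positive comparison values for non-identical pairs."""
    values = [v for (a, b), v in matrix.items() if a != b and v > 0]
    return sum(values) / len(values) if values else 0.0


def match_symbol(a, b, matrix, thresholds=None):
    """Return '|', ':', '.', or ' ' for one aligned pair of symbols.

    thresholds mirrors -PAIr: (pipe, colon, period), where pipe may be
    the string "x" to keep symbolic identity as the pipe criterion.
    """
    score = matrix.get((a, b), matrix.get((b, a), 0))
    if thresholds is None:
        pipe, colon, period = "x", average_positive_nonidentical(matrix), 1
    else:
        pipe, colon, period = thresholds
    is_pipe = (a == b) if pipe == "x" else (score >= pipe)
    if is_pipe:
        return "|"
    if score >= colon:
        return ":"
    if score >= period:
        return "."
    return " "


# Invented comparison values, for illustration only.
TOY_MATRIX = {
    ("A", "A"): 10, ("G", "G"): 10,
    ("A", "G"): 4, ("A", "R"): 2, ("A", "C"): 0,
}

for pair in [("A", "A"), ("A", "G"), ("A", "R"), ("A", "C")]:
    print(pair, repr(match_symbol(*pair, TOY_MATRIX)))
```

With a numeric first threshold higher than the identity score, for example (11, 2, 1) against this toy matrix, identical symbols no longer receive a pipe and fall through to the colon test.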

-WIDth=50

Puts 50 sequence symbols on each line of the output file. You can set the width to anything from 10 to 150 symbols.
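
A minimal Python sketch of the line-width rule follows. The function name wrap_alignment is hypothetical; the 10 to 150 range comes from the description above, and trimming both sequences to the shorter length follows the RESTRICTIONS topic.

```python
def wrap_alignment(seq1, seq2, width=50):
    """Split two aligned sequences into blocks of 'width' symbols."""
    if not 10 <= width <= 150:
        raise ValueError("width must be between 10 and 150")
    length = min(len(seq1), len(seq2))  # display stops at the shorter one
    seq1, seq2 = seq1[:length], seq2[:length]
    return [
        (seq1[i:i + width], seq2[i:i + width])
        for i in range(0, length, width)
    ]


blocks = wrap_alignment("ACGT" * 30, "ACGA" * 30, width=50)
print([len(top) for top, _ in blocks])
```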

-PAGe=60

Printed output from this program may cross from one page to another in an annoying way. Use this parameter to add form feeds to the output file in order to try to keep clusters of related information together. You can set the number of lines per page by supplying a number after -PAGe.

-NOBIGGaps

Suppresses large gap abbreviations, showing all the sequence characters across from large gaps. Usually, gaps that extend one sequence by more than one complete line of output are abbreviated with three dots arranged in a vertical line.

-NOPLOt

Suppresses the plot.

-BARs=d

By default, GapShow plots bars at the positions where the aligned sequences differ. This parameter lets you choose to show the points of similarity (s) or difference (d).

-CONtinuous

Usually, GapShow represents the sequences as horizontal lines. Where there are gap characters (. and ~) in either sequence, GapShow interrupts the line. Use this parameter to make these lines continuous across the gaps.

-DENsity=1000

Sets the number of bases or amino acids per 100 platen units (PU). This is usually equivalent to the number of bases or amino acids per page. Output from different GCG graphics programs that are run at the same density can be compared by lining up the plots on a light box.
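
As a rough arithmetic check, the Python sketch below estimates how many pages a plot needs, assuming that the density really does equal the number of symbols per page (which the text above says is usually, not always, the case). The function name pages_needed is hypothetical.

```python
import math


def pages_needed(symbols, density):
    """Estimate the page count, assuming 'density' symbols fit on a page."""
    if density <= 0:
        raise ValueError("density must be positive")
    return math.ceil(symbols / density)


# The example alignment is 2,982 symbols long.
print(pages_needed(2982, 3000))
print(pages_needed(2982, 1000))
```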

-NONUMbering

Suppresses the numbering of both sequences.

-NOLABeling

Suppresses the titles above and below the plot. Look at the Figure program to see a nice way to annotate GCG plots.

These four parameters set the numbering of each sequence.

-NUM1=1352

Begins numbering the first sequence at 1352.

-REVNUM1

Numbers the first sequence downwards instead of upwards. If you do not set a beginning number, the first number is equal to the length of the sequence (not counting the gaps).

-NUM2=-500

Begins numbering the second sequence at -500.

-REVNUM2

Numbers the second sequence downwards instead of upwards. If you do not set a beginning number, the first number is equal to the length of the sequence (not counting the gaps).

The parameters below apply to all GCG graphics programs. These and many others are described in detail in Section 5, Using Graphics of the User's Guide.

-FIGure=programname.figure

Writes the plot as a text file of plotting instructions suitable for input to the Figure program instead of sending it to the device specified in your graphics configuration.

-FONT=3

Draws all text characters on the plot using Font 3 (see Appendix I).

-COLor=1

Draws the entire plot with the pen in stall 1.

The parameters below let you expand or reduce the plot (zoom), move it in either direction (pan), or rotate it 90 degrees (rotate).

-SCAle=1.2

Expands the plot by 20 percent by resetting the scaling factor (normally 1.0) to 1.2 (zoom in). You can expand the axes independently with -XSCAle and -YSCAle. Numbers less than 1.0 contract the plot (zoom out).

-XPAN=30.0

Moves the plot to the right by 30 platen units (pan right).

-YPAN=30.0

Moves the plot up by 30 platen units (pan up).

-PORtrait

Rotates the plot 90 degrees. Usually, plots are displayed with the horizontal axis longer than the vertical (landscape). Note that plots are reduced or enlarged, depending on the platen size, to fill the page.

Printed: May 27, 2005  12:31



Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio