Dot-plotting is the best way to see all of the structures in common between two sequences or to visualize all of the repeated or inverted repeated structures in one sequence. DotPlot is the second part of a two-part set of programs that generate dot-plots of the points of similarity between two sequences. (See Maizel and Lenk (1981) "Enhanced Graphic Matrix Analysis of Nucleic Acid and Protein Sequences" Proc. Natl. Acad. Sci. (USA) 78(12); 7665-7669.) Compare writes a point file with the coordinates of the points in common between two sequences. DotPlot plots those points on a plotter or a graphics terminal.
DotPlot calculates the minimum density in bases per 100 platen units along either axis that would allow all of the points to be plotted on a single page. At high densities (for example, 1,000 bases per 100 platen units), the dots will not be individually resolved.
You can select any density you want; DotPlot divides the plot into as many pages as it takes to plot the whole file. Before plotting, DotPlot tells you how many pages the plot would take at your chosen density and, if you chose the density interactively, gives you a chance to change your mind. See the output suggestions below for more help on the format.
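The relationship between density and page count described above can be sketched in a few lines of Python. This is only an illustration, not DotPlot's actual code: the function names and the `usable_pu` parameter (the number of platen units available to each axis on one page, assumed here to be the same for both axes) are hypothetical.

```python
import math

def min_density(len_x, len_y, usable_pu=100.0):
    """Smallest density (symbols per 100 platen units) that fits both axes on one page.

    usable_pu is an assumed value for the platen units available per axis.
    """
    return max(len_x, len_y) * 100.0 / usable_pu

def pages_needed(len_x, len_y, density, usable_pu=100.0):
    """Return (columns, rows, total pages) for a plot at the chosen density."""
    per_page = density * usable_pu / 100.0  # symbols that fit along one axis of one page
    # The small tolerance keeps floating-point noise from adding a spurious page.
    cols = math.ceil(len_x / per_page - 1e-9)
    rows = math.ceil(len_y / per_page - 1e-9)
    return cols, rows, cols * rows

# Hypothetical sequences of 3,000 and 2,000 symbols plotted at a density of 1,000
print(pages_needed(3000, 2000, 1000))
```

Under these assumptions, lowering the density below the one-page minimum adds pages along whichever axis no longer fits.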
The file hpr.pnt from the sample session of the Compare program is used to make the plot at the end of this program's description.
DOTPLOT what point file ? hpr.pnt
hpr.pnt contains COMPARE results of
Axis Name Check Start End Dir
Horizontal hpr.seq 8102 1 2966 for
Vertical hpf.seq 2624 1 2740 for
Window . . . . . . . . . 21
Stringency . . . . . . . 14.0
Number of points . . . . 4,986
Percent of possible . . 0.061
The minimum density for a one-page plot is
3115.9 bases/100 platen units on each axis.
What point density would you like (* 3115.9 *) ?
DOTPLOT will take 1 pages. Would you like to:
P)lot the points
G)et another point file to plot
Please select one (* P *):
When your LaserWriter attached to tty07 is ready, press <Return>.
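As a quick arithmetic check, the "Percent of possible" figure echoed in the session above appears consistent with dividing the number of points by the product of the two axis lengths. The sketch below assumes that interpretation; the actual calculation is done by Compare and is not documented here.

```python
# Values taken from the sample session; the formula itself is an assumption.
horizontal_length = 2966 - 1 + 1   # hpr.seq, Start 1 to End 2966
vertical_length = 2740 - 1 + 1     # hpf.seq, Start 1 to End 2740
number_of_points = 4986

percent_of_possible = 100.0 * number_of_points / (horizontal_length * vertical_length)
print(f"{percent_of_possible:.3f}")
```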
The plot from this session made with the Apple LaserWriter is reproduced in the first figure at the end of this program entry. The second figure shows a plot of a word comparison of the same two sequences, using a word size of 8. The input point file for the second figure is hpr-word.pnt.
Compare and StemLoop generate the point files that DotPlot plots. BestFit and Gap align regions of interest identified with dot-plots and present you with a base-by-base alignment with gaps inserted.
No more than 200,000 points can be plotted. The density is measured in symbols per 100 platen units. The Hewlett Packard 7475 plotter plots from 350 to 450 points per minute and resolves the points at densities below 1,000 bases per 100 platen units. The Hewlett Packard 7550 plotter plots about 1,000 points per minute.
Accelrys GCG (GCG) must be configured for graphics before you run any program with graphics output! If the % setplot command is available in your installation, this is the easiest way to establish your graphics configuration, but you can also use commands like % postscript that correspond to the graphics languages GCG supports. See Section 5, Using Graphics in the User's Guide for more information about configuring your process for graphics.
If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C. The graphics device should stop plotting the current page and start plotting the next page. If the current page is the last page, plotters should put the pen away and graphic terminals should return to interactive mode.
DotPlot gives you the option of extending a plot over several pages. If the chosen point density requires more than one page for plotting, the pages are all marked with the date and time so that they can be associated. To see all of the dots distinctly, you should plot at 1,000 bases per 100 platen units or less. A good plot does not have more than a few thousand points.
Each page of a multi-page plot is marked with the date and time. Each axis is marked with a number that tells how many pages away from the origin the axis is. For instance, the lower-left page is marked with a '1' on both axes. The upper-right page of a nine-page plot with three divisions on each axis would be marked with a '3' on both axes. -AUTOFeed causes plotters with automatic paper feed to put in new paper automatically.
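The page-marking scheme described above can be illustrated with a short Python sketch. The function name and the order in which pages are listed are assumptions made for illustration; only the labeling rule (page counts from the origin on each axis) comes from the text.

```python
def page_labels(cols, rows):
    """Return (horizontal, vertical) page numbers, starting at the lower-left page.

    Pages are listed row by row, moving away from the origin (assumed order).
    """
    return [(h, v) for v in range(1, rows + 1) for h in range(1, cols + 1)]

labels = page_labels(3, 3)   # a nine-page plot with three divisions on each axis
print(labels[0])    # lower-left page
print(labels[-1])   # upper-right page
```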
Comparing Identical Sequences
If the sequences compared are the same in both range and checksum, then only the points above the diagonal are plotted and only pages with at least one possible point above the diagonal are shown. This feature can be canceled with -ALL.
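A minimal sketch of which pages would be shown for a self-comparison, assuming that "above the diagonal" means the vertical coordinate is greater than the horizontal coordinate and that pages straddling the diagonal are kept. The function name and the `plot_all` flag (standing in for -ALL) are hypothetical.

```python
def pages_for_self_comparison(divisions, plot_all=False):
    """List (horizontal, vertical) page numbers shown when a sequence is compared to itself."""
    pages = [(h, v)
             for v in range(1, divisions + 1)
             for h in range(1, divisions + 1)]
    if plot_all:
        return pages  # behaves like -ALL: every page is shown
    # Keep only pages that can contain a point with vertical > horizontal.
    return [(h, v) for (h, v) in pages if v >= h]

print(len(pages_for_self_comparison(3)))                  # pages on or above the diagonal
print(len(pages_for_self_comparison(3, plot_all=True)))   # all pages
```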
Plotting Over the Same Page
You can plot two comparisons on the same piece of paper. For instance, you can show a protein comparison and a nucleic acid comparison on the same sheet. The optional parameters -NOLABels, -SYMbol, -NOCAPtion, and -POIntcolor can help you make the second plot contrast with the first. -NOUNLoad is useful with Hewlett Packard plotters -- it keeps GCG programs from unloading the paper after each plot.
All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.
Minimal Syntax: % dotplot [-INfile1=]hpr.pnt -Default
Prompted Parameters:
-DENsity=3116 sets the number of bases per 100 platen units
Local Data Files: None
Optional Parameters:
-NOCAPtion suppresses the caption
-NUMbering[=10] displays sequence numbers every 10th base
-NOLABels suppresses all labels except for ticks
-TICKNUMbering=bc sets where to place tick numbering, where:
a=bottom b=right c=top d=left
-POIntcolor=1 sets color for the points
-SYMbol=0 makes points with various centered symbols
-SYMBOLHeight=0.02 sets height of centered symbols in platen units
-ALL plots redundant points (when sequences are identical)
-TICKAXes connects ticks with a solid axis
-DOTSonly suppresses connecting adjacent points with a line
-AXIs draws an axis of symmetry
All GCG graphics programs accept these and other switches. See the Using
Graphics section of the USERS GUIDE for descriptions.
-FIGure[=filename] stores plot in a file for later input to FIGURE
-FONT=3 draws all text on the plot using font 3
-COLor=1 draws entire plot with pen in stall 1
-SCAle=1.2 enlarges the plot by 20 percent (zoom in)
-XPAN=10.0 moves plot to the right 10 platen units (pan right)
-YPAN=10.0 moves plot up 10 platen units (pan up)
-PORtrait rotates plot 90 degrees
You can set the parameters listed below from the command line.
-DENsity=3116
Sets the number of bases or amino acids per 100 platen units (PU). This is usually equivalent to the number of bases or amino acids per page. Output from different GCG graphics programs that are run at the same density can be compared by lining up the plots on a light box.
The parameters below may be helpful when more than one point file is displayed on the same plot.
-NOCAPtion
Suppresses the blue divider box and the text to its left.
-NUMbering=10
This program tries to number the ticks on each axis at an interval that gives about three to six numbered ticks. Use this parameter to set the numbering interval yourself. You can suppress tick numbering altogether with -NONUMbering.
-NOLABels
Suppresses all of the labels except for the tick labels. Ticks are labeled with numbers on the right and top sides, unless you specify different sides to be numbered. See -TICKNUMbering below. Note that -FASt suppresses all text.
-TICKNUMbering=bc
With -NOLABels, you can choose which axes should have their ticks numbered. The letter codes are as follows: a=bottom, b=right, c=top, and d=left.
-POIntcolor=1
Defines the color for the points as follows: Black=1, Green=2, Blue=3, and Red=4.
-SYMbol=0
Defines a centered symbol to be used for every point. The available symbols are Point=0, Square=1, Octagon=2, Triangle=3, +=4, X=5, Diamond=6, *=7, and |=8.
-SYMBOLHeight=0.02
Defines a symbol height for symbols (other than points) in units of one percent of the plotter's vertical axis (one platen unit).
-ALL
When the sequences compared are identical in both range and checksum, only the points above the diagonal are plotted. The diagonal is represented with a drawn line. You can override this feature with -ALL.
-TICKAXes
Connects the ticks with a solid axis. Usually, GCG programs draw ticks floating in space.
-DOTSonly
When several adjacent points occur on a diagonal, DotPlot speeds up the plot by connecting them with a line. This parameter forces DotPlot to avoid this shortcut and plot all of the dots.
-AXIs
Causes an axis of symmetry to be drawn on plots where a sequence is compared to its own reverse-complement strand. The axis may not appear to be centered in some windows since Compare rounds the coordinates for the center of the window to the nearest integer. (Use -ALL with Compare to see a base-by-base comparison that is symmetric to this axis.)
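The line-drawing shortcut that -DOTSonly turns off can be pictured with the following Python sketch. It is not DotPlot's implementation: the function name is hypothetical, and the sketch assumes that "adjacent points on a diagonal" means points that each step one position along both axes. Diagonals running the other way, as in a reverse-complement comparison, are not handled.

```python
def diagonal_runs(points):
    """Group (x, y) points into maximal runs where each point steps by (+1, +1).

    Returns a list of (start, end) pairs; a lone point has start == end.
    """
    point_set = set(points)
    runs = []
    for x, y in sorted(point_set):
        if (x - 1, y - 1) in point_set:
            continue  # this point belongs to a run that started earlier
        end_x, end_y = x, y
        while (end_x + 1, end_y + 1) in point_set:
            end_x += 1
            end_y += 1
        runs.append(((x, y), (end_x, end_y)))
    return runs

# Three adjacent diagonal points collapse into one line; isolated points stay as dots.
print(diagonal_runs([(1, 1), (2, 2), (3, 3), (5, 1), (7, 7)]))
```

Drawing one line per run instead of one dot per point reduces the number of pen movements, which is presumably why the shortcut speeds up plotting.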
The parameters below apply to all GCG graphics programs. These and many others are described in detail in Section 5, Using Graphics of the User's Guide.
-FIGure=filename
Writes the plot as a text file of plotting instructions suitable for input to the Figure program instead of sending it to the device specified in your graphics configuration.
-FONT=3
Draws all text characters on the plot using Font 3 (see Appendix I).
-COLor=1
Draws the entire plot with the pen in stall 1.
The parameters below let you expand or reduce the plot (zoom), move it in either direction (pan), or rotate it 90 degrees (rotate).
-SCAle=1.2
Expands the plot by 20 percent by resetting the scaling factor (normally 1.0) to 1.2 (zoom in). You can expand the axes independently with -XSCAle and -YSCAle. Numbers less than 1.0 contract the plot (zoom out).
-XPAN=10.0
Moves the plot to the right by the given number of platen units, 10 in this example (pan right).
-YPAN=10.0
Moves the plot up by the given number of platen units, 10 in this example (pan up).
-PORtrait
Rotates the plot 90 degrees. Usually, plots are displayed with the horizontal axis longer than the vertical (landscape). Note that plots are reduced or enlarged, depending on the platen size, to fill the page.
Printed: May 27, 2005 12:05
Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.
Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.