MAPPLOT

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

CONSIDERATIONS

SUBSET, OVERLAP, AND PERFECT SEARCHES

SELECTING ENZYMES

DEFINING PATTERNS

GRAPHICS

<CTRL>C

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

MapPlot displays restriction sites graphically. If you don't have a plotter, MapPlot can write a text file that approximates the graph.

DESCRIPTION

MapPlot is a tool for genetic engineering. It helps you visualize how part of a DNA molecule may be isolated. MapPlot uses color to distinguish the types of overhang left after digestion (5' overhangs are green, 3' overhangs are blue, blunt ends are black, and undetermined overhangs are red). The site, cut position, and total number of cuts are also shown for each enzyme. The enzymes that do not cut are listed below the plot. You may choose to plot only enzymes whose recognition sites have six or more bases, or only enzymes that cut the molecule once.

EXAMPLE

Here is a session using MapPlot to display the restriction enzymes that cut pBR322 once outside the tetracycline resistance and beta-lactamase coding sequences:

 
 
% mapplot -CIRcular -OUTfile=synpbr322.mapplot -MINSitelen=6 \
  -EXCLUde=86,1276,3293,4153 -ONCe
 
 (Circular) MAPPLOT of what sequence ?  GenBank:SynpBR322
 
                   Begin (* 1 *) ?
                 End (*  4361 *) ?
 
 Select the enzymes:  Type nothing or "*" to get all enzymes. Type "?"
 for help on which enzymes are available and how to select them.
 
                                      Enzyme(* * *):
 
  When your LaserWriter attached to tty07 is ready, press <Return>.
 
%

OUTPUT

If you are reading the Program Manual, you can see the plot from this session at the end of this program description. Here is some of the text output file:

 
 
(Circular) MAPPLOT of: gb_sy:SynpBR322  Check: 5483  from: 1  to: 4361
 
J01749 Cloning vector pBR322, complete genome. 6/96
LOCUS       SYNPBR322    4361 bp    DNA   circular  SYN       07-JUN-1996
DEFINITION  Cloning vector pBR322, complete genome.
ACCESSION   J01749 K00005 L08654 M10282 M10283 M10286 M10356 M10784 M10785
            M10786 M33694 V01119
NID         g208958 . . .
 
 With 158 enzymes: *
 
 MaxCuts: 1
 
                        October 12, 1998 10:24
 
       1                                                   4361   ..
   AatII _____.____.____._____.____.____.____.____.__|_  1 G_ACGT'C
  AflIII _____.____.____._____.____|____.____.____.____  1 A'CryG_T
   AlwNI _____.____.____._____.____.___|.____.____.____  1 CAG_nnn'CTG
 
 /////////////////////////////////////////////////////////////////////
 
    SspI _____.____.____._____.____.____.____.____._|__  1 AAT'ATT
    StyI _____.____.___|._____.____.____.____.____.____  1 C'CwwG_G
 Tth111I _____.____.____._____._|__.____.____.____.____  1 GACn'n_nGTC
 
 Enzymes that do cut and were not excluded:
 
    AatII   AflIII    AlwNI     ApoI     AvaI   Bpu10I    BsaAI    BsaBI
     BsgI     BsmI    BsmBI    BspEI BspLU11I Bst1107I     ClaI    EcoRI
  HindIII     MscI     NdeI    PvuII     SapI     SspI     StyI  Tth111I
 
 Enzymes that do not cut:
 
    AflII     ApaI     AscI    AvrII     BaeI     BclI    BglII     BmgI
     BplI Bpu1102I    BsaXI    BseRI    BsrGI   BssHII   BstEII    BstXI
 
/////////////////////////////////////////////////////////////////////////////
 
 Enzymes excluded; MinCuts: 1  MaxCuts: 1  Excluded ranges: 86,1276 3293,4153
 
    AccI   AceIII     AhdI    ApaBI    ApaLI    BamHI     BanI    BanII
    BbsI   Bce83I     BcgI     BcgI    BciVI     BfiI     BglI     BpmI
 
/////////////////////////////////////////////////////////////////////////////

INPUT FILES

MapPlot accepts a single nucleotide or protein sequence as input, and what it does depends on which type the sequence is. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence is not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.

RELATED PROGRAMS

Map maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence. MapSort finds the coordinates of the restriction enzyme cuts in a DNA sequence and sorts the fragments of the resulting digest by size. MapSort can sort the fragments from single or multiple enzyme digests. Use MapPlot with MapSort to locate practical sites for labeling or isolating any region within a DNA molecule.

CONSIDERATIONS

You have to use the command-line parameter -CIRcular if you want MapPlot to show cuts for recognition sites that span the ends of the molecule. MapPlot is a graphics program; it writes a text output file representing the map only if -OUTfile appears on the command line.

SUBSET, OVERLAP, AND PERFECT SEARCHES

This program normally requires that a sequence pattern be a subset of the enzyme recognition site. If the recognition pattern in the enzyme data file were GCRGC, then the pattern GCAGC in your sequence would be found, since A is within the set of bases defined by R (see Appendix III). If the pattern in the enzyme data file were GCAGC, then a GCRGC in your sequence would not be recognized. If your sequence is very ambiguous, as it might be if it were a backtranslated sequence, then it may be better to use -ALL to do an overlap search. The overlap search would consider an R in your sequence to match an A in the recognition site.

With -PERFect, the program looks for a perfect symbol match between your sequence and the recognition pattern -- GCRGC in the recognition pattern would only match a GCRGC in the sequence.

All searches are case insensitive (upper- or lowercase) for the letters in either the sequence or the enzyme recognition site.
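The three search modes can be sketched in a few lines of Python. This is an illustration only, not MapPlot's code: the function names are hypothetical, the table is the standard IUPAC nucleotide code set (Appendix III is not reproduced here), and treating X like N is an assumption.

```python
# Standard IUPAC nucleotide ambiguity codes; X is treated like N (assumption).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "X": "ACGT",
}

def symbol_matches(seq_sym, site_sym, mode="subset"):
    """Compare one sequence symbol with one recognition-site symbol."""
    seq_sym, site_sym = seq_sym.upper(), site_sym.upper()  # case insensitive
    seq_set, site_set = set(IUPAC[seq_sym]), set(IUPAC[site_sym])
    if mode == "subset":    # default: sequence bases must lie within the site's set
        return seq_set <= site_set
    if mode == "overlap":   # like -ALL: the two sets only need a base in common
        return bool(seq_set & site_set)
    if mode == "perfect":   # like -PERFect: the symbols themselves must be identical
        return seq_sym == site_sym
    raise ValueError("unknown mode: " + mode)

def window_matches(window, site, mode="subset"):
    """True if every symbol in the window matches the site at the same position."""
    return len(window) == len(site) and all(
        symbol_matches(a, b, mode) for a, b in zip(window, site)
    )
```

With this sketch, window_matches("GCAGC", "GCRGC") is true, window_matches("GCRGC", "GCAGC") is false in the default mode but true with mode="overlap", and mode="perfect" accepts GCRGC against GCRGC only.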

SELECTING ENZYMES

The program presents you with an enzyme selection prompt that lets you enter enzymes individually or collectively. To get help with selecting enzymes, type a ? at the enzyme prompt. Here is what you see:

 
 
Select enzymes:
 
Type "*" to select all enzymes.
Type "**" to select all enzymes including isoschizomers.
Type individual names like "AluI" to select specific enzymes.
Type "?" to see this message and all available enzymes.
Type "??" to see the available enzymes AND their recognition sites.
Type "?A*" to see what enzymes start with "A."
Type "A*" to select all enzymes starting with "A."
Type parts of names like "Al*" to select all enzymes starting with "AL."
Type "~A*" to unselect all selected enzymes starting with "A."
Type "/*" to see what enzymes you have selected so far.
Type "#" to select no enzymes at all.
 
Press <Return> after each selection.
Press <Return> and nothing else to end your selections.
Spaces are allowed; upper and lower case are equivalent.

We maintain our enzyme files with a semicolon (;) character in front of all but one member of a family of isoschizomers. (Isoschizomers are restriction endonucleases with the same recognition site.) The isoschizomers beginning with a semicolon are normally not displayed by our mapping programs unless you specifically select them by name or type "**" instead of "*" at the enzyme prompt.

There is more information on enzyme files in Appendix VII.

A command-line expression like -ENZymes=AluI,EcoRII would choose AluI and EcoRII and suppress interactive enzyme selection.

DEFINING PATTERNS

FindPatterns, Map, MapSort, MapPlot, and Motifs all let you search with ambiguous expressions that match many different sequences. The expressions can include any legal GCG sequence character (see Appendix III). The expressions can also include several non-sequence characters, which are used to specify OR matching, NOT matching, begin and end constraints, and repeat counts. For instance, the expression TAATA(N){20,30}ATG means TAATA, followed by 20 to 30 of any base, followed by ATG. Following is an explanation of the syntax for pattern specification.

Implied Sets and Repeat Counts

Parentheses () enclose one or more symbols that can be repeated some number of times. Braces {} enclose numbers that tell how many times the symbols within the preceding parentheses must be found.

Sometimes, you can leave out part of an expression. If braces appear without preceding parentheses, the numbers in the braces define the number of repeats for the immediately preceding symbol. One or both of the numbers within the braces may be missing. For instance, both the pattern GATG{2,}A and the pattern GATG{2}A mean GAT, followed by G repeated from 2 to 350,000 times, followed by A; the pattern GATG{}A means GAT, followed by G repeated from 0 to 350,000 times, followed by A; the pattern GAT(TG){,2}A means GAT, followed by TG repeated from 0 to 2 times, followed by A; the pattern GAT(TG){2,2}A means GAT, followed by TG repeated exactly 2 times, followed by A. (If the pattern in the parentheses is an OR expression (see below), it cannot be repeated more than 2,000 times.)

OR Matching

If you are searching nucleic acids, the ambiguity symbols defined in Appendix III let you define any combination of G, A, T, or C. If you are searching proteins, you can specify any of several symbol choices by enclosing the different choices in parentheses and separating the choices with commas. For instance, RGF(Q,A)S means RGF followed by either Q or A followed by S. The length of each choice need not be the same, and there can be up to 31 different choices within each set of parentheses. The pattern GAT(TG,T,G){1,4}A means GAT followed by any combination of TG, T, or G from 1 to 4 times followed by A. The sequence GATTGGA matches this pattern. There can be several parentheses in a pattern, but parentheses cannot be nested.

NOT Matching

The pattern GC~CAT means GC, followed by any symbol except C, followed by AT. The pattern GC~(A,T)CC means GC, followed by any symbol except A or T, followed by CC.

Begin and End Constraints

The pattern <GACCAT can only be found if it occurs at the beginning of the sequence range being searched. Likewise, the pattern GACCAT> would only be found if it occurs at the end of the sequence range.
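Readers who know regular expressions may find it helpful to compare the pattern syntax above with approximate Python equivalents. The translations below were written by hand for the examples in this section; they are an illustration, not output from any GCG program, and the helper name found is hypothetical. A regular expression has no 350,000 repeat ceiling, so open-ended counts are only approximately equivalent.

```python
import re

# GCG pattern -> hand-written, approximate Python regular expression.
EXAMPLES = {
    "TAATA(N){20,30}ATG": r"TAATA[ACGT]{20,30}ATG",  # N expanded to any base
    "GATG{2,}A":          r"GATG{2,}A",              # braces apply to the preceding G
    "GAT(TG){,2}A":       r"GAT(?:TG){0,2}A",        # missing lower bound means 0
    "RGF(Q,A)S":          r"RGF(?:Q|A)S",            # OR matching
    "GAT(TG,T,G){1,4}A":  r"GAT(?:TG|T|G){1,4}A",    # OR set with a repeat count
    "GC~CAT":             r"GC[^C]AT",               # NOT matching, one symbol
    "GC~(A,T)CC":         r"GC[^AT]CC",              # NOT matching, a set
    "<GACCAT":            r"^GACCAT",                # begin constraint
    "GACCAT>":            r"GACCAT$",                # end constraint
}

def found(gcg_pattern, sequence):
    """True if the regex equivalent of gcg_pattern occurs in sequence."""
    return re.search(EXAMPLES[gcg_pattern], sequence, re.IGNORECASE) is not None
```

For instance, found("GAT(TG,T,G){1,4}A", "GATTGGA") is true, as stated above, while found("GC~CAT", "GCCAT") is false because the third symbol is C.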

GRAPHICS

Accelrys GCG (GCG) must be configured for graphics before you run any program with graphics output! If the % setplot command is available in your installation, this is the easiest way to establish your graphics configuration, but you can also use commands like % postscript that correspond to the graphics languages GCG supports. See Section 5, Using Graphics in the User's Guide for more information about configuring your process for graphics.

<CTRL>C

If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C. The graphics device should stop plotting the current page and start plotting the next page. If the current page is the last page, plotters should put the pen away and graphic terminals should return to interactive mode.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % mapplot [-INfile=]genbank:synpbr322 -Default
 
Prompted Parameters:
 
-BEGin=1 -END=4361        sets the range of interest
-ENZymes=*[,...]          selects the enzymes
 
Local Data Files:
 
-DATa=enzyme.dat          specifies restriction enzymes and recognition sites
-DATa=proenzyme.dat       specifies peptidases and peptide cleavage reagents
-TRANSlate=translate.txt  contains the genetic code
-MARk=synpbr322.mrk       marks regions of interest below the plot
 
Optional Parameters:
 
-CIRcular              treats the sequence as circular (default is linear)
[-OUTfile=]synpbr322.mapplot   makes a text file representation of the plot
  -WIDth=45            sets the number of columns (30-100) in text output
  -NOPLOt              suppresses the plot
  -APPend              appends the enzyme file to the text output
-DENsity=4361          sets the number of bases per 100 platen units
-SPAcing=1.6           sets the number of platen units per line
-MISmatch=1            finds potential sites with one or fewer mismatches
-SILent                finds translationally silent potential restriction sites
-PERFect               looks only for perfect matches
-ALL                   finds "overlapping-set" matches
-CUTters[=fn]          writes enzyme data file with enzymes that did cut
-NONCUTters[=fn]       writes enzyme data file with enzymes that did not cut
-EXCUTters[=fn]        writes enzyme data file with enzymes that were excluded
-MINSitelen=6          selects enzymes with 6 (or more) bases
                         in recognition site
-OVErhang=0            selects only blunt-end cutters ("5" for 5', "3" for 3')
-MINCuts=2             displays only enzymes that cut at least 2 times
-MAXCuts=2             displays only enzymes that cut no more than 2 times
-ONCe                  displays only enzymes that cut once
-EXCLude=n1,n2         suppresses enzymes cutting between bases n1 and n2
 
All GCG graphics programs accept these and other switches. See the Using
Graphics section of the User's Guide for descriptions.
 
-FIGure[=filename]  stores plot in a file for later input to FIGURE
-FONT=3             draws all text on the plot using font 3
-COLor=1            draws entire plot with pen in stall 1
-SCAle=1.2          enlarges the plot by 20 percent (zoom in)
-XPAN=10.0          moves plot to the right 10 platen units (pan right)
-YPAN=10.0          moves plot up 10 platen units (pan up)
-PORtrait           rotates plot 90 degrees

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

This program reads the public or local version of enzyme.dat to get the enzyme names, recognition sites, cut positions, and overhangs. You can use mapping programs to search for any sequence pattern by adding the pattern to the enzyme data file. If you use the command-line parameter -APPend, this program appends the enzyme data file to the output file. (See Appendix VII for more information about enzyme data files.)

If MapPlot finds Type: P on the dividing line in the sequence file, it reads proteolytic cleavage data in the local data file proenzyme.dat.

The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in Appendix VII. If you use the command-line parameter -APPend and you have provided your own translation scheme, this program appends your translation table to the output file.

If you are studying a sequence with known features, this program can mark the plot with small boxes showing the positions of these features. The presence of a file in your directory with the same name as your sequence and the filename extension .mrk causes the program to mark each range specified in the file. You can provide a marking file on the command line with an expression like -MARk=gamma.mrk. The file gamma.mrk contains information about the format of marking files. The figure for the example session shows marked regions.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-ENZymes=*[,...]

Specifies the restriction enzymes whose recognition sites you want to search. If you search for several different enzymes, separate their names with commas. -ENZymes=* selects all enzymes, -ENZymes=** selects all enzymes, including isoschizomers, and -ENZymes=Al* selects all enzymes whose names start with Al.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

-MARk=synpbr322.mrk

If you are studying a sequence with known features, this program can mark the plot with small boxes showing the positions of these features. The presence of a file in your directory with the same name as your sequence and the file name extension .mrk causes the program to mark each range specified in the file. The file gamma.mrk contains information about the format of marking files.

-CIRcular

Causes MapPlot to treat the sequence as circular. For instance, if your sequence ended with 'GA' and started with 'ATTC' then MapPlot would show an EcoRI site at the 'G'.
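One common way to find sites that span the ends of a circular molecule is to extend the sequence with its own first few bases before searching. The Python sketch below illustrates that idea with exact matching only; it is not MapPlot's implementation, and the function name find_sites is hypothetical.

```python
def find_sites(sequence, site, circular=False):
    """Return 1-based start positions of exact matches to site."""
    seq = sequence.upper()
    site = site.upper()
    # For a circular molecule, append the first len(site) - 1 bases so that
    # a site starting near the end can wrap around to the beginning.
    text = seq + seq[: len(site) - 1] if circular else seq
    # Only positions inside the original sequence count as start positions.
    return [i + 1 for i in range(len(seq)) if text.startswith(site, i)]
```

For a sequence that starts with ATTC and ends with GA, such as ATTCCCGGGA, find_sites("ATTCCCGGGA", "GAATTC") returns an empty list, while find_sites("ATTCCCGGGA", "GAATTC", circular=True) returns [9], the position of the G.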

[-OUTfile=]synpbr322.mapplot

Writes a text file with a representation of the restriction map that can be printed on a regular printer. This output format was designed by Dr. Vern Luckow of Texas A&M University. It is a good representation of the plotted output of MapPlot. The output file can be printed quickly without using a graphics output device.

The output file can be specified with just the name of the output file as the second parameter on the command line. Do not use TERM as the output file if you are making a plot with a plotter attached to the terminal.

The next three parameters (-WIDth, -NOPLOt, and -APPend) apply only when -OUTfile is used.

-WIDth=45

If you are using the output file parameter, MapPlot uses 45 characters to represent the sequence in the summary restriction map. You may change the number of characters used to represent the sequence to anything from 30 to 100 characters.

-NOPLOt

Suppresses the plot when only the text output file summary is desired.

-APPend

Appends the enzyme data file to your output file. If you provided your own translation scheme to find translationally silent potential restriction sites (using -SILent), that file is also appended.

-DENsity=1000

Sets the number of bases or amino acids per 100 platen units (PU). This is usually equivalent to the number of bases or amino acids per page. Output from different GCG graphics programs that are run at the same density can be compared by lining up the plots on a light box.

-SPAcing=1.6

Sets the spacing between each line of the display to 1.6 platen units. If the plot seems crowded on your plotter, try setting the spacing to 3.0.

-MISmatch=1

Causes the program to recognize sites that are like the recognition site but with one or fewer mismatches. If too many mismatches are allowed, the results may not be meaningful. The output from most mapping programs distinguishes between sites with no mismatches and sites with mismatches.
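A mismatch search can be pictured as counting, window by window, the positions where the sequence base is not allowed by the recognition site. The Python sketch below assumes an unambiguous sequence (A, C, G, T only) and uses the standard IUPAC codes for the site; the function names are hypothetical and this is not MapPlot's code.

```python
# Standard IUPAC nucleotide ambiguity codes for the recognition site.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mismatches(window, site):
    """Count positions where the base is outside the set allowed by the site."""
    return sum(
        1 for base, sym in zip(window.upper(), site.upper())
        if base not in IUPAC[sym]
    )

def find_with_mismatches(sequence, site, max_mismatch=1):
    """Return (1-based position, mismatch count) for each acceptable window."""
    hits = []
    for i in range(len(sequence) - len(site) + 1):
        count = mismatches(sequence[i:i + len(site)], site)
        if count <= max_mismatch:
            hits.append((i + 1, count))
    return hits
```

For example, find_with_mismatches("TTGAATTCAAGACTTCTT", "GAATTC", 1) returns [(3, 0), (11, 1)]: a true site at position 3 and a one-mismatch site (GACTTC) at position 11. Keeping the mismatch count with each hit is what lets output distinguish the two kinds of site.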

-SILent

Shows the places where restriction sites can be introduced (by site-directed mutagenesis) without changing the peptide translation of the sequence. The -SILent parameter assumes that the range you have chosen defines a coding region and reading frame precisely. Sites may be found that have any number of bases changed as long as the changes do not alter the translation. The reading frame is implied by the beginning coordinate you specify. The output from most mapping programs distinguishes between real sites and sites with one or more mismatches. The data file translate.txt defines the genetic code.

-PERFect

Sets the program to look for a perfect alphabetic match between the site and the sequence. Ambiguity codes are normally translated so that the site RXY would find sequences like ACT or GAC. With this parameter, the ambiguity codes are not translated so the site RXY would only match the sequence RXY. This parameter is not the same as -MISmatch=0!

-ALL

Makes an overlap-set map instead of the usual subset map. If your sequence is very ambiguous (for instance, as a back-translated sequence would be) and you want to see where restriction sites could be, then an overlap-set map is for you. Overlap-set and subset pattern recognition is discussed in more detail in the Program Manual entry for Window.

-CUTters=gamma.cutters

Writes out a new enzyme data file containing those selected enzymes that did cut your sequence and were not excluded with any of the -MINCuts, -ONCe, -MAXCuts, and -EXCLude parameters. If you do not add a file name to the -CUTters parameter, the output file has the name of your sequence followed by the file name extension .cutters.

-NONCUTters=gamma.noncutters

Writes out a new enzyme data file containing the selected enzymes that did NOT cut your sequence. If you do not add a file name to this parameter, the output file has the name of your sequence followed by the file name extension .noncutters.

-EXCUTters=gamma.excutters

Writes out a new enzyme data file containing those enzymes that did cut your sequence but were excluded with any of the -EXCLude, -MINCuts, -ONCe, and -MAXCuts parameters. If you do not add a file name to this parameter, the output file has the name of your sequence followed by the file name extension .excutters.

The parameters -MINSitelen and -OVErhang restrict the domain of enzymes selected.

-MINSitelen=6

Selects only patterns whose recognition sites have at least the specified number of bases, not counting N or X. The filter applies when you select all of the patterns; any pattern in the enzyme or pattern file that you name individually is still displayed, whatever the length of its site. -MINSitelen=6 replaces the -SIXbase parameter from earlier versions of GCG.
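The site length used here can be pictured as a count of the symbols in the recognition site other than N and X. The short Python sketch below assumes the site is written as in the text output shown earlier (for example CAG_nnn'CTG), with ' and _ as cut markers that are not counted; the function name site_length is hypothetical.

```python
def site_length(site):
    """Count recognition-site symbols other than N and X.

    The cut markers ' and _ are not letters, so they are skipped.
    """
    return sum(1 for ch in site.upper() if ch.isalpha() and ch not in "NX")
```

Under this counting, CAG_nnn'CTG, GACn'n_nGTC, and AAT'ATT all have a site length of 6, which is consistent with their appearing in the example session run with -MINSitelen=6.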

-OVErhang=0

Selects only enzymes that leave blunt ends. Use a 5 with this parameter to search only with enzymes that leave 5' overhangs and a 3 to search only with enzymes that leave a 3' overhang. You can use multiple values, separated by commas. For instance, -OVErhang=5,3 searches with all enzymes that leave either 5' or 3' overhangs. You can display the cuts from any enzyme in the enzyme data file that you take the trouble to name individually, but when you use * (meaning all), the program uses all of the enzymes whose overhangs conform to your choice with this parameter.
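The overhang type can be read from the way a site is written in the text output, for example G_ACGT'C. The Python sketch below rests on assumptions that should be checked against Appendix VII: that ' marks the cut on the top strand, that _ marks the position of the cut on the bottom strand, that a site with ' but no _ is blunt, and that a site with no ' is undetermined. The function name overhang is hypothetical.

```python
def overhang(site):
    """Return 5, 3, 0 (blunt), or None (undetermined) for a site string."""
    top = site.find("'")      # assumed: top-strand cut marker
    if top == -1:
        return None           # no cut position given
    bottom = site.find("_")   # assumed: bottom-strand cut marker
    if bottom == -1:
        return 0              # both strands cut at the same place
    # Top strand cut to the left of the bottom strand cut leaves a 5' overhang.
    return 5 if top < bottom else 3
```

Applied to sites from the example output, this gives 3 for G_ACGT'C (AatII), 5 for A'CryG_T (AflIII), and 0 for AAT'ATT (SspI).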

The -MINCuts, -MAXCuts, -ONCe, and -EXCLude parameters suppress the display of selected enzymes. The list of excluded enzymes in the program output includes both selected enzymes that cut within excluded ranges and selected enzymes that did not cut the right number of times.

-MINCuts=2

Excludes enzymes that do not cut at least two times.

-MAXCuts=2

Excludes enzymes that cut more than two times.

-ONCe

Displays only enzymes that cut your sequence exactly once; enzymes that cut more than once are excluded. This is equivalent to setting both -MINCuts and -MAXCuts to 1.
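The effect of -MINCuts, -MAXCuts, and -ONCe can be sketched as a filter on the number of cuts per enzyme. The Python below is an illustration with a hypothetical function name and made-up cut counts; it is not MapPlot's code. Enzymes with no cuts are skipped because they are listed separately as enzymes that do not cut.

```python
def displayed(cut_counts, min_cuts=1, max_cuts=None):
    """Keep enzymes whose cut count lies between min_cuts and max_cuts."""
    kept = {}
    for name, count in cut_counts.items():
        if count == 0:
            continue  # non-cutters are reported in their own list
        if count < min_cuts:
            continue  # too few cuts (-MINCuts)
        if max_cuts is not None and count > max_cuts:
            continue  # too many cuts (-MAXCuts)
        kept[name] = count
    return kept

# Made-up counts for illustration only.
counts = {"EcoRI": 1, "AluI": 17, "ApaI": 0, "HaeII": 2}
once = displayed(counts, min_cuts=1, max_cuts=1)  # same effect as -ONCe
```

Here once holds only EcoRI, while displayed(counts, min_cuts=2) keeps AluI and HaeII.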

-EXCLude=n1,n2[,n3,n4,...]

Excludes enzymes that cut anywhere within one or more ranges of the sequence. If an enzyme cuts within an excluded range, the enzyme is not displayed; it appears instead in the list of excluded enzymes. Each range is defined by a pair of numbers, and all of the numbers are separated by commas. Spaces between numbers are not allowed. The numbers must be integers that fall within the sequence beginning and ending points you have chosen. The range may be circular if circular mapping is being done. Exclusion is not done if any of the numbers contains non-numeric characters, if any number is out of range, or if an odd number of integers follows the parameter.
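The validation rules for the excluded ranges can be sketched as follows. This Python illustration uses hypothetical function names, and it assumes that a circular range is written with the first number larger than the second (so that it wraps through the end of the sequence); it is not MapPlot's code.

```python
def parse_exclude(value, begin, end):
    """Return a list of (n1, n2) ranges, or [] if exclusion would not be done."""
    parts = value.split(",")
    # Reject an odd number of values and anything that is not a plain integer
    # (this also rejects spaces and empty fields).
    if len(parts) % 2 or not all(p.isascii() and p.isdigit() for p in parts):
        return []
    numbers = [int(p) for p in parts]
    # Every number must fall within the chosen sequence range.
    if any(n < begin or n > end for n in numbers):
        return []
    return list(zip(numbers[0::2], numbers[1::2]))

def in_excluded(position, ranges):
    """True if a cut position falls inside any excluded range."""
    for n1, n2 in ranges:
        if n1 <= n2:
            if n1 <= position <= n2:
                return True
        elif position >= n1 or position <= n2:  # assumed wrap-around range
            return True
    return False
```

With the values from the example session, parse_exclude("86,1276,3293,4153", 1, 4361) returns [(86, 1276), (3293, 4153)], so an enzyme cutting at position 100 would be excluded and one cutting at position 2000 would not.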

The parameters below apply to all GCG graphics programs. These and many others are described in detail in Section 5, Using Graphics of the User's Guide.

-FIGure=programname.figure

Writes the plot as a text file of plotting instructions suitable for input to the Figure program instead of sending it to the device specified in your graphics configuration.

-FONT=3

Draws all text characters on the plot using Font 3 (see Appendix I).

-COLor=1

Draws the entire plot with the pen in stall 1.

The parameters below let you expand or reduce the plot (zoom), move it in either direction (pan), or rotate it 90 degrees (rotate).

-SCAle=1.2

Expands the plot by 20 percent by resetting the scaling factor (normally 1.0) to 1.2 (zoom in). You can expand the axes independently with -XSCAle and -YSCAle. Numbers less than 1.0 contract the plot (zoom out).

-XPAN=30.0

Moves the plot to the right by 30 platen units (pan right).

-YPAN=30.0

Moves the plot up by 30 platen units (pan up).

-PORtrait

Rotates the plot 90 degrees. Usually, plots are displayed with the horizontal axis longer than the vertical (landscape). Note that plots are reduced or enlarged, depending on the platen size, to fill the page.

Printed: May 27, 2005  12:53




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio