MAP+


 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

CONSIDERATIONS

SUBSET, OVERLAP, AND PERFECT SEARCHES

DISPLAY CONVENTIONS

SELECTING ENZYMES

CHOOSING THE TRANSLATION FRAMES

TABLE OUTPUT

POTENTIAL RESTRICTION SITES

SEARCH FOR ANY SEQUENCE PATTERN

DEFINING PATTERNS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE

NEW FUNCTIONS AVAILABLE

OPTIONS EXCLUDED IN MAP+


 

FUNCTION


Map+ maps a DNA sequence and displays both strands of the mapped sequence, with restriction enzyme cut points above the sequence and protein translations below. Map+ can also create a peptide map of an amino acid sequence.

DESCRIPTION


Advantages of Plus “+” Programs:

 

-      Plus programs can read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.

 

-      Plus programs remove the 350,000-bp limit on sequence length.

 

If you do not need these features and want more interactivity, you can run the original program, Map, instead.

Map+ is useful for displaying a sequence that is being assembled or analyzed intensively. Map+ asks you to select, by typing their names, the enzymes whose restriction sites should be marked. If you do not answer this question, Map+ selects all of the commercially available enzymes, using one representative isoschizomer from each family. You can have your sequence translated in any or all of the six possible translation frames, or have only the open reading frames translated.

 

EXAMPLE


Here is a session using Map+ to display a portion of ggamma.seq, along with a restriction map and the default three-frame (forward) protein translation:

 
 

map+ of what sequence(s) ? ggamma.seq

 

Begin (* 1 *) ? 1000

 

End (-1 for entire sequence) (* -1 *) ? 1700

 

Select the enzyme: Type nothing or * to select all enzymes

 

Enzyme (* * *) ?

 

What protein translation do you want

a) frame 1 b) frame 2 c) frame 3

d) frame 4 e) frame 5 f) frame 6

t) 3 forward frames s) 6 frames

n) no protein translation (* t *) ?

 

What should I call the output file (* <sequence_name>.map+ *) ?

 

 

Mapping...........

 

Writing........

 

 

Sequence Length:        701

Enzymes chosen:         240

Cut sites found:        187

 

 

Results written to ggamma.map+

OUTPUT


Here is part of the output file:

 
 

Linear MAP of: ggamma.seq check: 7694 from: 1000 to: 1700           

Using enzymes from: /data2/kprasad/sandbox/bio/build/debug/solaris/share/rebase/enzyme.dat
With 240 enzymes: *

Name:   ggamma.seq
Description:  
VERSION:      
GI:
                                                     Tsp509I
                                                |
                                             MseI        BslI
                                               |           |
                                                 Hpy188III
                                                     |
                                            S  F  N  S  R  W  G 
                                              L  *  F  Q  M  G  A
                                             P  L  I  P  D  G  G
                                            CCTTTAATTCCAGATGGGGGC
                                         1000    *    *    * 1020
                                            GGAAATTAAGGTCTACCCCCG

             ScrFI
               |
           PspGI
             |
             BstNI           HphI
           BssKI         Tsp509I
    MnlI    BsaJI         MfeI            CviJI           BbsI
      |       |             |               |               |
     Q  S  M  S  R  G  E  E  Q  L  K  H  L  G  W  S  R  F  *  K 
       K  Y  V  Q  G  *  G  T  I  E  T  F  G  L  E  *  I  L  K  V
      K  V  C  P  G  V  R  N  N  *  N  I  W  A  G  V  D  F  E  S
     AAAGTATGTCCAGGGGTGAGGAACAATTGAAACATTTGGGCTGGAGTAGATTTTGAAAGT
         *    *    * 1040    *    *    * 1060    *    *    * 1080
     TTTCATACAGGTCCCCACTCCTTGTTAACTTTGTAAACCCGACCTCATCTAAAACTTTCA

                                  HhaI
                                    |
                                HhaI
                                  |
                               HinP1I
   MboII                          |
     |   Eco57MI             HinP1I
   Eco57MI  |                   |
      |  Eco57I              BssHII
    BpmI    |                   |
     CviJI                      Cac8I
       |                          BstUI
     AluI                       BstUI
       |                          |
     S  A  L  C  V  C  V  C  V  C  A  R  V  F  V  C  V  *  E  R 
       S  S  V  C  V  C  V  C  V  R  A  C  V  C  V  C  V  R  A  C
      Q  L  C  V  C  V  C  V  C  A  R  V  C  L  C  V  C  E  S  V
     CAGCTCTGTGTGTGTGTGTGTGTGTGTGCGCGCGTGTGTTTGTGTGTGTGTGAGAGCGTG
         *    *    * 1100    *    *    * 1120    *    *    * 1140
     GTCGAGACACACACACACACACACACACGCGCGCACACAAACACACACACACTCTCGCAC

                         SfcI               NlaIII
           MseI  TscI  SfaNI               BspCNI
            HpyCH4IV     |                  BseMII
              AclI     CviJI  BtsI       FatI  |             BsrI
     V  F  L  L  T  F  S  A  Y  S  I  Q  G  S  W  W  Q  E  D  N 
       V  S  F  N  V  F  S  L  Q  H  T  G  F  M  V  A  R  R  *  Q
      C  F  F  *  R  F  Q  P  T  A  Y  R  V  H  G  G  K  K  I  T
     TGTTTCTTTTAACGTTTTCAGCCTACAGCATACAGGGTTCATGGTGGCAAGAAGATAACA
         *    *    * 1160    *    *    * 1180    *    *    * 1200
     ACAAAGAAAATTGCAAAAGTCGGATGTCGTATGTCCCAAGTACCACCGTTCTTCTATTGT

                   Tsp45I
                      |
                   MaeIII
                      |
     MboII     EaeI
  Eco57MI        |                                      HphI
     |  Tsp509I  MscI                             BspCNI  |
   BpmI    |       |     TspRI                     BseMII
     | MseI     HaeIII     |  TseI             MnlI   | MseI
        SwaI       |    SpeI  Fnu4HI             MboII   BspMI
        DraI     CviJI   BfaI   HpyCH4V            HpyCH4V |
          |        |       |       |                  |
     K  I  *  I  M  A  S  D  *  C  C  K  K  N  N  Y  L  H  L  M 
       D  L  N  Y  G  Q  *  L  V  L  Q  E  E  Q  L  P  A  F  N  G
      R  F  K  L  W  P  V  T  S  A  A  R  R  T  T  T  C  I  *  W
     AGATTTAAATTATGGCCAGTGACTAGTGCTGCAAGAAGAACAACTACCTGCATTTAATGG
         *    *    * 1220    *    *    * 1240    *    *    * 1260
     TCTAAATTTAATACCGGTCACTGATCACGACGTTCTTCTTGTTGATGGACGTAAATTACC

                          BspCNI
                           BseMII                       HindIII
                              | MseI                       |
                                 Hpy8I       TfiI          CviJI
              DdeI               HpaI        HinfI           |
                |  CviJI        HincII   CviJI |           AluI
     G  K  Q  N  L  R  L  *  G  K  L  T  *  A  *  F  W  V  E  A 
       K  A  K  S  Q  A  L  R  E  V  N  I  G  L  I  L  G  G  S  L
      E  S  K  I  S  G  F  E  G  S  *  H  R  L  D  S  G  W  K  L
     GAAAGCAAAATCTCAGGCTTTGAGGGAAGTTAACATAGGCTTGATTCTGGGTGGAAGCTT
         *    *    * 1280    *    *    * 1300    *    *    * 1320
     CTTTCGTTTTAGAGTCCGAAACTCCCTTCAATTGTATCCGAACTAAGACCCACCTTCGAA

                                   SacI
                                     |
                                    Eco57MI
                                       |
                                 Bsp1286I
                                     |
                                  BsiHKAI
                                     |
                                     BpmI
                        ScrFI          |
                      PspGI     EcoICRI
                        BstNI      |
                      BssKI      CviJI              MnlI
                        |          |anII        Eco57MI
              Hpy188III    CviJI    DdeI        BspCNI
                  |  HaeIII  |        | CviJI    BseMII
                      CviJI      AluI   AluI     BpmI
                        |          |      |        |
     W  C  V  V  I  W  R  P  G  W  S  S  Q  L  T  M  G  S  S  L 
       V  C  S  Y  L  E  A  R  L  E  L  S  A  H  Y  G  F  I  F  I
      G  V  *  L  S  G  G  Q  A  G  A  L  S  S  L  W  V  H  L  Y
     GGTGTGTAGTTATCTGGAGGCCAGGCTGGAGCTCTCAGCTCACTATGGGTTCATCTTTAT
         *    *    * 1340    *    *    * 1360    *    *    * 1380
     CCACACATCAATAGACCTCCGGTCCGACCTCGAGAGTCGAGTGATACCCAAGTAGAAATA

                         TaqII
                           |
                          AlwNI
                            |
                           ScrFI
                             |
                         PspGI
                           |
                           BstNI
                             |
                         BssKI
                           |
                          BsaJI                MnlI
                            |                HpyCH4III
                       CviJI            Tsp45I   |
                         |              MaeIII
         BsmAI         AluI             BstEII       HphI
           |             |                 |           |
     L  S  P  F  I  S  T  A  P  G  K  C  A  G  D  R  F  G  N  P 
       V  S  F  H  L  N  S  S  W  E  M  C  W  *  P  F  W  Q  S  I
      C  L  L  S  S  Q  Q  L  L  G  N  V  L  V  T  V  L  A  I  H
     TGTCTCCTTTCATCTCAACAGCTCCTGGGAAATGTGCTGGTGACCGTTTTGGCAATCCAT
         *    *    * 1400    *    *    * 1420    *    *    * 1440
     ACAGAGGAAAGTAGAGTTGTCGAGGACCCTTTACACGACCACTGGCAAAACCGTTAGGTA

Enzymes that cut

AluI           AG'CT
BstUI          CG'CG
Cac8I          GCn'nGC
CviJI          rG'Cy
DpnI           GA'TC
DraI           TTT'AAA
EcoICRI        GAG'CTC
HaeIII         GG'CC
HincII         GTy'rAC
HpaI           GTT'AAC
Hpy8I          GTn'nAC
HpyCH4V        TG'CA
MscI           TGG'CCA
SwaI           ATTT'AAAT
AclI           AA'CG_TT
ApoI           r'AATT_y
BbsI           GAAGACnn'nnnn_
BbvI           GCAGCnnnnnnnn'nnnn_
BfaI           C'TA_G
BsaJI          C'CnnG_G
BsmAI          GTCTCn'nnnn_
BspMI          ACCTGCnnnn'nnnn_
BssHII         G'CGCG_C
BssKI          'CCnGG_
BstEII         G'GTnAC_C
BstNI          CC'w_GG
Bsu36I         CC'TnA_GG
DdeI           C'TnA_G
EaeI           y'GGCC_r
EcoRI          G'AATT_C
FatI           'CATG_
Fnu4HI         GC'n_GC
HinP1I         G'CG_C
HindIII        A'AGCT_T
HinfI          G'AnT_C
Hpy188III      TC'nn_GA
HpyCH4IV       A'CG_T
MaeIII         'GTnAC_
MboI           'GATC_
MfeI           C'AATT_G
MseI           T'TA_A
PspGI          'CCwGG_
ScrFI          CC'n_GG
SfaNI          GCATCnnnnn'nnnn_
SfcI           C'TryA_G
SpeI           A'CTAG_T
TfiI           G'AwT_C
TseI           G'CwG_C
Tsp45I         'GTsAC_
 

Enzymes that do not cut

AfeI           AGC'GCT
AleI           CACnn'nnGTG
BfrBI          ATG'CAT
BmgBI          CAC'GTC
BsaAI          yAC'GTr
BsaBI          GATnn'nnATC
BsrBI          CCG'CTC
BstZ17I        GTA'TAC
EcoRV          GAT'ATC
FspI           TGC'GCA
FspAI          rTGC'GCAy
MlyI           GAGTCnnnnn'
MslI           CAynn'nnrTG
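The cut-site lists above appear to mark each recognition site with an apostrophe (') at the top-strand cut and an underscore (_) at the bottom-strand cut, counted from the left end of the site as written; sites without an underscore appear to be blunt cutters (compare AluI AG'CT with EcoRI G'AATT_C). The following Python sketch reads that notation and classifies the resulting end. This interpretation of the marks is an assumption inferred from the listed enzymes, not a documented file format, and `parse_cut_notation` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class CutSite:
    site: str        # recognition sequence with the cut marks removed
    top_cut: int     # site bases to the left of the top-strand cut
    bottom_cut: int  # site bases to the left of the bottom-strand cut
    end_type: str    # "blunt", "5' overhang" or "3' overhang"
    overhang: int    # length of the single-stranded end


def parse_cut_notation(pattern: str) -> CutSite:
    """Parse a site such as "G'AATT_C".

    Assumption: ' marks the top-strand cut and _ the bottom-strand cut;
    a pattern with no _ is treated as blunt (bottom cut == top cut).
    """
    site = []
    top = bottom = None
    for ch in pattern.strip():
        if ch == "'":
            top = len(site)
        elif ch == "_":
            bottom = len(site)
        else:
            site.append(ch)
    if top is None:
        raise ValueError(f"no top-strand cut mark in {pattern!r}")
    if bottom is None:
        bottom = top
    if bottom > top:
        end_type = "5' overhang"
    elif bottom < top:
        end_type = "3' overhang"
    else:
        end_type = "blunt"
    return CutSite("".join(site), top, bottom, end_type, abs(bottom - top))


for line in ["EcoRI          G'AATT_C", "AluI           AG'CT", "BstNI          CC'w_GG"]:
    name, notation = line.split()
    cs = parse_cut_notation(notation)
    print(name, cs.site, cs.end_type, cs.overhang)
# EcoRI GAATTC 5' overhang 4
# AluI AGCT blunt 0
# BstNI CCwGG 5' overhang 1
```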

INPUT FILES


Map+ accepts a single nucleotide or protein sequence as input. What Map+ does depends on whether your input sequence is nucleotide or protein: a nucleotide sequence gets a restriction map, and a protein sequence gets a peptide map. Map+ also accepts input that combines protein and nucleotide sequences.

 

RELATED PROGRAMS


FindPatterns+ searches for short patterns like enzyme recognition sites in one or more sequences.

FindPatterns searches for short patterns like enzyme recognition sites in one or more sequences.

Map maps a DNA sequence and displays both strands of the mapped sequence, with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence.

 

CONSIDERATIONS


Map+ does not treat your sequence as circular unless you use -circular.

The enzymes you name must be in the enzyme data file or you get an error message. You can have your system manager change the public enzyme data file to contain the enzymes most useful to your group, or you can maintain a private copy for your own use. (See the LOCAL DATA FILES topic below for more information.)

SUBSET, OVERLAP, AND PERFECT SEARCHES


By default, this program requires that each symbol in your sequence be a subset of the bases allowed by the corresponding symbol in the enzyme recognition site. If the recognition pattern in the enzyme data file were GCRGC, then the pattern GCAGC in your sequence would be found, since A is within the set of bases defined by R (see Appendix III). If the pattern in the enzyme data file were GCAGC, then a GCRGC in your sequence would not be recognized. If your sequence is very ambiguous, as it might be if it were a back-translated sequence, it may be better to use -all to do an overlap search. The overlap search would consider an R in your sequence to match an A in the recognition site.

With -perfect, the program looks for a perfect symbol match between your sequence and the recognition pattern -- GCRGC in the recognition pattern would only match a GCRGC in the sequence.

Note:  1. When -all and -perfect are both specified, -perfect takes precedence.

           2. When -perfect and -silent are both specified, the program gives an error message.

All searches are case insensitive (upper- or lowercase) for the letters in either the sequence or the enzyme recognition site.
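The three search modes can be illustrated by treating each IUPAC symbol as a set of bases. In the Python sketch below, subset matching (the default) requires the sequence symbol's set to lie inside the site symbol's set, overlap matching (as with -all) only requires the two sets to share a base, and perfect matching (as with -perfect) compares symbols literally, ignoring case. The IUPAC table and function names belong to this example and are assumed to agree with Appendix III for the standard codes; this is not Map+'s own implementation.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "X": "ACGT",
}


def bases(symbol: str) -> set[str]:
    return set(IUPAC[symbol.upper()])


def symbol_matches(seq_sym: str, site_sym: str, mode: str) -> bool:
    if mode == "perfect":   # literal comparison: R matches only R
        return seq_sym.upper() == site_sym.upper()
    seq_set, site_set = bases(seq_sym), bases(site_sym)
    if mode == "subset":    # default: sequence set inside site set
        return seq_set <= site_set
    if mode == "overlap":   # -all: the sets share at least one base
        return bool(seq_set & site_set)
    raise ValueError(f"unknown mode {mode!r}")


def find_sites(sequence: str, site: str, mode: str = "subset") -> list[int]:
    """Return 1-based start positions where `site` matches `sequence`."""
    hits = []
    for start in range(len(sequence) - len(site) + 1):
        window = sequence[start:start + len(site)]
        if all(symbol_matches(a, b, mode) for a, b in zip(window, site)):
            hits.append(start + 1)
    return hits


print(find_sites("ttGCAGCaa", "GCRGC"))             # [3]
print(find_sites("ttGCRGCaa", "GCAGC"))             # []
print(find_sites("ttGCRGCaa", "GCAGC", "overlap"))  # [3]
print(find_sites("ttGCRGCaa", "GCRGC", "perfect"))  # [3]
```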

DISPLAY CONVENTIONS


Cut Position

As in almost all sequence displays, the 5'->3' direction of the top strand is from left to right. Map+ aligns each enzyme's name so that the name ends over the 3' end of the fragment that continues to the left.

Collisions

If more than one enzyme cuts at the same position, Map+ sorts the set of enzymes that cut at the position alphabetically and stacks them up so that each enzyme name ends over the same position. If enzymes that cut to the left are in the way of the display, Map+ puts the names further up and uses a line of '|' characters to connect the name to the cut position.
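The collision rule can be pictured as a greedy row-stacking layout. The Python sketch below assumes each cut is given as an enzyme name plus the 0-based column over which the name should end. Names sharing a column are sorted alphabetically and stacked upward; a group that would overlap labels or connectors already placed is pushed up to the first free rows, and '|' characters link it down to its column. It illustrates the described behavior only; `layout_labels` is hypothetical, and Map+'s real spacing and page-width handling are not modeled.

```python
from collections import defaultdict


def layout_labels(cuts: list[tuple[str, int]]) -> list[str]:
    """Return label lines, top line first, for (name, end_column) pairs."""
    groups = defaultdict(list)
    for name, col in cuts:
        groups[col].append(name)

    rows: list[dict[int, str]] = []  # rows[0] is the row nearest the sequence

    def is_free(r: int, first: int, last: int) -> bool:
        return r >= len(rows) or all(c not in rows[r] for c in range(first, last + 1))

    for col in sorted(groups):
        names = sorted(groups[col], key=str.lower)
        # Lowest starting row where every name of the group fits on consecutive rows.
        base = 0
        while not all(is_free(base + i, col - len(n) + 1, col) for i, n in enumerate(names)):
            base += 1
        while len(rows) < base + len(names):
            rows.append({})
        for i, name in enumerate(names):
            for offset, ch in enumerate(name):
                rows[base + i][col - len(name) + 1 + offset] = ch
        # A raised group is connected down to its cut column with '|'.
        for r in range(base):
            rows[r][col] = "|"

    if not rows:
        return []
    lo = min(0, min(c for row in rows for c in row))
    hi = max(c for row in rows for c in row)
    return ["".join(row.get(c, " ") for c in range(lo, hi + 1)).rstrip()
            for row in reversed(rows)]


for line in layout_labels([("AluI", 5), ("CviJI", 5), ("MseI", 7)]):
    print(line)
```

Here AluI and CviJI share a column and stack alphabetically, while MseI would collide with them and is raised with a connector.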

Potential Sites

When you search for potential restriction sites with either -mismatch or -silent, Map+ differentiates the real sites from the potential sites by capitalizing the enzyme's name at the real sites.

SELECTING ENZYMES


The program presents you with an enzyme selection prompt that lets you enter enzymes individually or collectively. To get help with selecting enzymes, type a ? at the enzyme prompt. Here is what you see:

 
 
Select enzymes:
 
Type "*" to select all enzymes.
Type "^A" to select all enzymes starting with "A."
Type parts of names like "Al*" to select all enzymes starting with "AL."
Type "#" to select no enzymes at all.
 
Spaces are allowed; upper and lower case are equivalent.

We maintain our enzyme files with a semicolon (;) character in front of all but one member of a family of isoschizomers. (Isoschizomers are restriction endonucleases with the same recognition site.) The isoschizomers beginning with a semicolon are normally not displayed by our mapping programs unless you specifically select them by name or type "**" instead of "*" at the enzyme prompt.

There is more information on enzyme files in Appendix VII.

A command-line expression like -enzymes=AluI,EcoRII would choose AluI and EcoRII and suppress interactive enzyme selection.
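The selection rules above can be sketched as a small matcher over a list of enzyme names. In this Python sketch, each enzyme carries a flag that is False when its name has a leading semicolon in the enzyme file. An empty response or * selects only the representatives, ** selects everything, ^A selects every name starting with A (including isoschizomers, as the PARAMETER REFERENCE states for -enzymes), a trailing * selects representatives by prefix, # selects nothing, and a plain name selects that enzyme even if it is an isoschizomer. Two points are assumptions, not documented: that prefixes such as Al* skip semicolon-marked isoschizomers, and that terms may be separated by commas or spaces. The enzyme list and its flags are hypothetical.

```python
import re


def select_enzymes(response: str, enzymes: list[tuple[str, bool]]) -> list[str]:
    """enzymes: (name, is_representative) pairs; is_representative is False
    for names stored with a leading ';' in the enzyme data file."""
    response = response.strip() or "*"  # typing nothing selects all enzymes
    selected: list[str] = []
    for term in filter(None, re.split(r"[,\s]+", response)):
        term = term.lower()  # upper and lower case are equivalent
        if term == "#":
            return []
        for name, representative in enzymes:
            key = name.lower()
            if term == "**":
                hit = True
            elif term == "*":
                hit = representative
            elif term.startswith("^"):
                hit = key.startswith(term[1:])
            elif term.endswith("*"):
                hit = representative and key.startswith(term[:-1])
            else:
                hit = key == term
            if hit and name not in selected:
                selected.append(name)
    return selected


ENZYMES = [("AluI", True), ("AlwNI", True), ("AclI", True),
           ("EcoRI", True), ("MboI", True), ("Sau3AI", False)]
print(select_enzymes("Al*", ENZYMES))
print(select_enzymes("ecori, sau3ai", ENZYMES))
```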

 

CHOOSING THE TRANSLATION FRAMES


You can name the frames of interest individually with a response like abcf. You can use t or s to mean the three forward or all six possible translation frames. You can make all of the characters in your response uppercase to get three-letter instead of one-letter amino acid symbols in the translation.

You can use an expression like -menu=abcf to choose translation frames a, b, c, and f from the command line.
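The frame letters can be illustrated with a short Python translation sketch using the standard genetic code. It assumes frames a, b, and c start at the first, second, and third base of the selected range, and that d, e, and f are the corresponding frames of the reverse complement. Map+ itself reads its code from translate.txt, and its numbering of the reverse frames may differ. The function names are hypothetical, and the uppercase-for-three-letter convention is not modeled. For the bases at positions 1000 to 1020 in the OUTPUT example (CCTTTAATTCCAGATGGGGGC), frames a, b, and c of this sketch give PLIPDGG, L*FQMG, and FNSRWG. These agree with the translation lines printed there for the codons lying entirely within that stretch.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str, offset: int) -> str:
    """Translate complete codons starting at `offset`; unknown codons become X."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_frames(dna: str, menu: str) -> dict[str, str]:
    """Translate the frames named in a menu response such as 'abcf', 't' or 's'."""
    menu = menu.lower()
    letters = {"t": "abc", "s": "abcdef", "n": ""}.get(menu, menu)
    reverse = dna.upper().translate(COMPLEMENT)[::-1]  # assumed source of d, e, f
    result = {}
    for letter in letters:
        index = "abcdef".index(letter)
        strand = dna if index < 3 else reverse
        result[letter] = translate(strand, index % 3)
    return result


print(translate_frames("CCTTTAATTCCAGATGGGGGC", "t"))
```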

 

TABLE OUTPUT


If you want to analyze the restriction sites in another program, you can display all the cut positions in a table. Use -table to get output like this:

 
 
 (Linear) MAP+ of: gamma.seq  check: 6474  from: 2161  to: 2600
 
Human fetal beta globins G and A gamma
from Shen, Slightom and Smithies,  Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.
 
 With 216 enzymes: *
 
Enzyme        +      -    November 25, 2004 17:59..
 
BfaI       2165   2167
BspGI      2170   2170
BsaHI      2174   2176
 
//////////////////////

Normally, the table is sorted by position first and then alphabetically by enzyme name. You can sort the table by enzyme name first and then by position with -sortbyenzyme.

If you display the cut positions in a table using -table, the program does not create the standard output file displaying the sequence and the restriction sites along that sequence.
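Because -table writes plain columns, the cut positions are easy to read into another program. The Python sketch below parses lines shaped like the sample above: an enzyme name followed by two integer positions, presumably the top-strand (+) and bottom-strand (-) cut positions. It sorts them by position or, mimicking -sortbyenzyme, by enzyme name, and skips header lines that do not have that shape. The parsing rule is inferred from the sample, not from a documented file format, and `read_cut_table` is hypothetical.

```python
import re

# Name, then two integers, and nothing else on the line.
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s*$")


def read_cut_table(text: str, by_enzyme: bool = False) -> list[tuple[str, int, int]]:
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, plus, minus = match.groups()
            rows.append((name, int(plus), int(minus)))
    if by_enzyme:
        rows.sort(key=lambda r: (r[0].lower(), r[1]))  # like -sortbyenzyme
    else:
        rows.sort(key=lambda r: (r[1], r[0].lower()))  # position, then name
    return rows
```

A typical use is `read_cut_table(open("ggamma.map+").read())`, assuming the table was written to that file.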

POTENTIAL RESTRICTION SITES


To assist with site-directed mutagenesis, this program can search for places in your sequence where a restriction enzyme recognition site occurs with one or more mismatches. Use -mismatch=1 to identify positions where a site could be recognized with one or fewer mismatches.

Use -silent to find the places in your sequence where a restriction site could be introduced without changing the translation. Read more about using -silent under the PARAMETER REFERENCE topic below.
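A -mismatch search can be pictured as follows: for each window of the sequence, count the positions where the sequence base falls outside the set allowed by the site's IUPAC symbol, and keep windows at or below the limit. This Python sketch assumes an unambiguous A/C/G/T sequence, scans only the top strand, and uses its own small IUPAC table. Map+'s handling of the bottom strand and of ambiguous sequence symbols may differ, and `near_sites` is hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "X": "ACGT",
}


def near_sites(sequence: str, site: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (1-based start, mismatch count) for windows within the limit."""
    sequence, site = sequence.upper(), site.upper()
    hits = []
    for start in range(len(sequence) - len(site) + 1):
        window = sequence[start:start + len(site)]
        mismatches = sum(base not in IUPAC[symbol] for base, symbol in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))
    return hits


# One exact EcoRI site at 3 and a one-mismatch candidate (GAGTTC) at 10.
print(near_sites("CCGAATTCGGAGTTCA", "GAATTC", 1))
```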

SEARCH FOR ANY SEQUENCE PATTERN


By changing the enzyme data file (see the LOCAL DATA FILES topic below), you can make this program search for any pattern. See Appendix VII for notes on enzyme data files.

DEFINING PATTERNS


FindPatterns, Map, MapSort, MapPlot, and Motifs all let you search with ambiguous expressions that match many different sequences. The expressions can include any legal GCG sequence character (see Appendix III). The expressions can also include several non-sequence characters, which are used to specify OR matching, NOT matching, begin and end constraints, and repeat counts. For instance, the expression TAATA(N){20,30}ATG means TAATA, followed by 20 to 30 of any base, followed by ATG. Following is an explanation of the syntax for pattern specification.

Implied Sets and Repeat Counts

Parentheses () enclose one or more symbols that can be repeated some number of times. Braces {} enclose numbers that tell how many times the symbols within the preceding parentheses must be found.

Sometimes, you can leave out part of an expression. If braces appear without preceding parentheses, the numbers in the braces define the number of repeats for the immediately preceding symbol. One or both of the numbers within the braces may be missing. For instance, both the pattern GATG{2,}A and the pattern GATG{2}A mean GAT, followed by G repeated from 2 to 350,000 times, followed by A; the pattern GATG{}A means GAT, followed by G repeated from 0 to 350,000 times, followed by A; the pattern GAT(TG){,2}A means GAT, followed by TG repeated from 0 to 2 times, followed by A; the pattern GAT(TG){2,2}A means GAT, followed by TG repeated exactly 2 times, followed by A. (If the pattern in the parentheses is an OR expression (see below), it cannot be repeated more than 2,000 times.)

OR Matching

If you are searching nucleic acids, the ambiguity symbols defined in Appendix III let you define any combination of G, A, T, or C. If you are searching proteins, you can specify any of several symbol choices by enclosing the different choices in parentheses and separating the choices with commas. For instance, RGF(Q,A)S means RGF followed by either Q or A followed by S. The length of each choice need not be the same, and there can be up to 31 different choices within each set of parentheses. The pattern GAT(TG,T,G){1,4}A means GAT followed by any combination of TG, T, or G from 1 to 4 times followed by A. The sequence GATTGGA matches this pattern. There can be several parentheses in a pattern, but parentheses cannot be nested.

NOT Matching

The pattern GC~CAT means GC, followed by any symbol except C, followed by AT. The pattern GC~(A,T)CC means GC, followed by any symbol except A or T, followed by CC.

Begin and End Constraints

The pattern <GACCAT can only be found if it occurs at the beginning of the sequence range being searched. Likewise, the pattern GACCAT> would only be found if it occurs at the end of the sequence range.
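The pattern syntax above maps closely onto regular expressions, and a translator helps make the rules concrete. This Python sketch handles nucleotide patterns only and assumes an unambiguous A/C/G/T sequence is being searched. It treats {n} like {n,} as described above, leaves a missing upper bound unbounded instead of applying the 350,000 limit, and supports only single symbols inside a ~( ) set. `gcg_pattern_to_regex` is hypothetical and is not how Map+ is implemented.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "X": "ACGT",
}


def gcg_pattern_to_regex(pattern: str) -> str:
    """Translate a nucleotide pattern in the syntax above into a Python regex."""
    p = pattern.upper()
    out = []
    if p.startswith("<"):          # begin constraint
        out.append("^")
        p = p[1:]
    anchor_end = p.endswith(">")   # end constraint
    if anchor_end:
        p = p[:-1]
    i = 0
    while i < len(p):
        ch = p[i]
        if ch == "~":              # NOT: one symbol or a (X,Y) set of single symbols
            if p[i + 1] == "(":
                close = p.index(")", i)
                excluded = p[i + 2:close].split(",")
                i = close + 1
            else:
                excluded = [p[i + 1]]
                i += 2
            out.append("[^" + "".join(IUPAC[s] for s in excluded) + "]")
        elif ch == "(":            # OR: choices of any length, no nesting
            close = p.index(")", i)
            choices = p[i + 1:close].split(",")
            out.append("(?:" + "|".join(
                "".join(f"[{IUPAC[s]}]" for s in choice) for choice in choices) + ")")
            i = close + 1
        elif ch == "{":            # repeat count for the preceding symbol or group
            close = p.index("}", i)
            low, _, high = p[i + 1:close].partition(",")
            out.append("{" + (low or "0") + "," + high + "}")
            i = close + 1
        else:                      # a single sequence symbol
            out.append(f"[{IUPAC[ch]}]")
            i += 1
    if anchor_end:
        out.append("$")
    return "".join(out)


print(gcg_pattern_to_regex("GC~(A,T)CC"))  # [G][C][^AT][C][C]
print(bool(re.fullmatch(gcg_pattern_to_regex("GAT(TG,T,G){1,4}A"), "GATTGGA")))  # True
```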

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the summary below, square brackets ([ and ]) enclose, for each parameter, the type of its value, its default value, and its aliases (shortened forms of the parameter name). Programs with a plus in the name accept either the full parameter name or one of the listed aliases. If the type is Boolean, the presence of the parameter on the command line indicates a true condition; a false condition must be stated explicitly as parameter=false.

Minimal Parameters (case-insensitive):

 

-infile         [Type: String / Default: EMPTY / Aliases: infile1 in]

                The name of the input file.

 

Prompted Parameters (case-insensitive):

 

-begin          [Type: Integer / Default: '1' / Aliases: beg]

                First base of interest in each query sequence.

 

-end            [Type: Integer / Default: '-1']

                Last base of interest in each query sequence.

 

-enzymes        [Type: List / Default: '*' / Aliases: enz]

                Selects the restriction enzymes to map.

 

-menu           [Type: String / Default: 't' / Aliases: men]

                Selects translation frames.

 

-outfile        [Type: String / Default: '<sequence_name>.map+' / Aliases: out]

                Names the output file.

 

Optional Parameters (case-insensitive):

 

-check          [Type: Boolean / Default: 'false' / Aliases: che help]

                Prints out this usage message.

 

-default        [Type: Boolean / Default: 'false' / Aliases: d def]

                Specifies that sensible default values be used for all parameters where possible.

 

-documentation [Type: Boolean / Default: 'true' / Aliases: doc]

                Prints banner at program startup.

 

-quiet          [Type: Boolean / Default: 'false' / Aliases: qui]

                Tells application to print only a minimal amount of information.

 

-doclines       [Type: Integer / Default: EMPTY / Aliases: docl]

                Specifies number of documentation lines to copy.

 

-data           [Type: String / Default: EMPTY / Aliases: dat]

                Names the enzyme (reagent) data file to read instead of the default enzyme.dat.

 

-rsf            [Type: String / Default: EMPTY]

                Saves map cut sites as features in an RSF file.

 

-once           [Type: Boolean / Default: 'false' / Aliases: onc]

                Shows enzymes that cut only once.

 

-mincuts        [Type: Integer / Default: '1' / Aliases: minc]

                Minimum number of cuts.

 

-maxcuts        [Type: Integer / Default: '100000' / Aliases: maxc]

                Maximum number of cuts.

 

-minsitelen     [Type: Integer / Default: '1' / Aliases: mins]

                Sets minimum number of bases in recognition site.

 

-overhang       [Type: List / Default: '0,5,3' / Aliases: ove]

                Selects restriction enzymes by the ends they leave: 5 for a 5' overhang, 3 for a 3' overhang, 0 for a blunt end.

 

-mismatch       [Type: Integer / Default: '0' / Aliases: mis]

                Number of allowed mismatches.

 

-perfect        [Type: Boolean / Default: 'false' / Aliases: perf]

                Accepts only perfect matches.

 

-all            [Type: Boolean / Default: 'false' / Aliases:]

                Find overlapping set matches.

 

-silent         [Type: Boolean / Default: 'false' / Aliases: sil]

                Find translationally silent potential restriction sites.

 

-circular       [Type: Boolean / Default: 'false' / Aliases: cir]

                Treat the sequence as circular.

 

-linear         [Type: Boolean / Default: 'true' / Aliases: lin]

                Treats the sequence as linear.

 

-nocompline     [Type: Boolean / Default: 'false' / Aliases: nocomp]

                Suppresses display of the complementary strand in the output.

 

-append         [Type: Boolean / Default: 'false' / Aliases: app]

                Appends the enzyme data file (and any custom translation file) to the output file.

 

-translate      [Type: String / Default: 'translate.txt' / Aliases: trans]

                Names the translation table file.

 

-threeletter    [Type: Boolean / Default: 'false' / Aliases: thr]

                Uses three letter code amino acid symbols to display translation.

 

-width          [Type: Integer / Default: '60' / Aliases: wid]

                Bases per line.

 

-table          [Type: Boolean / Default: 'false' / Aliases: tab]

                Writes a table of cut sites.

 

-sortbyenzyme   [Type: Boolean / Default: 'false' / Aliases: sor]

                Sorts table output first by enzyme.

 

-cutters        [Type: String / Default: EMPTY / Aliases: cut]

                Writes an enzyme data file with enzymes that did cut.

 

-noncutters     [Type: String / Default: EMPTY / Aliases: noncut]

                Writes an enzyme data file with enzymes that did not cut.

 

-excutters      [Type: String / Default: EMPTY / Aliases: excut]

                Writes an enzyme data file with enzymes that were excluded.

 

-nomonitor      [Type: Boolean / Default: 'false' / Aliases: nomon quite qui]

                Suppresses the display of program progress.

 

-nosummary      [Type: Boolean / Default: 'false' / Aliases: nosum]

                Suppresses the summary at the end of the program.

 

-numfreq        [Type: Integer / Default: '20']

                Interval, in bases, at which sequence positions are numbered.

 

-leftmargin     [Type: Integer / Default: '5']

                Width of the left margin.

 

-markfreq       [Type: Integer / Default: '5']

                Interval, in bases, at which position marks are shown.

 

-markchar       [Type: String / Default: '*']

                Display character for marking.

 

-markpos        [Type: String / Default: 'middle']

                Position of numbers and marking characters.

LOCAL DATA FILES


The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory, or 2) name a file on the command line with an expression like -data=myfile.dat. For more information, see Section 4, "Using Data Files," in the User's Guide.
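The lookup rule above can be sketched in a few lines of Python. The sketch assumes that a file named on the command line takes precedence over a same-named file in the working directory, which in turn takes precedence over the public copy. The text does not state the order between the first two cases. The public directory path and `resolve_data_file` are placeholders for illustration.

```python
from pathlib import Path
from typing import Optional


def resolve_data_file(default_name: str, public_dir: Path,
                      command_line_name: Optional[str] = None,
                      working_dir: Optional[Path] = None) -> Path:
    """Return the data file that would be read under the assumed precedence."""
    if command_line_name:                     # e.g. -data=myfile.dat
        return Path(command_line_name)
    local = (working_dir or Path.cwd()) / default_name
    if local.is_file():                       # local copy with the same name
        return local
    return public_dir / default_name          # public data directory


print(resolve_data_file("enzyme.dat", Path("/public/data"), "myfile.dat"))
```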

This program reads the public or local version of enzyme.dat to get the enzyme names, recognition sites, cut positions, and overhangs. You can use mapping programs to search for any sequence pattern by adding the pattern to the enzyme data file. If you use the command-line parameter -append, this program appends the enzyme data file to the output file. (See Appendix VII for more information about enzyme data files.)

If Map+ finds Type: P on the dividing line of the sequence file, it reads proteolytic cleavage data from the data file proenzyme.dat.

The translation of codons to amino acids, the identification of potential start and stop codons, and the mapping of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -translate=mycode.txt. Translation tables are discussed in more detail in Appendix VII. If you use -append and have provided your own translation table, that file is appended to the output file along with the enzyme data file.

PARAMETER REFERENCE


You can set the parameters listed below from the command line. Aliases (shortened forms of the parameter name) follow the full name, separated by commas.

-infile, -infile1, -in

The name of the input file.

-outfile, -out

Names the output file.

-begin, -beg

First base of interest in each query sequence.

-end

Last base of interest in each query sequence.

-enzymes=*, -enz

Specifies the restriction enzymes whose recognition sites you want to search for. To search for several enzymes, separate their names with commas. -enzymes=* selects all enzymes (one representative from each family of isoschizomers), -enzymes=^A selects all enzymes whose names start with A, including isoschizomers, and -enzymes=Al* selects all enzymes whose names start with Al.

-doclines, -docl

Specifies the number of documentation lines to copy.

-data, -dat

Names the enzyme (reagent) data file to read instead of the default enzyme.dat.

-check, -che, -help

Prints out this usage message.

-default, -d, -def

Specifies that sensible default values be used for all parameters where possible.

-documentation, -doc

Prints a banner at program startup.

-quiet, -qui

This parameter is not supported.

 

-menu=t, -men

Specifies which nucleotide reading frames are translated into protein sequences in the output file. Specify t for the three forward frames, s for all six frames, or n for no protein translation. You can also specify any combination of the letters a through f to choose individual reading frames (for example, -menu=abcf).

-translate=filename.txt, -trans

Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

-rsf=map.rs

Writes an RSF (rich sequence format) file containing the input sequences annotated with features generated from the results of Map+. This RSF file is suitable for input to other Accelrys GCG (GCG) programs that support RSF files. In particular, you can use SeqLab to view this features annotation graphically. If you don't specify a file name with this parameter, then the program creates one using map+ for the file basename and .rsf for the extension. For more information on RSF files, see "Using Rich Sequence Format (RSF) Files" in Section 2 of the User's Guide. Or, see "Rich Sequence Format (RSF) Files" in Appendix C of the SeqLab Guide.

-circular, -cir

Tells Map+ to treat your sequence as circular. If a possible recognition site starts at the end of the sequence and continues into the beginning, the site is marked at the point where a circular molecule would be cut. For instance, if your sequence ends in GAA and starts with TTC, Map+ shows an EcoRI cut two bases before the end of the sequence. The sequence is circularized only at the ends found in the file, so if you want a subrange to be treated as circular, you have to create a file in which the subrange is the entire sequence (see the Assemble program).
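Circular searching can be sketched by appending the first few bases of the sequence to its end before searching. Sites that span the end/beginning junction are then found, and cut positions are reduced modulo the sequence length. This Python sketch uses exact matching only. It takes the cut offset as the number of site bases to the left of the top-strand cut (1 for EcoRI, G'AATTC) and reports each cut as the number of bases to its left. `circular_cuts` and its parameters are hypothetical.

```python
def circular_cuts(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut positions (bases to the left of each cut, 0 to len-1)
    for exact matches of `site` in a circular `sequence`."""
    seq, site = sequence.upper(), site.upper()
    n, k = len(seq), len(site)
    extended = seq + seq[:k - 1]   # lets a site run across the junction
    cuts = []
    start = extended.find(site)
    while start != -1 and start < n:
        cuts.append((start + cut_offset) % n)
        start = extended.find(site, start + 1)  # allow overlapping matches
    return sorted(cuts)


# The example from the text: the sequence ends in GAA and starts with TTC.
print(circular_cuts("TTCAGGCCTGAA", "GAATTC", 1))  # [10], two bases before the end
```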

-linear, -lin

Is the opposite of -circular. If you have defined a command that runs Map+ with -circular as the default, use the -linear parameter to make Map+ treat your sequence as linear.

-width=100, -wid

Allows you to choose the number of bases shown on each line of output. The default is 60, which fits nicely on a terminal screen, but 100 sequence symbols per line makes it very convenient to estimate the sizes of fragments between cuts.
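If you estimate fragment sizes from a map, the arithmetic is easy to script. This Python sketch assumes each cut is given as the number of bases to the left of the top-strand cut and ignores overhangs. It handles linear and circular molecules. It is an illustration, not a Map+ feature, and `fragment_sizes` is hypothetical.

```python
def fragment_sizes(length: int, cuts: list[int], circular: bool = False) -> list[int]:
    """Fragment lengths, in bases, between top-strand cuts.

    Each cut is the number of bases to its left (0 < cut < length for a
    linear molecule). With no cuts, the whole molecule is one piece.
    """
    points = sorted(set(cuts))
    if not points:
        return [length]
    if circular:
        # The last fragment runs through the origin back to the first cut.
        return [b - a for a, b in zip(points, points[1:])] + [length - points[-1] + points[0]]
    edges = [0] + points + [length]
    return [b - a for a, b in zip(edges, edges[1:])]


print(fragment_sizes(20, [3, 10]))                 # [3, 7, 10]
print(fragment_sizes(20, [3, 10], circular=True))  # [7, 13]
```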

-threeletter, -thr

Sets the translation to show three-letter amino acid codes instead of one-letter codes. Normally you can get three-letter codes by typing your response to the protein translation prompt in uppercase. However, when you choose protein translation from the command line, you must add -threeletter to get three-letter amino acid codes.

-mismatch=1, -mis

Causes the program to recognize sites that differ from the recognition site by one or fewer mismatches. If too many mismatches are allowed, the results may not be meaningful. The output from most mapping programs distinguishes between sites with no mismatches and sites with mismatches.
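A minimal Python sketch of mismatch-tolerant searching, assuming a plain base-by-base comparison (a Hamming distance) against each window of the sequence. Returning the mismatch count with each hit is what lets output distinguish exact sites from near-sites. The function name is hypothetical, ambiguity codes are not expanded, and only the top strand is searched (EcoRI's site is palindromic, so the bottom strand gives the same hits here).

```python
def find_with_mismatches(seq, site, max_mismatches=1):
    """Return (1-based start, mismatch count) for every window within
    max_mismatches of the site. Literal comparison, top strand only."""
    seq, site = seq.upper(), site.upper()
    k = len(site)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))
    return hits


print(find_with_mismatches("CCGAATTCAAGAGTTCC", "GAATTC", 1))  # [(3, 0), (11, 1)]
```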

-silent, -sil

Shows the places where restriction sites can be introduced (by site-directed mutagenesis) without changing the peptide translation of the sequence. The -silent parameter assumes that the range you have chosen defines a coding region and reading frame precisely. Sites may be found that have any number of bases changed as long as the changes do not alter the translation. The reading frame is implied by the beginning coordinate you specify. The output from most mapping programs distinguishes between real sites and sites with one or more mismatches. The data file translate.txt defines the genetic code.
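The Python sketch below shows the basic test behind -silent: write the site into the coding sequence at each position and keep the positions where the translation does not change. It is a simplification under stated assumptions: the coding sequence starts at the first base of codon 1, the site contains no ambiguity codes, and the standard genetic code is hard-coded here instead of being read from translate.txt. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + m]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for m, b3 in enumerate(BASES)
}


def translate(cds):
    """Translate in frame 1, ignoring a trailing partial codon."""
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def silent_sites(cds, site):
    """Return (1-based start, bases changed) where writing `site` into the coding
    sequence leaves its translation unchanged. Unambiguous sites only."""
    cds, site = cds.upper(), site.upper()
    protein = translate(cds)
    k = len(site)
    hits = []
    for start in range(len(cds) - k + 1):
        mutant = cds[:start] + site + cds[start + k:]
        if translate(mutant) == protein:
            changed = sum(1 for a, b in zip(cds[start:start + k], site) if a != b)
            hits.append((start + 1, changed))
    return hits


# ATG GAG TTT CTG (Met Glu Phe Leu): GAG TTT -> GAA TTC keeps Glu Phe.
print(silent_sites("ATGGAGTTTCTG", "GAATTC"))  # [(4, 2)]
```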

-perfect, -perf

Sets the program to look for a perfect alphabetic match between the site and the sequence. Ambiguity codes are normally translated so that the site RXY would find sequences like ACT or GAC. With this parameter, the ambiguity codes are not translated so the site RXY would only match the sequence RXY. This parameter is not the same as -mismatch=0!

                   Note: Do not combine -silent with -perfect; the two parameters serve different purposes.

-all

Makes an overlap-set map instead of the usual subset map. If your sequence is very ambiguous (as a back-translated sequence would be, for instance) and you want to see where restriction sites could be, use an overlap-set map. Overlap-set and subset pattern recognition is discussed in more detail in the Program Manual entry for Window.

                   Note: If both -all and -perfect are specified, -perfect takes precedence.
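The three matching modes described for the default, -all, and -perfect can be compared side by side in a short Python sketch. It reflects one reading of the text, not Map+'s code: in subset matching every base a sequence symbol could stand for must be allowed by the site symbol; in overlap-set matching the two symbols need only share one possible base; in perfect matching the letters must be identical. X is treated as "any base", as the RXY example implies. The names IUPAC, symbol_matches, and site_matches are hypothetical.

```python
# IUPAC nucleotide ambiguity codes; X is treated as "any base", as in the RXY example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "X": "ACGT",
}


def symbol_matches(site_sym, seq_sym, mode="subset"):
    if mode == "perfect":
        # -perfect: plain alphabetic comparison, no ambiguity expansion.
        return site_sym == seq_sym
    site_set, seq_set = set(IUPAC[site_sym]), set(IUPAC[seq_sym])
    if mode == "subset":
        # Default: every base the sequence symbol could be is allowed by the site.
        return seq_set <= site_set
    if mode == "overlap":
        # -all: the sequence symbol could be at least one base the site allows.
        return bool(seq_set & site_set)
    raise ValueError(f"unknown mode: {mode}")


def site_matches(site, window, mode="subset"):
    site, window = site.upper(), window.upper()
    return len(site) == len(window) and all(
        symbol_matches(a, b, mode) for a, b in zip(site, window))


print(site_matches("RXY", "ACT"))                        # True
print(site_matches("RXY", "ACT", mode="perfect"))        # False
print(site_matches("GAATTC", "GANTTC"))                  # False
print(site_matches("GAATTC", "GANTTC", mode="overlap"))  # True
```

Because a single mode is chosen per call, giving perfect matching priority over overlap-set matching amounts to choosing mode="perfect" whenever both are requested.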

-append, -app

Appends the enzyme data file to your output file. If you provided your own translation scheme, that file is also appended.

-cutters=gamma.cutters, -cut

Writes out a new enzyme data file containing those selected enzymes that did cut your sequence and were not excluded with any of the -mincuts, -once, -maxcuts, and -exclude parameters. If you do not add a file name to the -cutters parameter, the output file has the name of your sequence followed by the file name extension .cutters.

-noncutters=gamma.noncutters, -noncut

Writes out a new enzyme data file containing the selected enzymes that did NOT cut your sequence. If you do not add a file name to this parameter, the output file has the name of your sequence followed by the file name extension .noncutters.

-excutters=gamma.excutters, -excut

Writes out a new enzyme data file containing those enzymes that did cut your sequence but were excluded with any of the -exclude, -mincuts, -once, and -maxcuts parameters. If you do not add a file name to this parameter, the output file has the name of your sequence followed by the file name extension .excutters.

                   The parameters -minsitelen and -overhang restrict the set of enzymes selected.

-minsitelen=6, -mins

Selects only patterns with the specified number of bases or more in the recognition site. You can display the sites from any pattern in the enzyme or pattern file that you take the trouble to name individually, but when you use all of the patterns, the program uses only the patterns whose recognition sites have at least the specified number of bases. -minsitelen=6 replaces the -sixbase parameter from earlier versions of GCG.

-overhang=0, -ove

Selects only enzymes that leave blunt ends. Use a 5 with this parameter to search only with enzymes that leave 5' overhangs, and a 3 to search only with enzymes that leave 3' overhangs. You can use multiple values, separated by commas. For instance, -overhang=5,3 searches with all enzymes that leave either 5' or 3' overhangs. You can display the cuts from any enzyme in the enzyme data file that you take the trouble to name individually, but when you use * (meaning all), the program uses only the enzymes whose overhangs conform to your choice with this parameter.
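A minimal Python sketch of how -minsitelen and -overhang narrow the enzyme set when all enzymes (*) are requested. The Enzyme record, its numeric overhang encoding (0, 5, 3, mirroring the parameter values), and the five-enzyme list are hypothetical stand-ins for the enzyme data file. The sketch counts every symbol in the site, including any N; how Map+ counts ambiguous positions is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Enzyme:
    name: str
    site: str      # recognition site
    overhang: int  # 0 = blunt, 5 = 5' overhang, 3 = 3' overhang (hypothetical encoding)


def select_enzymes(enzymes, minsitelen=0, overhang=(0, 5, 3)):
    """Mimic -minsitelen and -overhang when all enzymes (*) are requested."""
    return [e.name for e in enzymes
            if len(e.site) >= minsitelen and e.overhang in overhang]


ENZYMES = [
    Enzyme("EcoRI", "GAATTC", 5),   # G^AATTC
    Enzyme("PstI", "CTGCAG", 3),    # CTGCA^G
    Enzyme("SmaI", "CCCGGG", 0),    # CCC^GGG
    Enzyme("HaeIII", "GGCC", 0),    # GG^CC
    Enzyme("TaqI", "TCGA", 5),      # T^CGA
]

print(select_enzymes(ENZYMES, minsitelen=6))                   # ['EcoRI', 'PstI', 'SmaI']
print(select_enzymes(ENZYMES, overhang=(0,)))                  # ['SmaI', 'HaeIII']
print(select_enzymes(ENZYMES, minsitelen=6, overhang=(5, 3)))  # ['EcoRI', 'PstI']
```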

The -mincuts, -maxcuts, -once, and -exclude parameters suppress the display of selected enzymes. The list of excluded enzymes in the program output includes both selected enzymes that cut within excluded ranges and selected enzymes that did not cut the right number of times.

-mincuts=2, -minc

                       Excludes enzymes that do not cut at least two times.

-maxcuts=2, -maxc

                       Excludes enzymes that cut more than two times.

-once, -onc

Excludes, from the set of enzymes displayed, those enzymes that cut your sequence more than once (equivalent to setting both -mincuts and -maxcuts to 1).
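The Python sketch below shows how the cut-count and exclusion rules could sort enzymes into the three groups written by -cutters, -noncutters, and -excutters. It assumes, following the description above, that an enzyme is excluded if it cuts the wrong number of times or cuts inside an excluded range; the function name, the (start, end) range representation, and the cut positions are hypothetical. An enzyme with no cuts is always a noncutter, since the excutters file holds only enzymes that did cut.

```python
def partition_enzymes(cuts, mincuts=1, maxcuts=None, exclude=()):
    """Split enzymes into the sets written by -cutters, -noncutters and -excutters.

    cuts: enzyme name -> list of cut positions.
    exclude: inclusive (start, end) ranges; an enzyme cutting inside one is excluded.
    """
    cutters, noncutters, excutters = [], [], []
    for name, positions in cuts.items():
        n = len(positions)
        in_excluded_range = any(start <= p <= end
                                for p in positions for start, end in exclude)
        if n == 0:
            noncutters.append(name)
        elif n < mincuts or (maxcuts is not None and n > maxcuts) or in_excluded_range:
            excutters.append(name)
        else:
            cutters.append(name)
    return cutters, noncutters, excutters


cuts = {"EcoRI": [1203], "BamHI": [], "HinfI": [1100, 1350, 1522], "PstI": [1010]}
# -once is the same as -mincuts=1 -maxcuts=1
print(partition_enzymes(cuts, mincuts=1, maxcuts=1))
# (['EcoRI', 'PstI'], ['BamHI'], ['HinfI'])
print(partition_enzymes(cuts, mincuts=1, maxcuts=1, exclude=[(1000, 1050)]))
# (['EcoRI'], ['BamHI'], ['HinfI', 'PstI'])
```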

-nocompline, -nocomp

                       Suppresses display of the complementary strand.

-table, -tab

Use this parameter if you simply want a table of which enzymes cut where. See the topic TABLE OUTPUT.

-nomonitor, -nomon, -quiet, -qui

                      Suppresses the display of program progress.

-nosummary, -nosum

                      Suppresses the summary displayed at the end of the program.

-sortbyenzyme, -sor

Table output is normally sorted by the position of the cut in the top strand of the sequence. Use this parameter to see the cuts sorted first by enzyme and then by position. See the topic TABLE OUTPUT.
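The two table orderings amount to sorting on different keys, as this short Python sketch shows. The (enzyme, position) row layout is hypothetical, and alphabetical, case-insensitive ordering of enzyme names is an assumption; Map+ might instead follow the order of the enzyme data file.

```python
def sort_table(rows, by_enzyme=False):
    """rows: (enzyme name, cut position) pairs."""
    if by_enzyme:
        # -sortbyenzyme: group by enzyme name, then by position.
        return sorted(rows, key=lambda r: (r[0].casefold(), r[1]))
    # Default: order by cut position in the top strand.
    return sorted(rows, key=lambda r: (r[1], r[0].casefold()))


rows = [("HinfI", 1350), ("EcoRI", 1203), ("HinfI", 1100)]
print(sort_table(rows))                  # [('HinfI', 1100), ('EcoRI', 1203), ('HinfI', 1350)]
print(sort_table(rows, by_enzyme=True))  # [('EcoRI', 1203), ('HinfI', 1100), ('HinfI', 1350)]
```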


NEW FUNCTIONS AVAILABLE


-numfreq

                       Specifies how often positions in the sequence are numbered. Range is 0-100. Default: 20. If set to 0, no numbering is shown.

-leftmargin

                       Specifies the size of the left margin. Range is 5-50. Default: 5.

-markfreq

                       Specifies how often positions in the sequence are marked. Range is 0-100. Default: 10.

-markchar

                       Specifies the display character used for marking. Default: *

-markpos

Select Above Sequence, Below Sequence, or Between Strands (default). For single stranded output, Between Strands is equivalent to Below Sequence.
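To make the marking options concrete, here is a hypothetical Python sketch of one marker line: the -leftmargin width of blanks, then -markchar in every column whose sequence position is a multiple of -markfreq, with 0 producing no marks. These layout rules are assumptions for illustration; the exact columns Map+ marks, and how it combines marks with -numfreq numbering and -markpos placement, are not specified here.

```python
def mark_line(first_pos, width, markfreq=10, markchar="*", leftmargin=5):
    """Build one marker line: markchar under every position divisible by markfreq.
    markfreq=0 produces no marks. Layout rules are assumptions."""
    cells = []
    for offset in range(width):
        pos = first_pos + offset
        cells.append(markchar if markfreq and pos % markfreq == 0 else " ")
    return (" " * leftmargin + "".join(cells)).rstrip()


# A 20-base line starting at position 1: marks at positions 10 and 20.
print(repr(mark_line(1, 20)))
```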

OPTIONS EXCLUDED IN MAP+


                        The following options are not available in Map+:

                        page[62], bottom, vertical, nocutline, noseqline, noscaleline, and open.


Printed: June 1, 2005 19:21




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio