MEME+

[Genhelp | Program Manual | User's Guide | Data Files | Databases | Release Notes ]

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

ALGORITHM

CONSIDERATIONS

SUGGESTIONS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION


MEME+ finds conserved motifs in a group of unaligned sequences. MEME+ saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program.

DESCRIPTION


Advantages of Plus “+” Programs:

- Plus programs are enhanced to be able to read sequences in a variety of native formats such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML without conversion.

- Plus programs remove the sequence length restriction of 350,000 bp.

If you do not need these features and wish to have more interactivity, you might wish to seek out and run the original program version.

MEME+ uses the method of Bailey and Elkan to identify likely motifs within the input set of sequences. You may specify a range of motif widths to target, as well as the number of unique motifs to search for. MEME+ uses Bayesian probability to incorporate prior knowledge of the similarities among amino acids into its predictions of likely motifs. The resulting motifs are output as profiles. A profile is a log-odds matrix used to judge how well an unknown sequence segment matches the motif.
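As a rough illustration of how a log-odds matrix can be used to judge a segment, the sketch below sums one matrix value per position and slides the window along a sequence. The three rows are copied from the profile excerpt shown under OUTPUT, cut down to five of the twenty letter columns; the function names and the test sequence are hypothetical, and MotifSearch may scale or threshold scores differently.

```python
# Three profile rows copied from the excerpt under OUTPUT, restricted to
# five of the twenty letter columns to keep the example short.
PROFILE = [
    {"C": -260, "L": -401, "S": -65, "T": 374, "V": -296},
    {"C": -175, "L": 55, "S": -173, "T": 254, "V": 170},
    {"C": 412, "L": -359, "S": 125, "T": 236, "V": -258},
]

def score_segment(profile, segment):
    """Sum the log-odds value of each letter at its profile position."""
    return sum(row[letter] for row, letter in zip(profile, segment))

def best_match(profile, sequence):
    """Return (score, start) of the highest-scoring ungapped window."""
    width = len(profile)
    scores = [(score_segment(profile, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scores)

# Hypothetical test sequence; the window "TTC" at offset 2 scores highest.
print(best_match(PROFILE, "SLTTCV"))   # (1040, 2)
```

Because the profiles are ungapped, a plain sliding window is enough for this illustration; no alignment step is involved.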

EXAMPLE


Here is a session with MEME+ that was used to find motifs in a group of calcium-transporting membrane proteins listed in the file pircat.list.

 
11:27~130> meme+
 
Meme+ finds conserved motifs in a group of unaligned sequences. Meme+ saves these motifs as a set of profile. You can search a database of sequences with these profiles using the MotifSearch program.
 
Find motifs in what sequence(s) (* meme+.list *) ? pircat.list
How many motifs to search (* 6 *) ? 2
What should I call the profile file (* meme+.prf *) ?
What should I call the report file (* meme+.meme *) ?
 
 
 
The file meme+.meme already exists. Creating meme+_1.meme as the output file.
 
 
Sequences searched          : 7
Number of motifs identified : 2
Output profile file         : meme+.prf
Output report               : meme+_1.meme

OUTPUT


MEME+ generates a report and a file containing one or more ungapped GCG profiles. 

MEME+'s report file gives details about the motifs that help you analyze the validity and usefulness of the results. The file first lists the training set, or input sequences. ("Training set" is a common term for a set of examples from which an intelligent program learns a general concept.) After echoing the parameters you specified, the file gives a detailed description of each motif found. This report includes three different representations of the motif: two versions of a letter-probability matrix, and a consensus sequence showing all likely letters for each position. (A fourth representation is the ungapped profile that is written to the other output file.) There are six different types of information presented:

- The simplified letter-probability matrix shows probabilities for each letter at each position of the motif. (Probabilities are multiplied by 10 and displayed as integers. Values below 0.5 are displayed as ':'. Values above 9.5 are displayed as 'a'.)

- The information content bar graph shows how many bits of information are provided by each position in the motif. This is a measure of how well-conserved the positions of the motif are.

- The multilevel consensus sequence shows, for each position, all letters with a probability >= 0.2 of appearing in that position.

- The BLOCKS format section uses Henikoff's BLOCKS format to display occurrences of the motif within the sequences of the training set.

- The list of possible examples shows the highest scoring matches to the motif, with scores and sequence context included.

- The letter probability matrix shows the probabilities for each letter at each position of the matrix.

Note that this matrix is transposed with respect to the simplified letter-probability matrix. That is, the first row of the simplified matrix corresponds to the first column of this matrix.
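The following sketch shows how three of the representations described above could be derived from one column of letter probabilities. It is an illustration only: the function names are hypothetical, the handling of a value of exactly 9.5 is an assumption, and the information content is assumed to be the relative entropy against the background letter frequencies (which would explain why the bar graph in the example can exceed log2(20) bits).

```python
import math

def simplified_symbol(p):
    """One character of the simplified matrix for a probability p in [0, 1]."""
    digit = math.floor(p * 10 + 0.5)   # multiply by 10, round half up
    if digit == 0:
        return ":"                     # scaled value below 0.5
    if digit >= 10:
        return "a"                     # scaled value of 9.5 or more (assumed boundary)
    return str(digit)

def information_bits(column, background):
    """Relative entropy of one motif column against the background, in bits."""
    return sum(p * math.log2(p / background[a])
               for a, p in column.items() if p > 0)

def consensus_levels(column, cutoff=0.2):
    """Letters with probability >= cutoff, most probable first."""
    keep = [a for a, p in column.items() if p >= cutoff]
    return sorted(keep, key=lambda a: -column[a])

# Illustrative column, loosely modeled on position 2 of motif 1 in the example.
column = {"T": 0.50, "V": 0.33, "L": 0.17}
print({a: simplified_symbol(p) for a, p in column.items()})  # {'T': '5', 'V': '3', 'L': '2'}
print(consensus_levels(column))                              # ['T', 'V']
```

In this toy column L still appears in the simplified matrix (as 2) but falls below the 0.2 cutoff, so it is left out of the multilevel consensus.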

For more details about the output, consult Tim Bailey's MEME website at http://www.sdsc.edu/MEME+. (Note that the log-odds matrices referred to at the website correspond to the profiles that MEME+ writes to a separate output file.)


Here is some of the output from the EXAMPLE:

 
********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 3.0 (Release date: 2002/04/02 00:11:59)
 
For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.sdsc.edu.
 
This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.sdsc.edu.
********************************************************************************
 
 
********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:
 
Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************
 
 
********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= /var/tmp/bslskHAAppayHN.fasta
ALPHABET= ACDEFGHIKLMNPQRSTVWY
Sequence name            Weight Length  Sequence name            Weight Length
-------------            ------ ------  -------------            ------ ------
A42764.PIR2              1.0000    919  A48849.PIR2              1.0000    994
B31981.PIR2              1.0000    997  PWBYR1.PIR1              1.0000    950
S24359.PIR2              1.0000    994  S71168.PIR2              1.0000    946
********************************************************************************
 
********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.
 
command: meme /var/tmp/bslskHAAppayHN.fasta -protein -text -print_fasta
 
model:  mod=         zoops    nmotifs=         2    evt=           inf
object function=  E-value of product of p-values
width:  minw=            8    maxw=           50    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=        6    wnsites=       0.8
theta:  prob=            1    spmap=         pam    spfuzz=        120
em:     prior=       megap    b=           29000    maxiter=        50
        distance=    1e-05
data:   n=            5800    N=               6
 
sample: seed=            0    seqfrac=         1
Dirichlet mixture priors file: prior30.plib
Letter frequencies in dataset:
A 0.078 C 0.021 D 0.051 E 0.069 F 0.042 G 0.066 H 0.012 I 0.072 K 0.059
L 0.101 M 0.031 N 0.043 P 0.041 Q 0.029 R 0.041 S 0.066 T 0.065 V 0.087
W 0.011 Y 0.017
Background letter frequencies (from dataset with add-one prior applied):
A 0.077 C 0.021 D 0.051 E 0.069 F 0.042 G 0.066 H 0.012 I 0.072 K 0.059
L 0.101 M 0.031 N 0.043 P 0.041 Q 0.029 R 0.041 S 0.066 T 0.065 V 0.087
W 0.011 Y 0.017
********************************************************************************
 
 
********************************************************************************
MOTIF  1        width =   50   sites =   6   llr = 725   E-value = 1.6e-109
********************************************************************************
--------------------------------------------------------------------------------
        Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  ::::a:2::::7:::a:::::22::::::2::::::::::::::::::::
pos.-specific     C  ::5::::::::::::::::::::2::::72:::a::::::::::::::5:
probability       D  :::::::::::::2:::::::::::::::::::::a::::::::::::::
matrix            E  ::::::::::::::::::::::::a:::::::::::::::::::2:::::
                  F  :::::2::::::::::::::::::::::::::::::::::::::::::::
                  G  ::::::8::::::::::::::::::::a::::::::::a:::::::::::
                  H  ::::::::::::::::::::::::::::::::::::::::::::3::::2
                  I  ::::::::::::::::8:::::2:::::::::a:::::::::::::::::
                  K  ::::::::22::873:::22::::::::::::::::a:::::2::::::5
                  L  :2:a:8::2:::::::2:::a:::::8:::::::::::::a:::::::::
                  M  :::::::22:a2::::::::::::::2::::::::::::::::::a::::
                  N  ::::::::::::2:5::::2::::::::::3::::::::::::a::::::
                  P  :::::::::::::::::::::8::::::::::::::::::::::::::::
                  Q  ::::::::::::::::::::::::::::::::::::::::::::5:::::
                  R  ::::::::58:::22:::82:::::::::::::::::::::::::::::3
                  S  ::2::::::::::::::::5::7:::::3:5:::a:::::::2:::5:2:
                  T  a53::::5:::::::::::::::::a:::522:::::a:a:a7:::5:2:
                  V  :3:::::3:::2:::::a:::::8:::::2:8:::::::::::::::a2:
                  W  ::::::::::::::::::::::::::::::::::::::::::::::::::
                  Y  ::::::::::::::::::::::::::::::::::::::::::::::::::
 
         bits    6.5
                 5.9                                  *
                 5.2           *                      *           *
                 4.6           *                      * *       * *
Information      3.9 *   *    **    *  *  *  ** **   ******** * ***
content          3.3 * *** *  ** ******* ** ******  *********** ***** *
(174.3 bits)     2.6 * ************************************************
                 2.0 **************************************************
                 1.3 **************************************************
                 0.7 **************************************************
                 0.0 --------------------------------------------------
 
Multilevel           TTCLALGTRRMAKKNAIVRSLPSVETLGCTSVICSDKTGTLTTNQMSVCK
consensus             VT    V      K             S N             H T  R
sequence
 
 
--------------------------------------------------------------------------------

And here is an excerpt from the profile file:

!!AA_PROFILE 2.0
 
(Peptide) ..
{
 MEME v3.0 of: pircat.list  Length: 50
!A48849.PIR2     From: 316         To: 365         Weight: 1.000000
!S24359.PIR2     From: 316         To: 365         Weight: 1.000000
!B31981.PIR2     From: 316         To: 365         Weight: 1.000000
!PWBYR1.PIR1     From: 336         To: 385         Weight: 1.000000
!A42764.PIR2     From: 315         To: 364         Weight: 1.000000
!S71168.PIR2     From: 421         To: 470         Weight: 1.000000
                          Gap: 1.00              Len: 1.00
                     GapRatio: 0.0          LenRatio: 0.0
Cons   A      C      D      E      F      G      H      I      K      L      M      N      P      Q      R      S      T      V      W      Y   Gap  Len
}
 T   -244   -260   -360   -422   -384   -360   -234   -323   -337   -401   -294   -217   -356   -261   -301    -65    374   -296   -352   -301  100  100
! 1
 T   -174   -175   -383   -361   -174   -341   -179    -26   -310     55   -103   -271   -326   -235   -271   -173    254    170   -238   -141  100  100
 C    -70    412   -343   -383   -354   -200   -218   -322   -338   -359   -284   -225   -282   -244   -291    125    236   -258   -342   -276  100  100
 L   -341   -312   -479   -439   -161   -459   -252   -134   -401    310    -61   -420   -375   -276   -322   -391   -338   -203   -269   -203  100  100
 A    345   -138   -376   -374   -345   -188   -250   -343   -365   -340   -291   -328   -356   -295   -314    -99   -225   -214   -325   -282  100  100
 L   -322   -291   -477   -425     95   -465   -226   -103   -388    294    -28   -406   -365   -255   -307   -378   -316   -178   -229   -141  100  100
 G    -44   -314   -248   -338   -432    369   -206   -462   -309   -482   -395   -208   -345   -296   -265   -210   -334   -410   -326   -290  100  100
 T   -162   -167   -351   -340   -180   -321   -157    -52   -282   -127    172   -226   -306   -206   -244   -108    268    152   -228   -136  100  100
 R   -232   -290   -284   -215   -349   -307    -29   -307    119    -28    121   -183   -284    -36    377   -218   -231   -306   -250   -185  100  100
 R   -336   -287   -408   -406   -451   -383    -75   -419    -25   -392   -380   -283   -319   -146    442   -331   -360   -478   -254   -287  100  100
 M   -446   -370   -513   -548   -385   -508   -357   -291   -486   -224    489   -477   -473   -426   -488   -453   -442   -359   -289   -237  100  100
! 11
 
{
 MEME v3.0 of: pircat.list  Length: 50
!S24359.PIR2     From: 719         To: 768         Weight: 1.000000
!B31981.PIR2     From: 718         To: 767         Weight: 1.000000
!A48849.PIR2     From: 719         To: 768         Weight: 1.000000
!A42764.PIR2     From: 661         To: 710         Weight: 1.000000
!PWBYR1.PIR1     From: 697         To: 746         Weight: 1.000000
!S71168.PIR2     From: 701         To: 750         Weight: 1.000000
                          Gap: 1.00              Len: 1.00
                     GapRatio: 0.0          LenRatio: 0.0
Cons   A      C      D      E      F      G      H      I      K      L      M      N      P      Q      R      S      T      V      W      Y   Gap  Len
}
 A    250   -164   -374   -349   -247   -209   -215   -189   -323   -188    355   -298   -298   -247   -283   -127   -191   -174   -293   -207  100  100
! 1
 

INPUT FILES


The input to MEME+ is a set of either nucleotide or protein sequences (not both). The function of MEME+ depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.

MEME+ respects the begin and end attributes for controlling the range of interest for sequences in list files (but see RESTRICTIONS, below). MEME+ also respects the strand list file attribute for nucleotide sequences.

RELATED PROGRAMS


PileUp and Clustalw+ create a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. These multiple alignment programs can also plot a tree showing the clustering relationships used to create the alignment. ProfileMake creates a position-specific scoring table, called a profile, that quantitatively represents the information from a group of aligned sequences. The profile can then be used for database searching (ProfileSearch) or sequence alignment (ProfileGap). ProfileSearch uses a profile (representing a group of aligned sequences) as a query to search the database for new sequences with similarity to the group. The profile is created with the program ProfileMake. ProfileScan uses a database of profiles to find structural and sequence motifs in protein sequences. ProfileGap makes an optimal alignment between a profile and one or more sequences.

MEME+'s output can best be appreciated by running the output profiles through MotifSearch, another program in GCG. You will probably want to run MotifSearch at least twice. First, you should use the profiles to search the original training set of sequences. Second, you may wish to search a larger database to identify similar sequences. See the documentation for MotifSearch for details.


RESTRICTIONS


You can analyze at most 1,000,000 residues at one time.

If you wish to use both strands of nucleotide sequences, you must specify the one-per model (described in ALGORITHM, below) via the -oneexactly parameter.

MEME+ cannot process multiple sequences with the same name. If MEME+ encounters a second sequence with the identical name as a previous one, it will ignore the second. Thus, you cannot analyze several segments of a single sequence by creating several list file entries of that sequence and specifying different begin and end attributes for each entry.

MEME+ cannot process an input set that mixes Type: N and Type: P sequences. All sequences in a single run must be of one type, either nucleotide or protein.

ALGORITHM


MEME+ [version 3.0]

MEME+ implements the method of Bailey and Elkan to find one or more motifs that characterize a family of sequences. The core of MEME+ is Expectation Maximization (EM), an unsupervised learning algorithm guaranteed to converge to a local maximum. That is, any motif found by MEME+ will be "better" (according to MEME+'s statistical criteria) than any other motif that differs infinitesimally from the first.

One of the criteria applied by MEME+ depends on your choice of a model. MEME+ can either a) favor motifs that appear exactly once in each sequence in the training set (the one-per model); b) favor motifs that appear zero or one time in each sequence in the training set (the zero-or-one-per model); or c) give no preference to the number of occurrences (the zero-or-more-per model).
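To make the EM idea concrete, here is a deliberately small sketch of EM under the one-per model, in which every sequence is assumed to contain exactly one occurrence of the motif. It is not MEME+'s implementation: it uses a uniform background, a fixed pseudocount instead of Dirichlet priors, a fixed number of iterations, and a starting point supplied by hand. The function name, the toy DNA sequences, and the starting word are all hypothetical.

```python
import math

def em_one_per(seqs, start_word, alphabet="ACGT", iterations=10, pseudo=0.1):
    """Toy EM for the one-per model: each sequence holds exactly one site."""
    width = len(start_word)
    background = 1.0 / len(alphabet)            # uniform background (assumption)
    off = 0.3 / (len(alphabet) - 1)
    # Initial guess: each column favors the corresponding letter of start_word.
    theta = [{a: (0.7 if a == start_word[k] else off) for a in alphabet}
             for k in range(width)]
    for _ in range(iterations):
        # E-step: posterior probability of each start position in each sequence.
        posteriors = []
        for s in seqs:
            odds = [math.prod(theta[k][s[j + k]] / background for k in range(width))
                    for j in range(len(s) - width + 1)]
            total = sum(odds)
            posteriors.append([o / total for o in odds])
        # M-step: re-estimate column probabilities from posterior-weighted counts.
        counts = [{a: pseudo for a in alphabet} for _ in range(width)]
        for s, z in zip(seqs, posteriors):
            for j, weight in enumerate(z):
                for k in range(width):
                    counts[k][s[j + k]] += weight
        theta = [{a: c[a] / sum(c.values()) for a in alphabet} for c in counts]
    return theta, posteriors

# Toy data: the word GATTACA is planted at offsets 2, 0, and 4.
seqs = ["AAGATTACAGG", "GATTACATTTT", "CCCCGATTACA"]
theta, posteriors = em_one_per(seqs, start_word="GATTACA")
sites = [max(range(len(z)), key=z.__getitem__) for z in posteriors]
print(sites)   # [2, 0, 4]
```

The result depends on the starting word, which is the local-maximum behavior this section describes: a poor initial guess can settle on a different, weaker motif.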

MEME+ makes use of Dirichlet priors in its EM calculations for protein sequences. These are empirical statistical measures of the interchangeability of amino acids within subsequences of similar function. Suppose there are two amino acid sequences, S1 and S2, having the same length. If the first residue in S1 is I, and the first residue in S2 is V, then there is some likelihood that S1 and S2 have the same function, given their similarity in the first position. We can estimate that likelihood by analyzing the set of subsequences whose functionality is established.

A drawback to EM is that the maximum it finds is only local. There may be better solutions that were overlooked due to an unlucky choice of the starting point -- EM's initial guess at the solution. This is a nontrivial and heavily studied problem. One approach is to run the algorithm from a large subset of the possible starting points. You may choose the subset to be evenly distributed across the solution space, or to be randomly selected. In any case, this may take a daunting amount of time.

MEME+ refines this approach by taking a carefully chosen subset of possible solutions and running a single iteration of EM on each. It then chooses one from among these as its best candidate, and runs EM to convergence from there. When searching for a starting point, MEME+ does not consider all possible starting points within the range of widths it is given; rather, it surveys starting points at particular steps within the range given. Thus, if using the default range of 8 to 57, MEME+ will only consider initial motifs whose widths are in the set {8, 11, 15, 21, 28, 41, 57}.

Despite limiting the initial set of widths under consideration, MEME+ can find a motif of any width in the given range. This is due to a shortening technique that trims low-information columns from the ends of the motif. However, the motif will never be shortened below the minimum width specified for the search.

CONSIDERATIONS


Version 2.0 profile files

MEME+ generates a version 2.0 profile file, which permits multiple profiles to be included in one file. Version 2.0 profile files include an auxiliary data block (encased in {}'s) prior to each profile. This block contains parsable information, including the width of the profile and the column labels for the log-odds matrix.
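The layout described above can be read with a few lines of code. The sketch below is an assumption-laden illustration based only on the excerpt shown under OUTPUT: it treats a line containing only { or } as the block delimiters, takes the width from the "Length:" field and the column labels from the "Cons" line, and skips lines that begin with "!". The sample text is a hand-abbreviated toy (three letter columns, a hypothetical sequence name), not real MEME+ output, and the function name is hypothetical.

```python
def parse_profiles(text):
    """Split version 2.0 profile text into a list of profile dictionaries."""
    profiles, current, in_block = [], None, False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "{":                       # start of an auxiliary data block
            current = {"length": None, "columns": [], "rows": []}
            profiles.append(current)
            in_block = True
        elif line == "}":                     # end of the auxiliary data block
            in_block = False
        elif in_block:
            if "Length:" in line:
                current["length"] = int(line.split("Length:")[1].split()[0])
            elif line.startswith("Cons"):
                current["columns"] = line.split()[1:]
        elif current is not None and line and not line.startswith("!"):
            residue, *values = line.split()   # one matrix row per motif position
            current["rows"].append((residue, [int(v) for v in values]))
    return profiles

# Hand-abbreviated toy input with only three letter columns.
SAMPLE = """!!AA_PROFILE 2.0

(Peptide) ..
{
 MEME v3.0 of: pircat.list  Length: 2
!example.seq     From: 1           To: 2           Weight: 1.000000
Cons   A      C      T   Gap  Len
}
 T   -244   -260    374  100  100
! 1
 T   -174   -175    254  100  100

{
 MEME v3.0 of: pircat.list  Length: 1
Cons   A      C      T   Gap  Len
}
 A    250   -164   -191  100  100
! 1
"""

profiles = parse_profiles(SAMPLE)
print(len(profiles), [p["length"] for p in profiles])   # 2 [2, 1]
```

Returning a list makes the multi-profile nature of the file explicit; a reader that stops after the first element behaves like the programs that read only the first profile.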

When reading version 2.0 profile files generated by MEME+, most GCG programs (e.g. ProfileSearch, ProfileGap) will read only the first profile found. At this time, the only exception is MotifSearch, which reads and processes all of the profiles.

Also note that MEME+'s profiles always have Gap and Len values of 100 -- MEME+'s profiles should always be thought of as ungapped. This is a characteristic of MEME+, not of the version 2.0 profile file format.

For more details about version 2.0 profiles, see Appendix VII.

Time-complexity of the algorithm

MEME+'s algorithm for finding the best initial motifs of width W requires k * W * n^2 calculations, where k is an unknown constant (probably between 10 and 100) and n is the total number of residues in the input set. If you allow a large range of widths, this becomes very time-consuming. Searching with the default range of widths requires (8 + 11 + 15 + 21 + 28 + 41 + 57) = 181 iterations of k * n^2 calculations.
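The arithmetic can be checked with a short calculation. The sketch below uses the starting widths listed under ALGORITHM, {8, 11, 15, 21, 28, 41, 57}, and the value n = 5800 reported in the example output. It leaves out the unknown constant k, and it assumes that a narrower range of 8 to 28 would survey exactly the listed widths up to 28, which the text does not state.

```python
WIDTHS = [8, 11, 15, 21, 28, 41, 57]   # starting widths listed under ALGORITHM

def relative_cost(widths, n):
    """Sum of W * n^2 over the surveyed widths; the constant k is left out."""
    return sum(widths) * n ** 2

n = 5800                                # residues in the example training set
full = relative_cost(WIDTHS, n)
narrow = relative_cost([w for w in WIDTHS if w <= 28], n)

print(sum(WIDTHS))                      # 181
print(round(full / narrow, 2))          # 2.18
```

Under these assumptions, capping the width at 28 roughly halves the work of the starting-point search.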

In any event, running on a training set of more than 20 or 30 typical proteins will require a lot of processor cycles.

Effects of the choice of model

By default, MEME+ assumes the zero-or-one-per model; that is, it assumes that each motif occurs at most once in each sequence in the search set, but may not occur at all in some sequences. This runs MUCH faster than the zero-or-more-per model, in which a motif may occur any number of times in a sequence. It is important to understand that using the zero-or-one-per model does not necessarily prevent MEME+ from finding motifs that are duplicated within a sequence; however, the zero-or-more-per model may rank such motifs higher relative to other candidates.

Multiple motifs

When told to look for more than one motif, MEME+ attempts to minimize the overlap between the current motif and any previously identified motifs.

SUGGESTIONS


Choosing the minimum and maximum search widths

As noted under CONSIDERATIONS, the algorithm slows down when searching large ranges of widths. If you have some idea of the width of the target motifs, you can (and should) restrict the range of allowable widths. This will save a lot of computation, especially if you can forego searching beyond a width of 25 or 30.

If the training set may include proteins that are not related to the family of interest, you might first run with -minwidth and -maxwidth both set to the same small number (perhaps 10 for proteins), and -nmotifs set to 1 or 2. (Be sure to use the default zero-or-one-per model!) This may find a motif (possibly part of a larger motif) that discriminates between family and non-family members, allowing you to remove the unrelated proteins before running a more exhaustive MEME+ over a larger range of widths.
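A quick pre-screening run of this kind could be assembled as follows. This is only a sketch: the helper name is hypothetical, the parameter spellings are taken from COMMAND-LINE SUMMARY, and the input file pircat.list is borrowed from EXAMPLE. The code builds the command line but does not run it.

```python
def prescreen_command(list_file, width=10, nmotifs=1):
    """Build a meme+ command line for a fixed-width pre-screening run."""
    return [
        "meme+",
        f"-infile={list_file}",
        f"-minwidth={width}",      # same value for both limits fixes the width
        f"-maxwidth={width}",
        f"-nmotifs={nmotifs}",
        "-default",                # accept defaults, including the default model
    ]

print(" ".join(prescreen_command("pircat.list")))
# meme+ -infile=pircat.list -minwidth=10 -maxwidth=10 -nmotifs=1 -default
```

No model parameter is given, so the run uses the default distribution.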

Finding repeats in a sequence

You can identify motifs within a single sequence by specifying -zeroormore to choose the zero-or-more-per model (described in ALGORITHM).

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value specified, the default parameter value, and any aliases (shortened forms of the parameter name). Programs with a plus in the name accept either the full parameter name or a specified alias. If “Type” is “Boolean”, the presence of the parameter on the command line indicates a true condition. State a false condition as parameter=false.

Meme finds conserved motifs in a group of unaligned sequences. Meme saves these motifs as a set of profile. You can search a database of sequences with these profiles using the MotifSearch program.
 
 
Minimal Syntax: % meme+ [-infile=]value -Default
 
Minimal Parameters (case-insensitive):
 
-infile         [Type: InFile / Default: 'meme+.list' / Aliases: infile1 in]
                Input file specification.
 
Prompted Parameters (case-insensitive):
 
-nmotifs        [Type: Integer / Default: '6' / Aliases: nmot]
                Set the maximum number of motifs to search.
 
-outfile1       [Type: OutFile / Default: 'meme+.prf' / Aliases: out1]
                Output file of profiles produced by Meme.
 
-outfile2       [Type: OutFile / Default: 'meme+.meme' / Aliases: out2]
                Output file of report produced by Meme.
 
Optional Parameters (case-insensitive):
 
-check          [Type: Boolean / Default: 'false' / Aliases: che help]
                Prints out this usage message.
 
-default        [Type: Boolean / Default: 'false' / Aliases: d def]
                Specifies that sensible default values be used for all parameters where possible.
 
-documentation  [Type: Boolean / Default: 'true' / Aliases: doc]
                Prints banner at program startup.
 
-quiet          [Type: Boolean / Default: 'false' / Aliases: qui]
                Tells application to print only a minimal amount of information.
 
-begin          [Type: Integer / Default: '1' / Aliases: beg]
                Set the begin range of interest for all sequences.
 
-end            [Type: Integer / Default: EMPTY]
                Set the end range of interest for all sequences.
 
-reverse        [Type: Boolean / Default: 'No' / Aliases: rev]
                Use the reverse strand of all sequences.
 
-data           [Type: String / Default: EMPTY / Aliases: dat]
                Specify file of Dirichlet priors for proteins.
 
-distribution   [Type: String / Default: EMPTY / Aliases: dist]
                Specify the distribution of motifs per sequence. Possible values:
                'oneexactly' = each motif to occur exactly once
                'oneorzero' = each motif to occur up to one time (default)
                'zeroormore' = allow any number of motifs
 
-oneexactly     [Type: Boolean / Default: 'false' / Aliases: oneex]
                Require each motif to occur exactly once in each sequence. Use -distribution instead.
 
-oneorzero      [Type: Boolean / Default: 'false' / Aliases: oneorz]
                Require each motif to occur zero or one time per sequence. Use -distribution instead.
 
-zeroormore     [Type: Boolean / Default: 'false' / Aliases: zeroorm]
                Require each motif to occur any number of times per sequence. Use -distribution instead.
 
-minwidth       [Type: Integer / Default: EMPTY / Aliases: minw]
                Limit motifs to be a minimum of this width.
 
-maxwidth       [Type: Integer / Default: EMPTY / Aliases: maxw]
                Limit motifs to be a maximum of this width.
 
-gapopen        [Type: Double / Default: EMPTY / Aliases: gapo]
                Set gap opening penalty for multiple alignments.
 
-gapext         [Type: Double / Default: EMPTY / Aliases: gape]
                Set gap extension penalty for multiple alignments.
 
-twostrands     [Type: Boolean / Default: 'false' / Aliases: twos]
                Search both strands of a nucleotide sequence.
 
-maxemiterations
                [Type: Integer / Default: EMPTY / Aliases: maxem]
                Stop EM after this many iterations without convergence.
 
-emthreshold    [Type: Double / Default: EMPTY / Aliases: emthr]
                Set the convergence criterion for EM.
 
-monitor        [Type: Boolean / Default: 'true' / Aliases: mon]
                Turn on/off result monitoring.
 
-summary        [Type: Boolean / Default: 'true' / Aliases: sum]
                Turn on/off report of run information to screen at exit.
 
-batch          [Type: Boolean / Default: 'false']
                Allows submitting a job to a batch queue.

LOCAL DATA FILES


The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -data1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

When processing proteins, MEME+ uses a data file of Dirichlet priors for its Bayesian statistics. By default, the file is Share_Misc:prior30.plib. Although it is possible to specify your own priors, it is not advised unless you have a very strong understanding of MEME+'s inner workings.

PARAMETER REFERENCE


You can set the parameters listed below from the command line. Shortened forms of the parameter name, aliases, are shown, separated by commas.

-infile, -infile1, -in

Input file specification.

-outfile1, -out1

Output file of profiles produced by Meme.

-outfile2, -out2

Output file of report produced by Meme.

-begin=1, -beg

Sets the beginning position for all input sequences. When the beginning position is set from the command line, MEME+ ignores beginning positions specified for individual sequences in a list file.

-end=100

Sets the ending position for all input sequences. When the ending position is set from the command line, MEME+ ignores ending positions specified for sequences in a list file.

-reverse, -rev

Sets the program to use the reverse strand for each input sequence. When -reverse or -noreverse is on the command line, MEME+ ignores any strand designation for individual sequences in a list file.

-nmotifs=6, -nmo

Gives the number of unique motifs for which to search.

-check, -che, -help

Prints out this usage message.

-default, -def

Specifies that sensible default values be used for all parameters where possible.

-documentation, -doc

Prints banner at program startup.

-quiet, -qui

This parameter is not supported.

-data, -dat

Specifies a file of Dirichlet priors for proteins.

-distribution, -dis

Specify the distribution of motifs per sequence.

                Possible values:

                'oneexactly' = each motif to occur exactly once

                'oneorzero' = each motif to occur up to one time (default)

                'zeroormore' = allow any number of motifs

-gapopen, -gapo

Set gap opening penalty for multiple alignments.

-gapext, -gape

Set gap extension penalty for multiple alignments.

-oneexactly, -oneex

Specifies a model in which each motif should occur exactly once in every sequence in the training set. If a given motif gets a low score in any sequence, it is very unlikely to be chosen. This is the fastest model.

-oneorzero, -oneorz

Specifies a model in which each motif should occur zero or one times in any sequence in the training set. If a given motif scores well at more than one position in a sequence, the motif might still be chosen, but the additional "hits" will not contribute to its score. This is the default model. This model is about two times slower than the -oneexactly model.

-zeroormore, -zeroorm

Specifies a model in which each motif may occur any number of times in any sequence in the training set. In this case, additional "hits" after the first within a sequence will contribute to the motif's score. This model is about ten times slower than the -oneexactly model.

-twostrands, -twos

Searches forward and reverse strands of nucleotide sequences. This parameter may be used only with the -oneexactly parameter!

-minwidth=8, -minw

Specifies the smallest acceptable motif for the search. When shortening the chosen motif, MEME+ will NOT shorten below this value.

-maxwidth=57, -maxw

Specifies the largest acceptable motif for the search. If -minwidth is equal to -maxwidth, MEME+ will either find a motif of that width, or find nothing at all.

-emthreshold=.001, -emthr

Gives a convergence criterion for the EM phase of the algorithm. Raising this criterion will make MEME+ run faster, but give inferior results.

-maxemiterations=50, -maxem

Overrules the convergence criterion given by -emthreshold. That is, if EM has failed to converge to the -emthreshold after -maxemiterations, the program will cut off the calculation and settle for its result to that point.
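The interplay between the two stopping parameters can be pictured as a loop with two exits. The sketch below is generic: the function names are hypothetical, the "change" value stands in for whatever distance MEME+ actually measures between successive EM estimates, and the toy update (halving a number) is used only because its behavior is easy to verify. The defaults mirror the values shown for -emthreshold and -maxemiterations.

```python
def run_em(step, state, threshold=0.001, max_iterations=50):
    """Apply step() until the change drops below threshold or the cap is hit.

    step(state) must return (new_state, change), where change is a
    non-negative number measuring how far the state moved.
    """
    for iteration in range(1, max_iterations + 1):
        state, change = step(state)
        if change < threshold:
            return state, iteration, True      # converged
    return state, max_iterations, False        # cut off by the iteration cap

def halve(x):
    """Toy update: move halfway toward zero and report the distance moved."""
    new_x = x / 2
    return new_x, abs(x - new_x)

print(run_em(halve, 1.0))                      # (0.0009765625, 10, True)
print(run_em(halve, 1.0, max_iterations=5))    # (0.03125, 5, False)
```

In the second call the cap is reached first, so the loop returns the current estimate without having met the threshold.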

-summary, -sum

Writes a summary of the program's completion to the screen. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -summary=false.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

-monitor, -mon

The program monitors its progress by displaying a trace on your screen. However, when you use -default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-batch, -bat

Submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

Printed: May 27, 2005  12:58




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ® and, the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio