CLUSTALW+


 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

DENDROGRAM

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

ALGORITHM

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION


ClustalW+ creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also create a dendrogram (.dnd) showing the clustering relationships used to create the alignment.

DESCRIPTION


Advantages of Plus “+” Programs:

 

* Plus programs can read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.

* Plus programs remove the sequence length restriction of 350,000 bp.

 

If you do not need these features and want more interactivity, consider running the original version of the program.

The simultaneous alignment of many nucleotide or amino acid sequences is now an essential tool in molecular biology. Multiple alignments are used to find diagnostic patterns to characterize protein families; to detect or demonstrate homology between new sequences and existing families of sequences; to help predict the secondary and tertiary structures of new sequences; to suggest oligonucleotide primers for PCR; and as an essential prelude to molecular evolutionary analysis. The rate of appearance of new sequence data is steadily increasing, and the development of efficient and accurate automatic methods for multiple alignment is therefore of major importance.

The multiple alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences can be aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments that include increasingly dissimilar sequences and clusters, until all sequences have been included in the final pairwise alignment.

Before alignment, the sequences are first clustered by similarity to produce a dendrogram, or tree representation of clustering relationships. It is this dendrogram that directs the order of the subsequent pairwise alignments. 

EXAMPLE


18:31~43> clustalw+

ClustalW+ is a general purpose multiple sequence alignment program for DNA or Proteins.  It produces biologically meaningful multiple sequence alignments of divergent sequences.  It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen.

clustalw+ of what sequence(s) (*  *) ? gb_pl.msf

What kind of action (alignfast, alignslow, tree) :  (* alignslow *) ? alignslow

What should I call the output file.

Default name for alignment file: clustalw.msf.

Default name for tree file : clustalw.ph

Output file:  (*  *) ?

Creating clustalw.msf as the output file.

CLUSTAL W (1.83) Multiple Sequence Alignments

Sequence format is Clustalw+/MSF

Sequence 1: AB016060.gb_pl      1824 bp

Sequence 2: AB016060            1824 bp

Sequence 3: AB016062.gb_pl      1824 bp

Sequence 4: AB016063.gb_pl      1824 bp

Sequence 5: AB016064.gb_pl      1824 bp

Sequence 6: AB016065.gb_pl      1824 bp

Sequence 7: AB016066.gb_pl      1824 bp

Sequence 8: YSCGCN4.gb_pl       1824 bp

Sequence 9: YSCGCN4_1.gb_pl     1824 bp

Sequence 10: yscgcn4.gb_pl       1824 bp


Start of Pairwise alignments

Aligning...

Sequences (1:2) Aligned. Score:  100

Sequences (1:3) Aligned. Score:  33

Sequences (1:4) Aligned. Score:  1

Sequences (1:5) Aligned. Score:  1

Sequences (1:6) Aligned. Score:  0

Sequences (1:7) Aligned. Score:  4

Sequences (1:8) Aligned. Score:  2

Sequences (1:9) Aligned. Score:  2

Sequences (1:10) Aligned. Score:  2

Sequences (2:3) Aligned. Score:  33

Sequences (2:4) Aligned. Score:  1

Sequences (2:5) Aligned. Score:  1

Sequences (2:6) Aligned. Score:  0

Sequences (2:7) Aligned. Score:  4

Sequences (2:8) Aligned. Score:  2

Sequences (2:9) Aligned. Score:  2

Sequences (2:10) Aligned. Score:  2

Sequences (3:4) Aligned. Score:  1

Sequences (3:5) Aligned. Score:  1

Sequences (3:6) Aligned. Score:  3

Sequences (3:7) Aligned. Score:  1

Sequences (3:8) Aligned. Score:  3

Sequences (3:9) Aligned. Score:  3

Sequences (3:10) Aligned. Score:  3

Sequences (4:5) Aligned. Score:  57

Sequences (4:6) Aligned. Score:  53

Sequences (4:7) Aligned. Score:  68

Sequences (4:8) Aligned. Score:  1

Sequences (4:9) Aligned. Score:  1

Sequences (4:10) Aligned. Score:  1

Sequences (5:6) Aligned. Score:  81

Sequences (5:7) Aligned. Score:  65

Sequences (5:8) Aligned. Score:  1

Sequences (5:9) Aligned. Score:  1

Sequences (5:10) Aligned. Score:  1

Sequences (6:7) Aligned. Score:  62

Sequences (6:8) Aligned. Score:  1

Sequences (6:9) Aligned. Score:  1

Sequences (6:10) Aligned. Score:  1

Sequences (7:8) Aligned. Score:  6

Sequences (7:9) Aligned. Score:  6

Sequences (7:10) Aligned. Score:  6

Sequences (8:9) Aligned. Score:  100

Sequences (8:10) Aligned. Score:  100

Sequences (9:10) Aligned. Score:  100

Guide tree        file created:   [/var/tmp/bslskAAAomayNz.dnd]


Start of Multiple Alignment

There are 9 groups

Aligning...

Group 1: Sequences:   2      Score:15353

Group 2: Sequences:   2      Score:23055

Group 3: Sequences:   4      Score:17100

Group 4:                     Delayed

Group 5: Sequences:   3      Score:34656

Group 6: Sequences:   7      Score:12169

Group 7: Sequences:   2      Score:28956

Group 8: Sequences:   3      Score:16945

Group 9: Sequences:  10      Score:10629

Alignment Score 142840

GCG-Alignment file created      [/var/tmp/bslskBAApmayNz.msf]

Moved tree file from /var/tmp/bslskAAAomayNz.dnd to clustalw.dnd

 

Moved alignment file from /var/tmp/bslskBAApmayNz.msf to clustalw.msf

OUTPUT


Here is a portion of the output file, clustalw.msf:

 

!!NA_MULTIPLE_ALIGNMENT 1.0

MSF: 2420  Type: N  December 07, 2004 18:34  Check: 4225 ..

 Name: AB016060.gb_pl  Len: 2420  Check: 6882  Weight: 1.0

 Name: AB016060  Len: 2420  Check: 6882  Weight: 1.0

 Name: AB016062.gb_pl  Len: 2420  Check: 6225  Weight: 1.0

 Name: AB016063.gb_pl  Len: 2420  Check: 6517  Weight: 1.0

 Name: AB016064.gb_pl  Len: 2420  Check: 3692  Weight: 1.0

 Name: AB016065.gb_pl  Len: 2420  Check: 668  Weight: 1.0

 Name: AB016066.gb_pl  Len: 2420  Check: 7136  Weight: 1.0

 Name: YSCGCN4.gb_pl  Len: 2420  Check: 8741  Weight: 1.0

 Name: YSCGCN4_1.gb_pl  Len: 2420  Check: 8741  Weight: 1.0

 Name: yscgcn4.gb_pl  Len: 2420  Check: 8741  Weight: 1.0

 

//

                1                                                   50

AB016060.gb_pl  .......... .......... .......... .......... ..........

AB016060        .......... .......... .......... .......... ..........

AB016062.gb_pl  .......... .......... .......... .......... ..........

AB016063.gb_pl  .......... .......... .......... .......... ..........

AB016064.gb_pl  .......... .......... .......... .......... ..........

AB016065.gb_pl  .......... .......... .......... .......... ..........

AB016066.gb_pl  .......... .......... .......... .......... ..........

YSCGCN4.gb_pl   ATCTTCGGGG ATATAAAGTG CATGAGCATA CATCTTGAAA AAAAAAGATG

YSCGCN4_1.gb_pl ATCTTCGGGG ATATAAAGTG CATGAGCATA CATCTTGAAA AAAAAAGATG

yscgcn4.gb_pl   ATCTTCGGGG ATATAAAGTG CATGAGCATA CATCTTGAAA AAAAAAGATG

 

                51                                                 100

AB016060.gb_pl  .......... .......... .......... .......... ..........

AB016060        .......... .......... .......... .......... ..........

AB016062.gb_pl  .......... .......... .......... .......... ..........

AB016063.gb_pl  .......... .......... .......... .......... ..........

AB016064.gb_pl  .......... .......... .......... .......... ..........

AB016065.gb_pl  .......... .......... .......... .......... ..........

AB016066.gb_pl  .......... .......... .......... .......... ..........

YSCGCN4.gb_pl   AAAAATTTCC GACTTTAAAT ACGGAAGATA AATACTCCAA CCTTTTTTTC

YSCGCN4_1.gb_pl AAAAATTTCC GACTTTAAAT ACGGAAGATA AATACTCCAA CCTTTTTTTC

yscgcn4.gb_pl   AAAAATTTCC GACTTTAAAT ACGGAAGATA AATACTCCAA CCTTTTTTTC

 

                101                                                150

AB016060.gb_pl  .......... .......... ...ATGGTGT TGTCTGAGTC CAACTTCCTG

AB016060        .......... .......... ...ATGGTGT TGTCTGAGTC CAACTTCCTG

AB016062.gb_pl  .......... .......... .......... ..ATGGATTT CTACACACT.

AB016063.gb_pl  ..AAGCAAAC GCAGCATTGG GAGATAGAAA GAGAGAGAGA AAGAGAGAGA

AB016064.gb_pl  .......... .......... .......... .......... ..........

AB016065.gb_pl  .......... ......TTCA CCCTCCGCCG CCTCGTCAAT TCCACGCGAA

AB016066.gb_pl  .......... .......... .......... .......... ..........

YSCGCN4.gb_pl   CAATTCCGAA ATTTTAGTCT TCTTTAAAGA AGTTTCGGCT CGCTGTCTTA

YSCGCN4_1.gb_pl CAATTCCGAA ATTTTAGTCT TCTTTAAAGA AGTTTCGGCT CGCTGTCTTA

yscgcn4.gb_pl   CAATTCCGAA ATTTTAGTCT TCTTTAAAGA AGTTTCGGCT CGCTGTCTTA

 

                151                                                200

AB016060.gb_pl  TTATGTCTTA TTTCCATTTC AATAGCTTCT GTTTTCTTCT TTCTCTTGAA

AB016060        TTATGTCTTA TTTCCATTTC AATAGCTTCT GTTTTCTTCT TTCTCTTGAA

AB016062.gb_pl  .TGCGT.TTG GATCAATTTT ..TGCCTGCG GTTTGCTTTA TATTCTAGCA

AB016063.gb_pl  GAGAAAGACC CTTACCCTTC TCTATCGCTC GCTTTCCTTT GACGCTTCTG

AB016064.gb_pl  .......... .......... .......... .......... ..TGCAAAAA

AB016065.gb_pl  CGCGAGAGCT CTCGGAAAGC ACCACCACCA GCACAGAGCC AGCGCGAGAG

AB016066.gb_pl  .......... .......... .......... .......... ..........

YSCGCN4.gb_pl   CCTTTTAAAA TCTTCTACTT CTTGACAGTA CTTATCTTCT TATATAATAG

YSCGCN4_1.gb_pl CCTTTTAAAA TCTTCTACTT CTTGACAGTA CTTATCTTCT TATATAATAG

yscgcn4.gb_pl   CCTTTTAAAA TCTTCTACTT CTTGACAGTA CTTATCTTCT TATATAATAG

The gaps at the ends of each sequence are written as dots (.), which may represent differences in input sequence lengths rather than missing characters or significant differences in the alignment. Internal gaps in each sequence are written as periods (.). See Appendix III for more information about the two different gap characters.

DENDROGRAM

ClustalW+ creates a dendrogram file, named clustalw.dnd by default. It contains the following information:

(

(

(

(

(

AB016060.gb_pl:0.00000,

AB016060:0.00000)

:0.33486,

AB016062.gb_pl:0.33312)

:0.15633,

(

(

YSCGCN4.gb_pl:0.00000,

YSCGCN4_1.gb_pl:0.00000)

:0.00000,

yscgcn4.gb_pl:0.00000)

:0.48288)

:0.28754,

(

AB016064.gb_pl:0.08285,

AB016065.gb_pl:0.10552)

:0.11331)

:0.03569,

AB016063.gb_pl:0.18615,

AB016066.gb_pl:0.12824);

 

Any dendrogram tree viewer can interpret the distances recorded in the dendrogram file and draw an appropriate dendrogram for the given set of input sequences.
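The dendrogram file shown above uses nested parentheses with name:distance pairs (the Newick-style notation common to tree files), so the leaf distances can also be pulled out with a few lines of code. The following is a minimal sketch in Python, not part of ClustalW+; the function name leaf_branch_lengths is hypothetical, and the sample tree is a shortened fragment modelled on the file above rather than a complete copy of it.

```python
import re

def leaf_branch_lengths(tree_text):
    """Return {sequence name: branch length} for every leaf in the tree text."""
    # The .dnd file spreads the tree over many lines, so drop all whitespace first.
    text = "".join(tree_text.split())
    # A leaf is a name that directly follows "(" or "," and is followed by ":length".
    # Internal branch lengths follow ")" and are therefore not matched.
    pattern = re.compile(r"[(,]([^(),:;]+):([0-9.eE+-]+)")
    return {name: float(length) for name, length in pattern.findall(text)}

# Shortened, hand-made fragment using names and distances from the example above.
fragment = """
(
(
AB016064.gb_pl:0.08285,
AB016065.gb_pl:0.10552)
:0.11331,
AB016063.gb_pl:0.18615,
AB016066.gb_pl:0.12824);
"""

for name, length in leaf_branch_lengths(fragment).items():
    print(f"{name:18s} {length:.5f}")
```

The sketch reads only the leaf branches; a real tree viewer also needs the internal branch lengths and the nesting order.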

INPUT FILES


ClustalW+ accepts multiple (two or more) nucleotide sequences or multiple (two or more) protein sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*. The function of ClustalW+ depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.

If the input sequences are named in a list file, you can specify the reverse complement strand of any particular nucleotide sequence in the list as input by using the strand:- sequence attribute. You can restrict the range of interest for any particular sequence with appropriate sequence attributes like Begin:43 and End:682. (See "Using List Files" in Section 2, Using Sequence Files and Databases in the User's Guide for more information about sequence attributes in list files.) For example:

 

This is part of a list file suitable for input to CLUSTALW+.

 

                   October 6, 1998  ..

 

PIR:A32493

PIR:S05776        Begin:43 End:682

PIR:B36590

 

///////////////////////////////////////

You can limit the range of interest for all of the sequences in the alignment by including expressions like -BEGin=20 and -END=70 on the command line. The command-line range limiters take precedence over the range limiters for sequences in a list file when both are used. If no range limitation is specified, the entire length of each sequence is aligned.
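The interaction between list-file attributes and command-line range limiters can be illustrated with a short Python sketch. It is not part of ClustalW+; the helper names parse_list_file and effective_range are hypothetical, and the sketch assumes that list entries begin after the heading line that ends in two periods (..), as in the example above, and that lines made up only of slashes are framing rather than entries.

```python
def parse_list_file(text):
    """Return one dict per list entry: {'spec': ..., 'begin': ..., 'end': ...}."""
    entries = []
    in_body = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_body:
            # Assumption: everything up to the line ending in ".." is heading text.
            in_body = line.endswith("..")
            continue
        if not line or set(line) == {"/"}:
            continue  # skip blank lines and slash-only framing lines
        spec, *attributes = line.split()
        entry = {"spec": spec}
        for attribute in attributes:
            key, _, value = attribute.partition(":")
            entry[key.lower()] = value
        entries.append(entry)
    return entries

def effective_range(entry, cli_begin=None, cli_end=None):
    """Command-line limiters take precedence over list-file attributes.

    None means "no limit", i.e. the entire length of the sequence is used.
    """
    begin = cli_begin if cli_begin is not None else entry.get("begin")
    end = cli_end if cli_end is not None else entry.get("end")
    return (None if begin is None else int(begin),
            None if end is None else int(end))

example = """This is part of a list file suitable for input to CLUSTALW+.

                   October 6, 1998  ..

PIR:A32493
PIR:S05776        Begin:43 End:682
PIR:B36590
"""

entries = parse_list_file(example)
print(effective_range(entries[1]))                          # list-file range
print(effective_range(entries[1], cli_begin=20, cli_end=70))  # command line wins
```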

RELATED PROGRAMS


ClustalW+ creates a multiple sequence alignment from a group of related sequences.

RESTRICTIONS


Make sure that your sequences have distinct names: the first 30 characters of each name are significant. If ClustalW+ finds two or more sequences with the same name, it fails.
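A quick pre-flight check for name clashes can save a failed run. The Python sketch below is not part of ClustalW+; the function name find_name_clashes is hypothetical. It compares names on their first 30 characters and treats them as case-sensitive, which the example session suggests (YSCGCN4.gb_pl and yscgcn4.gb_pl coexist there) but this manual does not state explicitly.

```python
from collections import defaultdict

def find_name_clashes(names, significant=30):
    """Group names that are identical in their first `significant` characters."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:significant]].append(name)
    # Keep only the prefixes shared by two or more names.
    return {prefix: members for prefix, members in groups.items() if len(members) > 1}

names = [
    "very_long_sequence_identifier_number_one",
    "very_long_sequence_identifier_number_two",
    "AB016060.gb_pl",
]
for prefix, members in find_name_clashes(names).items():
    print(f"clash on '{prefix}': {members}")
```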

Input files prepared with some word processors may yield unpredictable results, because hidden or control characters may be present in the files. When preparing input files, it is best to save them with the UNIX format option to avoid hidden Windows characters.

ALGORITHM


ClustalW+ uses a progressive pairwise approach for multiple sequence alignment. It consists of a basic alignment method similar to that of PileUp, with a modified progressive alignment stage to improve the sensitivity and accuracy of the final alignment.

The basic alignment method

The basic multiple alignment algorithm consists of three main stages: 1) all pairs of sequences are aligned separately in order to calculate a distance matrix giving the divergence of each pair of sequences; 2) a guide tree is calculated from the distance matrix; 3) the sequences are progressively aligned according to the branching order in the guide tree.

1) The distance matrix/pairwise alignments

ClustalW+ offers a choice between a fast but approximate method and a slow but more accurate method in the pairwise alignment stage. The fast alignment method allows very large numbers of sequences to be aligned; its scores are calculated as the number of “k-tuple” matches (runs of identical residues, typically 1 or 2 long for proteins or 2 to 4 long for nucleotide sequences) in the best alignment between two sequences, minus a fixed penalty for every gap. The slower but more accurate method derives scores from full dynamic programming alignments using two gap penalties (for opening or extending gaps) and a full amino acid scoring matrix. These scores are calculated as the number of identities in the best alignment divided by the number of residues compared (gap positions are excluded). Both of these scores are initially calculated as percent identity scores and are converted to distances by dividing by 100 and subtracting from 1.0, to give the number of differences per site. Note that the initial distances are not corrected for multiple substitutions.
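The conversion from a pairwise alignment to a distance can be illustrated as follows. This is a minimal Python sketch of the arithmetic described above, not the program's own code; the function names and the choice of "." and "-" as gap characters are assumptions made for the example.

```python
def percent_identity(seq_a, seq_b, gap_chars=".-"):
    """Identities divided by residues compared, as a percentage.

    Positions where either sequence has a gap are excluded from the count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identities = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a == b:
            identities += 1
    if compared == 0:
        return 0.0  # nothing to compare; treated as no identity in this sketch
    return 100.0 * identities / compared

def to_distance(pid):
    """Convert percent identity to differences per site: 1.0 - pid / 100."""
    return 1.0 - pid / 100.0

pid = percent_identity("ATGC.TTA", "ATGAGT.A")  # 5 identities in 6 compared positions
print(pid, to_distance(pid))
```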

2) The guide tree

The tree (or “dendrogram”) used to guide the final multiple alignment process is calculated from the distance matrix created in stage 1 using the Neighbor-Joining (NJ) method. This produces an unrooted tree with branch lengths proportional to estimated divergence along each branch. The root is placed by a “mid-point” method at a position where the means of the branch lengths on either side of the root are equal. These trees are also used to derive a weight for each sequence (see Sequence Weighting below). The weights are dependent upon the distance from the root of the tree but sequences which have a common branch with other sequences share the weight derived from the shared branch.

3) Progressive alignment

At this stage a series of pairwise alignments are used to align larger and larger groups of sequences, following the branching order in the guide tree. At each step a full dynamic programming algorithm is used with a residue scoring matrix and penalties for opening and extending gaps. Each step consists of aligning two existing alignments or sequences. Gaps that are present in older alignments remain fixed. The score between a position from one sequence or alignment and one from another is calculated by averaging all the pairwise scoring matrix scores from the amino acids in the two sets of sequences. For example, if you align 2 alignments with 2 and 4 sequences respectively, the score at each position is the average of 8 (2x4) comparisons. If either set of sequences contains one or more gaps in one of the positions being considered, each gap versus a residue is scored as zero. Since the default amino acid scoring matrices used have been rescored to have only positive values, this treatment of gaps treats the score of a residue versus a gap as having the worst possible score. When sequences are weighted (see Sequence Weighting below), each scoring matrix value is multiplied by the weights from the 2 sequences.
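The position score described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's implementation: the function name column_score is hypothetical, the toy matrix (10 for identical bases, 0 otherwise) is invented for the example, and a gap paired with another gap is also scored as zero, a case the text does not cover.

```python
def column_score(column_a, column_b, matrix, weights_a=None, weights_b=None, gap="."):
    """Average of all pairwise matrix scores between two alignment columns.

    Any pair involving a gap contributes zero. When weights are given, each
    matrix value is multiplied by the weights of the two sequences involved.
    """
    weights_a = weights_a or [1.0] * len(column_a)
    weights_b = weights_b or [1.0] * len(column_b)
    total = 0.0
    for res_a, w_a in zip(column_a, weights_a):
        for res_b, w_b in zip(column_b, weights_b):
            if res_a == gap or res_b == gap:
                continue  # gap versus residue scores zero
            total += matrix[(res_a, res_b)] * w_a * w_b
    return total / (len(column_a) * len(column_b))

# Toy nucleotide matrix, invented for this example.
toy_matrix = {(x, y): (10 if x == y else 0) for x in "ACGT" for y in "ACGT"}

# One column from a 2-sequence alignment against one from a 4-sequence alignment:
# the score is the average of 8 (2 x 4) comparisons.
print(column_score(["A", "A"], ["A", "C", ".", "A"], toy_matrix))
print(column_score(["A", "A"], ["A", "C", ".", "A"], toy_matrix, weights_a=[1.0, 0.5]))
```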

Improvements to progressive alignment

ClustalW+ implements the following modifications to the final progressive alignment stage to improve the accuracy and sensitivity of the alignment: 1) sequence weighting; 2) dynamically adjusted gap penalties; 3) variable scoring matrices; and 4) delayed alignment of highly divergent sequences.

Sequence weighting

Sequence weights are calculated directly from the guide tree. The weights are normalized such that the biggest one is set to 1.0 and the rest are all less than one. Groups of closely related sequences receive lowered weights because they contain much duplicated information. Highly divergent sequences without any close relatives receive high weights. These weights are used as simple multiplication factors for scoring positions from different sequences or pre-aligned groups of sequences.
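A small Python sketch can make the weighting scheme concrete. It is not the program's code: the tree representation (nested tuples), the function names, and in particular the rule that a shared branch is split equally among the sequences below it are assumptions made for this illustration, consistent with the description that sequences with a common branch share the weight derived from it.

```python
def count_leaves(node):
    """Number of sequences below a node. A node is (content, branch_length);
    content is a name (leaf) or a list of child nodes (internal node)."""
    content, _length = node
    if isinstance(content, str):
        return 1
    return sum(count_leaves(child) for child in content)

def sequence_weights(root):
    """Sum branch lengths from root to leaf, sharing each branch equally among
    the leaves below it, then normalize so the biggest weight is 1.0."""
    raw = {}

    def walk(node, inherited):
        content, length = node
        total = inherited + length / count_leaves(node)
        if isinstance(content, str):
            raw[content] = total
        else:
            for child in content:
                walk(child, total)

    walk(root, 0.0)
    biggest = max(raw.values())
    if biggest == 0.0:
        return {name: 1.0 for name in raw}  # all sequences identical
    return {name: value / biggest for name, value in raw.items()}

# Hypothetical rooted tree: A and B are close relatives, C is divergent.
tree = ([([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.4)], 0.0)
print(sequence_weights(tree))
```

In this toy tree the two close relatives A and B each receive half the weight of the divergent sequence C, which matches the behaviour described above.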

Initial gap penalties

Initially, two gap penalties are used: a gap opening penalty (GOP) which gives the cost of opening a new gap of any length and a gap extension penalty (GEP) which gives the cost of every item in a gap. Initial values can be set by the user. The program then automatically attempts to choose appropriate gap penalties for each sequence alignment, depending on the following factors.

1) Dependence on the scoring matrix

ClustalW+ uses the average score for two mismatched residues (i.e. off-diagonal values in the matrix) as a scaling factor for the GOP.

2) Dependence on the similarity of the sequences

The percent identity of the two (groups of) sequences to be aligned is used to increase the GOP for closely related sequences and decrease it for more divergent sequences on a linear scale.

3) Dependence on the lengths of the sequences

The scores for both true and false sequence alignments grow with the length of the sequences. ClustalW+ uses the logarithm of the length of the shorter sequence to increase the GOP with sequence length.
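The three dependencies can be combined in many ways, and this manual does not give the exact formula or constants. The Python sketch below shows one plausible combination purely for illustration; the function name adjusted_gop, the additive length term, and the linear identity scale (0.5 + identity/100) are all assumptions, not the values used by ClustalW+.

```python
import math

def adjusted_gop(initial_gop, avg_mismatch_score, percent_identity, len_a, len_b):
    """Illustrative gap opening penalty adjustment (all constants hypothetical).

    - grows with the logarithm of the shorter sequence length
    - is scaled by the average mismatch score of the scoring matrix
    - rises linearly with percent identity
    """
    length_term = math.log(min(len_a, len_b))
    identity_scale = 0.5 + percent_identity / 100.0  # hypothetical linear scale
    return (initial_gop + length_term) * avg_mismatch_score * identity_scale

# Closely related sequences get a higher GOP than divergent ones.
print(adjusted_gop(10.0, 1.0, 80.0, 300, 500))
print(adjusted_gop(10.0, 1.0, 40.0, 300, 500))
```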

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value, the default parameter value, and any aliases (shortened forms of the parameter name). Programs with a plus in the name accept either the full parameter name or a specified alias. If “Type” is “Boolean”, then the presence of the parameter on the command line indicates a true condition. A false condition must be stated explicitly as parameter=false.

 

ClustalW+ performs multiple alignments on a set of sequences or a set of previously aligned sequences.

 

Minimal Syntax: % clustalw+ [-infile=]value -Default
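When ClustalW+ is driven from a script, the command line can be assembled programmatically. The Python sketch below is a hypothetical helper, not part of the package: build_command is an invented name, the sketch only builds and prints the argument list, and it assumes the program is invoked as clustalw+ with the -name=value syntax summarized here. The input and output file names are examples.

```python
def build_command(infile, program="clustalw+", **parameters):
    """Build an argument list using the -name=value syntax.

    Boolean True is written as the bare parameter name; Boolean False must be
    stated explicitly as name=false.
    """
    args = [program, f"-infile={infile}"]
    for name, value in parameters.items():
        if value is True:
            args.append(f"-{name}")
        elif value is False:
            args.append(f"-{name}=false")
        else:
            args.append(f"-{name}={value}")
    return args

command = build_command(
    "gb_pl.msf",
    action="alignfast",
    outfile="fast.msf",
    default=True,          # accept defaults for all other parameters
    documentation=False,   # suppress the startup banner
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run on a system where the program is installed.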

 

Minimal Parameters (case-insensitive):

 

-infile         [Type: InFile / Default: EMPTY / Aliases: infile1 in]

                Input files specification.

 

Prompted Parameters (case-insensitive):

 

-action         [Type: String / Default: 'alignslow' / Aliases: act]

                The following actions are supported:
                alignfast: multiple alignments using fast pairwise alignment
                alignslow: multiple alignments using slow pairwise alignment
                tree: generate a phylogenetic tree given the alignment

               

-outfile        [Type: OutFile / Default: EMPTY / Aliases: out]

                Output file produced by ClustalW+. This is an alignment file if ACTION is alignfast or alignslow, and a tree file if ACTION is tree. The default output file is clustalw.msf.

 

Optional Parameters (case-insensitive):

 

-check          [Type: Boolean / Default: 'false' / Aliases: che help]

                Prints out this usage message.

 

-default        [Type: Boolean / Default: 'false' / Aliases: d def]

                Specifies that sensible default values be used for all parameters where possible.

 

-documentation  [Type: Boolean / Default: 'true' / Aliases: doc]

                Prints banner at program startup.

 

-quiet          [Type: Boolean / Default: 'false' / Aliases: qui]

                Tells application to print only a minimal amount of information.

 

-outorder       [Type: String / Default: 'input' / Aliases: ord]

Whether the sequences should be output in input order or the order in which they were aligned. Valid values are : Input, Aligned.

 

-range          [Type: List / Default: EMPTY / Aliases: rng]

The sequence range to write as a comma-separated value, e.g. m,n will write from m to m+n.

 

-pwmatrix       [Type: String / Default: EMPTY]

The scoring matrix to use for pairwise alignment when performing slow pairwise alignments. Depending upon sequence type, it may either refer to DNA scoring matrix or a protein scoring matrix. This option is relevant only when ACTION=alignslow. Valid matrices are : blosum, pam, gonnet, id or filename.

 

-pwgapopen      [Type: Double / Default: EMPTY]

The gap opening penalty during slow pairwise alignments. This option is relevant only when ACTION=alignslow.

 

-pwgapext       [Type: Double / Default: EMPTY]

The gap extension penalty during slow pairwise alignments. This option is relevant only when ACTION=alignslow.

 

-ktuple         [Type: Integer / Default: EMPTY]

Window size while doing fast pairwise alignment. This option is relevant only when ACTION=alignfast.

 

-topdiags       [Type: Integer / Default: EMPTY]

Number of windows around best diagonals while doing fast pairwise alignment. This option is relevant only when ACTION=alignfast.

 

-window         [Type: Integer / Default: EMPTY]

Number of windows around each of the top diagonals. This option is relevant only when ACTION=alignfast.

 

-pairgap        [Type: Integer / Default: EMPTY]

The number of matching residues required to open a gap while doing fast pairwise alignments. This option is only relevant when ACTION=alignfast.

 

-score          [Type: String / Default: 'absolute']

Whether pairwise alignment scores should be reported as (raw) absolute scores or percentages {absolute|percent}. This option is relevant only when ACTION=alignfast.

 

-matrix         [Type: String / Default: EMPTY]

                The scoring matrix to be used for multiple sequence alignments.

 

-gapopen        [Type: Double / Default: EMPTY]

                The gap opening penalty when doing multiple sequence alignments.

 

-gapext         [Type: Double / Default: EMPTY]

                The gap extension penalty when doing multiple sequence alignments.

 

-endgaps        [Type: Boolean / Default: 'false']

                Turn on/off end gap separation penalty.

 

-pgap           [Type: Boolean / Default: 'true']

                Turn on/off residue-specific gap separation penalties.

 

-hgap           [Type: Boolean / Default: 'true']

                Turn on/off hydrophilic residue gaps.

 

-hgapresidues   [Type: List / Default: EMPTY]

                List of hydrophilic residues.

 

-maxdiv         [Type: Double / Default: EMPTY]

                Minimum percentage identity required for delay.

 

-outputtree     [Type: String / Default: 'nj' / Aliases: outtreeformat treefmt]

                The output format of the guide tree produced. Valid values: NJ, PHYLIP, DIST, NEXUS.

 

-negative       [Type: Boolean / Default: 'false' / Aliases: neg]

                Sets whether protein matrix contains negative values.

 

-monitor        [Type: Boolean / Default: 'false' / Aliases: mon]

                Turn on/off result monitoring.

LOCAL DATA FILES


The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory [$WPROOT/share/matrix/] unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -data1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program's default scoring matrix in a public data directory unless you either

1) Have a data file with exactly the same name as the program default scoring matrix in your current working directory; or

2) Have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name Share_Matrix; or

3) Name a file on the command line with an expression like -matrix=mymatrix.cmp. If you don't include a directory specification when you name a file with -matrix, the program searches for the file first in your local directory, then in the directory with the logical name Share_Matrix. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Section 4, Using Data Files in the User's Guide.

ClustalW+ reads a scoring matrix from your local directory or the public database with the values for every possible symbol comparison. The file Clustalw+dna.cmp has a 10 at every place where the sets of bases implied by the alphabetic IUB ambiguity codes (see Appendix III) overlap. All of the other locations have zeros. The file blosum62.cmp is based on substitutions between amino acid pairs in ungapped blocks of aligned protein segments as measured by Henikoff and Henikoff. The scores in this matrix for pairwise amino acid comparisons range from -4 to +11. You can use the Fetch+ program to copy these files and then modify them to suit your own needs.

 

PARAMETER REFERENCE


You can set the parameters listed below from the command line. Shortened forms of the parameter name, aliases, are shown, separated by commas.

-infile, -infile1, -in

 

                         Input file specification.

 

-action, -act        

               

The following actions are supported:
alignfast: multiple alignments using fast pairwise alignment
alignslow: multiple alignments using slow pairwise alignment
tree: generate a phylogenetic tree given the alignment

               

-outfile, -out

 

Output file produced by ClustalW+. This will be an alignment file if action is alignfast or alignslow. It is a tree file if action is tree. Default output file is clustalw.msf.

 

-check, -che, -help

 

Prints out this usage message.

 

-default, -d, -def

 

                        Specifies that sensible default values be used for all parameters where possible.

 

-documentation, -doc

 

Prints banner at program startup.

 

-quiet, -qui

 

This parameter is not supported.

 

-outorder

 

Whether the sequences should be output in input order or the order in which they were aligned. Valid values are: Input, Aligned.

 

-range

 

The sequence range to write as a comma-separated value, e.g. m,n will write from m to m+n.

 

-pwmatrix

 

The scoring matrix to use for pairwise alignment when performing slow pairwise alignments. Depending upon sequence type, it may refer either to a DNA scoring matrix or to a protein scoring matrix. This option is relevant only when action=alignslow. Valid matrices are: blosum, pam, gonnet, id or filename.

 

-pwgapopen

 

The gap opening penalty during slow pairwise alignments. This option is relevant only when action=alignslow.

 

-pwgapext

 

The gap extension penalty during slow pairwise alignments. This option is relevant only when action=alignslow.

 

-ktuple

 

Window size while doing fast pairwise alignment. This option is relevant only when action=alignfast.

 

-topdiags

 

Number of windows around best diagonals while doing fast pairwise alignment. This option is relevant only when action=alignfast.

 

-window

 

Number of windows around each of the top diagonals. This option is relevant only when action=alignfast.

 

-pairgap

 

The number of matching residues required to open a gap while doing fast pairwise alignments. This option is only relevant when action=alignfast.

 

-score

 

Whether pairwise alignment scores should be reported as (raw) absolute scores or percentages {absolute|percent}. This option is relevant only when action=alignfast.

 

-matrix

 

The scoring matrix to be used for multiple sequence alignments.

 

-gapopen

 

The gap opening penalty when doing multiple sequence alignments.

 

-gapext

 

The gap extension penalty when doing multiple sequence alignments.

 

-endgaps

 

Turn on/off end gap separation penalty.

 

-pgap

 

Turn on/off residue-specific gap separation penalties.

 

-hgap

 

Turn on/off hydrophilic residue gaps.

 

-hgapresidues

 

List of hydrophilic residues.

 

-maxdiv

 

Minimum percentage identity required for delay.

 

-outputtree

 

The output format of the guide tree produced. Valid values: NJ, PHYLIP, DIST, NEXUS.

 

-negative, -neg

 

Sets whether protein matrix contains negative values.

 

-monitor, -mon

The program monitors its progress by displaying a trace on your screen. However, when you use -default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor output appears in the log file.

Printed: June 3, 2005  10:11




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ® and, the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio