DENDROGRAM
ClustalW+ creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also create a dendrogram (.dnd) showing the clustering relationships used to create the alignment.
Advantages of Plus “+” Programs:
Plus programs are enhanced to be able to read sequences in a variety of native formats such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML without conversion.
Plus programs remove the sequence length restriction of 350,000 bp.
If you do not need these features and wish to have more interactivity, you might wish to seek out and run the original program version.
The simultaneous alignment of many nucleotide or amino acid sequences is now an essential tool in molecular biology. Multiple alignments are used to find diagnostic patterns to characterize protein families; to detect or demonstrate homology between new sequences and existing families of sequences; to help predict the secondary and tertiary structures of new sequences; to suggest oligonucleotide primers for PCR; and as an essential prelude to molecular evolutionary analysis. The rate of appearance of new sequence data is steadily increasing, and the development of efficient and accurate automatic methods for multiple alignment is, therefore, of major importance.
The multiple alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences can be aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments that include increasingly dissimilar sequences and clusters, until all sequences have been included in the final pairwise alignment.
Before alignment, the sequences are first clustered by similarity to produce a dendrogram, or tree representation of clustering relationships. It is this dendrogram that directs the order of the subsequent pairwise alignments.
ClustalW+ is a general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen.
clustalw+ of what sequence(s) (* *) ? gb_pl.msf
What kind of action (alignfast, alignslow, tree): (* alignslow *) ? alignslow
What should I call the output file. Default name for alignment file: clustalw.msf. Default name for tree file: clustalw.ph
Output file: (* *) ?
Creating clustalw.msf as the output file.
CLUSTAL W (1.83) Multiple Sequence Alignments
Sequence format is Clustalw+/MSF
Sequence 1: AB016060.gb_pl 1824 bp
Sequence 2: AB016060 1824 bp
Sequence 3: AB016062.gb_pl 1824 bp
Sequence 4: AB016063.gb_pl 1824 bp
Sequence 5: AB016064.gb_pl 1824 bp
Sequence 6: AB016065.gb_pl 1824 bp
Sequence 7: AB016066.gb_pl 1824 bp
Sequence 8: YSCGCN4.gb_pl 1824 bp
Sequence 9: YSCGCN4_1.gb_pl 1824 bp
Sequence 10: yscgcn4.gb_pl 1824 bp
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score: 100
Sequences (1:3) Aligned. Score: 33
Sequences (1:4) Aligned. Score: 1
Sequences (1:5) Aligned. Score: 1
Sequences (1:6) Aligned. Score: 0
Sequences (1:7) Aligned. Score: 4
Sequences (1:8) Aligned. Score: 2
Sequences (1:9) Aligned. Score: 2
Sequences (
Sequences (2:3) Aligned. Score: 33
Sequences (2:4) Aligned. Score: 1
Sequences (2:5) Aligned. Score: 1
Sequences (2:6) Aligned. Score: 0
Sequences (2:7) Aligned. Score: 4
Sequences (2:8) Aligned. Score: 2
Sequences (2:9) Aligned. Score: 2
Sequences (2:10) Aligned. Score: 2
Sequences (3:4) Aligned. Score: 1
Sequences (3:5) Aligned. Score: 1
Sequences (3:6) Aligned. Score: 3
Sequences (3:7) Aligned. Score: 1
Sequences (3:8) Aligned. Score: 3
Sequences (3:9) Aligned. Score: 3
Sequences (3:10) Aligned. Score: 3
Sequences (4:5) Aligned. Score: 57
Sequences (4:6) Aligned. Score: 53
Sequences (4:7) Aligned. Score: 68
Sequences (4:8) Aligned. Score: 1
Sequences (4:9) Aligned. Score: 1
Sequences (4:10) Aligned. Score: 1
Sequences (5:6) Aligned. Score: 81
Sequences (5:7) Aligned. Score: 65
Sequences (5:8) Aligned. Score: 1
Sequences (5:9) Aligned. Score: 1
Sequences (5:10) Aligned. Score: 1
Sequences (6:7) Aligned. Score: 62
Sequences (6:8) Aligned. Score: 1
Sequences (6:9) Aligned. Score: 1
Sequences (6:10) Aligned. Score: 1
Sequences (7:8) Aligned. Score: 6
Sequences (7:9) Aligned. Score: 6
Sequences (7:10) Aligned. Score: 6
Sequences (8:9) Aligned. Score: 100
Sequences (8:10) Aligned. Score: 100
Sequences (9:10) Aligned. Score: 100
Guide tree file created: [/var/tmp/bslskAAAomayNz.dnd]
Start of Multiple Alignment
There are 9 groups
Aligning...
Group 1: Sequences: 2 Score:15353
Group 2: Sequences: 2 Score:23055
Group 3: Sequences: 4 Score:17100
Group 4: Delayed
Group 5: Sequences: 3 Score:34656
Group 6: Sequences: 7 Score:12169
Group 7: Sequences: 2 Score:28956
Group 8: Sequences: 3 Score:16945
Group 9: Sequences: 10 Score:10629
Alignment Score 142840
GCG-Alignment file created [/var/tmp/bslskBAApmayNz.msf]
Moved tree file from /var/tmp/bslskAAAomayNz.dnd to clustalw.dnd
Moved alignment file from /var/tmp/bslskBAApmayNz.msf to clustalw.msf
Here is part of the output file, clustalw.msf:
!!NA_MULTIPLE_ALIGNMENT 1.0
MSF: 2420 Type: N December 07, 2004 18:34 Check: 4225 ..
Name: AB016060.gb_pl Len: 2420 Check: 6882 Weight: 1.0
Name: AB016060 Len: 2420 Check: 6882 Weight: 1.0
Name: AB016062.gb_pl Len: 2420 Check: 6225 Weight: 1.0
Name: AB016063.gb_pl Len: 2420 Check: 6517 Weight: 1.0
Name: AB016064.gb_pl Len: 2420 Check: 3692 Weight: 1.0
Name: AB016065.gb_pl Len: 2420 Check: 668 Weight: 1.0
Name: AB016066.gb_pl Len: 2420 Check: 7136 Weight: 1.0
Name: YSCGCN4.gb_pl Len: 2420 Check: 8741 Weight: 1.0
Name: YSCGCN4_1.gb_pl Len: 2420 Check: 8741 Weight: 1.0
Name: yscgcn4.gb_pl Len: 2420 Check: 8741 Weight: 1.0
//
                 1                                                   50
AB016060.gb_pl   .......... .......... .......... .......... ..........
AB016060         .......... .......... .......... .......... ..........
AB016062.gb_pl   .......... .......... .......... .......... ..........
AB016063.gb_pl   .......... .......... .......... .......... ..........
AB016064.gb_pl   .......... .......... .......... .......... ..........
AB016065.gb_pl   .......... .......... .......... .......... ..........
AB016066.gb_pl   .......... .......... .......... .......... ..........
YSCGCN4.gb_pl    ATCTTCGGGG ATATAAAGTG CATGAGCATA CATCTTGAAA AAAAAAGATG
YSCGCN4_1.gb_pl  ATCTTCGGGG ATATAAAGTG CATGAGCATA CATCTTGAAA AAAAAAGATG
yscgcn4.gb_pl    ATCTTCGGGG ATATAAAGTG CATGAGCATA CATCTTGAAA AAAAAAGATG

                 51                                                 100
AB016060.gb_pl   .......... .......... .......... .......... ..........
AB016060         .......... .......... .......... .......... ..........
AB016062.gb_pl   .......... .......... .......... .......... ..........
AB016063.gb_pl   .......... .......... .......... .......... ..........
AB016064.gb_pl   .......... .......... .......... .......... ..........
AB016065.gb_pl   .......... .......... .......... .......... ..........
AB016066.gb_pl   .......... .......... .......... .......... ..........
YSCGCN4.gb_pl    AAAAATTTCC GACTTTAAAT ACGGAAGATA AATACTCCAA CCTTTTTTTC
YSCGCN4_1.gb_pl  AAAAATTTCC GACTTTAAAT ACGGAAGATA AATACTCCAA CCTTTTTTTC
yscgcn4.gb_pl    AAAAATTTCC GACTTTAAAT ACGGAAGATA AATACTCCAA CCTTTTTTTC

                 101                                                150
AB016060.gb_pl   .......... .......... ...ATGGTGT TGTCTGAGTC CAACTTCCTG
AB016060         .......... .......... ...ATGGTGT TGTCTGAGTC CAACTTCCTG
AB016062.gb_pl   .......... .......... .......... ..ATGGATTT CTACACACT.
AB016063.gb_pl   ..AAGCAAAC GCAGCATTGG GAGATAGAAA GAGAGAGAGA AAGAGAGAGA
AB016064.gb_pl   .......... .......... .......... .......... ..........
AB016065.gb_pl   .......... ......TTCA CCCTCCGCCG CCTCGTCAAT TCCACGCGAA
AB016066.gb_pl   .......... .......... .......... .......... ..........
YSCGCN4.gb_pl    CAATTCCGAA ATTTTAGTCT TCTTTAAAGA AGTTTCGGCT CGCTGTCTTA
YSCGCN4_1.gb_pl  CAATTCCGAA ATTTTAGTCT TCTTTAAAGA AGTTTCGGCT CGCTGTCTTA
yscgcn4.gb_pl    CAATTCCGAA ATTTTAGTCT TCTTTAAAGA AGTTTCGGCT CGCTGTCTTA

                 151                                                200
AB016060.gb_pl   TTATGTCTTA TTTCCATTTC AATAGCTTCT GTTTTCTTCT TTCTCTTGAA
AB016060         TTATGTCTTA TTTCCATTTC AATAGCTTCT GTTTTCTTCT TTCTCTTGAA
AB016062.gb_pl   .TGCGT.TTG GATCAATTTT ..TGCCTGCG GTTTGCTTTA TATTCTAGCA
AB016063.gb_pl   GAGAAAGACC CTTACCCTTC TCTATCGCTC GCTTTCCTTT GACGCTTCTG
AB016064.gb_pl   .......... .......... .......... .......... ..TGCAAAAA
AB016065.gb_pl   CGCGAGAGCT CTCGGAAAGC ACCACCACCA GCACAGAGCC AGCGCGAGAG
AB016066.gb_pl   .......... .......... .......... .......... ..........
YSCGCN4.gb_pl    CCTTTTAAAA TCTTCTACTT CTTGACAGTA CTTATCTTCT TATATAATAG
YSCGCN4_1.gb_pl  CCTTTTAAAA TCTTCTACTT CTTGACAGTA CTTATCTTCT TATATAATAG
yscgcn4.gb_pl    CCTTTTAAAA TCTTCTACTT CTTGACAGTA CTTATCTTCT TATATAATAG
The gaps at the ends of each sequence are written as dots (.), which may
represent differences in input sequence lengths rather than missing characters
or significant differences in the alignment. Internal gaps in each sequence are
written as periods (.). See Appendix III for
more information about the two different gap characters.
Clustalw+ creates a dendrogram file called clustalw.dnd (default). It has the following information.
(
(
(
(
(
AB016060.gb_pl:0.00000,
AB016060:0.00000)
:0.33486,
AB016062.gb_pl:0.33312)
:0.15633,
(
(
YSCGCN4.gb_pl:0.00000,
YSCGCN4_1.gb_pl:0.00000)
:0.00000,
yscgcn4.gb_pl:0.00000)
:0.48288)
:0.28754,
(
AB016064.gb_pl:0.08285,
AB016065.gb_pl:0.10552)
:0.11331)
:0.03569,
AB016063.gb_pl:0.18615,
AB016066.gb_pl:0.12824);
Any dendrogram tree viewer can interpret the distances given in the dendrogram file to draw an appropriate dendrogram for the given set of input sequences.
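As a rough illustration of what a viewer reads from such a file, the Python sketch below collects the leaf names and their branch lengths from parenthesised tree text of the kind shown above. The function name `leaf_lengths` is hypothetical, the example string is an abbreviated fragment of the tree above, and the sketch assumes names without quotes or embedded punctuation such as parentheses, commas or colons.

```python
import re

def leaf_lengths(tree_text):
    """Return {leaf name: branch length} for every named leaf in the text.

    Internal branch lengths (a colon that follows a closing parenthesis)
    are skipped because they are not directly preceded by a name.
    """
    pattern = re.compile(r"([^\s(),:;]+):([0-9.]+)")
    return {name: float(length) for name, length in pattern.findall(tree_text)}

# Abbreviated fragment of the dendrogram file, for illustration only.
example = """
(
(
AB016064.gb_pl:0.08285,
AB016065.gb_pl:0.10552)
:0.11331,
AB016063.gb_pl:0.18615,
AB016066.gb_pl:0.12824);
"""

lengths = leaf_lengths(example)
for name, length in sorted(lengths.items()):
    print(name, length)
```

A full viewer would also need the nesting of the parentheses to recover the branching order; this sketch only reads the leaf distances.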
ClustalW+ accepts multiple (two or more) nucleotide sequences or
multiple (two or more) protein sequences as input. You can specify multiple
sequences in a number of ways: by using a list file, for example @project.list; by
using an MSF or RSF file, for example project.msf{*}; or by
using a sequence specification with an asterisk (*) wildcard, for example
GenBank:*. The function of ClustalW+ depends on whether your input sequence(s)
are protein or nucleotide. Programs determine the type of a sequence by the
presence of either Type: N or Type: P on the last line of the text heading just
above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set
the type of a sequence.
If the input sequences are named in a list file, you can specify the
reverse complement strand of any particular nucleotide sequence in the list as
input by using the strand:- sequence attribute. You can restrict the range of
interest for any particular sequence with appropriate sequence attributes like Begin:43 and End:682. (See "Using List
Files" in Section 2, Using Sequence Files and Databases in the User's
Guide for more information about sequence attributes in list files.) For
example:
This is part of a list file suitable for input to CLUSTALW+.
October 6, 1998 ..
PIR:A32493
PIR:S05776 Begin:43 End:682
PIR:B36590
///////////////////////////////////////
You can limit the range of interest for all of the sequences in the
alignment by including expressions like -BEGin=20 and -END=70 on the
command line. The command-line range limiters take precedence over the range
limiters for sequences in a list file when both are used. If no range
limitation is specified, the entire length of each sequence is aligned.
ClustalW+ creates a multiple sequence alignment from a group of related sequences.
Make sure that your sequences have different names; only the first 30 characters of each name are significant. If ClustalW+ finds two or more sequences with the same name, it will fail.
Some word processors may yield unpredictable results because hidden or control characters may be present in the files. When preparing input files, it is best to save them with the UNIX format option to avoid hidden Windows characters.
ClustalW+ uses a progressive pairwise
approach for multiple sequence alignment. It consists of a basic alignment method
similar to that of PileUp, with a modified progressive alignment stage to
improve the sensitivity and accuracy of the final alignment.
The basic alignment method
The basic multiple alignment algorithm consists of three
main stages: 1) all pairs of sequences are aligned separately in order to
calculate a distance matrix giving the divergence of each pair of sequences; 2)
a guide tree is calculated from the distance matrix; 3) the sequences are
progressively aligned according to the branching order in the guide tree.
1) The distance matrix/pairwise alignments
ClustalW+ offers a choice between a fast but approximate and
a slow but more accurate method in the pairwise
alignments stage. The fast alignment method allows very large numbers of
sequences to be aligned and the scores are calculated as the number of “k-tuple” matches (runs of identical residues, typically
1 or 2 long for proteins or 2 to 4 long for nucleotide sequences) in the best
alignment between two sequences minus a fixed penalty for every gap. The slower
but more accurate method derives scores from full dynamic programming
alignments using two gap penalties (for opening or extending gaps) and a full
amino acid scoring matrix. These scores are calculated as the number of
identities in the best alignment divided by the number of residues compared
(gap positions are excluded). Both of these scores are initially calculated as
percent identity scores and are converted to distances by dividing by 100 and
subtracting from 1.0 to give the number of differences per site. Note that these
initial distances are not corrected for multiple substitutions.
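The conversion described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the program's code; the function names and the two short aligned strings are hypothetical, and the gap character is assumed to be a period.

```python
def percent_identity(aligned_a, aligned_b, gap="."):
    """Identities divided by residues compared; gap positions are excluded."""
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # a position with a gap in either sequence is not counted
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

def identity_to_distance(percent):
    """Divide by 100 and subtract from 1.0: differences per site."""
    return 1.0 - percent / 100.0

# Hypothetical pair of already aligned sequences.
pid = percent_identity("ATGGT.TTGT", "ATGCTATT.T")
print(pid, identity_to_distance(pid))
```

With this conversion, a pairwise score of 100 in the session above corresponds to a distance of 0.0 and a score of 33 to a distance of about 0.67.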
2) The guide tree
The tree (or “dendrogram”)
used to guide the final multiple alignment process is calculated from the
distance matrix created in stage 1 using the Neighbor-Joining (NJ) method. This
produces an unrooted tree with branch lengths
proportional to estimated divergence along each branch. The root is placed by a
“mid-point” method at a position where the means of the branch
lengths on either side of the root are equal. These trees are also used to
derive a weight for each sequence (see Sequence Weighting below). The weights
are dependent upon the distance from the root of the tree but sequences which
have a common branch with other sequences share the weight derived from the
shared branch.
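To make the Neighbor-Joining step more concrete, here is a minimal Python sketch of a single join: it picks the pair of sequences to merge first and computes the two branch lengths to the new node, using the standard NJ criterion. The function name `nj_pick_pair` and the toy distances are assumptions for illustration; they are not taken from the program or from the example run, and a complete implementation would repeat this step and then place the root.

```python
from itertools import combinations

def nj_pick_pair(dist):
    """One Neighbor-Joining step (needs at least three taxa).

    dist maps frozenset({a, b}) to the distance between a and b.
    Returns (a, b, length_a, length_b) for the pair that is joined first.
    """
    taxa = sorted({t for pair in dist for t in pair})
    n = len(taxa)
    # Sum of distances from each taxon to all the others.
    total = {t: sum(dist[frozenset((t, u))] for u in taxa if u != t) for t in taxa}

    def q(a, b):
        return (n - 2) * dist[frozenset((a, b))] - total[a] - total[b]

    a, b = min(combinations(taxa, 2), key=lambda pair: q(*pair))
    d_ab = dist[frozenset((a, b))]
    length_a = d_ab / 2 + (total[a] - total[b]) / (2 * (n - 2))
    return a, b, length_a, d_ab - length_a

# Toy distance matrix for five hypothetical sequences.
toy = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {frozenset(pair): value for pair, value in toy.items()}
print(nj_pick_pair(dist))
```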
3) Progressive alignment
At this stage a series of pairwise
alignments are used to align larger and larger groups of sequences, following
the branching order in the guide tree. At each step a full dynamic programming
algorithm is used with a residue scoring matrix and penalties for opening and
extending gaps. Each step consists of aligning two existing alignments or
sequences. Gaps that are present in older alignments remain fixed. The score
between a position from one sequence or alignment and one from another is
calculated by averaging all the pairwise scoring
matrix scores from the amino acids in the two sets of sequences. For example,
if you align 2 alignments with 2 and 4 sequences respectively, the score at
each position is the average of 8 (2x4) comparisons. If either set of sequences
contains one or more gaps in one of the positions being considered, each gap
versus a residue is scored as zero. Since the default amino acid scoring matrices
used have been rescored to have only positive values, this treatment of gaps
treats the score of a residue versus a gap as having the worst possible score.
When sequences are weighted (see Sequence Weighting below), each scoring matrix
value is multiplied by the weights from the 2 sequences.
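The position score described above can be sketched as follows. This is an illustrative Python example, not the program's implementation: the function name `column_score`, the tiny all-positive matrix and the residues are hypothetical, and a gap paired with another gap is assumed to score zero as well.

```python
def column_score(col_a, col_b, matrix, weights_a=None, weights_b=None, gap="."):
    """Average of all pairwise matrix scores between two alignment columns.

    col_a and col_b hold one character per sequence in each group.
    Any comparison that involves a gap contributes zero.
    """
    weights_a = weights_a or [1.0] * len(col_a)
    weights_b = weights_b or [1.0] * len(col_b)
    total = 0.0
    for x, wx in zip(col_a, weights_a):
        for y, wy in zip(col_b, weights_b):
            if x == gap or y == gap:
                continue  # gap versus residue is scored as zero
            # Each matrix value is multiplied by the weights of both sequences.
            total += matrix[tuple(sorted((x, y)))] * wx * wy
    return total / (len(col_a) * len(col_b))

# Hypothetical matrix with only positive values, keyed by sorted residue pair.
toy_matrix = {("A", "A"): 10, ("A", "S"): 6, ("S", "S"): 9}

# A group of 2 sequences against a group of 4: 8 comparisons are averaged.
score = column_score(["A", "S"], ["A", "A", "S", "."], toy_matrix)
print(score)
```

Because every matrix value is positive, the two gap comparisons in this example pull the average down, which is the "worst possible score" behaviour described in the text.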
Improvements to progressive alignment
ClustalW+ implements the following modifications to the
final progressive alignment stage to improve the accuracy and sensitivity of
the alignment: 1) sequence weighting; 2) dynamically adjusted gap penalties; 3)
variable scoring matrices; and 4) delayed alignment of highly divergent
sequences.
Sequence weighting
Sequence weights are calculated directly from the guide
tree. The weights are normalized such that the biggest one is set to 1.0 and
the rest are all less than one. Groups of closely related sequences receive
lowered weights because they contain much duplicated information. Highly
divergent sequences without any close relatives receive high weights. These
weights are used as simple multiplication factors for scoring positions from
different sequences or pre-aligned groups of sequences.
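One way to read this weighting scheme is sketched below in Python. It assumes that each branch length is divided equally among the sequences below that branch and that a sequence's raw weight is the sum of its shares from the root down; that sharing rule is an assumption based on the description here and in the guide tree section, not a statement of the program's exact code. The tree representation, the function names and the toy branch lengths are hypothetical, and the tree is assumed to be rooted already.

```python
def count_leaves(node):
    """A leaf is a string; an internal node is a list of (child, length) pairs."""
    if isinstance(node, str):
        return 1
    return sum(count_leaves(child) for child, _ in node)

def raw_weights(node, inherited=0.0, out=None):
    """Sum each leaf's share of every branch between it and the root."""
    out = {} if out is None else out
    if isinstance(node, str):
        out[node] = inherited
        return out
    for child, length in node:
        share = length / count_leaves(child)  # shared branches are split
        raw_weights(child, inherited + share, out)
    return out

def normalized_weights(tree):
    """Scale the weights so that the biggest one is 1.0."""
    raw = raw_weights(tree)
    biggest = max(raw.values())
    if biggest == 0:
        return {name: 1.0 for name in raw}
    return {name: weight / biggest for name, weight in raw.items()}

# Toy rooted tree: A and B are close relatives, C has no close relative.
toy_tree = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.4)]
weights = normalized_weights(toy_tree)
print(weights)
```

In this toy tree the two closely related sequences end up with half the weight of the divergent one, matching the behaviour described above.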
Initial gap penalties
Initially, two gap penalties are used: a gap opening penalty
(GOP) which gives the cost of opening a new gap of any length and a gap
extension penalty (GEP) which gives the cost of every item in a gap. Initial
values can be set by the user. The program then automatically attempts to
choose appropriate gap penalties for each sequence alignment, depending on the
following factors.
1) Dependence on the scoring matrix
ClustalW+ uses the average score for two mismatched residues
(i.e. off-diagonal values in the matrix) as a scaling factor for the GOP.
2) Dependence on the similarity of the sequences
The percent identity of the two (groups of) sequences to be
aligned is used to increase the GOP for closely related sequences and decrease
it for more divergent sequences on a linear scale.
3) Dependence on the lengths of the sequences
The scores for both true and false sequence alignments grow
with the length of the sequences. ClustalW+ uses the logarithm of the length of
the shorter sequence to increase the GOP with sequence length.
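The three factors above could be combined along the following lines. This Python sketch is an assumption about the form of the adjustment (the length term is added to the initial GOP and the result is scaled by the matrix and identity factors); the text here does not give the exact formula, the base of the logarithm is assumed to be natural, and the function name, the parameter names and the identity scaling factor passed in are all hypothetical.

```python
import math

def adjusted_gop(initial_gop, len_a, len_b, avg_mismatch_score, identity_scale):
    """Hypothetical combination of the three gap opening penalty factors.

    initial_gop        -- the user-supplied gap opening penalty
    len_a, len_b       -- lengths of the two sequences (or groups) being aligned
    avg_mismatch_score -- average off-diagonal value of the scoring matrix
    identity_scale     -- linear factor, larger for closely related sequences
    """
    length_term = math.log(min(len_a, len_b))  # grows with the shorter sequence
    return (initial_gop + length_term) * avg_mismatch_score * identity_scale

# Same settings, longer sequences: the adjusted penalty rises.
short = adjusted_gop(10.0, 100, 250, 1.0, 1.0)
longer = adjusted_gop(10.0, 1000, 2500, 1.0, 1.0)
print(short, longer)
```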
All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value specified, the default parameter value, and shortened forms of the parameter name, aliases. Programs with a plus in the name use either the full parameter name or a specified alias. If “Type” is “Boolean”, then the presence of the parameter on the command line indicates a true condition. A false condition needs to be stated as, parameter=false.
ClustalW+ performs multiple alignments on a set of sequences or a set of previously aligned sequences.
Minimal Syntax: % clustalw+ [-infile=]value -Default
Minimal Parameters (case-insensitive):
-infile [Type: InFile / Default: EMPTY / Aliases: infile1 in]
Input files specification.
Prompted Parameters (case-insensitive):
-action [Type: String / Default: 'alignslow' / Aliases: act]
The following actions are supported: alignfast: multiple alignments using fast pairwise alignment; alignslow: multiple alignments using slow pairwise alignment; tree: generate a phylogenetic tree given the alignment.
-outfile [Type: OutFile / Default: EMPTY / Aliases: out]
Output file produced by ClustalW. This will be an alignment file if ACTION is alignfast or alignslow. It is a tree file if ACTION is tree. Default output file is clustalw.msf.
Optional Parameters (case-insensitive):
-check [Type: Boolean / Default: 'false' / Aliases: che help]
Prints out this usage message.
-default [Type: Boolean / Default: 'false' / Aliases: d def]
Specifies that sensible default values be used for all parameters where possible.
-documentation [Type: Boolean / Default: 'true' / Aliases: doc]
Prints banner at program startup.
-quiet [Type: Boolean / Default: 'false' / Aliases: qui]
Tells application to print only a minimal amount of information.
-outorder [Type: String / Default: 'input' / Aliases: ord]
Whether the sequences should be output in input order or the order in which they were aligned. Valid values are: Input, Aligned.
-range [Type: List / Default: EMPTY / Aliases: rng]
The sequence range to write as a comma-separated value, e.g. m,n will write from m to m+n.
-pwmatrix [Type: String / Default: EMPTY]
The scoring matrix to use for pairwise alignment when performing slow pairwise alignments. Depending upon sequence type, it may refer to either a DNA scoring matrix or a protein scoring matrix. This option is relevant only when ACTION=alignslow. Valid matrices are: blosum, pam, gonnet, id or filename.
-pwgapopen [Type: Double / Default: EMPTY]
The gap opening penalty during slow pairwise alignments. This option is relevant only when ACTION=alignslow.
-pwgapext [Type: Double / Default: EMPTY]
The gap extension penalty during slow pairwise alignments. This option is relevant only when ACTION=alignslow.
-ktuple [Type: Integer / Default: EMPTY]
Window size while doing fast pairwise alignment. This option is relevant only when ACTION=alignfast.
-topdiags [Type: Integer / Default: EMPTY]
Number of windows around best diagonals while doing fast pairwise alignment. This option is relevant only when ACTION=alignfast.
-window [Type: Integer / Default: EMPTY]
Number of windows around each of the top diagonals. This option is relevant only when ACTION=alignfast.
-pairgap [Type: Integer / Default: EMPTY]
The number of matching residues required to open a gap while doing fast pairwise alignments. This option is only relevant when ACTION=alignfast.
-score [Type: String / Default: 'absolute']
Whether pairwise alignment scores should be reported as (raw) absolute scores or percentages {absolute|percent}. This option is relevant only when ACTION=alignfast.
-matrix [Type: String / Default: EMPTY]
The scoring matrix to be used for multiple sequence alignments.
-gapopen [Type: Double / Default: EMPTY]
The gap opening penalty when doing multiple sequence alignments.
-gapext [Type: Double / Default: EMPTY]
The gap extension penalty when doing multiple sequence alignments.
-endgaps [Type: Boolean / Default: 'false']
Turn on/off end gap separation penalty.
-pgap [Type: Boolean / Default: 'true']
Turn on/off residue-specific gap separation penalties.
-hgap [Type: Boolean / Default: 'true']
Turn on/off hydrophilic residue gaps.
-hgapresidues [Type: List / Default: EMPTY]
List of hydrophilic residues.
-maxdiv [Type: Double / Default: EMPTY]
Minimum percentage identity required for delay.
-outputtree [Type: String / Default: 'nj' / Aliases: outtreeformat treefmt]
The output format of the guide tree produced. Valid values: NJ, PHYLIP, DIST, NEXUS.
-negative [Type: Boolean / Default: 'false' / Aliases: neg]
Sets whether protein matrix contains negative values.
-monitor [Type: Boolean / Default: 'false' / Aliases: mon]
Turn on/off result monitoring.
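As an illustration of the syntax summarized above, the Python sketch below assembles a command line from parameter names that appear in this summary. The helper `build_command` is hypothetical and only builds the argument list; it does not run the program. The file names are taken from the example session, and the exact spelling accepted by an installed version should be confirmed with -check.

```python
import shlex

def build_command(infile, action="alignslow", outfile=None, use_defaults=True, **extra):
    """Assemble a clustalw+ argument list in the -name=value style."""
    args = ["clustalw+", f"-infile={infile}", f"-action={action}"]
    if outfile:
        args.append(f"-outfile={outfile}")
    for name, value in extra.items():
        if value is True:
            args.append(f"-{name}")        # presence of a Boolean means true
        elif value is False:
            args.append(f"-{name}=false")  # false must be stated explicitly
        else:
            args.append(f"-{name}={value}")
    if use_defaults:
        args.append("-default")
    return args

cmd = build_command("gb_pl.msf", outfile="clustalw.msf", outorder="Aligned", hgap=False)
print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting problems if the command is later passed to a process launcher.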
The files described below supply auxiliary data to this program. The
program automatically reads them from a public data directory [$WPROOT/share/matrix/] unless you
either 1) have a data file with exactly the same name in your current working
directory; or 2) name a file on the command line with an expression like -data1=myfile.dat. For
more information see Section 4, Using Data Files in the User's Guide.
Local Scoring Matrices
This program reads one or more scoring matrices for the comparison of
sequence characters. The program automatically reads the program's default
scoring matrix in a public data directory unless you either
1) Have a data file with exactly the same name as the program default
scoring matrix in your current working directory; or
2) Have a data file with
exactly the same name as the program default scoring matrix in the directory
with the logical name Share_Matrix; or
3) Name a file on the command line with an expression like -matrix=mymatrix.cmp. If you
don't include a directory specification when you name a file with -matrix, the program
searches for the file first in your local directory, then in the directory with
the logical name Share_Matrix. For more information
see "Using a Special Kind of Data File: A Scoring Matrix" in Section
4, Using Data Files in the User's Guide.
ClustalW+ reads a scoring matrix from your local directory or the public
database with the values for every possible symbol comparison. The file Clustalw+dna.cmp has a 10 at every place where the set of
bases implied by the alphabetic IUB ambiguity codes (see Appendix III) overlap. All of the other locations
have zeros. The file blosum62.cmp is based on substitutions between amino acid
pairs in ungapped blocks of aligned protein segments
as measured by Henikoff and Henikoff.
The scores in this matrix for pairwise amino acid
comparisons range from -4 to +11. You can use the Fetch+
program to copy these files and then modify them to suit your own needs.
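The rule described for the nucleotide matrix (10 where the base sets of two IUB codes overlap, zero elsewhere) can be sketched in Python as follows. This is an illustration of the rule only and does not read or reproduce the actual .cmp file; the function name `overlap_matrix` is hypothetical, and the table covers the standard IUB nucleotide codes without U or X.

```python
# Bases implied by each IUB nucleotide code.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

def overlap_matrix(match=10, mismatch=0):
    """Score every pair of codes: match if their base sets share a base."""
    return {
        (x, y): match if set(IUB[x]) & set(IUB[y]) else mismatch
        for x in IUB
        for y in IUB
    }

scores = overlap_matrix()
print(scores[("A", "R")], scores[("R", "Y")], scores[("N", "T")])
```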
You can set the parameters listed below from the command line. Shortened forms of the parameter name, aliases, are shown, separated by commas.
-infile, -infile1, -in
Input file specification.
-action, -act
The following actions are supported: alignfast: multiple alignments using fast pairwise alignment; alignslow: multiple alignments using slow pairwise alignment; tree: generate a phylogenetic tree given the alignment.
-outfile, -out
Output file produced by ClustalW+. This will be an alignment file if action is alignfast or alignslow. It is a tree file if action is tree. Default output file is clustalw.msf.
-check, -che, -help
Prints out this usage message.
-default, -d, -def
Specifies that sensible default values be used for all parameters where possible.
-documentation, -doc
Prints banner at program startup.
-quiet, -qui
This parameter is not supported.
-outorder
Whether the sequences should be output in input order or the order in which they were aligned. Valid values are: Input, Aligned.
-range
The sequence range to write as a comma-separated value, e.g. m,n will write from m to m+n.
-pwmatrix
The scoring matrix to use for pairwise alignment when performing slow pairwise alignments. Depending upon sequence type, it may refer to either a DNA scoring matrix or a protein scoring matrix. This option is relevant only when action=alignslow. Valid matrices are: blosum, pam, gonnet, id or filename.
-pwgapopen
The gap opening penalty during slow pairwise alignments. This option is relevant only when action=alignslow.
-pwgapext
The gap extension penalty during slow pairwise alignments. This option is relevant only when action=alignslow.
-ktuple
Window size while doing fast pairwise alignment. This option is relevant only when action=alignfast.
-topdiags
Number of windows around best diagonals while doing fast pairwise alignment. This option is relevant only when action=alignfast.
-window
Number of windows around each of the top diagonals. This option is relevant only when action=alignfast.
-pairgap
The number of matching residues required to open a gap while doing fast pairwise alignments. This option is only relevant when action=alignfast.
-score
Whether pairwise alignment scores should be reported as (raw) absolute scores or percentages {absolute|percent}. This option is relevant only when action=alignfast.
-matrix
The scoring matrix to be used for multiple sequence alignments.
-gapopen
The gap opening penalty when doing multiple sequence alignments.
-gapext
The gap extension penalty when doing multiple sequence alignments.
-endgaps
Turn on/off end gap separation penalty.
-pgap
Turn on/off residue-specific gap separation penalties.
-hgap
Turn on/off hydrophilic residue gaps.
-hgapresidues
List of hydrophilic residues.
-maxdiv
Minimum percentage identity required for delay.
-outputtree
The output format of the guide tree produced. Valid values: NJ, PHYLIP, DIST, NEXUS.
-negative, -neg
Sets whether protein matrix contains negative values.
-monitor, -mon
Program monitors its progress on your screen by displaying a screen trace of progress. However, when you use -default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.
Printed: June 3, 2005 10:11
Technical Support: support-us@accelrys.com, support-japan@accelrys.com, or support-eu@accelrys.com
Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.
Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.