OLDDISTANCES

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

RELATED PROGRAMS

RESTRICTIONS

ALGORITHM

CONSIDERATIONS

SUGGESTIONS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

OldDistances makes a table of the pairwise similarities within a group of aligned sequences.

DESCRIPTION

OldDistances writes a matrix of the pairwise similarities between up to 50 different sequences in a multiple sequence alignment. The similarity value is the number of "matches" between each sequence pair divided by a measure of the sequence length.

Matches

A match occurs if the value in the scoring matrix for a pair of bases or amino acids is greater than or equal to a set match threshold.

Denominator

The denominator can be any of four functions of sequence length: 1) the length of the shorter sequence of the pair, including gaps; 2) the length of the shorter sequence of the pair without gaps; 3) the average of the two sequence lengths, including gaps; or 4) the average of the two sequence lengths without gaps. A fifth option skips the division and reports the sum of the matches.
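
The four denominators can be sketched in Python as follows. This is an illustrative sketch, not the program's code: the helper names `ungapped_length` and `denominator`, the sample sequences, and the treatment of `.` as the only gap symbol are assumptions made for the example. It also assumes that method 2 takes the smaller of the two gap-free lengths.

```python
GAP = "."  # gap symbol; assumed here to be the only one


def ungapped_length(seq):
    """Length of an aligned sequence with gap symbols removed."""
    return sum(1 for symbol in seq if symbol != GAP)


def denominator(seq_a, seq_b, method):
    """Return the divisor for menu methods 1-4 (hypothetical helper)."""
    if method == 1:    # length of the shorter sequence, gaps included
        return min(len(seq_a), len(seq_b))
    if method == 2:    # length of the shorter sequence, gaps excluded
        return min(ungapped_length(seq_a), ungapped_length(seq_b))
    if method == 3:    # average length, gaps included
        return (len(seq_a) + len(seq_b)) / 2
    if method == 4:    # average length, gaps excluded
        return (ungapped_length(seq_a) + ungapped_length(seq_b)) / 2
    raise ValueError("method must be 1, 2, 3, or 4")


a = "MK..LVAG"   # 8 symbols, 6 without gaps
b = "MKT.LV"     # 6 symbols, 5 without gaps
for method in (1, 2, 3, 4):
    print(method, denominator(a, b, method))
```

For these two made-up sequences the four methods give 6, 5, 7.0, and 5.5, which shows how much the choice of denominator can change the reported similarity for the same number of matches.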

EXAMPLE

Here is a session using OldDistances to determine similarities between aligned sequences in the file hsp70.msf:

 
 
% olddistances
 
 OLDDISTANCES within what multiple sequence alignment ?  hsp70.msf{*}
 
 What is the threshold for a match (* 2 *) ?
 
 Divide the sum of the matches by:
 
     1)  Length of shorter sequence including gaps
     2)  Length of shorter sequence excluding gaps
     3)  Average sequence length including gaps
     4)  Average sequence length excluding gaps
     5)  Nothing
 
 Please choose one (* 2 *) :
 
     hsp70.msf{s11448} 743
     hsp70.msf{s06443} 743
     hsp70.msf{a25398} 743
 
     /////////////////////
 
     hsp70.msf{s20149} 743
     hsp70.msf{a32493} 743
     hsp70.msf{s29261} 743
 
 What should I call the output file (* hsp70.olddistances *) ?
 
%

OUTPUT

Here is part of the output file; it contains a 25 X 25 matrix (not all of which is shown):

 
 
 OLDDISTANCES within: hsp70.msf{*}  October 20, 1998 12:35
 
Threshold of comparison: 2
            Denominator: "Length of shorter sequence without gaps"
    Number of sequences: 25
Symbol Comparison Table: GenRunData:blosum62.cmp
 
Key for column and row indices:
 
  1         hsp70.msf{S11448}  Length: 743       Length without gaps: 653
  2         hsp70.msf{S06443}  Length: 743       Length without gaps: 516
  3         hsp70.msf{A25398}  Length: 743       Length without gaps: 661
 
  ///////////////////////////////////////////////////////////////////////
 
 25         hsp70.msf{S29261}  Length: 743       Length without gaps: 638
 
 Distance Matrix Part: 1
 
                 1         2         3         4         5         6   ...
 _____________________________________________________________________ ...
|    1   |    1.0000    0.9845    0.8698    0.8760    0.7679    0.7668 ...
|    2   |              1.0000    0.9380    0.9360    0.8120    0.8140 ...
|    3   |                        1.0000    0.9334    0.7586    0.7590 ...
|    4   |                                  1.0000    0.7695    0.7637 ...
 
//////////////////////////////////////////////////////////////////////
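
As a rough check on how the matrix entries relate to the key, the values shown are consistent with whole-number match counts divided by the smaller of the two "Length without gaps" figures. The Python sketch below back-calculates those counts. The match counts are inferred, not reported by the program, and the variable names are made up for the illustration.

```python
# "Length without gaps" values taken from the key in the output file.
ungapped = {1: 653, 2: 516, 3: 661}

# Similarity values read from the first rows of the matrix.
reported = {(1, 2): 0.9845, (1, 3): 0.8698, (2, 3): 0.9380}

inferred = {}
for (i, j), value in reported.items():
    shorter = min(ungapped[i], ungapped[j])   # denominator for menu choice 2
    matches = round(value * shorter)          # inferred, not program output
    inferred[(i, j)] = matches
    print(f"{i} vs {j}: {matches} / {shorter} = {matches / shorter:.4f}")
```

Each recomputed quotient rounds back to the value printed in the matrix, which suggests that sequences 1 and 2, for instance, share about 508 matching positions out of the 516 ungapped positions of sequence 2.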

RELATED PROGRAMS

PileUp creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. Gap makes sequence alignments. ProfileGap aligns a new sequence to an existing multiple sequence alignment. Pretty displays multiple sequence alignments.

Distances writes a matrix of the pairwise genetic distances between sequences in a multiple sequence alignment. These distances are suitable for input into programs such as GrowTree that create evolutionary trees. Distances provides several methods for correcting the distance calculations to account for multiple substitutions at a single site, and the distance value is expressed as the number of nucleotide or amino acid substitutions per 100 residues.

RESTRICTIONS

The sequences must be aligned properly for OldDistances to work.

ALGORITHM

OldDistances compares each pair of aligned sequences symbol by symbol, from the first symbol to the last symbol of the shorter sequence. The sequences must already be aligned for the comparison to make sense. OldDistances counts the positions where the scoring matrix value is greater than or equal to the match threshold. The sum of the matches is then divided by the chosen denominator, such as the length of the shorter sequence.

Gaps are treated like any other symbol. The gap symbol (.) matches another symbol if that pair's value in the scoring matrix meets the threshold.
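
The counting step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the program's implementation: the small `TOY_SCORES` table stands in for a real scoring matrix such as blosum62.cmp, pairs missing from the table are given an arbitrary low score, and the names `score`, `count_matches`, and `similarity` are hypothetical. A match is counted when the score is at least the threshold, following the definition in the DESCRIPTION section.

```python
# Toy symmetric scoring table (assumption; real values come from a .cmp file).
TOY_SCORES = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5,
    ("I", "L"): 2, ("A", "L"): -1, ("K", "L"): -2,
    (".", "."): 0,
}
MISSING = -4  # assumed score for any pair absent from the toy table


def score(a, b):
    """Look up a pair of symbols in either order."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), MISSING))


def count_matches(seq_a, seq_b, threshold):
    """Count positions scoring >= threshold, up to the shorter sequence's end."""
    # zip() stops at the end of the shorter sequence.
    # Gap symbols are looked up in the table like any other symbol.
    return sum(1 for a, b in zip(seq_a, seq_b) if score(a, b) >= threshold)


def similarity(seq_a, seq_b, threshold=2):
    """Matches divided by the length of the shorter sequence (menu method 1)."""
    return count_matches(seq_a, seq_b, threshold) / min(len(seq_a), len(seq_b))


x = "ALK.I"
y = "ALKLLA"
print(count_matches(x, y, threshold=2))
print(similarity(x, y))
```

With a threshold of 2 the conservative I/L pair counts as a match, while the gap paired with L does not because the toy table gives that pair a low score. Raising the threshold to 3 would drop the I/L pair as well.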

CONSIDERATIONS

OldDistances chooses a default match threshold that is appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program adjusts the default match threshold accordingly.

SUGGESTIONS

If the sequences are not in an MSF file, use Pretty to display the aligned sequences you pass to OldDistances. If they look right in the Pretty display, they should work sensibly with OldDistances.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % olddistances [-INfile=]hsp70.msf{*} -Default
 
Prompted Parameters:
 
-THReshold=2                sets minimum scoring matrix score for a match
-MENu=2                     divides the sum of the matches by:
                              1=length of the shorter sequence
                              2=length of the shorter sequence without gaps
                              3=average length
                              4=average length without gaps
                              5=nothing
 
[-OUTfile=]hsp70.olddistances  names the output file
 
Local Data Files:
 
-MATRix=blosum62.cmp        assigns the scoring matrix for proteins
-MATRix=dnadistances.cmp    assigns the scoring matrix for nucleic acids
 
Optional Parameters: None

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program's default scoring matrix in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Section 4, Using Data Files in the User's Guide.
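
The search order for a file named with -MATRix and no directory specification can be pictured with the Python sketch below. It is an illustration only: the function `find_matrix` is hypothetical, and the directories behind the logical names are passed in explicitly as a mapping, since how those names resolve depends on the installation.

```python
import os
import tempfile

# Order described above: local directory, then MyData, GenMoreData, GenRunData.
SEARCH_ORDER = ("local", "MyData", "GenMoreData", "GenRunData")


def find_matrix(filename, directories):
    """Return the first existing path for filename, or None if not found.

    directories maps each name in SEARCH_ORDER to a directory path
    (a hypothetical stand-in for the logical names).
    """
    for name in SEARCH_ORDER:
        directory = directories.get(name)
        if directory is None:
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demonstration with throwaway directories.
with tempfile.TemporaryDirectory() as root:
    dirs = {name: os.path.join(root, name) for name in SEARCH_ORDER}
    for path in dirs.values():
        os.mkdir(path)
    # Put the same file name in two places; MyData precedes GenRunData.
    for name in ("MyData", "GenRunData"):
        with open(os.path.join(dirs[name], "mymatrix.cmp"), "w") as handle:
            handle.write("placeholder\n")
    found = find_matrix("mymatrix.cmp", dirs)
    print(os.path.basename(os.path.dirname(found)))
```

In the demonstration the copy in the MyData stand-in is returned, because that directory comes before GenRunData in the search order.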

OldDistances reads the scoring matrix file blosum62.cmp for peptide comparisons and dnadistances.cmp for nucleotide comparisons.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-THReshold=2

Sets the minimum scoring matrix value for a match.

-MENu=2

Sets the method used to determine the final score. Methods 1 through 4 divide the sum of the matches by the length of the shorter sequence, the length of the shorter sequence without gaps, the average length of the two sequences, and the average length of the two sequences without gaps, respectively. Method 5 reports the sum of the matches without modification.

-MATRix=mymatrix.cmp

Allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData.

For more information see the Local Scoring Matrices section.

Printed: May 27, 2005  13:52 


Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.
