PRETTY

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

CALCULATING AND DISPLAYING A CONSENSUS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE

FUNCTION

Pretty displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.

DESCRIPTION

Pretty prints sequences with their columns aligned and can display a consensus for the alignment, allowing you to look at relationships among the sequences. This program can be used for aligned sequences in an MSF (multiple sequence format) or RSF (rich sequence format) file, or for separate sequences that have had gaps added to make them all align.

You can edit the alignment in a Pretty output file with a text editor. You can then separate the edited output into individual sequence files by running Pretty with the command-line parameter -UGLy.

EXAMPLE

By repeatedly using the Gap program with the command-line parameter -OUT, gaps were added to a group of picorna virus capsid proteins in the antigenic region to make them align with each other and with a growing consensus sequence. Here is a session using Pretty to display the alignment and calculate a consensus sequence of the antigenic region from those picorna virus capsid protein sequences.

% pretty -CONsensus -CASe

 PRETTY format what sequence(s) ?  @pretty.list

                      fa10.ugly  len: 349  wgt: 0.50

                      fa12.ugly  len: 349  wgt: 0.50

                      //////////////////////////////

                       r14.ugly  len: 349  wgt: 0.50

                        r2.ugly  len: 349  wgt: 0.50

                  Begin (* 1 *) ?

                End (*   349 *) ?

 Find consensus to what minimum plurality (* 2.00 *) ?

 What should I call the output file (* pretty.pretty *) ?

OUTPUT

Here is part of the output file:

Plurality: 2.00  Threshold: 4

AveWeight 0.55  AveMatch 2.91  AvMisMatch -2.00

PRETTY of: @pretty.list   October 7, 1998 10:35  ..

           1                                                   50

fa10.ugly  .......... .......... .......... ..TTttGESA D.PvtTtVE.

fa12.ugly  .......... .......... .......... ..TTatGESA D.PvtTtVE.

fo1k.ugly  .......... .......... .......... ..TTsaGESA D.PvtTtVE.

   e.ugly  Gvenae.kgv tEnTna.Tad fvaqpvyLPe .nqT...... kv.Affynrs

 p1m.ugly  GlgqmlEsmI .dnTvreTvg AatsrdaLPn teasGPthSk eiPALTAVET

 p1s.ugly  GlgqmlEsmI .dnTvreTvg AatsrdaLPn teasGPahSk eiPALTAVET

 p2s.ugly  GigdmiEgav .Egitknalv pptstnsLPg hkpsGPahSk eiPALTAVET

 p3s.ugly  Giedliseva .qgal..Tls lpkqqdsLPd tkasGPahSk evPALTAVET

 cb3.ugly  ...gpvEdaI .......T.. Aaigr..vad tvgTGPtnSe aiPALTAaET

 r14.ugly  GlgdelEevI vEkT.kqTv. Asi....... ..ssGPkhtq kvPiLTAnET

  r2.ugly  ...npvEnyI dEvlnevlv. .......vPn inssnPttSn saPALdAaET

Consensus  G-----E--I -E-T---T-- A------LP- --TTGPGESA D-PALTAVET

/////////////////////////////////////////////////////////////////

           301                                               349

fa10.ugly  aElyCPRPll AIkvtsqdRy KqKI.iAPa. ..KQll.... .........

fa12.ugly  aElyCPRPll AIevssqdRh KqKI.iAPg. ..KQll.... .........

fo1k.ugly  aEtyCPRPll AIhpt.eaRh KqKI.vAPv. ..KQTl.... .........

   e.ugly  krvfCPRPtv ffPwpTsG.D Kidmtpragv lmlespnald isrty....

 p1m.ugly  irvWCPRPPR AlaYygpGvD ykdgtltPls tkdlTTy... .........

 p1s.ugly  irvWCPRPPR AvaYygpGvD ykdgtltPls tkdlTTy... .........

 p2s.ugly  VrvWCPRPPR AvPYfgpGvD ykdg.ltPlp ekglTTy... .........

 p3s.ugly  VrvWCPRPPR AvPYygpGvD yrn.nldPls ekglTTy... .........

 cb3.ugly  VkaWiPRPPR lcqYekakn. vnfrssgvtt trqsiTtmtn tgaiwtti.

 r14.ugly  VEaWiPRaPR AlPY.Tsigr tny..pknte pvikkrk.gd i.ksy....

  r2.ugly  VkaWCPRPPR AleY.Trahr tnfkiedrsi qtaivTrpii ttagpsdmy

Consensus  VE-WCPRPPR AIPY-T-GRD K-KI--AP-- --KQTT---- ---------
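
The block layout of the output above (blocks of 10 symbols, 50 symbols per line, matching the defaults of -BLOcksize and -LINesize) can be sketched in a few lines of Python. This is only an illustration of the layout, not Pretty's own code; the function name layout is hypothetical.

```python
def layout(seq, linesize=50, blocksize=10):
    """Split one aligned sequence into output lines of space-separated blocks.

    linesize and blocksize mirror the documented defaults of
    -LINesize=50 and -BLOcksize=10; the function itself is a sketch.
    """
    lines = []
    for start in range(0, len(seq), linesize):
        chunk = seq[start:start + linesize]
        # Cut the line into blocks and join them with single spaces.
        blocks = [chunk[i:i + blocksize] for i in range(0, len(chunk), blocksize)]
        lines.append(" ".join(blocks))
    return lines


# The first 50 columns of fa10.ugly from the example output, without spaces.
row = ".......... .......... .......... ..TTttGESA D.PvtTtVE."
print(layout(row.replace(" ", "")))
```

Applied to the unspaced fa10.ugly row, the sketch reproduces the five blocks shown in the first panel of the output.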

INPUT FILES

Pretty accepts one or more aligned nucleotide sequences or aligned protein sequences as input. You can specify an MSF file, such as the output file from a session with PileUp, with a command like % pretty pileup.msf{*}. Similarly, you can specify an RSF file, such as the output file from a session with PileUp in SeqLab, with a command like % pretty pileup.rsf{*}. Weights can be specified for sequences in both MSF and RSF files. (See the Vote Weight discussion below.) Multiple sequence alignments can also be represented with list files. For Pretty, a list file may include a vote weight for each sequence, set with the wgt: sequence attribute.

Here is the input file of sequence names (pretty.list) from the example session:

!!SEQUENCE_LIST 1.0

A multiple sequence alignment represented as a list file for input to

the programs PRETTY, PROFILEMAKE and LINEUP.

7/30/94   ..

GenDocData:fa10.ugly    wgt: 0.5

GenDocData:fa12.ugly    wgt: 0.5

GenDocData:fo1k.ugly    wgt: 1.0

GenDocData:e.ugly       wgt: 1.0

GenDocData:p1m.ugly     wgt: 0.25

GenDocData:p1s.ugly     wgt: 0.25

GenDocData:p2s.ugly     wgt: 0.25

GenDocData:p3s.ugly     wgt: 0.25

GenDocData:cb3.ugly     wgt: 1.0

GenDocData:r14.ugly     wgt: 0.5

GenDocData:r2.ugly      wgt: 0.5
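
If you want to inspect the names and vote weights in a list file like the one above, the following Python sketch reads them. It is not part of the package. It assumes, based on the example file, that everything up to the line ending in two periods (..) is heading text and that each later non-blank line starts with a sequence name, optionally followed by wgt: and a number. The function name parse_list_file is hypothetical, and other sequence attributes are ignored.

```python
def parse_list_file(text):
    """Return (name, weight) pairs from the text of a list file.

    Assumption: the heading ends at the first line that ends with "..".
    A sequence without a wgt: attribute gets the documented default of 1.0.
    """
    entries = []
    in_body = False
    for line in text.splitlines():
        line = line.strip()
        if not in_body:
            # Skip heading lines; the ".." line itself is also heading.
            in_body = line.endswith("..")
            continue
        if not line:
            continue
        tokens = line.split()
        weight = 1.0
        if "wgt:" in tokens:
            weight = float(tokens[tokens.index("wgt:") + 1])
        entries.append((tokens[0], weight))
    return entries


sample = """!!SEQUENCE_LIST 1.0

A multiple sequence alignment represented as a list file.

7/30/94   ..

GenDocData:fa10.ugly    wgt: 0.5
GenDocData:p1m.ugly     wgt: 0.25
GenDocData:cb3.ugly
"""
for name, weight in parse_list_file(sample):
    print(name, weight)
```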

The function of Pretty depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.

RELATED PROGRAMS

PileUp creates a multiple sequence alignment of a group of related sequences. If you run Gap with the command-line parameters for sequence output, it writes sequence files with the sequences expanded by the addition of gaps.

PrettyBox produces a PostScript file containing a multiple sequence alignment with residues shaded on the basis of agreement to a calculated consensus sequence, allowing you to identify relationships among the sequences.

PlotSimilarity plots the average similarity of two or more aligned sequences at each position in the alignment.

RESTRICTIONS

Pretty displays sequences that have already been aligned. You can use up to 500 sequences, although the total length of all sequences combined must be less than 2,000,000 characters.

CALCULATING AND DISPLAYING A CONSENSUS

If you use one of the command-line parameters -CONsensus, -DIFferences, or -CASe, Pretty calculates a consensus for each column of the alignment using the scoring matrix blosum62.cmp for peptides or prettydna.cmp for nucleic acids. The consensus symbol for a column is determined in two steps:

1) The program finds the symbol whose comparison to all of the symbols in the column (including itself) yields the greatest number of votes. A vote is cast for each symbol comparison that is greater than or equal to some set threshold value; votes can be either 1.0 or some vote weight assigned to the sequence from which the vote comes.

2) Among the coalition of symbols that voted for the winning symbol, the most common symbol is chosen as the consensus.

If there is no coalition of votes that is larger than all of the other coalitions, or if the largest coalition of votes is below the minimum plurality, then there is no consensus for the column.

The weights for each sequence and the minimum plurality are floating point numbers. The threshold value is an integer.
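
The two steps above can be sketched in Python for a single column. This is a rough illustration under stated assumptions, not Pretty's actual code. The names column_consensus and toy_score are hypothetical, and toy_score is a stand-in, not the values in blosum62.cmp. The sketch also assumes that gap symbols do not vote, that two symbols gathering exactly the same set of rows count as one coalition, and that "most common" is measured by summed vote weight.

```python
def toy_score(a, b):
    """Stand-in comparison values (NOT blosum62.cmp): identical symbols
    score 4, the pair I/V scores 3, everything else scores -1."""
    if a == b:
        return 4
    return 3 if {a, b} == {"I", "V"} else -1


def column_consensus(column, weights, score, threshold, plurality, gaps=".-"):
    """Return the consensus symbol for one column, or None if there is none."""
    # Assumption: gap symbols are left out of the vote.
    rows = [(i, s.upper(), w)
            for i, (s, w) in enumerate(zip(column, weights)) if s not in gaps]

    # Step 1: every distinct symbol gathers the rows whose comparison with it
    # meets the threshold; each row contributes its vote weight.
    coalitions = {}
    for cand in {s for _, s, _ in rows}:
        members = frozenset(i for i, s, _ in rows if score(cand, s) >= threshold)
        coalitions[members] = sum(w for i, _, w in rows if i in members)
    if not coalitions:
        return None

    top = max(coalitions.values())
    winners = [m for m, votes in coalitions.items() if votes == top]
    # No consensus if the largest coalition is tied or below the plurality.
    if len(winners) != 1 or top < plurality:
        return None

    # Step 2: the most common symbol inside the winning coalition
    # (ties broken alphabetically, an arbitrary choice for this sketch).
    tally = {}
    for i, s, w in rows:
        if i in winners[0]:
            tally[s] = tally.get(s, 0.0) + w
    return max(sorted(tally), key=tally.get)


print(column_consensus("IIVA", [1.0] * 4, toy_score, threshold=1, plurality=2.0))
print(column_consensus("AAGG", [1.0] * 4, toy_score, threshold=1, plurality=2.0))
```

In the first call, I and V vote together (three votes) and I is the most common member, so the consensus is I. In the second call, two coalitions tie at two votes each, so there is no consensus and the sketch returns None.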

If you use -IDEntity, a consensus symbol is chosen only when all of the sequence symbols in a column of the alignment are identical, regardless of their votes.

If you use -CASe, Pretty shows the symbols in a column in uppercase when their comparison value with the consensus symbol meets or exceeds the threshold. All other symbols are in lowercase.

If you use -DIFferences, Pretty only shows those symbols in a column whose comparison value with the consensus symbol is lower than the threshold. These symbols are shown in lowercase; all other positions in the column are left blank.

If you use -CONsensus, Pretty adds a line to your alignment with the consensus sequence.
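
The display rules for -IDEntity, -CASe, and -DIFferences can be illustrated for one column as follows. This Python sketch is not Pretty's implementation. The function names and toy_score (a stand-in for a real scoring matrix) are hypothetical, and the sketch only covers columns that have a consensus symbol.

```python
def toy_score(a, b):
    """Stand-in comparison values (NOT a real scoring matrix)."""
    if a == b:
        return 4
    return 3 if {a, b} == {"I", "V"} else -1


def identity_symbol(column):
    """-IDEntity: a symbol only when every symbol in the column is identical."""
    symbols = {s.upper() for s in column}
    return symbols.pop() if len(symbols) == 1 else None


def show_case(column, consensus, score, threshold):
    """-CASe: uppercase where the comparison with the consensus symbol
    meets or exceeds the threshold, lowercase everywhere else."""
    return "".join(
        s.upper() if score(s.upper(), consensus) >= threshold else s.lower()
        for s in column
    )


def show_differences(column, consensus, score, threshold, fill=" "):
    """-DIFferences: lowercase symbols scoring below the threshold;
    all other positions show the fill character (blank by default)."""
    return "".join(
        s.lower() if score(s.upper(), consensus) < threshold else fill
        for s in column
    )


column = "iIVa"  # one alignment column, read top to bottom
print(show_case(column, "I", toy_score, threshold=1))                    # IIVa
print(show_differences(column, "I", toy_score, threshold=1, fill="-"))   # ---a
print(identity_symbol("PPPP"), identity_symbol("PPPA"))                  # P None
```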

-THReshold=1

determines the scoring matrix value below which a symbol may not vote for a coalition. Pretty chooses a default threshold that is appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program will adjust the default threshold accordingly. Use -THReshold to specify an alternative threshold if you don't want to accept the default value.

-PLUrality=2.0

defines the number of votes (vote weights) below which there is no consensus.

Vote Weight

If several of your sequences are very similar, you may not want their votes to dominate the consensus for the column. If your input file specification to Pretty is a list file, you can assign each sequence a vote weight with the wgt sequence attribute. The vote weight is the vote that each row casts for the consensus. A weight of 1.0 is assumed if no vote weight is specified. (See the INPUT FILES topic for information about the list file used to run the example above.) Note how each kind of sequence is assigned a vote weight so that their combined impact on the election is never more than one vote. For more information about list files, see "Using List Files" in Section 2, Using Sequence Files and Databases in the User's Guide.

You can assign vote weights to sequences in an MSF file by editing the MSF file and modifying the weight on the name/weight line for each sequence at the top of the file. (See "Using Multiple Sequence Format (MSF) Files" in Section 2, Using Sequence Files and Databases in the User's Guide for a complete description of MSF files.)

You can assign vote weights to sequences in an RSF (rich sequence format) file by modifying the weight attribute for each sequence within SeqLab. (See "Using Rich Sequence Format (RSF) Files" in Section 2, Using Sequence Files and Databases in the User's Guide for a complete description of RSF files. Also see "Viewing and Editing Sequence Attribute and Reference Information" in Section 2, Editing Sequences and Alignments in the SeqLab Guide for more information about modifying the weight attribute for each sequence within an RSF file.)

If a sequence from an MSF or RSF file is listed in a list file with a vote weight, the vote weight in the list file is used; the sequence weight in the MSF or RSF file is ignored. If you add -WEIGHT=1.0 to the command line, Pretty ignores weights specified for individual sequences and gives all of the sequences in the alignment equal weight.
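
The precedence among the weight sources described above can be summarized in a short Python sketch. The function name effective_weight and its arguments are hypothetical; None stands for "not specified".

```python
def effective_weight(file_weight=None, list_weight=None, command_line_weight=None):
    """Pick the vote weight for one sequence.

    Order taken from the text above: a -WEIGHT value on the command line
    overrides everything, a list-file wgt: overrides the MSF or RSF weight,
    and 1.0 is assumed when no weight is specified at all.
    """
    for weight in (command_line_weight, list_weight, file_weight):
        if weight is not None:
            return weight
    return 1.0


print(effective_weight(file_weight=0.5, list_weight=0.25))                           # 0.25
print(effective_weight(file_weight=0.5, list_weight=0.25, command_line_weight=1.0))  # 1.0
print(effective_weight())                                                            # 1.0
```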

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.
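
The capitalization convention can be modeled with a small Python check. This is an illustration only. It assumes that parameter names are compared without regard to case and that any spelling between the capitalized prefix and the full name is accepted; neither detail is stated above. The function name matches is hypothetical.

```python
from itertools import takewhile


def matches(typed, spec):
    """Return True if the typed parameter satisfies a summary entry.

    spec is written as in the summary, e.g. "-CONsensus"; its leading
    capital letters are the part you must type. Any "=value" suffix on
    the typed parameter is ignored for the comparison.
    """
    name = spec.lstrip("-")
    required = "".join(takewhile(str.isupper, name))
    word = typed.lstrip("-").split("=", 1)[0].upper()
    # The typed word must contain the required prefix and must itself
    # be a prefix of the full parameter name.
    return word.startswith(required) and name.upper().startswith(word)


print(matches("-CON", "-CONsensus"))       # True
print(matches("-CO", "-CONsensus"))        # False
print(matches("-THR=3", "-THReshold"))     # True
print(matches("-WEI=1.0", "-WEIGHT"))      # False
```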

Minimal Syntax: % pretty [-INfile=]@pretty.list -Default

Prompted Parameters:

-BEGin=1 -END=349         sets the range of interest

[-OUTfile=]pretty.pretty  names the output file

Local Data Files:

-MATRix=prettydna.cmp  assigns the scoring matrix for nucleotides

-MATRix=blosum62.cmp   assigns the scoring matrix for proteins

Optional Parameters:

-CONsensus         generates a consensus sequence

-IDEntity[=*]      shows only positions of unanimous agreement

                     in the consensus

-DIFferences[="-"] shows only positions disagreeing with the calculated

                     consensus

-CASe              shows positions agreeing with the calculated consensus

                     in uppercase

-THReshold=1       sets minimum comparison value for symbol to vote

                     in consensus

-PLUrality=2.0     defines the minimum number of votes for a consensus

                     to exist

-LINesize=50       sets the number of residues per line

-WEIGHT=1.0        sets the weight for all input sequences

-BLOcksize=10      sets the number of residues per block

-UGLy              writes the individual sequences into new files

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program's default scoring matrix in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Section 4, Using Data Files in the User's Guide.
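
The search order for a matrix named without a directory can be pictured with the Python sketch below. It is not the program's own lookup code. The function name find_matrix is hypothetical, and the caller is assumed to pass the directories in the documented order: the local directory, then the directories behind the logical names MyData, GenMoreData, and GenRunData. How those logical names resolve to real directories is outside this sketch.

```python
from pathlib import Path


def find_matrix(filename, search_dirs):
    """Return the first existing file for a -MATRix name, or None.

    If filename already includes a directory, no search is done.
    Otherwise each directory in search_dirs is tried in order.
    """
    path = Path(filename)
    if path.parent != Path("."):
        # A directory specification was given, so use it as is.
        return path if path.is_file() else None
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


# Hypothetical directory names standing in for the four locations.
order = [".", "mydata", "genmoredata", "genrundata"]
print(find_matrix("mymatrix.cmp", order))
```

Because the first match wins, a file in your local directory shadows a file of the same name in any of the later directories.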

If you use one of the command-line parameters -CONsensus, -DIFferences, or -CASe, Pretty calculates a consensus for each column using a scoring matrix (see Section 4, Using Data Files in the User's Guide). To replace a default matrix, you can provide your own file named blosum62.cmp for peptides or prettydna.cmp for nucleic acids. To use a matrix with a different name, specify it with the command-line parameter -MATRix=filename.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-MATRix=mymatrix.cmp

Allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData.

For more information see the Local Scoring Matrices section.

-CONsensus

Causes Pretty to show a consensus sequence for the set of sequences you are displaying. (See the CALCULATING AND DISPLAYING A CONSENSUS topic above for how Pretty finds the consensus.)

-IDEntity=*

Causes Pretty to show a consensus indicating where there is complete agreement among all of the sequences. If an optional character is added after the command-line parameter, Pretty uses that character to indicate complete agreement. Otherwise, the consensus contains the completely conserved sequence symbol.

-DIFferences="-"

Causes Pretty to print only those symbols in each column whose comparison value with the consensus symbol is lower than the threshold (see -THReshold below), and to print blank spaces at all other positions. If an optional character is added, Pretty prints that character instead of blank spaces. The optional character has to be enclosed in quotes.

-CASe

Causes Pretty to print in uppercase all those symbols in each column whose comparison value with the consensus symbol is greater than or equal to the threshold (see -THReshold below), and to print all other symbols in lowercase. This parameter overrides -DIFferences if both are used.

-THReshold=1

Determines the scoring matrix value below which a symbol may not vote for a coalition (see the CALCULATING AND DISPLAYING A CONSENSUS topic above). Pretty chooses a default threshold that is appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program will adjust the default threshold accordingly. Use -THReshold to specify an alternative threshold if you don't want to accept the default value.

-PLUrality=2.0

Defines the number of votes (vote weights) below which there is no consensus (see the CALCULATING AND DISPLAYING A CONSENSUS topic above).

-LINesize=50

Specifies the number of sequence symbols to display on each line.

-WEIGHT=1.0

Sets the sequence weight for all input sequences. When the weight is set from the command line, Pretty ignores weights for individual sequence files in a list file, a multiple sequence format (MSF) file, or a rich sequence format (RSF) file.

-BLOcksize=10

Specifies the number of sequence symbols to put into each block.

-UGLy

Rewrites the sequences in a Pretty output file into individual sequence files in GCG format. The Pretty output file must have a line with two periods (..) separating the text in the heading from the sequences. -UGLy also causes Pretty to write a list file to go with the new sequence files.
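
To see what separating a Pretty output file into sequences involves, consider the following Python sketch. It is not how -UGLy is implemented and it does not write GCG-format files. It assumes the layout shown in the OUTPUT topic: a heading that ends at the line with two periods (..), ruler lines that start with a column number, and alignment rows made of a name followed by blocks of symbols. The function name split_pretty_output is hypothetical. A Consensus row, if present, is collected like any other row.

```python
def split_pretty_output(text):
    """Collect the symbols of each row in a Pretty output file by name."""
    sequences = {}
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # Everything up to and including the ".." line is heading text.
            in_body = line.rstrip().endswith("..")
            continue
        parts = line.split()
        # Skip blank lines, separator lines, and numeric ruler lines.
        if len(parts) < 2 or parts[0].isdigit():
            continue
        name, blocks = parts[0], parts[1:]
        sequences[name] = sequences.get(name, "") + "".join(blocks)
    return sequences


sample = "\n".join([
    "PRETTY of: @pretty.list   October 7, 1998 10:35  ..",
    "",
    "           1                  20",
    "",
    "fa10.ugly  ..TTttGESA D.PvtTtVE.",
    "  r2.ugly  inssnPttSn saPALdAaET",
    "",
    "           21                 30",
    "",
    "fa10.ugly  aElyCPRPll",
    "  r2.ugly  VkaWCPRPPR",
])
for name, seq in split_pretty_output(sample).items():
    print(name, seq)
```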

Printed: May 27, 2005 14:04

Technical Support: support-us@accelrys.com, support-japan@accelrys.com, or support-eu@accelrys.com

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.