FITCONSENSUS

[Genhelp | Program Manual | User's Guide | Data Files | Databases | Release Notes ]

 

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

STATISTICS USED

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION


FitConsensus uses a consensus table written by Consensus as a probe to find the best examples of the consensus in a DNA sequence. You can specify the number of fits you want to see, and FitConsensus tabulates them with their position, frame, and a statistical measure of their quality.

DESCRIPTION


FitConsensus uses a consensus table, generated by the Consensus program, as a probe. The program checks all possible alignments of the top strand of your nucleotide sequence to the table and reports those with the best fits. You can select the number of fits you want to see. If any positions of the consensus table have values of 100%, FitConsensus lets you choose whether a match at these positions is a necessary condition for a fit. The fits are reported in ascending order of the position in the sequence where the fit was found. The technique used by FitConsensus is discussed by Staden (Nucl. Acids Res. 12(1); 505-519 (1984)). FitConsensus cannot put gaps in the alignments of the table to the sequence.

EXAMPLE


Here is a session using FitConsensus to find the best examples of intervening sequence donor splice sites in the sequence gamma.seq:

 
 
% fitconsensus
 
  FITCONSENSUS into what sequence file ?  gamma.seq
 
             Begin (* 1 *) ?
           End (* 11375 *) ?
 
  Using what consensus table file ?  donor.csn
 
  Your consensus table has position(s) with 100% certainty,
 
  Are these necessary conditions for a fit (* Yes *) ?
 
  Show how many fits (* 40 *) ?
 
  What should I call the output file (* gamma.fit *) ?
 
        .................................................
        .................................................
        .................
 
%

OUTPUT


Here is part of the output file:

 
 
 FITCONSENSUS of: gamma.seq  Check: 6474  from: 1  to: 11375
 
Human fetal beta globins G and A gamma
from Shen, Slightom and Smithies,  Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.
 
 Using Consensus: donor.csn
 
!!AA_SEQUENCE 1.0
 CONSENSUS from:
Splice site sequences
from Stephen Mount NAR 10(2) 459;472 figure 1 page 460
 
 List-size: 40  Average quality: 38.26      October 13, 1998 14:52   ..
 
  position:   416   607  1430  1452  1764  2229  2267  2612  3120  3132  4267
     frame:     2     1     2     3     3     3     2     2     3     3     1
   quality: 51.42 50.75 50.33 49.08 48.50 51.75 51.42 58.25 50.83 48.00 51.25
 
   //////////////////////////////////////////////////////////////////////////
 
  position:  9801 10333 10420 10433 11059 11315 11334
     frame:     3     1     1     2     1     2     3
   quality: 48.25 48.92 51.33 47.75 51.42 48.75 48.92
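If you want to post-process the listing, the position, frame, and quality rows can be read back into records. The following is a minimal Python sketch, not part of FitConsensus; the function name parse_fits is hypothetical, and the sketch assumes that every block in the file uses the three row labels shown above. The sample text is the last block of the listing above.

```python
# Hypothetical helper: collect the position/frame/quality rows of a
# FitConsensus listing into (position, frame, quality) tuples.
SAMPLE = """\
  position:  9801 10333 10420 10433 11059 11315 11334
     frame:     3     1     1     2     1     2     3
   quality: 48.25 48.92 51.33 47.75 51.42 48.75 48.92
"""

def parse_fits(text):
    rows = {"position": [], "frame": [], "quality": []}
    for line in text.splitlines():
        label, sep, values = line.partition(":")
        key = label.strip()
        # Only lines whose label is exactly one of the three row names count.
        if sep and key in rows:
            rows[key].extend(values.split())
    return [
        (int(p), int(f), float(q))
        for p, f, q in zip(rows["position"], rows["frame"], rows["quality"])
    ]

fits = parse_fits(SAMPLE)
for position, frame, quality in fits:
    print(position, frame, quality)
```

In this example, where the range begins at position 1, each frame equals ((position - 1) mod 3) + 1.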

INPUT FILES


FitConsensus takes the output file from Consensus as one of its input files. Preassembled donor and acceptor splice site consensus tables are present in the public data files called donor.csn and acceptor.csn. Here is the donor.csn file:

 
 
!!AA_SEQUENCE 1.0
 CONSENSUS from:
 
Splice site sequences
from Stephen Mount NAR 10(2) 459;472 figure 1 page 460
 
                                            *****
 
 %G      20    9   11   74  100    0   29   12   84    9   18   20
 %A      30   40   64    9    0    0   61   67    9   16   39   24
 %U      20    7   13   12    0  100    7   11    5   63   22   27
 %C      30   44   11    6    0    0    2    9    2   12   20   28
 
 Total  140  140  140  140  140  140  140  140  140  140  137  137
 
                                             *****
 
  CONSENSUS sequence to a certainty level of  75 percent at each position:
 
  Donor.Csn  Length: 12  July 20, 1994 11:03  Type: N  Check: 6055  ..
 
       1  VMWKGTRRGW HH
 

The other type of input file accepted by FitConsensus is a single nucleotide sequence file. If FitConsensus rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.
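For readers who want to inspect a consensus table outside the package, the percentage rows of donor.csn shown above can be read with a few lines of Python. This is only a sketch under the assumption that each row starts with a two-character label such as %G followed by whitespace-separated integers, as in the excerpt; the function names are hypothetical and the full file format may contain details not handled here.

```python
# The four percentage rows copied from the donor.csn excerpt.
DONOR_ROWS = """\
 %G      20    9   11   74  100    0   29   12   84    9   18   20
 %A      30   40   64    9    0    0   61   67    9   16   39   24
 %U      20    7   13   12    0  100    7   11    5   63   22   27
 %C      30   44   11    6    0    0    2    9    2   12   20   28
"""

def read_percent_rows(text):
    """Return {base: [percent at position 1, 2, ...]} for the %X rows."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and len(fields[0]) == 2 and fields[0].startswith("%"):
            table[fields[0][1]] = [int(value) for value in fields[1:]]
    return table

def required_positions(table):
    """1-based positions where some base has a value of 100%."""
    width = len(next(iter(table.values())))
    return [
        col + 1
        for col in range(width)
        if any(row[col] == 100 for row in table.values())
    ]

table = read_percent_rows(DONOR_ROWS)
print(required_positions(table))  # [5, 6]
```

Positions 5 and 6 (G and U) are the ones FitConsensus asks about when it reports that the table has positions with 100% certainty.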

RELATED PROGRAMS


The Consensus program writes a consensus table from a group of prealigned sequences into a file with the correct format for input to FitConsensus. FindPatterns finds short sequence patterns allowing ambiguity and mismatch.

ProfileMake creates a position-specific scoring table, called a profile, that quantitatively represents the information from a group of aligned sequences. The profile can then be used for database searching (ProfileSearch) or sequence alignment (ProfileGap). ProfileGap makes an optimal alignment between a profile and one or more sequences.

RESTRICTIONS


FitConsensus searches only the top strand of a nucleotide sequence. The statistic used is under study and will probably change as more sensitive measures are found.

STATISTICS USED


A program very similar to FitConsensus is described by Staden (Nucl. Acids Res. 12(1); 505-519 (1984)). A table of the kind shown under INPUT FILES (donor.csn) is aligned over the input sequence in every frame. If you require that any 100% values in your table are necessary conditions for a fit, then each alignment is first checked to see whether the sequence is correct in these known positions. If the known positions test is passed, the input sequence at each position of the alignment is given a score equal to its value in the table. T in the first position would get a score of 20 in this example (T is scored from the %U row). An ambiguity code gets the average value for the several nucleotides it represents. R (representing A or G) in the second position would get a score of (40 + 9) / 2 = 24.5 in this example. The values for each position of an alignment are summed and divided by the size of the consensus table (12 in this case) to give the quality. Starting at the first position in the range of interest in your sequence, the frame cycles through 1, 2, and 3 repeatedly to the end of the sequence range.
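The scoring described above can be sketched in Python as follows. This is an illustration built from the description and the donor.csn values, not the program's actual code. The function names are hypothetical, and several details are assumptions: the reported position is taken to be the first base of the alignment, an ambiguity code at a 100% position is treated as a mismatch, and the best fits are chosen by quality before being sorted by position.

```python
# Percent table from donor.csn; list index 0 is consensus position 1.
TABLE = {
    "G": [20, 9, 11, 74, 100, 0, 29, 12, 84, 9, 18, 20],
    "A": [30, 40, 64, 9, 0, 0, 61, 67, 9, 16, 39, 24],
    "T": [20, 7, 13, 12, 0, 100, 7, 11, 5, 63, 22, 27],  # the %U row
    "C": [30, 44, 11, 6, 0, 0, 2, 9, 2, 12, 20, 28],
}
WIDTH = 12

# Standard IUPAC nucleotide ambiguity codes.
AMBIGUITY = {
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def normalize(symbol):
    return symbol.upper().replace("U", "T")

def base_score(symbol, column):
    """Table value for a base, or the average over an ambiguity code."""
    symbol = normalize(symbol)
    bases = AMBIGUITY.get(symbol, symbol)
    return sum(TABLE[b][column] for b in bases) / len(bases)

def passes_known_positions(window):
    """True if the window matches the table wherever a value is 100%."""
    for column in range(WIDTH):
        for base, row in TABLE.items():
            if row[column] == 100 and normalize(window[column]) != base:
                return False
    return True

def fit_consensus(seq, list_size=40, necessary=True, begin=1):
    """Return up to list_size (position, frame, quality) fits."""
    fits = []
    for start in range(len(seq) - WIDTH + 1):
        window = seq[start:start + WIDTH]
        if necessary and not passes_known_positions(window):
            continue
        total = sum(base_score(ch, col) for col, ch in enumerate(window))
        quality = round(total / WIDTH, 2)
        fits.append((begin + start, start % 3 + 1, quality))
    best = sorted(fits, key=lambda fit: fit[2], reverse=True)[:list_size]
    return sorted(best)  # ascending order of position

print(base_score("T", 0))  # 20.0
print(base_score("R", 1))  # 24.5
print(fit_consensus("ACCAAGGTAAGTAT"))
```

Only the alignment starting at position 3 of the short test sequence has G and T at the two 100% positions, so it is the only fit returned when the 100% positions are treated as necessary.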

COMMAND-LINE SUMMARY


All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % fitconsensus [-INfile1=]gamma.seq \
                  [-INfile2=]donor.csn -Default
 
Prompted Parameters:
 
-BEGin=1 -END=11375   sets the range of interest
-NONECessary          specifies that 100% position certainty is unnecessary
-LIStsize=40          sets the number of fits to show
[-OUTfile=]gamma.fit  specifies the output file name
 
Local Data Files:     None
 
Optional Parameters:  None

LOCAL DATA FILES


None.

PARAMETER REFERENCE


You can set the parameters listed below from the command line.

-NONECessary

Allows display of fits that do not match the consensus table at the positions with 100% certainty.

-LIStsize=40

Sets the number of fits to be displayed.

Printed: April 5, 2005  14:58




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio