XNU

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

INPUT FILES

RELATED PROGRAMS

RESTRICTIONS

CONSIDERATIONS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

Xnu replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.

DESCRIPTION

The Karlin-Altschul statistics that underlie BLAST assume that the probability of finding a given residue at any particular position in a sequence depends only on how common that residue is in the sequence overall. Tandem repeats, which occur frequently in proteins, may violate this assumption. Query sequences containing such repeats may give significant similarity scores when compared to unrelated proteins containing similar repeats.

Xnu is a program described by Claverie and States in Computers and Chemistry, 17; 191-201 (1993) that is used to mask off tandem repeats in protein sequences. The output sequence is just like the input sequence except that if tandem repeats are found, the amino acid characters comprising such repeats are replaced by X's. Regions containing X's are ignored in a BLAST search.

EXAMPLE

Here is a session using Xnu to mask off the repeats in a human major prion protein precursor.

 
 
% xnu
 
 XNU of what input sequence(s) ?  PIR:Ujhu
 
                  Begin (* 1 *) ?
                End (*   253 *) ?
 
 What should I call the output file (* ujhu.xnu *) ?
 
        PIR1:UJHU   Len:     253
%

OUTPUT

Each output file contains the input sequence with the amino acid characters that comprise statistically significant tandem repeats changed into X's. Here is the output file from the session above.

 
 
!!AA_SEQUENCE 1.0
  XNU of: ujhu  check: 8781  from: 1  to: 253
 
P1;UJHU - major prion protein precursor - human
N;Alternate names: 11K amyloid protein; 27-30K sialoglycoprotein; PrP 27-30;
 PrP 33-35C; scrapie prion protein
C;Species: Homo sapiens (man)
C;Date: 25-Oct-1987 #sequence_revision 12-Apr-1996 #text_change 05-Sep-1997
C;Accession: A24173; A40372; A05017; S14078; I54322; I68597; I58135; I59184;
 I79633; I79634
R;Kretzschmar, H.A.; Stowring, L.E.; Westaway, D.; Stubblebine, W.H.; Prusiner,
 S.B.; Dearmond, S.J
 
ujhu.xnu  Length: 253  October 13, 1998 13:56  Type: P  Check: 1796  ..
 
       1  MANLGCWMLV LFVATWSDLG LCKKRPKPGG WNTGGSRYPG QGSPGGNRYP
 
      51  PQGGGGWGQP HGGGWGQPHG GGWGQPHGGG WGQPHGGGWG QGGGTHSQWX
 
     101  XXXXXXXXXX XXXXXXXXXX XXXXXXXYML GSAMSRPIIH FGSDYEDRYY
 
     151  RENMHRYPNQ VYYRPMDEYS NQNNFVHDCV NITIKQHTVT TTTKGENFTE
 
     201  TDVKMMERVV EQMCITQYER ESQAYYQRGS SMVLFSSPPX XXXXXXXXXX
 
     251  XVG
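
Because masked regions appear only as runs of X, it can be useful to recover their coordinates from an output file. The following Python sketch is not part of the GCG Package. The helper names residues_from_gcg_lines and masked_runs are made up for illustration, and the code assumes it is given only the numbered sequence lines that follow the ".." divider.

```python
import re

def residues_from_gcg_lines(lines):
    """Join GCG-style sequence lines, dropping position numbers and spaces."""
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in lines)

def masked_runs(sequence):
    """Return 1-based, inclusive (start, end) pairs for every run of X."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"X+", sequence)]

# Sequence lines copied from the ujhu.xnu output shown above.
SAMPLE = """\
       1  MANLGCWMLV LFVATWSDLG LCKKRPKPGG WNTGGSRYPG QGSPGGNRYP
      51  PQGGGGWGQP HGGGWGQPHG GGWGQPHGGG WGQPHGGGWG QGGGTHSQWX
     101  XXXXXXXXXX XXXXXXXXXX XXXXXXXYML GSAMSRPIIH FGSDYEDRYY
     151  RENMHRYPNQ VYYRPMDEYS NQNNFVHDCV NITIKQHTVT TTTKGENFTE
     201  TDVKMMERVV EQMCITQYER ESQAYYQRGS SMVLFSSPPX XXXXXXXXXX
     251  XVG
"""

seq = residues_from_gcg_lines(SAMPLE.splitlines())
print(len(seq), masked_runs(seq))  # 253 [(100, 127), (240, 251)]
```

For the output file above, this reports two masked regions, at positions 100 to 127 and 240 to 251 of the 253-residue sequence.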

INPUT FILES

You can specify either a single protein sequence or multiple protein sequences as input to Xnu. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*. If Xnu rejects your protein sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

Seg replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.

Repeat finds direct repeats in sequences. You must set the size, stringency, and range within which the repeat must occur; all the repeats of that size or greater are displayed as short alignments.

RESTRICTIONS

Xnu does not recognize repeats if the width is set much longer than the length of either the repeat or the sequence. Its behavior is not characterized for sequence symbols that are not among the standard unambiguous IUPAC-IUB amino acid single-letter symbols (ACDEFGHIKLMNPQRSTVWY).

CONSIDERATIONS

Repeat sequences are scored as segment pairs (short gapless alignments). All of the residues in both of the segments of a significant pair are replaced with X's.

Xnu uses a PAM120 scoring matrix for scoring similarities. You cannot select any other scoring matrix. By default, only repeats shorter than five residues are masked; use -WIDth to set a different maximum repeat length.

Many repeats in which the unit occurs only twice will not be masked, while the same unit occurring three times will be. For example, STUSTU would not be found, whereas STUSTUSTU would be.
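
The STUSTU example can be reproduced with a deliberately simplified Python sketch. It is not the Xnu algorithm: it scores identical residues only (one point per identical pair) instead of using PAM120, and it applies a fixed, hypothetical score cutoff of 5 in place of the probability-based cutoff. The function name mask_tandem_repeats and its parameters are assumptions made for illustration. What it does share with the description above is the idea of comparing the sequence with itself at each offset up to the width, and masking both segments of any gapless pair that scores high enough.

```python
def mask_tandem_repeats(seq, width=4, cutoff=5):
    """Toy self-comparison masker (identity scoring, fixed cutoff)."""
    masked = list(seq)
    for offset in range(1, width + 1):
        run = 0  # length of the current run of identical pairs on this diagonal
        # The extra final iteration flushes a run that reaches the sequence end.
        for i in range(len(seq) - offset + 1):
            paired = i < len(seq) - offset and seq[i] == seq[i + offset]
            if paired:
                run += 1
                continue
            if run >= cutoff:
                # Mask both segments of the pair: i-run..i-1 and its partner.
                for j in range(i - run, i):
                    masked[j] = masked[j + offset] = "X"
            run = 0
    return "".join(masked)

print(mask_tandem_repeats("STUSTU"))         # STUSTU
print(mask_tandem_repeats("STUSTUSTU"))      # XXXXXXXXX
print(mask_tandem_repeats("AASTUSTUSTUKK"))  # AAXXXXXXXXXKK
```

With two copies of the unit, the only segment pair is three residues long and scores below the cutoff; with three copies, the pair is six residues long and both of its segments are masked.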

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % xnu [-INfile=]pir:ujhu -Default
 
Prompted Parameters: (for single sequences)
 
-BEGin=1  -END=253      sets the range of interest
[-OUTfile=]ujhu.xnu     names the output file
 
Local Data Files:       None
 
Optional Parameters:
 
-BEGin=1  -END=100      sets the range of interest (for multiple sequences)
-PRObability=.01        sets the expectation level for a repeat
-WIDth=4                sets the maximum size of a repeat
-EXTension=.xnu         sets the default output file name extension
-LIStfile[=xnu.list]    writes a list file of output sequence names
-NOMONitor              suppresses screen monitor of input sequence names
-NOSUMmary              suppresses the screen summary
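
When Xnu is driven from a script, the command line can be assembled programmatically. The Python sketch below is a hypothetical helper (build_xnu_command is not part of the GCG Package). It uses only parameter names from the summary above, checks -PRObability and -WIDth against the limits given in the parameter reference, and assumes that parameters may appear in any order on the command line.

```python
def build_xnu_command(infile, outfile=None, probability=None, width=None,
                      begin=None, end=None):
    """Return an argument list for a non-interactive Xnu run."""
    args = ["xnu", f"-INfile={infile}"]
    if outfile is not None:
        args.append(f"-OUTfile={outfile}")
    if begin is not None:
        args.append(f"-BEGin={begin}")
    if end is not None:
        args.append(f"-END={end}")
    if probability is not None:
        # Documented limits for the expectation cutoff.
        if not 0.0001 <= probability <= 0.1:
            raise ValueError("-PRObability must be between 0.0001 and 0.1")
        args.append(f"-PRObability={probability}")
    if width is not None:
        # Zero means repeats of any size; 100 is the documented maximum.
        if not 0 <= width <= 100:
            raise ValueError("-WIDth must be between 0 and 100")
        args.append(f"-WIDth={width}")
    args.append("-Default")  # suppress prompting, as in the minimal syntax
    return args

print(" ".join(build_xnu_command("pir:ujhu", probability=0.001, width=8)))
# xnu -INfile=pir:ujhu -PRObability=0.001 -WIDth=8 -Default
```

The list form can be passed directly to subprocess.run on a system where the GCG Package is installed.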

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-BEGin=1

Sets the beginning position for all input sequences. When the beginning position is set from the command line, Xnu ignores beginning positions specified for individual sequences in a list file.

-END=100

Sets the ending position for all input sequences. When the ending position is set from the command line, Xnu ignores ending positions specified for sequences in a list file.

-PRObability=.01

For a repeat to be recognized, it must score high enough that you would not expect to see a higher score more than once in 100 searches of random sequences of average length and composition. Use this parameter to change that expectation cutoff. Setting the cutoff lower than its default of 0.01 makes the search more stringent, so fewer repeats are masked. The minimum and maximum values of this parameter are 0.0001 and 0.1.
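
To see why a lower cutoff is more stringent, recall the general Karlin-Altschul relation between a segment-pair score S and the expected number E of chance pairs scoring at least S in a comparison of lengths m and n: E = K * m * n * exp(-lambda * S). The Python sketch below simply solves that relation for S. It is not Xnu's own calculation, which is not described here, and the values of lam, k, m, and n are placeholders chosen for illustration, not the PAM120 parameters.

```python
import math

def min_significant_score(expectation, m, n, lam, k):
    """Smallest score S for which K*m*n*exp(-lam*S) <= expectation."""
    return math.log(k * m * n / expectation) / lam

# Placeholder statistical parameters and lengths (assumptions, not PAM120 values).
LAM, K, M, N = 0.3, 0.1, 250, 250

for cutoff in (0.1, 0.01, 0.0001):
    score = min_significant_score(cutoff, M, N, LAM, K)
    print(f"-PRObability={cutoff}: score must reach about {score:.1f}")
```

Each tenfold reduction in the cutoff raises the required score by ln(10)/lambda, so fewer segment pairs qualify for masking.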

-WIDth=4

Sets the maximum size of a repeat. If a repeat were of length five, even if it were significant, it would not be found if this parameter were set to four. When this value is set to zero, Xnu will search for repeats of any size. Very short repeats may not score above the default probability cutoff (see -PRObability above). The maximum value of this parameter is 100. The larger it is, the longer the search will take.

-EXTension=.xnu

This program normally creates output file names by using the original input file name for the base name and the program name for the name extension. Use this parameter to specify some other file name extension.

-LIStfile=xnu.list

Writes a list file with the names of the output sequence files. This list file is suitable for input to other GCG Package programs that support list files (see Chapter 2, Using Sequence Files and Databases, in the User's Guide). If you don't specify a file name, then Xnu makes one up using xnu for the file name and .list for the file name extension.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-SUMmary

Writes a summary of the program's work to the screen when you've used -Default to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

Printed: April 5, 2005 15:48



Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio