BREAKUP


Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

INPUT FILES

RELATED PROGRAMS

CONSIDERATIONS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

BreakUp reads a GCG-format sequence file containing more than 350,000 sequence characters and writes it as a set of separate, shorter, overlapping sequence files that can be analyzed by Accelrys GCG (GCG) programs.

DESCRIPTION

This program converts a user sequence that is longer than 350,000 bases to a set of sequences, none longer than 110,000 bases, by breaking the input sequence at 100,000-base boundaries and including 10,000 bases of overlap in the output files.
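The arithmetic behind this splitting can be sketched in Python. This is a minimal sketch, assuming each piece starts on a segment-size boundary and extends the overlap past the next boundary, which reproduces the lengths shown in the example session; the function name `segment_ranges` is hypothetical, and BreakUp's exact handling of a short final piece may differ.

```python
def segment_ranges(length, segment_size=100_000, overlap=10_000):
    """Return 1-based, inclusive (start, end) positions for each output piece.

    Hypothetical helper: each piece starts on a segment_size boundary and
    runs overlap bases past the next boundary, clipped at the sequence end.
    """
    ranges = []
    start = 0  # 0-based offset of the current segment boundary
    while start < length:
        end = min(start + segment_size + overlap, length)
        ranges.append((start + 1, end))
        if end == length:
            # Assumption: stop once the sequence end is covered, so no piece
            # lies entirely inside the previous piece's overlap.
            break
        start += segment_size
    return ranges


if __name__ == "__main__":
    for i, (first, last) in enumerate(segment_ranges(600_000)):
        print(f"piece {i}: bases {first}-{last}, length {last - first + 1}")
```

For a 600,000-base sequence this yields five 110,000-base pieces followed by one 100,000-base piece.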

EXAMPLE

Here is a session using BreakUp to convert the user sequence lengthy.seq, of length 600,000 bases, to a set of six output sequence files, each with no more than 110,000 bases.

 
 
% breakup
 
 BREAKUP what file(s) ?  lengthy.seq
        lengthy_0.seq  length: 110000 bp
        lengthy_1.seq  length: 110000 bp
        lengthy_2.seq  length: 110000 bp
        lengthy_3.seq  length: 110000 bp
        lengthy_4.seq  length: 110000 bp
        lengthy_5.seq  length: 100000 bp
 
%

INPUT FILES

BreakUp accepts a single sequence or multiple sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*. The function of BreakUp depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
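The type check described above can be sketched as a scan for the last heading line. This sketch assumes that the heading ends at the GCG dividing line, taken here to be the first line whose text ends with "..", and that the Type field appears on that line; the function `sequence_type` and the demo heading are hypothetical.

```python
import re

# Assumption: the heading ends at the GCG dividing line, i.e. the first line
# whose trailing text is "..", and the type field appears on that line.
TYPE_FIELD = re.compile(r"\bType:\s*([NP])\b")


def sequence_type(text):
    """Return 'N', 'P', or None for the text of a single GCG sequence file."""
    for line in text.splitlines():
        if line.rstrip().endswith(".."):
            match = TYPE_FIELD.search(line)
            return match.group(1) if match else None
    return None  # no dividing line found: not GCG format, or heading only


if __name__ == "__main__":
    # Hypothetical demo file contents, not real GCG output.
    demo = "Demo heading\n\ndemo.seq  Length: 10  Type: N  ..\n\n       1  ACGTACGTAC\n"
    print(sequence_type(demo))  # N
```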

RELATED PROGRAMS

SeqConv+ is a GCG file utility program. Its Breakup option also splits each sequence into shorter pieces and may suffice instead.

 

CONSIDERATIONS

Sequence files prepared with a text editor or brought to your computer from other sources may contain lines longer than 511 characters.
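To locate such lines in a file before running BreakUp, a short scan is enough. This sketch is not part of GCG; it assumes the 511-character limit counts characters per line excluding the line terminator, and the name `long_lines` is hypothetical. Pass it an open file object to scan a real file.

```python
def long_lines(lines, limit=511):
    """Yield (line_number, length) for each line longer than limit characters.

    Line terminators are not counted toward the length.
    """
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))
        if length > limit:
            yield number, length


if __name__ == "__main__":
    demo = ["short line\n", "A" * 600 + "\n"]
    for number, length in long_lines(demo):
        print(f"line {number}: {length} characters")  # line 2: 600 characters
```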

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.
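The abbreviation rule can be expressed as a prefix test. A minimal sketch, assuming that matching is case-insensitive and that any prefix of the full parameter name containing at least the capitalized letters is accepted; the helper names are hypothetical, and GCG's own parser may resolve ambiguous abbreviations differently.

```python
def required_letters(name):
    """Return the leading capitals of a parameter name, e.g. 'SEG' for 'SEGmentsize'."""
    letters = []
    for ch in name:
        if not ch.isupper():
            break
        letters.append(ch)
    return "".join(letters)


def abbreviates(typed, name):
    """True if typed (any case) is a prefix of name at least as long as its capitals."""
    typed = typed.lower()
    return name.lower().startswith(typed) and len(typed) >= len(required_letters(name))


if __name__ == "__main__":
    option = "-seg=50000"
    key, _, value = option.lstrip("-").partition("=")
    print(abbreviates(key, "SEGmentsize"), value)  # True 50000
```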

Minimal Syntax: % breakup [-INfile=]breakup.txt -Default
 
Prompted Parameters:  None
 
Local Data Files:   None
 
Optional Parameters:
 
-NOMONitor                suppresses the screen trace showing each file
-LINesize=50              sets number of characters per line
-BLOcksize=10             sets number of characters per block
-BLAnklines=1             puts blank lines between the sequence lines
-SEGmentsize=100000       sets number of nonoverlapping bases per segment
-OVErlap=10000            sets number of overlapping bases per segment
-NONUMbering              suppresses numbering
-NOCOMments               suppresses comments
-PROtein                  insists that the sequences are reformatted as
                            protein sequences
-NUCleotide               insists that the sequences are reformatted as
                            nucleic acid sequences
[-OUTfile=]newseqname     lets you name the output file
-EXTension=.seq           defines a file name extension

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-LINesize=50

Lets you set the number of sequence characters per line to any number between 1 and 120.

-BLOcksize=10

Lets you set the number of sequence characters in each block to any number between 1 and the line size.

-BLAnklines=1

Sets the number of blank lines (zero or more) left between the sequence lines.
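The combined effect of -LINesize, -BLOcksize, -BLAnklines, and -NONUMbering on the layout of each output file can be sketched as follows. This is an approximation: it assumes the number on each line is the position of that line's first character, right-aligned in an 8-character column, and both the function `format_sequence` and the column widths are assumptions rather than GCG's published format.

```python
def format_sequence(seq, line_size=50, block_size=10, blank_lines=1, numbering=True):
    """Lay out seq as GCG-style sequence lines (an approximation, not GCG's code)."""
    if not 1 <= line_size <= 120:
        raise ValueError("line size must be between 1 and 120")
    if not 1 <= block_size <= line_size:
        raise ValueError("block size must be between 1 and the line size")
    if blank_lines < 0:
        raise ValueError("blank lines must be zero or more")
    lines = []
    for start in range(0, len(seq), line_size):
        chunk = seq[start:start + line_size]
        # Split the line into blocks of block_size characters separated by spaces.
        blocks = [chunk[i:i + block_size] for i in range(0, len(chunk), block_size)]
        body = " ".join(blocks)
        # Assumed numbering layout: position of the first residue, right-aligned.
        lines.append(f"{start + 1:>8}  {body}" if numbering else body)
    # blank_lines empty lines go between consecutive sequence lines.
    return ("\n" * (blank_lines + 1)).join(lines)


if __name__ == "__main__":
    print(format_sequence("ACGT" * 30))
```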

-SEGmentsize=100000

Lets you set the number of non-overlapping sequence characters in each output file to any number greater than the overlap and less than 350000.

-OVErlap=10000

Lets you set the number of overlapping sequence characters in each output file to any number between 0 and the segment size. The sum of the segment size and the overlap size must, however, be less than 350000.
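The rules for -SEGmentsize and -OVErlap can be checked together before running the program. A minimal sketch, assuming an overlap of 0 is allowed and that "less than 350000" is a strict limit; `segment_option_errors` is a hypothetical helper, not part of GCG.

```python
MAX_PIECE = 350_000  # limit stated in this manual


def segment_option_errors(segment_size, overlap):
    """Return a list of rule violations for -SEGmentsize and -OVErlap (empty if valid)."""
    errors = []
    if overlap < 0:
        errors.append("overlap must not be negative")
    if segment_size <= overlap:
        errors.append("segment size must be greater than the overlap")
    if segment_size + overlap >= MAX_PIECE:
        errors.append("segment size plus overlap must be less than 350000")
    return errors


if __name__ == "__main__":
    print(segment_option_errors(100_000, 10_000))  # []
    print(segment_option_errors(340_000, 10_000))  # ['segment size plus overlap must be less than 350000']
```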

-NONUMbering

Suppresses the numbering next to each sequence line.

-NOCOMments

Suppresses any comments that may have been in the input sequence file.

-PROtein

Sets the sequence type to protein.

-NUCleotide

Sets the sequence type to nucleotide.

-OUTfile=newseqname

Selects an output filename other than the name of the input file.

-EXTension=.seq

Selects a filename extension other than the input filename extension.
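The example session suggests that each output piece is named after the input file plus an underscore and a zero-based index (lengthy.seq becomes lengthy_0.seq through lengthy_5.seq). The sketch below reproduces that pattern, assuming that -OUTfile replaces the base name and -EXTension replaces the extension within the same pattern; both assumptions and the helper `output_names` are not confirmed by this manual.

```python
from pathlib import PurePath


def output_names(infile, count, outfile=None, extension=None):
    """Hypothetical reconstruction of BreakUp's naming scheme: <base>_<index><ext>.

    The base comes from -OUTfile when given, otherwise from the input file; the
    extension comes from -EXTension when given, otherwise from the output name
    or, failing that, the input file.
    """
    name = PurePath(outfile if outfile else infile)
    ext = extension if extension is not None else (name.suffix or PurePath(infile).suffix)
    return [f"{name.stem}_{i}{ext}" for i in range(count)]


if __name__ == "__main__":
    print(output_names("lengthy.seq", 6))
```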

Printed: May 27, 2005 11:49




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio