SEQMANIP+

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

FUNCTION

SeqManip+ is a utility program for manipulating sequences. Its operations include splitting sequences into a set of overlapping segments, extracting a segment from a sequence, translating or complementing nucleotide sequences, and back-translating protein sequences. SeqManip+ provides a single program for performing operations similar to those previously done in GCG with Translate, Backtranslate, Reverse, and Breakup.

DESCRIPTION

Advantages of Plus “+” Programs:

Plus programs are enhanced to read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.

Plus programs remove the sequence length restriction of 350,000 bp.

If you do not need these features and want more interactivity, consider running the original version of the program instead.

Operations that can be invoked from SeqManip+ are:

1. Translate

2. Backtranslate

3. Reverse complement

4. Sample

5. Reverse

6. Complement

7. Extract
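The reverse, complement, and reverse complement operations in the list above differ only in whether the symbol order, the symbols themselves, or both are changed. The following is a minimal Python sketch of that distinction, not SeqManip+ source code. The function names are hypothetical, and the complement table covers only A, C, G, T, and N (an assumption; other ambiguity codes pass through unchanged).

```python
# Illustrative sketch only; not how SeqManip+ is implemented.
# Assumption: only A, C, G, T and N are complemented; other symbols are kept.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse(seq: str) -> str:
    """Return the symbols in the opposite order (valid for NA or protein)."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Return the base-by-base complement, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the complement read in the opposite direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    dna = "ATGGCC"
    print(reverse(dna))             # CCGGTA
    print(complement(dna))          # TACCGG
    print(reverse_complement(dna))  # GGCCAT
```

Reverse applies to any sequence type, while the two complement operations are meaningful only for nucleotide input.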

EXAMPLE

Here is a session using SeqManip+ to translate the G-gamma gene in ggamma.seq into the protein sequence for the human fetal beta globin G gamma.

% seqmanip+ -translate -open=50

SeqManip+ is a utility that accepts DNA or protein sequences to perform single/multiple operations on the selected sequences like extract, sample, translate, backtranslate, reverse and reverse complement.

Manipulate what sequence(s) ? ggamma.seq

Begin (* 1 *) ?

End (-1 for entire sequence) (* -1 *) ?

What should I call the output file (* <sequence_name>.seqmanip+ *) ?

Extracting the region 1 to 1700 from ggamma.seq ...

Writing 1 sequence(s) to output file ggamma.seq.seqmanip+

OUTPUT

Here is the output file ggamma.seq.seqmanip+:

!!RICH_SEQUENCE 1.0

{

name ggamma.seq_c366

descrip ORF translation of ggamma.seq from : 366 to : 650. [ 95 residues ]

type PROTEIN

checksum 9400

creation-date 02/17/2005 13:45:23

strand 1

sequence

MGNPKVKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKVSPGDVSALL

PLVSRQLRQLSIDLSTAGCELFEDTGVGSEETAED

}

INPUT FILES

SeqManip+ accepts multiple (one or more) sequences as input. Input sequences may be either nucleotides or proteins, but only one type of sequence can be analyzed at a time. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*.

Single Sequence Input

If you specify a single sequence on the command line or in response to the first program prompt, and -default is not on the command line, SeqManip+ prompts you for the sequence range. After reading that range, SeqManip+ performs the operation specified on the command line. If no operation is specified on the command line, SeqManip+ converts the given sequence(s) into RSF format by default.

Multiple Sequence Input

When you specify multiple sequences, SeqManip+, by default, converts all the sequences in the MSF file into a single RSF-format file. You can also use a list file to specify multiple sequences.

For more information about list files, see "Using List Files" in Section 2, Using Sequence Files and Databases in the User's Guide.

RELATED PROGRAMS

SeqConv+: It is a utility program that converts sequences into standard sequence formats (BSML, FastA, SwissProt, and GenBank).

Translate: It translates nucleotide sequences into peptide sequences.

Sample: It extracts sequence fragments randomly from sequence(s). You can set a sampling rate to determine how many fragments Sample extracts.

Backtranslate: It backtranslates an amino acid sequence into a nucleotide sequence. The output helps you identify areas with fewer ambiguities that might be candidates for synthetic probes.

Reverse: It reverses and/or complements the symbols in a sequence. The output is written into a new sequence file.

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value specified, the default parameter value, and the aliases (shortened forms of the parameter name). Programs with a plus in the name accept either the full parameter name or a specified alias. If "Type" is "Boolean", the presence of the parameter on the command line indicates a true condition. A false condition must be stated explicitly as parameter=false.
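The conventions described above (case-insensitive names, aliases, a bare Boolean parameter meaning true, and parameter=false for a false condition) can be sketched in Python as follows. This is only an illustration of the convention under stated assumptions, not the parser that SeqManip+ uses. The function name parse_args and the alias dictionary passed to it are hypothetical, and values other than true and false are kept as plain strings.

```python
def parse_args(argv, aliases=None):
    """Parse '-name' and '-name=value' tokens into a dict (hypothetical helper)."""
    aliases = aliases or {}
    params = {}
    for token in argv:
        if not token.startswith("-"):
            # A bare value is taken as the input file, as in the minimal syntax.
            params["infile"] = token
            continue
        name, sep, value = token[1:].partition("=")
        name = name.lower()             # parameter names are case-insensitive
        name = aliases.get(name, name)  # map an alias to the full name
        if not sep:
            params[name] = True         # bare Boolean parameter means true
        elif value.lower() in ("true", "false"):
            params[name] = value.lower() == "true"
        else:
            params[name] = value        # left as a string in this sketch
    return params


if __name__ == "__main__":
    alias_map = {"trans": "translate", "def": "default", "doc": "documentation"}
    print(parse_args(["ggamma.seq", "-Default", "-trans", "-open=50", "-doc=false"],
                     alias_map))
```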

Minimal Syntax: % seqmanip+ [-infile=] value -Default

Minimal Parameters (case-insensitive):

-infile         [Type: InFile / Default: EMPTY / Aliases: infile1 in]
                The name of the input sequence(s).

Prompted Parameters (case-insensitive):

-begin          [Type: Integer / Default: '1' / Aliases: beg]
                First base to read for each query sequence.

-end            [Type: Integer / Default: '-1']
                Last base to read for each query sequence.

-outfile        [Type: OutFile / Default: '<sequence_name>.seqmanip+' / Aliases: out outfile1]
                Names the output file.

Optional Parameters (case-insensitive):

-check          [Type: Boolean / Default: 'false' / Aliases: che help]
                Prints out this usage message.

-default [Type: Boolean / Default: 'false' / Aliases: d def]
Specifies that sensible default values be used for all parameters where possible.

-documentation [Type: Boolean / Default: 'true' / Aliases: doc]
Prints banner at program startup.

-quiet [Type: Boolean / Default: 'false' / Aliases: qui]
Tells application to print only a minimal amount of information.

-bsml [Type: Boolean / Default: 'false']
Output file will be in bsml format.

-reverse [Type: Boolean / Default: 'false' / Aliases: rev]
Reverse all input sequences.

-complement [Type: Boolean / Default: 'false' / Aliases: comp compl]
Complement all input sequences (NA only).

-revcomp [Type: Boolean / Default: 'false' / Aliases: revcompl]
Reverse and complement all input sequences (NA only)

-sample [Type: Double / Default: '100.0']
Percent chance that an input sequence is included in output sequences.

-translate [Type: Boolean / Default: 'false' / Aliases: trans]
Translate all input sequences (NA only).

-open [Type: Integer / Default: '20']
Translates open reading frames only if they exceed the specified minimum peptide length. This option works only if '-Translate' is set to True.

-frame [Type: Integer / Default: '1']
Translate in specified reading frame (1, 2, 3, -1, -2, -3).

-allframes [Type: Boolean / Default: 'false' / Aliases: sixframe]
Translate in all 6 reading frames.

-backtranslate [Type: Boolean / Default: 'false' / Aliases: back backtrans revtrans]
Back translate all input sequences (AA only).

-listtranstables [Type: Boolean / Default: 'false' / Aliases: listgeneticcodes listcodes]
Lists available genetic codes.

-transtable [Type: Integer / Default: '0' / Aliases: geneticcode code]
Genetic code to use for (back) translations ("-listtranstables" for listing).

-codonfreq [Type: String / Default: 'share_codon:ecohigh.cod' / Aliases: freq]
File specifying codon frequencies for best-guess back translation (AA only).

-ambiguities [Type: Boolean / Default: 'false' / Aliases: ambig]
Use ambiguities in back translated sequences instead of best-guess (AA only).

-winlen [Type: Integer / Default: '0']
Split sequence into windows of no more than winlen bases (0 means no splitting).

-winoverlap [Type: Integer / Default: '0' / Aliases: winover]
Overlap of windows when splitting sequences.

-listfile [Type: String / Default: EMPTY / Aliases: lis]
Writes a listfile of the output sequence names.

-extract [Type: Boolean / Default: 'true']
Extracts a range of the input sequence. The range is specified using the -begin and -end options.

PARAMETER REFERENCE

You can set the parameters listed below from the command line. Shortened forms of the parameter name, aliases, are shown, separated by commas.

-infile, -infile1, -in

The name of the input sequence(s).

-begin, -beg

First base to read for each query sequence.

-end

Last base to read for each query sequence.

-outfile, -out, -outfile1

Names the output file.

-check, -che, -help

Prints out this usage message.

-default, -def

Specifies that sensible default values be used for all parameters where possible.

-documentation, -doc

Prints banner at program startup.

-quiet, -qui

This parameter is not supported.

-bsml

Output file will be in bsml format.

-reverse, -rev

Reverse all input sequences.

-complement, -comp

Complement all input sequences (NA only).

-revcomp

Reverse and complement all input sequences (NA only).

-sample

Percent chance that an input sequence is included in output sequences.
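One way to read "percent chance" is that each input sequence is kept or dropped independently. The Python sketch below illustrates that reading only; it is not SeqManip+ source code. The function name sample_sequences and the seed argument are hypothetical, added so that a run can be repeated.

```python
import random


def sample_sequences(records, percent, seed=None):
    """Keep each record independently with the given percent chance (sketch)."""
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so percent=100.0 keeps everything
    # and percent=0.0 keeps nothing.
    return [rec for rec in records if rng.random() * 100.0 < percent]


if __name__ == "__main__":
    names = ["seq1", "seq2", "seq3", "seq4"]
    print(sample_sequences(names, 100.0))        # all four names
    print(sample_sequences(names, 50.0, seed=7)) # a random subset
```

Under this reading the number of output sequences varies from run to run; only its expected value is fixed by the percentage.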

-translate, -trans

Translate all input sequences (NA only).

-open

Translates open reading frames only if they exceed the specified minimum peptide length. This option works only if '-Translate' is set to True.

-frame

Translate in specified reading frame (1, 2, 3, -1, -2, -3).

-allframes

Translate in all 6 reading frames.
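The reading-frame numbering used by -frame and -allframes, and the length filter applied by -open, can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the SeqManip+ algorithm: it uses only the standard genetic code, treats negative frames as frames of the reverse complement, and defines an open reading frame as the stretch from the first M up to the next stop codon. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna, frame=1):
    """Translate one frame; frames -1..-3 read the reverse complement."""
    dna = dna.upper()
    if frame < 0:
        dna = dna.translate(_COMPLEMENT)[::-1]
    start = abs(frame) - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    # Codons containing symbols outside ACGT are shown as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def translate_all_frames(dna):
    """Return a dict mapping each of the six frames to its translation."""
    return {f: translate_frame(dna, f) for f in (1, 2, 3, -1, -2, -3)}


def open_reading_frames(protein, min_length=20):
    """Return M-initiated stretches longer than min_length residues.

    Simplification: a final stretch with no closing stop is also reported.
    """
    found = []
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1 and len(segment) - start > min_length:
            found.append(segment[start:])
    return found


if __name__ == "__main__":
    print(translate_all_frames("ATGGCCTAA"))
    print(open_reading_frames("AAMKT*MA*", min_length=2))  # ['MKT']
```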

-backtranslate, -backtrans

Back translate all input sequences (AA only).
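Back translation with ambiguity symbols (the mode selected by -ambiguities) can be sketched in Python by collapsing all codons for an amino acid into one codon written with IUPAC nucleotide codes. This is an illustration under stated assumptions, not the SeqManip+ method: it uses only the standard genetic code, ignores codon frequency files, and combines each codon position independently, which over-approximates amino acids with split codon families (for example, YTN for leucine also matches the phenylalanine codons TTT and TTC). The names used are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# IUPAC nucleotide codes, keyed by the set of bases each code stands for.
_IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}


def _ambiguous_codons():
    """Build one IUPAC-coded codon per amino acid, position by position."""
    by_aa = {}
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO):
        by_aa.setdefault(aa, []).append(codon)
    return {
        aa: "".join(_IUPAC[frozenset(c[i] for c in codons)] for i in range(3))
        for aa, codons in by_aa.items()
    }


AMBIGUOUS = _ambiguous_codons()


def backtranslate_ambiguous(protein):
    """Back translate a protein; unknown residues become NNN (assumption)."""
    return "".join(AMBIGUOUS.get(aa, "NNN") for aa in protein.upper())


if __name__ == "__main__":
    print(backtranslate_ambiguous("MAK"))  # ATGGCNAAR
```

Residues such as M and W, which have a single codon, produce no ambiguity symbols, which is why regions rich in them are easier targets for probe design.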

-listtranstables

Lists available genetic codes.

-transtable

Genetic code to use for (back) translations ("-listtranstables" for listing).

-codonfreq

File specifying codon frequencies for best-guess back translation (AA only).

-ambiguities, -ambig

Use ambiguities in back translated sequences instead of best-guess (AA only).

-winlen

Split sequence into windows of no more than winlen bases (0 means no splitting).

-winoverlap, -winover

Overlap of windows when splitting sequences.
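How -winlen and -winoverlap might interact can be shown with a small Python sketch. It assumes that each new window starts winlen minus winoverlap symbols after the previous one and that the final window may be shorter than winlen; the actual behavior of SeqManip+ at sequence ends is not documented here, so treat this as one plausible reading. The function name split_windows is hypothetical.

```python
def split_windows(seq, winlen=0, winoverlap=0):
    """Split seq into windows of at most winlen symbols (sketch)."""
    if winlen <= 0:
        return [seq]  # 0 means no splitting
    if not 0 <= winoverlap < winlen:
        raise ValueError("winoverlap must be >= 0 and smaller than winlen")
    step = winlen - winoverlap  # assumed distance between window starts
    windows = []
    for start in range(0, len(seq), step):
        windows.append(seq[start:start + winlen])
        if start + winlen >= len(seq):
            break  # this window already reaches the end of the sequence
    return windows


if __name__ == "__main__":
    print(split_windows("ABCDEFGHIJ", winlen=4, winoverlap=2))
    # ['ABCD', 'CDEF', 'EFGH', 'GHIJ']
```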

-listfile, -lis

Writes a listfile of the output sequence names.

-extract

Extracts a range of the input sequence. The range is specified using the -begin and -end options.
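The -begin and -end values are positions counted from 1, and an -end of -1 stands for the entire sequence. A minimal Python sketch of that coordinate convention follows. It assumes the range is inclusive at both ends, which matches the "1 to 1700" wording in the example session but is otherwise an assumption, and the function name extract is hypothetical.

```python
def extract(seq, begin=1, end=-1):
    """Return positions begin..end of seq, 1-based and inclusive (sketch)."""
    if end == -1:
        end = len(seq)  # -1 means read to the end of the sequence
    if not 1 <= begin <= end <= len(seq):
        raise ValueError("range must satisfy 1 <= begin <= end <= length")
    return seq[begin - 1:end]


if __name__ == "__main__":
    print(extract("ATGGCCTAA"))        # ATGGCCTAA
    print(extract("ATGGCCTAA", 4, 6))  # GCC
```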

Printed: May 27, 2005 14:26

Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.