[Genhelp | Program Manual | User's Guide | Data Files | Databases | Release Notes ]
SeqManip+ is a utility program for manipulating sequences. It can split sequences into a set of overlapping segments, extract a segment from a sequence, translate or complement nucleotide sequences, and back-translate protein sequences. SeqManip+ provides a single program for operations similar to those previously performed in GCG with the Translate, Backtranslate, Reverse, and Breakup programs.
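Splitting into overlapping segments is controlled by the -winlen and -winoverlap parameters. The following Python sketch shows one plausible way to compute such windows. The function name split_windows is hypothetical, and truncating the final window at the end of the sequence is an assumption, not documented SeqManip+ behavior.

```python
def split_windows(seq, winlen, winoverlap=0):
    """Split seq into windows of at most winlen bases overlapping by winoverlap.

    Returns (begin, end, segment) tuples with 1-based, inclusive coordinates.
    A winlen of 0 (or less) means no splitting, mirroring the -winlen default.
    """
    if winlen <= 0:
        return [(1, len(seq), seq)]
    if not 0 <= winoverlap < winlen:
        raise ValueError("winoverlap must be >= 0 and smaller than winlen")
    step = winlen - winoverlap  # distance between consecutive window starts
    windows = []
    start = 0
    while True:
        end = min(start + winlen, len(seq))  # assumed: last window may be shorter
        windows.append((start + 1, end, seq[start:end]))
        if end == len(seq):
            break
        start += step
    return windows


for begin, end, segment in split_windows("ACGTACGTAC", winlen=4, winoverlap=1):
    print(begin, end, segment)
# 1 4 ACGT
# 4 7 TACG
# 7 10 GTAC
```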
Advantages of Plus “+” Programs:
- Plus programs can read sequences in a variety of native formats, such as GCG RSF, GCG SSF, GCG MSF, GenBank, EMBL, FastA, SwissProt, PIR, and BSML, without conversion.
- Plus programs remove the 350,000 bp sequence length restriction.
If you do not need these features and prefer more interactivity, you may want to run the original (non-plus) version of the program instead.
Operations that can be invoked from SeqManip+ are:
1. Translate
2. Backtranslate
3. Reverse complement
4. Sample
5. Reverse
6. Complement
7. Extract
Here is a session using SeqManip+ to translate the G-gamma gene in ggamma.seq into the protein sequence for the human fetal beta globin G gamma.
% seqmanip+ -translate -open=50
SeqManip+ is a utility that accepts DNA or protein sequences to perform single/multiple operations on the selected sequences like extract, sample, translate, backtranslate, reverse and reverse complement.
Manipulate what sequence(s) ?
ggamma.seq
Begin (* 1 *) ?
End (-1 for entire sequence) (* -1 *) ?
What should I call the output file (* <sequence_name>.seqmanip+ *) ?
Extracting the region 1 to 1700 from ggamma.seq ...
Writing 1 sequence(s) to output file ggamma.seq.seqmanip+
Here is the output file ggamma.seq.seqmanip+:
!!RICH_SEQUENCE 1.0
..
{
name ggamma.seq_c366
descrip ORF translation of ggamma.seq from : 366 to : 650. [ 95 residues ]
type PROTEIN
checksum 9400
creation-date
strand 1
sequence
MGNPKVKAHGKKVLTSLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKVSPGDVSALL
PLVSRQLRQLSIDLSTAGCELFEDTGVGSEETAED
}
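The record above is an open reading frame (ORF) that passed the -open=50 length filter: positions 366 to 650 span 285 bases, or 95 codons, matching the 95 residues reported. A minimal Python sketch of frame-by-frame ORF translation with the standard genetic code follows. It assumes an ORF runs from an ATG start codon up to, but not including, the next stop codon, which is consistent with the example (it begins with M and its 285-base range holds exactly 95 codons); SeqManip+'s actual ORF rules and its handling of other genetic codes (-transtable) may differ. The function names are hypothetical, and -allframes would correspond to calling find_orfs for all six frame values.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(nt):
    """Translate complete codons; unknown codons become X, stops become *."""
    nt = nt.upper()
    return "".join(CODE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def find_orfs(seq, frame=1, min_len=20):
    """Return (begin, end, peptide) for ATG-to-stop ORFs longer than min_len.

    frame is 1, 2, 3 (forward strand) or -1, -2, -3 (reverse complement).
    Coordinates are 1-based and inclusive on the strand that was scanned,
    and exclude the stop codon. An ORF with no stop codon is discarded.
    """
    strand = seq.upper() if frame > 0 else revcomp(seq)
    orfs = []
    start = None  # 0-based index of the current ORF's ATG, if inside one
    for i in range(abs(frame) - 1, len(strand) - 2, 3):
        aa = CODE.get(strand[i:i + 3], "X")
        if start is None and aa == "M":
            start = i
        elif start is not None and aa == "*":
            peptide = translate(strand[start:i])
            if len(peptide) > min_len:  # must exceed the minimum length
                orfs.append((start + 1, i, peptide))
            start = None
    return orfs


dna = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
print(find_orfs(dna, frame=3, min_len=3))  # [(3, 14, 'MAAA')]
```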
SeqManip+ accepts one or more sequences as input. Input sequences may be either nucleotide or protein sequences, but only one type of sequence can be analyzed at a time. You can specify multiple sequences in several ways: with a list file, for example @project.list; with an MSF or RSF file, for example project.msf{*}; or with a sequence specification containing an asterisk (*) wildcard, for example GenBank:*.
Single Sequence Input
If you specify a single sequence on the command line or in response to the first program prompt, and -default is not on the command line, SeqManip+ prompts you for the sequence range. After reading that range, SeqManip+ performs the operation specified on the command line. If no operation is specified on the command line, SeqManip+ by default converts the given sequence into RSF format.
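The Begin and End prompts (and the -begin and -end parameters) select a 1-based, inclusive range, with -1 meaning the end of the sequence. A minimal Python sketch of that convention follows; the function name is hypothetical, and raising an error for out-of-range values is an assumption rather than documented SeqManip+ behavior.

```python
def extract(seq, begin=1, end=-1):
    """Return the residues from begin to end (1-based, inclusive).

    end=-1 means "through the last residue", as in the End prompt.
    """
    if end == -1:
        end = len(seq)
    if not 1 <= begin <= end <= len(seq):
        raise ValueError(f"range {begin}..{end} is invalid for length {len(seq)}")
    return seq[begin - 1:end]


print(extract("ACGTACGT", 3, 5))  # GTA
print(extract("ACGTACGT"))        # ACGTACGT
```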
Multiple Sequence Input
When you specify multiple sequences, SeqManip+ by default converts all of the input sequences (for example, all the sequences in an MSF file) into a single RSF file. You can also use a list file to specify multiple sequences.
For more information about list files, see "Using List Files" in Section 2, Using Sequence Files and Databases in the User's Guide.
Related Programs
SeqConv+: Converts sequences into standard sequence formats (BSML, FastA, SwissProt, and GenBank).
Translate: Translates nucleotide sequences into peptide sequences.
Sample: Extracts sequence fragments randomly from one or more sequences. You can set a sampling rate to determine how many fragments Sample extracts.
Backtranslate: Back-translates an amino acid sequence into a nucleotide sequence. The output helps you identify areas with fewer ambiguities that might be candidates for synthetic probes.
Reverse: Reverses and/or complements the symbols in a sequence and writes the result to a new sequence file.
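The reverse, complement, and reverse-complement operations (the -reverse, -complement, and -revcomp parameters of SeqManip+) can be sketched in Python as below. The treatment of IUPAC ambiguity codes (for example, R complements to Y) follows the standard IUPAC convention and is an assumption about how SeqManip+ handles them; the function names are hypothetical.

```python
# Complements for IUPAC nucleotide codes, upper and lower case.
_COMP = str.maketrans("ACGTRYKMSWBDHVNacgtrykmswbdhvn",
                      "TGCAYRMKSWVHDBNtgcayrmkswvhdbn")


def reverse(seq):
    """Reverse the order of symbols (like -reverse)."""
    return seq[::-1]


def complement(seq):
    """Complement each nucleotide symbol (like -complement, NA only)."""
    return seq.translate(_COMP)


def revcomp(seq):
    """Reverse complement (like -revcomp, NA only)."""
    return complement(seq)[::-1]


print(reverse("AACGR"), complement("AACGR"), revcomp("AACGR"))
# RGCAA TTGCY YCGTT
```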
All parameters for this program may be added to the command line. Use -check to view the summary below and to specify parameters before the program executes. In the syntax summary below, square brackets ([ and ]) enclose parameter values that are optional. For each program parameter, square brackets enclose the type of parameter value, the default value, and the shortened forms of the parameter name (aliases). Programs with a plus in the name accept either the full parameter name or one of its listed aliases. If the Type is Boolean, the presence of the parameter on the command line indicates a true condition; a false condition must be stated explicitly as parameter=false.
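To make the Boolean and alias conventions concrete, here is a hypothetical Python sketch of a parser that follows them. It is not GCG's parser: the function name, the alias table passed in, and the choice to leave non-Boolean values as strings are all assumptions for illustration.

```python
def parse_args(argv, aliases):
    """Hypothetical parser for the command-line conventions described above.

    aliases maps each alias to its full parameter name. Names are
    case-insensitive, a bare flag means true, and name=false means false.
    The first bare argument fills infile (the "-infile=" prefix is optional);
    further bare arguments are ignored in this sketch.
    """
    params = {}
    for arg in argv:
        if not arg.startswith("-"):
            params.setdefault("infile", arg)
            continue
        name, has_value, value = arg[1:].partition("=")
        name = name.lower()
        name = aliases.get(name, name)  # resolve alias to full name
        if not has_value:
            params[name] = True  # presence alone means true
        elif value.lower() in ("true", "false"):
            params[name] = value.lower() == "true"
        else:
            params[name] = value  # left as a string; no type checking here
    return params


print(parse_args(["ggamma.seq", "-TRANS", "-open=50", "-doc=false"],
                 {"trans": "translate", "doc": "documentation"}))
```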
Minimal Syntax: % seqmanip+ [-infile=]value -Default
Minimal Parameters (case-insensitive):
-infile
[Type: InFile / Default: EMPTY / Aliases: infile1 in]
The name of the input sequence(s).
Prompted Parameters (case-insensitive):
-begin
[Type: Integer / Default: '1' / Aliases: beg]
First base to read for each query sequence.
-end
[Type: Integer / Default: '-1']
Last base to read for each query sequence.
-outfile
[Type: OutFile / Default: '<sequence_name>.seqmanip+' / Aliases: out outfile1]
Names the output file.
Optional Parameters (case-insensitive):
-check
[Type: Boolean / Default: 'false' / Aliases: che help]
Prints out this usage message.
-default
[Type: Boolean / Default: 'false' / Aliases: d def]
Specifies that sensible default values be used for all parameters where possible.
-documentation
[Type: Boolean / Default: 'true' / Aliases: doc]
Prints banner at program startup.
-quiet
[Type: Boolean / Default: 'false' / Aliases: qui]
Tells application to print only a minimal amount of information.
-bsml
[Type: Boolean / Default: 'false']
Output file will be in BSML format.
-reverse
[Type: Boolean / Default: 'false' / Aliases: rev]
Reverse all input sequences.
-complement
[Type: Boolean / Default: 'false' / Aliases: comp compl]
Complement all input sequences (NA only).
-revcomp
[Type: Boolean / Default: 'false' / Aliases: revcompl]
Reverse and complement all input sequences (NA only).
-sample
[Type: Double / Default: '100.0']
Percent chance that an input sequence is included in output sequences.
-translate
[Type: Boolean / Default: 'false' / Aliases: trans]
Translate all input sequences (NA only).
-open
[Type: Integer / Default: '20']
Translates open reading frames only if they exceed the specified minimum peptide length. This option works only if '-translate' is set to true.
-frame
[Type: Integer / Default: '1']
Translate in specified reading frame (1, 2, 3, -1, -2, -3).
-allframes
[Type: Boolean / Default: 'false' / Aliases: sixframe]
Translate in all 6 reading frames.
-backtranslate
[Type: Boolean / Default: 'false' / Aliases: back backtrans revtrans]
Back translate all input sequences (AA only).
-listtranstables
[Type: Boolean / Default: 'false' / Aliases: listgeneticcodes listcodes]
Lists available genetic codes.
-transtable
[Type: Integer / Default: '0' / Aliases: geneticcode code]
Genetic code to use for (back) translations (use "-listtranstables" for a listing).
-codonfreq
[Type: String / Default: 'share_codon:ecohigh.cod' / Aliases: freq]
File specifying codon frequencies for best-guess back translation (AA only).
-ambiguities
[Type: Boolean / Default: 'false' / Aliases: ambig]
Use ambiguities in back translated sequences instead of best-guess (AA only).
-winlen
[Type: Integer / Default: '0']
Split sequence into windows of no more than winlen bases (0 means no splitting).
-winoverlap
[Type: Integer / Default: '0' / Aliases: winover]
Overlap of windows when splitting sequences.
-listfile
[Type: String / Default: EMPTY / Aliases: lis]
Writes a listfile of the output sequence names.
-extract
[Type: Boolean / Default: 'true']
Extracts a range of the input sequence. The range is specified using the -begin and -end options.
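The -codonfreq and -ambiguities parameters select between two back-translation strategies. The Python sketch below illustrates both under stated assumptions: best-guess picks the most frequent codon for each amino acid from a frequency table (here a small hypothetical dictionary standing in for a codon frequency file such as ecohigh.cod, whose format is not reproduced), and the ambiguity mode writes, for each codon position, the IUPAC code covering every synonymous codon. Only the standard genetic code is used, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), grouped as amino acid -> codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))

# IUPAC code for every non-empty set of bases.
_IUPAC_SETS = {"A": "A", "C": "C", "G": "G", "T": "T",
               "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
               "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
IUPAC = {frozenset(bases): code for code, bases in _IUPAC_SETS.items()}


def backtranslate_ambiguous(protein):
    """Per-position IUPAC union of all synonymous codons.

    Note: for six-codon residues this over-generalizes (L gives YTN,
    which also matches TTT/TTC, codons for F).
    """
    out = []
    for aa in protein.upper():
        codons = CODONS[aa]  # KeyError for symbols outside the standard code
        for pos in range(3):
            out.append(IUPAC[frozenset(c[pos] for c in codons)])
    return "".join(out)


def backtranslate_best_guess(protein, codon_freq):
    """Most frequent codon per residue; ties fall back to TCAG order."""
    return "".join(max(CODONS[aa], key=lambda c: codon_freq.get(c, 0.0))
                   for aa in protein.upper())


print(backtranslate_ambiguous("MKL"))                               # ATGAARYTN
print(backtranslate_best_guess("MK", {"AAA": 0.75, "AAG": 0.25}))  # ATGAAA
```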
You can set the parameters listed below from the command line. Shortened forms of each parameter name (aliases) are shown, separated by commas.
-infile, -infile1, -in
The name of the input sequence(s).
-begin, -beg
First base to read for each query sequence.
-end
Last base to read for each query sequence.
-outfile, -out, -outfile1
Names the output file.
-check, -che, -help
Prints out this usage message.
-default, -def
Specifies that sensible default values be used for all parameters where possible.
-documentation, -doc
Prints banner at program startup.
-quiet, -qui
This parameter is not supported.
-bsml
Output file will be in BSML format.
-reverse, -rev
Reverse all input sequences.
-complement, -comp
Complement all input sequences (NA only).
-revcomp
Reverse and complement all input sequences (NA only).
-sample
Percent chance that an input sequence is included in output sequences.
-translate, -trans
Translate all input sequences (NA only).
-open
Translates open reading frames only if they exceed the specified minimum peptide length. This option works only if '-translate' is set to true.
-frame
Translate in specified reading frame (1, 2, 3, -1, -2, -3).
-allframes
Translate in all 6 reading frames.
-backtranslate, -backtrans
Back translate all input sequences (AA only).
-listtranstables
Lists available genetic codes.
-transtable
Genetic code to use for (back) translations (use "-listtranstables" for a listing).
-codonfreq
File specifying codon frequencies for best-guess back translation (AA only).
-ambiguities, -ambig
Use ambiguities in back translated sequences instead of best-guess (AA only).
-winlen
Split sequence into windows of no more than winlen bases (0 means no splitting).
-winoverlap, -winover
Overlap of windows when splitting sequences.
-listfile, -lis
Writes a listfile of the output sequence names.
-extract
Extracts a range of the input sequence. The range is specified using the -begin and -end options.
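-sample is described as the percent chance that an input sequence is included in the output. Read literally, that is an independent draw for each input sequence, which can be sketched in Python as follows. The function name and the use of Python's random module are assumptions; SeqManip+'s own random-number handling is not documented here.

```python
import random


def sample(sequences, percent=100.0, rng=None):
    """Keep each input sequence independently with probability percent/100."""
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so percent=100 keeps everything, 0 keeps nothing.
    return [seq for seq in sequences if rng.random() * 100.0 < percent]


kept = sample([f"seq{i}" for i in range(10)], percent=30.0, rng=random.Random(42))
print(kept)
```

Passing a seeded random.Random makes a run reproducible, which is convenient when comparing outputs.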
Printed: May 27, 2005 14:26
Technical Support: support-us@accelrys.com, support-japan@accelrys.com, or support-eu@accelrys.com
Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.
Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.