ASSEMBLE

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT FILE

RELATED PROGRAMS

INPUT FILES

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE

FUNCTION

Assemble constructs new sequences from pieces of existing sequences. It concatenates the fragments you specify and writes them out as a new sequence file. Assemble is particularly well suited to building sequences from fragments defined in a list file.

DESCRIPTION

Assemble lets you choose segments from existing sequences. The segments can be of any length and can come from either strand. Unlike most GCG programs, Assemble lets you specify segments that extend across the end and into the beginning of the sequence. Assemble concatenates all of the segments you specify in the order in which you specify them and then writes the resulting construct into a new sequence file.
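The segment rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GCG's implementation: positions are 1-based and inclusive, a segment whose Begin is greater than its End is assumed to run past the end of the sequence and continue from position 1, and "reverse strand" is taken to mean the reverse complement of a DNA sequence. The function names extract_segment and assemble are hypothetical.

```python
# Hypothetical sketch of Assemble's segment logic; not GCG source code.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (assumes nucleotide input)."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_segment(seq: str, begin: int, end: int, reverse: bool = False) -> str:
    """Extract positions begin..end (1-based, inclusive) from seq.

    Assumption: begin > end means the segment runs past the last position
    and continues from position 1, as for a circular sequence.
    """
    n = len(seq)
    if not (1 <= begin <= n and 1 <= end <= n):
        raise ValueError(f"range {begin}..{end} is outside 1..{n}")
    if begin <= end:
        segment = seq[begin - 1:end]
    else:
        # Wrap across the end of the sequence and into its beginning.
        segment = seq[begin - 1:] + seq[:end]
    return reverse_complement(segment) if reverse else segment


def assemble(pieces) -> str:
    """Concatenate (seq, begin, end, reverse) pieces in the order given."""
    return "".join(extract_segment(s, b, e, r) for s, b, e, r in pieces)


if __name__ == "__main__":
    plasmid = "GATTACAGGC"
    print(extract_segment(plasmid, 8, 3))                 # wraps across the end
    print(extract_segment(plasmid, 1, 4, reverse=True))   # reverse strand
    print(assemble([(plasmid, 1, 3, False), (plasmid, 8, 10, False)]))
```

Keeping extraction and concatenation separate mirrors the description: each segment is resolved on its own, then all segments are joined in the order you specified them.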

You can specify each segment interactively, one after another, in response to the program prompts, or you can identify the segments to be assembled in a list file.

EXAMPLE

Here is an interactive session in which Assemble builds a coding sequence from three segments of the file gamma.seq:

% assemble

 ASSEMBLE from what sequence ?  gamma.seq

               Begin (* 1 *) ?  2179

            End (*  11375 *) ?  2270

            Reverse (* No *) ?

 That range begins ATGGG and ends GGAAG.  Is this correct (* Yes *) ?

 That is done, now would you like to:

     A)dd another segment from this sequence

     G)et segments from another sequence

     W)rite out this assembly into a file

 Please choose one (* W *):  a

           Begin (*   1   *) ?  2393

             End (* 11375 *) ?  2615

           Reverse (* No *)  ?

 That range begins GCTCC and ends TCAAG. Is this correct (* Yes *) ?

 That is done, now would you like to:

     A)dd another segment from this sequence

     G)et segments from another sequence

     W)rite out this assembly to file

 Please choose one (* W *):  a

           Begin (*   1   *) ?  3502

             End (* 11375 *) ?  3630

           Reverse (* No *)  ?

 That range begins CTCCT and ends ACTGA.  Is this correct (* Yes *) ?

 That is done, now would you like to:

     A)dd another segment from this sequence

     G)et segments from another sequence

     W)rite out this assembly to file

 Please choose one (* W *):

 What should I call the output file (* gamma.seg *) ?

OUTPUT FILE

Here is part of the output file (the line of slashes marks omitted sequence lines):

!!NA_SEQUENCE 1.0

 ASSEMBLE    October 5, 1998 10:13

Symbols:     1 to: 92    from: gamma.seq         ck: 6474,  2179 to: 2270

Symbols:    93 to: 315   from: gamma.seq         ck: 6474,  2393 to: 2615

Symbols:   316 to: 444   from: gamma.seq         ck: 6474,  3502 to: 3630

Human fetal beta globins G and A gamma

from Shen, Slightom and Smithies,  Cell 26; 191-203.

Analyzed by Smithies et al. Cell 26; 345-353.

gamma.seg  Length: 444  October 5, 1998 10:14  Type: N  Check: 2906  ..

       1  ATGGGTCATT TCACAGAGGA GGACAAGGCT ACTATCACAA GCCTGTGGGG

      51  CAAGGTGAAT GTGGAAGATG CTGGAGGAGA AACCCTGGGA AGGCTCCTGG

     ///////////////////////////////////////////////////////////

     351  CCATTTCGGC AAAGAATTCA CCCCTGAGGT GCAGGCTTCC TGGCAGAAGA

     401  TGGTGACTGG AGTGGCCAGT GCCCTGTCCT CCAGATACCA CTGA
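Each Symbols line in the header maps a source range to its position in the assembly: a segment's length is End - Begin + 1, and its output start is one past the previous segment's output end. The Python sketch below reproduces the three header lines from the ranges entered in the session. It is illustrative only; the function name is hypothetical and it does not handle wrap-around segments.

```python
def output_coordinates(ranges):
    """Map (begin, end) input ranges to (start, stop) positions in the assembly.

    Assumes begin <= end (no wrap-around segments); positions are 1-based.
    """
    coords = []
    next_start = 1
    for begin, end in ranges:
        length = end - begin + 1
        coords.append((next_start, next_start + length - 1))
        next_start += length
    return coords


session_ranges = [(2179, 2270), (2393, 2615), (3502, 3630)]

# The stop of the last segment equals the total Length reported for gamma.seg.
for (begin, end), (start, stop) in zip(session_ranges, output_coordinates(session_ranges)):
    print(f"Symbols: {start:5d} to: {stop:4d}  from: gamma.seq  {begin} to: {end}")
```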

RELATED PROGRAMS

Reformat puts a sequence file that has been modified with a text editor into GCG sequence file format.

INPUT FILES

Assemble accepts one or more nucleotide or protein sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenBank:*.

Assemble does not check the sequence type of the segments it concatenates, so it will join peptide and nucleotide segments without warning. Embedded comments from the input segments are lost in the output sequence.

Single Sequence Input

If you choose a single sequence on the command line or in response to the first program prompt, Assemble prompts you for the sequence range and strand. After processing that segment, the program lets you add another segment from the same sequence or choose another sequence. You can continue to choose segments this way until you decide to write out the entire assembly (interactive mode).

Multiple Sequence Input

You can specify multiple sequences on the command line or in response to the first program prompt. Assemble then processes all of the sequences and writes out the entire assembly without prompting you for the range and strand of each sequence (non-interactive mode).

If you use a list file to specify multiple sequences as input, you can add begin, end, and strand sequence attributes to set the range and strand of each sequence. You can use the join sequence attribute to create several assemblies from the sequences in a single list file. Sequences listed contiguously in the list file that share the same join attribute (that is, the same sequence name following the join token) are concatenated into a single assembly, which is named after that join name. Sequences listed contiguously that have no join attribute are concatenated into a single assembly, which is named after the last input sequence in the assembly. Here is an example of an input list file, dros_cds.list, for Assemble:

!!SEQUENCE_LIST 1.0

Example list file of coding sequences from D. melanogaster used as input for ASSEMBLE

First 3 exons are for the transformer gene.

Next 4 exons are for the glucose-6-phosphate dehydrogenase gene.

Last 2 exons are for the metallothionein gene.

..

Gb_In:Drotga   Begin:  271  End:  310

Gb_In:Drotga   Begin:  559  End:  962

Gb_In:Drotga   Begin: 1020  End: 1169

Gb_In:M26673   Begin:  438  End:  454  Join: g6pd_drome

Gb_In:M26674   Begin:   53  End:  316  Join: g6pd_drome

Gb_In:M26674   Begin:  377  End:  592  Join: g6pd_drome

Gb_In:M26674   Begin:  655  End: 1729  Join: g6pd_drome

Gb_In:Drometx  Begin:  498  End:  519

Gb_In:Drometx  Begin:  862  End:  962

Using this file as input, Assemble writes three output files. The first file, called drotga.seg, contains the assembly from the first three sequence segments in the list file. The second output file, g6pd_drome.seg, contains the assembly from the next four sequence segments. The third output file, drometx.seg, contains the assembly from the last two sequence segments in the list file. For more information about list files, see "Using List Files" in Section 2, Using Sequence Files and Databases in the User's Guide.
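The join rule amounts to splitting the list into runs of contiguous entries that have the same join value. The Python sketch below reproduces the three assemblies described above from the entries in dros_cds.list. It is an illustration only: the Entry type and the function names are hypothetical, the file-name convention (drop the database prefix, lowercase, add .seg) is inferred from the example output names, and the ignore_join option models -NOJOIN on the assumption that the single combined assembly is named like any other join-less run, after its last input sequence.

```python
from itertools import groupby
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One list-file line (hypothetical representation)."""
    name: str                    # e.g. "Gb_In:Drotga"
    begin: int
    end: int
    join: Optional[str] = None   # value after the Join: token, if any


def output_name(seq_name: str) -> str:
    """Assumed convention: drop the database prefix, lowercase, add .seg."""
    return seq_name.split(":")[-1].lower() + ".seg"


def group_assemblies(entries, ignore_join=False):
    """Group contiguous entries into (output_name, entries) assemblies."""
    key = (lambda e: None) if ignore_join else (lambda e: e.join)
    assemblies = []
    for join, run in groupby(entries, key=key):
        run = list(run)
        # A join run is named after its join name; otherwise after its last sequence.
        name = f"{join}.seg" if join else output_name(run[-1].name)
        assemblies.append((name, run))
    return assemblies


dros_cds = [
    Entry("Gb_In:Drotga", 271, 310),
    Entry("Gb_In:Drotga", 559, 962),
    Entry("Gb_In:Drotga", 1020, 1169),
    Entry("Gb_In:M26673", 438, 454, "g6pd_drome"),
    Entry("Gb_In:M26674", 53, 316, "g6pd_drome"),
    Entry("Gb_In:M26674", 377, 592, "g6pd_drome"),
    Entry("Gb_In:M26674", 655, 1729, "g6pd_drome"),
    Entry("Gb_In:Drometx", 498, 519),
    Entry("Gb_In:Drometx", 862, 962),
]

for name, run in group_assemblies(dros_cds):
    print(name, len(run))
```

Because groupby only merges adjacent items, two join-less runs separated by a join run stay separate, which matches the "listed contiguously" wording.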

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

Minimal Syntax: % assemble [-INfile=]@transformer.list -Default

Prompted Parameters:

[-OUTfile=]drotga.seg     sets the output file name (single seq. output only)

Local Data Files: None

Optional Parameters:

-BEGin=1 -END=100         sets the range of interest for each sequence

-REVerse                  specifies the strand for each sequence

-NOJOIN                   ignores join operators in list file

-LIStfile[=assemble.list] writes a list file of output sequence names

-NOMONitor                suppresses the screen monitor
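When you run Assemble from a script, you can build its command line from the parameters in this summary. The Python sketch below only constructs an argument list using the parameter names shown above; the helper name is hypothetical, and actually running the result with subprocess.run assumes a GCG installation with assemble on your PATH.

```python
from typing import List, Optional


def build_assemble_command(infile: str,
                           outfile: Optional[str] = None,
                           begin: Optional[int] = None,
                           end: Optional[int] = None,
                           reverse: bool = False,
                           nojoin: bool = False,
                           listfile: Optional[str] = None,
                           default: bool = True) -> List[str]:
    """Return an argv list for an Assemble run (hypothetical helper)."""
    cmd = ["assemble", f"-INfile={infile}"]
    if outfile:
        cmd.append(f"-OUTfile={outfile}")
    if begin is not None:
        cmd.append(f"-BEGin={begin}")
    if end is not None:
        cmd.append(f"-END={end}")
    if reverse:
        cmd.append("-REVerse")
    if nojoin:
        cmd.append("-NOJOIN")
    if listfile is not None:
        # An empty string requests the default list file name.
        cmd.append(f"-LIStfile={listfile}" if listfile else "-LIStfile")
    if default:
        cmd.append("-Default")  # suppress all program interaction
    return cmd


cmd = build_assemble_command("@dros_cds.list", listfile="")
print(" ".join(cmd))
# To run it (requires GCG): subprocess.run(cmd, check=True)
```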

LOCAL DATA FILES

None.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-BEGin=1

Sets the beginning position for all input sequences. When the beginning position is set from the command line, Assemble ignores beginning positions specified for individual sequences in a list file.

-END=100

Sets the ending position for all input sequences. When the ending position is set from the command line, Assemble ignores ending positions specified for individual sequences in a list file.

-REVerse

Sets the program to use the reverse strand for each input sequence. When -REVerse or -NOREVerse is on the command line, Assemble ignores any strand designation for individual sequences in a list file.
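The override rules for -BEGin, -END, and -REVerse amount to a precedence check: a command-line value wins over the entry's own list-file attribute. In the illustrative Python sketch below, the function names are hypothetical; cmd_reverse is True for -REVerse, False for -NOREVerse, and None when neither is given; and the fallback to the whole sequence on the forward strand when nothing is specified is an assumption.

```python
from typing import Optional, Tuple


def first_set(*values):
    """Return the first value that is not None."""
    return next(v for v in values if v is not None)


def effective_range(seq_length: int,
                    list_begin: Optional[int] = None,
                    list_end: Optional[int] = None,
                    list_reverse: Optional[bool] = None,
                    cmd_begin: Optional[int] = None,
                    cmd_end: Optional[int] = None,
                    cmd_reverse: Optional[bool] = None) -> Tuple[int, int, bool]:
    """Resolve begin, end, and strand for one list-file entry.

    Command-line values (cmd_*) override list-file attributes (list_*);
    the assumed defaults are position 1, the last position, and the forward strand.
    """
    begin = first_set(cmd_begin, list_begin, 1)
    end = first_set(cmd_end, list_end, seq_length)
    reverse = first_set(cmd_reverse, list_reverse, False)
    return begin, end, reverse


# List-file range only, then the same entry with -BEGin=1 -END=100 on the command line.
print(effective_range(1729, list_begin=655, list_end=1729))
print(effective_range(1729, list_begin=655, list_end=1729, cmd_begin=1, cmd_end=100))
```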

-NOJOIN

Sets Assemble to ignore all join sequence attributes specified in the input list file. All sequence segments specified in the list file are assembled into a single output sequence file.

-LIStfile=assemble.list

Writes a list file with the names of the output sequence files. This list file can be used as input to other Accelrys GCG (GCG) programs that support list files (see Section 2, Using Sequence Files and Databases, in the User's Guide). If you don't specify a file name, Assemble names the file assemble.list.
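The list file written by -LIStfile follows the same !!SEQUENCE_LIST layout as the input example dros_cds.list: a header line, free-text comments, a .. divider, and then one sequence name per line. The Python sketch below formats such a file from a list of output names. The function name and the comment text are hypothetical, and the exact text GCG writes may differ.

```python
from typing import Iterable


def format_list_file(names: Iterable[str],
                     comment: str = "Output sequences written by ASSEMBLE") -> str:
    """Return list-file text: header, comment, '..' divider, one name per line."""
    lines = ["!!SEQUENCE_LIST 1.0", comment, ".."]
    lines.extend(names)
    return "\n".join(lines) + "\n"


text = format_list_file(["drotga.seg", "g6pd_drome.seg", "drometx.seg"])
print(text, end="")
# To save it under the default name: open("assemble.list", "w").write(text)
```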

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

Printed: May 27, 2005 11:41

Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.