Fragment Assembly System
Introduction
The Fragment Assembly System (FAS)
is a series of programs that help you assemble the
overlapping
fragment sequences from a sequencing
project. Specifically, the FAS
enables you to: 1) store fragment
sequences; 2) recognize overlapping sequences and
create aligned assemblies, called contigs;
and 3) display
and edit the contigs.
The sequence data for a
sequencing project are maintained and
manipulated in a sequencing project
database.
Using the seven programs of
the FAS, you create and
maintain a separate project database
for each sequencing
project. After setting the project name and other logical names with GelProject, you create project
databases with GelStart, enter fragment
sequences with GelEnter, assemble contigs with GelMerge, and edit
assembled contigs with GelAssemble.
As you enter new sequence
data with GelEnter and run GelMerge and
GelAssemble repeatedly, your project evolves
from a collection of short
sequence fragments into a final
contig that represents the entire
underlying genomic sequence from which
all
fragments in the project were
derived. At any time,
you can create a schematic
display of the current state of your project database with GelView.
If you want to
break up the contigs in
a sequencing project,
GelDisassemble recreates the database as
single, unassembled fragments.
GelProject
As the first step, before running GelStart, you need to set the logical names (TempDisk, GelCommand, and GelData) in $HOME/.wp/dirs.conf. GelProject prompts you to enter a project name. Do not use spaces in the project name.
GelStart
Use GelStart to create a
new project database for each
sequencing project. Use the same project name as defined in GelProject. For each
new project, GelStart
creates a new directory, named
after the project, as a
subdirectory of your current working
directory. For
example, if you create a
new sequencing project named myproject,
a new directory named myproject
appears in your
current working directory.
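The naming rule and the resulting directory location can be illustrated with a short Python sketch. It is not part of the FAS: the function name project_directory is hypothetical, and the only rules it encodes are the two stated in this manual (no spaces in the project name, and a subdirectory named after the project). It only computes a path and does not create anything on disk.

```python
from pathlib import PurePosixPath


def project_directory(project_name: str, working_dir: PurePosixPath) -> PurePosixPath:
    """Return the path a project directory would have under working_dir.

    Hypothetical helper for illustration only; GelStart itself creates
    the real directory.
    """
    # The manual states that project names must not contain spaces;
    # rejecting all whitespace is a slightly stricter assumption.
    if not project_name or any(ch.isspace() for ch in project_name):
        raise ValueError("project name must be non-empty and contain no spaces")
    return working_dir / project_name


if __name__ == "__main__":
    # "/home/user" is a made-up working directory.
    print(project_directory("myproject", PurePosixPath("/home/user")))
```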
You must also run GelStart
each time you want to
begin work on an existing
project to lock in the
appropriate
sequencing project database. Once
a database is locked in,
all Fragment Assembly programs work
on the
data in the corresponding sequencing
project. The FAS remains
locked in to the same
project database until
you either run GelStart again
to lock in a different
project or until you log
off from the computer.
GelEnter
After locking in a sequencing
project database with GelStart, use
GelEnter to enter fragment sequences
into
the project database. GelEnter
is a sequence editor that
accepts sequence data from: 1) a
terminal keyboard;
2) a digitizer; or 3) existing sequence
files. You can enter
new sequences at any time;
they do not all have
to
be entered when you first
create the project. Once
you enter sequences into a
project database, you can no
longer edit them with GelEnter.
You can edit the
sequences later with GelAssemble.
GelMerge
After entering sequences into a
project database, use GelMerge to
assemble contigs of aligned sequences
from
the overlapping fragments in the
sequencing project. GelMerge automatically
recognizes overlaps among all
of the sequences in a
project database and creates aligned
assemblies, called contigs, from the
overlapping
sequences. These contigs are
stored in the project database.
As you add new
sequences that connect separate
contigs to the project database,
GelMerge aligns the contigs into
larger assemblies. GelMerge can
also
automatically remove vector sequences from
the individual fragment sequences.
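To make the idea of overlap recognition concrete, here is a deliberately simplified Python sketch. It is not the method implemented in GelMerge, which this manual does not describe in detail: the sketch only finds exact matches between the end of one fragment and the start of another, ignores mismatches, gaps, and reverse-complement orientation, and uses hypothetical function names and a made-up minimum overlap length.

```python
from typing import Optional


def suffix_prefix_overlap(left: str, right: str, min_length: int = 3) -> int:
    """Length of the longest suffix of left that equals a prefix of right.

    Returns 0 when no exact overlap of at least min_length exists.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    longest_possible = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the best one.
    for length in range(longest_possible, min_length - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def merge_pair(left: str, right: str, min_length: int = 3) -> Optional[str]:
    """Join two fragments over their exact overlap, or return None."""
    overlap = suffix_prefix_overlap(left, right, min_length)
    if overlap == 0:
        return None
    return left + right[overlap:]


if __name__ == "__main__":
    print(merge_pair("ACGTTGCA", "TGCAGGT"))
```

A real assembler must tolerate sequencing errors, so exact matching like this would miss many true overlaps; the sketch only shows what "overlap" means for two fragments.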
GelAssemble
After assembling contigs with GelMerge,
use the contig editor, GelAssemble,
to review and modify the
alignments. After choosing a
contig for review, GelAssemble lets
you edit the individual sequences
in that
contig to resolve inconsistencies.
GelAssemble creates a consensus sequence
that uses the IUB nucleotide
ambiguity codes (see Appendix III of
this manual). You can
modify a sequence and change
the alignment in
the same way you edit
text with a text editor.
Although GelMerge assembles and
aligns contigs
automatically, you can assemble contigs
manually using GelAssemble. For
example, you could manually
assemble separate contigs that do
not share sufficient overlap for
GelMerge to assemble automatically.
You
can also separate fragments from
a contig if you believe
they should not be included.
Once you are satisfied with
a contig, you can store
it in the sequencing project
database.
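As an illustration of how ambiguity codes can summarize disagreement among aligned fragments, the following Python sketch builds a consensus from equal-length aligned rows. The code table is the standard IUB/IUPAC nucleotide set; the consensus rule itself (report the code covering every base seen in a column, ignoring gaps) is an assumption made for this example and is not necessarily the rule GelAssemble applies. The function name and the "-" gap character are likewise hypothetical.

```python
# Standard IUB/IUPAC nucleotide codes and the bases each one stands for.
_CODE_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of observed bases -> single ambiguity code.
IUB_CODES = {frozenset(bases): code for code, bases in _CODE_TO_BASES.items()}


def consensus(rows, gap="-"):
    """Return one consensus character per alignment column.

    rows must be equal-length strings of A, C, G, T, and the gap
    character. Gaps are ignored unless a column contains only gaps.
    """
    if not rows or len({len(row) for row in rows}) != 1:
        raise ValueError("rows must be non-empty and of equal length")
    result = []
    for column in zip(*rows):
        bases = frozenset(base.upper() for base in column if base != gap)
        if not bases:
            result.append(gap)
        elif bases in IUB_CODES:
            result.append(IUB_CODES[bases])
        else:
            raise ValueError(f"unexpected symbols in column: {sorted(bases)}")
    return "".join(result)


if __name__ == "__main__":
    # The third column holds both G and A, so it is reported as R.
    print(consensus(["ACGT-", "ACAT-", "-CGTA"]))
```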
GelView
GelView displays bar diagrams that
show the overlaps among the
fragments in each contig, providing
a
schematic view of the whole
sequencing project.
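The kind of schematic described here can be approximated in a few lines of Python. This sketch does not reproduce GelView's actual output format, which is not shown in this section; the Fragment fields, the 0-based start offsets, and the use of ">" and "<" to mark orientation are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    name: str
    start: int      # assumed 0-based offset of the fragment within the contig
    length: int     # number of aligned positions the fragment covers
    forward: bool   # True for forward orientation, False for reverse


def bar_diagram(fragments):
    """Render one text bar per fragment, ordered by start position."""
    label_width = max(len(fragment.name) for fragment in fragments)
    lines = []
    for fragment in sorted(fragments, key=lambda f: f.start):
        mark = ">" if fragment.forward else "<"
        bar = " " * fragment.start + mark * fragment.length
        lines.append(f"{fragment.name:<{label_width}} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up fragment data; a real project would supply its own.
    demo = [Fragment("gel1", 0, 6, True), Fragment("gel2", 4, 5, False)]
    print(bar_diagram(demo))
```

Columns where two bars overlap vertically correspond to positions covered by both fragments.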
GelDisassemble
GelDisassemble breaks up the contigs
in a sequencing project, thus
recreating the database as a
collection of
single fragments.
Structure of a Fragment Assembly
Database
You do not have to
understand the structure of a
fragment assembly database to successfully
use the FAS.
All of the programs access
and manipulate the project database
in a manner that is
transparent to you. This
description of the database is,
therefore, just for those who
want to know more about
the Fragment Assembly
System.
The data in the FAS
are stored as text files
in a group of subdirectories.
This makes the database
vulnerable
to corruption since you can
edit, delete, and rename any
of the files in the
database with UNIX commands.
Use GelMerge and GelAssemble to
modify these files. Do
not manipulate any file in
the database with a text
editor!
A fragment assembly database consists
of a command directory, with
the same name as your
project, and four
data subdirectories below it: archive,
working, consensus, and relation.
FAS stores the data for
each
fragment in separate files in
each of these subdirectories.
A newly-entered fragment becomes a
new contig
before it is assembled, and
is represented by a new
file in each of these
four subdirectories.
- The archive directory stores the
original fragment sequences that you
entered into the database with
GelEnter. The FAS never
modifies the files in this
directory.
- The working directory contains the
same fragment sequences as the
archive, but with all of
the gap
insertions and edits that were
made to assemble the fragments
into contigs.
- The consensus directory has a
consensus sequence file for each
contig in the project database.
Each
contig is named after the
left-most fragment in the alignment.
Newly-entered fragments and other
unassembled fragments are also considered
contigs and have a consensus
sequence in the consensus
directory. Because they do
not yet align with any
other contig, they are called
contigs-of-one or
single-fragment contigs.
- The relation directory contains a
file for each contig that
lists the orientation, position, and
length of
each fragment in the contig.
In addition to these subdirectories,
the command directory also contains
a copy of each cloning
vector
specified in GelStart as well
as command initializing files for
GelEnter, GelMerge, and GelAssemble.
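The directory layout described above can be inspected without touching any file, which respects the warning against editing the database by hand. The following Python sketch is not part of the FAS; the function name is hypothetical, and it relies only on the four subdirectory names given in this section. It reads directory entries and never writes.

```python
from pathlib import Path

# The four data subdirectories named in this section.
DATA_SUBDIRECTORIES = ("archive", "working", "consensus", "relation")


def missing_subdirectories(project_dir: Path) -> list[str]:
    """List the expected data subdirectories absent from project_dir.

    Read-only check: nothing in the project database is modified.
    """
    return [
        name for name in DATA_SUBDIRECTORIES
        if not (project_dir / name).is_dir()
    ]


if __name__ == "__main__":
    # "myproject" is the example project name used earlier in this manual.
    missing = missing_subdirectories(Path.cwd() / "myproject")
    if missing:
        print("missing subdirectories:", ", ".join(missing))
    else:
        print("all four data subdirectories are present")
```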
Acknowledgement
Dr. Roger Staden's pioneering work remains
the basis of all work
on fragment assembly. GelAssemble
comes
from the MSE editor written
by Dr. William Gilbert. Irv
Edelman developed the method of
fragment
assembly implemented in GelMerge.
We are very grateful to
those of you who have
taken the time to learn
the system and give us
useful
suggestions. We appreciate your
time and hope that implementing
your suggestions expresses our gratitude.
Printed: January 9, 2002 13:45 (1162)
Technical Support: support-us@accelrys.com
or support-eu@accelrys.com
Copyright (c) 1982-2002 Accelrys Inc.
A subsidiary of Pharmacopeia, Inc. All rights reserved.
Licenses and Trademarks: Wisconsin
Package is a trademark and GCG and the
GCG logo are registered trademarks of Accelrys Inc.
All other product names mentioned in this documentation may
be trademarks, and if so, are trademarks or registered trademarks of
their respective holders and are used in this documentation for
identification purposes only.
www.accelrys.com/bio