What's New in Version 10.0

Table of Contents

New Programs

Enhancements
Program Enhancements
SeqLab Enhancements
Package-Wide Enhancements
User Documentation Enhancements

Bug Fixes
Program Bug Fixes
SeqLab Bug Fixes
Package-Wide Bug Fix
User Documentation Bug Fix

Programs Removed from GCG  


New Programs

The programs listed below are new to Version 10.0 of Accelrys GCG (GCG).

Database Searching

New additions from Dr. William Pearson's FastA Version 3.0 family

SSearch searches sequence databases for similar sequences by means of a Smith-Waterman local alignment search. This is a rigorous search capable of identifying very weak similarity but is very CPU intensive. Although it has been optimized to run on multi-processor computers, it is one of the slower search programs available.

FastX searches a DNA query sequence against a protein database. FastX allows for matches across multiple reading frames and is therefore more sensitive than simply searching with a six-frame translation of the query sequence. It has been optimized to run on multi-processor computers.

TFastX searches a protein query against a DNA database. Like FastX, it allows for matches across multiple reading frames and is useful when searching EST data or other lower-quality DNA databases. It has been optimized to run on multi-processor computers.
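For readers unfamiliar with the six-frame translation mentioned in the FastX description, the following is a minimal Python sketch of that baseline technique. It is not the FastX or TFastX algorithm, which additionally allows an alignment to cross between reading frames. The function names are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (an assumption;
# other translation tables exist).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (dna, reverse)
        for offset in range(3)
    ]


frames = six_frames("ATGGCCTAA")
for frame in frames:
    print(frame)
```

Each of the six strings is a separate protein sequence. A frameshift in the DNA moves the true translation from one string to another, which is why a search that can follow the match across frames is more sensitive.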

Network retrieval of database entries from NCBI

NetFetch remotely retrieves sequences from NCBI. It accepts either a NetBLAST output file or a sequence accession number and saves the returned sequence(s) in an RSF file.

Gene Finding and Pattern Recognition

Motif recognition and searching

MEME accepts a set of unaligned sequences (DNA or protein) and identifies common motifs within them. These motifs are used to generate gapless profiles. You can use these profiles with MotifSearch to search other sequences for occurrences of these motifs. MEME is licensed from Dr. Timothy Bailey and the San Diego Supercomputer Center.

MotifSearch accepts one or more gapless profiles created by MEME and searches a database for sequences that match them. Because MotifSearch searches with gapless profiles and does not use a full dynamic programming comparison, it is considerably faster than ProfileSearch.

Editing and Publication

Publication Graphics for Multiple Sequence Alignments

PrettyBox accepts a multiple sequence alignment and creates a PostScript file containing output in the style of the program Pretty. PrettyBox adds grayscale highlighting based on sequence similarity. The output can then be printed to any PostScript-compatible printer. PrettyBox is not available within SeqLab.


Program Enhancements

Comparison

BestFit

Enhancement: When shuffling sequences using the -RANdomizations (SeqLab: "Generate statistics from randomized alignments") parameter in BestFit, you can now preserve the di- or tri-nucleotide sequence composition of the original sequence by specifying either -PREServe=2 or -PREServe=3 (SeqLab: "Randomized alignment preserving").

FrameAlign

Enhancement: FrameAlign now supports -INFRame (SeqLab: "Add gaps to restore the correct reading frame after frameshifts"). This parameter adds additional gaps to the alignment so that the aligned DNA sequence can be translated without frame shifts.

Enhancement: FrameAlign now supports -PENAlizedlength (SeqLab: "Don't penalize gap extensions longer than"). Using this parameter, alignments can contain large gaps without incurring large gap extension penalties. For instance, if you specify -PENAlizedlength=12, any gap longer than 12 is penalized the same as a gap of length 12. This parameter may be useful if you are aligning a protein sequence with the corresponding genomic DNA sequence containing large introns.
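The effect of capping the penalized gap length can be illustrated with a minimal Python sketch. The function name, the affine formula (creation plus extension times length), and the penalty values 8 and 2 are assumptions chosen for illustration and are not FrameAlign's documented defaults; only the capping behavior comes from the description above.

```python
def gap_penalty(length, creation, extension, penalized_length=None):
    """Affine gap penalty with an optional cap on the penalized length.

    The formula and parameter names are illustrative assumptions.
    """
    if length <= 0:
        return 0
    if penalized_length is not None:
        # Gaps longer than the cap are charged as if they had the cap length.
        length = min(length, penalized_length)
    return creation + extension * length


# Hypothetical penalties: creation 8, extension 2, cap at 12.
short_gap = gap_penalty(12, creation=8, extension=2, penalized_length=12)
intron_gap = gap_penalty(500, creation=8, extension=2, penalized_length=12)
uncapped_gap = gap_penalty(500, creation=8, extension=2)
print(short_gap, intron_gap, uncapped_gap)
```

With the cap, a 500-base gap such as an intron costs no more than a 12-base gap, so the alignment is not forced to avoid it.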

Gap

Enhancement: When shuffling sequences using the -RANdomizations (SeqLab: "Generate statistics from randomized alignments") parameter in Gap, you can now preserve the di- or tri-nucleotide sequence composition of the original sequence by specifying either -PREServe=2 or -PREServe=3 (SeqLab: "Randomized alignment preserving").

ProfileGap

Enhancement: ProfileGap no longer supports the -AVErage parameter. Studies have determined that this tended to degrade the performance of alignments. This is the same as specifying -NOAverage in Version 9.1.

Database Searching

BLAST

Enhancement: BLAST now uses the NCBI BLAST 2.0 algorithm. Some of the new capabilities of BLAST include gapped alignments, improvements to the statistics, and performance enhancements. The results generated from NCBI BLAST 2.0 will differ from those generated with NCBI BLAST 1.4 (as used in GCG Version 9.1). This is a major update that includes new parameters and a new database format.

Enhancement: Like many other programs in Version 10.0, BLAST now runs in multi-threaded mode with the -PROCessors= (SeqLab: "Number of processors to use for the search") parameter. This capability provides faster performance on computers with more than one processor. In general, we have seen the best improvements in the protein-to-DNA, DNA-to-protein, and translated DNA-to-DNA searches. To determine if your computer has more than one processor, ask your system manager.

Enhancement: BLAST now adds the list file attributes Begin and End to all output list files. These Begin and End attributes specify the locations of the alignments found in the subject (database) sequence. You can suppress these attributes using -NOFRAGments (SeqLab: "Save begin/end attributes in the result file").

Enhancement: When running BLAST you can now specify multiple query sequences, as well as multiple databases, in a single search.

Limitation: Due to NCBI restrictions, BLAST supports fewer symbol comparison matrices in Version 10.0 than in previous versions. The available matrices are BLOSUM45, BLOSUM62, BLOSUM80, PAM30, and PAM70. You cannot add new symbol comparison tables to use with BLAST.

Limitation: Version 9.1 data does not work with BLAST in Version 10.0 because of the different database format used by NCBI's Gapped BLAST 2.0. To use the new BLAST with Version 10.0, you must rebuild your old BLAST databases.

NetBLAST

Enhancement: NetBLAST now supports the -BATch parameter.

FastA / TFastA

Enhancement: FastA and TFastA are now based on Dr. William Pearson's FastA 3.0 package. There have been many fundamental changes to the statistics, general operation of the programs, and program output. Due to these changes, some statistical anomalies present in Version 9.1 are no longer present in Version 10.0.

Enhancement: Like many other programs in Version 10.0, FastA and TFastA now run in multi-threaded mode with the -PROCessor= (SeqLab: "Number of processors to use for the search") parameter. If you are running GCG on a computer with more than one processor, this parameter may greatly reduce your search time. To determine if your computer has more than one processor, ask your system manager.

Limitation: FastA and TFastA no longer add the list file attributes Begin and End to their output files. This limitation will be corrected in a future release.

FrameSearch

Enhancement: Like many other programs in Version 10.0, FrameSearch now runs in multi-threaded mode with the -PROCessor= (SeqLab: "Number of processors to use for the search") parameter. If you are running GCG on a computer with more than one processor, this parameter may greatly reduce your search time. To determine if your computer has more than one processor, ask your system manager.

Enhancement: FrameSearch now supports -INFRame (SeqLab: "Add gaps to restore the correct reading frame after frameshifts"). This parameter adds additional gaps to the alignment so that the DNA sequence can be translated without frame shifts.

Enhancement: FrameSearch now supports the -PENAlizedlength (SeqLab: "Don't penalize gap extensions longer than") parameter. Using this parameter, alignments can contain large gaps without incurring large gap extension penalties. For instance, if you specify -PENAlizedlength=12, any gap longer than 12 is penalized the same as a gap of length 12. This may be useful if you are aligning a protein sequence with the corresponding genomic DNA sequence containing large introns.

ProfileSearch

Enhancement: ProfileSearch can now search any number of database sequences. ProfileSearch calculates its statistics from a subset of the database sequences read and extrapolates onto the entire search set in a manner similar to FastA.

Enhancement: ProfileSearch no longer supports the -AVErage parameter. Studies have determined that this tended to degrade the performance of searches. This is the same as specifying -NOAverage in Version 9.1.

ProfileSegments

Enhancement: ProfileSegments no longer supports the -AVErage parameter. Studies have determined that this tended to degrade the performance of alignments. This is the same as specifying -NOAverage in Version 9.1.

DNA / RNA Secondary Structure

Enhancement: MFold now matches Dr. Michael Zuker's MFold Version 2.3. This includes separate energy tables for terminal mismatched bases in hairpin and interior loops. Also, the stacking tables have energies for non-canonical base pairs. The remaining energy tables have also been updated.

Enhancement: MFold can now predict single-stranded DNA folding. Use the DNA option to treat your input sequence as DNA rather than RNA. The energy tables for DNA folding are based on the work of Dr. John SantaLucia.


SeqLab Enhancements

Enhancement: SeqLab can now display output from several GCG programs as annotated features in the Editor. These programs create features by means of the -RSF (SeqLab: "Reformat sequences into an RSF output file") parameter, which creates an RSF file containing sequences that are annotated with analysis results. The following programs create feature results: CoilScan, FindPatterns, FrameSearch, HTHScan, Map, Motifs, MotifSearch, PeptideMap, PeptideStructure, Prime, and SPScan.

Enhancement: You can now change SeqLab's fonts directly from SeqLab within Options->Preferences->Fonts. Note that some changes in font size, such as switching from very small to very large fonts, may cause layout problems until you restart SeqLab.

Also note that this new font capability replaces the -SMAll and -LARge parameters, which are no longer available.

Enhancement: Instead of scrolling horizontally to view sequence data, you can now display sequences in "wrapped" or multi-line mode in the SeqLab Editor. A toggle button within the Editor labeled "Wrap" controls this. When wrapped, the horizontal scroll bar is disabled, and scrolling is accomplished by means of the vertical scroll bar only.

Enhancement: SeqLab displays a ruler under the last sequence loaded in the Editor. You can also insert a ruler line anywhere in a multiple sequence alignment by choosing File->New Sequence->Ruler.

Enhancement: The SeqLab Editor display properties were moved from the Options->Preferences dialog box to the main Editor window. There you will find the "Invert" toggle button, which controls reverse video, and a menu button labeled "Insert/Check/Overstrike," which controls the keyboard mode.

Enhancement: You can now specify a fill pattern for features by editing a feature from SeqLab's Features window or by specifying a fill pattern in feature.cols, located in GenRunData. If you choose to edit feature.cols, copy the file into your login directory before editing it. Fill patterns let you make features that appear semi-transparent, allowing you to see one feature through another.

Enhancement: You can now add list files directly from a directory to a search set when using search tools (such as FastA, FrameSearch, ProfileSearch, and related programs). Previously, you had to add them to your working list before you could add them to a search set.


Package-Wide Enhancements

Graphics Drivers

PNG

Portable Network Graphics (PNG) is a new graphics driver useful for creating graphics files compatible with most current web browsers. PNG files have many of the useful compression properties of the GIF(TM) format but do not require special licensing.

GIF

Enhancement: The GIF driver now supports three new parameters, -GIFWidth (SeqLab: "PNG or GIF image width"), -GIFHeight (SeqLab: "PNG or GIF image height"), and -GIFInterlace (SeqLab: not available). Previously, the GIF driver looked only for the symbols PlotWidth, PlotHeight, and GIFInterlace to control the output geometry.

Scoring Matrices

Enhancement: The default gap extension penalty was changed from 4 to 2 based on conversations with Dr. William Pearson.


User Documentation Enhancements

GCG now supplies its user documentation in Portable Document Format (PDF) files via FTP to licensed sites. You can view and print these files using Adobe® Acrobat® Reader, free multi-platform software that you can download from Adobe's web site (www.adobe.com).

Using the PDF files, you can produce high-quality copies of the following documentation: GCG User's Guide, SeqLab Guide, SeqLab Tutorial, Program Manual, and SeqWeb Guide. You can also purchase binders and tabs from GCG to make your locally-produced documents more appealing.

Contact GCG Customer Relations at (608) 231-5200 for a password to access the PDF files via our FTP server.

Note: With the availability of PDF files, GCG no longer offers WisPPr, the GCG Printing Files Service.


Program Bug Fixes

Database Searching

NetBLAST

Problem: If you ran NetBLAST simultaneously from two different login sessions (or Telnet windows), the second job overwrote a temporary file needed by the first, and the first job failed.

Update: Now you can safely run multiple NetBLAST jobs at the same time.
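The release notes do not say how the collision was fixed. As a general illustration of the problem, the following Python sketch shows one common way for concurrent jobs to avoid overwriting each other's scratch files: letting the operating system choose a unique name for each temporary file. The function names and the "netblast_" prefix are hypothetical.

```python
import os
import tempfile


def write_scratch(job_label):
    """Write job data to a uniquely named temporary file and return its path."""
    # mkstemp creates a new file with a name no other process is using.
    fd, path = tempfile.mkstemp(prefix="netblast_", suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(job_label)
    return path


def read_scratch(path):
    """Read back the contents of a scratch file."""
    with open(path) as handle:
        return handle.read()


path_a = write_scratch("query from session A")
path_b = write_scratch("query from session B")
contents = (read_scratch(path_a), read_scratch(path_b))

# Clean up the scratch files once both jobs are done.
for path in (path_a, path_b):
    os.remove(path)

print(path_a != path_b, contents)
```

Because each job receives its own file name, the second job cannot overwrite data that the first job still needs.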

Problem: If you put NetBLAST in the background, NetBLAST failed to complete.

Update: Now NetBLAST works when you use <Ctrl>Z to suspend execution, and bg to place the job in the background.

Problem: The smallest -EXPect (SeqLab: "Ignore hits that might occur more than how many times by chance alone") value you could specify was 0.001. If you specified a smaller value, then it was treated as 0.0.

Update: Now you can specify an -EXPect value as low as 0.0001, which is consistent with NCBI remote BLAST via the web.

Problem: If you ran NetBLAST from SeqLab on a DNA sequence, you could not use -FILter=dust (SeqLab: "Filter out: low complexity DNA regions").

Update: Now you can use -FILter=dust with NetBLAST in SeqLab.

FrameSearch

Problem: If you ran FrameSearch on a single query sequence and used -Default, the default figure output was named "framesearch.figure," instead of query_name.figure as documented.

Update: Now FrameSearch saves figure files as documented.

Problem: If you ran FrameSearch on multiple query sequences, the graphic output was not automatically sent to Figure files as documented; instead, the program tried to send graphics to your current graphics output device.

Update: Now FrameSearch automatically sends graphics output to Figure files when run with multiple query sequences.

FindPatterns

Problem: In some instances, short patterns might not find matches to the last character of a sequence. For example, the pattern (G,A) (that is, "G" or "A") did not find the second match within the sequence "GA." This problem occurred only if the pattern itself resolved to a single character.

Update: Now short patterns can match the last character of a sequence.
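The expected behavior for the example above can be sketched in Python. This is an illustration only and does not reproduce FindPatterns itself: it assumes that a single-position choice such as (G,A) corresponds to a regular-expression character class, and the function name is hypothetical.

```python
import re


def find_matches(choices, sequence):
    """Return the 1-based positions where any one character of choices occurs."""
    regex = re.compile("[" + re.escape(choices) + "]")
    return [match.start() + 1 for match in regex.finditer(sequence)]


# The pattern (G,A) searched against the two-base sequence GA.
positions = find_matches("GA", "GA")
print(positions)
```

A correct search reports a match at both positions, including the final character of the sequence.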

DNA / RNA Secondary Structure

Problem: When running MFold from SeqLab, you could not save the MFold output file. You had to rerun MFold to display the output with different display options in PlotFold.

Update: Now you can save this MFold output file within SeqLab and can run PlotFold multiple times without rerunning MFold.

Editing and Publication

Assemble

Problem: If you ran Assemble simultaneously from two different login sessions (or Telnet windows), the second job overwrote a temporary file needed by the first, and the first job failed.

Update: Now you can safely run multiple Assemble jobs simultaneously.

Evolution

GrowTree

Problem: In some circumstances, GrowTree created very short tree plots, making it difficult to distinguish different branch lengths.

Update: Now GrowTree consistently scales the plotted tree to fill the plotter page/window.

PAUPDisplay

Problem: In some circumstances, PAUPDisplay created very short tree plots, making it difficult to distinguish different branch lengths.

Update: Now PAUPDisplay consistently scales the plotted tree to fill the plotter page/window.

Fragment Assembly

GelAssemble

Problem: In some situations, GelMerge created a contig that was larger than 100,000 by merging two contigs whose individual sizes were less than 100,000. GelAssemble, however, could not edit these contigs.

Update: Now GelAssemble can handle contigs as large as 200,000 bases.

Gene Finding and Pattern Recognition

CodonFrequency

Problem: If you ran CodonFrequency simultaneously from two different login sessions (or Telnet windows), the second job overwrote a temporary file needed by the first, and the first job failed.

Update: Now you can safely run multiple CodonFrequency jobs simultaneously.

Importing and Exporting

BreakUp

Problem: BreakUp did not work on a sequence file larger than 2 MB.

Update: Now BreakUp works with arbitrarily large files; the only limiting factor is the amount of memory available.

Protein Analysis

HTHScan

Problem: If you used the Korn shell, HTHScan did not work.

Update: You can use Korn shell or C shell to run HTHScan in Version 10.0.

PeptideSort

Problem: The extinction coefficient as calculated in PeptideSort was slightly incorrect, due to an error in the Gill and von Hippel paper. This error was corrected in an erratum published in Analytical Biochemistry 189: 283 (1990).

Update: PeptideSort has been updated to correct the calculation.

Isoelectric

Problem: For large proteins, the charge values for each amino acid in the text output report were not separated by spaces, resulting in a formatting problem. There were no problems with the actual calculation of the charge values.

Update: Now Isoelectric reports charge values to the nearest 0.1 instead of the nearest 0.01 so that the numerical values do not run together.

Moment

Problem: If you specified -NCONtours=10 (SeqLab: "Number of contours to plot"), Moment sometimes did not run to completion.

Update: Now you can specify -NCONtours=10, and the program runs to completion.

Translation

Translate

Problem: If you ran Translate simultaneously from two different login sessions (or Telnet windows), the second job overwrote a temporary file needed by the first, and the first job failed.

Update: Now you can safely run multiple Translate jobs at the same time.

Utilities

Simplify

Problem: If you used Simplify on a DNA sequence, Simplify created a protein sequence as output.

Update: Now if you use Simplify on a DNA sequence, it creates a DNA sequence as output.

Reformat

Problem: When reformatting a sequence to an RSF file using the -RSF (SeqLab: "Reformat sequences into an RSF output file") parameter, all reference information was lost from the original sequence.

Update: Now Reformat copies the reference information into the comments section of the RSF file and (if appropriate) converts them into sequence features.


SeqLab Bug Fixes

Problem: When printing from the SeqLab Editor, you could control the number of characters being printed on the width of the page. However, the width measurement you provided took into account the sequence name as well as the number of columns of sequence residues. Because the length of the sequence name was taken into consideration, it was difficult to get printouts of exactly 60 columns of sequence text.

Update: Now when printing from the SeqLab Editor, you control exactly how many columns of sequence text are displayed. You can also specify the left margin of your printout.

Problem: When printing from the SeqLab Editor, if your display fit on a single page, SeqLab created an Encapsulated PostScript (EPS) file. EPS files are used primarily for importing graphics into other software, and this file did not print on many PostScript-compatible printers.

Update: Now SeqLab uses EPS only when the new EPS toggle is selected. This toggle becomes active when printing a single page to a PostScript file and not directly to a printer.

Problem: If you created alternative output files when running programs from SeqLab (for example, the option to create a paired output file from GapShow), the default name of the output file was derived from the name of the selected sequence. On subsequent runs during the same session, the first sequence name was reused even if you selected a new sequence.

Update: Now the default name is reset for each run and always matches the name of the selected sequence.

Problem: SeqLab did not allow two sequences to have the same name in the Editor. When you attempted to load a second sequence with the same name, SeqLab renamed the original sequence by appending a number to the end of the name, and then loaded the second copy with the original name.

Update: Now SeqLab leaves the original sequence name intact and modifies the name of the second sequence by appending a number to the end of the name.
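The new naming rule can be sketched in a few lines of Python. The function name and the exact suffix format (an underscore followed by a counter) are assumptions; the release notes state only that a number is appended to the name of the second sequence.

```python
def unique_name(name, existing):
    """Return name unchanged if it is unused; otherwise append the first free number."""
    if name not in existing:
        return name
    counter = 1
    # The "_<number>" suffix format is a hypothetical choice for this sketch.
    while f"{name}_{counter}" in existing:
        counter += 1
    return f"{name}_{counter}"


loaded = []
for incoming in ["hsp70", "hsp70", "actin", "hsp70"]:
    loaded.append(unique_name(incoming, set(loaded)))
print(loaded)
```

The first sequence loaded keeps its original name, and only later arrivals with the same name are renamed.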

Problem: In some circumstances, if you attempted to view a sequence's attributes, SeqLab incorrectly displayed a message indicating you modified attributes. This happened in Main List mode, either by going to Edit->Sequence Attributes or by double clicking on the sequence itself and then immediately clicking OK.

Update: Now no message appears unless you made a change to the sequence's attributes.

Problem: If you selected the Overstrike option in the SeqLab Editor, you could not overstrike the first character in a sequence.

Update: When you select the Overstrike option, you can now overstrike the first and subsequent characters in a sequence.

Problem: If you loaded a large number of single sequence files at one time (for example, 200) into the Editor, SeqLab could not access your computer's hard disk to save or load files. This did not occur when working with database sequence files, RSF files, or MSF files.

Update: Now you can load large numbers of single sequence files without problem.

Problem: When using the Database Browser on a sequence from a division with a name longer than 12 characters (SP_Unclassified for example), SeqLab sometimes stopped working.

Update: SeqLab now works with databases and database divisions with names of any length.

Problem: If you edited a sequence feature in the Editor and then switched to Main List mode, SeqLab did not prompt you to save your changes, and the edited feature was lost. This was not a problem if you had edited the sequence.

Update: Now SeqLab recognizes the modified feature and prompts you to save your changes.

Problem: If you selected a Mask sequence in the Editor and switched to Main List mode, SeqLab applied the mask to the other sequences when you saved the file.

Update: Now SeqLab correctly saves sequences when you switch between modes, even when a mask sequence is selected.

Problem: You could not expand more than five RSF files within SeqLab's Main List mode.

Update: Now you can expand any number of RSF files in the Main List.

Known Bug: (Solaris only) On some X servers, the SeqLab Editor uses a proportionally-spaced font instead of a fixed-width font, which results in gaps appearing in the text display. This problem has been seen when you select the font named "application (dt)" in the Editor, but may occur in other situations as well.

Work Around: From the Options menu, select Preferences->Fonts. Find the "Edit Mode" scroll menu toward the bottom of the dialog box, and choose any font other than "application (dt)."


Package-Wide Bug Fix

Scoring Matrices

Problem: The default gap creation penalty in the BLOSUM62 scoring matrix (12) was incorrect.

Update: The default gap creation penalty in the BLOSUM62 scoring matrix has been changed from 12 to 8. This change was prompted by a re-reading of the Henikoff and Henikoff paper, in which the gap creation penalty of 12 includes the initial gap extension penalty. Because GCG alignment implementations (Gap, BestFit, and PileUp) add the initial gap extension penalty separately to the gap creation penalty, our gap creation penalty has been set to their gap creation penalty (12) minus their gap extension penalty (4), which gives 8. Note that programs in the FastA family (FastA, TFastA, FastX, TFastX, and SSearch) use built-in gap penalties and not the penalties listed in the scoring matrix.
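The arithmetic behind this change can be checked with a short Python sketch. The two cost functions are one reading of the paragraph above, not code from GCG: in the first convention the penalty of 12 already covers the first gap position, and in the second every gap position, including the first, is charged the extension penalty. An extension penalty of 4 is used in both so that the two conventions can be compared directly.

```python
def henikoff_cost(length, gap=12, extend=4):
    """Gap cost when the opening penalty already covers the first position."""
    return gap + extend * (length - 1)


def gcg_cost(length, creation=8, extend=4):
    """Gap cost when every position, including the first, pays the extension."""
    return creation + extend * length


# The two conventions agree for every gap length when creation = 12 - 4 = 8.
for length in (1, 2, 5, 10):
    assert henikoff_cost(length) == gcg_cost(length)
    print(length, henikoff_cost(length), gcg_cost(length))

# With the old creation penalty of 12, every gap was overcharged by 4.
overcharge = gcg_cost(1, creation=12) - henikoff_cost(1)
print(overcharge)
```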


User Documentation Bug Fix

Known Bug: The printed version of the Program Manual does not include the "Version 2.0 Profiles" section at the end of Appendix VII. That section describes the format of the profiles created by the program MEME and processed by the program MotifSearch.

Work Around: See the online help for Appendix VII; it includes the "Version 2.0 Profiles" section. Or, type the following commands to print an updated version of Appendix VII, which includes the "Version 2.0 Profiles" section.

% to Program_Manual
% red appendix_vii.red


Programs Removed from GCG

Program         Reason for Removal

GENESEQToGCG    GENESEQ is now distributed in GCG format, and the older GENESEQ format is no longer valid.
FoldRNA         Superseded by MFold
Squiggles       Incorporated within PlotFold
Circles         Incorporated within PlotFold
Domes           Incorporated within PlotFold
Mountains       Incorporated within PlotFold



Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio