What's New in Version 10.2


Table of Contents

New Programs

Enhancements
Program Enhancements
Package-Wide Enhancements

Bug Fixes
Program Bug Fixes
SeqLab Bug Fixes
Package-Wide Bug Fixes
User Documentation Bug Fix


New Programs


The programs listed below are new to version 10.2 of Accelrys GCG (GCG).

Multiple Sequence Comparison

Integration of Dr. Sean Eddy's HMMER software

HmmerAlign aligns one or more sequences to a profile Hidden Markov Model (HMM), which provides a very efficient way to align a large number of sequences. First, create a "seed" alignment from a small number of representative sequences (with PileUp, for example). Next, use HmmerBuild to create a profile HMM from this small alignment. Finally, use HmmerAlign to quickly align any number of additional sequences to the profile HMM, which represents the "seed" alignment.

HmmerBuild creates a profile HMM from a group of aligned sequences. A profile HMM is a statistical model of the consensus of a multiple sequence alignment. It can be used for sensitive database searching, quickly aligning large numbers of sequences, and emitting random sequences that match the profile HMM.

Database Searching

Integration of Dr. Sean Eddy's HMMER software

HmmerCalibrate calibrates a profile HMM. This is important when using the profile HMM for database searching, since a calibrated profile HMM will make the search more sensitive. It has been optimized to run on multi-processor computers.

HmmerPfam searches a query sequence against a profile HMM database to identify known domains in the query sequence. The searched profile HMM database can be either public, such as Pfam, or a profile HMM database that you build. It has been optimized to run on multi-processor computers.

HmmerSearch searches a profile HMM against a sequence database to find sequences similar to the sequence family represented by the profile HMM. This is a compute-intensive search capable of finding a distantly related homolog. Although it has been optimized to run on multi-processor computers, it is one of the slower search programs in GCG.

Primer Selection

PrimePair determines the compatibility of primers with each other, independent of a template sequence. You can provide the primers in files, or you can type them in when you run the program.

MeltTemp computes the melting temperature of oligonucleotides. You can provide the oligos in a file or type them in when you run the program.

Sequence Utilities

Integration of Dr. Sean Eddy's HMMER software

HmmerEmit randomly generates sequences that match a profile HMM or creates a single majority-rule consensus sequence for a profile HMM.

Miscellaneous Utilities

Integration of Dr. Sean Eddy's HMMER software

HmmerConvert converts profile HMMs into different file formats. HmmerConvert can convert between binary and ASCII profile HMMs, add a profile HMM to a profile HMM database, and change a profile HMM into a Gribskov profile.

HmmerFetch retrieves a single profile HMM from a profile HMM database, such as Pfam. The profile HMM database needs to be indexed by HmmerIndex.

HmmerIndex creates an index for a profile HMM database so that profile HMMs can be retrieved from the database with HmmerFetch.


Program Enhancements


Database Searching

BLAST

Enhancement: BLAST now uses NCBI's BLAST version 2.1.3. GCG version 10.0 shipped with NCBI's BLAST version 2.0.5. GCG version 10.1 shipped with NCBI's BLAST version 2.0.10.

Enhancement: Two new translation tables can now be used with the parameters -TRANSlate and -DBTRANSlate. The new tables are 22 (Scenedesmus obliquus mitochondrial code) and 23 (Thraustochytrium mitochondrial code).

Enhancement: The parameter -VIEW=7 can be used to generate a BLAST report in XML format.

NetBLAST

Enhancement: Two new translation tables can now be used with the parameter -TRANSlate. The new tables are 22 (Scenedesmus obliquus mitochondrial code) and 23 (Thraustochytrium mitochondrial code).

FindPatterns

Enhancement: A new command line parameter, -LIStfile, can now be used to produce list file output, in addition to the normal output file. This parameter replaces -NAMes, which will be removed in a future release. To ensure compatibility with earlier versions, -LIStfile has no effect if -NAMes is also specified.

Motifs

Enhancement: A new command line parameter, -LIStfile, can now be used to produce list file output, in addition to the normal output file. This parameter replaces -NAMes, which will be removed in a future release. To ensure compatibility with earlier versions, -LIStfile has no effect if -NAMes is also specified.

LookUp

Enhancement: LookUp now searches up to 24 libraries. Previously, LookUp could search no more than 15 libraries, which could be a problem for sites that incorporated their own databases into GCG. When more than 15 libraries are listed, the "quit" item moves to "z" to make room in the library listing.

Enhancement: LookUp now supports NCBI's RefSeq database.

DNA/RNA Secondary Structure

MFold, OldMFold

Enhancement: The units kcal/mole are now displayed whenever minimum energy values appear in prompts or in the output.

Importing and Exporting

FromTrace

Enhancement: Previously, FromTrace would terminate with the error message "Error: failed to read 0" when you ran it on a large number of trace files including an erroneous file. Now FromTrace prints a more informative error message that includes the name of the bad file, "Error in FILE: failed to read0", and then continues to translate the rest of the trace files.

Enhancement: The FromTrace output file now includes the trace file version number.

Primer Selection

Prime

Enhancement: Prime now supports -PRIMERSF (SeqLab: "Select forward primers from file") and -PRIMERSR (SeqLab: "Select reverse primers from file"). The behavior of these parameters is similar to the behavior of parameter -PRImers, except that primers input with parameter -PRIMERSF are only considered for forward primers, and primers input with parameter -PRIMERSR are only considered for reverse primers. You can use these two parameters together or individually. When used individually, Prime searches for the opposite primers in the template sequence. These new parameters -PRIMERSF and -PRIMERSR are only effective if the parameter -PRImers is NOT specified.

Enhancement: Prime now supports -RELAx (SeqLab: "Ignore most of the constraints set by default"). This parameter relaxes most of the constraints that are set by default (for example, CLAmp and TMMINPRImer). Additionally, Prime now recognizes the prefix NO on these parameters, so you can relax each constraint individually.

Enhancement: Prime now supports -SORtbyta (SeqLab: "Sort the output list of products by their annealing temperature"). This parameter outputs the products found in increasing order of their annealing temperature, instead of the default order, which is by increasing annealing score. However, it is important to note that the products saved are still the same; that is, the products saved are those products with the lowest annealing scores.
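
The distinction between which products are saved and how they are ordered can be sketched as follows. This is a minimal illustration, not Prime's implementation: the function name `report`, the tuple layout, and all of the numbers are made up for the example.

```python
# Hypothetical products: (name, annealing_score, annealing_temperature_C).
# All values are invented for illustration only.
products = [("p1", 12, 58.0), ("p2", 5, 61.5), ("p3", 9, 55.2), ("p4", 20, 50.1)]


def report(products, keep, sort_by_ta=False):
    """Keep the `keep` lowest-scoring products, then order them for output."""
    # Selection always uses the annealing score.
    kept = sorted(products, key=lambda p: p[1])[:keep]
    if sort_by_ta:
        # Only the output order changes, not the set of products kept.
        kept.sort(key=lambda p: p[2])
    return [p[0] for p in kept]


print(report(products, 3))                   # ['p2', 'p3', 'p1']
print(report(products, 3, sort_by_ta=True))  # ['p3', 'p1', 'p2']
```

Both calls return the same three products; only their order differs.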

Enhancement: Prime now supports -FOUndprimers (SeqLab: "Save primers found to a pattern file"). This parameter saves a list of each primer found, or of each primer involved in one or more PCR products, to a data file. The format of this file is the same as the enzyme data files and, therefore, can be used as input to programs such as Prime, PrimePair, and FindPatterns.

Protein Analysis

PeptideSort

Enhancement: PeptideSort has a new parameter, -SHOwseq, which displays the relevant peptide fragments, sorted by position, in the table of cleavage products.

Translation

Translate

Enhancement: Translate has a new parameter -RSF, which sends all output to a single RSF file rather than to separate GCG-formatted sequence files.

Enhancement: Translate has a new parameter -OPEn[=20], which produces translations of open reading frames. By default, the resulting peptide sequences must contain at least 20 amino acids to be reported; however, you can specify a minimum ORF length from 1 to 32,000 amino acids. When you specify -OPEn, the -RSF parameter is automatically selected.
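
As a rough illustration of the minimum-length filter, the sketch below splits an already-translated frame at stop codons and keeps only stretches of at least `min_len` residues. The function name `open_frames`, the use of `*` for stops, and the stop-to-stop definition of an open reading frame are assumptions made for this example; they are not taken from Translate itself.

```python
def open_frames(protein, min_len=20):
    """Return stop-free stretches of `protein` with at least `min_len` residues.

    Assumes stop codons are written as '*' (an assumption for this sketch).
    """
    if not 1 <= min_len <= 32000:
        raise ValueError("min_len must be between 1 and 32000")
    return [seg for seg in protein.split("*") if len(seg) >= min_len]


# A made-up translated frame with stretches of 3, 25, and 19 residues.
frame = "MKT*" + "A" * 25 + "*" + "G" * 19
print(open_frames(frame))      # only the 25-residue stretch passes the default
print(open_frames(frame, 3))   # lowering the minimum keeps all three stretches
```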

Enhancement: Translate now supports the -SUMmary parameter. Previously, this parameter had no effect.

Utilities

BreakUp

Enhancement: BreakUp now processes files containing lines as long as 350,000 characters. You no longer need to run ChopUp on a file prior to running BreakUp.

Reformat

Enhancement: Reformat now processes files containing lines as long as 350,000 characters. You no longer need to run ChopUp on a file prior to running Reformat.


Package-Wide Enhancements


Translation Tables

Enhancement: New tables for the Scenedesmus obliquus and Thraustochytrium mitochondrial codes are available for any program that supports the -TRANSlate parameter. The two new tables are located in the files GenMoreData:transl_table_22.txt and GenMoreData:transl_table_23.txt, respectively.
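
For orientation, the sketch below lists how these two codes differ from the standard code in their codon assignments, following NCBI's published genetic code tables. It is not a reading of the GCG table files, whose format is not described here; the names `DIFFERENCES`, `STANDARD`, and `codon_meaning` are hypothetical, and differences in start codons are not covered.

```python
# Codon reassignments relative to the standard code, per NCBI's genetic code
# tables ('*' marks a stop codon). Only the differing codons are listed.
DIFFERENCES = {
    22: {"TCA": "*", "TAG": "L"},  # Scenedesmus obliquus mitochondrial
    23: {"TTA": "*"},              # Thraustochytrium mitochondrial
}

# Standard-code meanings of the three codons involved.
STANDARD = {"TCA": "S", "TAG": "*", "TTA": "L"}


def codon_meaning(codon, table):
    """Look up one of the three codons above under table 22 or 23."""
    return DIFFERENCES[table].get(codon, STANDARD[codon])


for table in (22, 23):
    print(table, {c: codon_meaning(c, table) for c in STANDARD})
```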

Scoring Matrices

Enhancement: The BLOSUM matrices in the files GenRunData:blosum62.cmp and GenRunData:blosum50.cmp were revised. The older versions of these matrices were moved to the files GenMoreData:oldblosum62.cmp and GenMoreData:oldblosum50.cmp, respectively.


Program Bug Fixes


Database Searching

BLAST

Known bug: If the -VIEW parameter is specified with TBLASTX or BLASTX searches, BLAST stops and outputs an ambiguous error message of the form:

[gcg_blastall] FATAL ERROR: This option is not available with blastx

Known bug: When a TBLASTN search is performed with -VIEW set to values of 1 through 6, the program may display inappropriate error messages of the form:

[gcg_blastall] ERROR: ncbiapi [000.000] SeqPortNew: lcl|GB_SY:AF084443
stop(473) >= len(459)

MotifSearch

Problem: If you ran MotifSearch on a long sequence and specified a high hit threshold such that you found several thousand hits, the program could quit unexpectedly.

Update: MotifSearch can now handle over 1 billion hits. If this limit is exceeded, MotifSearch displays an error message before quitting.

FastA, TFastA, FastX, TFastX, SSearch

Problem: If all of the database sequences had identical scores when compared to the query sequence, a divide-by-zero error occurred when programs such as FastA, TFastA, FastX, TFastX, and SSearch attempted to calculate the z-scores and expectation values. This caused the program to terminate on Compaq Tru64 UNIX and produced expectation values of NaN on Solaris and IRIX.

Update: FastA-family programs now test whether all the database sequences have the same score; if so, the z-scores and expectation values are not computed.
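
The kind of guard described here can be sketched as below. This is a simplified illustration that uses a plain mean and standard deviation; the FastA programs' actual statistics are more involved, and the function name `z_scores` is hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Return z-scores, or None when every score is identical (zero spread)."""
    spread = pstdev(scores)
    if spread == 0:
        # Identical scores: dividing by the spread would be a divide-by-zero,
        # so skip the calculation instead.
        return None
    mu = mean(scores)
    return [(s - mu) / spread for s in scores]


print(z_scores([40, 40, 40]))  # None
print(z_scores([30, 40, 50]))
```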

Motifs

Problem: Lines in output list files could be preceded by varying numbers of blanks.

Update: Lines in list files are now left-justified.

FindPatterns

Problem: Lines in output list files could be preceded by varying numbers of blanks.

Update: Lines in list files are now left-justified.

Editing and Publication

PrettyBox

Problem: If you lowered the threshold for the consensus sequence, the consensus character predicted might not be highlighted.

Update: When a lowered threshold for the consensus sequence leads to a predicted consensus character, the consensus character is properly highlighted.

Evolution

PAUPDisplay

Problem: If a sequence name used as input to PAUPSearch contained both hyphens and underscores, PAUPSearch wrote the name into the output file in a form that PAUPDisplay could not read.

Update: PAUPSearch now puts single quotes around sequence names containing both hyphens and underscores when converting GCG file specifications into NEXUS format so that PAUPDisplay can read the names properly.

PAUPSearch

Problem: If a sequence name used as input to PAUPSearch contained both hyphens and underscores, PAUPSearch wrote the name into the output file in a form that PAUPDisplay could not read.

Update: PAUPSearch now puts single quotes around sequence names containing both hyphens and underscores when converting GCG file specifications into NEXUS format so that PAUPDisplay can read the names properly.

Gene Finding and Pattern Recognition

CodonFrequency

Problem: If you ran CodonFrequency on a large data set, the totals for each codon would only advance to 16,777,216, and the reported frequencies were not correct.

Update: CodonFrequency can now handle individual totals over 9 x 10^15.
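
The release note does not say why the totals stalled. One plausible explanation, offered here purely as an assumption, is that the counts were held in single-precision floating point: 16,777,216 is 2^24, the point beyond which a single-precision value can no longer be incremented by 1, and 2^53 (about 9.007 x 10^15) is the corresponding point for double precision. The sketch below demonstrates that arithmetic; the helper `to_float32` is a hypothetical name.

```python
import struct


def to_float32(x):
    """Round a number to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


limit = 2 ** 24                                # 16,777,216
print(to_float32(limit + 1) == limit)          # True: the increment is lost
print(float(2 ** 53) + 1 == float(2 ** 53))    # True: same effect for doubles
```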

FindPatterns

Problem: If you specified a pattern that had an OR expression and allowed matching of zero occurrences of this OR expression, this pattern would not match sequences that had zero occurrences of the expression. For example, the pattern CA(W,T){0,3}DR (match CA followed by any combination of W or T from 0 to 3 times, followed by DR) would not match the sequence CADR.

Update: FindPatterns now supports matching of zero occurrences of an OR expression.
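
To show what matching zero occurrences means, the example pattern can be approximated with a Python regular expression. This is an illustrative translation, not how FindPatterns is implemented; it expands the IUPAC codes W (A or T), D (A, G, or T), and R (A or G), and it tests concrete bases rather than a sequence that itself contains ambiguity symbols.

```python
import re

# IUPAC nucleotide ambiguity codes used in the example pattern.
IUPAC = {"W": "[AT]", "D": "[AGT]", "R": "[AG]"}

# CA(W,T){0,3}DR written as a regular expression (illustrative translation).
pattern = re.compile(
    "CA(?:%s|T){0,3}%s%s" % (IUPAC["W"], IUPAC["D"], IUPAC["R"])
)

for seq in ("CAAG", "CATTAG", "CACAG"):
    print(seq, bool(pattern.fullmatch(seq)))
```

Here CAAG matches with zero occurrences of the OR expression, CATTAG matches with two, and CACAG does not match because C cannot satisfy D.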

Problem: Lines in output list files could be preceded by varying numbers of blanks.

Update: Lines in list files are now left-justified.

Terminator

Problem: If you specified primary or secondary thresholds interactively, Terminator ignored these values and used the defaults.

Update: You can now specify thresholds interactively with Terminator.

Motifs

Problem: If you gave Motifs a nucleotide sequence, it treated it as a protein sequence, and produced meaningless results.

Update: Now Motifs skips nucleotide sequences and only processes proteins.

Problem: Lines in output list files could be preceded by varying numbers of blanks.

Update: Lines in list files are now left-justified.

Importing and Exporting

FromTrace

Problem: If you ran FromTrace with a file name longer than 80 characters as input and specified the -RSF parameter, the descrip line in the RSF output file possibly contained part of the file name.

Update: FromTrace now handles long input file names properly.

FromFastA

Problem: If you ran FromFastA on a sequence longer than 350,000 bases, and the name of that sequence exceeded 80 characters, FromFastA would quit unexpectedly.

Update: FromFastA now correctly reformats the sequence, splitting it into multiple files as needed.

Problem: If you ran FromFastA on a sequence much longer than 350,000 bases, and that sequence was split into 20 or more files, the sequence names would be padded with extra zeroes and the program could quit unexpectedly.

Update: FromFastA now correctly splits the sequence into properly named files.

FromGenBank

Problem: If you specified an output directory for FromGenBank and that directory did not exist, the program could not create the output file, but it did not display any error messages.

Update: If FromGenBank cannot create the output file, it now displays an error message.

Multiple Comparison

PileUp

Update: Begin and end ranges for sequences listed in a file of sequence names (a GCG list file) are no longer recognized by PileUp if they are listed in the old file format:
myseq.pep /begin 1 /end 100

If you want to specify begin and end ranges for a sequence in a file of sequence names, the ranges must be in the list file format:
myseq.pep Begin: 1 End:100
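
If you have list files that still use the old format, a small script can rewrite the ranges. The sketch below handles only the exact old form shown above; the function name `convert_line` is hypothetical, and the spacing it writes after each colon is an assumption about what the list file parser accepts. The same conversion applies to list files used with ProfileMake.

```python
import re

# Matches the old-style range shown above: "/begin N /end M" (case-insensitive).
OLD_RANGE = re.compile(r"/begin\s+(\d+)\s+/end\s+(\d+)", re.IGNORECASE)


def convert_line(line):
    """Rewrite an old-style range on one list-file line; leave other lines alone."""
    return OLD_RANGE.sub(r"Begin: \1 End: \2", line)


print(convert_line("myseq.pep /begin 1 /end 100"))  # myseq.pep Begin: 1 End: 100
```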

ProfileMake

Update: Begin and end ranges for sequences listed in a file of sequence names (a GCG list file) are no longer recognized by ProfileMake if they are listed in the old file format:
myseq.pep /begin 1 /end 100

If you want to specify begin and end ranges for a sequence in a file of sequence names, the ranges must be in the list file format:
myseq.pep Begin: 1 End:100

Pairwise Comparison

Gap

Problem: Gap always printed a summary when run without -DEFAULT, even if you used the -NOSUMmary parameter.

Update: Using -NOSUMmary now suppresses the summary.

Problem: If you gave Gap a sequence file that contained no sequence data, the program ran indefinitely or halted unexpectedly with a system error.

Update: Gap now informs you if one of your sequence files contains no sequence data.

BestFit

Problem: If you gave BestFit a sequence file that contained no sequence data, the program ran indefinitely or halted unexpectedly with a system error.

Update: BestFit now informs you if one of your sequence files contains no sequence data.

Problem: BestFit always printed a summary when run without -DEFAULT, even if you used the -NOSUMmary parameter.

Update: Using -NOSUMmary now suppresses the summary.

Primer Selection

Prime

Problem: If you ran Prime and selected a range of the input sequence to be analyzed, the resulting RSF file would contain only the selected range of the input sequence, but the feature positions still referred to the entire input sequence. Therefore, the features listed in the RSF file would not map onto the sequence in the RSF file.

Update: The RSF file now contains the entire input sequence and the feature positions are correct.

Protein Analysis

PeptideSort

Known Bug: Molecular weight calculations are incorrect for sequences containing one or more X characters.

SPScan

Problem: The McGeoch scan could fail with input sequences containing lowercase characters.

Update: The presence of lowercase characters no longer causes the McGeoch scan to fail.

Utilities

BreakUp

Problem: If you ran BreakUp on a file with a name that had no extension and used a relative path containing a period, BreakUp would fail.

Update: BreakUp now correctly processes files with names using relative paths and containing periods.

Reformat

Known Bug: If you run Reformat on a sequence-only text file to try to convert it into an RSF file, for example:
% reformat seq.txt -rsf
the program will terminate with the following error message: "*** ERROR: can't read sequence" and produce an empty RSF file.

Workaround: You can use a list file intermediate, for example:
% reformat seq.txt -list
% reformat @reformat.list -rsf

GCGToBLAST

Problem: U (selenocysteine) residues were removed from the protein sequences during formatting.

Update: All occurrences of the character U in protein sequences are now replaced with the character X.

Problem: It was not possible to set a BLAST database volume size greater than 500 million.

Update: You can now specify volume sizes of up to 2 billion sequence characters.

Translation

Translate

Problem: The -SUMmary parameter was not recognized by Translate.

Update: Translate now supports the -SUMmary parameter.


SeqLab Bug Fixes


Problem: If you selected an aligned group of sequences in the editor and used them as input to a program that produces RSF output, the alignment in the RSF output file could be distorted.

Update: Alignments are now preserved in the RSF output file.

Problem: If you ran SeqLab from a shell not derived from csh, which did not set the USER environment variable, it failed to start and displayed the message "The server had a startup problem:"

Update: SeqLab's server no longer relies on the USER environment variable.

Problem: The CoilScan and FrameSearch programs often produced empty figure files in the output manager.

Update: These programs no longer generate empty figure files.

Programs Run Through SeqLab

Prime

Problem: If you ran Prime in SeqLab and specified a PCR target range, the -INClude parameter would be recalculated depending on the -MAXPROduct parameter.

Update: Prime in SeqLab now behaves correctly, as on the command line, and resets the -MINPROduct and -MAXPROduct parameters to be in agreement with the -INClude value, which by default is always set to 100%.

LookUp

Problem: In the options menu for LookUp, you could select to show the ID field of the annotation instead of the correct option, NAMe.

Update: The options menu now includes NAMe, which replaced the ID token for LookUp.

PAUPDisplay

Problem: When using SeqLab, you could not set a value for the shape parameter when using the gamma distribution option with the maximum likelihood criterion.

Update: The shape parameter is now available when running PAUPDisplay under SeqLab.

BLAST

Known Bug: If you selected a region of a sequence from the main list and used it as a BLAST query sequence, the full length of the sequence was used.

Workaround: Use the SeqLab Editor to select begin and end ranges of the BLAST query sequence.


Package-Wide Bug Fixes


Support for Selenocysteine

Limitation: Limited support for selenocysteine residues in protein sequences.

GenBank, along with the DNA Data Bank of Japan (DDBJ) and the EMBL Data Library, allows the use of selenocysteine (the letter U) in the translations that appear in the Features Table of their entries. Since the GenPept protein database is derived from this Features Table specification, the letter U appears for selenocysteine residues in some GenPept sequences.

Currently, the letter U is not represented in scoring matrices and in other resource files used by many GCG programs.

Since the research community has yet to adopt a uniform way to treat selenocysteines while performing various kinds of sequence analysis, protein sequences that contain the letter U should not be used with GCG programs. Accelrys recommends that all occurrences of the letter U in protein sequences be replaced with the letter X (that is, "unknown"), as is done for the version of GenPept distributed with the Database Update Service.
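
The recommended replacement is simple to apply before a sequence is passed to a program. The sketch below is one way to do it; the function name `mask_selenocysteine` is hypothetical, and handling lowercase u as well is an assumption beyond what the recommendation states.

```python
def mask_selenocysteine(protein):
    """Replace selenocysteine (U/u) with X/x, preserving case."""
    return protein.translate(str.maketrans({"U": "X", "u": "x"}))


print(mask_selenocysteine("MKUAGuL"))  # MKXAGxL
```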

Please contact Accelrys Bioinformatics Support if you require additional information concerning support for selenocysteine residues in protein sequences with GCG.


User Documentation Bug Fix


Editing and Publishing

LineUp

Known Bug: The Screen Mode Summary documentation for LineUp erroneously indicates that <Ctrl>Z should be used to enter Command Mode. Using <Ctrl>Z will suspend the process.

Workaround: You should use <Ctrl>D to enter Command Mode from Screen Mode.

Known Bug: The Screen Mode Summary documentation for LineUp erroneously indicates that <Ctrl>D should be used to pull over all sequences starting past the current column. Using <Ctrl>D will cause you to leave Screen Mode and enter Command Mode.

Workaround: You should use <Ctrl>P to pull over all sequences starting past the current column.




Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio