Short Descriptions

This appendix lists and briefly describes programs in Accelrys GCG (GCG). Programs are grouped by function and may appear under multiple functional headings. For more information on using these programs, see the Program Manual.

Table notes:

The following explains notations used in the tables.

“²” These programs generate graphics that require a graphics output device. (Example of usage: DotPlot²)

“+” These programs are new or enhanced in GCG 11.0. The “+” is part of the program name and is required when executing any of these programs. (Example of usage: ClustalW+)

Comparison

Pairwise Comparison

Gap	Uses the algorithm of Needleman and Wunsch to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps.
BestFit	Makes an optimal alignment of the best segment of similarity between two sequences. Optimal alignments are found by inserting gaps to maximize the number of matches using the local homology algorithm of Smith and Waterman.
FrameAlign	Creates an optimal alignment of the best segment of similarity (local alignment) between a protein sequence and the codons in all possible reading frames on a single strand of a nucleotide sequence. Optimal alignments may include reading frame shifts.
Compare	Compares two protein or nucleic acid sequences and creates a file of the points of similarity between them for plotting with DotPlot. Compare finds the points using either a window/stringency or a word match criterion. The word comparison is 1,000 times faster than the window/stringency comparison, but somewhat less sensitive.
DotPlot²	Makes a dot-plot with the output file from Compare or StemLoop.
GapShow²	Displays an alignment by making a graph that shows the distribution of similarities and gaps. The two input sequences should be aligned with either Gap or BestFit before they are given to GapShow for display.
ProfileGap	Makes an optimal alignment between a profile and one or more sequences.
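
To make the idea behind Gap more concrete, the following Python sketch shows a minimal Needleman and Wunsch style global alignment. It is an illustration only: the function name `needleman_wunsch` and the simple match, mismatch, and gap scores are assumptions made for this example. Gap itself is a separate program whose scoring options are described in the Program Manual.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b).

    Illustrative scoring only: one fixed score per match, mismatch, and gap.
    """
    n, m = len(a), len(b)

    def pair(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Because every position of both sequences must appear in the result, this is a global alignment. A local method such as the one BestFit uses would instead keep only the best-scoring segment.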
Multiple Comparison
ClustalW+	Creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment.
PileUp²	Creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment.
SeqLab	Is the graphical user interface for GCG. For additional information, refer to the SeqLab Guide.
PlotSimilarity²	Plots the running average of the similarity among the sequences in a multiple sequence alignment.
Pretty	Displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.
PrettyBox²	Displays multiple sequence alignments as shaded boxes in Postscript format for printing or displaying with a Postscript-compatible device. PrettyBox optionally calculates a consensus sequence. The program does not create the alignment; it simply displays it.
MEME	(Multiple EM for Motif Elicitation) Finds conserved motifs in a group of unaligned sequences. MEME saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program.
MEME+	(Multiple EM for Motif Elicitation) Finds conserved motifs in a group of unaligned sequences. MEME+ saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program.
ProfileMake	Creates a position-specific scoring table, called a profile, that quantitatively represents the information from a group of aligned sequences. The profile can then be used for database searching (ProfileSearch) or sequence alignment (ProfileGap).
ProfileGap	Makes an optimal alignment between a profile and one or more sequences.
HmmerAlign	Uses a profile hidden Markov model (HMM) as a template to create an optimal multiple alignment of a group of sequences.
Overlap	Compares two sets of DNA sequences to each other in both orientations using a WordSearch style comparison.
NoOverlap	Identifies the places where a group of nucleotide sequences do not share any common subsequences.
OldDistances	Makes a table of the pairwise similarities within a group of aligned sequences.
HmmerBuild	Creates a position-specific scoring table, called a profile hidden Markov model (HMM), that is a statistical model of the consensus of a multiple sequence alignment. The profile HMM can be used for database searching (HmmerSearch), sequence alignment (HmmerAlign), or generating random sequences that match the model (HmmerEmit).
HmmerCalibrate	“Calibrates” a profile hidden Markov model in order to increase the sensitivity of database searches performed using that profile HMM as a query. The program compares the original profile HMM with a large number of randomly generated sequences and computes the extreme value distribution (EVD) parameters for this simulated search. The original profile HMM is replaced with a new one that contains these EVD parameters.

Database Searching

Reference Searching
LookUp	Identifies sequence database entries by name, accession number, author, organism, keyword, title, reference, feature, definition, length, or date. The output is a list of sequences.
StringSearch	Identifies sequences by searching for character patterns such as "globin" or "human" in the sequence documentation.
Names	Identifies GCG data files and sequence entries by name. It can show you what set of sequences is implied by any sequence specification.
Sequence Searching
BLAST	Searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST can produce gapped alignments for the matches it finds.
BLAST+	Searches one or more nucleic acid or protein databases for sequences similar to one or more query sequences of any type. BLAST+ can produce gapped alignments for the matches it finds.
NetBLAST	Searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. NetBLAST can search only databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.
NetBLAST+	Searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. NetBLAST+ can search only databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.
PSIBLAST	Iteratively searches one or more protein databases for sequences similar to one or more protein query sequences. PSIBLAST is similar to BLAST except that it uses position-specific scoring matrices derived during the search.
FastA	Does a Pearson and Lipman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). For nucleotide searches, FastA may be more sensitive than BLAST.
FastA+	Does a Pearson and Lipman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). For nucleotide searches, FastA+ may be more sensitive than BLAST.
SSearch	Does a rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). This may be the most sensitive method available for similarity searches. Compared to BLAST and FastA, it can be very slow.
SSearch+	Does a rigorous Smith-Waterman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). This may be the most sensitive method available for similarity searches. Compared to BLAST and FastA, it can be very slow.
TFastA	Does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences. TFastA translates the nucleotide sequences in all six reading frames before performing the comparison. It is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?"
TFastA+	Does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences. TFastA+ translates the nucleotide sequences in all six reading frames before performing the comparison. It is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?"
TFastX	Does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences, taking frameshifts into account. It is designed to be a replacement for TFastA, and like TFastA, it is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?" TFastA treats each of the six reading frames of a nucleotide sequence as a separate sequence, resulting in three separate alignments for each strand. TFastX, on the other hand, compares the protein query sequence to only one translated protein per strand of the nucleotide sequence, resulting in one alignment per strand. It calculates a similarity score for alignments that takes frameshifts into account, allowing it to "join" short regions separated by frameshifts into a single long alignment. TFastX may alert you to more meaningful hits than TFastA does when the nucleotide sequences contain frameshift errors.
TFastX+	Does a Pearson and Lipman search for similarity between a protein query sequence and any group of nucleotide sequences, taking frameshifts into account. It is designed to be a replacement for TFastA+, and like TFastA+, it is designed to answer the question, "What implied protein sequences in a nucleotide sequence database are similar to my protein sequence?" TFastA+ treats each of the six reading frames of a nucleotide sequence as a separate sequence, resulting in three separate alignments for each strand. TFastX+, on the other hand, compares the protein query sequence to only one translated protein per strand of the nucleotide sequence, resulting in one alignment per strand. It calculates a similarity score for alignments that takes frameshifts into account, allowing it to "join" short regions separated by frameshifts into a single long alignment. TFastX+ may alert you to more meaningful hits than TFastA+ does when the nucleotide sequences contain frameshift errors.
FastX	Does a Pearson and Lipman search for similarity between a nucleotide query sequence and a group of protein sequences, taking frameshifts into account. FastX translates both strands of the nucleic acid sequence before performing the comparison. It is designed to answer the question, "What implied protein sequences in my nucleic acid sequence are similar to sequences in a protein database?"
FastX+	Does a Pearson and Lipman search for similarity between a nucleotide query sequence and a group of protein sequences, taking frameshifts into account. FastX+ translates both strands of the nucleic acid sequence before performing the comparison. It is designed to answer the question, "What implied protein sequences in my nucleic acid sequence are similar to sequences in a protein database?"
FrameSearch²	Searches a group of protein sequences for similarity to one or more nucleotide query sequences, or searches a group of nucleotide sequences for similarity to one or more protein query sequences. For each sequence comparison, the program finds an optimal alignment between the protein sequence and all possible codons on each strand of the nucleotide sequence. Optimal alignments may include reading frame shifts.
HmmerSearch	Uses a profile hidden Markov model as a query to search a sequence database to find sequences similar to the family from which the profile HMM was built. Profile HMMs can be created using HmmerBuild.
MotifSearch	Uses a set of profiles (representing similarities within a family of sequences) as a query to either a) search a database for new sequences similar to the original family, or b) annotate the members of the original family with details of the matches between the profiles and each of the members. Normally, the profiles are created with the program MEME.
ProfileSearch	Uses a profile (representing a group of aligned sequences) as a query to search the database for new sequences with similarity to the group. The profile is created with the program ProfileMake.
ProfileSegments	Makes optimal alignments showing the segments of similarity found by ProfileSearch.
FindPatterns	Identifies sequences that contain short patterns like GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow mismatches. You can provide the patterns in a file or simply type them in from the terminal.
FindPatterns+	Identifies sequences that contain short patterns like GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow mismatches. You can provide the patterns in a file or simply type them in from the terminal.
Motifs	Looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds.
HmmerBuild	Creates a position-specific scoring table, called a profile hidden Markov model (HMM), that is a statistical model of the consensus of a multiple sequence alignment. The profile HMM can be used for database searching (HmmerSearch), sequence alignment (HmmerAlign), or generating random sequences that match the model (HmmerEmit).
HmmerCalibrate	“Calibrates” a profile hidden Markov model in order to increase the sensitivity of database searches performed using that profile HMM as a query. The program compares the original profile HMM with a large number of randomly generated sequences and computes the extreme value distribution (EVD) parameters for this simulated search. The original profile HMM is replaced with a new one that contains these EVD parameters.
HmmerPfam	Compares one or more sequences to a database of profile hidden Markov models, such as the Pfam library, in order to identify known domains within the sequences.
WordSearch²	Identifies sequences in the database that share large numbers of common words in the same register of comparison with your query sequence. The output of WordSearch can be displayed with Segments.
Segments	Aligns and displays the segments of similarity found by WordSearch.
Sequence Retrieval
Fetch	Copies GCG sequences or data files from the GCG database into your directory or displays them on your terminal screen.
Fetch+	Copies GCG sequences or data files from the GCG database into your directory or displays them on your terminal screen.
NetFetch	Retrieves entries from NCBI listed in a NetBLAST output file. It can also be used to retrieve entries individually by entry name or accession number. The output of NetFetch is an RSF file.
NetFetch+	Retrieves entries from NCBI listed in a NetBLAST+ output file. It can also be used to retrieve entries individually by entry name or accession number. The output of NetFetch+ is an RSF file.

DNA/RNA Secondary Structure

MFold	Predicts optimal and suboptimal secondary structures for an RNA or DNA molecule using the most recent energy minimization method of Zuker.
PlotFold	Displays the optimal and suboptimal secondary structures for an RNA or DNA molecule predicted by MFold.
StemLoop	Finds stems (inverted repeats) within a sequence. You specify the minimum stem length, minimum and maximum loop sizes, and the minimum number of bonds per stem. All loops or only the best loops can be displayed on your screen or written into a file.
DotPlot²	Makes a dot-plot with the output file from Compare or StemLoop.

Editing and Publication

SeqLab	Is the graphical user interface for GCG. For additional information, refer to the SeqLab Guide.
Assemble	Constructs new sequences from pieces of existing sequences. It concatenates the fragments you specify and writes them out as a new sequence file. SeqEd is a better tool for assembling sequences interactively, but Assemble is best for assembling sequences from fragments defined in a list file.
Pretty	Displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.
PrettyBox²	Displays multiple sequence alignments as shaded boxes in Postscript format for printing or displaying with a Postscript-compatible device. PrettyBox optionally calculates a consensus sequence. The program does not create the alignment; it simply displays it.
PlasmidMap²	Draws a circular plot of a plasmid construct. It can display restriction patterns, inserts, and known genetic elements. The plot is suitable for publication, record keeping, or analysis. It is drawn from one or more labeling files such as those written by MapSort.
Figure²	Makes figures and posters by drawing graphics and text together. You can include output from other GCG graphics programs as part of a figure.

Evolution

PAUPSearch	Provides a GCG interface to the tree-searching options in PAUP (Phylogenetic Analysis Using Parsimony). Starting with a set of aligned sequences, you can search for phylogenetic trees that are optimal according to parsimony, distance, or maximum likelihood criteria; reconstruct a neighbor-joining tree; or perform a bootstrap analysis. The program PAUPDisplay can produce a graphical version of a PAUPSearch trees file. PAUP is the copyrighted property of the Smithsonian Institution. Use the program Fetch to obtain a copy of paup-license.txt to read about rights and limitations for using PAUP.
PAUPDisplay	Provides a GCG interface to tree manipulation, diagnosis, and display options in PAUP (Phylogenetic Analysis Using Parsimony). Starting with a trees file that contains a sequence alignment and one or more trees reconstructed from this alignment (such as the output from PAUPSearch), you can plot the tree(s); compute the score of the tree(s) according to the criteria of parsimony, distance, or maximum likelihood; or calculate a consensus tree (two or more input trees). PAUPDisplay can also plot the trees from a GrowTree trees file. PAUP is the copyrighted property of the Smithsonian Institution. Use the program Fetch to obtain a copy of paup-license.txt to read about rights and limitations for using PAUP.
Distances	Creates a table of the pairwise distances within a group of aligned sequences.
GrowTree	Creates a phylogenetic tree from a distance matrix created by Distances using either the UPGMA or neighbor-joining method. You can create a text or graphics output file.
Diverge	Estimates the pairwise number of synonymous and nonsynonymous substitutions per site between two or more aligned nucleic acid sequences that code for proteins. It uses a variant of the method published by Li et al.

Fragment Assembly

SeqMerge

SeqMerge is GCG’s powerful new fragment assembly application with an X Windows graphical user interface. SeqMerge allows you to intuitively assemble fragments in a sequencing project into contigs, or alignments of overlapping fragments. From the contig, SeqMerge creates a consensus sequence representing the underlying sequence from which your fragments were derived.

Gene Finding and Pattern Recognition

TestCode	Helps you identify protein coding sequences by plotting a measure of the non-randomness of the composition at every third base. The statistic does not require a codon frequency table.
CodonPreference	Is a frame-specific gene finder that tries to recognize protein coding sequences by virtue of the similarity of their codon usage to a codon frequency table or by the bias of their composition (usually GC) in the third position of each codon.
Frames	Shows open reading frames for the six translation frames of a DNA sequence. Frames can superimpose the pattern of rare codon choices if you provide it with a codon frequency table.
Terminator	Searches for prokaryotic factor-independent RNA polymerase terminators according to the method of Brendel and Trifonov.
Motifs	Looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds.
MEME	(Multiple EM for Motif Elicitation) Finds conserved motifs in a group of unaligned sequences. MEME saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program.
MEME+	(Multiple EM for Motif Elicitation) Finds conserved motifs in a group of unaligned sequences. MEME+ saves these motifs as a set of profiles. You can search a database of sequences with these profiles using the MotifSearch program.
Repeat	Finds direct repeats in sequences. You must set the size, stringency, and range within which the repeat must occur; all the repeats of that size or greater are displayed as short alignments.
FindPatterns	Identifies sequences that contain short patterns like GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow mismatches. You can provide the patterns in a file or simply type them in from the terminal.
FindPatterns+	Identifies sequences that contain short patterns like GAATTC or YRYRYRYR. You can define the patterns ambiguously and allow mismatches. You can provide the patterns in a file or simply type them in from the terminal.
Composition	Determines the composition of sequence(s). For nucleotide sequence(s), Composition also determines dinucleotide and trinucleotide content.
CodonFrequency	Tabulates codon usage from sequences and/or existing codon usage tables. The output file is correctly formatted for input to the CodonPreference, Correspond, and Frames programs.
Correspond	Looks for similar patterns of codon usage by comparing codon frequency tables.
Window	Makes a table of the frequencies of different sequence patterns within a window as it is moved along a sequence. A pattern is any short sequence like GC or R or ATG. You can plot the output with the program StatPlot.
StatPlot²	Plots a set of parallel curves from a table of numbers like the table written by the Window program. The statistics in each column of the table are associated with a position in the analyzed sequence.
FitConsensus	Uses a consensus table written by Consensus as a probe to find the best examples of the consensus in a DNA sequence. You can specify the number of fits you want to see, and FitConsensus tabulates them with their position, frame, and a statistical measure of their quality.
Consensus	Calculates a consensus sequence for a set of pre-aligned short nucleic acid sequences by tabulating the percent of G, A, T, and C for each position in the set. FitConsensus uses the Consensus output table as a probe to search for the best examples of the derived consensus in other nucleotide sequences.
Xnu	Replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
Seg	Replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
FromTrace	Converts one or more ABI or SCF trace files into GCG single sequence files.
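
The ambiguous patterns mentioned under FindPatterns (for example YRYRYRYR) can be illustrated with a small Python sketch. It expands the standard IUPAC nucleotide ambiguity codes into a regular expression and reports where a pattern occurs. The function name `find_pattern` is hypothetical and not part of GCG, and the sketch does not implement the mismatch allowance or pattern files that FindPatterns supports.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_pattern(pattern, sequence):
    """Return the 1-based start position of every match, overlaps included.

    Hypothetical helper for illustration only; exact matching, no mismatches.
    """
    regex = "".join(IUPAC[symbol] for symbol in pattern.upper())
    # A lookahead makes the regex engine report overlapping occurrences too.
    return [m.start() + 1
            for m in re.finditer(f"(?=({regex}))", sequence.upper())]

if __name__ == "__main__":
    print(find_pattern("GAATTC", "TTGAATTCAA"))
    print(find_pattern("YRYR", "CACACA"))
```

Positions are reported 1-based here only because sequence coordinates are conventionally counted from 1; that choice is an assumption of the sketch.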

Importing / Exporting

SeqConv+	Is a utility program that provides batch conversions between different sequence formats. It lets you convert between file formats so that data can be imported easily into Accelrys’ bioinformatics applications. It can also convert internally used formats (e.g., BSML, RSF) into formats more commonly accepted by third-party tools. Supported file formats include BSML, GenBank, FastA, and RSF.
Reformat	Rewrites sequence file(s), scoring matrix file(s), or enzyme data file(s) so that they can be read by GCG programs.
BreakUp	Reads a GCG-format sequence file containing more than 350,000 sequence characters and writes it as a set of separate, shorter, overlapping sequence files that can be analyzed by GCG programs.
HmmerConvert	Converts profile hidden Markov model files into different profile formats.

Mapping

Map	Maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence.
Map+	Maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map+ can also create a peptide map of an amino acid sequence.
MapPlot²	Displays restriction sites graphically. If you don't have a plotter, MapPlot can write a text file that approximates the graph.
MapSort	Finds the coordinates of the restriction enzyme cuts in a DNA sequence and sorts the fragments of the resulting digest by size. MapSort can sort the fragments from single or multiple enzyme digests.
Fingerprint	Identifies the products of T1 ribonuclease digestion.
PeptideMap	Creates a peptide map of an amino acid sequence.
PlasmidMap²	Draws a circular plot of a plasmid construct. It can display restriction patterns, inserts, and known genetic elements. The plot is suitable for publication, record keeping, or analysis. It is drawn from one or more labeling files such as those written by MapSort.
PeptideSort	Shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein.
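
The kind of result MapSort describes (cut coordinates, then fragments sorted by size) can be sketched in a few lines of Python. This is a simplified illustration for a single enzyme on a linear sequence. The function name `digest` and its parameters are hypothetical, and the default site is EcoRI (G^AATTC), which cuts one base into its recognition site.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at each occurrence of `site`.

    Returns (start, end) fragment coordinates, 1-based and inclusive,
    sorted from the longest fragment to the shortest.
    cut_offset is the number of bases between the start of the site
    and the cut position (1 for EcoRI, G^AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)

    # Fragment boundaries: sequence start, every cut, sequence end.
    bounds = [0] + cuts + [len(sequence)]
    fragments = [(bounds[i] + 1, bounds[i + 1])
                 for i in range(len(bounds) - 1)]
    return sorted(fragments, key=lambda f: f[1] - f[0] + 1, reverse=True)

if __name__ == "__main__":
    seq = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
    for first, last in digest(seq):
        print(first, last, last - first + 1)
```

Circular molecules, multiple enzymes, and ambiguous recognition sites are left out of this sketch.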

Primer Selection

Prime	Selects oligonucleotide primers for a template DNA sequence. The primers may be useful for the polymerase chain reaction (PCR) or for DNA sequencing. You can allow Prime to choose primers from the whole template or limit the choices to a particular set of primers listed in a file.
Prime+	Selects oligonucleotide primers for a template DNA sequence. The primers may be useful for the polymerase chain reaction (PCR) or for DNA sequencing. You can allow Prime+ to choose primers from the whole template or limit the choices to a particular set of primers listed in a file.
PrimePair	Evaluates individual primers to determine their compatibility for use as PCR primer pairs. You can provide the primers in files (one for forward, one for reverse primers) or on the command line, or you can enter them interactively from the keyboard.
MeltTemp	Computes the melting temperature of oligonucleotides. You can provide the oligonucleotide sequences in a file or simply type them in at the keyboard.
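
As a rough illustration of what a melting temperature estimate involves, the Python sketch below applies the simple Wallace rule for short oligonucleotides, Tm = 2(A+T) + 4(G+C) in degrees Celsius. This rule is chosen here only because it is easy to follow. It is an assumption of the example, not a statement of the method MeltTemp uses, and the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(oligo):
    """Estimate the melting temperature (degrees C) of a short oligonucleotide.

    Wallace rule: 2 degrees per A or T, 4 degrees per G or C.
    A rough rule of thumb for short oligos only.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc

if __name__ == "__main__":
    print(wallace_tm("GGGGCCCCAAAATTTT"))
```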

HMMER

HmmerAlign	Uses a profile hidden Markov model (HMM) as a template to create an optimal multiple alignment of a group of sequences.
HmmerBuild	Creates a position-specific scoring table, called a profile hidden Markov model (HMM), that is a statistical model of the consensus of a multiple sequence alignment. The profile HMM can be used for database searching (HmmerSearch), sequence alignment (HmmerAlign) or generating random sequences that match the model (HmmerEmit).
HmmerCalibrate	"Calibrates" a profile hidden Markov model in order to increase the sensitivity of database searches performed using that profile HMM as a query. The program compares the original profile HMM with a large number of randomly generated sequences and computes the extreme value distribution (EVD) parameters for this simulated search. The original profile HMM is replaced with a new one that contains these EVD parameters.
HmmerConvert	Converts profile hidden Markov model files into different profile formats.
HmmerEmit	Generates sequences that match a profile hidden Markov model.
HmmerFetch	Retrieves a profile hidden Markov model (HMM) from a database of profile HMMs that has been indexed by HmmerIndex.
HmmerIndex	Creates an index for a profile hidden Markov model database so that profile HMMs can be retrieved from the database with HmmerFetch.
HmmerPfam	Compares one or more sequences to a database of profile hidden Markov models, such as the Pfam library, in order to identify known domains within the sequences.
HmmerSearch	Uses a profile hidden Markov model as a query to search a sequence database to find sequences similar to the family from which the profile HMM was built. Profile HMMs can be created using HmmerBuild.

Protein Analysis

Motifs	Looks for sequence motifs by searching through proteins for the patterns defined in the PROSITE Dictionary of Protein Sites and Patterns. Motifs can display an abstract of the current literature on each of the motifs it finds.
ProfileScan	Uses a database of profiles to find structural and sequence motifs in protein sequences.
HmmerPfam	Compares one or more sequences to a database of profile hidden Markov models, such as the Pfam library, in order to identify known domains within the sequences.
TransMem	Scans for likely transmembrane helices in one or more input protein sequences.
TransMem+	Scans for likely transmembrane helices in one or more input protein sequences.
CoilScan	Locates coiled-coil segments in protein sequences.
HTHScan	Scans protein sequences for the presence of helix-turn-helix motifs, indicative of sequence-specific DNA-binding structures often associated with gene regulation.
SPScan	Scans protein sequences for the presence of secretory signal peptides (SPs).
CoilScan+	Locates coiled-coil segments in protein sequences.
HTHScan+	Scans protein sequences for the presence of helix-turn-helix motifs, indicative of sequence-specific DNA-binding structures often associated with gene regulation.
SPScan+	Scans protein sequences for the presence of secretory signal peptides (SPs).
PeptideSort	Shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein.
Isoelectric	Plots the charge as a function of pH for any peptide sequence.
PeptideMap	Creates a peptide map of an amino acid sequence.
PepPlot²	Plots measures of protein secondary structure and hydrophobicity in parallel panels of the same plot.
PeptideStructure	Makes secondary structure predictions for a peptide sequence. The predictions include (in addition to alpha, beta, coil, and turn) measures for antigenicity, flexibility, hydrophobicity, and surface probability. PlotStructure displays the predictions graphically.
PlotStructure	Plots the measures of protein secondary structure in the output file from PeptideStructure. The measures can be shown on parallel panels of a graph or with a two-dimensional "squiggly" representation.
Moment	Makes a contour plot of the helical hydrophobic moment of a peptide sequence.
HelicalWheel	Plots a peptide sequence as a helical wheel to help you recognize amphiphilic regions.
Xnu	Replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
Seg	Replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.

Translation

Translate	Translates nucleotide sequences into peptide sequences.
BackTranslate	Backtranslates an amino acid sequence into a nucleotide sequence. The output helps you recognize minimally ambiguous regions that might be good for constructing synthetic probes.
Map	Maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence.
Reverse	Reverses and/or complements a sequence.
DataSet	Creates a GCG data library from any set of sequences in GCG format. To translate nucleotide sequences into peptide sequences, include the ToProt parameter.
Map+	Maps a DNA sequence and displays both strands of the mapped sequence with restriction enzyme cut points above the sequence and protein translations below. Map can also create a peptide map of an amino acid sequence.
DataSet+	Creates a GCG data library from any set of sequences in GCG format.
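
The operations described for Translate and Reverse can be illustrated with a short Python sketch. It assumes the standard genetic code and plain uppercase or lowercase DNA input. The function names are hypothetical and do not correspond to GCG command-line parameters.

```python
# Standard genetic code, built in TCAG order (stop codons shown as "*").
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate DNA to protein starting at offset `frame` (0, 1, or 2).

    Codons containing symbols outside A, C, G, T are reported as "X".
    A trailing partial codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)
```

Translating the reverse strand is then a matter of calling translate(reverse_complement(dna), frame) for each of the three frames.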

Utilities

Sequence Utilities

SeqManip+	SeqManip+ is a utility program that performs common sequence manipulations, including translation, back translation of protein sequences, and splitting of sequences. Individual programs for these tasks already exist in Wisconsin Package 10.3, but SeqManip+ provides a single platform for all of these operations, so you do not have to find and run several different applications to perform basic sequence manipulations.
SeqStat+	SeqStat+ is a utility program that reads any number of input sequences and reports basic statistics about them, including total length, number of sequences, and average length. It also reports additional information that depends on the sequence type (protein or nucleotide), such as G+C% content.
SeqConv+	SeqConv+ is a utility program that provides batch conversion between sequence formats. It lets you convert files so that data can be imported easily into Accelrys bioinformatics applications. It also converts the formats used internally by Accelrys (e.g. BSML, RSF) into formats more commonly accepted by third-party tools. The supported file formats include BSML, GenBank, FastA, and RSF.
Reverse	Reverses and/or complements a sequence.
Shuffle	Randomizes the order of the symbols in a sequence without changing the composition.
Simplify	Lets you reduce the number of symbols in a sequence. Such a simplification would allow you, for instance, to treat all hydrophobic amino acids as equivalent.
CompTable	Creates a scoring matrix using equivalences defined in a simplification scheme such as the one used for Simplify.
HmmerEmit	Generates sequences that match a profile hidden Markov model.
Corrupt	Randomly introduces small numbers of substitutions, insertions, and deletions into nucleotide or protein sequence(s).
Xnu	Replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
Seg	Replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.
Sample	Extracts sequence fragments randomly from sequence(s). You can set a sampling rate to determine how many fragments Sample extracts.
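
Several of the sequence utilities above perform simple, well-defined transformations. The Python sketch below illustrates three of them: shuffling without changing composition (as described for Shuffle), collapsing symbols into groups (as described for Simplify), and basic statistics of the kind described for SeqStat+. All function names and the example grouping are assumptions made for illustration. They are not GCG's implementation or its simplification scheme.

```python
import random


def shuffle_sequence(seq, seed=None):
    """Randomize symbol order; the composition is unchanged."""
    rng = random.Random(seed)
    symbols = list(seq)
    rng.shuffle(symbols)
    return "".join(symbols)


def simplify(seq, groups):
    """Replace each symbol with the representative of its group.

    `groups` maps a representative symbol to the symbols it stands for,
    e.g. {"I": "AVLIMFW"} (a hypothetical hydrophobic group).
    Symbols that belong to no group are left as they are.
    """
    mapping = {m: rep for rep, members in groups.items() for m in members}
    return "".join(mapping.get(s, s) for s in seq)


def basic_stats(sequences):
    """Return count, total length, average length, and G+C percentage."""
    total = sum(len(s) for s in sequences)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in sequences)
    return {
        "count": len(sequences),
        "total_length": total,
        "average_length": total / len(sequences) if sequences else 0.0,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }
```

The G+C percentage is only meaningful for nucleotide input. A fixed seed makes the shuffle reproducible, which is useful when shuffled sequences serve as a control for a comparison.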

Database Utilities

DataSet	Creates a GCG data library from any set of sequences in GCG format.
DataSet+	Creates a GCG data library from any set of sequences in GCG format.
Sample	Extracts sequence fragments randomly from sequence(s). You can set a sampling rate to determine how many fragments Sample extracts.

Printing / Plotting Utilities

StatPlot²	Allows you to choose a plotting configuration from a menu of available graphics devices at your site.
Figure²	Makes figures and posters by drawing graphics and text together. You can include output from other GCG graphics programs as part of a figure.
PlotTest²	Plots an example graphic to test your graphics configuration. The graphic created by PlotTest uses every GCG graphics feature. It should resemble the example graphic in the Program Manual.

Miscellaneous Utilities

Reformat	Rewrites sequence file(s), scoring matrix file(s), or enzyme data file(s) so that they can be read by GCG programs.
Name	Displays GCG logical name(s) from the GCG logical names table.
Symbol	Displays GCG symbol(s) from the GCG symbol table.

Technical Support: support-us@accelrys.com, support-japan@accelrys.com, or support-eu@accelrys.com

Licenses and Trademarks: Discovery Studio®, SeqLab®, SeqWeb®, SeqMerge®, GCG®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.