Sequence Typing

As you work with Accelrys GCG, you will find that some programs accept only nucleotide sequences while others accept only protein sequences. Many programs accept both nucleotide and protein sequences as input but analyze them differently depending on the input sequence type.

You can determine the type of a sequence by looking at the sequence file. Sequences in GCG format contain a dividing line between an optional text heading and the sequence data. Consider the following example of a typical dividing line:

Gamma.Seq Length: 11375 January 1, 1997 10:09 Type: N Checksum: 6474  ..
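The fields on this dividing line are written as Name: value pairs, and the line ends with two periods (..). As a rough illustration, here is a minimal Python sketch that pulls those pairs out of the example line above. The function name parse_dividing_line is hypothetical, and the sketch assumes field values contain no spaces, so the date is skipped. It is not GCG's own parser.

```python
import re

# Matches "Name: value" pairs such as "Length: 11375" or "Type: N".
# Assumption: values contain no spaces, so the date and time are not captured
# ("10:09" is skipped because no whitespace follows its colon).
FIELD_RE = re.compile(r"(\w+):\s+(\S+)")


def parse_dividing_line(line: str) -> dict:
    """Return the Name: value fields of a GCG dividing line (hypothetical helper)."""
    if not line.rstrip().endswith(".."):
        raise ValueError("not a GCG dividing line (missing trailing '..')")
    return {name: value for name, value in FIELD_RE.findall(line)}


line = "Gamma.Seq Length: 11375 January 1, 1997 10:09 Type: N Checksum: 6474  .."
fields = parse_dividing_line(line)
print(fields)  # {'Length': '11375', 'Type': 'N', 'Checksum': '6474'}
```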

The sequence type should appear on the dividing line as either Type: N for nucleotide or Type: P for protein. (See "Types of Sequence Files" in Section 2, Using Sequence Files and Databases of the User's Guide for a complete description of sequence file formats.) Sequences created before version 7.0 of GCG (April 1991) do not have this Type: field on the dividing line. If the dividing line doesn't contain a Type: field, GCG infers the sequence type from the characters in the sequence. This inference may not always be correct.
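When the Type: field is missing, some rule has to guess the type from the residues. This section does not say which rule GCG uses, so the Python sketch below applies a hypothetical one: if at least 85% of the letters (an assumed threshold) are A, C, G, T, U, or N, the sequence is treated as nucleotide; otherwise it is treated as protein. An explicit Type: field always wins. The function sequence_type is illustrative only, but it shows why such a guess can fail: A, C, G, and T are also valid one-letter amino acid codes.

```python
from typing import Optional

# Hypothetical heuristic for illustration; not GCG's documented inference rule.
NUCLEOTIDE_LETTERS = set("ACGTUN")
THRESHOLD = 0.85  # assumed cutoff


def sequence_type(type_field: Optional[str], residues: str) -> str:
    """Return 'N' or 'P': use the Type: field if present, else guess from composition."""
    if type_field in ("N", "P"):
        return type_field
    # Ignore gap symbols, digits, and spaces; count only letters.
    letters = [c for c in residues.upper() if c.isalpha()]
    if not letters:
        raise ValueError("cannot infer the type of an empty sequence")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "N" if fraction >= THRESHOLD else "P"


print(sequence_type(None, "MKTAYIAKQR"))  # P
print(sequence_type(None, "GATTACA"))  # N, even if this is the peptide Gly-Ala-Thr-Thr-Ala-Cys-Ala
```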

In versions of GCG before 8.0, you could make sure that programs used the correct sequence type by specifying the type on the command line when you ran a program. Starting with version 8.0, however, the sequence type is an inherent part of the sequence and cannot be changed from the command line.

If the Type: field of any sequence is incorrect or missing, you can correct it with the Reformat program. Type either

% reformat /NUCleotide filename

or

% reformat /PROtein filename
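Reformat makes this change for you. To show what changes in the file, here is a hypothetical Python sketch, set_sequence_type, that sets the Type: field on the dividing line: it replaces a wrong value, or adds the field before the closing periods when it is missing. It is not Reformat's implementation. It assumes the first line ending in .. is the dividing line, and the position chosen for a missing field is also an assumption.

```python
import re

# Matches an existing "Type: X" field on the dividing line.
TYPE_RE = re.compile(r"\bType:\s*\S+")


def set_sequence_type(text: str, seq_type: str) -> str:
    """Return GCG-format text with the dividing line's Type: field set (hypothetical helper)."""
    if seq_type not in ("N", "P"):
        raise ValueError("seq_type must be 'N' or 'P'")
    lines = text.split("\n")
    for i, line in enumerate(lines):
        body = line.rstrip()
        if not body.endswith(".."):
            continue  # heading text or sequence data, not the dividing line
        if TYPE_RE.search(body):
            # Replace the existing (possibly wrong) value.
            lines[i] = TYPE_RE.sub(f"Type: {seq_type}", body, count=1)
        else:
            # Field missing (for example, a pre-7.0 file): add it before the "..".
            lines[i] = f"{body[:-2].rstrip()}  Type: {seq_type}  .."
        return "\n".join(lines)
    raise ValueError("no dividing line ending in '..' found")
```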

For more information, see the Reformat documentation in the Program Manual. ("Specifying Sequence Type" in Section 2, Using Sequence Files and Databases of the User's Guide also details how to change the sequence type.)

Printed: November 30, 2004 13:42 (1162)



Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.