Accelrys GCG (GCG) programs allow all upper- and lowercase letters, periods (.), asterisks (*), tildes (~), ampersands (&), and at (@) symbols in biological sequences. Nucleotide symbols, their complements, and the standard one-letter amino acid symbols are shown below in separate lists. The meanings of the symbols & and @ have not been assigned at this writing (October 1996).
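The permitted character set described above can be sketched as a simple check. This is only an illustration in Python, not part of GCG; the names ALLOWED and invalid_symbols are hypothetical.

```python
import string

# Characters the text lists as permitted in GCG sequences:
# all letters plus period, asterisk, tilde, ampersand, and at sign.
ALLOWED = set(string.ascii_letters) | set(".*~&@")


def invalid_symbols(sequence):
    """Return (position, character) pairs for characters outside the allowed set."""
    return [(i, ch) for i, ch in enumerate(sequence) if ch not in ALLOWED]
```

For example, invalid_symbols("AC-GT1") reports the hyphen and the digit, while a sequence such as "ACGT..~~*" passes with an empty result.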
GCG supports two gap characters: the period (.) and the tilde (~). GCG programs run from the command line or from the Main List mode of SeqLab treat the two gap characters identically in input sequences. GCG programs run from the Editor mode of SeqLab remove any tilde gap characters from the right end of each input sequence before performing their analyses.
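The Editor-mode behavior described above, in which tildes are removed only from the right end of a sequence, might look like the following Python sketch. The function name strip_trailing_tildes is hypothetical, and this is not SeqLab's actual code.

```python
def strip_trailing_tildes(sequence):
    """Remove tilde gap characters from the right end of a sequence only.

    Interior and leading tildes, and all period gaps, are left untouched.
    """
    return sequence.rstrip("~")
```

Under this sketch, "AC~GT~~~" becomes "AC~GT", while "ACGT.." is returned unchanged because periods are not stripped.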
In the future, programs run from either the command line or from SeqLab may differentiate the two gap characters in their analyses. The period gap character will increasingly be used as a space holder that may represent a missing character in a sequence. For example, the period gap character may represent a missed base call in a contig alignment in fragment assembly. The tilde gap character will increasingly be used as a simple place holder that never represents an actual character in a sequence. For example, two tildes may be used in a translated sequence to align each codon in a nucleotide sequence with its corresponding single-letter amino acid symbol. As another example, gaps at the ends of sequences in an alignment may be written as tildes when those gaps are due to differences in input sequence lengths rather than missing characters in the input sequences.
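As an illustration of the codon-alignment example above, the following Python sketch pads a translated sequence with two tildes per amino acid so that each symbol occupies the width of one codon. Placing the tildes after each symbol is an assumption made here (the text does not say where they go), and pad_translation is a hypothetical name.

```python
def pad_translation(protein, gap="~"):
    """Follow each amino acid symbol with two gap characters.

    The padded string then has one three-character slot per codon.
    """
    return "".join(symbol + gap * 2 for symbol in protein)
```

With this convention, the translation "MK*" of the nine-base sequence ATGAAATAA would be written as "M~~K~~*~~".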
GCG uses the letter codes for amino acids and nucleotide ambiguity proposed by IUPAC-IUB. These codes are compatible with the codes used by the GenBank and PIR databases.
The meaning of each symbol, its complement, and the Staden/Sanger equivalent are shown below.
IUB/GCG   Meaning             Complement   Staden/Sanger
A         A                   T            A
C         C                   G            C
G         G                   C            G
T/U       T                   A            T
M         A or C              K            M
R         A or G              Y            R
W         A or T              W            W
S         C or G              S            S
Y         C or T              R            Y
K         G or T              M            K
V         A or C or G         B            V
H         A or C or T         D            H
D         A or G or T         H            D
B         C or G or T         V            B
X/N       G or A or T or C    X/N          N
./~       gap character       ./~          -
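The complement column above can be applied mechanically. The following Python sketch builds a lookup from the table and uses it to complement and reverse-complement a sequence. It is an illustration only, not a GCG program; the names COMPLEMENT, complement, and reverse_complement are hypothetical, and preserving the case of each input letter is an assumption.

```python
# Complement of each IUB/GCG nucleotide symbol, taken from the table.
# U is treated like T, so its complement is A.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "X": "X", "N": "N", ".": ".", "~": "~",
}


def complement(sequence):
    """Complement each symbol, keeping the case of the input letter."""
    out = []
    for ch in sequence:
        comp = COMPLEMENT[ch.upper()]  # KeyError for symbols not in the table
        out.append(comp.lower() if ch.islower() else comp)
    return "".join(out)


def reverse_complement(sequence):
    """Return the complement read in the opposite direction."""
    return complement(sequence)[::-1]
```

For instance, complementing "AMRC" gives "TKYG", so its reverse complement is "GYKT". Gap characters map to themselves.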
GCG does not support the uncertainty and frame ambiguity codes used by Staden. FromStaden converts each of these codes to the lowercase GCG equivalent listed below.
Staden Code   Meaning      GCG
1             probably C   c
2             probably T   t
3             probably A   a
4             probably G   g
5             A or C       m
6             G or T       k
7             A or T       w
8             G or C       s
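The mapping in this table can be expressed as a character translation. The Python sketch below illustrates the conversion only; it is not the FromStaden program, and the names STADEN_TO_GCG and staden_to_gcg are hypothetical. Symbols outside the table, including the Staden gap character, are passed through unchanged as a simplifying assumption.

```python
# Staden uncertainty codes and their lowercase GCG equivalents,
# copied from the table.
STADEN_TO_GCG = str.maketrans({
    "1": "c", "2": "t", "3": "a", "4": "g",
    "5": "m", "6": "k", "7": "w", "8": "s",
})


def staden_to_gcg(sequence):
    """Replace Staden digit codes with lowercase GCG symbols."""
    return sequence.translate(STADEN_TO_GCG)
```

For example, the Staden sequence "AC1G5T" would become "ACcGmT".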
Here is a list of the standard one-letter amino acid codes and their three-letter equivalents. The synonymous codons and their depiction in the IUB codes are shown. You should recognize that the codons following semicolons (;) are not sufficiently specific to define a single amino acid even though they represent the best possible backtranslation into the IUB codes! You can redefine all of the relationships in this list in a local data file as described in Appendix VII.
Symbol   3-letter  Meaning               Codons                   Depiction
A        Ala       Alanine               GCT,GCC,GCA,GCG          !GCX
B        Asp,Asn   Aspartic, Asparagine  GAT,GAC,AAT,AAC          !RAY
C        Cys       Cysteine              TGT,TGC                  !TGY
D        Asp       Aspartic              GAT,GAC                  !GAY
E        Glu       Glutamic              GAA,GAG                  !GAR
F        Phe       Phenylalanine         TTT,TTC                  !TTY
G        Gly       Glycine               GGT,GGC,GGA,GGG          !GGX
H        His       Histidine             CAT,CAC                  !CAY
I        Ile       Isoleucine            ATT,ATC,ATA              !ATH
K        Lys       Lysine                AAA,AAG                  !AAR
L        Leu       Leucine               TTG,TTA,CTT,CTC,CTA,CTG  !TTR,CTX,YTR;YTX
M        Met       Methionine            ATG                      !ATG
N        Asn       Asparagine            AAT,AAC                  !AAY
P        Pro       Proline               CCT,CCC,CCA,CCG          !CCX
Q        Gln       Glutamine             CAA,CAG                  !CAR
R        Arg       Arginine              CGT,CGC,CGA,CGG,AGA,AGG  !CGX,AGR,MGR;MGX
S        Ser       Serine                TCT,TCC,TCA,TCG,AGT,AGC  !TCX,AGY;WSX
T        Thr       Threonine             ACT,ACC,ACA,ACG          !ACX
V        Val       Valine                GTT,GTC,GTA,GTG          !GTX
W        Trp       Tryptophan            TGG                      !TGG
X        Xxx       Unknown                                        !XXX
Y        Tyr       Tyrosine              TAT,TAC                  !TAY
Z        Glu,Gln   Glutamic, Glutamine   GAA,GAG,CAA,CAG          !SAR
*        End       Terminator            TAA,TAG,TGA              !TAR,TRA;TRR
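To see why the depictions after a semicolon are not specific to one amino acid, an IUB-coded triplet can be expanded into the concrete codons it matches. The Python sketch below does this using the nucleotide ambiguity meanings listed earlier. It is an illustration only; the names IUB and expand are hypothetical and not part of GCG.

```python
from itertools import product

# Bases matched by each IUB/GCG nucleotide symbol.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def expand(depiction):
    """Return the set of concrete codons matched by an IUB-coded triplet."""
    choices = [IUB[symbol] for symbol in depiction]
    return {"".join(bases) for bases in product(*choices)}


# Codons listed for leucine in the table.
LEUCINE = {"TTG", "TTA", "CTT", "CTC", "CTA", "CTG"}
```

Expanding TTR and CTX together yields exactly the six leucine codons, and YTR matches only leucine codons. YTX, which follows the semicolon, also matches TTT and TTC, the codons for phenylalanine, so expand("YTX") - LEUCINE is not empty.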
Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.
Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.