APPENDIX V

References

This appendix lists the citations used throughout the Accelrys GCG (GCG) Package documentation.

Adams, E.N. III. (1972). Consensus Techniques and the Comparison of Taxonomic Trees. Systematic Zoology 21, 390-397.

Altschul, Stephen F. (1991). Amino acid substitution matrices from an information theoretic perspective. Journal of Molecular Biology 219, 555-565.

Altschul, S.F. (1993). A protein alignment scoring system sensitive at all evolutionary distances. Journal of Molecular Evolution 36, 290-300.

Altschul, S.F., Boguski, M.S., Gish, W., and Wootton, J.C. (1994). Issues in searching molecular sequence databases. Nature Genetics 6, 119-129.

Altschul, S.F. and Erickson, B.W. (1985). Significance of nucleotide sequence alignments: A method for random sequence permutation that preserves dinucleotide and codon usage. Molecular Biology and Evolution 2, 526-538.

Altschul, S.F., and Gish, Warren. (1996). Local Alignment Statistics. In Methods in Enzymology, (R. Doolittle, ed.), 266, 460-480, Academic Press, San Diego, California, USA.

Altschul, Stephen F., Gish, Warren, Miller, Webb, Myers, Eugene W., and Lipman, David J. (1990). Basic local alignment search tool. Journal of Molecular Biology 215, 403-410.

Altschul, Stephen F., Madden, Thomas L., Schaffer, Alejandro A., Zhang, Jinghui, Zhang, Zheng, Miller, Webb, and Lipman, David J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25(17), 3389-3402.

Bailey, T.L. and Elkan, C. (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, California, USA, 28-36.

Bailey, T.L. and Gribskov, M. (1998). Combining evidence using p-values: application to sequence homology searches. Bioinformatics, 14(1), 48-54.

Baldino, M., Jr. (1989). High Resolution In Situ Hybridization Histochemistry. In Methods in Enzymology, (P.M. Conn, ed.), 168, 761-777, Academic Press, San Diego, California, USA.

Bibb, M.J., Findlay, P.R., and Johnson, M.W. (1984). The relationship between base composition and codon usage in bacterial genes and its use in the simple and reliable identification of protein coding sequences. Gene 30, 157-166.

Borer, P.N., Dengler, B., and Tinoco, I., Jr. (1974). Stability of Ribonucleic Acid Double-stranded Helices. Journal of Molecular Biology 86, 843-853.

Brendel, V. and Trifonov, E.N. (1984). A Computer Algorithm for Testing Prokaryotic Terminators. Nucleic Acids Research 12, 4411-4427.

Brendel, V. and Trifonov, E.N. (1984). CODATA Conference Proceedings, Jerusalem.

Breslauer, K.J., Frank, R., Blocker, H., and Marky, L.A. (1986). Predicting DNA Duplex Stability from the Base Sequence. Proceedings of the National Academy of Sciences USA 83, 3746-3750.

Chao, K.M., Pearson, W.R. and Miller, W. (1992). Aligning Two Sequences within a Specified Diagonal Band. Computer Applications in the Biosciences 8(5), 481-487.

Chou, P.Y. and Fasman, G.D. (1978). Prediction of the Secondary Structure of Proteins From Their Amino Acid Sequence. Advances in Enzymology 47, 45-147.

Claverie, J.-M. and Audic, S. (1996). The statistical significance of nucleotide position-weight matrix matches. Computer Applications in the Biosciences 12(5), 431-439.

Claverie, J.-M. and States, D.J. (1993). Information enhancement methods for large-scale sequence analysis. Computers and Chemistry 17, 191-201.

Clayton, J. and Kedes, L. (1982). GEL, A DNA Sequencing Project Management System. Nucleic Acids Research 10, 305-321.

Devereux, J., Haeberli, P., and Smithies, O. (1984). A Comprehensive Set of Sequence Analysis Programs for the VAX. Nucleic Acids Research 12(1), 387-395.

Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis. Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge, UK.

Eddy, S.R. (1996). Hidden Markov Models. Current Opinion in Structural Biology 6, 361-365.

Eddy, S.R. (1998). Profile hidden Markov models. Bioinformatics, 14, 755-763.

Eisenberg, D., Sweet, R.M., and Terwilliger, T.C. (1984). The Hydrophobic Moment Detects Periodicity in Protein Hydrophobicity. Proceedings of the National Academy of Sciences USA 81, 140-144.

Emini, E.A., Hughes, J.V., Perlow, D.S., and Boger, J. (1985). Induction of Hepatitis A Virus-Neutralizing Antibody by a Virus-Specific Synthetic Peptide. Journal of Virology 55(3), 836-839.

Engelman, D.M., Steitz, T.A., and Goldman, A. (1986). Identifying Nonpolar Transbilayer Helices in Amino Acid Sequences of Membrane Proteins. Annual Review of Biophysics and Biophysical Chemistry 15, 321-353.

Etzold, T. and Argos, P. (1993). SRS - An Indexing and Retrieval Tool for Flat File Data Libraries. Computer Applications in the Biosciences 9(1), 49-57.

Farris, J.S. (1972). Estimating Phylogenetic Trees from Distance Matrices. American Naturalist 106, 645-668.

Feng, D.F. and Doolittle, R.F. (1987). Progressive Sequence Alignment as a Prerequisite to Correct Phylogenetic Trees. Journal of Molecular Evolution 25, 351-360.

Fickett, J.W. (1982). Recognition of Protein Coding Regions in DNA Sequences. Nucleic Acids Research 10, 5303-5318.

Finer-Moore, J. and Stroud, R.M. (1984). Amphipathic analysis and possible formation of the ion channel in an acetylcholine receptor. Proceedings of the National Academy of Sciences USA 81, 155-159.

Freier, S.M., Kierzek, R., Jaeger, J.A., Sugimoto, N., Caruthers, M.H., Neilson, T., and Turner, D.H. (1986). Improved Free-Energy Parameters for Predictions of RNA Duplex Stability. Proceedings of the National Academy of Sciences USA 83, 9373-9377.

Garnier, J., Osguthorpe, D.J., and Robson, B. (1978). Analysis of the Accuracy and Implications of Simple Methods for Predicting the Secondary Structure of Globular Proteins. Journal of Molecular Biology 120, 97-120.

Gill, S.C. and von Hippel, P.H. (1989). Calculation of Protein Extinction Coefficients from Amino Acid Sequence Data. Analytical Biochemistry 182, 319-326.

Gish, W. and States, D.J. (1993). Identification of protein coding regions by database similarity search. Nature Genetics 3, 266-272.

Grantham, R., Gautier, C., Gouy, M., Jacobzone, M., and Mercier, R. (1981). Codon Catalog Usage Is a Genome Strategy Modulated for Gene Expressivity. Nucleic Acids Research 9(1), r43-r74.

Gribskov, M., Burgess, R.R., and Devereux, J. (1986). PEPPLOT, A Protein Secondary Structure Analysis Program for the UWGCG Sequence Analysis Software Package. Nucleic Acids Research 14(1), 327-334.

Gribskov, M., Devereux, J., and Burgess, R.R. (1984). The Codon Preference Plot: Graphic Analysis of Protein Coding Sequences and Prediction of Gene Expression. Nucleic Acids Research 12, 539-549.

Gribskov, M., McLachlan, A.D., and Eisenberg, D. (1987). Profile Analysis: Detection of Distantly Related Proteins. Proceedings of the National Academy of Sciences USA 84, 4355-4358.

Gribskov, M., Homyak, M., Edenfield, J., and Eisenberg, D. (1988). Profile Scanning for Three-Dimensional Structural Patterns in Protein Sequences. Computer Applications in the Biosciences 4, 61-66.

Gribskov, M. and Eisenberg, D. (1989). Detection of Protein Structural Features With Profile Analysis. In Techniques in Protein Chemistry, (T.E. Hugli, ed.), Academic Press, San Diego, California, USA, 108-117.

Hancock, J.M. and Armstrong, J.S. (1994). SIMPLE34: an improved and enhanced implementation for VAX and Sun computers of the SIMPLE algorithm for analysis of clustered repetitive motifs in nucleotide sequences. Computer Applications in the Biosciences 10, 67-70.

Henikoff, S. and Henikoff, J.G. (1992). Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences USA 89, 10915-10919.

Higgins, D.G. and Sharp, P.M. (1988). CLUSTAL: A Package for Performing Multiple Sequence Alignment on a Microcomputer. Gene 73, 237-244.

Higgins, D.G. and Sharp, P.M. (1989). Fast and Sensitive Multiple Sequence Alignments on a Microcomputer. Computer Applications in the Biosciences 5, 151-153.

Hillier, L. and Green, P. (1991). OSP: A Computer Program for Choosing PCR and DNA Sequencing Primers. PCR Methods and Applications 1, 124-128.

Hogeweg, P. and Hesper, B. (1984). Energy Directed Folding of RNA Sequences. Nucleic Acids Research 12, 67-74.

IUPAC-IUB Commission on Biological Nomenclature (1966), Journal of Biological Chemistry 241, 2491.

IUPAC-IUB Commission on Biological Nomenclature (1968), Journal of Biological Chemistry 243, 3557.

Jaeger, J.A., Turner, D.H., and Zuker, M. (1989). Improved Predictions of Secondary Structures for RNA. Proceedings of the National Academy of Sciences USA 86, 7706-7710.

Jaeger, J.A., Turner, D.H., and Zuker, M. (1990). Predicting Optimal and Suboptimal Secondary Structures for RNA. In Methods in Enzymology, (R.F. Doolittle, ed.), 183, 281-306, Academic Press, San Diego, California, USA.

Jin, L. and Nei, M. (1990). Limitations of the Evolutionary Parsimony Method of Phylogenetic Analysis. Molecular Biology and Evolution 7, 82-102.

Jukes, T.H. and Cantor, C.R. (1969). Evolution of Protein Molecules. In Mammalian Protein Metabolism, (H.N. Munro, ed.), vol. III, 21-132, Academic Press, San Diego, California, USA.

Karlin, S. and Altschul, S.F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences USA 87, 2264-2268.

Karlin, S. and Altschul, S.F. (1993). Applications and statistics for multiple high-scoring segments in molecular sequences. Proceedings of the National Academy of Sciences USA 90, 5873-5877.

Karplus, P.A. and Schulz, G.E. (1985). Prediction of Chain Flexibility in Proteins. Naturwissenschaften 72, 212-213.

Kernighan, B.W. and Plauger, P.J. (1976). Software Tools. Addison-Wesley Publishing Company, Reading, Massachusetts.

Kimura, M. (1980). A Simple Method for Estimating Evolutionary Rates of Base Substitutions Through Comparative Studies of Nucleotide Sequences. Journal of Molecular Evolution 16, 111-120.

Kimura, M. (1983). The Neutral Theory of Molecular Evolution, Cambridge University Press, Cambridge, UK.

Kyte, J. and Doolittle, R.F. (1982). A Simple Method for Displaying the Hydropathic Character of a Protein. Journal of Molecular Biology 157, 105-132.

Li, Wen-Hsiung (1993). Unbiased Estimation of the Rates of Synonymous and Nonsynonymous Substitution. Journal of Molecular Evolution 36, 96-99.

Lipman, D.J. and Pearson, W.R. (1985). Rapid and Sensitive Protein Similarity Searches. Science 227, 1435-1441.

Lupas, A. (1996). Prediction and Analysis of Coiled-Coil Structures. In Methods in Enzymology, (R.F. Doolittle, ed.), 266, 513-525, Academic Press, San Diego, California, USA.

Lupas, A., Van Dyke, M., and Stock, J. (1991). Predicting Coiled Coils from Protein Sequences. Science 252, 1162-1164.

Maddison, D.R., Swofford, D.L., and Maddison, W.P. (1997). NEXUS: an extensible file format for systematic information. Systematic Biology 46, 590-621.

Maizel, J.V. and Lenk, R.P. (1981). Enhanced Graphic Matrix Analysis of Nucleic Acid and Protein Sequences. Proceedings of the National Academy of Sciences USA 78, 7665-7669.

McGeoch, D. (1985). On the Predictive Recognition of Signal Peptide Sequences. Virus Research 3, 271-286.

Moller, S., Croning, M.D.R., and Apweiler, R. (2001). Evaluation of Methods for the prediction of membrane spanning regions. Bioinformatics, 17(7), 646-653.

Mount, S.M. (1982). A Catalogue of Splice Junction Sequences. Nucleic Acids Research 10(2), 459-472.

Needleman, S.B. and Wunsch, C.D. (1970). A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. Journal of Molecular Biology 48, 443-453.

Nielsen, H., Engelbrecht, J., Brunak, S., and von Heijne, G. (1997). Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Engineering 10(1), 1-6.

Nomenclature Committee of the International Union of Biochemistry (NC-IUB) (1985). Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences. European Journal of Biochemistry 150, 1-5.

Nussinov, R., Pieczenik, G., Griggs, J.R., and Kleitman, D.J. (1978). Algorithms for Loop Matchings. SIAM Journal on Applied Mathematics 35, 68-82.

Osterburg, G. and Sommer, R. (1981). Computer Support of DNA Sequence Analysis. Computer Programs in Biomedicine 13, 101-109.

Pamilo, P. and Bianchi, N.O. (1993). Evolution of the Zfx and Zfy Genes: Rates and Interdependence between the Genes. Molecular Biology and Evolution 10, 271-281.

Pearson, W.R. and Lipman, D.J. (1988). Improved Tools for Biological Sequence Analysis. Proceedings of the National Academy of Sciences USA 85, 2444-2448.

Pearson, W.R. (1990). Rapid and Sensitive Sequence Comparison with FASTP and FASTA. In Methods in Enzymology, (R.F. Doolittle, ed.), 183, 63-98, Academic Press, San Diego, California, USA.

Pearson, W.R. (1995). Comparison of Methods for Searching Protein Sequence Databases. Protein Science 4, 1145-1160.

Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R., and Dodgson, J. (1980). The Evolution of Genes: The Chicken Preproinsulin Gene. Cell 20, 555-566.

Rechid, R., Vingron, M., and Argos, P. (1989). A New Interactive Protein Sequence Alignment Program and Comparison of its Results with Widely Used Algorithms. Computer Applications in the Biosciences 5, 107-113.

Rychlik, W. and Rhoads, R.E. (1990). Optimization of the Annealing Temperature for DNA Amplification in vitro. Nucleic Acids Research 18, 6409-6412.

Saitou, N. and Nei, M. (1987). The Neighbor-joining Method: A New Method for Reconstructing Phylogenetic Trees. Molecular Biology and Evolution 4, 406-425.

Sankoff, D. and Kruskal, J.B. (1983). Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison, Addison-Wesley, Reading, MA, USA.

SantaLucia, J.Jr. (1998). A Unified View of Polymer, Dumbbell, and Oligonucleotide DNA Nearest-Neighbor Thermodynamics. Proceedings of the National Academy of Sciences USA 95, 1460-1465.

SantaLucia, J.Jr. and Allawi, H.T. (1997). Thermodynamics and NMR of Internal G-T Mismatches in DNA. Biochemistry 36, 10581-10594.

Schaffer, A.A., Aravind, L., Madden, T.L., Shavirin, S., Spouge, J.L., Wolf, Y.I., Koonin, E.V., and Altschul, S.F. (2001). Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. Nucleic Acids Research 29, 2994-3005.

Schroeder, J.L. and Blattner, F.R. (1982). Formal Description of a DNA Oriented Computer Language. Nucleic Acids Research 10, 69-84, Figure 1.

Schwartz, R.M. and Dayhoff, M.O. (1979). Matrices for Detecting Distant Relationships. In Atlas of Protein Sequences and Structure, (M.O. Dayhoff, ed.), 5, Suppl. 3, 353-358, National Biomedical Research Foundation, Washington, D.C., USA.

Sellers, P.H. (1974). On the Theory and Computation of Evolutionary Distances. SIAM Journal on Applied Mathematics 26, 787-793.

Slightom, J.L., Bock, J.H., Siemieniak, D.R., Hurst, G.D. and Beattie, K.L. (1994). Nucleotide Sequencing Double-Stranded Plasmids with Primers Selected from a Nonamer Library. BioTechniques 17, 536-544.

Smith, S.W., Overbeek, R., Woese, C.R., Gilbert, W., and Gillevet, P. (1994). The Genetic Data Environment: An Expandable GUI for Multiple Sequence Analysis. Computer Applications in the Biosciences 10(6), 671-675.

Smith, T.F. and Waterman, M.S. (1981). Comparison of Bio-Sequences. Advances in Applied Mathematics 2, 482-489.

Smith, T.F., Waterman, M.S., and Sadler, J.R. (1983). Statistical Characterization of Nucleic Acid Sequence Functional Domains. Nucleic Acids Research 11, 2205-2220.

Smithies, O., Engels, W.R., Devereux, J.R., Slightom, J.L., and Shen, S. (1981). Base Substitutions, Length Differences and DNA Strand Asymmetries in the Human G-Gamma and A-Gamma Fetal Globin Gene Region. Cell 26, 345-353.

Sneath, P.H.A. and Sokal, R.R. (1973). Numerical Taxonomy. The Principles and Practice of Numerical Classification, W.H. Freeman and Company, San Francisco, California, USA.

Sonnhammer, E.L., Eddy, S.R., and Durbin, R. (1997). Pfam: A comprehensive database of protein families based on seed alignments. Proteins, 28, 405-420.

Sonnhammer, E.L., von Heijne, G., and Krogh, A. (1998). A hidden Markov model for predicting transmembrane helices in protein sequences. Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology 6, 175-182.

Staden, R. (1980). A New Computer Method for the Storage and Manipulation of DNA Gel Reading Data. Nucleic Acids Research 8, 3673-3694.

Staden, R. (1986). The current status and portability of our sequence handling software. Nucleic Acids Research 14, 217-231.

States, D.J. and Gish, W. (1993). Combined use of sequence similarity and codon bias for coding region identification. Journal of Computational Biology 1(1), 39-50.

States, D.J., Gish, W., and Altschul, S.F. (1991). Improved sensitivity of nucleic acid database similarity searches using application specific scoring matrices. Methods: A companion to Methods in Enzymology 3, 66-70.

Studier, J.A. and Keppler, K.J. (1988). A Note on the Neighbor-joining Algorithm of Saitou and Nei. Molecular Biology and Evolution 5, 729-731.

Swofford, D.L. and Olsen, G.J. (1990). Phylogeny Reconstruction. In Molecular Systematics, (D.M. Hillis and C. Moritz, eds.), Chap. 11, 411-501, Sinauer Associates, Inc, Sunderland, Massachusetts, USA.

Tajima, F. and Nei, M. (1984). Estimation of Evolutionary Distance between Nucleotide Sequences. Molecular Biology and Evolution 1, 269-285.

Tatusov, R.L. and Lipman, D.J. The DUST Program, unpublished.

Tufte, E.R. (1983). The Visual Display of Quantitative Information. Graphics Press, Cheshire, Connecticut, USA.

Turner, D.H., Sugimoto, N., Jaeger, J.A., Longfellow, C.E., Freier, S.M. and Kierzek, R. (1987). Improved Parameters for Prediction of RNA Structure. Cold Spring Harbor Symp. Quant. Biol. 52, 123-133.

Turner, D.H., Sugimoto, N., and Freier, S.M. (1989). RNA Structure Prediction. Annu. Rev. Biophys. Biophys. Chem. 17, 167-192.

Uchiyama, H. and Weisblum, B. (1985). N-Methyl-transferase of Streptomyces erythraeus that confers resistance to the macrolide-lincosamide-streptogramin B antibiotics: amino acid sequence and its homology to cognate R-factor enzymes for pathogenic bacilli and cocci. Gene 38, 103-110.

Unger, R., Harel, D., and Sussman, J.L. (1986). DNAMAT: An Efficient Graphic Matrix Sequence Homology Algorithm and Its Application to Structural Analysis. Computer Applications in the Biosciences 2, 283-289.

von Heijne, G. (1987). Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit, Academic Press, Inc., San Diego, California, USA.

Wilbur, W.J. and Lipman, D.J. (1983). Rapid Similarity Searches of Nucleic Acid and Protein Data Banks. Proceedings of the National Academy of Sciences USA 80, 726-730.

Wilbur, W.J. and Lipman, D.J. (1984). The Context Dependent Comparison of Biological Sequences. SIAM Journal on Applied Mathematics 44, 557-567.

Wisconsin Package, Version 10.0 (1999). Madison, WI, Accelrys, January 1999.

Witkiewicz, Halina, Bolander, Mark E., and Edwards, Dylan R. (1993). Improved Design of Riboprobes from pBluescript and Related Vectors for In Situ Hybridization. BioTechniques 14, 458-463.

Wolf, H., Modrow, S., Motz, M., Jameson, B., Hermann, G., and Fortsch, B. (1987). An Integrated Family of Amino Acid Sequence Analysis Programs. Computer Applications in the Biosciences 4(1), 187-191.

Wootton, J.C. and Federhen, S. (1993). Statistics of local complexity in amino acid sequences and sequence databases. Computers and Chemistry 17, 149-163.

Wootton, J.C. and Federhen, S. (1996). Analysis of compositionally biased regions in sequence databases. Methods in Enzymology 266, 554-571.

Zheng, Z., Schwartz, S., Wagner, L., and Miller, W. (2000). A Greedy Algorithm for Aligning DNA Sequences. Journal of Computational Biology 7, 203-214.

Zuker, M. (1989). Computer Prediction of RNA Structure. In Methods in Enzymology, (J.E. Dahlberg and J.N. Abelson, eds.), 180, 262-288, Academic Press, San Diego, California, USA.

Zuker, M. (1989). On Finding All Suboptimal Foldings of an RNA Molecule. Science 244, 48-52.

Zuker, M. and Stiegler, P. (1981). Optimal Computer Folding of Large RNA Sequences Using Thermodynamics and Auxiliary Information. Nucleic Acids Research 9, 133-148.

Printed: November 30, 2004 12:34 (1162)

Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.