PUBLISH

 

Table of Contents

FUNCTION

DESCRIPTION

EXAMPLE

OUTPUT

RELATED PROGRAMS

CONSIDERATIONS

NUMBERING

RESTRICTIONS

LINE FORMATTING

LINE DESCRIPTIONS

COMMAND-LINE SUMMARY

LOCAL DATA FILES

PARAMETER REFERENCE


FUNCTION

Publish arranges sequences for publication. It creates a text file that you can modify to your own needs with a text editor.

DESCRIPTION

Publish creates a text file that you can customize for publication. You can choose lines that represent the sequence, its complement, a decimal scale, a three-letter translation, a one-letter translation, a numbering line with numbers at every tenth symbol, a blank line, or a tagged line. Additional sequences can be shown either completely or with only the differences marked. A match line between any two sequence lines marks the matches. The lines can appear in any order. Publish can number each line starting from any number you choose. A line is numbered at both ends if you select it from the menu by typing its menu letter in uppercase. Each type of line is described in detail below. The output can be blocked using two parameters: the block size and the number of blocks per line. You must choose the ranges of translation for each translation line you select. You can translate as many non-overlapping ranges as you want on each translation line. To translate overlapping ranges, select two translation lines and set the ranges appropriately.

EXAMPLE

Here is a session using Publish to generate the figure below. Before the session, Assemble was used to extract the two gene sequences ggamma.seq and agamma.seq from gamma.seq. Then Gap was run with the command-line parameters -OUT and -LIMit to generate the aligned sequences ggamma.gap and agamma.gap.

 
 
     % publish
 
      PUBLISH what sequence ?  ggamma.gap
 
                       Begin (* 1 *) ?
                     End (*  1700 *) ?
 
      Please select the lines in the order in which
      you want them to appear in the figure.
 
          a) number line        :                10
          b) dot scale line     :                 .         .
          c) the sequence itself:        GAATTCACGATCGATCGTAG
          d) dash scale line    :     1  ---------+---------+   20
          e) the complement     :        CTTAAGTGCTAGCTAGCATC
          f) translation        :        GluPheThrIleAspArg
          g) translation        :        E  F  T  I  D  R
          h) tagged blank line  :   ###
          i) blank line         :
          j) 2nd sequence (diff):                C     G
          k) 2nd sequence (all) :        GAATTCACCATCGAGCGTAG
          l) match line         :        |||||||| ||||| |||||
 
      Select the lines in the order you wish them to appear
      and then press <Return>.  Use uppercase to identify the
      lines that you want numbered at the ends.
 
                           (* cDefii *) fCji
 
      What number is the first symbol in the row 2 (* 1 *) ?  2101
 
      Please enter the ranges of translation for translation line
      number 1 using the original coordinates of the sequence.
 
                       Begin (*    1 *) ?  79
                         End (* 1700 *) ?  170
 
      Get another range from this sequence (* Yes *) ?
 
                       Begin (*  170 *) ?  294
                         End (* 1700 *) ?  516
 
      Get another range from this sequence (* Yes *) ?
 
                       Begin (*  516 *) ?  1402
                         End (* 1700 *) ?  1530
 
      Get another range from this sequence (* Yes *) ?  n
 
      What is the sequence for line 3 ?  agamma.gap
 
                       Begin (* 1 *) ?
                     End (*  1700 *) ?
 
      Put this sequence range at what start (* 1 *) ?
 
      How many symbols per block (* 60 *) ?  100
 
      How many blocks do you want on each line (* 1 *) ?
 
      What should I call the output file (* ggamma.publish *) ?
 
    %

OUTPUT

The output from this session was the basis for the figure at the end of this entry in the Program Manual. The output file was modified by removing the left-hand numbers and adding the labels Ggamma and Agamma. The gap characters were changed from periods to asterisks. The amino acid Arg, which spans the first intervening sequence, was added.

RELATED PROGRAMS

Red is a text formatter that creates publication-quality documents on a PostScript printer such as the Apple LaserWriter. You can use 13 different fonts, scaling each font to any size. You can also include figures and graphics from any Accelrys GCG (GCG) graphics program within the text of the document.

CONSIDERATIONS

Publish lets you choose any repeating set of lines numbered from any offset. There are many ways to arrange the data into groups that look absurd if you use the scaling or numbering lines that are designed to identify every tenth symbol. You should think about whether the starting offset and the blocking are compatible with the scaling and numbering lines you have chosen. Publish is very flexible; it writes a "sensible" output for any combination of lines, offset, and blocking you choose, but the output you choose may look ridiculous.

NUMBERING

Publish asks you for the number of the first symbol in every line that is numbered. You must also specify this offset for the scaling lines to establish the decimal intervals appropriately. The default is the beginning coordinate of the primary sequence.

You can use the command-line parameter -SKIpzero to make Publish skip the zeroth base if your numbering starts with a negative number.
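
The numbering rule can be pictured with a short Python sketch. It illustrates the behavior as described in this section and is not code from Publish; the function name coordinate and its arguments are hypothetical, and the handling of -SKIpzero is one reading of "skip the zeroth base".

```python
def coordinate(index: int, first_number: int, skip_zero: bool = False) -> int:
    """Label for the symbol at 0-based `index` in a row that starts at `first_number`.

    Assumption: with skip_zero set, a row that starts below zero jumps
    straight from -1 to 1, so every label from zero upward is shifted by one.
    """
    n = first_number + index
    if skip_zero and first_number < 0 and n >= 0:
        n += 1
    return n


if __name__ == "__main__":
    # A row whose first symbol is numbered -3, with and without zero skipping.
    print([coordinate(i, -3) for i in range(6)])
    print([coordinate(i, -3, skip_zero=True) for i in range(6)])
```

With a first number of -3, the fourth symbol is labeled 0 by default and 1 when zero is skipped.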

RESTRICTIONS

You may not select more than 20 lines. Publish will not translate interrupted coding sequences correctly if there is a split codon at the boundary. You should translate ranges that do not include the split codon and add the missing amino acid when you edit the output file.

LINE FORMATTING

Each line can be printed out as a single block or broken into as many blocks as you wish. However, there may be no more than 150 characters in each line. Each block requires a space to separate it from the block following, so a line with ten blocks of ten symbols each would require 110 characters and would fall within the 150-character limit. There is a nine-column blank space at the left end of each line in which you may add labels for the line.
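
The width arithmetic above can be checked before a session with a small Python sketch. The names row_width and fits are hypothetical. The sketch counts one separator space per block, as the example in the text does, and it assumes that the nine-column label space is not counted toward the limit, which this entry does not state either way.

```python
MAX_WIDTH = 150  # per-line character limit given in the text


def row_width(block_size: int, blocks_per_line: int) -> int:
    """Characters needed for the blocked symbols, one separator space per block."""
    return blocks_per_line * (block_size + 1)


def fits(block_size: int, blocks_per_line: int) -> bool:
    """True if the chosen blocking stays within the 150-character limit."""
    return row_width(block_size, blocks_per_line) <= MAX_WIDTH


if __name__ == "__main__":
    print(row_width(10, 10), fits(10, 10))  # ten blocks of ten symbols
    print(row_width(100, 1), fits(100, 1))  # the blocking used in the example session
```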

LINE DESCRIPTIONS

Each line from the menu can be selected as many times as you choose. In effect, you are selecting a repeat unit for the figure. Every line selected with a capital letter is numbered at each end. Each type of line is described in detail below. You should think of the figure as being based on a primary sequence that Publish identifies first. In the program's prompts, translation ranges and the overlay positions for the difference lines are in terms of the original coordinates of the primary sequence. The figure is only as long as the range chosen from the primary sequence.

Number Line

creates numbers from your offset that number every tenth symbol. If your offset is -45, the sixth base is -40 and is numbered as such. If a number spans a block division, then the number is also divided. If you chose, for instance, blocks of five for this example, the number would appear as -4 0. The number line will not look good unless you choose decimal blocking and adjust your offset so that the numbers align with the ends of each block.
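
The interaction between numbering and blocking can be reproduced with a Python sketch. It is an illustration only, not Publish's own code: the function names number_line and blocked are hypothetical, and the sketch assumes that each number is right-justified so that its last digit sits over the numbered symbol, as in the menu sample, and that a number with no room to its left is dropped.

```python
def number_line(first_number: int, length: int) -> str:
    """Unblocked number line: labels every symbol whose number is divisible by ten."""
    cells = [" "] * length
    for i in range(length):
        n = first_number + i
        if n % 10 == 0:
            label = str(n)
            start = i - len(label) + 1
            if start >= 0:  # assumption: skip labels that would run off the left edge
                cells[start:i + 1] = label
    return "".join(cells)


def blocked(line: str, block_size: int) -> str:
    """Split a line into blocks of block_size columns separated by one space."""
    return " ".join(line[i:i + block_size] for i in range(0, len(line), block_size))


if __name__ == "__main__":
    line = number_line(-45, 10)
    print(repr(line))              # -40 ends over the sixth symbol
    print(repr(blocked(line, 5)))  # blocks of five split the number
```

With an offset of -45 and blocks of five, the label is cut at the block boundary, which is the divided number described above.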

Dot Scale Line

puts periods at every tenth position starting from the first symbol that is evenly divisible by ten after the offset.

Sequence Line

prints the range of the sequence that was chosen in the blocking format that was chosen.

Dash Scale Line

makes a scale line with dashes (-) at every symbol position. Every position that is evenly divisible by ten is marked with a plus sign (+).
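
Both scale lines follow the same rule, which a brief Python sketch makes concrete. The function names dot_scale and dash_scale are hypothetical and the sketch ignores blocking; it only shows which columns are marked for a given first number.

```python
def dot_scale(first_number: int, length: int) -> str:
    """Period at every position whose number is divisible by ten, blank elsewhere."""
    return "".join("." if (first_number + i) % 10 == 0 else " " for i in range(length))


def dash_scale(first_number: int, length: int) -> str:
    """Dash at every position, with a plus sign where the number is divisible by ten."""
    return "".join("+" if (first_number + i) % 10 == 0 else "-" for i in range(length))


if __name__ == "__main__":
    # Twenty symbols numbered from 1, as in the menu samples.
    print(repr(dot_scale(1, 20)))
    print(repr(dash_scale(1, 20)))
```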

Sequence Complement

presents the nucleotide complement for each symbol in the original sequence.
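
As a rough illustration, the complement line is a position-by-position substitution, not a reverse complement. The Python sketch below uses a hypothetical function complement; the rna flag mimics the -RNA parameter described in the Parameter Reference. Leaving symbols other than A, C, G, T, and U unchanged is an assumption, since this entry does not say how Publish treats ambiguity codes.

```python
# Translation tables for the four bases plus U, in upper- and lowercase.
_DNA = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")
_RNA = str.maketrans("ACGTUacgtu", "UGCAAugcaa")


def complement(seq: str, rna: bool = False) -> str:
    """Complement each symbol in place; with rna=True, A pairs with U instead of T."""
    return seq.translate(_RNA if rna else _DNA)


if __name__ == "__main__":
    print(complement("GAATTCACGATCGATCGTAG"))            # menu sample sequence
    print(complement("GAATTCACGATCGATCGTAG", rna=True))  # same line with -RNA behavior
```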

Translation (Three-Letter)

makes a translation of all the non-overlapping ranges you choose from the original sequence. You are prompted for each range. If no translation range is covered for any particular output line, then no line is printed and no space is left. If you wish to translate overlapping ranges, you must select two translation lines. The ranges of translation are specified in the coordinates of the original sequence as you would see them in the original sequence file.
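
Before a session it can help to check that the ranges planned for one translation line do not overlap. The Python sketch below is a hypothetical helper, not part of Publish; it assumes each range is an inclusive (begin, end) pair in the coordinates of the original sequence.

```python
def has_overlap(ranges):
    """True if any two inclusive (begin, end) ranges share at least one position."""
    ordered = sorted(ranges)
    # After sorting by begin, any overlap shows up between neighboring ranges.
    return any(b2 <= e1 for (_, e1), (b2, _) in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    # The three ranges entered in the example session.
    print(has_overlap([(79, 170), (294, 516), (1402, 1530)]))
    # Two ranges that share position 10 would need two translation lines.
    print(has_overlap([(1, 10), (10, 20)]))
```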

Translation (One-Letter)

works exactly like the three-letter translation except that the single-letter amino acid symbols are used.

Tagged Blank Line

leaves a blank line that has the string ### at the beginning. The idea of the tag is to help you substitute other strings for the tag with a text editor. The tagged line is not filled with spaces.

Blank Line

leaves a blank line filled with spaces.

Second Sequence Lines

shows all of, or just the differences between, other sequences and the primary sequence (shown in the sequence line). The difference line prints a blank in all of the columns where the new sequence is the same as the primary one. You can put a character other than a blank in the columns where the second sequence is the same as the primary sequence using the -AGReement parameter. When the sequences are not in agreement, the character from the second sequence is shown. For each second sequence line you have to name a new sequence file, a range of interest within it, and the number of the base in the primary sequence where you want this new sequence to start. As in the translation lines, the number for the start of this overlay is in the coordinates of the original sequence. Publish does not attempt to make an optimal alignment of the second sequences with the primary sequence.
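
The difference line can be sketched in Python as a column-by-column comparison. The function name diff_line is hypothetical, and its agreement argument stands in for the -AGReement parameter. The sketch assumes the two sequences are already overlaid at the same start and have equal length, and it compares symbols without regard to case, which this entry states for the match line but not for the difference line.

```python
def diff_line(primary: str, second: str, agreement: str = " ") -> str:
    """Show the second sequence only where it differs from the primary sequence."""
    return "".join(
        agreement if p.upper() == s.upper() else s
        for p, s in zip(primary, second)
    )


if __name__ == "__main__":
    primary = "GAATTCACGATCGATCGTAG"
    second = "GAATTCACCATCGAGCGTAG"
    print(repr(diff_line(primary, second)))       # blanks where the sequences agree
    print(repr(diff_line(primary, second, ".")))  # periods where the sequences agree
```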

Match Line

puts a pipe character (|) in every column where the symbol in the row above matches the symbol in the row below (upper- and lowercase are equivalent). There is no requirement for the lines above or below the match line to be legitimate sequence lines. If you put a match line in the top or bottom row, it is left completely blank. Note that a space above and below would generate a match character in the match line.
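
A minimal Python sketch of the match rule follows. The function name match_line is hypothetical; the sketch compares the rows above and below column by column, ignores case, and, as noted above, treats two spaces as a match.

```python
def match_line(above: str, below: str) -> str:
    """Pipe character wherever the rows above and below hold the same symbol."""
    return "".join(
        "|" if a.upper() == b.upper() else " "
        for a, b in zip(above, below)
    )


if __name__ == "__main__":
    print(repr(match_line("GAATTCACGATCGATCGTAG", "GAATTCACCATCGAGCGTAG")))
    print(repr(match_line("a c", "A C")))  # the space in the middle also matches
```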

COMMAND-LINE SUMMARY

All parameters for this program may be added to the command line. Use -CHEck to view the summary below and to specify parameters before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional.

PUBLISH does not support complete command line control.
 
Minimal Syntax: % publish [-INfile1=]gamma.seq
 
Local Data Files:
 
-TRANSlate=translate.txt   contains the genetic code
 
Optional Parameters:
 
-BEGin=2101 -END=2600      sets the range of interest
-RNA                       makes complementary sequences look like RNA
                             instead of DNA
-AGReement=.               places period at positions where second sequence
                             agrees with primary sequence
-SKIpzero                  skips the zero position on rows whose first symbol
                             is negative
[-OUTfile=]gamma.publish   names the output file
 

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Section 4, Using Data Files in the User's Guide.

The translation of codons to amino acids, the identification of potential start codons and stop codons, and the mappings of one-letter to three-letter amino acid codes are all defined in a translation table in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in Appendix VII.

PARAMETER REFERENCE

You can set the parameters listed below from the command line.

-RNA

Sets Publish to make U complement A and u complement a. This makes complementary sequences look like RNA instead of DNA.

-AGReement=.

When you select a second sequence line that shows only differences, Publish puts a space character wherever the second sequence is the same as (agrees with) the primary sequence. This parameter lets you set that agreement character to some other character -- a period in this example.

-SKIpzero

Skips the zeroth character in numbering. Geneticists seem to have a propensity to number sequences with a numbering system that has no zero in it. This parameter is a concession to them.

We recommend that gene sequences be numbered so that the first base of the first codon is numbered zero. Each publication that numbers a sequence without using zero embeds a convention in genetics that is completely inconsistent with the data processing needs of the field. It is inconceivable that a zero-free standard for the coordination of genetic sequences will be adopted.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

Printed: May 27, 2005 14:20 


 
 
                                                                                       MetGlyHisPheThrGluGluA
Ggamma   AGGAAGCACCCTTCAGCAGTTCCACACACTCGCTTCTGGAACGTCTGAGGTTATCAATAAGCTCCTAGTCCAGACGCCATGGGTCATTTCACAGAGGAGG   2200
Agamma                                                    A
 
         spLysAlaThrIleThrSerLeuTrpGlyLysValAsnValGluAspAlaGlyGlyGluThrLeuGlyAr
Ggamma   ACAAGGCTACTATCACAAGCCTGTGGGGCAAGGTGAATGTGGAAGATGCTGGAGGAGAAACCCTGGGAAGGTAGGCTCTGGTGACCAGGACAAGGGAGGG   2300
Agamma
 
                                                                                                     gLeuLeuV
Ggamma   AAGGAAGGACCCTGTGCCTGGCAAAAGTCCAGGTCGCTTCTCAGGATTTGTGGCACCTTCTGACTGTCAAACTGTTCTTGTCAATCTCACAGGCTCCTGG   2400
Agamma
 
         alValTyrProTrpThrGlnArgPhePheAspSerPheGlyAsnLeuSerSerAlaSerAlaIleMetGlyAsnProLysValLysAlaHisGlyLysLy
Ggamma   TTGTCTACCCATGGACCCAGAGGTTCTTTGACAGCTTTGGCAACCTGTCCTCTGCCTCTGCCATCATGGGCAACCCCAAAGTCAAGGCACATGGCAAGAA   2500
Agamma
 
         sValLeuThrSerLeuGlyAspAlaIleLysHisLeuAspAspLeuLysGlyThrPheAlaGlnLeuSerGluLeuHisCysAspLysLeuHisValAsp
Ggamma   GGTGCTGACTTCCTTGGGAGATGCCATAAAGCACCTGGATGATCTCAAGGGCACCTTTGCCCAGCTGAGTGAACTGCACTGTGACAAGCTGCATGTGGAT   2600
Agamma
 
         ProGluAsnPheLys
Ggamma   CCTGAGAACTTCAAGGTGAGTCCAGGAGATGTTTCAGCACTGTTGCCTTTAGTCTCGAGGCAACTTAGACAACTGAGTATTGATCTGAGCACAGCAGGGT   2700
Agamma
 
Ggamma   GTGAGCTGTTTGAAGATACTGGGGTTGGGAGTGAAGAAACTGCAGAGGACTAACTGGGCTGAGACCCAGTGGCAATGTTTTAGGGCCTAAGGAGTGCCTC   2800
Agamma
 
Ggamma   TGAAAATCTAGATGGACAACTTTGACTTTGAGAAAAGAGAGGTGGAAATGAGGAAAATGACTTTTCTTTATTAGATTTCGGTAGAAAGAACTTTCACCTT   2900
Agamma
 
Ggamma   TCCCCTATTTTTGTTATTCGTTTTAAAACATCTATCTGGAGGCAGGACAAGTATGGTCGTTAAAAAGATGCAGGCAGAAGGCATATATTGGCTCAGTCAA   3000
Agamma
 
Ggamma   AGTGGGGAACTTTGGTGGCCAAACATACATTGCTAAGGCTATTCCTATATCAGCTGGACACATATAAAATGCTGCTAATGCTTCATTACAAACTTATATC   3100
Agamma
 
Ggamma   CTTTAATTCCAGATGGGGGCAAAGTATGTCCAGGGGTGAGGAACAATTGAAACATTTGGGCTGGAGTAGATTTTGAAAGTCAGCTCTGTGTGTGTGTGTG   3200
Agamma
 
Ggamma   TGTGTGTGCGCGCGTGTGTTTGTGTGTGTGTGAGAGCGTGTGTTTCTTTTAACGTTTTCAGCCTACAGCATACAGGGTTCATGGTGGCAAGAAGATAACA   3300
Agamma   ********************            TC                     C           A                   G         G
 
Ggamma   AGATTTAAATTATGGCCAGTGACTAGTGCTGCAAGAAGAACAACTACCTGCATTTAATGGGAAAGCAAAATCTCAGGCTTTGAGGGAAGTTAACATAGGC   3400
Agamma                                 TG   GG                          G
 
Ggamma   TTGATTCTGGGTGGAAGCTTGGTGTGTAGTTATCTGGAGGCCAGGCTGGAGCTCTCAGCTCACTATGGGTTCATCTTTATTGTCTCCTTTCATCTCAACA   3500
Agamma                      G
 
          LeuLeuGlyAsnValLeuValThrValLeuAlaIleHisPheGlyLysGluPheThrProGluValGlnAlaSerTrpGlnLysMetValThrGlyVal
Ggamma   GCTCCTGGGAAATGTGCTGGTGACCGTTTTGGCAATCCATTTCGGCAAAGAATTCACCCCTGAGGTGCAGGCTTCCTGGCAGAAGATGGTGACTGGAGTG   3600
Agamma                                                                                                  C
 
         AlaSerAlaLeuSerSerArgTyrHisEnd
Ggamma   GCCAGTGCCCTGTCCTCCAGATACCACTGAGCTCACTGCCCATGATGCAGAGCTTTCAAGGATAGGCTTTATTCTGCAAGCAATACAAATAATAAATCTA   3700
Agamma                                   CTCT          T
 
Ggamma   TTCTGCTAAGAGATCACACATGGTTGTCTTCAGTTCTTTTTTTTATGTCTTTTTAAATATATGAGCCACAAAGGGTTTTATGTTGAGGGATGTGTTTATG   3800
Agamma          G              A  T       C           CA                            *     A        A    G


Technical Support: support-us@accelrys.com, support-japan@accelrys.com,
or support-eu@accelrys.com

Copyright (c) 1982-2005 Accelrys Inc. All rights reserved.

Licenses and Trademarks: Discovery Studio ®, SeqLab ®, SeqWeb ®, SeqMerge ®, GCG ®, and the GCG logo are registered trademarks of Accelrys Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

www.accelrys.com/bio