
"Newick's 8:45" Tree Format Standard

Interpretation by Gary Olsen

Revision History:

Aug. 30, 1990: My interpretation from discussions and a copy of "Committee" notes.
Oct. 4, 1991: Revised to reflect discussions with Joseph Felsenstein, David Maddison and David Swofford at 1991 Woods Hole MBL Molecular Evolution Workshop.
Jan. 24, 1992: Text revised.
Jan. 20, 1994: Revised to reflect discussions with David Swofford regarding quotation marks in comments (they will have no special meaning; thus, [Newick's 8:45 Tree Standard] is a legal comment).
Aug. 23, 1994: Text revised.
Oct. 16, 2003: Branch length in "Printer Plot" of tree example fixed to match value (thanks to Al Gernon). Minor text revision. HTML version of this document produced.

Conventions Used in Syntax Diagram:

   Items in { } may appear zero or more times.
   Items in [ ] are optional; they may appear once or not at all.
   All other punctuation marks (colon, semicolon, parentheses, comma and
         single quote) are required parts of the format.

Rough Syntax Diagram:

              tree ==> descendant_list [ root_label ] [ : branch_length ] ;

   descendant_list ==> ( subtree { , subtree } )

           subtree ==> descendant_list [internal_node_label] [: branch_length]
                   ==> leaf_label [: branch_length]

            root_label ==> label
   internal_node_label ==> label
            leaf_label ==> label

                 label ==> unquoted_label
                       ==> quoted_label

        unquoted_label ==> string_of_printing_characters
          quoted_label ==> ' string_of_printing_characters '

         branch_length ==> signed_number
                       ==> unsigned_number
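
The productions above map fairly directly onto a recursive-descent reader. What follows is a minimal sketch in Python, not a reference implementation. The names `Node` and `parse_newick` are hypothetical. It assumes comments have already been removed, applies the label rules given in the Notes, accepts exponent notation in branch lengths (an assumption; the diagram only says signed or unsigned number), and is slightly more permissive than the diagram in that it tolerates missing leaf labels and a tree consisting of a single leaf.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PUNCT = "()[]':;,"  # characters that end an unquoted label


@dataclass
class Node:
    label: Optional[str] = None
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse one tree; assumes comments were stripped beforehand."""
    pos = 0

    def skip_ws() -> None:
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def expect(ch: str) -> None:
        nonlocal pos
        skip_ws()
        if pos >= len(text) or text[pos] != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def label() -> Optional[str]:
        nonlocal pos
        skip_ws()
        if pos < len(text) and text[pos] == "'":
            pos += 1
            out = []
            while True:
                if pos >= len(text):
                    raise ValueError("unterminated quoted label")
                if text[pos] == "'":
                    if text[pos + 1 : pos + 2] == "'":  # '' is a literal quote
                        out.append("'")
                        pos += 2
                        continue
                    pos += 1
                    return "".join(out)
                out.append(text[pos])
                pos += 1
        start = pos
        while pos < len(text) and not text[pos].isspace() and text[pos] not in PUNCT:
            pos += 1
        # underscores in unquoted labels stand for blanks
        return text[start:pos].replace("_", " ") if pos > start else None

    def branch_length() -> Optional[float]:
        nonlocal pos
        skip_ws()
        if pos < len(text) and text[pos] == ":":
            pos += 1
            skip_ws()
            start = pos
            while pos < len(text) and (text[pos].isdigit() or text[pos] in "+-.eE"):
                pos += 1
            return float(text[start:pos])  # raises ValueError if malformed
        return None

    def subtree() -> Node:
        nonlocal pos
        skip_ws()
        node = Node()
        if pos < len(text) and text[pos] == "(":  # descendant_list
            pos += 1
            node.children.append(subtree())
            skip_ws()
            while pos < len(text) and text[pos] == ",":
                pos += 1
                node.children.append(subtree())
                skip_ws()
            expect(")")
        node.label = label()
        node.length = branch_length()
        return node

    root = subtree()
    expect(";")
    return root
```

Each nested function corresponds to one production, so a change to the grammar (for example, rejecting unlabeled leaves) should only touch the matching function.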

Notes:

Unquoted labels may not contain blanks, parentheses, square brackets, single quotes, colons, semicolons, or commas.

Underscore characters in unquoted labels are converted to blanks.

Single quote characters in a quoted label are represented by two single quotes.
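
When writing a tree, the three label rules above can be combined into one decision about whether a name needs quoting. The sketch below is one possible approach, assuming names contain only printable characters and spaces; `format_label` is a hypothetical helper name. It quotes any name containing an underscore, because an unquoted underscore would be read back as a blank.

```python
FORBIDDEN = set("()[]':;,")  # not allowed in unquoted labels


def format_label(name: str) -> str:
    """Return a Newick spelling of `name` (illustrative helper)."""
    needs_quotes = (
        "_" in name  # a bare underscore would be read back as a blank
        or any(ch in FORBIDDEN for ch in name)
    )
    if needs_quotes:
        # single quotes inside a quoted label are doubled
        return "'" + name.replace("'", "''") + "'"
    # blanks may be written as underscores in an unquoted label
    return name.replace(" ", "_")


print(format_label("E. coli"))   # unquoted, blank written as underscore
print(format_label("Newick's"))  # quoted, embedded quote doubled
```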

Blanks or tabs may appear anywhere except within unquoted labels or branch_lengths.

Newlines may appear anywhere except within labels or branch_lengths.

Comments are enclosed in square brackets and may appear anywhere newlines are permitted.

Other notes:

PAUP (David Swofford) allows nesting of comments. My software supports this as well.

TreeAlign (Jotun Hein) writes a root node branch length (with a value of 0.0). Most other software (including my own) appears to do the same.

PHYLIP (Joseph Felsenstein) requires that an unrooted tree begin with a trifurcation; it will not "uproot" a rooted tree.

Example of rooted tree:

   (((One:0.2,Two:0.3):0.3,(Three:0.5,Four:0.3):0.2):0.3,Five:0.7):0.0;

           +-+ One
        +--+
        |  +--+ Two
     +--+
     |  | +----+ Three
     |  +-+
     |    +--+ Four
     +
     +------+ Five
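
To relate the example string to a data structure, the following sketch serializes a nested-tuple tree back into the text form. The tuple layout `(label, branch_length, children)` and the names `to_newick` and `leaf` are assumptions made for illustration; labels are assumed to need no quoting.

```python
from typing import Optional, Sequence, Tuple

# (label, branch_length, children): an illustrative layout, not part of the format
Tree = Tuple[Optional[str], Optional[float], Sequence["Tree"]]


def to_newick(node: Tree) -> str:
    """Serialize a node and its descendants, without the final semicolon."""
    label, length, children = node
    text = ""
    if children:
        text += "(" + ",".join(to_newick(child) for child in children) + ")"
    if label:
        text += label  # assumed to need no quoting here
    if length is not None:
        text += ":" + repr(float(length))
    return text


def leaf(name: str, length: float) -> Tree:
    return (name, length, ())


example: Tree = (None, 0.0, (
    (None, 0.3, (
        (None, 0.3, (leaf("One", 0.2), leaf("Two", 0.3))),
        (None, 0.2, (leaf("Three", 0.5), leaf("Four", 0.3))),
    )),
    leaf("Five", 0.7),
))

print(to_newick(example) + ";")
```

With the tuples above, the printed line is the rooted-tree example string, including the root branch length of 0.0.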

Addendum (October 4, 1991):

At the 1991 Woods Hole Marine Biological Laboratory Molecular Evolution Course, the following special comments were defined (by Joseph Felsenstein, David Maddison, Gary Olsen and David Swofford):
      [&rooted]
      [&unrooted]
One of these two comments may precede a tree to define whether it is meant to be read as a rooted or unrooted tree. The default treatment, when neither of these comments is present, may be context and/or application specific.
      [&&ApplicationID: Application_specific_comments ]
This form permits users of the Newick 8:45 format to tag comments that are meant to be machine readable by specific programs. There is no registration of IDs, though it is expected that users of this convention will choose sufficiently descriptive IDs that coincidental conflicts are unlikely.

Other forms of comments beginning with "[&" are reserved to the "Standard".
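
A program reading these special comments might sort them into the categories described above. The sketch below is an assumption-laden illustration: `classify_comment` is a hypothetical name, the keywords are matched exactly as spelled in this document (lowercase), and a "&&" comment without an ID and colon is treated as reserved.

```python
from typing import Optional, Tuple


def classify_comment(comment: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Classify a comment given with its enclosing square brackets.

    Returns (kind, application_id, payload).
    """
    if not (comment.startswith("[") and comment.endswith("]")):
        raise ValueError("comment must be enclosed in square brackets")
    body = comment[1:-1]
    if body == "&rooted":
        return ("rooted", None, None)
    if body == "&unrooted":
        return ("unrooted", None, None)
    if body.startswith("&&"):
        app_id, sep, payload = body[2:].partition(":")
        if sep and app_id.strip():
            return ("application", app_id.strip(), payload.strip())
        return ("reserved", None, None)  # malformed "&&" form (assumption)
    if body.startswith("&"):
        return ("reserved", None, None)  # other "[&" forms are reserved
    return ("plain", None, None)


# "MyApp" is a made-up application ID used only for illustration.
print(classify_comment("[&&MyApp: color=red ]"))
```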

It was also decided that names embedded within single quotes can contain any printable character and the space character. If a name is quoted, this must be done in its entirety. All compliant programs must be able to handle names of at least eight characters.

Addendum (January 20, 1994):

In response to discussions with David Swofford, quotation marks in comments will have no special meaning. Thus,
      [Newick's 8:45 Tree Standard]
is a legal comment. On the other hand,
      [('B. subtilis':0.1, 'E. coli rrnB]':0.2):0.3]
is not legal because the right square bracket inside the quotation marks ends the comment, leaving the rest of the text outside it. Because comments can be nested, the following would be a legal comment:
      [('B. subtilis':0.1, 'E. coli [rrnB]':0.2):0.3]
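
The difference between the two examples can be checked with a simple bracket-depth counter. This is a sketch under the rules stated here (quotation marks have no special meaning inside comments, and comments may nest); `comment_end` is a hypothetical function name.

```python
def comment_end(text: str, start: int = 0) -> int:
    """Return the index just past the comment that opens at text[start]."""
    if text[start] != "[":
        raise ValueError("no comment starts here")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "[":
            depth += 1  # nested comment opens
        elif text[i] == "]":
            depth -= 1
            if depth == 0:
                return i + 1  # outermost comment closed
    raise ValueError("unterminated comment")


bad = "[('B. subtilis':0.1, 'E. coli rrnB]':0.2):0.3]"
good = "[('B. subtilis':0.1, 'E. coli [rrnB]':0.2):0.3]"

print(comment_end(good) == len(good))  # the comment spans the whole string
print(bad[comment_end(bad):])          # text left over after the early close
```

For the illegal example, the comment closes at the bracket after rrnB, and the leftover text is what a reader would then try to interpret as tree data.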

Page written and maintained by Gary J. Olsen (gjoillinois.edu)