
BSCI 1511L Statistics Manual: Phylogenetic Trees

Introduction to Biological Sciences lab, second semester

Phylogenetic Trees

Inferred phylogenetic relationships can be described in purely mathematical terms, such as matrices.  However, the relationships are easiest for humans to understand when the phylogeny is represented in the form of a tree.  Organisms that have the most similar collection of characters are clustered together on the same branch of the tree.  A point that connects branches of the tree is called a node.  In many cases, the mathematical algorithm that generates the tree simply groups and connects the taxa without regard to the order in time in which the groups separated from each other evolutionarily.  Such a network is called an unrooted tree (Fig. 3).

 

Fig. 3. Example of unrooted phylogenetic tree containing taxa of insects. 
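As a small illustration of describing relationships with a matrix, the sketch below counts the character differences between each pair of taxa.  The taxon labels and the 0/1 character states are invented for illustration only and are not real observations; the function name difference_matrix is likewise hypothetical.

```python
# Hypothetical character matrix: each taxon has five binary characters
# (1 = character present, 0 = absent). Values are invented for illustration.
CHARACTERS = {
    "taxon_A": [1, 1, 0, 0, 1],
    "taxon_B": [1, 1, 0, 1, 1],
    "taxon_C": [0, 0, 1, 1, 0],
}


def difference_matrix(characters):
    """Count mismatched characters for every ordered pair of taxa."""
    names = sorted(characters)
    return {
        (a, b): sum(x != y for x, y in zip(characters[a], characters[b]))
        for a in names
        for b in names
    }


matrix = difference_matrix(CHARACTERS)

# Taxa with the fewest differences are the ones a tree-building
# algorithm would tend to cluster on the same branch.
for (a, b), count in sorted(matrix.items()):
    if a < b:
        print(f"{a} vs {b}: {count} differences")
```

In this made-up data set, taxon_A and taxon_B differ in the fewest characters, so they would be expected to cluster together.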

 

In Fig. 3, five of the taxa are types of beetles.  The sixth group, Neuroptera, is an order of insects that are not beetles.  If one assumes that this non-beetle group diverged evolutionarily from the ancestors of all beetle taxa before the beetle groups themselves diversified, then the non-beetle group (which represents an "outgroup") can be used to "root" the tree.  Imagine that the lines connecting the nodes are like rubber bands that could be stretched and bent in any direction.  If one pulled on the line from Neuroptera so that it faced the left side of the page and pulled all of the other taxa toward the right side of the page, one would get a rooted tree like Fig. 4.

 

Fig. 4. Example of phylogenetic tree rooted by an outgroup (Neuroptera).
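The "rubber band" rooting described above can be sketched in a few lines of Python.  This is only a minimal sketch: the unrooted tree is stored as an undirected adjacency list, the internal node labels n1 to n4 are hypothetical, and the topology is an assumption reconstructed from how the text describes the figures, since the figures themselves are not reproduced here.

```python
# Unrooted tree as an undirected adjacency list.
# n1..n4 are hypothetical labels for internal nodes; the topology is
# assumed from the text's description of the figures.
UNROOTED = {
    "Neuroptera": ["n2"],
    "Carabidae": ["n1"],
    "Cicindelidae": ["n1"],
    "Scarabaeidae": ["n3"],
    "Cantharidae": ["n4"],
    "Lampyridae": ["n4"],
    "n1": ["Carabidae", "Cicindelidae", "n2"],
    "n2": ["n1", "Neuroptera", "n3"],
    "n3": ["n2", "Scarabaeidae", "n4"],
    "n4": ["n3", "Cantharidae", "Lampyridae"],
}


def root_with_outgroup(adjacency, outgroup):
    """Return a nested-tuple tree rooted on the branch leading to `outgroup`."""
    (neighbor,) = adjacency[outgroup]  # a leaf has exactly one neighbor

    def subtree(node, parent):
        # Walk away from the root; everything except `parent` is a child.
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node  # a leaf is represented by its name
        return tuple(subtree(child, node) for child in children)

    # The root sits on the outgroup's branch: outgroup on one side,
    # all remaining taxa on the other.
    return (outgroup, subtree(neighbor, outgroup))


ROOTED = root_with_outgroup(UNROOTED, "Neuroptera")
print(ROOTED)
```

Each nested tuple corresponds to a node, and its elements are the branches to the right of that node.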

 

We can now place additional meaning on the structure of the rooted tree beyond simply implying which groups are most similar to each other.  In a rooted tree, the direction from left to right is assumed to represent a relative time scale.  Along any one lineage, a node farther to the right is assumed to represent a later divergence than a node to its left.  For example, in Fig. 4, one assumes that Scarabaeidae diverged from the common ancestor of Cantharidae and Lampyridae at a time before Cantharidae and Lampyridae themselves diverged.  However, one cannot necessarily infer the sequence of events in separate branches (e.g., whether Carabidae and Cicindelidae diverged before the divergence of Cantharidae and Lampyridae or not).

 

A portion of the tree that contains all of the taxa to the right of a particular node is called a clade.  For example, Cantharidae, Lampyridae, and Scarabaeidae form a clade.  A named group of organisms that contains all of the organisms in a single clade, and no others, is called monophyletic.
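The definition of a clade can be checked mechanically.  The sketch below assumes the rooted topology that the text describes for Fig. 4, written as nested tuples; the function names leaf_set, clades, and is_monophyletic are chosen here for illustration.

```python
# Rooted tree as nested tuples; the topology is assumed from the text's
# description of Fig. 4, not taken from the figure itself.
TREE = (
    "Neuroptera",
    (
        ("Carabidae", "Cicindelidae"),
        ("Scarabaeidae", ("Cantharidae", "Lampyridae")),
    ),
)


def leaf_set(node):
    """All taxa to the right of (descended from) this node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """One leaf set per internal node of the tree."""
    if isinstance(node, str):
        return set()
    result = {leaf_set(node)}
    for child in node:
        result |= clades(child)
    return result


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of taxa under some node."""
    return frozenset(group) in clades(tree)


print(is_monophyletic(TREE, {"Cantharidae", "Lampyridae", "Scarabaeidae"}))
print(is_monophyletic(TREE, {"Scarabaeidae", "Carabidae"}))
```

The first group matches a node in the assumed tree; the second does not, because the smallest clade containing both Scarabaeidae and Carabidae also contains the other three beetle taxa.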

 

It is possible to make the length of a branch proportional to the sequence difference between the nodes to the left and right of that particular branch.  If one assumes that sequences accumulate differences at a constant rate (called the assumption of a molecular clock), then these differences also represent the time between the divergences represented by the nodes.  If it is possible to calibrate the molecular clock using fossil or geological evidence, then one can place an actual linear time scale on the horizontal axis of the tree and infer when taxa diverged from their most recent common ancestor. 
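The arithmetic behind a calibrated molecular clock is simple division.  The sketch below assumes a strictly constant rate and uses invented numbers throughout; the function names and the calibration values are hypothetical, chosen only to show the calculation.

```python
def calibrate_rate(branch_length, elapsed_time_myr):
    """Rate of change along one branch, from a branch of known duration.

    branch_length: sequence differences per site along the branch
    elapsed_time_myr: duration of that branch in millions of years,
        e.g. estimated from fossil or geological evidence
    """
    return branch_length / elapsed_time_myr


def branch_duration(branch_length, rate):
    """Time represented by a branch, assuming the same constant rate."""
    return branch_length / rate


# Hypothetical calibration: a branch with 0.05 differences per site is
# known from fossil evidence to span 50 million years.
rate = calibrate_rate(branch_length=0.05, elapsed_time_myr=50.0)

# Apply the calibrated clock to another branch of length 0.02.
duration = branch_duration(branch_length=0.02, rate=rate)
print(f"rate: {rate} differences per site per million years")
print(f"estimated duration: {duration:.1f} million years")
```

If the rate is not actually constant across the tree, the estimated times will be off in proportion to the rate variation, which is why the clock assumption matters.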

 

An ideal phylogenetic tree is bifurcating, meaning that each branch always splits into exactly two branches.  This pattern is assumed based on the idea that taxa evolve by divergence from a common ancestor.  When two groups diverge enough to no longer be able to interbreed, they are considered separate taxa.  As a practical matter, most phylogenetic trees have sections showing three or more branches splitting at the same level on the tree.  This indicates that there are not enough data to tell whether one group split off from the common ancestor before the others.  Such multiply branched groups are referred to as "unresolved".  
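Whether a tree is fully bifurcating can be tested by looking for nodes with more than two branches.  The sketch below uses nested tuples for trees; the resolved topology is assumed from the text's description of Fig. 4, and the unresolved variant is a hypothetical example made up for contrast.

```python
def unresolved_nodes(node):
    """Return every node that splits into three or more branches."""
    if isinstance(node, str):
        return []  # a leaf has no branches to count
    found = [node] if len(node) > 2 else []
    for child in node:
        found.extend(unresolved_nodes(child))
    return found


def is_bifurcating(tree):
    """True if no node splits into more than two branches."""
    return not unresolved_nodes(tree)


# Fully resolved topology (assumed from the text's description of Fig. 4).
RESOLVED = (
    "Neuroptera",
    (
        ("Carabidae", "Cicindelidae"),
        ("Scarabaeidae", ("Cantharidae", "Lampyridae")),
    ),
)

# Hypothetical variant in which the data cannot tell which of three
# groups split off first, so they are drawn at the same level.
UNRESOLVED = (
    "Neuroptera",
    (
        ("Carabidae", "Cicindelidae"),
        ("Scarabaeidae", "Cantharidae", "Lampyridae"),
    ),
)

print(is_bifurcating(RESOLVED))
print(unresolved_nodes(UNRESOLVED))
```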

 

Phylogenetic trees are also constructed under the assumption that once two taxa diverge from their common ancestor, there is never gene flow from one branch of the tree to another.  However, it is clear that this kind of lateral gene transfer (also known as "horizontal gene transfer") can occur.  For example, two species may diverge because they have become geographically isolated, yet they may retain the ability to interbreed and form hybrids if they encounter each other at some point after their divergence.  In other cases, DNA may be transferred laterally by viruses or by phagotrophic organisms that ingest prokaryotes and inadvertently take up their genetic material.  Lateral gene transfer makes it risky to infer a phylogeny based on characters from a single gene.  It is better to create a consensus tree topology by examining trees built from several genes.
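One common way to summarize several gene trees is a majority-rule consensus, which keeps only the clades found in more than half of the trees.  The sketch below illustrates that idea; it is not necessarily the method intended in this manual.  The three gene trees are hypothetical: two follow the topology the text describes for Fig. 4, and the third is an invented conflicting tree of the sort lateral gene transfer might produce.

```python
from collections import Counter


def leaf_set(node):
    """All taxa descended from this node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """One leaf set per internal node of the tree."""
    if isinstance(node, str):
        return set()
    result = {leaf_set(node)}
    for child in node:
        result |= clades(child)
    return result


def majority_clades(trees):
    """Clades present in more than half of the input trees."""
    counts = Counter(clade for tree in trees for clade in clades(tree))
    return {clade for clade, n in counts.items() if n > len(trees) / 2}


# Hypothetical gene trees (nested tuples). GENE_A and GENE_B agree;
# GENE_C is an invented conflicting topology.
GENE_A = ("Neuroptera", (("Carabidae", "Cicindelidae"),
                         ("Scarabaeidae", ("Cantharidae", "Lampyridae"))))
GENE_B = GENE_A
GENE_C = ("Neuroptera", (("Scarabaeidae", "Carabidae"),
                         ("Cicindelidae", ("Cantharidae", "Lampyridae"))))

consensus = majority_clades([GENE_A, GENE_B, GENE_C])
for clade in sorted(consensus, key=lambda c: (len(c), sorted(c))):
    print(sorted(clade))
```

With these made-up inputs, the grouping of Scarabaeidae with Carabidae appears in only one of three trees and is dropped, while the clades supported by two or more genes are kept.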