Visual exploration of parameter influence on phylogenetic trees.

Visual Analytics for Biological Data

Martin Hess, Sebastian Bremm, Stephanie Weissgraeber, Kay Hamacher, Michael Goesele, Josef Wiemeyer, and Tatiana von Landesberger, Technische Universität Darmstadt

Evolutionary relationships between organisms are represented as phylogenetic trees inferred from multiple sequence alignments (MSAs). The proposed approach clusters trees created with many MSA parameterizations. It displays the chosen parameters’ impact on the phylogenetic trees. This view offers interactive parameter exploration and automatic identification of relevant parameters.

Evolutionary relationships among a set of organisms are typically represented as phylogenetic trees, in which a leaf node corresponds to a single organism and inner nodes represent the ancestry between them. Researchers infer these trees from similarities in DNA, RNA, or protein sequences by using a tree deduction method (for example, hierarchical clustering). They can calculate similarities between sequences by multiple sequence alignment (MSA) algorithms. The MSA algorithm, its parameter settings, the evolutionary model, and the tree-deduction method significantly affect the resulting trees.1 Owing to this wide variability, finding the optimal calculation settings and therefore identifying the true phylogenetic trees are difficult. Moreover, these choices are problem-specific (that is, applicable for only one dataset) and must be readdressed in every study.

To deal with this problem, researchers have developed automatic parameter selection approaches.2 These approaches also estimate the resulting trees’ accuracy, but users have no visual feedback regarding how the selected parameters influenced the trees. Another approach is to create trees for many different parameter settings, followed by interactive visual comparison of the trees and their parameter settings. This also enables domain experts to select the best representative tree. The main challenges with these approaches are the large number of compared trees and of calculation parameters and their combinations. Unfortunately, biologists lack tools for such analysis. Moreover, current visual tree-comparison tools don’t take into account the tree calculation parameters.

We’ve developed a visual analytics (VA) approach for analyzing how tree construction parameters affect the trees’ similarity for single datasets. By using hierarchical clustering with automatic cluster identification, we can reduce the dataset to a few representative trees. This facilitates efficient, reliable comparison of a very large dataset. We augment the visualization of a set of trees with information on algorithm settings and automatically identify significant parameters. Our approach enables exploration of multidimensional parameters and comparison of thousands of trees. This leads to a better understanding of the generated trees’ quality and reliability, which in turn enables more reliable selection of the true phylogenetic tree. We evaluated our approach’s usability on real datasets in close cooperation with molecular-evolution and biophysics experts.

March/April 2014

IEEE Computer Graphics and Applications
Published by the IEEE Computer Society
0272-1716/14/$31.00 © 2014 IEEE

Related Work in Visualization (sidebar)

Our research relates to two main areas of visualization and visual analytics (VA): visual analysis of model parameters and visual comparison of trees.

Visual Analysis of Model Parameters
When modeling evolutionary relationships between species, analysts must determine optimal models and their optimal parameter settings. Such optimization is often cumbersome because the large set of possible models and the many parameter choices require much human input. So, researchers are developing VA approaches to support interactive exploration and refinement of parameter settings as well as assessment of input–output correspondences in modeling.

One group of approaches interactively combines parameter setting and automatic calculation of the output with visualization in several modeling areas. Users can interactively choose the input parameters, examine the result, and refine it iteratively.1 However, these approaches rely on manually setting and refining the parameters. So, they don’t scale for large numbers of parameters.

Another group of approaches semiautomatically analyzes the parameter space and calculates outputs for several parameter settings with interactive visualization of the input–output correspondences.2,3 Users review the results and choose the best parameter settings. These approaches focus on multivariate or image input–output data. They currently don’t consider models with trees as outputs.

Visual Comparison of Trees
Approaches for visually comparing trees fall into two main groups: comparison of two trees and comparison of more than two trees. These approaches consider only the trees’ structure. They usually combine interactive visualization with algorithmic assessment of tree similarity.4

Approaches that compare two trees show the trees and details of their differences. For example, Danny Holten and Jarke van Wijk drew trees next to each other and linked the matching leaves.5 Tamara Munzner and her colleagues interactively highlighted similar subtrees using leaf set correspondence.6 All such approaches have trouble scaling with the number of trees. Analysis of n trees leads to (n × (n – 1))/2 pairs.

Regarding approaches for more than two trees, we focus here on those that don’t need tree ordering. The trees-of-trees technique shows tree similarity as a meta-tree created by successive union of the underlying trees while minimizing the total distance between tree nodes.7 David Hillis and his colleagues compared multiple trees in a scatterplot; they determined a tree’s position by multidimensional scaling.8 Although these approaches allow analysis of tree similarity, they don’t reveal details about common structures and differences. Sebastian Bremm and his colleagues combined algorithmic assessment of local and global similarity among several trees and offered interactive similarity exploration in multiple linked views.9 However, their approach was limited to a small set of trees (up to 100).

References (sidebar)
1. S. Ingram et al., “DimStiller: Workflows for Dimensional Analysis and Reduction,” Proc. 2010 IEEE Symp. Visual Analytics Science and Technology (VAST 10), 2010, pp. 3–10.
2. A.J. Pretorius et al., “Visualization of Parameter Space for Image Analysis,” IEEE Trans. Visualization and Computer Graphics, vol. 17, no. 12, 2011, pp. 2402–2411.
3. T. Torsney-Weir et al., “Tuner: Principled Parameter Finding for Image Segmentation Algorithms Using Visual Response Surface Exploration,” IEEE Trans. Visualization and Computer Graphics, vol. 17, no. 12, 2011, pp. 1892–1901.
4. M. Graham and J. Kennedy, “A Survey of Multiple Tree Visualisation,” Information Visualization, vol. 9, no. 4, 2009, pp. 235–252.
5. D. Holten and J. van Wijk, “Visual Comparison of Hierarchically Organized Data,” Computer Graphics Forum, vol. 27, no. 3, 2008, pp. 759–766.
6. T. Munzner et al., “TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context with Guaranteed Visibility,” ACM Trans. Graphics, vol. 22, no. 3, 2003, pp. 453–462.
7. T. Nye, “Trees of Trees: An Approach to Comparing Multiple Alternative Phylogenies,” Systematic Biology, vol. 57, no. 5, 2008, p. 785.
8. D. Hillis, T. Heath, and K. John, “Analysis and Visualization of Tree Space,” Systematic Biology, vol. 54, no. 3, 2005, p. 471.
9. S. Bremm et al., “Interactive Visual Comparison of Multiple Trees,” Proc. 2011 IEEE Conf. Visual Analytics Science and Technology (VAST 11), 2011, pp. 31–40.

Our Approach
Combining tree comparison with tree construction parameters is a novel concept (see the sidebar). Our approach first groups similar trees to gain a visually comparable data size. We then include construction parameters in the visualization for exploring interdependency between parameter similarity and tree similarity.

Our approach has three steps (see Figure 1). First, it calculates the trees’ global pairwise similarity and hierarchically clusters them into a binary tree called a superhierarchy (see Figure 1b). Then, it automatically identifies groups of similar trees (clusters) in the superhierarchy using cluster quality criteria (see Figure 1c). Finally, users can visually explore the resulting tree groups (clusters) and tree construction parameters (see Figure 1d).

Figure 1. Analyzing parameter influence on phylogenetic trees. (a) A set of input trees. (b) A superhierarchy representing similarities in the data. (c) Identification of clusters (the colored boxes). (d) Parameter exploration and detailed comparison of clusters using representatives.

Figure 2. Cluster identification and thresholds. (a) Cluster identification based on a threshold d of 0.8. The orange elements represent the currently analyzed nodes and their corresponding values. (b) The number of clusters related to several thresholds. The highest increase in the number of clusters occurs with d = 0.8 and 0.9 (the red boxes). We select the smaller threshold for better clutter reduction.

Superhierarchy Construction
The superhierarchy was inspired by Tom Nye’s approach.3 The superhierarchy’s leaf nodes correspond to the input trees; the inner nodes represent groups of similar trees. The average similarity of the trees grouped by an inner node is represented by the edge to its child nodes. The shorter the edge, the more similar the grouped trees.

The superhierarchy calculation is based on the pairwise similarity matrix of the input trees. We express these similarities by the element-based similarity measure that Sebastian Bremm and his colleagues introduced.4 It discriminates structural differences between trees better than other methods (for example, a leaf-based score5).

We construct the superhierarchy using agglomerative hierarchical clustering with the average-linkage criterion. We start with the leaf nodes (input trees). We then incrementally connect the two most similar nodes N_x and N_y into a new node N_{x,y}. The length of the edge between N_{x,y} and its child nodes N_x and N_y is l = 1 − similarity(N_x, N_y). We repeat this step until all nodes are connected into a single rooted tree.
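The agglomerative construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_superhierarchy`, the dictionary-based node layout, and the `frozenset`-keyed similarity table are all hypothetical, and the toy similarity values are made up. Average linkage is taken here to mean the mean similarity over all cross-cluster tree pairs.

```python
from itertools import combinations


def build_superhierarchy(names, sim):
    """Average-linkage agglomerative clustering over pairwise tree similarities.

    names: labels of the input trees (the leaves of the superhierarchy).
    sim:   dict mapping frozenset({a, b}) to a similarity in [0, 1]
           (hypothetical input format chosen for this sketch).
    Returns the root of the superhierarchy as a nested dict.
    """
    nodes = [{"members": (n,), "children": []} for n in names]

    def avg_sim(x, y):
        # Average linkage: mean similarity over all cross-cluster tree pairs.
        pairs = [(a, b) for a in x["members"] for b in y["members"]]
        return sum(sim[frozenset(p)] for p in pairs) / len(pairs)

    while len(nodes) > 1:
        # Pick the two most similar nodes N_x and N_y ...
        x, y = max(combinations(nodes, 2), key=lambda p: avg_sim(*p))
        s = avg_sim(x, y)
        # ... and connect them into a new node N_{x,y} with edge length 1 - s.
        merged = {
            "members": x["members"] + y["members"],
            "children": [x, y],
            "similarity": s,
            "edge_length": 1.0 - s,
        }
        nodes = [n for n in nodes if n is not x and n is not y] + [merged]
    return nodes[0]


# Toy similarities for four trees (made-up values, for illustration only).
sim = {
    frozenset(("T1", "T2")): 0.9,
    frozenset(("T3", "T4")): 0.8,
    frozenset(("T1", "T3")): 0.3,
    frozenset(("T1", "T4")): 0.3,
    frozenset(("T2", "T3")): 0.3,
    frozenset(("T2", "T4")): 0.3,
}
root = build_superhierarchy(["T1", "T2", "T3", "T4"], sim)
print([child["members"] for child in root["children"]], root["edge_length"])
```

In this toy run, T1 and T2 are merged first, then T3 and T4, and the root joins the two groups with a long edge because the groups are dissimilar. The quadratic search for the best pair is kept for readability; a production implementation would likely use an optimized clustering library instead.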

Identification of Tree Clusters
The clusters we identify serve multiple purposes. First, we can minimize visual clutter by representing groups of trees as single nodes, which reduces the number of nodes shown in the superhierarchy. Moreover, the clusters let us compare similarities of groups of trees and their construction parameters in detail (as we describe later).


Figure 3. Analysis of the 16S ribosomal RNA dataset with the ClustalW multiple sequence alignment (MSA) algorithm. (a) Comparison of all parameters. The clustering-method parameter (green) significantly influenced the trees in the clusters, whereas the endgaps (yellow) and gap distance penalty (dark blue) parameters had nearly no impact. (b) Comparison of the cluster representatives. The tree on the upper left (in the red box) is the comparison reference. Dark-blue nodes indicate highly conserved subtrees; magenta indicates a subtree’s node distributions in the other trees.

Each subtree rooted in a node in the superhierarchy represents a potential cluster, and its leaf nodes form the cluster elements. By construction (using the average-linkage criterion), subtrees rooted in nodes on the superhierarchy’s higher levels represent clusters with lower average similarity than those on lower levels. We exploit this property to find the real clusters by performing a breadth-first search (BFS) over all the superhierarchy nodes, beginning from the root node and employing a user-specified similarity threshold d. If the average similarity associated with the edges between the currently visited node and its child nodes is greater than d, or the currently visited node is a leaf, the node forms a cluster. This method ensures cluster compactness (high tree similarity in a cluster). We remove all nodes in a cluster from the BFS order.

Figure 2a shows a cluster identification for d = 0.8. The similarity value on the edges from the root node (in orange in step 1) to its children is 0.7, which is smaller than 0.8. So, the root node forms no cluster, and we proceed with the next node in BFS order (the orange node in step 2). This node fulfills the threshold criterion; the values on the edges to its children (0.9) are greater than d. So, the node forms a cluster as shown in step 3 (the purple box). The remaining noncluster node in BFS order is the orange node in step 3. This node is a leaf node and forms a cluster (see step 4). Because all nodes have been visited or are part of a cluster, cluster identification ends.

To help users select an appropriate d, we implement automatic cluster selection analysis inspired by Stan Salvador and Philip Chan’s research.6 We calculate the number of resulting clusters for a set of thresholds uniformly sampled from the interval 0.0 to 1.0 by a step of 0.05, which users can adjust on demand. Then, we analyze the difference in the number of clusters between succeeding thresholds and select the threshold with the highest cluster increase (see Figure 2b). If several thresholds have the same maximum increase, we select the smallest threshold. This ensures a large reduction in the number of nodes shown in the superhierarchy. This criterion can still lead to a large number of clusters, so users can interactively adjust the threshold.
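The BFS-based cluster identification and the threshold recommendation can be sketched as follows. This is an illustrative sketch, not the authors' code: the `Node` class and function names are hypothetical, each inner node is assumed to store the average similarity of the trees it groups (the values shown on the edges in Figure 2a), and the cluster increase between two succeeding thresholds is attributed to the larger of the two, which the article does not specify.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    similarity: float = 1.0  # average similarity of the trees grouped by this node
    children: list = field(default_factory=list)


def leaves(node):
    """Names of all input trees below a superhierarchy node."""
    if not node.children:
        return [node.name]
    return [leaf for child in node.children for leaf in leaves(child)]


def identify_clusters(root, d):
    """Breadth-first search; a node forms a cluster if it is a leaf or
    its similarity value exceeds the threshold d."""
    clusters, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        if not node.children or node.similarity > d:
            clusters.append(leaves(node))  # the subtree is not visited further
        else:
            queue.extend(node.children)
    return clusters


def recommend_threshold(root, step=0.05):
    """Smallest threshold at which the number of clusters rises the most."""
    n_steps = round(1.0 / step)
    thresholds = [round(i * step, 10) for i in range(n_steps + 1)]
    counts = [len(identify_clusters(root, d)) for d in thresholds]
    increases = [b - a for a, b in zip(counts, counts[1:])]
    first_max = increases.index(max(increases))  # ties resolve to the smallest
    return thresholds[first_max + 1]


# Toy superhierarchy shaped like Figure 2a: root (0.7), one inner node (0.9).
inner = Node("N12", 0.9, [Node("T1"), Node("T2")])
root = Node("root", 0.7, [inner, Node("T3")])
clusters = identify_clusters(root, 0.8)
recommended = recommend_threshold(root)
print(clusters, recommended)
```

With d = 0.8 the root is skipped, the inner node becomes a cluster of two trees, and the remaining leaf forms a cluster of its own, mirroring the walkthrough of Figure 2a.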

Visual Data Exploration
As we mentioned before, this step facilitates visual exploration of trees and their construction parameters. Users can also explore clusters in detail.

Visual parameter analysis. Our approach enables
■ direct visual comparison of the parameter values,
■ analysis of their impact on the clusters, and
■ identification of highly characteristic or highly uncharacteristic parameters for the clusters.

Characteristic parameters are those with a narrow range of values. For each parameter, we show its value distribution for all trees in a cluster. We append the corresponding histograms below the visual node representation in the superhierarchy (see Figure 3a). As Figure 4 shows, the first parameter (green) is highly characteristic for both clusters and can serve to discriminate the corresponding trees. In contrast, parameter 3 (yellow) shows nearly the same value distribution in both clusters. This indicates that this parameter has no impact on the tree structures in the two clusters. This parameter can be ignored for future tree construction on this dataset.


Figure 4. Analysis of (a) all parameters, (b) a single parameter, (c) characteristic parameters, and (d) similar value distributions.

Interactive parameter analysis. We enhance the visualization of parameters with their automatic analysis and highlighting of the most interesting parameters. This is especially useful for a large number of parameters or clusters. Our approach offers three parameter exploration methods. First, for better comparison of parameter values among clusters, users can focus on one parameter. This parameter is highlighted; other parameters are de-emphasized (see Figure 4b). Second, our approach automatically calculates a parameter’s importance for a cluster by its value distribution entropy. By setting a threshold, users can hide histograms with low information content (uncharacteristic parameters) and thus focus on high-impact parameters (see Figure 4c). Finally, our approach allows interactive identification of clusters with similar parameter values as the user selects a specific parameter (see Figure 4d).

Visual exploration of clusters. Because the screen space is limited, we denote each cluster by one representative tree, which massively reduces the items in the comparison. The representative tree has the highest average similarity to all other trees in the cluster. We then show structural similarities and differences among these representatives, together with their similarity information (that is, the hierarchic-clustering result) (see Figure 3b). This builds on Bremm and his colleagues’ approach.4 Users can choose a representative tree (a cluster of interest) to show its similarity with other cluster representatives on the subtree level. We represent the similarity score values by different colors picked from a user-selectable color map. The selected reference tree’s nodes are colored according to the average similarity to all other clusters. This enables identification of highly conserved structures in the whole dataset. The nonreference trees’ node colors indicate their similarity to the reference tree, which enables identification of conserved structures with respect to only the reference. Users can interact with the trees by clicking on a subtree, which shows the subtree similarity in all other trees by highlighting the corresponding nodes.
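Two of the computations described above, the entropy-based parameter importance and the choice of a representative tree, lend themselves to a short sketch. The following is an illustration under assumptions: the article does not say which entropy it uses, so Shannon entropy in bits is assumed; a low entropy is read as a narrow, characteristic value distribution; and all function names, the cutoff value, and the toy data are hypothetical.

```python
import math
from collections import Counter


def value_entropy(values):
    """Shannon entropy (in bits) of a parameter's value distribution in a cluster."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def characteristic_parameters(cluster_params, max_entropy):
    """Parameters whose value distribution in the cluster is narrow enough."""
    return [name for name, values in cluster_params.items()
            if value_entropy(values) <= max_entropy]


def representative(cluster, sim):
    """Tree with the highest average similarity to all other trees in the cluster."""
    if len(cluster) == 1:
        return cluster[0]

    def avg_similarity(tree):
        others = [other for other in cluster if other != tree]
        return sum(sim[frozenset((tree, other))] for other in others) / len(others)

    return max(cluster, key=avg_similarity)


# Toy cluster of four trees: one parameter is constant, one is evenly split.
cluster_params = {
    "clustering_method": ["NJ", "NJ", "NJ", "NJ"],
    "endgaps": ["yes", "no", "yes", "no"],
}
shown = characteristic_parameters(cluster_params, max_entropy=0.5)

# Made-up pairwise similarities for a three-tree cluster.
sim = {
    frozenset(("T1", "T2")): 0.9,
    frozenset(("T1", "T3")): 0.8,
    frozenset(("T2", "T3")): 0.5,
}
rep = representative(["T1", "T2", "T3"], sim)
print(shown, rep)
```

Here the constant parameter has zero entropy and stays visible, whereas the evenly split parameter has an entropy of one bit and is hidden by the cutoff. T1 is chosen as representative because its average similarity to the other two trees is the highest.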

Application
To evaluate our approach, we used two biological-sequence datasets. The first comprised the 16S ribosomal RNA (rRNA) in bacteria. We chose it because 16S rRNA is the prototypical molecule for phylogenetic analysis for a broad range of organisms, if not for all. We used the 16S rRNA sequences from Bremm and his colleagues’ research.4

The second dataset comprised the protein sequences of HCN ion channels, which are fundamental in the workings of heart cells. It let us employ two MSA algorithms, which allowed a cross-sectional comparison of them. To select the dataset sequences, we submitted the query sequence gi355749904 to BLAST (Basic Local Alignment Search Tool) for related proteins. First, we accepted all sequences with an E-value (Expect value) smaller than 0.00001. Furthermore, we filtered for those hits with the annotations “hyperpolarization” and “cyclic.” On the basis of the judgment of a biologist with domain knowledge, we deleted 13 sequences deemed evolutionarily unrelated to HCN channels. We obtained 211 valid HCN sequences.
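The two automatic filtering steps for the BLAST hits can be expressed as a simple predicate. This sketch only illustrates the selection rule described above; the record layout (`id`, `evalue`, `annotation`), the function name, and the example hits are hypothetical and do not reflect actual BLAST output or the authors' scripts. The subsequent manual curation by a domain expert is not modeled.

```python
def select_hits(hits, max_evalue=1e-5, required=("hyperpolarization", "cyclic")):
    """Keep hits below the E-value cutoff whose annotation contains every keyword."""
    return [
        hit for hit in hits
        if hit["evalue"] < max_evalue
        and all(word in hit["annotation"].lower() for word in required)
    ]


# Made-up hit records, for illustration only.
hits = [
    {"id": "seqA", "evalue": 1e-40,
     "annotation": "Hyperpolarization-activated cyclic nucleotide-gated channel"},
    {"id": "seqB", "evalue": 1e-30,
     "annotation": "Cyclic nucleotide-gated channel"},
    {"id": "seqC", "evalue": 1e-3,
     "annotation": "Hyperpolarization-activated cyclic nucleotide-gated channel"},
]
kept = [hit["id"] for hit in select_hits(hits)]
print(kept)
```

Only the first record passes both tests: the second lacks the keyword “hyperpolarization,” and the third exceeds the E-value cutoff.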


Table 1. The three scenarios in our analysis.*

Dataset and multiple-sequence-alignment algorithm | 16S rRNA (ribosomal RNA) with ClustalW | HCN with ClustalW | HCN with Muscle
Tree construction | PhyML (default parameters) | 1. Phylip protdist. 2. Phylip neighbor with order randomization ⇒ 100 trees. 3. Phylip consense to reduce the 100 trees to 1 consensus tree. | 1. Phylip protdist. 2. Phylip neighbor with order randomization ⇒ 100 trees. 3. Phylip consense to reduce the 100 trees to 1 consensus tree.
No. of trees | 1,344 | 2,520 | 2,520
Parameters | | |
Clustering method | NJ, UPGMA | NJ, UPGMA | NJ, UPGMA
Weight matrix | clustalw, IUB | BLOSUM, Gonnet, PAM | clustalw
Distance measure 1 | N/A | N/A | kbit20-3, kmer20-3, kmer6-6
Distance measure 2 | N/A | N/A | pctidkimura, pctidlog
Endgaps | Yes, no | Yes, no | N/A
Score | N/A | N/A | LE, SP
Rooting method | N/A | N/A | midlongspan, minavgleafdist, pseudo
Gap open penalty | 1, 2, 5, 10, 25, 50, 100 | 1, 2, 5, 10, 25, 50, 100 | 1, 2, 5, 10, 25, 50, 100
Gap extension penalty | 0.05, 0.1, 0.5, 1, 2.5, 5, 7.5, 10 | 0.05, 0.5, 1, 5, 10 | 0.05, 0.5, 1, 5, 10
Gap distance penalty | 1, 5, 10 | 1, 5, 10 | N/A
Hydrophilic gaps | N/A | Yes, no | N/A

* HCN = hyperpolarization-activated cyclic nucleotide-gated, NJ = neighbor joining, UPGMA = Unweighted Pair Group Method with Arithmetic Mean, BLOSUM = Blocks Substitution Matrix, PAM = Point Accepted Mutation, LE = log-expectation profile score, and SP = sum-of-pairs protein profile score.
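As a consistency check on Table 1, the number of trees per scenario equals the size of the full Cartesian product of the listed parameter values. The article does not state this explicitly, so reading it as one tree per parameter combination is an assumption; the dictionary keys below are hypothetical names chosen for this sketch.

```python
from itertools import product
from math import prod

gap_open = [1, 2, 5, 10, 25, 50, 100]

grids = {
    "16S rRNA with ClustalW": {
        "clustering_method": ["NJ", "UPGMA"],
        "weight_matrix": ["clustalw", "IUB"],
        "endgaps": ["yes", "no"],
        "gap_open_penalty": gap_open,
        "gap_extension_penalty": [0.05, 0.1, 0.5, 1, 2.5, 5, 7.5, 10],
        "gap_distance_penalty": [1, 5, 10],
    },
    "HCN with ClustalW": {
        "clustering_method": ["NJ", "UPGMA"],
        "weight_matrix": ["BLOSUM", "Gonnet", "PAM"],
        "endgaps": ["yes", "no"],
        "gap_open_penalty": gap_open,
        "gap_extension_penalty": [0.05, 0.5, 1, 5, 10],
        "gap_distance_penalty": [1, 5, 10],
        "hydrophilic_gaps": ["yes", "no"],
    },
    "HCN with Muscle": {
        "clustering_method": ["NJ", "UPGMA"],
        "weight_matrix": ["clustalw"],
        "distance_measure_1": ["kbit20-3", "kmer20-3", "kmer6-6"],
        "distance_measure_2": ["pctidkimura", "pctidlog"],
        "score": ["LE", "SP"],
        "rooting_method": ["midlongspan", "minavgleafdist", "pseudo"],
        "gap_open_penalty": gap_open,
        "gap_extension_penalty": [0.05, 0.5, 1, 5, 10],
    },
}

# Number of parameter combinations per scenario.
counts = {name: prod(len(values) for values in grid.values())
          for name, grid in grids.items()}

# Enumerate the concrete settings of one scenario as dictionaries.
grid = grids["16S rRNA with ClustalW"]
settings = [dict(zip(grid, combo)) for combo in product(*grid.values())]
print(counts, len(settings))
```

The products come out to 1,344, 2,520, and 2,520, matching the “No. of trees” row of Table 1.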

We applied our approach to three scenarios: 16S rRNA with the ClustalW MSA, HCN with ClustalW, and HCN with the Muscle MSA (see Table 1).

16S rRNA with ClustalW
Figure 3 shows the analysis of the 1,344 trees obtained for this scenario. Our tool revealed two interesting findings:
■ the uniform distribution of the endgaps and gap-distance-penalty parameters and
■ the pronounced, peaked distribution of the clustering-method parameter.

These findings’ interpretation is straightforward. Each node in Figure 3a represents a set of trees. In each set, we found no influence of the endgaps and gap-distance-penalty parameters. At the same time, the choice of clustering method (neighbor joining or Unweighted Pair Group Method with Arithmetic Mean) almost always showed absolute dominance in the respective tree subset. We thus conclude that the outcomes didn’t depend primarily on the chosen evolutionary model but on the clustering method. This conclusion points to a problem in such studies. We could trick tree construction into a different outcome by using a different organism, whereas we have no influence on the evolutionary assumptions, in strong contradiction to what we’d hope for.

HCN with ClustalW
Figure 5 shows this scenario’s analysis. The automatic identification of cluster characteristic parameters revealed that the clustering method (in green) was characteristic for all clusters. As Figure 5b shows, the same clustering method created the trees in nearly all the clusters. We also observed this phenomenon in the analysis of 16S rRNA. Again, the clustering method had a very high impact on the resulting trees.

As Figure 5c shows, the weight-matrix parameter (purple) was also characteristic for some clusters. The trees in the two clusters on the superhierarchy’s left were all created with the Gonnet weight matrix. In addition, these clusters depended on a gap open penalty of 100 (red). The other parameters (except for the clustering method) seem to have had no impact on these trees’ structure. The combination of the Gonnet matrix and a gap open penalty of 100 therefore led to a special tree structure and should be handled with care.

As Figure 5d shows, three parameters had nearly no impact on the tree structure: the gap distance penalty (orange), endgaps (yellow), and hydrophilic gaps (blue).


Figure 5. Analysis of the HCN dataset with ClustalW. (a) All parameters. (b) The clustering-method parameter. (c) The weight-matrix parameter. (d) The gap-distance-penalty parameter. The clustering method had a very high impact on the resulting trees.

Still, the clustering method had a spurious influence on our results. However, the chosen evolutionary model (as parameterized by the chosen weight matrix) was also crucial. The Gonnet matrix led to significant distinction and therefore must be judged as unique in the HCN dataset.

HCN with Muscle
Figure 6 shows this scenario’s analysis. The automatic identification of cluster characteristic parameters revealed that the distance measure 2 (yellow) and rooting (light blue) parameters had nearly no impact on the tree structures. Because the ClustalW analysis showed that the clustering method had a high impact on tree structure, we analyzed this parameter in detail for this scenario. As Figure 6b shows, the clustering method (green) only slightly affected the tree structures. As Figure 6c shows, the score parameter (red) had the highest impact. By using the interaction technique for identifying similar parameter distributions, we identified one big cluster that depended nearly exclusively on sum-of-pairs scoring (see Figure 6d). So, the tree construction led to distinct outcomes depending on whether we used the sum-of-pairs protein profile score (PAM200) or the log-expectation profile score (VTML240).

Comments
We conclude that tree construction depends largely, contrary to folk wisdom in the community, on algorithmic (clustering method) and statistical (scoring scheme) assumptions. It depends somewhat less on the detailed parameters of the underlying evolutionary model that forms the objective function to be optimized during tree construction. These results suggest that further research on improving parameters and models might be futile as long as the clustering schemes lead to contradicting results or trees.

Figure 6. Analysis of the HCN dataset with the Muscle MSA. (a) All parameters. (b) The clustering-method parameter. (c) The score parameter. (d) Detailed analysis of the score parameter. The score parameter had the highest impact on tree structure.

Discussion
Instead of hierarchical clustering with selection of a cluster threshold, we could use other similarity-based clustering methods such as k-means. This would affect the resulting visualization but not our approach’s underlying idea and main results. We could also enhance our simple but powerful threshold recommendation criteria with additional cluster quality measures. Currently, our approach displays a limited number of clusters and parameters. We alleviate this problem by a user-defined threshold allowing for a small number of clusters and by automatic identification of relevant parameters.

Our approach has enabled domain experts to discover hidden and unexpected information about specific parameters’ influence on trees. This information has led to a better understanding of the trees’ quality and to a more reliable selection of the most likely tree for each scenario. We’d like to extend our approach to address the limitations we mentioned before. We also plan to evaluate our approach on many more datasets to emphasize its usefulness. Moreover, we’d like to support deeper analysis of tree construction, allowing for better understanding of the sequence alignment and tree construction algorithms. For example, we could include evolutionary graphs instead of trees.

Acknowledgments
Kay Hamacher gratefully acknowledges Deutsche Forschungsgemeinschaft’s financial support under grant HA 5261/3-1. Martin Hess, Kay Hamacher, Michael Goesele, and Josef Wiemeyer are grateful for a Technische Universität Darmstadt Forum for Interdisciplinary Research grant supporting this study. The Deutsche Forschungsgemeinschaft Schwerpunktprogramm 1335 project Visual Analytics Methods for Modeling in Medical Imaging partly supported this research.

References
1. K. Hamacher, “Protein Domain Phylogenies—Information Theory and Evolutionary Dynamics,” Proc. 1st Int’l Conf. Bioinformatics, 2010, pp. 114–122.
2. J. Kececioglu and D. DeBlasio, “Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment,” J. Computational Biology, vol. 20, no. 4, 2013, pp. 259–279.
3. T. Nye, “Trees of Trees: An Approach to Comparing Multiple Alternative Phylogenies,” Systematic Biology, vol. 57, no. 5, 2008, p. 785.
4. S. Bremm et al., “Interactive Visual Comparison of Multiple Trees,” Proc. 2011 IEEE Conf. Visual Analytics Science and Technology (VAST 11), 2011, pp. 31–40.
5. T. Munzner et al., “TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context with Guaranteed Visibility,” ACM Trans. Graphics, vol. 22, no. 3, 2003, pp. 453–462.
6. S. Salvador and P. Chan, “Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms,” Proc. 16th IEEE Int’l Conf. Tools with Artificial Intelligence (ICTAI 04), 2004, pp. 576–584.

Martin Hess is a full-time researcher and PhD student in Technische Universität Darmstadt’s Graphics, Capture, and Massively Parallel Computing Group and Computational Biology and Simulation Group. His research interests include visual analytics of biological data, computer graphics, bioinformatics, and serious games. Hess received an MS in computer science and an MS in visual computing from TU Darmstadt. Contact him at [email protected]. tu-darmstadt.de.


Sebastian Bremm is a postdoctoral fellow in Technische Universität Darmstadt’s Interactive Graphics Systems Group. His research focuses on visual analytics, combining interactive visualization with data mining for exploratory analysis of large datasets. Bremm received a PhD in computer science from TU Darmstadt. Contact him at sebastian. [email protected].

Stephanie Weissgraeber is a PhD student in Technische Universität Darmstadt’s Computational Biology and Simulation Group. She studies membrane proteins using computer models and information theory. Weissgraeber received a Diploma in biology from TU Darmstadt. Contact her at [email protected].

Kay Hamacher is a professor of computational biology and simulation at Technische Universität Darmstadt. His research interests include general simulation methods, computational biophysics, information theory, and the mathematical modeling of general complex systems. Hamacher received a PhD in theoretical condensed-matter physics from Dortmund University. Contact him at hamacher@bio.tu-darmstadt.de.

Michael Goesele is a professor of computer science at Technische Universität Darmstadt. His research interests include computer graphics, computer vision, and massively parallel computing. Goesele received a doctorate in computer science from Saarland University and the Max Planck Institute for Computer Science. Contact him at michael.goesele@gris.informatik.tu-darmstadt.de.

Josef Wiemeyer is a professor of sports science at Technische Universität Darmstadt. His research interests include biomechanics, movement perception and presentation, computer science in sports, and serious games. Wiemeyer received a habilitation in sports science from Münster University. Contact him at [email protected].

Tatiana von Landesberger is the head of Technische Universität Darmstadt’s Visual Search and Analysis Group. Her main research interests are visual analytics of complex data and interactive visualization and exploration of large datasets in application areas such as biology and medicine. Von Landesberger received a PhD in computer science from TU Darmstadt. Contact her at tatiana.von.landesberger@gris.tu-darmstadt.de.

