Task-based assessment of visualization tools for the comparison of biological taxonomies

Lilliana Sancho-Chavarria; Fabian Beck; Daniel Weiskopf; Erick Mata-Montero

doi:10.3897/rio.4.e25742

Research Ideas and Outcomes : Research Article

Lilliana Sancho-Chavarria^‡, Fabian Beck^§, Daniel Weiskopf^|, Erick Mata-Montero^‡

‡ School of Computing, Costa Rica Institute of Technology, Cartago, Costa Rica

§ Institute for Computer Science and Business Information Systems, University of Duisburg-Essen, Essen, Germany

| VISUS, University of Stuttgart, Stuttgart, Germany

Corresponding authors: Lilliana Sancho-Chavarria (lsancho@itcr.ac.cr), Fabian Beck (fabian.beck@wiwinf.uni-due.de), Daniel Weiskopf (daniel.weiskopf@visus.uni-stuttgart.de), Erick Mata-Montero (emata@itcr.ac.cr)

Received: 12 Apr 2018 | Published: 12 Apr 2018

This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Citation: Sancho-Chavarria L, Beck F, Weiskopf D, Mata-Montero E (2018) Task-based assessment of visualization tools for the comparison of biological taxonomies. Research Ideas and Outcomes 4: e25742. https://doi.org/10.3897/rio.4.e25742

Abstract

Maintenance and curation of large-sized biological taxonomies are complex and laborious activities. Information visualization systems use interactive visual interfaces to facilitate analytical reasoning on complex information. Several approaches such as treemaps, indented lists, cone trees, radial trees, and many others have been used to visualize and analyze a single taxonomy. In addition, methods such as edge drawing, animation, and matrix representations have been used for comparing trees. Visualizing similarities and differences between two or more large taxonomies is harder than the visualization of a single taxonomy. On one hand, less space is available on the screen to display each tree; on the other hand, differences should be highlighted. The comparison of two alternative taxonomies and the analysis of a taxonomy as it evolves over time provide fundamental information to taxonomists and global initiatives that promote standardization and integration of taxonomic databases to better document biodiversity and support its conservation. In this work we assess how ten user visualization tasks for the curation of biological taxonomies are supported by several visualization tools. Tasks include the identification of conditions such as congruent taxa, splits, merges, and new species added to a taxonomy. We consider tools that have gone beyond the prototype stage, that have been described in peer-reviewed publications, or are in current use. We conclude with the identification of challenges for future development of taxonomy comparison tools.

Keywords

Information visualization, biodiversity informatics, taxonomy, taxonomy comparison tools.

1. Introduction

Biological taxonomies are hierarchical structures that represent classifications of living organisms. Taxonomists, herbaria, natural history museums, and biodiversity initiatives worldwide classify biodiversity according to the literature and other sources of information available to them, and according to criteria that they recognize as valid. Consequently, it is not surprising that different classifications emerge and that there is disagreement in the scientific community about which classification is correct. To resolve these conflicts, taxonomists perform studies, called revisions, that can lead to further variants of the classifications. Taxonomists and global initiatives eventually need to reconcile this multiplicity in order to properly document biodiversity. Therefore, differences and similarities between such alternative taxonomies have to be identified. Since taxonomies can be large and the number of changes substantial, software tools that support this endeavor become indispensable.

In this article we analyze information visualization tools designed to support the comparison of biological taxonomies. We review the tools and contrast them with ten user visualization tasks that we characterized in previous work (Sancho-Chavarria et al. 2016). Section 2 presents a brief description of the reviewed tools and the list of ten user visualization tasks, which we use as software requirements for the visual comparison of taxonomic changes. Section 3 describes the methodology used to assess the tools. Section 4 presents the assessment of the tools. Finally, Section 5 discusses future challenges and presents conclusions.

2. Background

The comparison of alternative classifications has long been a research topic in information visualization (Graham and Kennedy 2010). In this work we are interested in assessing how tools for the comparison of biological taxonomies support previously characterized user tasks.

A hierarchy comparison tool is expected to receive at least two hierarchies as input and to facilitate the visualization of their similarities and differences. These similarities and differences can be indicated manually by experts, inferred by the software itself, or both. We consider that the process for the comparison and curation of taxonomies involves three components, as illustrated in Fig. 1. The purpose of the Inference component is to compute the differences and similarities between taxonomies. To do so, the software sometimes requires the taxonomic history of the species, that is, how a species has been classified through time. However, that history is not commonly available in databases, so expert-provided inferences are required in order to visualize the relationships between taxonomies. In that case, instead of the software carrying out the inference on its own, the relationships have to be indicated manually by experts. The Visualization component is responsible for the visual representation of the taxonomies and uses the information provided by the Inference component to present the differences and similarities. After users analyze the results of the comparison, it seems natural to think about an Edition component. The objective of this component is to allow users to change taxonomies as they deem necessary. Users can then iterate through a visualization/edition cycle until they reach a satisfactory result.

Figure 1.

Components of the process for the comparison of biological taxonomies.

We reviewed literature on tree visualization and comparison and identified information visualization tools for the comparison of biological taxonomies. Then we selected tools that have gone beyond the prototype stage and have been described in peer-reviewed publications or that are in current use. As a result, TreeJuxtaposer (Munzner et al. 2003), TaxVis (Graham and Kennedy 2007, Graham et al. 2008, Graham and Kennedy 2010), The Taxonomic Tree Tool (Lin and Wang 2013), and ProvenanceMatrix (Dang et al. 2015) were selected. Although these tools compare biological taxonomies, it is important to highlight that they do not share the same user requirements. Furthermore, in this assessment we have considered a list of ten user tasks that go beyond the original requirements of each of these tools. However, we believe these are the tools that most closely address those requirements. This comparison is critical for understanding how the reviewed tools satisfy the requirements and for obtaining an overview of approaches and missing functionality. The following paragraphs present a summary of the features of each tool.

2.1 The tools

TreeJuxtaposer was created for the visual comparison of large trees, especially phylogenies, although it could be applied to other domains (Munzner et al. 2003). TreeJuxtaposer's main goal is the automatic detection and visualization of structural differences among hierarchies. It uses a similarity measure that computes the best corresponding node for each internal node and each leaf node. The hierarchical structure is represented by a rectilinear layout similar to a dendrogram, and classifications are placed side by side, fitting the size of the screen without the use of scrolling. It relies on color to visualize similarities and differences. Fig. 2 shows an example of the representation used by TreeJuxtaposer. It shows the structural differences between two taxonomies, emphasized through color-coded edges. TreeJuxtaposer implements an accordion-type distortion technique as the focus+context mechanism that supports the concept of guaranteed visibility of selected areas. Brushing and linking facilitate visual exploration: the selected node and the best corresponding nodes in all other trees are temporarily highlighted. Users can also apply the linked navigation option in order to compare analogous areas in each tree, so that subtrees underneath the best corresponding nodes in other trees are resized and synchronized with the selected subtree.

Figure 2.

An abstract representation of TreeJuxtaposer's environment for taxonomy comparison.
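The idea of a best corresponding node can be illustrated with a small sketch. The text above does not specify the similarity measure, so the sketch assumes a Jaccard-style overlap between the sets of leaves beneath two nodes; the data layout (`LeafSets`) and the function names are hypothetical and are not taken from TreeJuxtaposer's code.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical layout: node name -> set of leaf names found beneath that node.
LeafSets = Dict[str, FrozenSet[str]]


def similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap of two leaf sets (an assumed measure for this sketch)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_corresponding_node(leaves: FrozenSet[str],
                            other: LeafSets) -> Tuple[Optional[str], float]:
    """Return the node of the other tree whose leaf set overlaps most."""
    best_node, best_score = None, 0.0
    for node, other_leaves in other.items():
        score = similarity(leaves, other_leaves)
        if score > best_score:
            best_node, best_score = node, score
    return best_node, best_score


t1 = {
    "Felidae": frozenset({"Panthera leo", "Panthera onca", "Felis catus"}),
    "Panthera": frozenset({"Panthera leo", "Panthera onca"}),
}
t2 = {
    "Felidae": frozenset({"Panthera leo", "Panthera onca", "Felis catus", "Lynx rufus"}),
    "Panthera": frozenset({"Panthera leo", "Panthera onca"}),
    "Felis": frozenset({"Felis catus"}),
}

family_match = best_corresponding_node(t1["Felidae"], t2)
genus_match = best_corresponding_node(t1["Panthera"], t2)
```

A score below 1 for the best match, as for the family node in this example, signals a structural difference beneath that node, which is the kind of difference a tool could then highlight.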

TaxVis (Graham and Kennedy 2007) is a tool designed to explore relationships between multiple taxonomic trees through concept relationships. A concept is defined as a unique combination of name, author, and date. TaxVis inputs are a set of hierarchies and a set of concept relations given by expert taxonomists (Graham et al. 2008). The main visualization requirements are: to track siblings and parents of a genus across hierarchies, to track children of a particular higher-level node across the hierarchies, and to compare the number of levels across hierarchies (Graham et al. 2000). Fig. 3 illustrates the multiple tree view that holds the taxonomies being compared; this is the main panel where users interact and identify relationships among taxa. A hierarchy is represented in a set-based visualization rather than in a node-link metaphor. Multiple taxonomies are placed from top to bottom within the panel, and each taxonomy is also displayed in a top-down fashion with lower rank taxa drawn underneath their parent taxa, in an icicle plot style. Each taxon is represented by a rectangular box. When users select a taxon, the corresponding concepts in the alternative classifications are highlighted with color and edges; for instance, by selecting a genus, users can visualize the corresponding species of that genus as well as the location of those species in the other classifications. The selected taxon is moved to the top of the displayed fragment of the classification, its descendant taxa are displayed underneath it, and they are highlighted within the other classifications. Concept relationships such as congruent (=), non-congruent (≠), contains (⊃), is contained in (⊂), and overlaps (∩) (Graham and Kennedy 2007) are represented by color-coded lines.
The tool includes other views, such as a list of all taxa names within the data sets, attributes of a selected taxon, a control panel to regulate various properties of the display and interaction, and a history of selected comparisons. Complementary to TaxVis is the Concept Relationship Editor (CRE) (Graham et al. 2008), a separate tool that allows the editing of concept relationships between two classifications.

Figure 3.

An abstract representation of TaxVis's environment for taxonomy comparison.

The Taxonomic Tree Tool (TTT) (Lin and Wang 2013) is a web-based application designed to compare and edit classifications. In TTT there are two types of users: general users and registered users. General users can access public information on the web site such as public classifications, public tree comparison cases, news, and statistics; registered users can additionally upload and edit their own classifications and compute tree comparisons. Tree comparisons can be visualized either through indented trees or through a node-link diagram in D3. In the indented tree layout, the comparison tree (CT) and the reference tree (RT) are placed side by side. Each tree has its own vertical scrollbar, and scrolling is not synchronized between trees. At first, the trees appear compressed, that is, only the root node of each tree is displayed and a plus sign indicates that the node can be expanded to the next tree level. For each node the visualization includes rank, taxa, and a set of glyphs that depict relationships between nodes in the two compared classifications. Fig. 4 illustrates the case of comparing two fictitious taxonomies T1 (CT) and T2 (RT). The legend in the figure explains the meaning of the glyphs. The ancestor relation indicates whether the two compared taxa have the same ancestor route: 1 means same route and 0 stands for different routes. Nominal relations indicate how names are related. A green circle indicates that the name is exactly the same and corresponds to the same taxon in both trees. A blue circle indicates that the corresponding taxon is a synonym, and a red circle shows no relation between both nodes. As with the ancestor relation, the descendant relations show how closely related the descendants are; for instance, they reveal whether the descendants' branches are congruent (exactly the same), or whether a branch of one tree is included in or excluded from the other tree.
Finally, the multiple links glyph indicates that a taxon in the CT has more than one relation to taxa in the RT. For example, the family "Felidae" shows the same ancestor, nominal, and descendant relations in both hierarchies.

Figure 4.

An abstract representation of TTT's environment for taxonomy comparison.

In the node-link layout, differences are color-coded. Users can filter by view type and visualize either overlaps, differences or both.
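The ancestor relation described for TTT (1 for the same ancestor route, 0 for different routes) can be sketched as a comparison of root-to-node paths. This is only an illustration under assumptions: each tree is given as a hypothetical child-to-parent dictionary, and the function names are not part of TTT.

```python
from typing import Dict, List


def ancestor_route(parent: Dict[str, str], taxon: str) -> List[str]:
    """Ancestors of a taxon, ordered from the root down to its direct parent."""
    route = []
    node = taxon
    while node in parent:          # the root has no entry in the parent map
        node = parent[node]
        route.append(node)
    return route[::-1]


def ancestor_relation(ct_parent: Dict[str, str],
                      rt_parent: Dict[str, str],
                      taxon: str) -> int:
    """1 if the taxon has the same ancestor route in both trees, else 0."""
    same = ancestor_route(ct_parent, taxon) == ancestor_route(rt_parent, taxon)
    return 1 if same else 0


# Comparison tree (CT) and reference tree (RT) as child -> parent maps.
ct = {"Felidae": "Carnivora", "Panthera": "Felidae"}
rt = {"Felidae": "Carnivora", "Pantherinae": "Felidae", "Panthera": "Pantherinae"}

felidae_relation = ancestor_relation(ct, rt, "Felidae")
panthera_relation = ancestor_relation(ct, rt, "Panthera")
```

In this example the family keeps the same route in both trees, whereas the genus gains an intermediate rank in the reference tree and is therefore reported with a different route.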

ProvenanceMatrix (Dang et al. 2015) is a visualization tool for exploring and analyzing the outcomes of taxonomic alignments (merges) generated with the reasoning toolkit EULER/X (Chen et al. 2014). EULER/X takes as input two taxonomies and a set of expert assertions that relate concepts between taxonomies at the leaf level. Experts express assertions through region connection calculus (RCC-5) relations such as equals, includes, is included in, overlaps, or disjoint (Franz et al. 2008). The reasoning toolkit produces alternative outcomes that can be visualized with ProvenanceMatrix. As shown in Fig. 5, ProvenanceMatrix uses a matrix to represent the relationships that may exist between elements from two taxonomies. Taxonomies are displayed along the axes of the matrix. The taxonomy on the horizontal axis can be displayed using different orderings of the taxonomic concepts: depth-first, breadth-first, or similarity ordering. In a similarity-based order the concepts are ordered by similarity of their articulation sets. Hierarchical parent-child relationships within each tree are drawn through lines that are painted over the matrix cells, and indentation also helps to show hierarchical relations. The relationships between two taxonomic concepts are represented through glyphs within the cells of the matrix. Glyphs are color-coded circles that consist of five slices and resemble a pie chart. Each slice indicates a relationship between concepts; for instance, a green slice indicates that the concepts are equal and a blue slice indicates that one concept is included in the other. Two concepts might hold several relationships, therefore a cell can contain several colored slices. ProvenanceMatrix is highly interactive: users can select different orderings of the hierarchies, perform brushing and linking to visualize relationships, collapse/expand sub-hierarchies, and filter the matrix by articulation type.

Figure 5.

An abstract representation of ProvenanceMatrix's environment for taxonomy comparison.
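The five RCC-5 relations named above can be made concrete with plain set operations. The sketch below assumes that each concept is represented by the non-empty set of members it covers. This is a simplification for illustration, because in EULER/X the relations are asserted by experts and not computed from member sets, and the function name `rcc5` is hypothetical.

```python
from typing import Set


def rcc5(a: Set[str], b: Set[str]) -> str:
    """Name the RCC-5 relation that holds from concept a to concept b."""
    if a == b:
        return "equals"
    if a > b:                      # a is a proper superset of b
        return "includes"
    if a < b:                      # a is a proper subset of b
        return "is included in"
    if a & b:                      # some shared members, neither contains the other
        return "overlaps"
    return "disjoint"


# Made-up member sets for two concepts taken from alternative taxonomies.
concept_t1 = {"s1", "s2", "s3"}
concept_t2 = {"s2", "s3"}

relation = rcc5(concept_t1, concept_t2)
```

Since exactly one of the five relations holds for any pair of non-empty sets, a cell with several colored slices reflects uncertainty about which relation applies rather than several relations holding at once in this set-based reading.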

2.2 The tasks

The expert-provided tasks for the visual comparison of taxonomic changes are organized in three categories, namely, pattern identification, query, and edit. Pattern identification tasks provide a means to recognize specific differences and similarities between alternative classifications; query tasks allow users to inspect more detailed information; and edit tasks let users make modifications to the classifications. We describe each task as follows.

Pattern identification tasks:

1. Identify congruence. Let T₁ and T₂ be alternative taxonomies. At the species level, congruence refers to equivalence of taxonomic concepts, and a concept is defined as the ordered triplet (scientific name, author, year). That is, species u in T₁ is said to be congruent to species v in T₂ when both species are identified by the same concept. At other higher level taxonomic ranks such as genus or family, a taxon p in T₁ is said to be congruent to taxon q in T₂ when both taxa have the same name.

2. Identify corrections. Differences between alternative taxonomies are due to revisions or to authors using different classification criteria. We consider the following four types of corrections: splits, merges, moves, and typos.

a) Identify splits: A split occurs when taxonomists divide one concept into two or more concepts. It is more likely that experts propose splits at lower level ranks, such as at the family, genus, or species level. For instance, concept u in taxonomy T₁ can be split into concepts v₁, v₂, ..., vₙ in taxonomy T₂, where either u = v₁ or u is a synonym of v₁, and v₂, ..., vₙ are new concepts. An inference algorithm for the identification of splits would need the taxonomic history of the involved species.

b) Identify merges: A merge occurs when taxonomists combine two or more concepts into one. It is more likely that experts propose merging concepts at lower level ranks, such as at the family, genus, or species level. For instance, concepts u₁, u₂, ..., uₙ in taxonomy T₁ can correspond to concept v in taxonomy T₂. At the species level, concepts u₁, u₂, ..., uₙ are registered as synonyms for concept v.

c) Identify moves: A concept can be identified as moved when it appears re-classified in another position of an alternative taxonomy; that is, a concept u in T₁ is re-classified as concept v in T₂ when parent(u) ≠ parent(v). An inference algorithm for the identification of moves would need the taxonomic history of the involved concepts.

d) Identify typos: A typo is a misspelling of a name.

3. Identify additions. Additions occur when new concepts are added to a taxonomy. In other words, concept v has been added to T₂ if v ∈ T₂ and v ∉ T₁.

4. Overview changes. This task presents an overview of corrections and additions as stated in items 2 and 3 above.

5. Summarize. This task refers to obtaining a numerical understanding of change between taxonomies T₁ and T₂, for example, with respect to the number of species, the number of split cases, the number of merge cases, and the number and percentage of species added.

Query tasks:

6. Find inconsistencies. Inconsistencies are due to circumstances that go beyond the different types of corrections described above and that refer to violations of rules (e.g., repeated names within one taxonomy or missing names in a newer version of a taxonomy).

7. Filter. This task refers to finding cases that satisfy certain conditions. Through filter criteria users can visualize selected pieces of information. For example, filtering by author or by date.

8. Retrieve details. The goal of this task is to retrieve the attributes of a particular concept. For example, retrieving the details of a concept with the name "Passiflora coriacea" will display data such as author, year, and its list of synonyms.

9. Focus. Navigate to an area of interest, in order to see the information in greater detail.

Edit task:

10. Edit. The goal of this task is to allow users to make changes in the classifications after analyzing the results of the comparison.
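Several of the pattern identification tasks above are defined precisely enough to be sketched in code. The following minimal sketch covers congruence (task 1), merges (task 2b), moves (task 2c), additions (task 3), and a numerical summary (task 5). It rests on assumptions that go beyond the task list: a taxonomy is modelled as a hypothetical dictionary from species concept to the name of its parent taxon, the synonym lists are supplied as input, a moved species is assumed to keep its concept triplet, and all names are made-up placeholders.

```python
from typing import Dict, List, NamedTuple, Set


class Concept(NamedTuple):
    """A taxonomic concept: the triplet (scientific name, author, year)."""
    name: str
    author: str
    year: int


# Hypothetical model: each species concept maps to the name of its parent taxon.
Taxonomy = Dict[Concept, str]


def congruent(t1: Taxonomy, t2: Taxonomy) -> Set[Concept]:
    """Task 1: concepts identified by the same triplet in both taxonomies."""
    return set(t1) & set(t2)


def additions(t1: Taxonomy, t2: Taxonomy) -> Set[Concept]:
    """Task 3: v is an addition if v is in T2 and not in T1."""
    return set(t2) - set(t1)


def moves(t1: Taxonomy, t2: Taxonomy) -> Set[Concept]:
    """Task 2c: shared concepts whose parent differs between T1 and T2."""
    return {c for c in congruent(t1, t2) if t1[c] != t2[c]}


def merges(t1: Taxonomy, t2: Taxonomy,
           synonyms: Dict[Concept, Set[str]]) -> Dict[Concept, List[Concept]]:
    """Task 2b: T2 concepts that absorb two or more T1 concepts.

    `synonyms` maps a T2 concept to the names registered as its synonyms.
    """
    found = {}
    for v in t2:
        sources = [u for u in t1 if u == v or u.name in synonyms.get(v, set())]
        if len(sources) >= 2:
            found[v] = sources
    return found


def summarize(t1: Taxonomy, t2: Taxonomy) -> Dict[str, float]:
    """Task 5: simple counts; the percentage is relative to the size of T2."""
    added = additions(t1, t2)
    return {
        "species_t1": len(t1),
        "species_t2": len(t2),
        "congruent": len(congruent(t1, t2)),
        "moved": len(moves(t1, t2)),
        "added": len(added),
        "added_pct": 100 * len(added) / len(t2) if t2 else 0.0,
    }


# Made-up placeholder concepts, not real species.
a = Concept("Aus bus", "Smith", 1900)
b = Concept("Aus cus", "Smith", 1901)
c = Concept("Bus dus", "Lee", 1950)
e = Concept("Cus eus", "Diaz", 2015)

t1 = {a: "Aus", b: "Aus", c: "Bus"}
t2 = {a: "Aus", c: "Aus", e: "Cus"}   # b merged into a, c moved, e added
synonyms = {a: {"Aus cus"}}

summary = summarize(t1, t2)
```

Splits are left out on purpose: as noted in task 2a, inferring them would require the taxonomic history of the involved species, which this simple model does not carry.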

3. Methodology

Our starting point has been a list of tasks for the curation of biological taxonomies (Sancho-Chavarria et al. 2016). This task characterization resulted from a two-stage systematization process that involved a literature review and interviews with experts. In the first stage, experts were interviewed and provided information from which we derived a draft list of tasks. In the second stage, the tasks were revised by the experts, and after a careful analysis we obtained the final list. These tasks were described in the previous section.

For this work we investigated tools for comparing biological taxonomies. As stated above, TreeJuxtaposer, TaxVis, The Taxonomic Tree Tool (TTT), and ProvenanceMatrix were selected considering that these are tools that go beyond the prototype stage, have been described in peer-reviewed publications, or are currently in use. Given that only TTT was available online, we contacted authors in order to confirm that we had suitable sources of information. Four out of the six contacted authors shared additional materials, such as links to the InfoVis 2003 Contest on Visualization and Pair Wise Comparison of Trees, user guides for the tools, and links to source code. Since some tools were rather old and their technology requirements were difficult to fulfill, publications, guides, and presentations were our main sources of information.

To systematize the analysis we use the rating criteria indicated in Fig. 6. An "explicit" rating is given when the tool provides an explicit mechanism to carry out the task; an "implicit" rating is given when users can accomplish the task through visual exploration and navigation; and the "not addressed" rating is given when the tool does not support the task.

Figure 6.

Rating criteria for the analysis of the evaluated tools.

We built tables that present the assessment of each tool with respect to the tasks. Afterwards, we contacted the authors again, shared the draft assessment of their tool, and asked them for feedback. Three out of six authors replied. The authors agreed with most of the assessment results; they commented on each evaluation and explained cases that they considered required more accuracy or detail. Lastly, we incorporated the authors' feedback and performed a final analysis and assessment.

4. Assessment

This section presents the results of contrasting the four reviewed tools and the list of ten user tasks. The assessment is organized by task category.

4.1 Pattern identification tasks

1. Identify congruence. As summarized in Fig. 7, most reviewed tools provide functionality to carry out this task. Brushing and linking, coloring, and the use of glyphs are common strategies for the identification of congruent relationships between alternative classifications. The identification of congruence is considered explicit in TaxVis, TTT, and ProvenanceMatrix. TaxVis relies on expert-provided information for the identification of congruence and users can visualize congruent relationships through exploration as well as brushing and linking. TTT calculates nominal relations that provide users with the information to infer congruence; nominal relations are represented through color-coded glyphs. In ProvenanceMatrix relations are given by experts and congruence is also identified by color-coded glyphs. TreeJuxtaposer does not handle attributes, so it does not consider concepts and does not have the mechanisms for the identification of congruence. Nonetheless, users can identify equal scientific names by browsing and doing visual exploration on the marked areas.

Figure 7.

Result of the assessment for the task identify congruence.

2. Identify corrections. Possible types of corrections are: splits, merges, moves, and typos.

a) Identify splits

Let us recall that according to the definition of split in the list of requirements, a split occurs in a classification by decision of human experts. In order to identify that a taxon has been split, it is required to know its taxonomic history; that is, whether in the past it belonged to a more general concept. From our literature review we observed that the identification of splits was not a distinct requirement for any of the assessed tools, therefore none of the tools address this task in an explicit way (see Fig. 8). However, since TaxVis includes concept relations given by experts, this tool can allow the identification of splits in an implicit way because taxonomists provide the relations that correspond to splits that occurred in the past. When selecting a taxon, concept relations are listed and are also drawn between the alternative classifications; thus, users can recognize splits by exploring the taxonomies and inspecting the provided taxon history. ProvenanceMatrix displays concept relations equals, includes, is included in, and overlaps, hence users might discover splits through a laborious exploration of these relations, although the tool does not address splits as such. In brief, we consider splits are implicit in TaxVis and not addressed in the other reviewed tools because the taxonomic history of taxa is not included in the data.

Figure 8.

Result of the assessment for the task identify splits.

b) Identify merges

Analogous to the split subtask, the identification of merges was not a distinct requirement for the reviewed tools. A merged taxon can be determined by tracing its synonyms or when the concept relations are directly indicated. Since concept relations are given by experts in TaxVis, merges can be identified implicitly through visual exploration of correspondences between taxa. Merges do not occur in evolutionary trees, thus they are not considered in TreeJuxtaposer. TTT distinguishes synonyms by means of nominal relations, but this is not enough to visually find merges. Merges in ProvenanceMatrix might be discovered by a cumbersome exploration of the concept relations equals, includes, is included in, and overlaps, but the tool does not address merges as such. A summary of the ratings is presented in Fig. 9.

Figure 9.

Result of the assessment for the task identify merges.

c) Identify moves

A taxon can be moved by a taxonomist to another position within the classification when, according to expert criteria, the taxon should be re-classified. Fig. 10 summarizes the assessment of the tools for this task. TreeJuxtaposer visualizes structural changes among taxonomies via color-coded edges, therefore a re-classified taxon can be implicitly recognized by visual exploration of the classifications. TaxVis also supports this task in an implicit way by means of the visual exploration of expert-introduced concept relations. The other tools do not address this task. TTT does not provide functionality for the identification of cases where species within the Reference Tree (RT) have been classified under a different taxon in the Comparison Tree (CT). ProvenanceMatrix displays the concept relations equals, includes, is included in, overlaps, and disjoint. Through laborious exploration of the data users might discover the re-classification of species; however, the tool does not address taxon moves as such.

Figure 10.

Result of the assessment for the task identify moves.

d) Identify typos

None of the tools contains functionality to support this task.

3. Identify additions

In one way or another all tools support this task. In TreeJuxtaposer, added nodes are marked in red, so the task is supported in an explicit way. In TaxVis, when a non-leaf node is selected, its corresponding descendants are highlighted within the alternative taxonomies; users can thus identify additions in an implicit way by visually identifying differences. In TTT, the red glyphs that represent no ancestor relation and the red glyphs that indicate no nominal relation imply that new nodes have been added; thus we consider that TTT supports this task in an explicit way. In ProvenanceMatrix, this task is also approached in an explicit way: non-congruent nodes indicated in red correspond to new nodes added to the taxonomy.

4. Overview changes

All tools visualize changes, but most of them do not exactly comply with the definition of the task overview changes in the list of requirements (see Fig. 11). TaxVis' visualization of change is closest to this definition because users can identify splits, merges, and additions with little effort in an implicit way. TreeJuxtaposer's main focus is the visualization of topological changes, for which it relies on color-coding; TTT visualizes several change conditions through color-coded and shape-coded glyphs; and ProvenanceMatrix uses color-coded glyphs. Thus, our assessment for these tools is not addressed, with the observation that they provide similar functionality.

Figure 11.

Result of the assessment for the task overview changes.

5. Summarize.

All tools present at least some basic numerical information in an explicit way. TreeJuxtaposer approaches this task through the find function and displays the number of named nodes within a selected taxonomy. TaxVis displays the number of subtaxa of a selected taxon and its percentage relationship with respect to a compared classification. TTT provides a wide range of statistics on taxa and on structure. Statistics on taxa are the numbers of orders, classes, families, genera, and species. Statistics on structure indicate the numbers of equal taxa, overlapped taxa, and unmatched taxa for each taxonomic rank. ProvenanceMatrix provides a bubble chart that visualizes the proportion of each type of articulation.

4.2 Query tasks

6. Find inconsistencies

None of the tools supports this task.

7. Filter

TaxVis and ProvenanceMatrix support this task in an explicit way, whereas TreeJuxtaposer and TTT do not support it at all. TaxVis has a filter relations menu that allows the visualization of congruent, contains, included, overlaps, and is not congruent with relations. TreeJuxtaposer does not handle attributes, therefore it cannot filter attributes. ProvenanceMatrix provides filtering through different types of articulations.

8. Retrieve details

All tools support this task in an explicit way. TreeJuxtaposer includes a search-by-name mechanism. TaxVis retrieves the information of a node after it is selected. TTT provides a search-by-scientific-name feature. ProvenanceMatrix retrieves the data through a mouse-over operation and also retrieves pictures from Wikipedia.

9. Focus

All tools except TTT feature focus mechanisms explicitly. TreeJuxtaposer achieves this task through features such as guaranteed visibility, accordion, lens, and mouse pointer. In TaxVis this task is performed by increasing the size of the selection. ProvenanceMatrix approaches this task through its collapse, expand, resize, and order-by features.

4.3 Edit task

10. Edit

The comparison through visualization provides users with an understanding of differences and similarities between taxonomies, after which editing becomes necessary. Only TTT provides functionality for this task, and it does so explicitly. Although TaxVis and ProvenanceMatrix do not address this task, both have complementary tools for this functionality: the Concept Relationship Editor (CRE) for TaxVis, and EULER/X, a multi-taxonomy alignment tool, for ProvenanceMatrix. The assessment is presented in Fig. 12.

Figure 12.

Result of the assessment for the task edit.

5. Conclusions

The assessment of the four biological taxonomy comparison tools reveals distinct approaches among tools, as well as different levels of support for the defined tasks. Considering approaches, TreeJuxtaposer uses dendrograms and focuses on visualizing structural differences; TaxVis uses an adjacent set-based layout and its main focus is on visualizing the species that correspond to a genus within the other classifications; TTT uses indented lists and concentrates on visualizing similarities and differences through ancestor, descendant, and nominal relations; and ProvenanceMatrix implements a matrix approach to visualize the correspondences between taxonomies after applying expert assertions to relate concepts. All tools take advantage of color to highlight similarities and differences.

Regarding the defined tasks, Fig. 13 presents a summary that contrasts tasks and tools. All tools allow users to retrieve the available details of a taxon (task 8), and all tools present some sort of numerical summary of the data (task 5). Identify congruence (task 1), identify additions (task 3), and focus (task 9) are supported by most tools. Note that identifying congruence also identifies its complement, non-congruence, and therefore exposes differences as well. Identify corrections (task 2) is either not addressed at all or only implicitly addressed by the tools: identify splits and identify merges are implicitly addressed by one tool, identify moves is implicitly addressed by two tools, and identify typos is not addressed at all. We noticed that the most common differences visualized by the tools correspond to added nodes and non-congruent nodes. All tools visualize changes, but most of them do not comply with the definition of the task overview changes (task 4). Regarding the task find inconsistencies (task 6), none of the tools incorporates functionality to address it. Filter (task 7) is provided by two tools. Edit (task 10) is explicitly addressed by one tool, whereas two other tools rely on complementary software to accomplish this task.

Figure 13.

Summary of the assessment, contrasting tasks and tools.

As mentioned before, the assessed tools respond to requirements established at the time they were conceived. However, given that they are hierarchy comparison tools, they correspond to several of the expert-provided tasks described above, although they also present several gaps that are indicated in Fig. 13. First, tasks for the identification of specific types of change, such as splits and merges, are supported (partially) only by TaxVis, in spite of the importance pointed out by the interviewed taxonomists; future work must therefore consider these tasks. Second, the task overview changes consists of an integral overview of the differences between the compared taxonomies, but given the lack of support for the specific types of change discussed above, we conclude that it is not supported by any of the tools. Third, it is surprising that neither identify typos nor find inconsistencies is supported by the assessed tools. A reason may be that these tasks belong more to data cleaning than to taxonomy comparison; nevertheless, they are included in our list of tasks since users need reliable data in order to accomplish taxonomic comparison and analysis. Fourth, the task filter is regularly encountered in information visualization tools (Graham and Kennedy 2010), yet it is absent in several of the assessed tools. Fifth, edit, a task that would allow users to modify a taxonomy after analyzing the results of the comparison, is included in only one of the tools and is handled by two other tools in a separate piece of software. Sixth, all tools present restrictions on the automatic identification of similarities and differences. Given that taxon names are not enough to establish differences and similarities, attributes (such as author's name, year or synonyms) are required for the identification of congruence, splits or merges, and to perform filtering and editing. Furthermore, most tools have to rely on expert-provided relationships between taxa, since databases usually lack data about the taxonomic history of concepts. This data limitation represents a challenge for future work.
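To make the role of these attributes concrete, the following minimal sketch classifies the taxa of two versions of a taxonomy as congruent, moved, added, merged, or split. It is an illustration only and does not reproduce the method of any assessed tool. The `Taxon` record, the `compare` function, and the `synonyms` field (which stands in for expert-provided relationships between taxa) are assumptions introduced here, and the taxon names, authors, and years are fictitious placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Taxon:
    """Hypothetical record: a name plus the attributes needed for comparison."""
    name: str
    author: str
    year: int
    parent: Optional[str] = None
    # Names in the other taxonomy that an expert relates to this taxon (assumed).
    synonyms: FrozenSet[str] = frozenset()


def compare(old: Dict[str, Taxon], new: Dict[str, Taxon]) -> Dict[str, List[str]]:
    """Classify taxa by comparing an older and a newer version of a taxonomy."""
    report = {k: [] for k in ("congruent", "moved", "added", "merged", "split")}
    derived = defaultdict(set)  # old name -> new names that draw on it

    for name, taxon in new.items():
        # Old taxa that this new taxon corresponds to, by name or by synonymy.
        sources = {n for n in {name} | set(taxon.synonyms) if n in old}
        for source in sources:
            derived[source].add(name)
        if not sources:
            report["added"].append(name)    # no counterpart in the old version
        elif len(sources) > 1:
            report["merged"].append(name)   # several old taxa map to one new taxon

    for name, targets in derived.items():
        if len(targets) > 1:
            report["split"].append(name)    # one old taxon maps to several new taxa

    for name in old.keys() & new.keys():
        if name in report["merged"] or name in report["split"]:
            continue
        a, b = old[name], new[name]
        if (a.author, a.year) != (b.author, b.year):
            continue  # same name, different concept: the name alone is not enough
        report["moved" if a.parent != b.parent else "congruent"].append(name)

    return {kind: sorted(names) for kind, names in report.items()}


# Fictitious example data.
OLD = {
    "Aus bus": Taxon("Aus bus", "Smith", 1900, "Aus"),
    "Aus cus": Taxon("Aus cus", "Smith", 1901, "Aus"),
    "Aus dus": Taxon("Aus dus", "Smith", 1902, "Aus"),
    "Aus eus": Taxon("Aus eus", "Smith", 1903, "Aus"),
    "Aus hus": Taxon("Aus hus", "Smith", 1904, "Aus"),
}
NEW = {
    "Aus bus": Taxon("Aus bus", "Smith", 1900, "Aus"),
    "Aus cus": Taxon("Aus cus", "Smith", 1901, "Aus", frozenset({"Aus dus"})),
    "Aus eus": Taxon("Aus eus", "Smith", 1903, "Aus"),
    "Aus fus": Taxon("Aus fus", "Lee", 2010, "Aus", frozenset({"Aus eus"})),
    "Aus gus": Taxon("Aus gus", "Lee", 2012, "Aus"),
    "Aus hus": Taxon("Aus hus", "Smith", 1904, "Bus"),
}

report = compare(OLD, NEW)
for kind, names in report.items():
    print(f"{kind}: {names}")
```

In this sketch, splits and merges are detectable only because the synonym relations are supplied with the data; comparing names alone would report "Aus dus" as simply missing and "Aus fus" as a plain addition, which illustrates why the data limitation discussed above matters.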

In summary, this work reveals a lack of support for certain tasks in the assessed tools, specifically identify splits, identify merges, identify moves, find inconsistencies, and edit. Given the importance of these tasks for taxonomic work, future work should investigate suitable visualization approaches to fill these gaps.

Acknowledgements

The authors would like to thank Nico Franz, Martin Graham, Jessie Kennedy, and Tamara Munzner for their feedback on available sources of information and on our draft assessment of the tools.

References

Chen M, Yu S, Franz N, Bowers S, Ludaescher B (2014) Euler/X: a toolkit for logic-based taxonomy integration. arXiv:1402.1992. URL: http://arxiv.org/abs/1402.1992

Dang T, Franz N, Ludäscher B, Forbes AG (2015) ProvenanceMatrix: A Visualization Tool for Multi-Taxonomy Alignments. CEUR Workshop Proceedings, vol. 1456, pp. 13-24.

Franz N, Peet R, Weakley A (2008) On the Use of Taxonomic Concepts in Support of Biodiversity Research and Taxonomy. Systematics Association Special Volumes, pp. 61-84. https://doi.org/10.1201/9781420008562.ch5

Graham M, Kennedy J, Hand C (2000) A Comparison of Set-Based and Graph-Based Visualisations of Overlapping Classification Hierarchies. Proceedings of the Working Conference on Advanced Visual Interfaces 2000, Palermo, Italy. ACM Press, pp. 41-50. https://doi.org/10.1145/345513.345243

Graham M, Kennedy J (2007) Visual exploration of alternative taxonomies through concepts. Ecological Informatics: 248-261. https://doi.org/10.1016/j.ecoinf.2007.07.004

Graham M, Craig P, Kennedy J (2008) Visualisation to Aid Biodiversity Studies through Accurate Taxonomic Reconciliation. In: Gray A, Jeffery K, Shao J (Eds) Proc. of British National Conference on Database Systems: Sharing Data, Information and Knowledge, Cardiff, United Kingdom. Lecture Notes in Computer Science, vol. 5071, pp. 280-291. https://doi.org/10.1007/978-3-540-70504-8_29

Graham M, Kennedy J (2010) A Survey of Multiple Tree Visualisation. Information Visualization: 235-252. https://doi.org/10.1057/ivs.2009.29

Lin C, Wang J (2013) Taxonomic Tree Tool. http://ttt.biodinfo.org/. Accessed on: 2017-8-30.

Munzner T, Guimbretière F, Tasiran S, Zhang L, Zhou Y (2003) TreeJuxtaposer: Scalable Tree Comparison using Focus+Context with Guaranteed Visibility. ACM Transactions on Graphics: 453-462. https://doi.org/10.1145/882262.882291

Sancho-Chavarria L, Beck F, Mata-Montero E, Weiskopf D (2016) Visual Comparison of Biological Taxonomies: A Task Characterization. Poster session presented at EuroVis 2016.
