Comparing Clustering Results

03/23/2023

New DFG-funded project “CoCo” on visual cluster analysis, joint research with the Luxembourg Institute of Science and Technology (LIST).

CoCo: Comparative and Collaborative Visual Analysis of Clustering and Co-clustering Ensembles

Funded by: Deutsche Forschungsgemeinschaft (DFG) in collaboration with Fonds National de la Recherche (FNR, Luxembourg)

Funding period: 2023–2026

Project partners: Fabian Beck (University of Bamberg) and Mohammad Ghoniem (LIST)

Abstract

Clustering automatically groups complex data items and creates abstractions based on data similarity. However, different criteria for computing entity similarity as well as different clustering algorithms are available, each producing a different grouping of the entities. Typically, efforts aim to minimize the variety in the resulting clustering ensembles; however, we see value and opportunities in this diversity. Only human experts can judge the quality of the results and choose among the multiple truths reflected in them. They can consider the specifics of the respective use case and assemble an overarching grouping based on various results. Collaboration among multiple experts may be advisable to bring in a broader range of expertise.

To support human experts, we propose novel methods to visualize and visually compare such ensembles of clustering results. Our methods go beyond state-of-the-art approaches for visual comparison by addressing more complex scenarios regarding the characteristics of cluster structures, their contextual embedding in other data, and collaborative analysis. We want to investigate not only simple partitions of items, but also overlapping groups and clusters embedded into contextual network structures. A specific focus is on co-clusters, which structure the data variables along with the entities and thereby build groupings that are easier to understand. For instance, distinct topics can be identified from a set of scientific documents and, together with the terms they contain, form co-clusters, which can be further contextualized by considering the underlying citation network.

The specific objectives of the project include exploring the design space for modeling such ensembles of clusters and then investigating means to visually encode them for effective and contextualized comparison. While making technical contributions in the form of new visualization techniques, we will also develop guidelines, task taxonomies, and systematic literature reviews. Application examples centered on the analysis of document collections and biological omics data ground the techniques in real-world tasks and allow us to work with different application experts. A complementary focus is facilitating the collaborative visual analysis of clustering ensembles, for instance, displayed on a wall-sized screen that supports multi-user interaction.