PK-Clustering: Integrating Prior Knowledge in Mixed-Initiative Social Network Clustering

Alexis Pister, Paolo Buono, Jean-Daniel Fekete, Catherine Plaisant and Paola Valdivia

Time and Place: Friday, 02.07., 14:20–14:40, Room 1
Session: Software Demos

Keywords: Social network analysis; Network visualization; Clustering; Mixed-initiative; Prior knowledge; User interface

We present a web application that implements PK-Clustering, a new approach that helps social scientists create meaningful clusters in social networks. Clustering algorithms can help social scientists perform high-level analysis of their data or test hypotheses about how people connect. Most current clustering systems provide no guidance in choosing the best algorithm or in evaluating the results. Furthermore, the Prior Knowledge (PK) the user has about the data is very often not considered in the process. PK-Clustering allows users to interactively build clusters based on their knowledge.

The PK-Clustering process is as follows. First, the user specifies their PK in the form of partial groups. Then, a dozen clustering algorithms are run on the graph data. The results are summarized and ranked by how well they match the prior knowledge. Finally, the user selects a set of algorithms to use in the consolidation phase.
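The ranking step described above can be illustrated with a minimal Python sketch. It is not the measure used by the application, which this abstract does not specify. The function names (`match_prior_knowledge`, `rank_algorithms`) and the scoring rule (the fraction of PK nodes that fall in the majority cluster of their group, with each cluster usable by only one PK group) are assumptions made for illustration.

```python
from collections import Counter


def match_prior_knowledge(prior_groups, labels):
    """Score how well one clustering matches the prior knowledge (hypothetical rule).

    prior_groups: dict mapping a group name to a set of node ids (partial groups).
    labels: dict mapping a node id to the cluster id given by one algorithm.
    Returns a value in [0, 1]; 1.0 means every partial group lies entirely
    inside its own distinct cluster.
    """
    matched = 0
    total = 0
    used_clusters = set()
    for members in prior_groups.values():
        total += len(members)
        counts = Counter(labels[n] for n in members if n in labels)
        if not counts:
            continue  # none of the group's nodes were clustered
        best_cluster, best_count = counts.most_common(1)[0]
        if best_cluster in used_clusters:
            continue  # two prior groups merged into one cluster: no credit
        used_clusters.add(best_cluster)
        matched += best_count
    return matched / total if total else 0.0


def rank_algorithms(prior_groups, results):
    """Return (algorithm name, score) pairs, best match first."""
    scores = {
        name: match_prior_knowledge(prior_groups, labels)
        for name, labels in results.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Under this assumed rule, a score of 1.0 corresponds to the situation where every partial group is a subset of a distinct, possibly larger, cluster.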

During the consolidation phase, the user validates the clusters that include the PK, as well as extra clusters created by the system. To support this, PK-Clustering shows the results of every selected algorithm in a grid that aligns clusters as columns and nodes as rows; a diamond colored by cluster marks each intersection. Next to this grid, the graph is shown using the PAOH representation. Users can interactively build a consolidated partition based on the consensus of the algorithms, the graph information and their knowledge of the data.
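The idea of consolidating from the consensus of the algorithms can be sketched as follows. This is an illustrative sketch only: in the application, consolidation is interactive and driven by the user. The function name `consensus_labels`, the `threshold` parameter, and the assumption that cluster labels have already been aligned across algorithms are all hypothetical.

```python
from collections import Counter


def consensus_labels(aligned_results, threshold=1.0):
    """Propose a label per node from the agreement of the selected algorithms.

    aligned_results: dict mapping an algorithm name to a dict of
        node id -> cluster label, with labels assumed to be already aligned
        so that the same label means the same cluster in every algorithm.
    threshold: minimum fraction of algorithms that must agree on a label.
    Returns a dict of node id -> (proposed label or None, agreement fraction).
    A None label means the node is left for the user to decide.
    """
    n_algorithms = len(aligned_results)
    nodes = set()
    for labels in aligned_results.values():
        nodes.update(labels)

    consensus = {}
    for node in sorted(nodes):
        votes = Counter(
            labels[node] for labels in aligned_results.values() if node in labels
        )
        label, count = votes.most_common(1)[0]
        agreement = count / n_algorithms
        consensus[node] = (label if agreement >= threshold else None, agreement)
    return consensus
```

With the default threshold of 1.0, only nodes on which all selected algorithms agree receive a proposed label; lowering the threshold accepts majority votes as well.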

Once a satisfactory partition is found, a summary table and a summary report are generated. The table provides the results of the selected algorithms for every person, along with the final consolidated labels and their validation information. The report contains statistics on the number of executed and selected algorithms and on the validation of results. This allows analysts to report their results in a transparent manner.
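As a rough illustration of what such a summary table might contain, the sketch below builds one row per person and serializes it as CSV. The column names, the yes/no encoding of the validation information, and the helper names `summary_table` and `to_csv` are assumptions; the actual export format of the application is not described here.

```python
import csv
import io


def summary_table(selected_results, consolidated, validated):
    """Build a table with one row per person (hypothetical layout).

    selected_results: dict of algorithm name -> dict of person -> cluster label.
    consolidated: dict of person -> final consolidated label.
    validated: set of persons whose label was validated by the user.
    """
    algorithms = sorted(selected_results)
    rows = [["person"] + algorithms + ["consolidated", "validated"]]
    for person in sorted(consolidated):
        row = [person]
        # One column per selected algorithm; empty if it did not label the person.
        row += [selected_results[a].get(person, "") for a in algorithms]
        row += [consolidated[person], "yes" if person in validated else "no"]
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the table as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```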

We gave PK-Clustering to a fellow historian who is studying a social network of merchants in the 17th century. She hypothesized that the network was divided into three groups. She entered 3 partial groups as PK, and 9 out of 13 algorithms produced a perfect match (the 3 partial groups were subsets of 3 larger groups). She selected those 9 algorithms and consolidated the 3 groups, mostly using the consensus of the algorithms and the shape of the graph. She then found and consolidated a fourth group by reviewing in more depth the results of one of the algorithms (ilouvain_time). The final validated partition consisted of 4 groups and was satisfactory to her.

PK-Clustering provides a systematic approach for building “a clustering that is supported by algorithms and validated, fully or partially, by social scientists according to their prior knowledge”. PK-Clustering is a web application integrated in the PAOHVis system, released under a BSD 3-Clause License. Details on how to use PK-Clustering can be found at