Data, code, and demos

Data

Career network data for PhDs in computer science
Description: This dataset comprises two anonymized networks from our study on post-PhD careers in computing. The first is a weighted, directed, temporal network that represents transitions between employers. The second is a bipartite graph connecting employees and employers.
Reference: Career Transitions and Trajectories: A Case Study in Computing. Tara Safavi, Maryam Davoodi, Danai Koutra. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2018 [PDF]

Code

CPGNN code
Description: CPGNN is a novel GNN framework that introduces a learnable compatibility matrix to model heterophily or homophily in graphs. This allows it to go beyond the homophily assumption of existing GNNs. Theoretically, it reduces to GCN for homophily graphs. CPGNN achieves state-of-the-art results in heterophily settings while maintaining performance in homophily settings, demonstrating effectiveness with less training data compared to previous works.
Reference: Graph Neural Networks with Heterophily. Jiong Zhu, Ryan Rossi, Anup Rao, Tung Mai, Nedim Lipka, Nesreen Ahmed, Danai Koutra. In AAAI, 2022
H2GCN code
Description: H2GCN is a graph neural network (GNN) designed to address the limitations of existing GNNs in semi-supervised node classification tasks under heterophily or low homophily conditions.
Reference: Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs. Jiong Zhu, Junchen Jin, Donald Loveland, Michael Schaub, Danai Koutra. In NeurIPS, 2020
GraphSSL code
Description: The research analyzes the effectiveness of self-supervised learning (SSL) methods, particularly contrastive learning (CL), on graph datasets with discrete and non-Euclidean nature. It investigates the data-centric properties of generic graph augmentations (GGAs) used in graph SSL, revealing their limitations and need for task-relevant augmentations. The study introduces a synthetic data generation process enabling control over task-relevant information and optimal augmentations, serving as a benchmark. Through analysis and evaluation, it demonstrates that GGAs do not induce task-relevant invariances on common datasets, leading to marginal gains over naive baselines. The synthetic benchmark helps identify limitations in advanced augmentation techniques, contextualizing the effects of data-centric properties on augmentation strategies for graph SSL.
Reference: Analyzing Data - Centric Properties for conservative Learning on Graphs. Puja Trivedi, Ekdeep Singh Lubana, Mark Heimann, Danai Koutra, Jay Thiagarajan. In Advances in Neural Information Processing Systems(NeurIPS), 2022
GCLAugmentation code
Description: Unsupervised graph representation learning is critical for applications with scarce labels. The study probes the quality of representations learned by popular graph contrastive learning (CL) frameworks using domain-agnostic graph augmentations (DAGAs). It finds that DAGAs can destroy task-relevant information and harm the ability to learn discriminative representations. On small benchmark datasets, the inductive bias of graph neural networks can significantly compensate for this weak discriminability. The work proposes sanity checks to assess the quality of learned representations and a strategy for designing task-aware augmentations amenable to graph CL, demonstrating its efficacy on large-scale, complex graph applications, improving accuracy by up to 20% in graph-based document classification.
Reference: Augumentations in Graph Conservative Learning: Current Methodological Flaws and Towards Better Practices. Puja Trivedi, Ekdeep Singh Lubana, Yujun Yan, Yaoqing Yang, Danai Koutra. In ACM The Web Conference(formerly WWW), 2022
TRG code
Description: Temporal graph embedding has been widely studied due to its superiority in tasks such as prediction and recommendation. Despite the advancement in algorithms and novel frameworks such as deep learning, there has been relatively little work on systematically studying the properties of temporal network models and their cornerstones, the graph time-series representations that are used in these approaches. This paper aims to fill this gap by introducing a general framework that extends an arbitrary existing static embedding approach to handle dynamic tasks, and conducting a systematic study of seven base static embedding methods and six temporal network models.
Reference: On Generalizing Static Node Embedding to Dynamic Settings. Di Jin, Ryan Rossi, Sungchul Kim, Danai Koutra. In The Fifteenth International Conference on Web Search and Data Mining (WSDM), October 2021 [PDF]
TRW code
Description: In-depth study on the scalable graph learning algorithm based on temporal random walk, which operates on dynamic input graphs and has attracted less attention in the architecture community compared to GCN. We propose high-performance CPU and GPU implementations of two important graph learning tasks, that cover a broad class of applications, using random walks on continuous-time dynamic graphs: link prediction and node classification. We show that the resulting workload exhibits distinct characteristics, measured in terms of irregularity, core and memory utilization, and cache hit rates, compared to graph traversals, deep learning, and GCN. We further conduct an in-depth performance analysis focused on both algorithm and hardware to guide future software optimization and architecture exploration. The algorithm-focused study presents a rich trade-off space between algorithmic performance and runtime complexity to identify optimization opportunities.
Reference: A Deep Dive Into Understanding The Random Walk-Based Temporal Graph Learning. Nishil Talati, Di Jin, Haojie Ye, Ajay Brahmakshatriya, Saman Amarasinghe, Trevor Mudge, Danai Koutra, Ronald Dreslinski. In The the 2021 IEEE International Symposium on Workload Characterization (IISWC), November 2021 [PDF]
RefiNA code
Description: Unsupervised method for refining initial alignments between the nodes of two graphs. RefiNA’s goal is to increase the matched neighborhood consistency of the alignment solution produced by any of the many existing methods for network alignment, thereby improving its quality.
Reference: Refining Network Alignment to Improve Matched Neighborhood Consistency. Mark Heimann, Xiyuan Chen, Fatemeh Vahedian, Danai Koutra. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), April 2021 [PDF]
PENminer code
Description: We propose the problem of mining activity that persists through time in continually evolving networks. We extend the notion of temporal motifs to capture activity among specific nodes, in what we call activity snippets, which are small sequences of edge-updates that reoccur. We propose axioms and properties that a measure of persistence should satisfy, and develop such a persistence measure. Then, we propose PENminer, an efficient framework for mining activity snippets’ Persistence in Evolving Networks, and design both offline and streaming algorithms.
Reference: Mining Persistent Activity in Continually Evolving Networks. Caleb Belth, Carol Xinyi Zheng, Danai Koutra. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), May 2020 [PDF]
KGist code
Description: We introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing.
Reference: What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization. Caleb Belth, Carol Xinyi Zheng, Jilles Vreeken, Danai Koutra. In The WEB Conference, February 2020 [PDF]
GLIMPSE code
Description: We propose personalized summaries of large encyclopedic knowledge graphs containing the facts most relevant to individual users’ interests.
Reference: Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket. Tara Safavi, Caleb Belth, Lukas Faber, Davide Mottin, Emmanuel Müller, Danai Koutra. In IEEE International Conference on Data Mining (ICDM), November 2019 [PDF]
RGM code
Description: Given any set of node embeddings for a graph, we propose a scalable, principled method for computing a feature descriptor for the entire graph that captures the distribution of its nodes’ embeddings in vector space.
Reference: Distribution of Node Embeddings as Multiresolution Features for Graphs. Mark Heimann, Tara Safavi, Danai Koutra. In IEEE International Conference on Data Mining (ICDM), November 2019 [PDF]
node2bits code
Description: We propose an efficient framework that represents multi-dimensional features of node contexts with binary hashcodes to handle the task of visitor stitching, i.e., identifying and matching various online references to the same user in real-world web services.
Reference: node2bits: Compact Time- and Attribute-aware Node Representations for User Stitching. Di Jin, Mark Heimann, Ryan Rossi, Danai Koutra. In Proceedings of the ECML/PKDD European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2019 [PDF]
MultiLENS code
Description: An inductive framework that derives representation independent of graph sizes while retaining the ability to compute node embeddings on the fly.
Reference: Latent Network Summarization: Bridging Network Embedding and Summarization. Di Jin, Ryan Rossi, Eunyee Koh, Sungchul Kim, Anup Rao, Danai Koutra. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2019 [PDF]
EMBER code
Description: A fast node embedding method that incorporates both graph directionality and edge weights. We show its application on inferring professional hierarchy of employees across companies.
Reference: Smart Roles: Inferring Professional Roles in Email Networks. Di Jin*, Mark Heimann*, Tara Safavi, Mengdi Wang, Wei Lee, Lindsay Snider, Danai Koutra. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2019 [PDF]
GroupINN code
Description: In this work, we have developed a graph neural network model that can provide interpretable results beyond fast and accurate graph classification.
Reference: GroupINN: Grouping-based Interpretable Neural Network for Classification of Limited, Noisy Brain Data. Yujun Yan, Jiong Zhu, Marlena Duda, Eric Solarz, Chandra Sripada, Danai Koutra. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2019 [PDF]
REGAL code
Description: We develop a scalable algorithm that uses learned structural node embeddings to match nodes across multiple graphs.
Reference: REGAL: Representation Learning-based Graph Alignment. Mark Heimann, Haoming Shen, Tara Safavi, Danai Koutra. In ACM Conference on Information and Knowledge Management (CIKM), October 2018 [PDF]
HashAlign code
Description: We propose a locality-sensitive hashing (LSH) framework for matching nodes across multiple undirected, weighted, and/or attributed graphs.
Reference: HashAlign: Hash-based Alignment of Multiple Graphs. Mark Heimann, Wei Lee, Shengjie Pan, Kuan-Yu Chen, Danai Koutra. In Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), April 2018 [PDF]
FlowR code
Description: We introduce a two-step divide-and-conquer approach to solving linear systems in settings where many queries need to be handled. Our parallelizable method, FlowR, uses a one-time message exchange between subproblems. We further speed up our proposed method by extending our formulation to carefully designed overlapping subproblems (FlowR-OV) and by leveraging the strengths of iterative methods (FlowR-Hyb).
Reference: Fast Flow-based Random Walk with Restart in a Multi-query Setting. Yujun Yan, Mark Heimann, Di Jin, Danai Koutra. In Proceedings of the 2018 SIAM International Conference on Data Mining (SDM), February 2018 [PDF]
EAGLE code
Description: We introduce EAGLE (Exploratory Analysis of Graphs with domain knowLEdge), a novel method that creates interpretable, feature-based, and domain-specific graph summaries in a fully automatic way.
Reference: Exploratory Analysis of Graph Data by Leveraging Domain Knowledge. Di Jin, Danai Koutra. In IEEE International Conference on Data Mining (ICDM), November 2017 [PDF]
ABC-LSH code
Description: We propose a fast network discovery approach from time series based on ABC, a new locality-sensitive hashing (LSH) family, which randomly selects and matches time series subsequences.
Reference: Scalable Hashing-based Network Discovery. Tara Safavi, Chandra Sripada, Danai Koutra. In IEEE International Conference on Data Mining (ICDM), November 2017 [PDF]
Perseus-Hub code
Description: Perseus-Hub is an interactive graph summarization and anomaly detection system designed to help practitioners understand their data.
Reference: PERSEUS-HUB: Interactive and Collective Exploration of Large-Scale Graphs. Di Jin, Aristotelis Leventidis, Haoming Shen, Ruowang Zhang, Junyue Wu, Danai Koutra. In Informatics, June 2017 [PDF]
VoG: Vocabulary-based summarization of Graphs code
Description: We propose VoG, which summarizes a graph via the subgraphs that describe it best. To do so, we use the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph.
Reference: VOG: Summarizing and Understanding Large Graphs. Danai Koutra, U Kang, Jilles Vreeken, Christos Faloutsos. In Proceedings of the 2014 SIAM International Conference on Data Mining (SDM), April 2014 [PDF]

Demos

ConDeNSe demo
Description: We propose ConDeNSe (Conditional Diversified Network Summarization), a Minimum Description Length-based method that summarizes a given graph with approximate ‘supergraphs’ conditioned on diverse, predefined structural patterns.
Reference: Reducing large graphs to small supergraphs: a unified approach. Yike Liu, Tara Safavi, Neil Shah, Danai Koutra. In Social Network Analysis and Mining (SNAM), February 2018 [PDF]