Data

  • Career network data for PhDs in computer science
    Description: This dataset comprises two anonymized networks from our study on post-PhD careers in computing. The first is a weighted, directed, temporal network that represents transitions between employers. The second is a bipartite graph connecting employees and employers.
    Reference: Career Transitions and Trajectories: A Case Study in Computing. Tara Safavi, Maryam Davoodi, Danai Koutra. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2018 [PDF]
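
As a rough illustration of the two network types described above, the sketch below builds a weighted, directed, temporal transition network and an employee-employer bipartite graph from a handful of employment records. The record layout, the identifiers, and the `build_networks` helper are all hypothetical; the dataset's actual file format is not described here.

```python
from collections import defaultdict

# Hypothetical toy records: (person_id, employer_id, start_year).
# These values are invented for illustration and are not taken from the dataset.
records = [
    ("p1", "univ_A", 2010),
    ("p1", "company_B", 2013),
    ("p2", "univ_A", 2011),
    ("p2", "company_B", 2014),
    ("p2", "company_C", 2016),
]


def build_networks(records):
    """Return (transitions, bipartite) built from employment records."""
    by_person = defaultdict(list)
    for person, employer, year in records:
        by_person[person].append((year, employer))

    # Weighted, directed, temporal network: (source, target, year) -> count.
    transitions = defaultdict(int)
    # Bipartite graph: person -> set of employers.
    bipartite = defaultdict(set)

    for person, jobs in by_person.items():
        jobs.sort()  # order each person's jobs chronologically
        for _, employer in jobs:
            bipartite[person].add(employer)
        # Each consecutive pair of jobs is one employer-to-employer transition,
        # timestamped with the year the new job started.
        for (_, src), (year, dst) in zip(jobs, jobs[1:]):
            transitions[(src, dst, year)] += 1

    return dict(transitions), dict(bipartite)


transitions, bipartite = build_networks(records)

# Collapse the temporal dimension to get a static edge weight.
static_weight = sum(
    count for (src, dst, _), count in transitions.items()
    if (src, dst) == ("univ_A", "company_B")
)
```

Keeping the year in the edge key preserves the temporal dimension; summing over years, as in `static_weight`, recovers an ordinary weighted directed graph.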

Code

  • GLIMPSE code
    Description: We propose personalized summaries of large encyclopedic knowledge graphs; each summary contains the facts most relevant to an individual user’s interests.
    Reference: Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket. Tara Safavi, Caleb Belth, Lukas Faber, Davide Mottin, Emmanuel Müller, Danai Koutra. In IEEE International Conference on Data Mining (ICDM), November 2019 [PDF]

  • RGM code
    Description: Given any set of node embeddings for a graph, we propose a scalable, principled method for computing a feature descriptor for the entire graph that captures the distribution of its nodes’ embeddings in vector space.
    Reference: Distribution of Node Embeddings as Multiresolution Features for Graphs. Mark Heimann, Tara Safavi, Danai Koutra. In IEEE International Conference on Data Mining (ICDM), November 2019 [PDF]

  • node2bits code
    Description: We propose an efficient framework that represents multi-dimensional features of node contexts with binary hashcodes to handle the task of user stitching, i.e., identifying and matching various online references to the same user in real-world web services.
    Reference: node2bits: Compact Time- and Attribute-aware Node Representations for User Stitching. Di Jin, Mark Heimann, Ryan Rossi, Danai Koutra. In Proceedings of the ECML/PKDD European Conference on Principles and Practice of Knowledge Discovery in Databases, September 2019 [PDF]

  • MultiLENS code
    Description: An inductive framework that derives representations independent of graph size while retaining the ability to compute node embeddings on the fly.
    Reference: Latent Network Summarization: Bridging Network Embedding and Summarization. Di Jin, Ryan Rossi, Eunyee Koh, Sungchul Kim, Anup Rao, Danai Koutra. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2019 [PDF]

  • EMBER code
    Description: A fast node embedding method that incorporates both graph directionality and edge weights. We apply it to inferring the professional hierarchy of employees across companies.
    Reference: Smart Roles: Inferring Professional Roles in Email Networks. Di Jin*, Mark Heimann*, Tara Safavi, Mengdi Wang, Wei Lee, Lindsay Snider, Danai Koutra. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2019 [PDF]

  • GroupINN code
    Description: We develop a graph neural network model that provides interpretable results in addition to fast and accurate graph classification.
    Reference: GroupINN: Grouping-based Interpretable Neural Network for Classification of Limited, Noisy Brain Data. Yujun Yan, Jiong Zhu, Marlena Duda, Eric Solarz, Chandra Sripada, Danai Koutra. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2019 [PDF]

  • REGAL code
    Description: We develop a scalable algorithm that uses learned structural node embeddings to match nodes across multiple graphs.
    Reference: REGAL: Representation Learning-based Graph Alignment. Mark Heimann, Haoming Shen, Tara Safavi, Danai Koutra. In ACM Conference on Information and Knowledge Management (CIKM), October 2018 [PDF]

  • HashAlign code
    Description: We propose a locality-sensitive hashing (LSH) framework for matching nodes across multiple undirected, weighted, and/or attributed graphs.
    Reference: HashAlign: Hash-based Alignment of Multiple Graphs. Mark Heimann, Wei Lee, Shengjie Pan, Kuan-Yu Chen, Danai Koutra. In Proceedings of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), April 2018 [PDF]

  • FlowR code
    Description: We introduce a two-step divide-and-conquer approach to solving linear systems in settings where many queries need to be handled. Our parallelizable method, FlowR, uses a one-time message exchange between subproblems. We further speed up our proposed method by extending our formulation to carefully designed overlapping subproblems (FlowR-OV) and by leveraging the strengths of iterative methods (FlowR-Hyb).
    Reference: Fast Flow-based Random Walk with Restart in a Multi-query Setting. Yujun Yan, Mark Heimann, Di Jin, Danai Koutra. In Proceedings of the 2018 SIAM International Conference on Data Mining (SDM), February 2018 [PDF]

  • EAGLE code
    Description: We introduce EAGLE (Exploratory Analysis of Graphs with domain knowLEdge), a novel method that creates interpretable, feature-based, and domain-specific graph summaries in a fully automatic way.
    Reference: Exploratory Analysis of Graph Data by Leveraging Domain Knowledge. Di Jin, Danai Koutra. In IEEE International Conference on Data Mining (ICDM), November 2017 [PDF]

  • ABC-LSH code
    Description: We propose a fast approach for discovering networks from time series. It is based on ABC, a new locality-sensitive hashing (LSH) family that randomly selects and matches time series subsequences.
    Reference: Scalable Hashing-based Network Discovery. Tara Safavi, Chandra Sripada, Danai Koutra. In IEEE International Conference on Data Mining (ICDM), November 2017 [PDF]

  • Perseus-Hub code
    Description: Perseus-Hub is an interactive graph summarization and anomaly detection system designed to help practitioners understand their data.
    Reference: PERSEUS-HUB: Interactive and Collective Exploration of Large-Scale Graphs. Di Jin, Aristotelis Leventidis, Haoming Shen, Ruowang Zhang, Junyue Wu, Danai Koutra. In Informatics, June 2017 [PDF]

  • VoG: Vocabulary-based summarization of Graphs code
    Description: We propose VoG, which summarizes a graph via the subgraphs that describe it best. To do so, we use the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph.
    Reference: VOG: Summarizing and Understanding Large Graphs. Danai Koutra, U Kang, Jilles Vreeken, Christos Faloutsos. In Proceedings of the 2014 SIAM International Conference on Data Mining (SDM), April 2014 [PDF]
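
The MDL inclusion rule in the VoG description (keep a subgraph only if it lowers the total description length) can be sketched as a simple greedy loop. This is a minimal illustration under a toy cost model, not VoG's actual encoding scheme: the `greedy_mdl_summary` function, the candidate names, and the flat `bits_per_edge` error cost are all assumptions made for this example.

```python
def greedy_mdl_summary(edges, candidates, bits_per_edge=8.0):
    """Greedily keep candidate subgraphs that reduce total description length.

    edges:      set of (u, v) edges of the graph.
    candidates: list of (name, covered_edges, model_bits) tuples, considered
                in the order given.

    Toy cost model (an assumption, not VoG's encoding):
        total = sum(model_bits of kept candidates)
                + bits_per_edge * (number of edges no kept candidate covers)
    """
    kept = []
    covered = set()
    # With an empty summary, every edge is paid for as unexplained error.
    total = bits_per_edge * len(edges)

    for name, cand_edges, model_bits in candidates:
        new_covered = covered | (cand_edges & edges)
        newly_explained = len(new_covered) - len(covered)
        new_total = total + model_bits - bits_per_edge * newly_explained
        # MDL criterion: include the candidate only if the total cost drops.
        if new_total < total:
            kept.append(name)
            covered = new_covered
            total = new_total

    return kept, total


# A star centered on node 0 plus one isolated edge (toy data).
edges = {(0, 1), (0, 2), (0, 3), (0, 4), (5, 6)}
candidates = [
    ("star@0", {(0, 1), (0, 2), (0, 3), (0, 4)}, 10.0),
    ("chain@5-6", {(5, 6)}, 12.0),
]
kept, total = greedy_mdl_summary(edges, candidates)
```

In this toy run the star is kept because describing it costs fewer bits than listing its four edges individually, while the single-edge chain is rejected because its model cost exceeds the cost of the one edge it explains.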

Demos

  • ConDeNSe demo
    Description: We propose ConDeNSe (Conditional Diversified Network Summarization), a Minimum Description Length-based method that summarizes a given graph with approximate ‘supergraphs’ conditioned on diverse, predefined structural patterns.
    Reference: Reducing large graphs to small supergraphs: a unified approach. Yike Liu, Tara Safavi, Neil Shah, Danai Koutra. In Social Network Analysis and Mining (SNAM), February 2018 [PDF]