NSF CAREER Project #1845491

Last modified on Apr 23, 2022.

Principal Investigator: Danai Koutra

NSF Project Website: Timely Insights: Interpretable, Multi-scale Summarization of Networks over Time

Evolving network data occur in almost all disciplines. For example, knowledge or facts are often structured into knowledge graphs, brain activity is represented via functional networks, and neural networks can be seen as evolving graph structures. This project aims to develop computational methods and models to summarize, explain, and provide insights into massive data (and their underlying dynamic processes) at multiple scales in a broad range of domains. Focusing on knowledge graphs makes it possible to achieve on-device and privacy-preserving analytics (e.g., on intelligent assistants). Modeling neural networks is expected to give insights into their interpretability and reduce the massive computational cost of training them. Through collaborations with experts in neuroscience, this research will contribute to decoding the brain, with a potential impact on mental development and disease detection.

A significant part of this project is a plan for integrating research with education. Its overarching theme is to increase diversity in computer and data science, and to engage students in graph mining research and its real-life applications by: introducing undergraduate and graduate data mining classes; mentoring students on data science projects for social good; organizing a workshop to attract undergraduates from diverse backgrounds to graduate school; and organizing a high-school data science summer camp centered on social media and networks, a theme that has proven to be a successful introduction to network science.

Summarization Task

Network summarization, which identifies structure and meaning in large-scale data, has so far focused mostly on non-complex, static data. This project aims to bridge the gap between network summarization research and real-world problems by introducing novel problem formulations in summarization (including for tasks that have not previously been viewed as graph problems) as well as theoretical analyses, unifying theories, and a suite of new, interpretable methods and scalable algorithms. It pursues three research tasks related to network evolution at different scales. At the network scale, the first task focuses on efficient, supervised or semi-supervised summarization of evolving and semantically rich (e.g., heterogeneous) graph data. At the multi-network scale, the second task introduces interpretable methods for modeling and understanding collections of evolving networks and their joint underlying physical processes, an under-studied problem in data mining. Via academic and industrial collaborations, the third task explores new applications in knowledge graphs, neuroscience, deep neural networks, and social sciences. The project is expected to advance the foundations of exploratory analysis of evolving data. Its outcomes will be disseminated through publications, tutorials, and workshops, as well as open-source tools, code, and datasets.

This proposal aims to bridge the gap between network summarization research and real-world problems by introducing novel problem formulations in summarization that reflect the requirements of high-impact applications, and by complementing them with a suite of interpretable and scalable methods.

Students

Caleb Belth (PhD)
Jiong Zhu (PhD)
Jing Zhu (PhD)
Donald Loveland (PhD)
Puja Trivedi (PhD)

Alumni

Arya Farahi (PostDoc)
Fatemeh Vahedian (PostDoc)
Mark Heimann (PhD)
Di Jin (PhD)
Alican Buyukcakir (PhD)
Tara Safavi (PhD)
Yujun Yan (PhD)
Xinyi (Carol) Zheng (UG)
Xiyuan Chen (UG)
Xingyu Lu (UG)
Ruiyu Li (UG)

Code

For detailed explanations of the projects, please refer to the Data and Code section.

Publications

Puja Trivedi, Ekdeep Singh Lubana, Mark Heimann, Danai Koutra, Jayaraman J. Thiagarajan. Analyzing Data-Centric Properties for Graph Contrastive Learning. International Conference on Neural Information Processing Systems (NeurIPS’22), 8 pages, December 2022.
Fatemeh Vahedian, Ruiyu Li, Puja Trivedi, Di Jin, Danai Koutra. Leveraging the Graph Structure of Neural Network Training Dynamics. ACM International Conference on Information and Knowledge Management (CIKM’22), 4 pages, 2022.
Jing Zhu, Danai Koutra, Mark Heimann. CAPER: Coarsen, Align, Project, Refine - A General Multilevel Framework for Network Alignment. ACM International Conference on Information and Knowledge Management (CIKM’22), 4 pages, 2022.
Yujun Yan, Milad Hashemi, Kevin Swersky, Yaoqing Yang, Danai Koutra. Two Sides of the Same Coin: Heterophily and Oversmoothing in Graph Convolutional Neural Networks. IEEE International Conference on Data Mining (ICDM’22), 6 pages, 2022.
Jiong Zhu, Junchen Jin, Donald Loveland, Michael T. Schaub, Danai Koutra. How does Heterophily Impact the Robustness of Graph Neural Networks?: Theoretical Connections and Practical Implications. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’22), 9 pages + 1 page reproducibility appendix, 2022.
Puja Trivedi, Ekdeep Singh Lubana, Yujun Yan, Yaoqing Yang, Danai Koutra. The Role of Augmentations in Graph Contrastive Learning: Current Methodological Flaws & Improved Practices. The Web Conference (The WebConf), 2022.
Di Jin, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Danai Koutra. Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation. In Proceedings of the VLDB Endowment (VLDB), 2022.
Di Jin, Sungchul Kim, Ryan A. Rossi, Danai Koutra. On Generalizing Static Node Embedding to Dynamic Settings. ACM International Conference on Web Search and Data Mining (WSDM), 2022.
Junchen Jin, Mark Heimann, Di Jin, Danai Koutra. Towards Understanding and Evaluating Structural Node Embeddings. The ACM Transactions on Knowledge Discovery from Data (TKDD), 2022.
Marlena Duda, Danai Koutra, Chandra Sripada. Validating dynamicity in resting state fMRI with activation-informed temporal segmentation. Human Brain Mapping 42(17):5718-5735, Dec 2021.
Tara Safavi, Danai Koutra. Relational World Knowledge Representation in Contextual Language Models: A Review. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Tara Safavi, Jing Zhu, Danai Koutra. NEGATER: Unsupervised Discovery of Negatives in Commonsense Knowledge Bases. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Nishil Talati, Di Jin, Haojie Ye, Ajay Brahmakshatriya, Ganesh Dasika, Saman Amarasinghe, Trevor Mudge, Danai Koutra, Ronald Dreslinski. A Deep Dive Into Understanding The Random Walk-Based Temporal Graph Learning. IEEE International Symposium on Workload Characterization (IISWC), 2021.
Mark Heimann, Xiyuan Chen, Fatemeh Vahedian, Danai Koutra. Refining Network Alignment to Improve Matched Neighborhood Consistency. SIAM International Conference on Data Mining (SDM), 2021.
Jing Zhu*, Xingyu Lu*, Mark Heimann, Danai Koutra. Node Proximity is All You Need: A Unified Framework for Proximity-Preserving and Structural Node and Graph Embedding. SIAM International Conference on Data Mining (SDM), 2021.
Jiong Zhu, Ryan A. Rossi, Anup Rao, Tung Mai, Nedim Lipka, Nesreen K. Ahmed, Danai Koutra. Graph Neural Networks with Heterophily. AAAI Conference on Artificial Intelligence (AAAI’21), February 2021.
Mark Heimann. Unsupervised Structural Embedding Methods for Efficient Collective Network Mining. Doctoral Dissertation, University of Michigan. 2020.
Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu, Danai Koutra. Generalizing Graph Neural Networks Beyond Homophily. International Conference on Neural Information Processing Systems (NeurIPS’20), December 2020.
Tara Safavi, Danai Koutra, Edgar Meij. Evaluating the Calibration of Knowledge Graph Embeddings for Trustworthy Link Prediction. Conference on Empirical Methods in Natural Language Processing (EMNLP’20), November 2020. (long paper)
Tara Safavi, Danai Koutra. CoDEx: A Comprehensive Knowledge Graph Completion Benchmark. Conference on Empirical Methods in Natural Language Processing (EMNLP’20), November 2020. (long paper)
Caleb Belth, Alican Büyükcakir, Danai Koutra. A Hidden Challenge of Link Prediction: Which Pairs to Check? IEEE International Conference on Data Mining (ICDM’20), November 2020. (long paper, acceptance rate 9.8%) Best paper candidate.
Josh Gardner, Jawad Mroueh, Natalia Jenuwine, Noah Weaverdyck, Samuel Krassenstein, Arya Farahi, Danai Koutra. Modeling and Predicting Multidimensional Patterns in Fleet Maintenance Data Towards Better Municipal Vehicle Management. Data Science and Advanced Analytics (DSAA’20), October 2020.
Xiyuan Chen, Mark Heimann, Fatemeh Vahedian, Danai Koutra. Consistent Network Alignment with Node Embedding. ACM International Conference on Information and Knowledge Management (CIKM’20), October 2020.
Wenjie Feng, Shenghua Liu, Danai Koutra, Huawei Shen, Xueqi Cheng. SpecGreedy: Unified Dense Subgraph Detection. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD’20), September 2020. (acceptance rate 19%) Data Mining Best Student Paper Award.
Caleb Belth, Xinyi (Carol) Zheng, Danai Koutra. Mining Persistent Activity in Continually Evolving Networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 2020. (acceptance rate 17%)
Ryan A. Rossi, Di Jin, Sungchul Kim, Nesreen K. Ahmed, Danai Koutra, John Boaz Lee. On Proximity and Structural Role-based Embeddings in Networks: Misconceptions, Techniques, and Applications. Transactions on Knowledge Discovery from Data (TKDD), April 2020.
Caleb Belth, Xinyi (Carol) Zheng, Jilles Vreeken, Danai Koutra. What is normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization. The Web Conference (WWW), April 2020.
Tara Safavi, Adam Fourney, Robert Sim, Marcin Juraszek, Shane Williams, Ned Friend, Danai Koutra, Paul Bennett. Toward Activity Discovery in the Personal Web. ACM International Conference on Web Search and Data Mining (WSDM), 2020.
Tara Safavi, Caleb Belth, Lukas Faber, Davide Mottin, Emmanuel Müller, Danai Koutra. Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket. IEEE International Conference on Data Mining (ICDM), 10 pages, November 2019.
Mark Heimann, Tara Safavi, Danai Koutra. Distribution of Node Embeddings as Multiresolution Features for Graphs. IEEE International Conference on Data Mining (ICDM), 10 pages, November 2019. Best student paper award.
Caleb Belth, Fahad Kamran, Donna Tjandra, Danai Koutra. When to Remember Where You Came from: Node Representation Learning in Higher-order Networks. IEEE/ACM International Conference on Social Networks Analysis and Mining (ASONAM), 4 pages, August 2019. Also accepted for presentation at the 15th SIGKDD International Workshop on Mining and Learning with Graphs.
Di Jin, Mark Heimann, Ryan Rossi, Danai Koutra. node2bits: Compact Time- and Attribute-aware Node Representations. ECML/PKDD European Conference on Principles and Practice of Knowledge Discovery in Databases, 16 pages, September 2019.
Di Jin, Ryan A. Rossi, Eunyee Koh, Sungchul Kim, Anup Rao, Danai Koutra. Latent Network Summarization: Bridging Network Embedding and Summarization. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 9 pages + 2 pages reproducibility appendix, August 2019. Also accepted for presentation at the 15th SIGKDD International Workshop on Mining and Learning with Graphs.
Di Jin*, Mark Heimann*, Tara Safavi, Mengdi Wang, Wei Lee, Lindsay Snider, Danai Koutra. Smart Roles: Inferring Professional Roles in Email Networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 9 pages + 1 page reproducibility appendix, August 2019.
Yujun Yan, Jiong Zhu, Marlena Duda, Eric Solarz, Chandra Sripada, Danai Koutra. GroupINN: Grouping-based Interpretable Neural Network-based Classification of Limited, Noisy Brain Data. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 9 pages + 1 page reproducibility appendix, August 2019. Also accepted for presentation at the 15th SIGKDD International Workshop on Mining and Learning with Graphs.
Yike Liu, Linhong Zhu, Pedro Szekely, Aram Galstyan, Danai Koutra. Coupled Clustering of Time-Series and Networks. SIAM International Conference on Data Mining (SDM), 9 pages (+4 pages supplementary material), May 2019.

Tutorials

Network Embedding for Role Discovery: Concepts, Tools, and Applications, April 2022. SIAM SDM 2022. (with Mark Heimann and Junchen Jin)
Interpretable Network Representations, April 2022. The Web Conference 2022. (with Shengmin Jin and Reza Zafarani)