Principal Investigator: Danai Koutra
NSF Project Website: Timely Insights: Interpretable, Multi-scale Summarization of Networks over Time
Evolving network data occur in almost all disciplines. For example, knowledge or facts are often structured into knowledge graphs, brain activity is represented via functional networks, and neural networks can be seen as evolving graph structures. This project aims to develop computational methods and models to summarize, explain, and provide insights into massive data (and their underlying dynamic processes) at multiple scales in a broad range of domains. Focusing on knowledge graphs makes it possible to achieve on-device and privacy-preserving analytics (e.g., on intelligent assistants). Modeling neural networks is expected to give insights into their interpretability and to reduce the massive computational cost of training them. Through collaborations with experts in neuroscience, this research will contribute to decoding the brain, with a potential impact on the study of mental development and on disease detection. A significant part of this project is a plan for integrating research with education. Its overarching theme is to increase diversity in computer and data science and to engage students in graph mining research and its real-life applications by: introducing undergraduate and graduate data mining classes; mentoring students on data science projects for social good; organizing a workshop to attract undergraduates from diverse backgrounds to graduate school; and organizing a high-school data science summer camp centered on social media and networks, a theme that is a successful introduction to network science.
Network summarization, which identifies structure and meaning in large-scale data, has so far focused mostly on simple, static data. This project aims to bridge the gap between network summarization research and real-world problems by introducing novel problem formulations in summarization (including for tasks that have not previously been viewed as graph problems), theoretical analyses, unifying theories, and a suite of new, interpretable methods and scalable algorithms. It pursues three research tasks related to network evolution at different scales. At the network scale, the first task focuses on efficient, supervised or semi-supervised summarization of evolving and semantically rich (e.g., heterogeneous) graph data. At the multi-network scale, the second task introduces interpretable methods for modeling and understanding collections of evolving networks and their joint underlying physical processes, an under-studied problem in data mining. Via academic and industrial collaborations, the third task explores new applications in knowledge graphs, neuroscience, deep neural networks, and the social sciences. The project is expected to advance the foundations of exploratory analysis of evolving data. Its outcomes will be disseminated through publications, tutorials, and workshops, as well as open-source tools, code, and datasets.
This proposal aims to bridge the gap between network summarization research and real-world problems by introducing novel problem formulations in summarization that reflect the requirements of high-impact applications, and by complementing them with a suite of interpretable and scalable methods.
For detailed explanations of the projects, please refer to the Data and Code section. Click on a project name below to go to that project's GitHub repository.
Caleb Belth, Xinyi (Carol) Zheng, Jilles Vreeken, Danai Koutra. What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization. The Web Conference (WWW), April 2020.
Tara Safavi, Caleb Belth, Lukas Faber, Davide Mottin, Emmanuel Müller, Danai Koutra. Personalized Knowledge Graph Summarization: From the Cloud to Your Pocket. IEEE International Conference on Data Mining (ICDM), 10 pages, November 2019.
Tara Safavi, Adam Fourney, Robert Sim, Marcin Juraszek, Shane Williams, Ned Friend, Danai Koutra, Paul Bennett. Toward Activity Discovery in the Personal Web. ACM International Conference on Web Search and Data Mining (WSDM), 2020.
Di Jin, Ryan A. Rossi, Eunyee Koh, Sungchul Kim, Anup Rao, Danai Koutra. Latent Network Summarization: Bridging Network Embedding and Summarization. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 9 pages + 2 pages reproducibility appendix, August 2019. Also accepted for presentation at the 15th SIGKDD International Workshop on Mining and Learning with Graphs.
Yujun Yan, Jiong Zhu, Marlena Duda, Eric Solarz, Chandra Sripada, Danai Koutra. GroupINN: Grouping-based Interpretable Neural Network-based Classification of Limited, Noisy Brain Data. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 9 pages + 1 page reproducibility appendix, August 2019. Also accepted for presentation at the 15th SIGKDD International Workshop on Mining and Learning with Graphs.
Mark Heimann, Tara Safavi, Danai Koutra. Distribution of Node Embeddings as Multiresolution Features for Graphs. IEEE International Conference on Data Mining (ICDM), 10 pages, November 2019. Best Student Paper Award.
Di Jin, Mark Heimann, Ryan Rossi, Danai Koutra. node2bits: Compact Time- and Attribute-aware Node Representations. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), 16 pages, September 2019.
Yike Liu, Linhong Zhu, Pedro Szekely, Aram Galstyan, Danai Koutra. Coupled Clustering of Time-Series and Networks. SIAM International Conference on Data Mining (SDM), 9 pages + 4 pages supplementary material, May 2019.
Caleb Belth, Fahad Kamran, Donna Tjandra, Danai Koutra. When to Remember Where You Came from: Node Representation Learning in Higher-order Networks. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 4 pages, August 2019. Also accepted for presentation at the 15th SIGKDD International Workshop on Mining and Learning with Graphs.
Di Jin, Mark Heimann, Tara Safavi, Mengdi Wang, Wei Lee, Lindsay Snider, Danai Koutra. Smart Roles: Inferring Professional Roles in Email Networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 9 pages + 1 page reproducibility appendix, August 2019.