## Abstract

In many scientific domains, graphs are the objects of choice to represent structured data: from molecules to social networks, power grids, the internet, and so on. The exploitation of graph data represents a major scientific and industrial challenge. Graph Machine Learning (Graph ML) is thus a fast-growing field, with so-called Graph Neural Networks (GNN) at the forefront.

However, in sharp contrast with traditional ML, the field of GML has somewhat jumped from early methods to deep learning, without the decades-long development of well-established notions to compare, analyze and improve algorithms. As a result, GNNs have limitations, both practical and theoretical, and it is not clear how to address them. Practical results may vary wildly depending on the architecture and datasets, with no guidelines on how to design reliable GNNs in each case. Overall, these are the symptoms of a major issue: Graph ML is somewhat lacking fundamental theory.

The ambition of project MALAGA is to develop such a theory. Solving the crucial limitations of the current theory is highly challenging: fundamental mathematical tools in cannot analyze the learning capabilities of Graph ML methods in a unified way (e.g., graph nodes are not iid), existing *statistical graph models* do not faithfully represent the many characteristics of modern graph data (especially node features and their relationship with graph structure in homophilic and heterophilic graphs), and computational complexity may become problematic on large graphs. MALAGA will develop a radically new understanding of GML problems, and of the strengths and limitations of a large panel of algorithms.

However, in sharp contrast with traditional ML, the field of GML has somewhat jumped from early methods to deep learning, without the decades-long development of well-established notions to compare, analyze and improve algorithms. As a result, GNNs have limitations, both practical and theoretical, and it is not clear how to address them. Practical results may vary wildly depending on the architecture and datasets, with no guidelines on how to design reliable GNNs in each case. Overall, these are the symptoms of a major issue: Graph ML is somewhat lacking fundamental theory.

The ambition of project MALAGA is to develop such a theory. Solving the crucial limitations of the current theory is highly challenging: fundamental mathematical tools in cannot analyze the learning capabilities of Graph ML methods in a unified way (e.g., graph nodes are not iid), existing *statistical graph models* do not faithfully represent the many characteristics of modern graph data (especially node features and their relationship with graph structure in homophilic and heterophilic graphs), and computational complexity may become problematic on large graphs. MALAGA will develop a radically new understanding of GML problems, and of the strengths and limitations of a large panel of algorithms.

## Jobs

Many job offers (PhD and post-docs) will be posted here in a near-future, in the meantime do not hesitate to contact me if you are interested in the topic!

## Associated papers

- N. Keriven, S. Vaiter.
**What functions can Graph Neural Networks compute on random graphs? The role of Positional Encoding***NeurIPS*2023. Pdf - N. Keriven.
**Not too little, not too much: a theoretical analysis of graph (over)smoothing***NeurIPS*2022. Pdf - N. Keriven, A. Bietti, S. Vaiter.
**On the Universality of Graph Neural Networks on Large Random Graphs***NeurIPS*2021. Pdf - N. Keriven, A. Bietti, S. Vaiter.
**Convergence and Stability of Graph Convolutional Network Networks on Large Random Graphs***NeurIPS (Spotlight)*2020. Pdf