# Publications

Check out our works

## List of Publications

Our 48 publications sorted by most recent.

^{*}Equal contribution.

Graph Deep Learning for Time Series Forecasting

Preprint
Spatiotemporal graphs
Forecasting

##### Graph Deep Learning for Time Series Forecasting

Graph-based deep learning methods have become popular tools to process collections of correlated time series. Differently from traditional multivariate forecasting methods, neural graph-based predictors take advantage of pairwise relationships by conditioning forecasts on a (possibly dynamic) graph spanning the time series collection. The conditioning can take the form of an architectural inductive bias on the neural forecasting architecture, resulting in a family of deep learning models called spatiotemporal graph neural networks. Such relational inductive biases enable the training of global forecasting models on large time-series collections, while at the same time localizing predictions w.r.t. each element in the set (i.e., graph nodes) by accounting for local correlations among them (i.e., graph edges). Indeed, recent theoretical and practical advances in graph neural networks and deep learning for time series forecasting make the adoption of such processing frameworks appealing and timely. However, most of the studies in the literature focus on proposing variations of existing neural architectures by taking advantage of modern deep learning practices, while foundational and methodological aspects have not been subject to systematic investigation. To fill the gap, this paper aims to introduce a comprehensive methodological framework that formalizes the forecasting problem and provides design principles for graph-based predictive models and methods to assess their performance. At the same time, together with an overview of the field, we provide design guidelines, recommendations, and best practices, as well as an in-depth discussion of open challenges and future research directions.

Graph-based Time Series Clustering for End-to-End Hierarchical Forecasting

Preprint
Spatiotemporal graphs
Forecasting
Time series clustering
Pooling

##### Graph-based Time Series Clustering for End-to-End Hierarchical Forecasting

Existing relationships among time series can be exploited as inductive biases in learning effective forecasting models. In hierarchical time series, relationships among subsets of sequences induce hard constraints (hierarchical inductive biases) on the predicted values. In this paper, we propose a graph-based methodology to unify relational and hierarchical inductive biases in the context of deep learning for time series forecasting. In particular, we model both types of relationships as dependencies in a pyramidal graph structure, with each pyramidal layer corresponding to a level of the hierarchy. By exploiting modern - trainable - graph pooling operators we show that the hierarchical structure, if not available as a prior, can be learned directly from data, thus obtaining cluster assignments aligned with the forecasting objective. A differentiable reconciliation stage is incorporated into the processing architecture, allowing hierarchical constraints to act both as an architectural bias as well as a regularization element for predictions. Simulation results on representative datasets show that the proposed method compares favorably against the state of the art.

Feudal Graph Reinforcement Learning

Preprint
Reinforcement learning
Relational inductive biases
Graph neural networks

##### Feudal Graph Reinforcement Learning

We focus on learning composable policies to control a variety of physical agents with possibly different structures. Among state-of-the-art methods, prominent approaches exploit graph-based representations and weight-sharing modular policies based on the message-passing framework. However, as shown by recent literature, message passing can create bottlenecks in information propagation and hinder global coordination. This drawback can become even more problematic in tasks where high-level planning is crucial. In fact, in similar scenarios, each modular policy - e.g., controlling a joint of a robot - would request to coordinate not only for basic locomotion but also achieve high-level goals, such as navigating a maze. A classical solution to avoid similar pitfalls is to resort to hierarchical decision-making. In this work, we adopt the Feudal Reinforcement Learning paradigm to develop agents where control actions are the outcome of a hierarchical (pyramidal) message-passing process. In the proposed Feudal Graph Reinforcement Learning (FGRL) framework, high-level decisions at the top level of the hierarchy are propagated through a layered graph representing a hierarchy of policies. Lower layers mimic the morphology of the physical system and upper layers can capture more abstract sub-modules. The purpose of this preliminary work is to formalize the framework and provide proof-of-concept experiments on benchmark environments (MuJoCo locomotion tasks). Empirical evaluation shows promising results on both standard benchmarks and zero-shot transfer learning settings.

Relational Inductive Biases for Object-Centric Image Generation

Preprint
Relational inductive biases
Image generation
Graph neural networks

##### Relational Inductive Biases for Object-Centric Image Generation

Conditioning image generation on specific features of the desired output is a key ingredient of modern generative models. Most existing approaches focus on conditioning the generation based on free-form text, while some niche studies use scene graphs to describe the content of the image to be generated. This paper explores novel methods to condition image generation that are based on object-centric relational representations. In particular, we propose a methodology to condition the generation of a particular object in an image on the attributed graph representing its structure and associated style. We show that such architectural biases entail properties that facilitate the manipulation and conditioning of the generative process and allow for regularizing the training procedure. The proposed framework is implemented by means of a neural network architecture combining convolutional operators that operate on both the underlying graph and the 2D grid that becomes the output image. The resulting model learns to generate multi-channel masks of the object that can be used as a soft inductive bias in the downstream generative task. Empirical results show that the proposed approach compares favorably against relevant baselines on image generation conditioned on human poses.

Graph Kalman Filters

Preprint
Spatiotemporal graphs
State-space models
Graph learning

##### Graph Kalman Filters

The well-known Kalman filters model dynamical systems by relying on state-space representations with the next state updated, and its uncertainty controlled, by fresh information associated with newly observed system outputs. This paper generalizes, for the first time in the literature, Kalman and extended Kalman filters to discrete-time settings where inputs, states, and outputs are represented as attributed graphs whose topology and attributes can change with time. The setup allows us to adapt the framework to cases where the output is a vector or a scalar too (node/graph level tasks). Within the proposed theoretical framework, the unknown state-transition and the readout functions are learned end-to-end along with the downstream prediction task.

Taming Local Effects in Graph-based Spatiotemporal Forecasting

Advances in Neural Information Processing Systems, 2023

Spatiotemporal graphs
Forecasting
Embeddings

##### Taming Local Effects in Graph-based Spatiotemporal Forecasting

Spatiotemporal graph neural networks have shown to be effective in time series forecasting applications, achieving better performance than standard univariate predictors in several settings. These architectures take advantage of a graph structure and relational inductive biases to learn a single (global) inductive model to predict any number of the input time series, each associated with a graph node. Despite the gain achieved in computational and data efficiency w.r.t. fitting a set of local models, relying on a single global model can be a limitation whenever some of the time series are generated by a different spatiotemporal stochastic process. The main objective of this paper is to understand the interplay between globality and locality in graph-based spatiotemporal forecasting, while contextually proposing a methodological framework to rationalize the practice of including trainable node embeddings in such architectures. We ascribe to trainable node embeddings the role of amortizing the learning of specialized components. Moreover, embeddings allow for 1) effectively combining the advantages of shared message-passing layers with node-specific parameters and 2) efficiently transferring the learned model to new node sets. Supported by strong empirical evidence, we provide insights and guidelines for specializing graph-based models to the dynamics of each time series and show how this aspect plays a crucial role in obtaining accurate predictions.

Where and How to Improve Graph-based Spatio-Temporal Predictors

Preprint
Spatiotemporal graphs
Residual analysis

##### Where and How to Improve Graph-based Spatio-Temporal Predictors

This paper introduces a novel residual correlation analysis, called AZ-analysis, to assess the optimality of spatio-temporal predictive models. The proposed AZ-analysis constitutes a valuable asset for discovering and highlighting those space-time regions where the model can be improved with respect to performance. The AZ-analysis operates under very mild assumptions and is based on a spatio-temporal graph that encodes serial and functional dependencies in the data; asymptotically distribution-free summary statistics identify existing residual correlation in space and time regions, hence localizing time frames and/or communities of sensors, where the predictor can be improved.

Peak shaving in distribution networks using stationary energy storage systems: A Swiss case study

Sustainable Energy, Grids and Networks, Volume 34, 2023

Forecasting
Energy analytics

##### Peak shaving in distribution networks using stationary energy storage systems: A Swiss case study

Grid operators are charged not only by their total energy demand, but also by their highest power demand from the superior grid level. The maximum demand charge is usually imposed on the peak power point of the monthly load profile, hence, shaving demand at peak times is of main concern for the aforesaid stakeholders. In this paper, we present an approach for peak shaving in a distribution grid using a battery energy storage. The developed algorithm is applied and tested with data from a real stationary battery installation at a Swiss utility. This paper proposes a battery storage control scheme that can be used for peak shaving of the total grid load under realistic conditions. Particularly, a rule-based approach combined with a deep-learning load forecasting model is developed and its performance is compared with the theoretical optimum based on real data from the field. The analysis includes both technical and economical results from a simulated storage operation and significant outcomes are given for the application of this method.

Graph State-Space Models

Preprint
Spatiotemporal graphs
State-space models
Graph learning

##### Graph State-Space Models

State-space models constitute an effective modeling tool to describe multivariate time series and operate by maintaining an updated representation of the system state from which predictions are made. Within this framework, relational inductive biases, e.g., associated with functional dependencies existing among signals, are not explicitly exploited leaving unattended great opportunities for effective modeling approaches. The manuscript aims, for the first time, at filling this gap by matching state-space modeling and spatio-temporal data where the relational information, say the functional graph capturing latent dependencies, is learned directly from data and is allowed to change over time. Within a probabilistic formulation that accounts for the uncertainty in the data-generating process, an encoder-decoder architecture is proposed to learn the state-space model end-to-end on a downstream task. The proposed methodological framework generalizes several state-of-the-art methods and demonstrates to be effective in extracting meaningful relational information while achieving optimal forecasting performance in controlled environments.

Scalable Spatiotemporal Graph Neural Networks

Proceedings of the AAAI conference on artificial intelligence, 2023

Spatiotemporal graphs
Forecasting
Reservoir computing

##### Scalable Spatiotemporal Graph Neural Networks

Neural forecasting of spatiotemporal time series drives both research and industrial innovation in several relevant application domains. Graph neural networks (GNNs) are often the core component of the forecasting architecture. However, in most spatiotemporal GNNs, the computational complexity scales up to a quadratic factor with the length of the sequence times the number of links in the graph, hence hindering the application of these models to large graphs and long temporal sequences. While methods to improve scalability have been proposed in the context of static graphs, few research efforts have been devoted to the spatiotemporal case. To fill this gap, we propose a scalable architecture that exploits an efficient encoding of both temporal and spatial dynamics. In particular, we use a randomized recurrent neural network to embed the history of the input time series into high-dimensional state representations encompassing multi-scale temporal dynamics. Such representations are then propagated along the spatial dimension using different powers of the graph adjacency matrix to generate node embeddings characterized by a rich pool of spatiotemporal features. The resulting node embeddings can be efficiently pre-computed in an unsupervised manner, before being fed to a feed-forward decoder that learns to map the multi-scale spatiotemporal representations to predictions. The training procedure can then be parallelized node-wise by sampling the node embeddings without breaking any dependency, thus enabling scalability to large networks. Empirical results on relevant datasets show that our approach achieves results competitive with the state of the art, while dramatically reducing the computational burden.

Graph Neural Networks for High-Level Synthesis Design Space Exploration

ACM Transactions on Design Automation of Electronic Systems, 2022

Graph neural networks
Design space exploration
Relational inductive biases

##### Graph Neural Networks for High-Level Synthesis Design Space Exploration

High-level Synthesis (HLS) Design-Space Exploration (DSE) aims at identifying Pareto-optimal synthesis configurations whose exhaustive search is unfeasible due to the design-space dimensionality and the prohibitive computational cost of the synthesis process. Within this framework, we address the design automation problem by proposing graph neural networks that jointly predict acceleration performance and hardware costs of a synthesized behavioral specification given optimization directives. Learned models can be used to rapidly approach the Pareto curve by guiding the DSE, taking into account performance and cost estimates. The proposed method outperforms traditional HLS-driven DSE approaches, by accounting for the arbitrary length of computer programs and the invariant properties of the input. We propose a novel hybrid control and dataflow graph representation that enables training the graph neural network on specifications of different hardware accelerators. Our approach achieves prediction accuracy comparable with that of state-of-the-art simulators without having access to analytical models of the HLS compiler. Finally, the learned representation can be exploited for DSE in unexplored configuration spaces by fine-tuning on a small number of samples from the new target domain. The outcome of the empirical evaluation of this transfer learning shows strong results against state-of-the-art baselines in relevant benchmarks.

Learning to Reconstruct Missing Data from Spatiotemporal Graphs with Sparse Observations

Advances in Neural Information Processing Systems, 2022

Spatiotemporal graphs
Imputation

##### Learning to Reconstruct Missing Data from Spatiotemporal Graphs with Sparse Observations

Modeling multivariate time series as temporal signals over a (possibly dynamic) graph is an effective representational framework that allows for developing models for time series analysis. In fact, discrete sequences of graphs can be processed by autoregressive graph neural networks to recursively learn representations at each discrete point in time and space. Spatiotemporal graphs are often highly sparse, with time series characterized by multiple, concurrent, and even long sequences of missing data, e.g., due to the unreliable underlying sensor network. In this context, autoregressive models can be brittle and exhibit unstable learning dynamics. The objective of this paper is, then, to tackle the problem of learning effective models to reconstruct, i.e., impute, missing data points by conditioning the reconstruction only on the available observations. In particular, we propose a novel class of attention-based architectures that, given a set of highly sparse discrete observations, learn a representation for points in time and space by exploiting a spatiotemporal diffusion architecture aligned with the imputation task. Representations are trained end-to-end to reconstruct observations w.r.t. the corresponding sensor and its neighboring nodes. Compared to the state of the art, our model handles sparse data without propagating prediction errors or requiring a bidirectional model to encode forward and backward time dependencies. Empirical results on representative benchmarks show the effectiveness of the proposed method.

AZ-whiteness test: a test for uncorrelated noise on spatio-temporal graphs

Advances in Neural Information Processing Systems, 2022

Spatiotemporal graphs
Residual analysis

##### AZ-whiteness test: a test for uncorrelated noise on spatio-temporal graphs

We present the first whiteness test for graphs, i.e., a whiteness test for multivariate time series associated with the nodes of a dynamic graph. The statistical test aims at finding serial dependencies among close-in-time observations, as well as spatial dependencies among neighboring observations given the underlying graph. The proposed test is a spatio-temporal extension of traditional tests from the system identification literature and finds applications in similar, yet more general, application scenarios involving graph signals. The AZ-test is versatile, allowing the underlying graph to be dynamic, changing in topology and set of nodes, and weighted, thus accounting for connections of different strength, as is the case in many application scenarios like transportation networks and sensor grids. The asymptotic distribution --- as the number of graph edges or temporal observations increases --- is known, and does not assume identically distributed data. We validate the practical value of the test on both synthetic and real-world problems, and show how the test can be employed to assess the quality of spatio-temporal forecasting models by analyzing the prediction residuals appended to the graphs stream.

Sparse Graph Learning from Spatiotemporal Time Series

Journal of Machine Learning Research, 2023

Spatiotemporal graphs
Graph learning
Forecasting

##### Sparse Graph Learning from Spatiotemporal Time Series

Outstanding achievements of graph neural networks for spatiotemporal time series analysis show that relational constraints introduce an effective inductive bias into neural forecasting architectures. Often, however, the relational information characterizing the underlying data-generating process is unavailable and the practitioner is left with the problem of inferring from data which relational graph to use in the subsequent processing stages. We propose novel, principled - yet practical - probabilistic score-based methods that learn the relational dependencies as distributions over graphs while maximizing end-to-end the performance at task. The proposed graph learning framework is based on consolidated variance reduction techniques for Monte Carlo score-based gradient estimation, is theoretically grounded, and, as we show, effective in practice. In this paper, we focus on the time series forecasting problem and show that, by tailoring the gradient estimators to the graph learning problem, we are able to achieve state-of-the-art performance while controlling the sparsity of the learned graph and the computational scalability. We empirically assess the effectiveness of the proposed method on synthetic and real-world benchmarks, showing that the proposed solution can be used as a stand-alone graph identification procedure as well as a graph learning component of an end-to-end forecasting architecture.

Deep learning for graphs (special session at ESANN 2022)

European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2022

Graph neural networks

##### Deep learning for graphs (special session at ESANN 2022)

The flourishing field of deep learning for graphs relies on the layered computation of representations from graph-structured data given as input. A widely used strategy for processing graphs is via message passing, based on exchanging information among the connected nodes of the graph. Subsequently, node representations are employed to address tasks associated with the nodes and edges of a graph, or even entire graphs. The present tutorial paper reviews fundamental concepts and open challenges of deep learning for graphs and summarizes the contributed papers of the ESANN 2022 special session on the topic.

Spatio-Temporal Graph Neural Networks for Aggregate Load Forecasting

IEEE International Joint Conference on Neural Networks, 2022

Spatiotemporal graphs
Forecasting
Energy analytics

##### Spatio-Temporal Graph Neural Networks for Aggregate Load Forecasting

Accurate forecasting of electricity demand is a core component in many systems within the modern electricity infrastructure. Several approaches exist that tackle this problem by exploiting modern deep learning tools. However, most previous works focus on predicting the total load as a univariate time series forecasting task, ignoring all fine-grained information captured by the smart meters distributed across the power grid. We introduce a methodology to account for this information in the graph neural network framework. In particular, we consider spatio-temporal graphs where each node is associated with the aggregate load of a cluster of smart meters, and a global graph-level attribute indicates the total load on the grid. We propose two novel spatio-temporal graph neural network models to process this representation and take advantage of both the finer-grained information and the relationships existing between the different clusters of meters. We compare these models on a widely used, openly available, benchmark against a competitive baseline which only accounts for the total load profile. Within these settings, we show that the proposed methodology improves forecasting accuracy.

Graph iForest: Isolation of anomalous and outlier graphs

IEEE International Joint Conference on Neural Networks, 2022

Anomaly detection

##### Graph iForest: Isolation of anomalous and outlier graphs

We present an anomaly and outlier detection method for graph data. The method relies on the consideration that anomalies and outliers are more easily isolated by certain incremental partitionings of the data space. Specifically, we build upon the isolation forest method and introduce a new incremental partitioning of the space of graphs that makes the isolation forest method applicable to generic attributed graphs, i.e., graphs where both nodes and edges can be associated with attributes. Within the considered general setup, the topology and the number of nodes can change from graph to graph, and a node correspondence between different graphs can be absent or unknown. Examples of applications of what proposed include the identification of frauds and fake news in communication networks, and breakage of systems monitored by sensor networks. The main novel contribution of the paper is a graph space partitioning which we prove to be expressive enough to identify anomalies and outlier graphs in a given dataset. An empirical analysis on synthetic and real-world graphs validates the effectiveness of the proposed method.

Understanding Catastrophic Forgetting of Gated Linear Networks in Continual Learning

IEEE International Joint Conference on Neural Networks, 2022

Nonstationary environments
Continual learning

##### Understanding Catastrophic Forgetting of Gated Linear Networks in Continual Learning

In this paper, we consider the recently proposed family of continual learning models, called Gated Linear Networks (GLNs), and study two crucial aspects impacting on the amount of catastrophic forgetting affecting gated linear networks, namely, data standardization and gating mechanism. Data standardization is particularly challenging in the online/continual learning setting because data from future tasks is not available beforehand. The results obtained using an online standardization method show a considerably higher amount of forgetting compared to an offline --static-- standardization. Interestingly, with the latter standardization, we observe that GLNs show almost no forgetting on the considered benchmark datasets. Secondly, for an effective GLNs, it is essential to tailor the hyperparameters of the gating mechanism to the data distribution. In this paper, we propose a gating strategy based on a set of prototypes and the resulting Voronoi tessellation. The experimental assessment shows that the proposed approach is more robust to different data standardizations compared to the original one, based on a halfspace gating mechanism, and shows improved predictive performance.

Understanding Pooling in Graph Neural Networks

IEEE Transactions on Neural Networks and Learning Systems, 2022

Graph neural networks
Pooling

##### Understanding Pooling in Graph Neural Networks

Inspired by the conventional pooling layers in convolutional neural networks, many recent works in the field of graph machine learning have introduced pooling operators to reduce the size of graphs. The great variety in the literature stems from the many possible strategies for coarsening a graph, which may depend on different assumptions on the graph structure or the specific downstream task. In this paper we propose a formal characterization of graph pooling based on three main operations, called selection, reduction, and connection, with the goal of unifying the literature under a common framework. Following this formalization, we introduce a taxonomy of pooling operators and categorize more than thirty pooling methods proposed in recent literature. We propose criteria to evaluate the performance of a pooling operator and use them to investigate and contrast the behavior of different classes of the taxonomy on a variety of tasks.

Deep Learning for Time Series Forecasting: The Electric Load Case

CAAI Transactions on Intelligence Technology, 2022

Forecasting
Energy analytics

##### Deep Learning for Time Series Forecasting: The Electric Load Case

Management and efficient operations in critical infrastructure such as Smart Grids take huge advantage of accurate power load forecasting which, due to its nonlinear nature, remains a challenging task. Recently, deep learning has emerged in the machine learning field achieving impressive performance in a vast range of tasks, from image classification to machine translation. Applications of deep learning models to the electric load forecasting problem are gaining interest among researchers as well as the industry, but a comprehensive and sound comparison among different architectures is not yet available in the literature. This work aims at filling the gap by reviewing and experimentally evaluating on two real-world datasets the most recent trends in electric load forecasting, by contrasting deep learning architectures on short term forecast (one day ahead prediction). Specifically, we focus on feedforward and recurrent neural networks, sequence to sequence models and temporal convolutional neural networks along with architectural variants, which are known in the signal processing community but are novel to the load forecasting one.

Seizure localisation with attention-based graph neural networks

Expert Systems with Applications, 2022

##### Seizure localisation with attention-based graph neural networks

Graph neural networks (GNNs) and the attention mechanism are two of the most significant advances in artificial intelligence methods over the past few years. The former are neural networks able to process graph-structured data, while the latter learns to selectively focus on those parts of the input that are more relevant for the task at hand. In this paper, we propose a methodology for seizure localisation which combines the two approaches. Our method is composed of several blocks. First, we represent brain states in a compact way by computing functional networks from intracranial electroencephalography recordings, using metrics to quantify the coupling between the activity of different brain areas. Then, we train a GNN to correctly distinguish between functional networks associated with interictal and ictal phases. The GNN is equipped with an attention-based layer which automatically learns to identify those regions of the brain (associated with individual electrodes) that are most important for a correct classification. The localisation of these regions is fully unsupervised, meaning that it does not use any prior information regarding the seizure onset zone. We report results both for human patients and for simulators of brain activity. We show that the regions of interest identified by the GNN strongly correlate with the localisation of the seizure onset zone reported by electroencephalographers. We also show that our GNN exhibits uncertainty on those patients for which the clinical localisation was also unsuccessful, highlighting the robustness of the proposed approach.

Learning dynamical systems using dynamical systems: the reservoir computing approach

PhD Thesis, Università della Svizzera italiana, 2022

Reservoir computing

##### Learning dynamical systems using dynamical systems: the reservoir computing approach

Dynamical systems have been used to describe a vast range of phenomena, including physical sciences, biology, neurosciences, and economics just to name a few. The development of a mathematical theory for dynamical systems allowed researchers to create precise models of many phenomena, predicting their behaviors with great accuracy. For many challenges of dynamical systems, highly accurate models are notably hard to produce due to the enormous number of variables involved and the complexity of their interactions. Yet, in recent years the availability of large datasets has driven researchers to approach these complex systems with machine learning techniques. These techniques are valuable in settings where no model can be formulated explicitly, but not rarely the working principles of these models are obscure and their optimization is driven by heuristics. In this context, this work aims at advancing the field by “opening the black-box” of data-driven models developed for dynamical systems. We focus on Recurrent Neural Networks (RNNs), one of the most promising and yet less understood approaches. In particular, we concentrate on a specific neural architecture that goes under the name of Reservoir Computing (RC). We address three problems: (1) how the learning procedure of these models can be understood and improved, (2) how these systems encode a representation of the inputs they receive, and (3) how the dynamics of these systems affect their performance. We make use of various tools taken from the theory of dynamical systems to explain how we can better understand the working principles of RC in dynamical systems, aiming at developing new guiding principles to improve their design.

Anomaly and Change Detection in Sequences of Graphs

PhD Thesis, Università della Svizzera italiana, 2021

Nonstationary environments
Anomaly detection
Change detection

##### Anomaly and Change Detection in Sequences of Graphs

We are experiencing a significant increase in the amount of data collected by sensor and social networks, thanks to advances in technology and the spread of social platforms. Doing inference on such huge data streams is a crucial task both to academia and industry. As data streams coming from sensors --either physical or virtual-- generally share functional dependencies, graphs emerge as rich structures able to model both information at the sensor/entity level and the complex relationships existing among entities. In turn, this graph representation enables us to do inference on graph streams, e.g., through graph neural networks and geometric deep learning. However, in general, such processing techniques assume the stationarity hypothesis, which is not always granted, e.g., whenever we experience sensor aging, time variance or changes in the users' preferences on social platforms. In this doctoral thesis, we address the problem of identifying changes in stationarity caused by unknown phenomena occurring in the underlying data-generating process and emerging in the sequence of graphs. Scientific outcomes permit us to also address the anomaly detection problem that, indeed, represents a valuable follow-up of the research. We consider a general family of attributed graphs with non-identified vertices in order to cover the broadest spectrum of applications. The major contribution of the thesis resides in a methodology for processing a sequence of graphs to detect unexpected events (changes in stationarities and/or occurrence of anomalies) in the data generating process. The methodology relies on the design of novel graph-level embeddings and change detection methods supported by a solid theoretical framework.

Graph neural networks: operators and architectures

PhD Thesis, Università della Svizzera italiana, 2021

Graph neural networks

##### Graph neural networks: operators and architectures

This thesis explores the field of graph neural networks, a class of deep learning models designed to learn representations of graphs. We organise the work into two parts. In the first part, we focus on the essential building blocks of graph neural networks. We present three novel operators for learning graph representations: one graph convolutional layer and two methods for pooling. We put particular emphasis on the topic of pooling, introducing a universal and modular framework to describe pooling operators, a taxonomy to organise the literature, and a set of evaluation criteria to assess an operator’s performance. The second part focuses on specific graph neural network architectures and their applications to cutting-edge problems in dynamical systems and computational biology. We present three main contributions. First, we introduce an autoencoder architecture for learning graph representations in non-Euclidean spaces. We apply our model to the tasks of molecule generation and change detection in graph sequences. Second, we propose a graph neural network designed to be interpretable, specifically to solve the problem of seizure localisation in subjects with epilepsy. Finally, we discuss the design of autoregressive models for sequences of graphs.

Learning Graph Cellular Automata

Advances in Neural Information Processing Systems, 2021

##### Learning Graph Cellular Automata

Cellular automata (CA) are a class of computational models that exhibit rich dynamics emerging from the local interaction of cells arranged in a regular lattice. In this work we focus on a generalised version of typical CA, called graph cellular automata (GCA), in which the lattice structure is replaced by an arbitrary graph. In particular, we extend previous work that used convolutional neural networks to learn the transition rule of conventional CA and we use graph neural networks to learn a variety of transition rules for GCA. First, we present a general-purpose architecture for learning GCA, and we show that it can represent any arbitrary GCA with finite and discrete state space. Then, we test our approach on three different tasks: 1) learning the transition rule of a GCA on a Voronoi tessellation; 2) imitating the behaviour of a group of flocking agents; 3) learning a rule that converges to a desired target state.

Gaussian Approximation for Bias Reduction in Q-Learning

Journal of Machine Learning Research, 2021

Reinforcement learning

##### Gaussian Approximation for Bias Reduction in Q-Learning

Temporal-Difference off-policy algorithms are among the building blocks of reinforcement learning (RL). Within this family, Q-Learning is arguably the most famous one, which has been widely studied and extended. The update rule of Q-learning involves the use of the maximum operator to estimate the maximum expected value of the return. However, this estimate is positively biased, and may hinder the learning process, especially in stochastic environments and when function approximation is used. We introduce the Weighted Estimator as an effective solution to mitigate the negative effects of overestimation in Q-Learning. The Weighted Estimator estimates the maximum expected value as a weighted sum of the action values, with the weights being the probabilities that each action value is the maximum. In this work, we study the problem from the statistical perspective of estimating the maximum expected value of a set of random variables and provide bounds to the bias and the variance of the Weighted Estimator, showing its advantages over other estimators present in literature. Then, we derive algorithms to enable the use of the Weighted Estimator, in place of the Maximum Estimator, in online and batch RL, and we introduce a novel algorithm for deep RL. Finally, we empirically evaluate our algorithms in a large set of heterogeneous problems, encompassing discrete and continuous, low and high dimensional, deterministic and stochastic environments. Experimental results show the effectiveness of the Weighted Estimator in controlling the bias of the estimate, resulting in better performance than representative baselines and robust learning w.r.t. a large set of diverse environments.

Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks

International Conference on Learning Representations, 2022

Spatiotemporal graphs
Imputation

##### Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks

Dealing with missing values and incomplete time series is a labor-intensive, tedious, inevitable task when handling data coming from real-world applications. Effective spatio-temporal representations would allow imputation methods to reconstruct missing temporal data by exploiting information coming from sensors at different locations. However, standard methods fall short in capturing the nonlinear time and space dependencies existing within networks of interconnected sensors and do not take full advantage of the available - and often strong - relational information. Notably, most state-of-the-art imputation methods based on deep learning do not explicitly model relational aspects and, in any case, do not exploit processing frameworks able to adequately represent structured spatio-temporal data. Conversely, graph neural networks have recently surged in popularity as both expressive and scalable tools for processing sequential data with relational inductive biases. In this work, we present the first assessment of graph neural networks in the context of multivariate time series imputation. In particular, we introduce a novel graph neural network architecture, named GRIN, which aims at reconstructing missing data in the different channels of a multivariate time series by learning spatio-temporal representations through message passing. Empirical results show that our model outperforms state-of-the-art methods in the imputation task on relevant real-world benchmarks with mean absolute error improvements often higher than 20%.

Deep learning for graphs (special session at ESANN 2021)

European Symposium on Artificial Neural Networks, 2021

Graph neural networks

##### Deep learning for graphs (special session at ESANN 2021)

Deep learning for graphs encompasses all those models endowed with multiple layers of abstraction, which operate on data represented as graphs. The most common building blocks of these models are graph encoding layers, which compute a vector embedding for each node in a graph based on a sum of messages received from its neighbors. However, the family also includes architectures with decoders from vectors to graphs and models that process time-varying graphs and hypergraphs. In this paper, we provide an overview of the key concepts in the field, point towards open questions, and frame the contributions of the ESANN 2021 special session into the broader context of deep learning for graphs.

Precise Agriculture: Effective Deep Learning Strategies to Detect Pest Insects

Journal of Automatica Sinica, 2021

##### Precise Agriculture: Effective Deep Learning Strategies to Detect Pest Insects

Pest insect monitoring and control is crucial toensure a safe and profitable crop growth in all plantation types, aswell as guarantee food quality and limited use of pesticides. Weaim at extending traditional monitoring by means of traps, byinvolving the general public in reporting the presence of insectsby using smartphones. This includes the largely unexploredproblem of detecting insects in images that are taken in non-controlled conditions. Furthermore, pest insects are, in manycases, extremely similar to other species that are harmless.Therefore, computer vision algorithms must not be fooled bythese similar insects, not to raise unmotivated alarms. In thiswork, we study the capabilities of state-of-the-art (SoA) objectdetection models based on convolutional neural networks (CNN)for the task of detecting beetle-like pest insects on non-homogeneous images taken outdoors by different sources.Moreover, we focus on disambiguating a pest insect from similarharmless species. We consider not only detection performance ofdifferent models, but also required computational resources. Thisstudy aims at providing a baseline model for this kind of tasks.Our results show the suitability of current SoA models for thisapplication, highlighting how FasterRCNN with a MobileNetV3backbone is a particularly good starting point for accuracy andinference execution latency. This combination provided a meanaverage precision score of 92.66% that can be consideredqualitatively at least as good as the score obtained by otherauthors that adopted more specific models.

Graph Edit Networks

International Conference on Learning Representations, 2021

Graph neural networks

##### Graph Edit Networks

While graph neural networks have made impressive progress in classification and regression on graphs, few approaches to date perform time series prediction on graphs and those that do are mostly limited to edge changes. We suggest that graph edits are a more natural interface for graph-to-graph learning. In particular, graph edits are general enough to describe any graph-to-graph change, not only edge changes, they are sparse, making them easier to understand for humans and more efficient computationally, and they are local, avoiding the need for pooling layers in graph neural networks. In this paper, we propose a simple linear layer - the graph edit network - which takes node embeddings as input and generates a sequence of graph edits that transform the input graph to the output graph. Theoretically, we show that a mapping between the node sets of two graphs is sufficient to construct training data for a graph edit network and that an optimal mapping yields edit scripts that are almost as short as the graph edit distance between the graphs. We further provide a proof-of-concept.

Graph Neural Networks in TensorFlow and Keras with Spektral

IEEE Computational Intelligence Magazine, 2021

Graph neural networks
Library

##### Graph Neural Networks in TensorFlow and Keras with Spektral

Graph neural networks have enabled the application of deep learning to problems that can be described by graphs, which are found throughout the different fields of science, from physics to biology, natural language processing, telecommunications or medicine. In this paper we present Spektral, an open-source Python library for building graph neural networks with TensorFlow and the Keras application programming interface. Spektral implements a large set of methods for deep learning on graphs, including message-passing and pooling operators, as well as utilities for processing graphs and loading popular benchmark datasets. The purpose of this library is to provide the essential building blocks for creating graph neural networks, focusing on the guiding principles of user-friendliness and quick prototyping on which Keras is based. Spektral is, therefore, suitable for absolute beginners and expert deep learning practitioners alike. In this work, we present an overview of Spektral's features and report the performance of the methods implemented by the library in scenarios of node classification, graph classification, and graph regression.

Learn to Synchronize, Synchronize to Learn

Chaos: An Interdisciplinary Journal of Nonlinear Science, 2021

Reservoir computing

##### Learn to Synchronize, Synchronize to Learn

In recent years, the machine learning community has seen a continuous growing interest in research aimed at investigating dynamical aspects of both training procedures and perfected models. Of particular interest among recurrent neural networks, we have the Reservoir Computing (RC) paradigm for its conceptual simplicity and fast training scheme. Yet, the guiding principles under which RC operates are only partially understood. In this work, we study the properties behind learning dynamical systems with RC and propose a new guiding principle based on Generalized Synchronization (GS) granting its feasibility. We show that the well-known Echo State Property (ESP) implies and is implied by GS, so that theoretical results derived from the ESP still hold when GS does. However, by using GS one can profitably study the RC learning procedure by linking the reservoir dynamics with the readout training. Notably, this allows us to shed light on the interplay between the input encoding performed by the reservoir and the output produced by the readout optimized for the task at hand. In addition, we show that - as opposed to the ESP - satisfaction of the GS can be measured by means of the Mutual False Nearest Neighbors index, which makes effective to practitioners theoretical derivations.

Input-to-State Representation in Linear Reservoir Dynamics

IEEE Transactions on Neural Networks and Learning Systems, 2021

Reservoir computing

##### Input-to-State Representation in Linear Reservoir Dynamics

Reservoir computing is a popular approach to design recurrent neural networks, due to its training simplicity and its approximation performance. The recurrent part of these networks is not trained (e.g. via gradient descent), making them appealing for analytical studies, raising the interest of a vast community of researcher spanning from dynamical systems to neuroscience. It emerges that, even in the simple linear case, the working principle of these networks is not fully understood and the applied research is usually driven by heuristics. A novel analysis of the dynamics of such networks is proposed, which allows one to express the state evolution using the controllability matrix. Such a matrix encodes salient characteristics of the network dynamics: in particular, its rank can be used as an input-indepedent measure of the memory of the network. Using the proposed approach, it is possible to compare different architectures and explain why a cyclic topology achieves favourable results.

Graph Neural Networks with Convolutional ARMA Filters

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021

Graph neural networks

##### Graph Neural Networks with Convolutional ARMA Filters

We propose a novel graph convolutional layer based on auto-regressive moving average (ARMA) filters that, compared to the polynomial ones, provide a more flexible response thanks to a rich transfer function that accounts for the concept of state. We implement the ARMA filter with a recursive and distributed formulation, obtaining a convolutional layer that is efficient to train, is localized in the node space and can be applied to graphs with different topologies.

Deep Reinforcement Learning with Weighted Q-Learning

The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2022

Reinforcement learning

##### Deep Reinforcement Learning with Weighted Q-Learning

Overestimation of the maximum action-value is a well-known problem that hinders Q-Learning performance, leading to suboptimal policies and unstable learning. Among several Q-Learning variants proposed to address this issue, Weighted Q-Learning (WQL) effectively reduces the bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action-values, where the weights correspond to the probability of each action-value being the maximum; however, the computation of these probabilities is only practical in the tabular settings. In this work, we provide the methodological advances to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. We show that model uncertainty in DRL can be useful not only for action selection, but also action evaluation. We analyze how the novel Weighted Deep Q-Learning algorithm reduces the bias wrt relevant baselines and provide empirical evidence of its advantages on several representative benchmarks.

Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling

IEEE Transactions on Neural Networks and Learning Systems, 2020

Graph neural networks
Pooling

##### Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling

We propose Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarsened versions of a graph by leveraging on its topology only. During training, the GNN learns new representations for the vertices and fits them to a pyramid of coarsened graphs, which is computed in a pre-processing step. As theoretical contributions, we first demonstrate the equivalence between the MAXCUT partition and the node decimation procedure on which NDP is based. Then, we propose a procedure to sparsify the coarsened graphs for reducing the computational complexity in the GNN; we also demonstrate that it is possible to drop many edges without significantly altering the graph spectra of coarsened graphs.

Graph Random Neural Features for Distance-Preserving Graph Representations

International Conference on Machine Learning, 2020

Graph neural networks
Embeddings

##### Graph Random Neural Features for Distance-Preserving Graph Representations

Graph Random Neural Features (GRNF) is a novel embedding method from graph-structured data to real vectors based on a family of graph neural networks. GRNF can be used within traditional processing methods or as a training-free input layer of a graph neural network. The theoretical guarantees that accompany GRNF ensure that the considered graph distance is metric, hence allowing to distinguish any pair of non-isomorphic graphs, and that GRNF approximately preserves its metric structure.

Spectral Clustering with Graph Neural Networks for Graph Pooling

International Conference on Machine Learning, 2020

Graph neural networks
Pooling

##### Spectral Clustering with Graph Neural Networks for Graph Pooling

We propose a graph clustering approach that addresses some limitations of the spectral clustering algorithm. We formulate a continuous relaxation of the normalized minCUT problem and train a GNN to compute cluster assignments that minimize this objective. Our GNN-based implementation is differentiable, does not require to compute the spectral decomposition, and learns a clustering function that can be quickly evaluated on out-of-sample graphs. From the proposed clustering method, we design a graph pooling operator that overcomes some important limitations of state-of-the-art graph pooling techniques and achieves the best performance in several supervised and unsupervised tasks.

Cluster-based Aggregate Load Forecasting with Deep Neural Networks

IEEE International Joint Conference on Neural Networks, 2020

Forecasting
Energy analytics

##### Cluster-based Aggregate Load Forecasting with Deep Neural Networks

Highly accurate power demand forecasting represents one of key challenges of Smart Grid applications. In this setting, a large number of Smart Meters produces huge amounts of data that need to be processed to predict the load requested by the grid. Due to the high dimensionality of the problem, this often results in the adoption of simple aggregation strategies for the power that fail in capturing the relational information existing among the different types of user. A possible alternative, known as Cluster-based Aggregate Forecasting, consists in clustering the load profiles and, on top of that, building predictors of the aggregate at the cluster-level. In this work we explore the technique in the context of predictors based on deep recurrent neural networks and address the scalability issues presenting neural architectures adequate to process cluster-level aggregates. The proposed methods are finally evaluated both on a publicly available benchmark and a heterogenous dataset of Smart Meter data from an entire, medium-sized, Swiss town.

Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere

Nature Scientific Reports, 2019

Reservoir computing

##### Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere

We propose a model of echo state networks that eliminates critical dependence on hyper-parameters, resulting in networks that provably cannot enter a chaotic regime and, at the same time, denotes nonlinear behavior in phase space characterized by a large memory of past inputs, comparable to the one of linear networks. Our contribution is supported by experiments corroborating our theoretical findings, showing that the proposed model displays dynamics that are rich-enough to approximate many common nonlinear systems used for benchmarking.

Autoregressive Models for Sequences of Graphs

IEEE International Joint Conference on Neural Networks, 2019

Graph neural networks

##### Autoregressive Models for Sequences of Graphs

This paper proposes an autoregressive (AR) model for sequences of graphs, which generalizes traditional AR models. A first novelty consists in formalizing the AR model for a very general family of graphs, characterized by a variable topology, and attributes associated with nodes and edges. A graph neural network is also proposed to learn the AR function associated with the graph-generating process, and subsequently predict the next graph in a sequence.

Adversarial Autoencoders with Constant-Curvature Latent Manifolds

Applied Soft Computing, 2019

##### Adversarial Autoencoders with Constant-Curvature Latent Manifolds

We introduce the constant-curvature manifold adversarial autoencoder (CCM-AAE), a probabilistic generative model trained to represent a data distribution on a constant-curvature Riemannian manifold (CCM). Our method works by matching the aggregated posterior of the CCM-AAE with a probability distribution defined on a CCM, so that the encoder implicitly learns to represent data on the CCM to fool the discriminator network. The geometric constraint is also explicitly imposed by jointly training the CCM-AAE to maximize the membership degree of the embeddings to the CCM.

Change-Point Methods on a Sequence of Graphs

IEEE Transactions on Signal Processing, 2019

Nonstationary environments
Change detection

##### Change-Point Methods on a Sequence of Graphs

Given a finite sequence of graphs, we propose a methodology to identify possible changes in stationarity in the stochastic process that generated such graphs. We consider a general family of attributed graphs for which both topology (vertices and edges) and associated attributes are allowed to change over time, without violating the stationarity hypothesis. Novel Change-Point Methods (CPMs) are proposed that map graphs onto vectors, apply a suitable statistical test in vector space and detect changes –if any– according to a user-defined confidence level; an estimate for the change point is provided as well. We ground our methods on theoretical results that show how the inference in the numerical vector space is related to the one in graph domain, and vice-versa.

Change Detection in Graph Streams by Learning Graph Embeddings on Constant-Curvature Manifolds

IEEE Transactions on Neural Networks and Learning Systems, 2019

Nonstationary environments
Change detection

##### Change Detection in Graph Streams by Learning Graph Embeddings on Constant-Curvature Manifolds

We focus on the problem of detecting changes in stationarity in a stream of attributed graphs. To this end, we introduce a novel change detection framework based on neural networks and Constant-curvature manifolds (CCMs), that takes into account the non-Euclidean nature of graphs. Our contribution in this work is twofold. First, via a novel approach based on adversarial learning, we compute graph embeddings by training an autoencoder to represent graphs on CCMs. Second, we introduce two novel change detection tests operating on CCMs.

A Characterization of the Edge of Criticality in Binary Echo State Networks

IEEE International Workshop on Machine Learning for Signal Processing, 2018

Reservoir computing

##### A Characterization of the Edge of Criticality in Binary Echo State Networks

We propose binary echo state networks (ESNs), which are architecturally equivalent to standard ESNs but consider binary activation functions and binary recurrent weights. For these networks, we derive a closed-form expression for the edge of criticality (EoC) in the autonomous case and perform simulations in order to assess their behavior in the case of noisy neurons and in the presence of a signal. We propose a theoretical explanation for the fact that the variance of the input plays a major role in characterizing the EoC.

Anomaly and Change Detection in Graph Streams through Constant-Curvature Manifold Embeddings

IEEE International Joint Conference on Neural Networks, 2018

Nonstationary environments
Anomaly detection
Change detection

##### Anomaly and Change Detection in Graph Streams through Constant-Curvature Manifold Embeddings

We investigate how embedding graphs on constant-curvature manifolds (hyper-spherical and hyperbolic manifolds) impacts on the ability to detect changes in sequences of attributed graphs. The proposed methodology consists in embedding graphs into a geometric space and perform change detection there by means of conventional methods for numerical streams.

Concept Drift and Anomaly Detection in Graph Streams

IEEE Transactions on Neural Networks and Learning Systems, 2018

Nonstationary environments
Anomaly detection
Change detection

##### Concept Drift and Anomaly Detection in Graph Streams

We consider stochastic processes generating graphs and propose a methodology for detecting changes in stationarity of such processes. The methodology acts by embedding every graph of the stream into a vector domain, where a conventional multivariate change detection procedure can be easily applied. We ground the soundness of our proposal by proving several theoretical results.

Detecting Changes in Sequences of Attributed Graphs

IEEE Symposium Series on Computational Intelligence, 2017

Nonstationary environments
Change detection

##### Detecting Changes in Sequences of Attributed Graphs

We consider a methodology for detecting changes in sequences of graphs. Changes are recognized by embedding each graph into a vector space, where conventional change detection procedures exist and can be easily applied. We introduce the methodology and focus on expanding experimental evaluations on controlled yet relevant examples involving geometric graphs and Markov chains.