Graph Neural Networks

Graph Random Neural Features for DistancePreserving Graph Representations.
D. Zambon, C. Alippi, L. Livi
International Conference on Machine Learning (2020)
Graph Random Neural Features (GRNF) is a novel embedding method from graphstructured data to real vectors based on a family of graph neural networks. GRNF can be used within traditional processing methods or as a trainingfree input layer of a graph neural network.
The theoretical guarantees that accompany GRNF ensure that the considered graph distance is metric, hence allowing to distinguish any pair of nonisomorphic graphs, and that GRNF approximately preserves its metric structure.

Spectral Clustering with Graph Neural Networks for Graph Pooling.
F. M. Bianchi, D. Grattarola, C. Alippi
International Conference on Machine Learning (2020)
We propose a graph clustering approach that addresses some limitations of the spectral clustering algorithm. We formulate a continuous relaxation of the normalized minCUT problem and train a GNN to compute cluster assignments that minimize this objective. Our GNNbased implementation is differentiable, does not require to compute the spectral decomposition, and learns a clustering function that can be quickly evaluated on outofsample graphs. From the proposed clustering method, we design a graph pooling operator that overcomes some important limitations of stateoftheart graph pooling techniques and achieves the best performance in several supervised and unsupervised tasks.

Hierarchical Representation Learning in Graph Neural Networks with Node Decimation Pooling.
F. M. Bianchi, D. Grattarola, L. Livi, C. Alippi
IEEE Transactions on Neural Networks and Learning Systems (2020)
We propose Node Decimation Pooling (NDP), a pooling operator for GNNs that generates coarsened versions of a graph by leveraging on its topology only. During training, the GNN learns new representations for the vertices and fits them to a pyramid of coarsened graphs, which is computed in a preprocessing step. As theoretical contributions, we first demonstrate the equivalence between the MAXCUT partition and the node decimation procedure on which NDP is based. Then, we propose a procedure to sparsify the coarsened graphs for reducing the computational complexity in the GNN; we also demonstrate that it is possible to drop many edges without significantly altering the graph spectra of coarsened graphs.

Graph Neural Networks with Convolutional ARMA Filters.
F. M. Bianchi, D. Grattarola, L. Livi, C. Alippi
IEEE Transactions on Pattern Analysis and Machine Intelligence (2021)
We propose a novel graph convolutional layer based on autoregressive moving average (ARMA) filters that, compared to the polynomial ones, provide a more flexible response thanks to a rich transfer function that accounts for the concept of state. We implement the ARMA filter with a recursive and distributed formulation, obtaining a convolutional layer that is efficient to train, is localized in the node space and can be applied to graphs with different topologies.
Learning in NonStationary Environments

Graph Edit Networks.
B. Paassen, D. Grattarola, D.Zambon, C. Alippi, B. Hammer
International Conference on Learning Representations (2021)
While graph neural networks have made impressive progress in classification and regression on graphs, few approaches to date perform time series prediction on graphs and those that do are mostly limited to edge changes. We suggest that graph edits are a more natural interface for graphtograph learning. In particular, graph edits are general enough to describe any graphtograph change, not only edge changes, they are sparse, making them easier to understand for humans and more efficient computationally, and they are local, avoiding the need for pooling layers in graph neural networks. In this paper, we propose a simple linear layer  the graph edit network  which takes node embeddings as input and generates a sequence of graph edits that transform the input graph to the output graph. Theoretically, we show that a mapping between the node sets of two graphs is sufficient to construct training data for a graph edit network and that an optimal mapping yields edit scripts that are almost as short as the graph edit distance between the graphs. We further provide a proofofconcept.

ChangePoint Methods on a Sequence of Graphs.
D. Zambon, C. Alippi, L. Livi
IEEE Transactions on Signal Processing (2019)
Given a finite sequence of graphs, we propose a methodology to identify possible changes in stationarity in the stochastic process that generated such graphs. We consider a general family of attributed graphs for which both topology (vertices and edges) and associated attributes are allowed to change over time, without violating the stationarity hypothesis. Novel ChangePoint Methods (CPMs) are proposed that map graphs onto vectors, apply a suitable statistical test in vector space and detect changes –if any– according to a userdefined confidence level; an estimate for the change point is provided as well. We ground our methods on theoretical results that show how the inference in the numerical vector space is related to the one in graph domain, and viceversa.

Deep Learning for Time Series Forecasting: The Electric Load Case.
Alberto Gasparin, Slobodan Lukovic, C. Alippi
Preprint (2019)
We review and experimentally evaluate on two realworld datasets the most recent trends in electric load forecasting, by contrasting deep learning architectures on short term forecast (one day ahead prediction). Specifically, we focus on feedforward and recurrent neural networks, sequence to sequence models and temporal convolutional neural networks along with architectural variants.

Autoregressive Models for Sequences of Graphs.
D. Zambon, D. Grattarola, L. Livi, C. Alippi
International Joint Conference on Neural Networks (2019)
This paper proposes an autoregressive (AR) model for sequences of graphs, which generalizes traditional AR models. A first novelty consists in formalizing the AR model for a very general family of graphs, characterized by a variable topology, and attributes associated with nodes and edges. A graph neural network is also proposed to learn the AR function associated with the graphgenerating process, and subsequently predict the next graph in a sequence.

Change Detection in Graph Streams by Learning Graph Embeddings on ConstantCurvature Manifolds.
D. Grattarola, D. Zambon, L. Livi, C. Alippi
IEEE Transactions on Neural Networks and Learning Systems (2019)
We focus on the problem of detecting changes in stationarity in a stream of attributed graphs. To this end, we introduce a novel change detection framework based on neural networks and Constantcurvature manifolds (CCMs), that takes into account the nonEuclidean nature of graphs. Our contribution in this work is twofold. First, via a novel approach based on adversarial learning, we compute graph embeddings by training an autoencoder to represent graphs on CCMs. Second, we introduce two novel change detection tests operating on CCMs.

Anomaly and Change Detection in Graph Streams through ConstantCurvature Manifold Embeddings.
D. Zambon, L. Livi, C. Alippi
IEEE International Joint Conference on Neural Networks (2018)
We investigate how embedding graphs on constantcurvature manifolds (hyperspherical and hyperbolic manifolds) impacts on the ability to detect changes in sequences of attributed graphs. The proposed methodology consists in embedding graphs into a geometric space and perform change detection there by means of conventional methods for numerical streams.

Concept Drift and Anomaly Detection in Graph Streams.
D. Zambon, C. Alippi, L. Livi
IEEE Transactions on Neural Networks and Learning Systems (2018)
We consider stochastic processes generating graphs and propose a methodology for detecting changes in stationarity of such processes. The methodology acts by embedding every graph of the stream into a vector domain, where a conventional multivariate change detection procedure can be easily applied. We ground the soundness of our proposal by proving several theoretical results.

Detecting Changes in Sequences of Attributed Graphs.
D. Zambon, L. Livi, C. Alippi
IEEE Symposium Series on Computational Intelligence (2017)
We consider a methodology for detecting changes in sequences of graphs. Changes are recognized by embedding each graph into a vector space, where conventional change detection procedures exist and can be easily applied. We introduce the methodology and focus on expanding experimental evaluations on controlled yet relevant examples involving geometric graphs and Markov chains.
Reinforcement Learning

Deep Reinforcement Learning with Weighted QLearning.
A. Cini, C. D'Eramo, J. Peter, C. Alippi
Preprint (2020)
Overestimation of the maximum actionvalue is a wellknown problem that hinders QLearning performance, leading to suboptimal policies and unstable learning. Among several QLearning variants proposed to address this issue, Weighted QLearning (WQL) effectively reduces the bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated actionvalues, where the weights correspond to the probability of each actionvalue being the maximum; however, the computation of these probabilities is only practical in the tabular settings. In this work, we provide the methodological advances to benefit from the WQL properties in Deep Reinforcement Learning (DRL), by using neural networks with Dropout Variational Inference as an effective approximation of deep Gaussian processes. In particular, we adopt the Concrete Dropout variant to obtain calibrated estimates of epistemic uncertainty in DRL. We show that model uncertainty in DRL can be useful not only for action selection, but also action evaluation. We analyze how the novel Weighted Deep QLearning algorithm reduces the bias wrt relevant baselines and provide empirical evidence of its advantages on several representative benchmarks.
Dynamics of RNNs

Learn to Synchronize, Synchronize to Learn.
P. Verzelli, C. Alippi, L. Livi
Preprint (2020)
In recent years, the machine learning community has seen a continuous growing interest in research aimed at investigating dynamical aspects of both training procedures and perfected models. Of particular interest among recurrent neural networks, we have the Reservoir Computing (RC) paradigm for its conceptual simplicity and fast training scheme. Yet, the guiding principles under which RC operates are only partially understood. In this work, we study the properties behind learning dynamical systems with RC and propose a new guiding principle based on Generalized Synchronization (GS) granting its feasibility. We show that the wellknown Echo State Property (ESP) implies and is implied by GS, so that theoretical results derived from the ESP still hold when GS does. However, by using GS one can profitably study the RC learning procedure by linking the reservoir dynamics with the readout training. Notably, this allows us to shed light on the interplay between the input encoding performed by the reservoir and the output produced by the readout optimized for the task at hand. In addition, we show that  as opposed to the ESP  satisfaction of the GS can be measured by means of the Mutual False Nearest Neighbors index, which makes effective to practitioners theoretical derivations.

Input Representation in Recurrent Neural Networks Dynamics.
P. Verzelli, C. Alippi, L. Livi, P. Tino
Preprint (2020)
Reservoir computing is a popular approach to design recurrent neural networks, due to its training simplicity and its approximation performance. The recurrent part of these networks is not trained (e.g. via gradient descent), making them appealing for analytical studies, raising the interest of a vast community of researcher spanning from dynamical systems to neuroscience. It emerges that, even in the simple linear case, the working principle of these networks is not fully understood and the applied research is usually driven by heuristics. A novel analysis of the dynamics of such networks is proposed, which allows one to express the state evolution using the controllability matrix. Such a matrix encodes salient characteristics of the network dynamics: in particular, its rank can be used as an inputindepedent measure of the memory of the network. Using the proposed approach, it is possible to compare different architectures and explain why a cyclic topology achieves favourable results.

Echo State Networks with SelfNormalizing Activations on the HyperSphere.
P. Verzelli, C. Alippi, L. Livi
Nature Scientific Reports (2019)
We propose a model of echo state networks that eliminates critical dependence on hyperparameters, resulting in networks that provably cannot enter a chaotic regime and, at the same time, denotes nonlinear behavior in phase space characterized by a large memory of past inputs, comparable to the one of linear networks. Our contribution is supported by experiments corroborating our theoretical findings, showing that the proposed model displays dynamics that are richenough to approximate many common nonlinear systems used for benchmarking.

A Characterization of the Edge of Criticality in Binary Echo State Networks.
P. Verzelli, L. Livi, C. Alippi
IEEE International Workshop on Machine Learning for Signal Processing (2018)
We propose binary echo state networks (ESNs), which are architecturally equivalent to standard ESNs but consider binary activation functions and binary recurrent weights. For these networks, we derive a closedform expression for the edge of criticality (EoC) in the autonomous case and perform simulations in order to assess their behavior in the case of noisy neurons and in the presence of a signal. We propose a theoretical explanation for the fact that the variance of the input plays a major role in characterizing the EoC.
Others

Adversarial Autoencoders with ConstantCurvature Latent Manifolds.
D. Grattarola, L. Livi, C. Alippi
Applied Soft Computing (2019)
We introduce the constantcurvature manifold adversarial autoencoder (CCMAAE), a probabilistic generative model trained to represent a data distribution on a constantcurvature Riemannian manifold (CCM). Our method works by matching the aggregated posterior of the CCMAAE with a probability distribution defined on a CCM, so that the encoder implicitly learns to represent data on the CCM to fool the discriminator network. The geometric constraint is also explicitly imposed by jointly training the CCMAAE to maximize the membership degree of the embeddings to the CCM.