Thesis in progress
Group : Learning and Optimization

Generative modeling: statistical physics of Restricted Boltzmann Machines, learning with missing information and scalable training of Linear Flows

Started on 01/10/2017
Advisor : FURTLEHNER, Cyril

Funding : doctoral contract from the Ministry
Affiliation : Université Paris-Saclay
Laboratory : LRI - AO

Defended on 09/03/2022, committee :
Thesis advisor :
- Cyril Furtlehner, Inria Saclay

Co-advisor :
- Aurélien Decelle, Universidad Complutense de Madrid

- Alexandre Allauzen, École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris
- Carlo Baldassi, Bocconi University
- Andrew Saxe, University of Oxford

- Muneki Yasuda, Yamagata University
- Martin Weigt, Sorbonne Université

- Pierfrancesco Urbani, CNRS, IPhT

Abstract :
Neural network models that can approximate, and sample from, high-dimensional probability distributions are known as generative models.

In recent years, this class of models has received tremendous attention because of its potential to automatically learn meaningful representations of the vast amounts of data that we produce and consume daily.

This thesis presents theoretical and algorithmic results on generative models and is divided into two parts.

In the first part, we focus on the Restricted Boltzmann Machine (RBM) and its statistical physics formulation.
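As a minimal sketch of that formulation, assuming binary units, visible biases `a`, hidden biases `b` and a weight matrix `W` (all names ours, not from the thesis): the RBM assigns each joint configuration an energy and a Boltzmann probability, and its bipartite structure makes the hidden units conditionally independent given the visible ones. For a model small enough to enumerate exhaustively, the conditional computed from the exact joint distribution matches the closed-form sigmoid.

```python
import itertools

import numpy as np


def energy(v, h, W, a, b):
    """Energy of a binary RBM configuration: E(v, h) = -a.v - b.h - v^T W h."""
    return -(a @ v) - (b @ h) - v @ W @ h


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def exact_joint(W, a, b):
    """Enumerate every (v, h) state of a tiny RBM.

    Returns the list of states and their Boltzmann probabilities exp(-E)/Z,
    where Z is obtained by summing over all states (feasible only for tiny models).
    """
    n_v, n_h = W.shape
    states, weights = [], []
    for v in itertools.product([0, 1], repeat=n_v):
        for h in itertools.product([0, 1], repeat=n_h):
            v_arr = np.array(v, dtype=float)
            h_arr = np.array(h, dtype=float)
            states.append((v_arr, h_arr))
            weights.append(np.exp(-energy(v_arr, h_arr, W, a, b)))
    weights = np.array(weights)
    return states, weights / weights.sum()  # divide by the partition function Z


def conditional_hidden(v, W, b):
    """p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij), independently for each j."""
    return sigmoid(b + v @ W)
```

The exhaustive sum is exponential in the number of units, which is why approximate schemes (sampling or mean-field) are needed for realistic sizes.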

Historically, statistical physics has played a central role both in studying the theoretical foundations of neural network models and in inspiring them.

The first neural implementation of an associative memory (Hopfield, 1982) is a seminal work in this context.

The RBM can be regarded as a development of the Hopfield model, and it is of particular interest because of its role at the forefront of the deep learning revolution (Hinton et al., 2006).
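The link between the two models can be made concrete with a standard mapping that the abstract does not spell out. Assume binary visible units $v_i \in \{-1,+1\}$, real-valued Gaussian hidden units $h_\mu$, no biases and unit inverse temperature (assumptions ours). Integrating out the hidden layer with a Gaussian integral yields a Hopfield model with Hebbian couplings whose stored patterns are the columns of $W$:

```latex
\begin{aligned}
E(v,h) &= -\sum_{i,\mu} v_i W_{i\mu} h_\mu + \frac{1}{2}\sum_\mu h_\mu^2, \\
p(v) &\propto \int \prod_\mu dh_\mu \, e^{-E(v,h)}
      = \prod_\mu \sqrt{2\pi}\,\exp\!\Big(\frac{1}{2}\Big(\sum_i W_{i\mu} v_i\Big)^2\Big)
      \propto \exp\!\Big(\frac{1}{2}\sum_{i,j} J_{ij}\, v_i v_j\Big), \\
J_{ij} &= \sum_\mu W_{i\mu} W_{j\mu}.
\end{aligned}
```

Each factor uses $\int dh\, e^{hu - h^2/2} = \sqrt{2\pi}\, e^{u^2/2}$ with $u = \sum_i W_{i\mu} v_i$.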

Exploiting its statistical physics formulation, we derive a mean-field theory of the RBM that allows us to characterize both its functioning as a generative model and the dynamics of its training procedure.
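The abstract does not give the thesis's mean-field equations. As a generic illustration only, the simplest (naive, first-order) mean-field approximation of a binary RBM replaces each unit by its average activation and iterates a pair of self-consistency equations until a fixed point is reached. The function name and the damping parameter are our own choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def naive_mean_field(W, a, b, n_iter=200, damping=0.5, tol=1e-10, seed=0):
    """Iterate the naive mean-field equations of a binary RBM:

        m_v = sigmoid(a + W m_h),   m_h = sigmoid(b + W^T m_v)

    and return the approximate fixed-point magnetizations (m_v, m_h).
    """
    rng = np.random.default_rng(seed)
    m_v = rng.uniform(size=W.shape[0])  # random initial visible averages
    m_h = sigmoid(b + W.T @ m_v)
    for _ in range(n_iter):
        new_v = sigmoid(a + W @ m_h)
        new_h = sigmoid(b + W.T @ new_v)
        delta = max(np.abs(new_v - m_v).max(), np.abs(new_h - m_h).max())
        # damping mixes old and new values to stabilize the iteration
        m_v = damping * m_v + (1 - damping) * new_v
        m_h = damping * m_h + (1 - damping) * new_h
        if delta < tol:
            break
    return m_v, m_h
```

For weak couplings the iteration is a contraction and converges to a unique fixed point; for strong couplings several fixed points can coexist, which is where a more careful mean-field analysis becomes informative.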

This analysis proves useful for deriving a robust mean-field imputation strategy, which makes it possible to use the RBM to learn empirical distributions even in the challenging case where the dataset is only partially observed and has a high percentage of missing entries.
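To give an idea of what mean-field imputation means in practice, here is a hypothetical sketch that is not the thesis's robust strategy: the observed visible units of a sample are clamped to their values, while the missing visible units and the hidden units are updated with the naive mean-field equations, so each missing entry ends up with an estimated probability of being active. The function `impute_missing` and its arguments are our own.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def impute_missing(v_obs, observed, W, a, b, n_iter=200, damping=0.5):
    """Fill the missing visible units of one sample with mean-field averages.

    v_obs:    visible vector; entries at missing positions are ignored (may be NaN)
    observed: boolean mask, True where the value is known
    Observed units stay clamped; missing and hidden units follow the naive
    mean-field equations of a binary RBM with weights W and biases a, b.
    """
    m_v = np.where(observed, v_obs, 0.5)  # missing units start at 0.5
    m_h = sigmoid(b + W.T @ m_v)
    for _ in range(n_iter):
        new_v = np.where(observed, v_obs, sigmoid(a + W @ m_h))
        new_h = sigmoid(b + W.T @ new_v)
        # damped update; clamped entries are unchanged since old == new there
        m_v = damping * m_v + (1 - damping) * new_v
        m_h = damping * m_h + (1 - damping) * new_h
    return m_v
```

With two visible units that excite the same hidden unit, observing the first as active raises the imputed probability of the second, which is the behavior one expects from a model that has captured their correlation.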

In the second part, we consider a class of generative models known as Normalizing Flows (NF), whose distinguishing feature is the ability to model complex, high-dimensional distributions by applying invertible transformations to a simple, tractable distribution.

The invertibility of the transformation makes it possible to express the probability density through a change of variables. Optimizing this density by Maximum Likelihood (ML) is conceptually straightforward but computationally expensive, because it involves the determinant of the Jacobian of the transformation.
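For reference, the change-of-variables formula behind this statement, written for a flow $z = f(x)$ that maps data $x \in \mathbb{R}^D$ to a base variable $z$ with tractable density $p_Z$ (notation ours):

```latex
\log p_X(x) = \log p_Z\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|
```

For a single fully connected linear layer $f(x) = Wx$, the Jacobian is $W$ itself, so the log-likelihood contains $\log|\det W|$, whose gradient with respect to $W$ is $W^{-\top}$. For a general $D \times D$ matrix, computing either quantity costs $O(D^3)$ operations, which is the expense referred to above.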

The common practice is to impose architectural constraints on the class of transformations used for NF in order to make ML optimization efficient.
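One common constraint of this kind, used here purely as an example (the text does not say which constraints it has in mind), is to make the Jacobian triangular, as autoregressive flows do. The determinant of a triangular matrix is the product of its diagonal entries, so the log-determinant drops from an $O(D^3)$ factorization to an $O(D)$ sum. A small numpy comparison, with function names of our own:

```python
import numpy as np


def logabsdet_dense(W):
    """log|det W| for a general square matrix via an LU factorization, O(D^3)."""
    _, logabsdet = np.linalg.slogdet(W)
    return logabsdet


def logabsdet_triangular(T):
    """log|det T| for a triangular matrix: sum of log|diagonal|, O(D)."""
    return np.sum(np.log(np.abs(np.diag(T))))
```

The price of this speedup is expressiveness: each layer can only mix variables in one fixed order, which is the restriction the approach described next avoids.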

Building on geometric considerations, we propose a stochastic gradient descent algorithm that exploits the matrix structure of fully connected neural networks without imposing any constraint on their structure other than the fixed dimensionality required by invertibility.
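The abstract does not name its algorithm. One well-known update that fits the description (a geometric argument that exploits the matrix structure of a fully connected layer and avoids inverting it) is the relative gradient from the independent component analysis literature: the Euclidean gradient is right-multiplied by $W^\top W$, which turns the troublesome $W^{-\top}$ term into $W$. The sketch below assumes that idea for illustration only, with a single linear layer and a standard Gaussian base density, a strong simplification of the multilayer nonlinear setting; all function names are ours.

```python
import numpy as np


def log_likelihood(W, X):
    """Mean log-likelihood of data X (n x D) under the linear flow z = W x
    with a standard Gaussian base density. Used only for monitoring."""
    n, D = X.shape
    Z = X @ W.T
    _, logabsdet = np.linalg.slogdet(W)  # O(D^3), but never needed by the update
    return -0.5 * np.sum(Z**2) / n - 0.5 * D * np.log(2 * np.pi) + logabsdet


def relative_gradient(W, X):
    """Euclidean gradient G = -Z^T X / n + W^{-T}, right-multiplied by W^T W.

    Because X W^T = Z, G W^T W = -Z^T (Z W) / n + W: no inverse or determinant
    of W is computed, and evaluating Z W first keeps the cost at O(n D^2).
    """
    n = X.shape[0]
    Z = X @ W.T
    return -(Z.T @ (Z @ W)) / n + W


def train_linear_flow(X, W0, lr=0.1, n_steps=500):
    """Gradient ascent on the log-likelihood along the relative gradient.
    Full batch for simplicity; a stochastic version would use minibatches of X."""
    W = W0.copy()
    for _ in range(n_steps):
        W = W + lr * relative_gradient(W, X)
    return W
```

The update vanishes exactly when $Z^\top Z / n = I$, that is, when the learned layer whitens the data, which is the maximum-likelihood solution for a linear flow with a Gaussian base.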

This algorithm is computationally efficient and scales to very high-dimensional datasets.

We demonstrate its effectiveness in training a multilayer nonlinear architecture employing fully connected layers.
