Scientific output
Doctorate of

Team: Apprentissage et Optimisation

Reinforcement learning and neural networks: dynamical approaches

Started on 01/10/2016
Supervision: OLLIVIER, Yann

Doctoral school: ED STIC 580
Institution of enrollment: Université Paris-Sud

Location: LRI - AO

Defended on 07/10/2019 before a jury composed of:
Thesis supervisor:

Reviewers:
- Mr. Joan BRUNA, New York University
- Mr. Pascal VINCENT, Université de Montréal

Examiners:
- Ms. Anne VILNAT, Université Paris-Sud
- Mr. Francis BACH, École Normale Supérieure
- Mr. Jean-Philippe VERT, Mines ParisTech

Research activities:

Abstract:
An intelligent agent immersed in its environment must be able to both
understand and interact with the world. Understanding the environment requires
processing sequences of sensory inputs. Interacting with the environment
typically involves issuing actions, and adapting those actions to strive
towards a given goal, or to maximize a notion of reward. This two-part view of
agent-environment interaction motivates the two parts of this thesis: recurrent
neural networks are powerful tools to make sense of complex and diverse
sequences of inputs, such as those resulting from an agent-environment
interaction; reinforcement learning is the field of choice to direct the
behavior of an agent towards a goal. The aim of this thesis is to provide theoretical
and practical insights into those two domains. In the field of recurrent
networks, the contribution of this thesis is twofold. First, we introduce two new,
theoretically grounded and scalable learning algorithms that can be used online.
Second, we advance the understanding of gated recurrent networks by examining their
invariance properties. In the field of reinforcement learning, our main
contribution is to provide guidelines for designing algorithms that are robust
to time discretization. All these contributions are theoretically grounded and backed up
by experimental results.