Ph.D

Group : Learning and Optimization

*Amélioration du MCTS par des techniques d'apprentissage supervisé et applications dans le domaine de l'énergie*
Starts on 01/09/2010

Advisor : TEYTAUD, Olivier

**Funding :**
**Affiliation :** Université Paris-Sud

**Laboratory :** LRI-TAO

**Defended** on 30/09/2013, committee :

Philippe Preux, Professeur Univ. Lille 3, rapporteur

- Nicolas Sabouret, professeur Supelec, LIMSI/AMI

- Tristan Cazenave, Professeur Univ. Paris Dauphine, examinateur

- Olivier Buffet, CR Inria, LORIA-Maia, Vandœuvre-lès-Nancy, examinateur

- Simon Lucas, Professor, Essex University, examinateur

- Marc Schoenauer, DR Inria, LRI/Tao, examinateur

- Louis Wehenkel, Professeur, Univ. Montefiore, Belgique, rapporteur

**Research activities :**
**Abstract :**
We consider the class of problems where an agent is making a series of actions, navigating from state to state, and gathering rewards on the way. We look at the general case, where actions and states can be infinite, and where the transition associated to a given couple action-state can be stochastic.

One of the motivating application is the problem of energy stock management. This problem is the one faced by an agent operating energy production facilities, along with energy stocks (batteries, water stocks...), under uncertainty (fuel prices, energy demand, weather, etc). State of the art methods need a simplified version of the model to run properly.

MCTS does not need to simplify the model of this problem, and could thus be a direction to improve existing solutions to the energy stock management problem.

In this work, we extend the vanilla version of MCTS (the one that has been successful on the game of Go, with finite domain and deterministic transitions) to make it consistent on continuous domains and stochastic transitions.

We also present ways to increase the convergence speed of continuous MCTS, and show some empirical results on simulations of the stock management problem.

We show some other applications of continuous MCTS, to POMDP (with an application to Mine Sweeper), and in a meta-bandit framework.

Finally we introduce a framework to easily include expert knowledge in MCTS, making it a way to improve existing policies.

Continuous MCTS is interesting in the sense that its main restriction is that it is a model based method (it requires a generative model of the problem). Many other methods require things that MCTS does not need, and that are not always easy to guarantee. These assumptions include convexity assumption (of the action space and/or of the cost function), explicit knowledge of the action space (to discretize it for example), Markovian random process, etc.

Continuous MCTS is also an anytime algorithm (it can be interrupted at any time and still return a suboptimal but valid solution).