Français Anglais
Accueil Annuaire Plan du site
Home > Research results > Dissertations & habilitations
Research results
Ph.D de

Group : Learning and Optimization

Amélioration du MCTS par des techniques d'apprentissage supervisé et applications dans le domaine de l'énergie

Starts on 01/09/2010
Advisor : TEYTAUD, Olivier

Funding :
Affiliation : Université Paris-Sud
Laboratory : LRI-TAO

Defended on 30/09/2013, committee :
Philippe Preux, Professeur Univ. Lille 3, rapporteur
- Nicolas Sabouret, professeur Supelec, LIMSI/AMI
- Tristan Cazenave, Professeur Univ. Paris Dauphine, examinateur
- Olivier Buffet, CR Inria, LORIA-Maia, Vandœuvre-lès-Nancy, examinateur
- Simon Lucas, Professor, Essex University, examinateur
- Marc Schoenauer, DR Inria, LRI/Tao, examinateur
- Louis Wehenkel, Professeur, Univ. Montefiore, Belgique, rapporteur

Research activities :

Abstract :
We consider the class of problems where an agent is making a series of actions, navigating from state to state, and gathering rewards on the way. We look at the general case, where actions and states can be infinite, and where the transition associated to a given couple action-state can be stochastic.

One of the motivating application is the problem of energy stock management. This problem is the one faced by an agent operating energy production facilities, along with energy stocks (batteries, water stocks...), under uncertainty (fuel prices, energy demand, weather, etc). State of the art methods need a simplified version of the model to run properly.
MCTS does not need to simplify the model of this problem, and could thus be a direction to improve existing solutions to the energy stock management problem.

In this work, we extend the vanilla version of MCTS (the one that has been successful on the game of Go, with finite domain and deterministic transitions) to make it consistent on continuous domains and stochastic transitions.
We also present ways to increase the convergence speed of continuous MCTS, and show some empirical results on simulations of the stock management problem.
We show some other applications of continuous MCTS, to POMDP (with an application to Mine Sweeper), and in a meta-bandit framework.
Finally we introduce a framework to easily include expert knowledge in MCTS, making it a way to improve existing policies.

Continuous MCTS is interesting in the sense that its main restriction is that it is a model based method (it requires a generative model of the problem). Many other methods require things that MCTS does not need, and that are not always easy to guarantee. These assumptions include convexity assumption (of the action space and/or of the cost function), explicit knowledge of the action space (to discretize it for example), Markovian random process, etc.
Continuous MCTS is also an anytime algorithm (it can be interrupted at any time and still return a suboptimal but valid solution).

Ph.D. dissertations & Faculty habilitations
The original manuscript conceptualizes the recent rise of digital platforms along three main dimensions: their nature of coordination devices fueled by data, the ensuing transformations of labor, and the accompanying promises of societal innovation. The overall ambition is to unpack the coordination role of the platform and where it stands in the horizon of the classical firm – market duality. It is also to precisely understand how it uses data to do so, where it drives labor, and how it accommodates socially innovative projects. I extend this analysis to show continuity between today’s society dominated by platforms and the “organizational society”, claiming that platforms are organized structures that distribute resources, produce asymmetries of wealth and power, and push social innovation to the periphery of the system. I discuss the policy implications of these tendencies and propose avenues for follow-up research.