Ph.D. of

Group : Learning and Optimization

Multi-Objective Sequential Decision Making

Started on 23/09/2010
Advisor : SEBAG, Michèle

Funding :
Affiliation : Université Paris-Sud
Laboratory : LRI

Defended on 11/07/2014, committee :
Thesis advisor :

- Michèle Sebag, Research Director (DR), CNRS, Université Paris-Sud, LRI/TAO

Examiners :

- Yann Chevaleyre, Professor, Université Paris 13, LIPN
- Cécile Germain-Renaud, Professor, Université Paris-Sud, LRI/TAO
- Dominique Gouyou-Beauchamps, Professor, Université Paris-Sud, LRI

Reviewers :

- Jin-Kao Hao, Professor, Université d'Angers, LERIA
- Philippe Preux, Professor, Université de Lille 3, INRIA Lille

Research activities :

Abstract :
This thesis is concerned with multi-objective sequential decision making (MOSDM).
The motivation is twofold. On the one hand, many decision problems in domains such as robotics, scheduling or games involve optimizing sequences of decisions. On the other hand, many real-world applications are most naturally formulated as multi-objective optimization (MOO) problems.

The proposed approach, Multi-Objective MCTS (MOMCTS), extends the well-known Monte-Carlo tree search (MCTS) framework to the MOO setting, with the goal of discovering several optimal sequences of decisions by growing a single search tree. The main challenge is to design a new reward that can guide the exploration of the tree even though the MOO setting does not impose a total order on solutions.

The main contribution of the thesis is to propose and experimentally study two such rewards, inspired by the MOO literature, each of which assesses a solution with respect to the archive of previous solutions (the Pareto archive): the hypervolume indicator and the Pareto dominance reward.

The study shows that these two criteria are complementary. The hypervolume indicator suffers from its known computational complexity; however, the proposed extension of it provides fine-grained information about the quality of solutions with respect to the current archive. By contrast, the Pareto dominance reward has linear complexity, but the information it provides becomes increasingly rare.
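To make the contrast between the two rewards concrete, here is a minimal sketch in two dimensions, assuming both objectives are maximized and a fixed reference point bounds the hypervolume. It shows only the plain versions: a binary dominance check against the archive (one pass, hence linear in the archive size) and the hypervolume improvement obtained by adding a point. All function names are hypothetical, and neither function reproduces the extended hypervolume reward or the exact dominance reward used in the thesis.

```python
from typing import List, Tuple

# Two objectives, both maximized (assumption made for this sketch).
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_dominance_reward(p: Point, archive: List[Point]) -> int:
    """Simplified binary reward: 1 if no archive point dominates p, else 0.
    A single pass over the archive, so the cost is linear in its size."""
    return 0 if any(dominates(q, p) for q in archive) else 1


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by the reference point `ref`."""
    # Keep only points that improve on the reference point in both objectives.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep by the first objective in decreasing order; each point adds the
    # horizontal strip between the best second objective seen so far and its own.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def hypervolume_reward(p: Point, archive: List[Point], ref: Point) -> float:
    """Plain hypervolume improvement from adding p (0.0 whenever p is dominated)."""
    return hypervolume_2d(archive + [p], ref) - hypervolume_2d(archive, ref)


archive = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
ref = (0.0, 0.0)
print(hypervolume_reward((2.5, 2.5), archive, ref))  # 1.25
print(pareto_dominance_reward((1.5, 1.5), archive))  # 0
```

Both plain rewards are flat (zero) for every dominated point, so they cannot tell a nearly optimal point from a poor one; the abstract's extended hypervolume reward is meant to supply that finer-grained signal, at the price of a costlier computation.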

Proofs of principle on artificial problems and challenges confirm the merits of the approach. In particular, MOMCTS is able to discover policies lying in non-convex regions of the Pareto front, in contrast with the state of the art: existing Multi-Objective Reinforcement Learning (MORL) algorithms rely on linear scalarization and thus fail to sample such non-convex regions.
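Why linear scalarization misses non-convex regions can be checked on a toy example. The three policies and their returns below are hypothetical: C is Pareto-optimal (neither A nor B dominates it) but sits below the segment joining A and B. For any weight w in [0, 1], the weighted sum of C is 0.4 while the better of A and B scores max(w, 1 - w) >= 0.5, so C is never selected; the grid scan below simply illustrates this.

```python
from typing import Dict, Set, Tuple

# Hypothetical returns of three policies on two maximized objectives.
# C is non-dominated but lies in a non-convex region of the front.
returns: Dict[str, Tuple[float, float]] = {
    "A": (1.0, 0.0),
    "B": (0.0, 1.0),
    "C": (0.4, 0.4),
}


def scalarized_best(w: float) -> str:
    """Policy maximizing the linear scalarization w * f1 + (1 - w) * f2."""
    return max(returns, key=lambda k: w * returns[k][0] + (1 - w) * returns[k][1])


def reachable_by_linear_scalarization(steps: int = 101) -> Set[str]:
    """Policies selected for at least one weight on a uniform grid over [0, 1]."""
    return {scalarized_best(i / (steps - 1)) for i in range(steps)}


print(sorted(reachable_by_linear_scalarization()))  # ['A', 'B']
```

A search that compares solutions through Pareto dominance or hypervolume, rather than through a fixed weighted sum, has no such blind spot: C adds hypervolume to an archive containing only A and B.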

Finally, MOMCTS performs competitively with the state of the art in the 2013 MOPTSP competition.
