Selected Publications (last updated Nov. 2013)

Michele Sebag

Some of my recent favorite papers are listed below. At the moment I am interested in the following questions:

How to define an "intrinsic motivation" for autonomous agents, without any ground truth (e.g. when there is no robot simulator), and how to tackle underspecified problems?
How to tackle meta-learning, i.e. algorithm/heuristic selection in an algorithm portfolio, and how to tune hyper-parameters depending on the problem instance at hand?
How to handle some sequential decision making problems in machine learning and optimization (feature selection, active learning, surrogate learning)?

Rewards for autonomous agents

APRIL: Active Preference-learning based Reinforcement Learning
Riad Akrour; Marc Schoenauer; Michele Sebag
ECML PKDD 2012, Springer Verlag LNCS 7524, pp. 116-131.
In reinforcement learning, the expert might define a reward function; or demonstrate the target behaviors (inverse reinforcement learning); or give preference feedback on the behaviors demonstrated by the agent. Active learning is used to minimize the requested preference queries.
Sustainable cooperative coevolution with a multi-armed bandit
Francois-Michel De Rainville, Michele Sebag, Christian Gagné, Marc Schoenauer, Denis Laurendeau.
GECCO 2013: 1517-1524
When two populations co-evolve, they should have commensurate computational budgets.
Open-Ended Evolutionary Robotics: An Information Theoretic Approach
Pierre Delarboulas, Marc Schoenauer, Michele Sebag.
Parallel Problem Solving from Nature 2010, Springer Verlag LNCS, pp. 334-343
The robot computes and optimizes a criterion on-board, without any ground truth: the quantity of information in the robotic log.

Collaborative hyperparameter tuning
Remi Bardenet; Mathias Brendel; Balazs Kegl; Michele Sebag
Int. Conf. on Machine Learning, JMLR Workshop and Conference Proceedings, 28, pp. 199-207
Rank-based learning is used to learn the performance as a function of the hyper-parameter values.
Bandit-based Search for Constraint Programming
Manuel Loth; Michele Sebag; Youssef Hamadi; Marc Schoenauer
Int. Conf. on Principles and Practice of Constraint Programming, Springer Verlag LNCS 8124, pp. 464-480
A multi-armed bandit is used to select the variable values during the CP search.
Extreme Value Based Adaptive Operator Selection
Alvaro Fialho, Luis Da Costa, Marc Schoenauer, and Michele Sebag.
Parallel Problem Solving from Nature 2008, Springer Verlag, pp. 175-184.
How can the probabilities of the variation operators be adjusted adaptively online?

Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy
Ilya Loshchilov, Marc Schoenauer, Michele Sebag.
GECCO 2012: 321-328
Invariance w.r.t. monotonic transformations of the objective function and affine transformations of the solution space is preserved by tightly coupling CMA-ES, Ranking-SVM and the online optimization of the Ranking-SVM hyper-parameters.
Feature Selection as a One-Player Game
Romaric Gaudel, Michele Sebag.
Int. Conf. on Machine Learning 2010, pp. 359-366
Feature selection is formalized as an (intractable) reinforcement learning problem, and Monte-Carlo tree search is used to approximate the corresponding optimal policy.
Boosting Active Learning to Optimality: A Tractable Monte-Carlo, Billiard-Based Algorithm
Philippe Rolet, Michele Sebag, Olivier Teytaud.
ECML PKDD 2009: 302-317
Active learning is formalized as an (intractable) reinforcement learning problem, and Monte-Carlo tree search is used to approximate the corresponding optimal policy.