Some of my recent favorite papers are listed below. At the moment I am interested in the following questions:

- How to define an "intrinsic motivation" for autonomous agents without any ground truth (e.g., when there is no robot simulator), and how to tackle underspecified problems?
- How to tackle meta-learning, i.e., algorithm/heuristic selection within an algorithm portfolio, and how to tune hyper-parameters depending on the problem instance at hand?
- How to handle sequential decision-making problems arising in machine learning and optimization (feature selection, active learning, surrogate learning)?

**APRIL: Active Preference-learning based Reinforcement Learning**

Riad Akrour; Marc Schoenauer; Michele Sebag

ECML PKDD 2012, Springer Verlag LNCS 7524, pp. 116-131.

In reinforcement learning, the expert might define a reward function; demonstrate the target behaviors (inverse reinforcement learning); or give preference feedback on the behaviors demonstrated by the agent. Active learning is used to minimize the number of preference queries requested from the expert.

**Sustainable cooperative coevolution with a multi-armed bandit**

Francois-Michel De Rainville, Michele Sebag, Christian Gagné, Marc Schoenauer, Denis Laurendeau.

GECCO 2013: 1517-1524

When two populations co-evolve, they should be given commensurate computational budgets.

**Open-Ended Evolutionary Robotics: An Information Theoretic Approach**

Pierre Delarboulas, Marc Schoenauer, Michele Sebag.

Parallel Problem Solving from Nature 2010, Springer Verlag LNCS, pp. 334-343.

The robot computes and optimizes a criterion on-board, without any ground truth: the quantity of information in the robotic log.
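
To make the criterion concrete, here is a minimal sketch (an illustration of the idea, not the paper's exact formulation): the information content of a discretized sensorimotor log can be measured by its Shannon entropy, so a robot that repeats the same state scores zero, while varied behavior scores higher. The `log_entropy` helper and the example logs are illustrative names.

```python
import math
from collections import Counter

def log_entropy(log):
    """Shannon entropy (in bits) of a discretized sensorimotor log."""
    n = len(log)
    counts = Counter(log)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A robot stuck in a corner produces a repetitive, low-entropy log;
# varied behavior produces a high-entropy one.
boring = ["fwd"] * 8                                          # entropy 0.0 bits
varied = ["fwd", "left", "fwd", "right", "stop", "left", "back", "turn"]  # 2.5 bits
```

The appeal of such a criterion is that it needs no simulator and no external reward: everything is computed on-board from the robot's own log.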

**Collaborative hyperparameter tuning**

Remi Bardenet; Mathias Brendel; Balazs Kegl; Michele Sebag

Int. Conf. on Machine Learning, JMLR Workshop and Conference Proceedings, 28, pp. 199-207

Rank-based learning is used to learn the performance as a function of the hyper-parameter values.

**Bandit-based Search for Constraint Programming**

Manuel Loth; Michele Sebag; Youssef Hamadi; Marc Schoenauer

Int. Conf. on Principles and Practice of Constraint Programming, Springer Verlag LNCS 8124, pp. 464-480

A multi-armed bandit is used to select the variable values during the CP search.

**Extreme Value Based Adaptive Operator Selection**

Alvaro Fialho, Luis Da Costa, Marc Schoenauer, and Michele Sebag.

Parallel Problem Solving From Nature 2008, Springer Verlag, pp. 175-184.

How to adaptively adjust the application probabilities of variation operators online?
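
As an illustration of the extreme-value idea, here is a hedged sketch (assumed class and parameter names, not the paper's exact algorithm): each operator's credit is the maximum reward observed over a sliding window, and operators are selected by probability matching with a minimum exploration rate `p_min`.

```python
import random
from collections import deque

class ExtremeValueAOS:
    """Sketch of extreme-value adaptive operator selection: an operator's
    credit is the best (max) reward it produced over a sliding window, and
    operators are drawn by probability matching with a floor p_min."""

    def __init__(self, operators, window=10, p_min=0.05):
        self.ops = list(operators)
        self.rewards = {op: deque(maxlen=window) for op in self.ops}
        self.p_min = p_min

    def probabilities(self):
        credit = {op: max(self.rewards[op], default=0.0) for op in self.ops}
        total = sum(credit.values())
        k = len(self.ops)
        if total == 0.0:              # no feedback yet: select uniformly
            return {op: 1.0 / k for op in self.ops}
        return {op: self.p_min + (1.0 - k * self.p_min) * credit[op] / total
                for op in self.ops}

    def select(self):
        probs = self.probabilities()
        return random.choices(self.ops, weights=[probs[op] for op in self.ops])[0]

    def update(self, op, reward):
        """Record the fitness improvement produced by applying `op`."""
        self.rewards[op].append(reward)
```

With `p_min > 0` every operator keeps a nonzero selection probability, so an operator that only becomes useful later in the run can still be rediscovered.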

**Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy**

Ilya Loshchilov, Marc Schoenauer, Michele Sebag.

GECCO 2012: 321-328

Invariance w.r.t. monotone transformations of the objective function and affine transformations of the solution space is preserved by tightly coupling CMA-ES, Ranking-SVM, and the online optimization of the Ranking-SVM hyper-parameters.

**Feature Selection as a One-Player Game**

Romaric Gaudel, Michele Sebag.

Int. Conf. on Machine Learning 2010, pp. 359-366.

Feature selection is formalized as an (intractable) reinforcement learning problem, and Monte-Carlo tree search is used to approximate the corresponding optimal policy.

**Boosting Active Learning to Optimality: A Tractable Monte-Carlo, Billiard-Based Algorithm**

Philippe Rolet, Michele Sebag, Olivier Teytaud.

ECML PKDD 2009: 302-317

Active learning is formalized as an (intractable) reinforcement learning problem, and Monte-Carlo tree search is used to approximate the corresponding optimal policy.
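
The shared recipe of these two papers — cast the selection problem as sequential decision making, then approximate the optimal policy with bandit-based tree search — can be sketched as a minimal one-player UCT loop. This is illustrative code under assumed names, not the authors' implementation; `evaluate` stands in for, e.g., the accuracy of a learner trained on the selected subset.

```python
import math
import random

def uct_search(actions, evaluate, n_iters=200, c=1.4):
    """Minimal one-player UCT: grow a tree over subsets of `actions`,
    descend with UCB1, expand one node per iteration, and back up the
    reward of a random rollout."""
    stats = {}  # node (frozenset of chosen actions) -> [visits, total_reward]
    root = frozenset()
    stats[root] = [0, 0.0]

    def ucb(parent, child):
        n, w = stats[child]
        return w / n + c * math.sqrt(math.log(stats[parent][0]) / n)

    for _ in range(n_iters):
        node, path = root, [root]
        while True:
            children = [node | {a} for a in actions if a not in node]
            if not children:              # complete subset: nothing left to add
                break
            new = [ch for ch in children if ch not in stats]
            if new:                       # expand one unvisited child, then roll out
                node = random.choice(new)
                stats[node] = [0, 0.0]
                path.append(node)
                break
            node = max(children, key=lambda ch, p=node: ucb(p, ch))
            path.append(node)
        rollout = set(node)               # random rollout: complete the subset
        for a in actions:
            if a not in rollout and random.random() < 0.5:
                rollout.add(a)
        reward = evaluate(rollout)
        for n in path:                    # back-propagate along the visited path
            stats[n][0] += 1
            stats[n][1] += reward
    # Recommend the first action of the most visited depth-1 node.
    best = max((ch for ch in stats if len(ch) == 1), key=lambda ch: stats[ch][0])
    return next(iter(best))
```

The bandit at each tree node trades off exploiting the most promising features (or instances) against exploring untried ones, which is what makes the intractable policy approximable in practice.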