Some of my recent favorite papers are listed below. At the moment I am interested in the following questions:
- How to define an "intrinsic motivation" for autonomous agents without any ground truth (e.g. when there is no robot simulator), and how to tackle underspecified problems?
- How to tackle meta-learning, i.e. algorithm/heuristic selection in an algorithm portfolio, and how to tune hyper-parameters depending on the problem instance at hand?
- How to handle sequential decision making problems in machine learning and optimization (feature selection, active learning, surrogate learning)?
Rewards for autonomous agents
- APRIL: Active Preference-learning based Reinforcement Learning
Riad Akrour; Marc Schoenauer; Michele Sebag
ECML PKDD 2012, Springer Verlag LNCS 7524, pp. 116-131.
In reinforcement learning, the expert might define a reward function; or demonstrate the target behaviors (inverse reinforcement learning); or give preference feedback on the behaviors demonstrated by the agent. Active learning is used to minimize the number of preference queries requested from the expert.
- Sustainable cooperative coevolution with a multi-armed bandit
Francois-Michel De Rainville, Michele Sebag, Christian Gagné, Marc Schoenauer, Denis Laurendeau.
GECCO 2013: 1517-1524
When two populations co-evolve, they should have commensurate computational budgets.
- Open-Ended Evolutionary Robotics: An Information Theoretic Approach
Pierre Delarboulas, Marc Schoenauer, Michele Sebag.
Parallel Problem Solving from Nature 2010, Springer Verlag LNCS, pp. 334-343
The robot computes and optimizes a criterion on-board, without any ground truth: the quantity of information in the robotic log.
Algorithm/heuristic selection and hyper-parameter tuning
- Collaborative hyperparameter tuning
Remi Bardenet; Mathias Brendel; Balazs Kegl; Michele Sebag
Int. Conf. on Machine Learning, JMLR Workshop and Conference Proceedings, 28, pp. 199-207
Rank-based learning is used to learn the performance as a function of the hyper-parameter values.
- Bandit-based Search for Constraint Programming
Manuel Loth; Michele Sebag; Youssef Hamadi; Marc Schoenauer
Int. Conf. on Principles and Practice of Constraint Programming, Springer Verlag LNCS 8124, pp. 464-480
A multi-armed bandit is used to select the variable values during the CP search.
- Extreme Value Based Adaptive Operator Selection
Alvaro Fialho, Luis Da Costa, Marc Schoenauer, Michele Sebag.
Parallel Problem Solving from Nature 2008, Springer Verlag, pp. 175-184
How can the application probabilities of the variation operators be adjusted adaptively online?
Sequential decision making in machine learning and optimization
- Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy
Ilya Loshchilov, Marc Schoenauer, Michele Sebag.
GECCO 2012: 321-328
The invariance properties w.r.t. monotonic transformations of the objective function and affine transformations of the solution space are preserved by tightly coupling CMA-ES, Ranking-SVM and the online optimization of the Ranking-SVM hyper-parameters.
- Feature Selection as a One-Player Game
Romaric Gaudel, Michele Sebag.
Int. Conf. on Machine Learning 2010, pp. 359-366
Feature selection is formalized as an (intractable) reinforcement learning problem, and Monte-Carlo tree search is used to approximate the corresponding optimal policy.
- Boosting Active Learning to Optimality: A Tractable Monte-Carlo, Billiard-Based Algorithm
Philippe Rolet, Michele Sebag, Olivier Teytaud.
ECML PKDD 2009: 302-317
Active learning is formalized as an (intractable) reinforcement learning problem and Monte-Carlo tree search is used to approximate the corresponding optimal policy.