Publication record of Olivier Teytaud

Publi 1. Grid coevolution for adaptive simulations; application to the building of opening books in the game of Go

2009
Audouard, Pierre
Chaslot, Guillaume
Hoock, Jean-Baptiste
Rimmel, Arpad
Perez, J.
Teytaud, Olivier

Abstract: Résumé : This paper presents a successful application of parallel (grid) coevolution applied to the building of an opening book (OB) in 9x9 Go. Known sayings around the game of Go are refound by the algorithm, and the resulting program was also able to credibly comment openings in professional games of 9x9 Go. Interestingly, beyond the application to the game of Go, our algorithm can be seen as a ”meta”-level for the UCT-algorithm: ”UCT applied to UCT” (instead of ”UCT applied to a random player” as usual), in order to build an OB. It is generic and could be applied as well for analyzing a given situation of a Markov Decision Process.

PDF file

Bibtex (click on "Export this paper")

Publi 2.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 3. A Statistical Learning Perspective of Genetic Programming}

2009
Amil, Merve
Bredeche, Nicolas
Gagné, Christian
Gelly, Sylvain
Schoenauer, Marc
Teytaud, Olivier

Abstract: Résumé : This paper proposes a theoretical analysis of Genetic Programming (GP) from the perspective of statistical learning theory, a well grounded mathematical toolbox for machine learning. By computing the Vapnik-Chervonenkis dimension of the family of programs that can be inferred by a specific setting of GP, it is proved that a parsimonious fitness ensures universal consistency. This means that the empirical error minimization allows convergence to the best possible error when the number of test cases goes to infinity. However, it is also proved that the standard method consisting in putting a hard limit on the program size still results in programs of infinitely increasing size in function of their accuracy. It is also shown that cross-validation or hold-out for choosing the complexity level that optimizes the error rate in generalization also leads to bloat. So a more complicated modification of the fitness is proposed in order to avoid unnecessary bloat while nevertheless preserving universal consistency.

PDF file

Bibtex (click on "Export this paper")

Publi 4.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 5. On the parallel speed-up of Estimation of Multivariate Normal Algorithm and Evolution Strategies

2009
Teytaud, Fabien
Teytaud, Olivier

Abstract: Résumé : Motivated by parallel optimization, we experiment EDA-like adaptation-rules in the case of $\lambda$ large. The rule we use, essentially based on estimation of multivariate normal algorithm, is (i) compliant with all families of distributions for which a density estimation algorithm exists (ii) simple (iii) parameter-free (iv) better than current rules in this framework of $\lambda$ large. The speed-up as a function of $\lambda$ is consistent with theoretical bounds.

PDF file

Bibtex (click on "Export this paper")

Publi 6.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 7. Why one must use reweighting in Estimation Of Distribution Algorithms

2009
Teytaud, Fabien
Teytaud, Olivier

Abstract: Résumé : We study the update of the distribution in Estimation of Distribution Algorithms, and show that a simple modification leads to unbiased estimates of the optimum. The simple modification (based on a proper reweighting of estimates) leads to a strongly improved behavior in front of premature convergence.

PDF file

Bibtex (click on "Export this paper")

Publi 8.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 9. Combiner connaissances expertes, hors-ligne, transientes et en ligne pour l'exploration Monte-Carlo

2008
Chatriot, Louis
Fiter, Christophe
Chaslot, Guillaume
Gelly, Sylvain
Hoock, Jean-Baptiste
Perez, J.
Rimmel, Arpad
Teytaud, Olivier

Abstract: Résumé : Nous combinons pour de l'exploration Monte-Carlo d'arbres de l'apprentissage arti- RÉSUMÉ. ficiel à 4 échelles de temps : – regret en ligne, via l'utilisation d'algorithmes de bandit et d'estimateurs Monte-Carlo ; – de l'apprentissage transient, via l'utilisation d'estimateur rapide de Q-fonction (RAVE, pour Rapid Action Value Estimate) qui sont appris en ligne et utilisés pour accélérer l'explora- tion mais sont ensuite peu à peu laissés de côté à mesure que des informations plus fines sont disponibles ; – apprentissage hors-ligne, par fouille de données de jeux ; – utilisation de connaissances expertes comme information a priori. L'algorithme obtenu est plus fort que chaque élément séparément. Nous mettons en évidence par ailleurs un dilemne exploration-exploitation dans l'exploration Monte-Carlo d'arbres et obtenons une très forte amélioration par calage des paramètres correspondant. We combine for Monte-Carlo exploration machine learning at four different time ABSTRACT. scales: – online regret, through the use of bandit algorithms and Monte-Carlo estimates; – transient learning, through the use of rapid action value estimates (RAVE) which are learnt online and used for accelerating the exploration and are thereafter neglected; – offline learning, by data mining of datasets of games; – use of expert knowledge coming from the old ages as prior information.

PDF file

Bibtex (click on "Export this paper")

Publi 10.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 11. Introduction de connaissances expertes en Bandit-Based Monte-Carlo Planning avec application au Computer-Go

2008
Chatriot, Louis
Gelly, Sylvain
Hoock, Jean-Baptiste
Pérez, Julien
Rimmel, Arpad
Teytaud, Olivier

Abstract: Résumé : Nous ajoutons différentes astuces d'expertise Go dans un programmation de planification Monte-Carlo à partir de bandits, via des simulations virtuelles ajoutées aux statistiques de bandits.

PDF file

Bibtex (click on "Export this paper")

Publi 12.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 13. On the Parallelization of Monte-Carlo planning

2008
Gelly, Sylvain
Hoock, Jean-Baptiste
Rimmel, Arpad
Teytaud, Olivier
Kalemkarian, Yann

Abstract: Résumé : We provide a parallelization with and without shared-memory for Bandit-Based Monte-Carlo Planning algorithms, applied to the game of Go. The resulting algorithm won the first non-blitz game against a professionnal human player in 9x9 Go.

PDF file

Bibtex (click on "Export this paper")

Publi 14.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 15. When does quasi-random work ?

2008
Teytaud, Olivier

Abstract: Résumé : We experiment the efficiency of quasi-random mutations in evolution strategies in continuous domains from various points of views: (i) non-convexity (ii) convergence rate (iii) non-asymptotic behavior (iv) noise. We conclude that quasi-random mutations are great.

PDF file

Bibtex (click on "Export this paper")

Publi 16.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 17. Lower bounds for evolution strategies using VC-dimension

2008
Teytaud, Olivier
Fournier, Hervé

Abstract: Résumé : We derive lower bounds for comparison-based or selection-based algorithms, improving existing results in the continuous setting, and extending them to non-trivial results in the discrete case. We introduce for that the use of the VC-dimension of the level sets of the fitness functions; results are then obtained through the use of Sauer's lemma. In the special case of optmization of the sphere function, improved lower bounds are obtained by bounding the possible number of sign conditions realized by some systems of equations. The results include several applications to the parametrization of sequential or parallel algorithms of type $\mul$-ES.

PDF file

Bibtex (click on "Export this paper")

Publi 18.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 19. Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments

2007
Hartland, Cédric
Baskiotis, Nicolas
Gelly, Sylvain
Sebag, Michele
Teytaud, Olivier

Abstract: Résumé : Motivated by realtime website optimization, this paper is about online learning in abruptly changing environments. Two extensions of the UCBT algorithm are combined in order to handle dynamic multi-armed bandits, and specifically to cope with fast variations in the rewards. Firstly, a change point detection test based on Page-Hinkley statistics is used to overcome the limitations due to the UCBT inertia. Secondly, a controlled forgetting strategy dubbed Meta-Bandit is proposed to take care of the Exploration vs Exploitation trade-off when the PH test is triggered. Extensive empirical validation shows significant improvements compared to the baseline algorithms. The paper also investigates the sensitivity of the proposed algorithm with respect to the number of available options.

PDF file

Bibtex (click on "Export this paper")

Publi 20.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 21. Boosting Active Learning to Optimality: a Tractable Monte-Carlo, Billiard-based Algorithm

2009
Rolet, Philippe
Sebag, Michèle
Teytaud, Olivier

Abstract: Résumé : Abstract. This paper focuses on Active Learning with a limited num- ber of queries; in application domains such as Numerical Engineering, the size of the training set might be limited to a few dozen or hundred exam- ples due to computational constraints. Active Learning under bounded resources is formalized as a finite horizon Reinforcement Learning prob- lem, where the sampling strategy aims at minimizing the expectation of the generalization error. A tractable approximation of the optimal (in- tractable) policy is presented, the Bandit-based Active Learner (BAAL) algorithm. Viewing Active Learning as a single-player game, BAAL com- bines UCT, the tree structured multi-armed bandit algorithm proposed by Kocsis and Szepesv´ri (2006), and billiard algorithms. A proof of a principle of the approach demonstrates its good empirical convergence toward an optimal policy and its ability to incorporate prior AL crite- ria. Its hybridization with the Query-by-Committee approach is found to improve on both stand-alone BAAL and stand-alone QbC.

PDF file

Bibtex (click on "Export this paper")

Publi 22.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 23. Adding expert knowledge and exploration in Monte-Carlo Tree Search

2009
Chaslot, Guillaume
Fiter, Christophe
Hoock, Jean-Baptiste
Rimmel, Arpad
Teytaud, Olivier

Abstract: Résumé : We present a new exploration term, more efficient than clas- sical UCT-like exploration terms and combining efficiently expert rules, patterns extracted from datasets, All-Moves-As-First values and classi- cal online values. As this improved bandit formula does not solve several important situations (semeais, nakade) in computer Go, we present three other important improvements which are central in the recent progress of our program MoGo: { We show an expert-based improvement of Monte-Carlo simulations for nakade situations; we also emphasize some limitations of this modification. { We show a technique which preserves diversity in the Monte-Carlo simulation, which greatly improves the results in 19x19. { Whereas the UCB-based exploration term is not efficient in MoGo, we show a new exploration term which is highly efficient in MoGo. MoGo recently won a game with handicap 7 against a 9Dan Pro player, Zhou JunXun, winner of the LG Cup 2007, and a game with handicap 6 against a 1Dan pro player, Li-Chen Chien.

PDF file

Bibtex (click on "Export this paper")

Publi 24.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 25. A Novel Ontology for Computer Go Knowledge Management

2009
Lee, Chang-Shing
Mei-Hui, Wang
Hong, Tzung-Pei
Chaslot, Guillaume
Hoock, Jean-Baptiste
Rimmel, Arpad
Teytaud, Olivier
Kuo, Yau-Hwang

Abstract: Résumé : In order to stimulate the development and research in computer Go, several Taiwanese Go players, including three professional Go players and four amateur Go players, were invited to play against the famous computer Go program, MoGo, in the Taiwan Open 2009. The MoGo program combines the online game values, offline values extracted from databases, and expert rules defined by Go expert that shows an excellent performance in the games. The results reveal that MoGo can reach the level of 3 Dan in Taiwan amateur Go environment. But there are still some drawbacks for MoGo that should be solved, for example, the weaknesses in semeai and how to flexibly practice the human knowledge through the embedded opening books. In this paper, a new game record ontology for computer Go knowledge management is proposed to solve the problems that MoGo is facing. It is hoped that the advances in intelligent agent and ontology model can provide much more knowledge to make a progress in computer Go and achieve as much as computer chess or Chinese chess in the future.

PDF file

Bibtex (click on "Export this paper")

Publi 26.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 27. Optimizing Low-Discrepancy Sequences with an Evolutionary Algorithm

2009
Rainville, François-Michel De
Gagné, Christian
Teytaud, Olivier
Laurendeau, Denis

Abstract: Résumé : Many elds rely on some stochastic sampling of a given com- plex space. Low-discrepancy sequences are methods aim- ing at producing samples with better space-lling properties than uniformly distributed random numbers, hence allow- ing a more ecient sampling of that space. State-of-the-art methods like nearly orthogonal Latin hypercubes and scram- bled Halton sequences are congured by permutations of in- ternal parameters, where permutations are commonly done randomly. This paper proposes the use of evolutionary al- gorithms to evolve these permutations, in order to optimize a discrepancy measure. Results show that an evolution- ary method is able to generate low-discrepancy sequences of signicantly better space-lling properties compared to sequences congured with purely random permutations.

PDF file

Bibtex (click on "Export this paper")

Publi 28.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 29. Creating an Upper-Confidence-Tree program for Havannah

2009
Teytaud, Fabien
Teytaud, Olivier

Abstract: Résumé : Monte-Carlo Tree Search and Upper Confidence Bounds pro- vided huge improvements in computer-Go. In this paper, we test the generality of the approach by experimenting on another game, Havannah, which is known for being especially difficult for computers. We show that the same results hold, with slight differences related to the absence of clearly known patterns for the game of Havannah, in spite of the fact that Havannah is more related to connection games like Hex than to territory games like Go.

PDF file

Bibtex (click on "Export this paper")

Publi 30.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 31. On the huge benefit of quasi-random mutations for multimodal optimization with application to grid-based tuning of neurocontrollers

2009
Chaslot, Guillaume
Hoock, Jean-Baptiste
Teytaud, Fabien
Teytaud, Olivier

Abstract: Résumé : In this paper, we study the optimization of a neural network used for controlling a Monte-Carlo Tree Search (MCTS/UCT) algorithm. The main results are: (i) the specification of a new multimodal benchmark function; this function has been defined in particular in agreement with [1] which has pointed out that most multimodal functions are not satisfactory for some real-world multimodal scenarios (section 2); (ii) experimentation of Evolution Strategies on this new multimodal benchmark function, showing the great efficiency of quasi-random mutations in this framework (section 3); (iii) the proof-of-concept of the application of ES for grid-based tuning Neural Networks for controlling MCTS/UCT (see section 3).

PDF file

Bibtex (click on "Export this paper")

Publi 32.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 33. Optimal robust expensive optimization is tractable

2009
Rolet, Philippe
Sebag, Michele
Teytaud, Olivier

Abstract: Résumé : Following a number of recent papers investigating the possibility of optimal comparison-based optimization algorithms for a given distribution of probability on fitness functions, we (i) discuss the comparison-based constraints (ii) choose a setting in which theoretical tight bounds are known (iii) develop a careful implementation using billiard algorithms, Upper Confidence trees and (iv) experimentally test the tractability of the approach. The results, on still very simple cases, show that the approach, yet still preliminary, could be tested successfully until dimension 10 and horizon 50 iterations within a few hours on a standard computer, with convergence rate far better than the best algorithms.

PDF file

Bibtex (click on "Export this paper")

Publi 34.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 35. Continuous lunches are free plus the design of optimal optimization algorithms

2009
Auger, Anne
Teytaud, Olivier

Abstract: Résumé : This paper analyses extensions of No-Free-Lunch (NFL) theorems to countably infinite and uncountable infinite domains and investigates the design of optimal optimization algorithms. The original NFL theorem due to Wolpert and Macready states that, for finite search domains, all search heuristics have the same performance when averaged over the uniform distribution over all possible functions. For infinite domains the extension of the concept of distribution over all possible functions involves measurability issues and stochastic process the- ory. For countably infinite domains, we prove that the natural extension of NFL theorems, for the current formalization of probability, does not hold, but that a weaker form of NFL does hold, by stating the existence of non-trivial distributions of fitness leading to equal performances for all search heuristics. Our main result is that for continuous domains, NFL does not hold. This free-lunch theorem is based on the formalization of the concept of random fitness functions by means of random fields. We also consider the design of optimal optimization algorithms for a given random field, in a black-box setting, namely, a complexity measure based solely on the number of requests to the fitness function. We derive an optimal algorithm based on Bellman's decomposition principle, for a given number of iterates and a given distribution of fitness functions. We also approximate this algorithm thanks to a Monte-Carlo planning algorithm close to the UCT (Upper Confidence Trees) algorithm, and provide experimental results.

PDF file

Bibtex (click on "Export this paper")

Publi 36.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 37. Upper Confidence Trees and Billiards for Optimal Active Learning

2009
Rolet, Philippe
Sebag, Michèle
Teytaud, Olivier

Abstract: Résumé : This paper focuses on Active Learning (AL) with bounded compu- tational resources. AL is formalized as a finite horizon Reinforcement Learning problem, and tackled as a single-player game. An approximate optimal AL strat- egy based on tree-structured multi-armed bandit algorithms and billiard-based sampling is presented together with a proof of principle of the approach.

PDF file

Bibtex (click on "Export this paper")

Publi 38.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 39. The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments

2009
Lee, Chang-Shing
Wang, Mei-Hui
Chaslot, Guillaume
Hoock, Jean-Baptiste
Rimmel, Arpad
Teytaud, Olivier
Tsai, Shang-Rong
Hsu, Shun-Chin
Hong, Tzung-Pei

Abstract: Résumé : In order to promote computer Go and stimulate further development and research in the field, the event activities, “Computational Intelligence Forum” and “World 99 Computer Go Championship,” were held in Taiwan. This study focuses on the invited games played in the tournament, “Taiwanese Go players versus the computer program MoGo,” held at National University of Tainan (NUTN). Several Taiwanese Go players, including one 9-Dan professional Go player and eight amateur Go players, were invited by NUTN to play against MoGo from August 26 to October 4, 2008. The MoGo program combines All Moves As First (AMAF)/Rapid Action Value Estimation (RAVE) values, online “UCT-like” values, offline values extracted from databases, and expert rules. Additionally, four properties of MoGo are analyzed including: (1) the weakness in corners, (2) the scaling over time, (3) the behavior in handicap games, and (4) the main strength of MoGo in contact fights. The results reveal that MoGo can reach the level of 3 Dan with, (1) good skills for fights, (2) weaknesses in corners, in particular for “semeai” situations, and (3) weaknesses in favorable situations such as handicap games. It is hoped that the advances in artificial intelligence and computational power will enable considerable progress in the field of computer Go, with the aim of achieving the same levels as computer chess or Chinese chess in the future.

PDF file

Bibtex (click on "Export this paper")

Publi 40.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 41. DCMA, yet another derandomization in covariance-matrix-adaptation

2007
Teytaud, Olivier
Gelly, Sylvain

Abstract: Résumé : In a preliminary part of this paper, we analyze the neces- sity of randomness in evolution strategies. We conclude to the necessity of ”continuous”-randomness, but with a much more limited use of randomness than what is commonly used in evolution strategies. We then apply these results to CMA- ES, a famous evolution strategy already based on the idea of derandomization, which uses random independent Gaus- sian mutations. We here replace these random independent Gaussian mutations by a quasi-random sample. The mod- ification is very easy to do, the modified algorithm is com- putationally more efficient and its convergence is faster in terms of the number of iterates for a given precision.

PDF file

Bibtex (click on "Export this paper")

Publi 42.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 43. Continuous lunches are free!

2007
Auger, Anne
Teytaud, Olivier

Abstract: Résumé : This paper investigates extensions of No Free Lunch (NFL) theorems to countably infinite and uncountable infinite do- mains. The original NFL due to Wolpert and Macready states that all search heuristics have the same performance when averaged over the uniform distribution over all possible functions. For infinite domains the extension of the concept of distribution over all possible functions involves measur- ability issues and stochastic process theory. For countably infinite domains, we prove that the natural extension of NFL theorems does not hold, but that a weaker form of NFL does hold, by stating the existence of non-trivial distribution of fitness leading to equal performance for all search heuristics. Our main result is that for continuous domains, NFL does not hold.

PDF file

Bibtex (click on "Export this paper")

Publi 44.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 45. Conditionning, halting criteria and choosing lambda

2007
Teytaud, Olivier

Abstract: Résumé : We show the convergence of 1+ lambda-ES with standard step-size update-rules on a large family of fitness functions without any convexity assumption or quasi-convexity assumptions ([5, 6]). The result provides a rule for choosing lambda and shows the consistency of halting criteria based on thresholds on the step-size. The family of functions under work is defined through a conditionnumber that generalizes usual condition-numbers in a manner that only depends on level-sets. We consider that the definition of this conditionnumber is the relevant one for evolutionary algorithms; in particular, global convergence results without convexity or quasi-convexity assumptions are proved when this condition-number is finite.

PDF file

Bibtex (click on "Export this paper")

Publi 46.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 47. Inductive-deductive systems: a mathematical logic and statistical learning perspective

2007
Baskiotis, Nicolas
Sebag, Michele
Teytaud, Olivier

Abstract: Résumé : The theorems about incompleteness of arithmetic have often been cited as an argument against automatic theorem proving and expert systems. However, these theorems rely on a worst-case analysis, which might happen to be overly pessimistic with respect to real-world domain applications. For this reason, a new framework for a probabilistic analysis of logical complexity is presented in this paper. Specifically, the rate of non-decidable clauses and the convergence of a set of axioms toward the target one when the latter exists in the language are studied, by combining results from mathematical logic and from statistical learning. Two theoretical settings are considered, where learning relies respectively on Turing oracles guessing the provability of a statement from a set of statements, and computable approximations thereof. Interestingly, both settings lead to similar results regarding the convergence rate towards completeness.

PDF file

Bibtex (click on "Export this paper")

Publi 48.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 49. Anytime many-armed bandits

2007
Teytaud, Olivier
Gelly, Sylvain
Sebag, Michele

Abstract: Résumé : This paper introduces the many-armed bandit problem (ManAB), where the number of arms is large comparatively to the relevant number of time steps. While the ManAB framework is relevant to many real-world applications, the state of the art does not offer anytime algorithms handling ManAB problems. Both theory and practice suggest that two problem categories must be distinguished; the easy category includes those problems where good arms have reward probability close to 1; the difficult category includes other problems. Two algorithms termed FAILURE and MUCBT are proposed for the ManAB framework. FAILURE and its variants extend the non-anytime approach proposed for the denumerable-armed bandit and non-asymptotic bounds are shown; it works very efficiently for easy ManAB problems. Meanwhile, MUCBT efficiently deals with difficult ManAB problems.

PDF file

Bibtex (click on "Export this paper")

Publi 50.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 51. Slightly beyond Turing-Computability for studying Genetic Programming

2007
Teytaud, Olivier

Abstract: Résumé : Inspired by genetic programming (GP), we study iterative algorithms for non-computable tasks and compare them to naive models. This framework justifies many practical standard tricks from GP and also provides complexity lower-bounds which justify the computational cost of GP thanks to the use of Kolmogorov's complexity in bounded time.

PDF file

Bibtex (click on "Export this paper")

Publi 52.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 53. On the hardness of offline multiobjective optimization

2007
Teytaud, Olivier

Abstract: Résumé : It is empirically established that multiobjective evolutionary algorithms do not scale well with the number of conflicting objectives. We here show that the convergence rate of all comparison-based multiobjective algorithms, for the Hausdorff distance, is not much better than the convergence rate of the random search, unless the number of objectives is very moderate, in a framework in which the stronger assumptions are (i) that the objectives are conflicting (ii) that lower bounding the computational cost by the number of comparisons is a good model. Our conclusions are (i) the relevance of the number of conflicting objectives (ii) the relevance of criteria based on comparisons with random-search for multi-objective optimization (iii) the very-hardness of more than 3- objectives optimization (iv) some hints about cross-over operators.

PDF file

Bibtex (click on "Export this paper")

Publi 54.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 55. Non linear programming for stochastic dynamic programming

2007
Teytaud, Olivier
Gelly, Sylvain

Abstract: Résumé : Many stochastic dynamic programming tasks in continuous action-spaces are tackled through discretization. We here avoid discretization; then, approximate dynamic programming (ADP) involves (i) many learning tasks, performed here by Support Vector Machines, for Bellman-function-regression (ii) many non-linearoptimization tasks for action-selection, for which we compare many algorithms. We include discretizations of the domain as particular non-linear-programming-tools in our experiments, so that by the way we compare optimization approaches and discretization methods. We conclude that robustness is strongly required in the non-linear-optimizations in ADP, and experimental results show that (i) discretization is sometimes inefficient, but some specific discretization is very efficient for "bang-bang" problems (ii) simple evolutionary tools outperform quasi-random in a stable manner (iii) gradient-based techniques are much less stable (iv) for most high-dimensional "less unsmooth" problems Covariance-Matrix-Adaptation is first ranked.

PDF file

Bibtex (click on "Export this paper")

Publi 56.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 57. Active learning in regression, with an application to stochastic dynamic programming

2007
Teytaud, Olivier
Gelly, Sylvain
Mary, Jérémie

Abstract: Résumé : We study active learning as a derandomized form of sampling. We show that full derandomization is not suitable in a robust framework, propose partially derandomized samplings, and develop new active learning methods (i) in which expert knowledge is easy to integrate (ii) with a parameter for the exploration/exploitation dilemma (iii) less randomized than the full-random sampling (yet also not deterministic). Experiments are performed in the case of regression for value-function learning on a continuous domain. Our main results are (i) efficient partially derandomized point sets (ii) moderate-derandomization theorems (iii) experimental evidence of the importance of the frontier (iv) a new regression-specific user-friendly sampling tool lessrobust than blind samplers but that sometimes works very efficiently in large dimensions. All experiments can be reproduced by downloading the source code and running the provided command line.

PDF file

Bibtex (click on "Export this paper")

Publi 58.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 59. comparison-based algorithms are robust and randomized algorithms are anytime.

2007
Gelly, Sylvain
Ruette, Sylvie
Teytaud, Olivier

Abstract: Résumé : Randomized search heuristics (e.g., evolutionary algorithms, simulated annealing etc.) are very appealing to practitioners, they are easy to implement and usually provide good performance. The theoretical analysis of these algorithms usually focuses on convergence rates. This paper presents a mathematical study of randomized search heuristicswhich use comparison based selectionmechanism. The twomain results are: (i) comparison-based algorithms are the best algorithms for some robustness criteria, (ii) introducing randomness in the choice of offspring improves the anytime behavior of the algorithm. An original Estimation of Distribution Algorithm combining (i) and (ii) is proposed and successfully experimented.

PDF file

Bibtex (click on "Export this paper")

Publi 60.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 61. On the adaptation of the noise level for stochastic optimization

2007
Teytaud, Olivier
Auger, Anne

Abstract: Résumé : This paper deals with the optimization of noisy fitness functions, where the noise level can be reduced by increasing the computational effort. We theoretically investigate the question of the control of the noise level. We analyse two different schemes for an adaptive control and prove sufficient conditions ensuring the existence of an homogeneous Markov chain, which is the first step to prove linear convergence when dealing with non-noisy fitness functions. We experimentally validate the relevance of the homogeneity criterion. Large-scale experiments conclude to the efficiency in a difficult framework.

PDF file

Bibtex (click on "Export this paper")

Publi 62.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 63. Modification of UCT with Patterns in Monte-Carlo Go

2006
Gelly, Sylvain
Wang, Yizao
Munos, Rémi
Teytaud, Olivier

Abstract: Résumé : Algorithm UCB1 for multi-armed bandit problem has already been extended to Algorithm UCT (Upper bound Confidence for Tree) which works for minimax tree search. We have developed a Monte-Carlo Go program, MoGo, which is the first computer Go program using UCT. We explain our modification of UCT for Go application and also the intelligent random simulation with patterns which has improved significantly the performance of MoGo. UCT combined with pruning techniques for large Go board is discussed, as well as parallelization of UCT. MoGo is now a top level Go program on $9\times9$ and $13\times13$ Go boards.

PDF file

Bibtex (click on "Export this paper")

Publi 64.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 65. OpenDP a free Reinforcement Learning toolbox for discrete time control problems

2006
Gelly, Sylvain
Teytaud, Olivier

Abstract: Résumé : OpenDP (http://opendp.sourceforge.net) is a free software (under GPL) written in C++ for control problems: (i) with discrete time steps (either finite horizon, or approximation of infinite horizon by temporal-ring for stationnary problems), (ii) where the transition function newState = T (state, decision) can be encoded in C++, (iii) possibly stochastic (with the function T above depending on some random process).

PDF file

Bibtex (click on "Export this paper")

Publi 66.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 67. Universal Consistency and Bloat in GP

2006
Gelly, Sylvain
Teytaud, Olivier
Bredeche, Nicolas
Schoenauer, Marc

Abstract: Résumé : In this paper, we provide an analysis of Genetic Programming (GP) from the Statistical Learning Theory viewpoint in the scope of symbolic regression. Firstly, we are interested in Universal Consistency, i.e. the fact that the solution minimizing the empirical error does converge to the best possible error when the number of examples goes to infinity, and secondly, we focus our attention on the uncontrolled growth of program length (i.e. bloat), which is a well-known problem in GP. Results show that (1) several kinds of code bloats may be identified and that (2) Universal consistency can be obtained as well as avoiding bloat under some con- ditions. We conclude by describing an ad hoc method that makes it possible simultaneously to avoid bloat and to ensure universal consistency.

PDF file

Bibtex (click on "Export this paper")

Publi 68.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 69. Bayesian Networks: a Non-Frequentist Approach for Parametrization, and a more Accurate Structural Complexity Measure

2006
Gelly, Sylvain
Teytaud, Olivier

Abstract: Résumé : The problem of calibrating relations from examples is a classical problem in learning theory. This problem has in particular been studied in the theory of empirical processes (providing asymptotic results), and through statistical learning theory. The application of learning theory to bayesian networks is still uncomplete and we propose a contribution, especially through the use of covering numbers. We deduce multiple corollaries, among which a non-frequentist approach for parameters learning and a score taking into account a measure of structural entropy that has never been taken into account before. We then investigate the algorithmic aspects of our theoretical solution, based on BFGS and adaptive refining of gradient calculus. Empirical results show the relevance of both the statistical results and the algorithmic solution.

PDF file

Bibtex (click on "Export this paper")

Publi 70.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 71. General lower bounds for evolutionary algorithms

2006
Teytaud, Olivier
Gelly, Sylvain

Abstract: Résumé : Evolutionary optimization, among which genetic optimization, is a general framework for optimization. It is known (i) easy to use (ii) robust (iii) derivative-free (iv) unfortunately slow. Recent work [8] in particular show that the convergence rate of some widely used evolution strategies (evolutionary optimization for continuous domains) can not be faster than linear (i.e. the logarithm of the distance to the optimum can not decrease faster than linearly), and that the constant in the linear convergence (i.e. the constant C such that the distance to the optimum after n steps is upp er b ounded by C n ) unfortunately converges quickly to 1 as the dimension increases to infinity. We here show a very wide generalization of this result: al l comparison-based algorithms have such a limitation. Note that our result also concerns methods like the Hooke & Jeeves algorithm, the simplex method, or any direct search method that only compares the values to previously seen values of the fitness. But it does not cover methods that use the value of the fitness (see [5] for cases in which the fitness-values are used), even if these methods do not use gradients. The former results deal with convergence with respect to the number of comparisons performed, and also include a very wide family of algorithms with resp ect to the number of function-evaluations. However, there is still place for faster convergence rates, for more original algorithms using the full ranking information of the population and not only selections among the population. We prove that, at least in some particular cases, using the full ranking information can improve these lower bounds, and ultimately provide sup erlinear convergence results.

PDF file

Bibtex (click on "Export this paper")

Publi 72.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 73. On the ultimate convergence rates for isotropic algorithms and the best choices among various forms of isotropy

2006
Gelly, Sylvain
Mary, Jérémie
Teytaud, Olivier

Abstract: Résumé : In this paper, we show universal lower bounds for isotropic algorithms, that hold for any algorithm such that each new point is the sum of one already visited p oint plus one random isotropic direction multiplied by any step size (whenever the step size is chosen by an oracle with arbitrarily high computational power). The bound is 1 − O(1/d) for the constant in the linear convergence (i.e. the constant C such that the distance to the optimum after n steps is upp er b ounded by C n ), as already seen for some families of evolution strategies in [19, 12], in contrast with 1 − O(1) for the reverse case of a random step size and a direction chosen by an oracle with arbitrary high computational power. We then recall that isotropy does not uniquely determine the distribution of a sample on the sphere and show that the convergence rate in isotropic algorithms is improved by using stratified or antithetic isotropy instead of naive isotropy. We show at the end of the pap er that b eyond the mathematical proof, the result holds on exp eriments. We conclude that one should use antithetic-isotropy or stratified-isotropy, and never standard-isotropy.

PDF file

Bibtex (click on "Export this paper")

Publi 74.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 75. Comparison-based algorithms: worst-case optimality, optimality w.r.t a bayesian prior, the intraclass-variance minimization in EDA, and implementations with billiards

2006
Gelly, Sylvain
Ruette, Sylvie
Teytaud, Olivier

Abstract: Résumé : This paper is centered on the analysis of comparison-based algorithms. It has been shown recently that these algorithms are at most linearly convergent with a constant 1 − O(1/d); we here show that these algorithms are however optimal for robust optimization w.r.t increasing transformations of the fitness. We then turn our attention to the design of optimal comparison-based algorithms. No-Free-Lunch theorems have shown that introducing priors is necessary in order to design algorithms better than others; therefore, we include a bayesian prior in the spirit of learning theory. We show that these algorithms have a nice interpretation in terms of Estimation-Of-Distribution algorithms, and provide tools for the optimal design of generations of lambda-points by the way of billiard algorithms.

PDF file

Bibtex (click on "Export this paper")

Publi 76.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 77. Learning for stochastic dynamic programming

2006
Gelly, Sylvain
Mary, Jérémie
Teytaud, Olivier

Abstract: Résumé : We present experimental results about learning function values (i.e. Bellman values) in stochastic dynamic programming (SDP). All results come from openDP (opendp.sourceforge.net), a freely available source code, and therefore can be reproduced. The goal is an independent comparison of learning methods in the framework of SDP.

PDF file

Bibtex (click on "Export this paper")

Publi 78.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 79. Resource-Aware Parameterizations of EDA

2006
Gelly, Sylvain
Teytaud, Olivier
Cagne, Christian

Abstract: Résumé : This paper presents a framework for the theoretical analysis of Estimation of Distribution Algorithms (EDA). Using this framework, derived from the VC-theory, we propose non-asymptotic bounds which depend on: 1) the population size 2) the selection rate, 3) the families of distributions used for the modelling, 4) the dimension, and 5) the number of iterations. To validate these results, optimization algorithms are applied to a context where bounds on resources are crucial, namely Design of Experiments, that is a black-box optimization with very few fitness-values evaluations.

PDF file

Bibtex (click on "Export this paper")

Publi 80.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 81. A Statistical Learning Theory Approach of Bloat

2005-06-25
Teytaud, Olivier
Gelly, Sylvain
Bredeche, Nicolas
Schoenauer, Marc

Abstract: Résumé : Code bloat, the excessive increase of code size, is an important is- sue in Genetic Programming (GP). This paper proposes a theoreti- cal analysis of code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending whether the target function lies in the search space or not. Then, important mathematical results are proved using classical results from Sta- tistical Learning. Namely, the Vapnik-Cervonenkis dimension of programs is computed, and further results from Statistical Learn- ing allow to prove that a parsimonious fitness ensures Universal Consistency (the solution minimizing the empirical error does con- verge to the best possible error when the number of samples goes to infinity). However, it is proved that the standard method consisting in choosing a maximal program size depending on the number of samples might still result in programs of infinitely increasing size whith their accuracy; a more complicated modification of the fit- ness is proposed that theoretically avoids unnecessary bloat while nevertheless preserving the Universal Consistency.

PDF file

Bibtex (click on "Export this paper")

Publi 82.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 83. Apprentissage statistique et programmation génétique: la croissance du code est-elle inévitable ?

2005
Gelly, Sylvain
Teytaud, Olivier
Bredeche, Nicolas
Schoenauer, Marc

Abstract: Résumé : Universal Consistency, the convergence to the minimum possible error rate in learning through genetic programming (GP), and Code bloat, the excessive increase of code size, are important issues in GP. This paper proposes a theoretical analysis of universal consistency and code bloat in the framework of symbolic regression in GP, from the viewpoint of Statistical Learning Theory, a well grounded mathematical toolbox for Machine Learning. Two kinds of bloat must be distinguished in that context, depending whether the target function has finite description length or not. Then, the Vapnik-Chervonenkis dimension of programs is computed, and we prove that a parsimonious fitness ensures Universal Consistency (i.e. the fact that the solution minimizing the empirical error does converge to the best possible error when the number of examples goes to infinity). However, it is proved that the standard method consisting in choosing a maximal program size depending on the number of examples might still result in programs of infinitely increasing size with their accuracy; a fitness biased by parsimony pressure is proposed. This fitness avoids unnecessary bloat while nevertheless preserving the Universal Consistency.

PDF file

Bibtex (click on "Export this paper")

Publi 84.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 85. Convergence proofs, convergence rates and stopping criterions for multi-modal or multi-objective evolutionary algorithms

2005
Bonnemay, Yann
Sebag, Michele
Teytaud, Olivier

Abstract: Résumé : We provide - convergence proofs - convergence rates - a stopping criterion for multi-objective or multi-modal evolutionary algorithms.

PDF file

Bibtex (click on "Export this paper")

Publi 86.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 87. Quasi-random mutations for evolution strategies

2005
Teytaud, Olivier
Jebalia, Mohamed
Auger, Anne

Abstract: Résumé : We prove linear convergence rates for some derandomized evolution strategie.

PDF file

Bibtex (click on "Export this paper")

Publi 88.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 89. Bayesian networks : a better than frequentist approach for parametrization, and a more accurate structural complexity measure than the number of parameters

2005
Gelly, Sylvain
Teytaud, Olivier

Abstract: Résumé : We propose and justify a better-than-frequentist approach for bayesian network parametrization, and propose a structural entropy term that more precisely quantifies the complexity of a BN than the number of parameters. Algorithms for BN learning are deduced.

PDF file

Bibtex (click on "Export this paper")

Publi 90.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 91. Local and global oder 3/2 convergence of a surrogate evolutionary algorithm

2005
Auger, Anne
Schoenauer, Marc
Teytaud, Olivier

Abstract: Résumé : We prove a fast (better than superlinear) local convergence rate for an algorithm based on a quadratic approximation of the fitness function. An almost sure global convergence rate is proved for a corresponding memetic algorithm.

PDF file

Bibtex (click on "Export this paper")

Publi 92.



Abstract:

PDF file

Bibtex (click on "Export this paper")

Publi 93. Taylor-based pseudo-metrics for random process fitting in dynamic programming.

2005
Gelly, Sylvain
Mary, Jérémie
Teytaud, Olivier

Abstract: Résumé : Stochastic optimization is the research of $x$ optimizing $E\ C(x,A)$, the expectation of $C(x,A)$, wher e $A$ is a random variable. Typically $C(x,a)$ is the cost related to a strategy $x$ which faces the reali zation $a$ of the random process. Many stochastic optimization problems deal with multiple time steps, leading to computationally difficu lt problems ; efficient solutions exist, for example through Bellman's optimality principle, but only provided that the random process is represented by a well structured process, typically an inhomogeneous Markovian process (hopefully with a finite number of states) or a scenario tree. The problem is that in the general case, $A$ is far from b eing Markovian. So, we look for $A'$, "looking like $A$", but belonging to a given family $\A'$ which do es not at all contain $A$. The problem is the numerical evaluation of "$A'$ looks like $A$". A classical method is the use of the Kantorovitch-Rubinstein distance or other transportation metrics \c ite{Pflug}, justified by straightforward bounds on the deviation $|E C(x,A)-E C(x,A')|$ through the use of the Kantorovitch-Rubinstein distance and uniform lipschitz conditions. These approaches might be bett er than the use of high-level statistics \cite{Keefer}. We propose other (pseudo-)distances, based upon refined inequalities, guaranteeing a good choice of $A'$. Moreover, as in many cases, we indeed prefer t he optimization with risk management, e.g. optimization of $EC(x,noise(A))$ where $noise(.)$ is a random noise modelizing the lack of knowledge on the precise random variables, we propose distances which can deal with a user-defined noise. Tests on artificial data sets with realistic loss functions show the rel evance of the method.

PDF file

Bibtex (click on "Export this paper")

Publi 94.



Abstract:

PDF file

Bibtex (click on "Export this paper")