Research results
Ph.D de

Group : Bioinformatics

Passage à l'échelle, propriétés et qualité des algorithmes de classements consensuels pour les données biologiques massives (Scalability, properties, and quality of consensus ranking algorithms for massive biological data)

Started on 01/10/2017
Advisor : COHEN-BOULAKIA, Sarah

Funding : Doctoral contract, research only
Affiliation : Université Paris-Saclay
Laboratory : LRI - BioInfo

Defended on 14/06/2021, committee :
Thesis advisor :
- Sarah Cohen-Boulakia, Professor, LISN, Université Paris-Saclay

Reviewers and examiners :
- Guillaume Fertin, Professor, LS2N (Laboratoire des Sciences du Numérique de Nantes), Université de Nantes
- Sylvie Hamel, Professor, Département d’informatique et de recherche opérationnelle, Université de Montréal, Canada

Examiners :
- Mokrane Bouzeghoub, Professor, DAVID, UVSQ, Université Paris-Saclay
- Miguel Couceiro, Professor, LORIA (Laboratoire lorrain de Recherche en Informatique et ses Applications), Université de Lorraine
- Gaëlle Lelandais, Professor, I2BC (Institut de biologie intégrative de la cellule), Université Paris-Saclay
- Stéphane Vialette, CNRS Research Director, LIGM (Laboratoire d’Informatique Gaspard-Monge), Université Gustave Eiffel

Co-advisor, invited committee member :
- Alain Denise, Professor, LISN, Université Paris-Saclay

Invited committee member :
- Adeline Pierrot, Associate Professor (Maître de Conférences), LISN, Université Paris-Saclay

Research activities :

Abstract :
Biologists and physicians regularly query public biological databases, for example to find the genes most strongly associated with a given disease. The choice of keywords is particularly important: synonymous reformulations of the same disease (for example "breast cancer" and "breast carcinoma") may lead to very different rankings of (thousands of) genes. The genes, sorted by relevance, can be tied (equal importance with respect to the disease). Additionally, some genes returned for one synonym may be absent for another. Such rankings are called "incomplete rankings with ties". The challenge is to combine the information provided by these different rankings of genes. The problem of taking a list of rankings as input and returning a so-called consensus ranking, as close as possible to the input rankings, is called the "rank aggregation problem". This problem is known to be NP-hard. Whereas most work focuses on complete rankings without ties, we consider incomplete rankings with ties. Our contributions are divided into three parts. First, we designed a graph-based heuristic that divides the initial problem into independent sub-problems in the context of incomplete rankings with ties. Second, we designed an algorithm that identifies the properties shared by all optimal consensus rankings, which provides information about the robustness of the returned consensus ranking. An experimental study on a large number of massive biological datasets highlighted the biological relevance of these approaches. Finally, we designed a parameterized model able to consider various interpretations of missing data. We also designed several algorithms for this model and carried out an axiomatic study of it, based on social choice theory.
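
To make the rank aggregation problem described in the abstract concrete, here is a minimal Python sketch. It is not the method of the thesis. It rests on several assumptions that are not in the abstract: a ranking is a list of buckets (tied elements share a bucket), the distance is a Kendall-tau-style pairwise count restricted to the elements present in both rankings (only one possible interpretation of missing data), the `tie_cost` parameter and its default value are illustrative, and the consensus is searched by brute force among strict orders only. The element names `g1` to `g4` are hypothetical placeholders, not real query results.

```python
from itertools import combinations, permutations


def positions(ranking):
    """Map each element to the index of its bucket (tied elements share an index)."""
    return {elem: i for i, bucket in enumerate(ranking) for elem in bucket}


def distance(r1, r2, tie_cost=1.0):
    """Kendall-tau-style distance between two rankings with ties.

    Assumption: only pairs of elements present in BOTH rankings are compared.
    A pair costs 1 if it is ordered in opposite ways, and `tie_cost` if it is
    tied in exactly one of the two rankings.
    """
    p1, p2 = positions(r1), positions(r2)
    common = sorted(set(p1) & set(p2))
    total = 0.0
    for a, b in combinations(common, 2):
        d1 = p1[a] - p1[b]
        d2 = p2[a] - p2[b]
        if d1 * d2 < 0:
            total += 1.0       # opposite relative order
        elif (d1 == 0) != (d2 == 0):
            total += tie_cost  # tied in one ranking, ordered in the other
    return total


def brute_force_consensus(rankings, tie_cost=1.0):
    """Return a strict order minimizing the summed distance to the inputs.

    Exhaustive search over all permutations: factorial time, so usable only
    on a handful of elements. Candidates with ties are not explored.
    """
    elements = sorted({e for r in rankings for bucket in r for e in bucket})
    best, best_score = None, float("inf")
    for perm in permutations(elements):
        candidate = [[e] for e in perm]
        score = sum(distance(candidate, r, tie_cost) for r in rankings)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical input: three incomplete rankings with ties.
example = [
    [["g1"], ["g2", "g3"], ["g4"]],   # g2 and g3 are tied
    [["g1", "g2"], ["g3"]],           # g4 is missing
    [["g2"], ["g1"], ["g4"], ["g3"]],
]
consensus, score = brute_force_consensus(example)
print(consensus, score)
```

The exhaustive search is only practical for a few elements, which is consistent with the NP-hardness mentioned in the abstract and with the need for heuristics on rankings of thousands of genes.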

Ph.D. dissertations & Faculty habilitations


The topic of this habilitation is the study of very small data visualizations, micro visualizations, in display contexts that can only dedicate minimal rendering space to data representations. For several years, together with my collaborators, I have been studying human perception of, interaction with, and analysis with micro visualizations in multiple contexts. In this document I bring together three of my research streams related to micro visualizations: data glyphs, where my joint research focused on the perception of small-multiple micro visualizations; word-scale visualizations, where my joint research focused on small visualizations embedded in text documents; and small mobile data visualizations for smartwatches or fitness trackers. I consider these types of small visualizations together under the umbrella term "micro visualizations." Micro visualizations are useful in multiple visualization contexts, and I have been working towards a better understanding of the complexities involved in designing and using them. Here, I define the term micro visualization, summarize my own and other past research and design guidelines, and outline several design spaces for different types of micro visualizations based on some of the work I have been involved in since my PhD.