Ph.D

Group : Bioinformatics

*Algorithmique de l'alignement structure-séquence d'ARN: une approche générale et paramétr*
Starts on 01/10/2009

Advisor : DENISE, Alain

**Funding :** CDD sur contrat UPS

**Affiliation :** Université Paris-Sud

**Laboratory :** LRI BIO-INFO

**Defended** on 05/12/2012, committee :

Rapporteurs:

Jean-Claude König, Professeur,Université de Montpellier, France

Stéphane Vialette, Directeur de recherche, Université de Marne-la-vallée, France

Examinateurs:

Yannis Manoussakis, Professeur, Université Paris-Sud, France

Alessandra Carbone, Professeur, Université Pierre et Marie Curie, France

Directeurs de thèse:

Alain Denise, Professeur, Université Paris-Sud, France

Dominique Barth, Professeur, Université Versailles-Saint-Quentin

**Research activities :**
**Abstract :**
Non-coding RNA macromolecules are involved in the metabolism of all living beings. From the computational point of view, their two major biological problems are: the prediction of their structure to better understand their functions and their detection in databases or genomes. The RNA structure-sequence alignment addresses these two issues. The RNA structure-sequence aligment is to align a known structure of a first RNA with the sequence of a second RNA. The structure is represented as an arc-annotated sequence and the sequence represents the RNA nucleotide sequence. To solve this problem, we want to optimize the alignment according to a cost function. So this is an optimization problem, which is NP-hard. Accordingly, different works define several reduced structure classes for which they propose specific algorithms but with polynomial complexity.

The presented work unifies and generalizes all previous approaches by building a unique non-specific class algorithm with parametrized complexity. This algorithm uses a technique from graph theory: the decomposition tree, that is to say, it transforms the given structure into a tree-decomposition and then I will explain how to align this decomposition with the sequence. I will then highlight why the implementation of this approach requires a reformulation of the problem as well as a substantial modification to the conventional use of dynamic programming for tree decompositions. This leads to a parameterized algorithm whose parameter is entirely related to the tree-decomposition.

The construction of tree decomposition for which alignment is the most effective is unfortunately also a NP-hard problem. However, I will outline a heuristic decompositions construction adapted to RNA structures and show that the complexity of the approach (solving the problem in its generality) equals or outperforms all previous approaches in their respective structure classes. I will finish by presenting new structure classes which extend existing ones without degrating the complexity of the alignment but which can represent the majority of known structures containing many important elements not previously taken into account (such as RNA tertiary motifs).