Research Interests

  • Integration of big and heterogeneous biological and biomedical data
  • Provenance in scientific workflows
  • Querying and ranking biological and biomedical data
  • Workflow/dataflow design, software engineering, user requirements
  • Semantic Web, metadata, quality

Current Research Projects on Ranking Biological Data

QualiBioConsensus (ongoing since April 2016, Mastodons call)

  • Topic: Ranking biological data using consensus ranking techniques.
  • Partners: Laboratoire d'Informatique de Marne-la-Vallée, IFB (Institut Français de Bioinformatique, Gif-sur-Yvette)
  • Collaborators: APHP Paul-Brousse, APHP Georges Pompidou
  • Main Web page: QualiBioConsensus
  • Tools involved: ConQur-Bio [39], Rank-and-ties
  • Master or PhD student involved: Pierre Andrieu 
  • Previous related projects or past PhDs involved: RankaBio (PEPS Fascido call, 2015), PhD of Bryan Brancotte [42], [43]

Current Research Projects on Scientific Workflows

Analysing plant phenotyping data with scientific workflows
  • Topic: Designing and executing scientific workflows to analyse highly complex, large-scale plant datasets.
  • Collaborators: Inria VirtualPlants (in particular, Christophe Pradal) and Zenith (in particular, Patrick Valduriez) groups at the Institute of Computational Biology, Montpellier; INRA Montpellier (in particular, Pascal Neuveu).
  • Tools involved: OpenAlea [45] and InfraPhenoGrid [47], both developed by Christophe Pradal et al.
  • Previous related projects: IBC Junior project grant

Reproducibility and reuse of experiments in the life sciences using scientific workflows 
  • Topic: Studying the ability of current scientific workflow systems to support various levels of reproducibility and reuse; sharing experience in using scientific workflows and companion tools to enhance reproducibility [48]
  • Groups involved: This work is conducted within two working groups, Bigdata@IFB and ReprovirtuFlow@GDR MaDICS, which involve a large number of laboratories in computer science (e.g., LRI, LORIA, LAMSADE, Inria groups) and in biology/medicine (Institut Pasteur, CHU of Nantes, INRA, Inserm...).
  • Web pages of the working groups: ReprovirtuFlow@GDR MaDICS, Bigdata@IFB
  • Previous related projects: similarity search in scientific workflows [38], [41], [46].

Next Generation 

  • Topic: With 50,000 data analyses per month and more than 1,500 citations (Google Scholar), the phylogenetic analysis pipeline is one of the most visible French IT resources at both the national and international levels. It is now used for teaching, which can involve hundreds of simultaneous users, and some users run it in batch mode, submitting large numbers of requests to the same server. In this project, we therefore plan to increase the robustness of the pipeline. The originality of the new version lies in coupling a scientific workflow environment (Galaxy) with a web interface that allows visualization of and interaction with phylogenetic objects. More precisely, this project will provide (i) a large set of phylogenetic analysis bricks, each giving access to diverse programs, all encapsulated into Galaxy so that the system can handle large groups of users and/or large datasets; (ii) a set of optimized, robust and expressive workflows extending the basic phylogenetic workflow to a variety of rich phylogenetic analysis contexts; (iii) an easy-to-install environment, built on top of Galaxy, with a new visualization layer dedicated to phylogenetic analyses.
  • Groups involved: Institut Pasteur, LIRMM, LRI, IGS
  • Previous related projects: DistillFlow, refactoring scientific workflows [40], [35], [37].

Current Collaborations

With groups from computer science labs (all groups are interested in computational biology)

  • Since 2014: Inria groups Virtual Plants and Zenith at the Institute of Computational Biology (IBC), Montpellier
  • Since 2012: Universities of Manchester and Newcastle
  • Since 2010: University of Montreal
  • Since 2010: Humboldt University of Berlin
  • Since 2005: University of Pennsylvania

With groups from bioinformatics labs and hospitals

  • Since 2016: Hospital Paul-Brousse, APHP
  • Since 2015: Institut Pasteur (C3BI), Paris
  • Since 2014: Hospital Georges Pompidou, APHP
  • Since 2002: Institut Curie (Bioinformatics group and molecular ontology group)

Supervision of Students

I currently co-supervise Karima Rafès with Serge Abiteboul. Karima works on designing platforms for sharing datasets and queries.
In France, I have co-supervised the following PhD students:
  • Bryan Brancotte (co-supervised with Alain Denise) worked on consensus ranking techniques to rank biological data and defended on September 25, 2015 (now a research engineer at the French Institute of Bioinformatics).
  • Jiuqiang Chen (co-supervised with Christine Froidevaux) defended in October 2013. His thesis was about designing scientific workflows following a structure- and provenance-aware strategy.

I have participated in the supervision of the following PhD students: J. Starlinger (with Ulf Leser, Humboldt, Berlin), Z. Bao (with Susan Davidson, UPenn)

I have (co-)supervised the following master students in the past years:
  • Stéphanie Kamgnia, on "Frequent patterns in scientific workflows"
  • With Patrick Valduriez: Moussa Yattara, on "Provenance models in scientific workflow systems"
  • With Christine Froidevaux: Jun Li on "Series-parallel graphs for scientific workflows", Nicolas Laignel (co-supervised with Ulf Leser) on "Ranking biological data with PageRank-like solutions", Wael Hamdam [Research, Computer Science, 2nd year] on "Classifying provenance queries"
  • With Alain Denise: Bryan Brancotte, Pierre Andrieu on "Ranking biological data using consensus ranking approaches." 
  • With Susan Davidson: Weijia Wang, Pierrick Girard [Engineering school, Polytech Paris Sud] on "Comparing scientific workflow executions"
  • Heloïse Bourlet [Bioinformatics program, 1st year] on "Designing workflows with Taverna"
  • Kevin Massini [Bioinformatics program, 1st year] on "Designing BioBrowsing"

Invited Talks

International context

  • November 2016: Instituto de Matemáticas de la UNAM, Meeting on Data Analysis, Mexico
  • November 2016: Bayer Crop Sciences, Ghent, Belgium
  • December 2014: Universität zu Berlin, Germany
  • April 2014: University of Pennsylvania, USA
  • July 2012: University of Manchester, UK
  • February 2012: Dagstuhl Seminar "Principles of Provenance", Germany
  • July 2009: Children’s Hospital of Philadelphia, USA
  • July 2007: Penn Computational Biology Institute, USA
  • November 2006: University of Maryland, USA
  • December 2005: University of Bozen-Bolzano, Italy

National context

  • January 2017: Annual scientific meeting of the Institut de Biologie Computationnelle (IBC), Montpellier
  • December 2016: International Workshop DaQuata, Lyon
  • May 2016: Laboratoire d’Informatique de Grenoble (LIG), laboratory seminar
  • June 2015: Cumulo Numbio summer school, Aussois
  • May 2015: DigiCosme DataSense spring school, ENSTA ParisTech, Palaiseau
  • February 2015: Institut Pasteur, C3BI kick-off
  • February 2015: Laboratoire d’Informatique Gaspard-Monge (LIGM), Algo seminar, Marne-la-Vallée
  • February 2014: Institut de Biologie Computationnelle (IBC), Montpellier, laboratory seminar
  • May 2013: Closing seminar of the ANR DAG project, Besse
  • July 2012: Internet Memory, Saclay
  • June 2012: Neurospin (CEA, INRIA), Saclay
  • May 2012: BDA thematic summer school "Masses de données" (massive data), Aussois
  • November 2010: Institut de Génétique et Microbiologie (IGM), Orsay
  • July 2010: One-year review of the ANR BioWic project (D. Lavenier), Perpignan
  • May 2010: Plan Pluri-Formation Lille, scientific day "Intégration de données hétérogènes" (heterogeneous data integration; H. Touzet, M. Paupin), Institut Pasteur de Lille
  • December 2007: Cédric (CNAM), Database team seminar
  • October 2007: Business Objects (SAP), Puteaux