Français Anglais
Accueil Annuaire Plan du site
Home > Research results > Dissertations & habilitations
Research results
Faculty habilitation de COHEN-BOULAKIA Sarah
COHEN-BOULAKIA Sarah
Faculty habilitation
Group : Bioinformatics

Data Integration in the Life Sciences: Scientific Workflows, Provenance, and Ranking

Starts on 17/06/2015
Advisor :

Funding :
Affiliation : Université Paris-Sud
Laboratory : Université Paris-Sud

Defended on 17/06/2015, committee :
Peter Buneman, Val Tannen, Alain Viari, Christine Froidevaux, Olivier Gascuel, Ioana Manolescu, Patrick Valduriez

Research activities :

Abstract :
Biological research is a science which derives its findings from the proper analysis of experiments. Today, a large variety of experiments are carried-out in hundreds of labs around the world, and their results are reported in a myriad of different databases, web-sites, publications etc., using different formats, conventions, and schemas. Providing a uniform access to these diverse and distributed databases is the aim of data integration solutions, which have been designed and implemented within the bioinformatics community for more than 20 years. However, the perception of the problem of data integration research in the life sciences has changed: While early approaches concentrated on handling schema-dependent queries over heterogeneous and distributed databases, current research emphasizes instances rather than schemas, tries to place the human back into the loop, and intertwines data integration and data analysis. Transparency -- providing users with the illusion that they are using a centralized database and thus completely hiding the original databases -- was one of the main goals of federated databases. It is not a target anymore. Instead, users want to know exactly which data from which source was used in which way in studies (Provenance). The old model of "first integrate, then analyze" is replaced by a new, process-oriented paradigm: "integration is analysis - and analysis is integration". This paradigm change gives rise to some important research trends. First, the process of integration itself, i.e., the integration workflow, is becoming a research topic in its own. Scientific workflows actually implement the paradigm "integration is analysis". A second trend is the growing importance of sensible ranking, because data sets grow and grow and it becomes increasingly difficult for the biologist user to distinguish relevant data from large and noisy data sets. This HDR thesis outlines my contributions to the field of data integration in the life sciences. More precisely, my work takes place in the first two contexts mentioned above, namely, scientific workflows and biological data ranking. The reported results were obtained from 2005 to late 2014, first as a postdoctoral fellow at the Uniersity of Pennsylvania (Dec 2005 to Aug 2007) and then as an Associate Professor at Université Paris-Sud (LRI, UMR CNRS 8623, Bioinformactics team) and Inria (Saclay-Ile-de-France, AMIB team 2009-2014).

More information: https://hal.archives-ouvertes.fr/tel-01245229
Ph.D. dissertations & Faculty habilitations
DECODING THE PLATFORM SOCIETY: ORGANIZATIONS, MARKETS AND NETWORKS IN THE DIGITAL ECONOMY
The original manuscript conceptualizes the recent rise of digital platforms along three main dimensions: their nature of coordination devices fueled by data, the ensuing transformations of labor, and the accompanying promises of societal innovation. The overall ambition is to unpack the coordination role of the platform and where it stands in the horizon of the classical firm – market duality. It is also to precisely understand how it uses data to do so, where it drives labor, and how it accommodates socially innovative projects. I extend this analysis to show continuity between today’s society dominated by platforms and the “organizational society”, claiming that platforms are organized structures that distribute resources, produce asymmetries of wealth and power, and push social innovation to the periphery of the system. I discuss the policy implications of these tendencies and propose avenues for follow-up research.

DISTRIBUTED COMPUTING WITH LIMITED RESOURCES


VALORISATION DES DONNéES POUR LA RECHERCHE D'EMPLO