Laboratoire de Recherche en Informatique
Sarah Cohen-Boulakia

Personal information
e-mail cohen @ lri . fr
web page http://www.lri.fr/~cohen
position Assistant professor, Ph.D. in Computer Science
phone +(33) 1 69 15 32 16
fax +(33) 1 69 15 65 86
office 145a
address LRI, Bâtiment 490
Université Paris-Sud
91405 Orsay cedex

Interests

  • Integrating and querying biological & biomedical databases
  • Provenance in scientific workflows
  • Semantic Web, metadata
  • Workflow/dataflow design, software engineering, user requirements
Since graduate school, I have been working on the BioGuide project. Since my postdoctoral fellowship, I have been working on the SHARQ project. I am currently working on the problem of provenance in scientific workflows.

During my PhD, I worked in several fields: see HKIS, Cooperative Mediator, and Design and Development of Biological Databases.

Research Experience

Current Projects

  • Provenance for scientific workflow systems
    This project aims to provide a formal model of provenance for scientific workflows which is both simple and general (i.e. can be used with existing workflow systems, such as Ptolemy/Kepler and myGrid) and sufficiently expressive to answer the provenance queries encountered in case studies. Interestingly, the proposed model not only takes into account the chained and complex structures of scientific workflows, but also allows for reasoning about provenance at different levels of abstraction through user views.
    In the context of this project, I participated, with other members of the UPenn database group, in the "Provenance Challenge". More information can be found here.



  • BioGuide is a collaborative work between the LRI bioinformatics group at Paris-South University and the database group at UPenn. Have a look at bioguide-project.net!
    BioGuide extends DSS (see below) so that it can be adapted to many user profiles. I designed a questionnaire and interviewed 20 scientists from various domains (cancer studies, annotation projects, ...) to evaluate their needs when querying. In collaboration with C. Froidevaux and S. Davidson, I designed BioGuide, a generic framework that guides users in selecting the relevant sources to query and the tools to use, according to their preferences (e.g., the reliability level of the sources) and their querying strategies. The biological significance of the results obtained with BioGuide has been shown in the context of Comparative Genomic Hybridization (CGH) analysis performed at the Curie Institute. I developed the BioGuide system in Java (as an applet) with the help of Olivier Biton. BioGuide is available for use. The system is very flexible and can be adapted to any biological domain. I have recently developed a module for using BioGuide on top of the SRS system: BioGuideSRS provides access to instances of data.

  • SHARQ (Sharing Heterogeneous and Autonomous Resources and Queries) aims to develop generic tools and technologies for creating and maintaining confederations whose purpose is distributed data sharing, that is, data cooperatives. SHARQ is a collaborative work with two biological partners: the Computational Biology and Informatics Laboratory, led by Chris Stoeckert, and the Pew project group, led by Pete White, from the Children's Hospital of Philadelphia. We propose to develop a specific data cooperative as a biological testbed for evaluating the proposed technologies.
    In this project, I am working on the SHARQ Guide, which is being designed to enable biologists to find relevant information within a peer data management system. It provides assistance not only for users who ask queries, but also for owners of peers who wish to be registered within the Guide. This work is closely related to my work on BioGuide (see above). More information is available here.

Past Projects



  • HKIS

    HKIS is a European research & development project between five partners: ISoft Company (Gif-Sur-Yvette, France), Curie Institute (Paris, France), University of Medicine of Ulm (Ulm, Germany), European Institute of Oncology (Milano, Italy) and University of Paris-South (Orsay, France). HKIS was a central component of my dissertation. This project aims at developing an integrative software platform for biological and biomedical data processing in oncology. I have contributed to HKIS as follows:
    • I participated in the collection of the user requirements (sources and tools accessed, bioinformatics tasks performed by the partners). I designed a framework to represent the HKIS analysis scenarios. Each scenario reflects the way HKIS users manage their data.
    • I developed the integrating schema of the HKIS platform by making explicit the biological entities contained in the sources selected by the partners and by capturing important metadata from sources (e.g. reliability of an entity in a source).
    • I designed the DSS (Data Source Selection) algorithm, which aims at guiding the HKIS users in the task of selecting data sources. DSS provides the user with alternative ways of finding data in the sources: it allows the user to exploit complementary information and serves as a guide for dealing with divergent data. This algorithm was developed following the HKIS users' querying process.
    More about HKIS...

  • Cooperative Mediator

    Collaborative work between members of the LRI Bioinformatics group. This data integration project is based on the Picsel project, an innovative mediator about to become an industrial product. In Picsel, the language used to express queries and describe the sources (a description logic) is very simple and can be easily understood by end-users. Our aim is to exploit the capabilities of Picsel in the context of biological data and to propose an extended mediator system allowing both transparent querying (as usual) and a cooperative answering process (which meets specific biologist requirements). We address the problem of expressing and answering cooperative queries while keeping a tractable logical framework. We let users specify properties (metadata) of the sources they would like to access, and our proposal makes it possible to trace the origins of the answers obtained.

  • Design and Development of Biological Databases

    Collaborative work with biologists from the Institute of Genetics and Microbiology, Orsay (UMR 8621, CNRS).
    • Development of WInGS, a local data warehouse dedicated to yeast.
    • Development of Genopage, a database of protein modules encoded by completely sequenced genomes.
    Development under PostgreSQL, ProC, PHP.

Selected Publications

* indicates that I gave the corresponding conference presentation.
    Book Chapter

  • Sarah Cohen-Boulakia and Wang-Chiew Tan.
    Provenance in scientific databases.
    Encyclopedia of Database Systems. (Invited entry) Springer, M. T. Özsu and L. Liu, editors.

    Proceedings Editor

  • Amos Bairoch, Sarah Cohen-Boulakia, Christine Froidevaux.
    Review of the selected proceedings of the Fifth International Workshop on Data Integration in the Life Sciences.
    BMC Bioinformatics 2008, 9(Suppl 8). Available here.

  • Amos Bairoch, Sarah Cohen-Boulakia, Christine Froidevaux.
    Proceedings of DILS 2008, 5th International Workshop in Data Integration in the Life Sciences
    Lecture Notes in Computer Science, Vol. 5109 Springer-Verlag, Evry, France, June 25-27, 2008.

  • Sarah Cohen-Boulakia, Val Tannen.
    Proceedings of DILS 2007, 4th International Workshop in Data Integration in the Life Sciences
    Lecture Notes in Computer Science, Vol. 4544 Springer-Verlag, Philadelphia, PA, USA, June 27-29, 2007.


    International peer-reviewed journals

  • [1] Sarah Cohen-Boulakia, Olivier Biton, Shirley Cohen, Susan B. Davidson.
    Addressing the Provenance Challenge using ZOOM.
    In Concurrency and Computation: Practice and Experience, Wiley InterScience, 2008.

  • [2] Sarah Cohen-Boulakia, Olivier Biton, Shirley Cohen, Susan B. Davidson.
    Addressing the Provenance Challenge using ZOOM.
    In Concurrency and Computation: Practice and Experience, 20(5):497-506, Wiley InterScience, 2008.

  • [3] (Two organizers of the challenge followed by all the participants in alphabetical order)
    Luc Moreau, Bertram Ludäscher, Ilkay Altintas, Roger S. Barga, Shawn Bowers, Steven P. Callahan, George Chin Jr., Ben Clifford, Shirley Cohen, Sarah Cohen-Boulakia, Susan B. Davidson, Ewa Deelman, Luciano A. Digiampietri, Ian T. Foster, Juliana Freire, James Frew, Joe Futrelle, Tara Gibson, Yolanda Gil, Carole A. Goble, Jennifer Golbeck, Paul T. Groth, David A. Holland, Sheng Jiang, Jihie Kim, David Koop, Ales Krenek, Timothy M. McPhillips, Gaurang Mehta, Simon Miles, Dominic Metzger, Steve Munroe, Jim Myers, Beth Plale, Norbert Podhorszki, Varun Ratnakar, Emanuele Santos, Carlos Eduardo Scheidegger, Karen Schuchardt, Margo I. Seltzer, Yogesh L. Simmhan, Cláudio T. Silva, Peter Slaughter, Eric G. Stephan, Robert Stevens, Daniele Turi, Huy T. Vo, Michael Wilde, Jun Zhao, Yong Zhao.
    Special Issue: The First Provenance Challenge.
    In Concurrency and Computation: Practice and Experience, 20(5):409-418, Wiley InterScience, 2008.

  • [4] Susan B. Davidson, Sarah Cohen-Boulakia, Anat Eyal, Bertram Ludaescher, Timothy M. McPhillips, Shawn Bowers, Manish Kumar Anand, Juliana Freire.
    Provenance in Scientific Workflow Systems.
    In IEEE Data Eng. Bull., 30(4):44-50, 2007.

  • [5] Sarah Cohen-Boulakia, Susan Davidson, Christine Froidevaux, Zoe Lacroix, and Maria-Esther Vidal.
    Path-based systems to guide scientists in the maze of biological data sources.
    In Journal of Bioinformatics and Computational Biology (JBCB), Oct. 2006, 4(5), pp. 1069-95.

  • [6] Frederique Lisacek, Sarah Cohen-Boulakia, and Ron D. Appel.
    Proteome informatics II. Bioinformatics for comparative proteomics.
    In Proteomics, 2006.

  • [7] Sarah Cohen-Boulakia, Séverine Lair, Nicolas Stransky, Stéphane Graziani, François Radvanyi, Emmanuel Barillot and Christine Froidevaux.
    Selecting biomedical data sources according to user preferences.*
    In Bioinformatics, 20(1):i86-i93, Special number, Proceedings of ISMB/ECCB 2004, Glasgow, UK, 2004.

    International peer-reviewed conferences

  • [8] Zhuowei Bao, Sarah Cohen Boulakia, Susan Davidson, Anat Eyal, Sanjeev Khanna
    Differencing Provenance in Scientific Workflows.
    Proceedings of ICDE 2009 (to appear), 2009.


  • [9] Sarah Cohen Boulakia, Kevin Massini
    BioBrowsing: Making the most of the data available in Entrez.
    Proceedings of SSDBM 2009 (to appear), 2009.


  • [10] Olivier Biton, Sarah Cohen Boulakia, Susan B. Davidson, Carmem Hara
    Querying and Managing Provenance through User Views in Scientific Workflows.
    Proceedings of ICDE 2008 (to appear), 2008.


  • [11] Olivier Biton, Sarah Cohen Boulakia, Susan B. Davidson, Carmem Hara
    Zoom*UserViews: Querying Relevant Provenance in Workflow Systems.
    Proceedings of VLDB 2007, pp. 1366-1369, Vienna, Austria.


  • [12] Shirley Cohen, Sarah Cohen-Boulakia and Susan Davidson.
    Towards a Model of Scientific workflows and User Views.*
    Proceedings of DILS'06, Data Integration for the Life Sciences, Springer-Verlag, Lecture Notes in Bioinformatics (LNBI), Cambridge, UK, 2006.


  • [13] Sarah Cohen-Boulakia, Christine Froidevaux and Emmanuel Pietriga.
    Selecting Biological Data Sources and Tools with XPR, a Path Language for RDF.
    Proceedings of PSB'06, Pacific Symposium on Biocomputing, 2006.
    BioGuide Site.

  • [14] Sarah Cohen-Boulakia, Susan Davidson and Christine Froidevaux.
    A User-centric Framework for Accessing Biological Sources and Tools.*
    Proceedings of DILS'05, Data Integration for the Life Sciences, Springer-Verlag, Lecture Notes in Bioinformatics (LNBI), Num. 3615, pp. 3-18, San Diego, USA, 2005.
    BioGuide Site.

  • [15] Alain Bidault, Sarah Cohen-Boulakia and Christine Froidevaux.
    Preferences for Queries in a Mediator Approach.
    In Proceedings of ECAI'2004, European Conference on Artificial Intelligence, pp. 963-964.

  • [16] Sarah Cohen-Boulakia, Olivier Biton, Shirley Cohen, Zachary Ives, Val Tannen and Susan Davidson.
    SHARQ Guide: Finding relevant biological data and queries in a peer data management system.*
    Poster (Selected for oral presentation), DILS'06, Data Integration for the Life Sciences, Cambridge, UK, 2006.

    National peer-reviewed conferences

  • [17] Sarah Cohen-Boulakia, Christine Froidevaux and Severine Lair.
    Interrogation de sources biomédicales : gestion des préférences de l'utilisateur.* (In French)
    In Proceedings of EGC'2004, Extraction et Gestion des Connaissances, pp. 53-64.

  • [18] David Abergel, Sarah Cohen-Boulakia, Frédéric Lemoine, Christine Froidevaux and Michel Termier.
    WInGS: A reliability controlled data warehouse for yeast.
    In Proceedings of JOBIM'2004, Journées Ouvertes, Biologie, Informatique et Mathématiques (CD-ROM).

  • [19] Sarah Cohen-Boulakia, Christine Froidevaux, Emmanuel Waller and Bernard Labedan.
    Genopage: a Database of all proteins modules encoded by completely sequenced genomes.*
    In Proceedings of JOBIM'2002, Journées Ouvertes, Biologie, Informatique et Mathématiques, pp. 187-191.

    National peer-reviewed workshops
  • Sarah Cohen-Boulakia.
    BioGuide: Choisir les sources et les outils adaptés aux préférences et aux stratégies des utilisateurs.* (In French)
    In Proceedings of Ontologie, Grille et intégration Sémantique pour la Biologie OGSB '05, Lyon, France, 2005.

  • Christine Froidevaux and Sarah Cohen-Boulakia.
    Intégration de sources de données génomiques du Web. (In French)
    In Proceedings of Journées Scientifiques du Web Sémantique (On-line Proc.), October 11-12th, 2002.

    Research Reports

  • UPS-LRI : Christine Froidevaux and Sarah Cohen-Boulakia with contribution from all the HKIS-partners.
    Deliverable D2.1: List of Data Sources and Data Model, 2003.

Talks (main)


Complete list of Talks
  • Provenance in Scientific Workflows: ZOOM with user views. Invited talk, University of Maryland, USA. (December 12th, 2006)

  • Modeling Provenance through User views. Provenance Challenge, Washington DC, USA. (September 13th, 2006)

  • Querying multiple biological sources with BioGuideSRS. Invited talk, European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton Cambridge, UK. (July 19th, 2006)

  • A user-centric approach to query alternative biological data sources: New features of BioGuideSRS. Bioinformatics group lunch meeting, CBIL, Computational Biology and Informatics Laboratory, University of Pennsylvania, Philadelphia, USA. (May 31st, 2006)

  • BioGuide: Supporting the scientist during the selection of sources and tools. Invited talk, CBIL, Computational Biology and Informatics Laboratory, University of Pennsylvania, Philadelphia, USA. (February 2nd, 2006)

  • BioGuide: Supporting the scientist during the selection of sources and tools. Invited talk, Group of Oncology, Pediatrics, Children Hospital of Philadelphia, USA. (January 30th, 2006)

  • Supporting the scientist during the selection of sources and tools. Penn Database Research Group, Penn University, Philadelphia, USA. (January 19th, 2006)

  • BioGuide: A User-centric approach for querying biological data. Invited talk, Computer Science and IT with/for Biology, An interdisciplinary Seminar Series, Bolzano, Italy (November 30th, 2005)


Service

  • Workshop co-organizer

    • DILS'08, the 5th Annual International workshop on Data Integration in the Life Sciences, University of Evry, France 2008.
      • Program co-Chair (with Amos Bairoch, SwissProt and Christine Froidevaux, LRI)


    • DILS'07, the 4th Annual International workshop on Data Integration in the Life Sciences, University of Pennsylvania, USA, 2007.
      • Proceedings co-editor (with Val Tannen), LNBI, Springer
      • Publicity Chair
      • Web Master


    • II 2007 (Workshop in Information Integration), University of Pennsylvania, USA.
      • Local organizer
      • Editor of the Proceedings (printed locally)


  • Reviewer for the following conferences and journals

  • University of Paris-Sud 11, Computer Science Department

    • Dean of the Bioinformatics and Biostatistics (BIBS) undergraduate studies.

Participation in working groups

  • Provenance Challenges, an international workshop which brings together researchers and industry practitioners interested in provenance for workflow systems.
  • DB/IR day, an American workshop which brings together database and information retrieval researchers and students from academic and research institutions across the tristate area and beyond.
  • ISIBio, a French interdisciplinary working group interested in various aspects of "Information Systems Integration in Biology". This group brings together researchers from seven computer science laboratories and from ten biological laboratories (2004-2006).
  • AS127, the national CNRS working group on integration and interoperability of genomic data sources (2003-2004).
  • PPF, multidisciplinary program "Programme PluriFormation" on Bioinformatics and Genomics. This PPF brings together the bioinformatics groups from three biological laboratories, two computer science laboratories and the laboratory of mathematics at the Orsay campus.

Teaching

University of Paris-Sud 11
I taught at IFIPS (students in computer science) and at BIBS (students in bioinformatics).
  • Faculty since Sept. 2007
    • Software Engineering: UML, JAVA (graduate)
    • Database Management Systems (junior, senior)

  • Temporary Faculty (Attaché Temporaire d'Enseignement et de Recherche), Sept. 2005 - Dec. 2005
    • Software Engineering: UML, JAVA (graduate), 2005
    • Database Management Systems (graduate), 2005

  • Teaching assistant, 2002-2005
    • Software Engineering: UML/OCL specifications (graduate), 2004-2005
    • Database Management Systems (graduate), 2003-2005
    • Algorithms (undergraduate), 2003-2004
    • ADA Programming (undergraduate), 2002-2003