Laboratoire de Recherche en Informatique
Sarah Cohen-Boulakia

Personal information
e-mail cohen @ lri . fr
web page http://www.lri.fr/~cohen
position Assistant professor, Ph.D. in Computer Science
phone +(33) 1 69 15 32 16
fax +(33) 1 69 15 65 86
office 145a
address LRI, Bâtiment 490
Université Paris-Sud
91405 Orsay cedex

News

  • PhD Opportunity (for students with disabilities only)
    Towards an evaluation of the quality of predicted functional annotations
    • Type of contract: CNRS, please see this page for more information
    • Graduate school: 427 - Ecole Doctorale INFORMATIQUE DE PARIS SUD
    • Place : Laboratoire de Recherche en Informatique (LRI), Orsay
    • Abstract: The aim of this work is to develop automatic methods to find protein functions in microbial genomes. Functional annotation scenarios will have to be collected, formalized into workflows and the results obtained in different workflow executions should be compared. Supervised learning methods for predicting annotations will be proposed while an estimation of the quality of annotations obtained will be produced. This work will be done in close collaboration with biologists from IGM (Orsay).
    • PhD advisors: Ch. Froidevaux and S. Cohen-Boulakia (LRI, CNRS UMR 8623, University Paris Sud)

  • I won the ICDE Outstanding Reviewer Award!

Interests

  • Integrating and querying biological & biomedical databases
  • Provenance in scientific workflows
  • Semantic Web, metadata
  • Workflow/dataflow design, software engineering, user requirements
Since graduate school, I have been working on the BioGuide project. Since my postdoctoral fellowship, I have been working on the SHARQ project. I am currently working on the problem of provenance in scientific workflows.

Research Experience

  • Provenance in scientific workflow systems
    This work is done in collaboration with members from the Database group at the University of Pennsylvania: S. Davidson, Z. Bao, S. Khanna and S. Roy.
    I am interested in the problem of storing and querying provenance in scientific workflows. In particular, I have worked on techniques that reduce the large amount of provenance data so that users can focus on relevant information (ZOOM*UserView project). We have demonstrated the benefit of these techniques by participating in the "Provenance Challenge". More information can be found here. Another direction of research is differencing workflow runs (PDiffView project). Very recently, I have been involved in a project on secure views for scientific workflows.

  • GeneValorization
    Collaborative work between LRI and the Institut Curie. GeneValorization is a tool which gives a clear and concise overview of the existing publications on a list of genes. We have used GeneValorization to study several lists of genes involved in cancer. Please have a look at the GeneValorization Web site!

  • BioGuide
    Collaborative work between the LRI bioinformatics group at Paris-Sud University and the database group at UPenn. Have a look at bioguide-project.net!
    BioGuide extends DSS (see below) to be adapted to many user profiles. I designed a questionnaire and interviewed 20 scientists from various domains (cancer study, annotation project, ...) to evaluate their querying needs. In collaboration with C. Froidevaux and S. Davidson, I have designed BioGuide, a generic framework that guides users in selecting the relevant sources to query and the tools to use, according to their preferences (e.g., the reliability level of the sources) and their querying strategies. The biological significance of the results obtained with BioGuide has been shown in the context of Comparative Genomic Hybridization (CGH) analysis performed at the Curie Institute. I have developed the BioGuide system in Java (applet) with the help of Olivier Biton. BioGuide is available for use. The system is very flexible and can be adapted to any biological domain. I have recently developed a module to use BioGuide on top of the SRS system. BioGuideSRS provides access to instances of data!

  • SHARQ (Sharing Heterogeneous and Autonomous Resources and Queries) aims to develop generic tools and technologies for creating and maintaining confederations whose purpose is distributed data sharing, that is, data cooperatives. SHARQ is a collaborative work with two biological partners: the Computational Biology and Informatics Laboratory, led by Chris Stoeckert, and the Pew project group, led by Pete White, from the Children's Hospital of Philadelphia. We propose to develop a specific data cooperative as a biological testbed for evaluating the proposed technologies.
    In this project, I am working on the SHARQ Guide, which is being designed to enable biologists to find relevant information within a peer data management system. It provides assistance not only for users who ask queries, but also for owners of peers who wish to be registered within the Guide. This work is closely related to my work on BioGuide (see above). More information is available here.

Selected Publications

2011
[32] Search, Adapt, and Reuse: The Future of Scientific Workflows. (Sarah Cohen-Boulakia, Ulf Leser) In SIGMOD Record, volume 40, 2011. [bib]
[31] Gene List significance at-a-glance with GeneValorization. (Bryan Brancotte, Anne Biton, Isabelle Bernard-Pierrot, Francois Radvanyi, Fabien Reyal, Sarah Cohen-Boulakia) In Bioinformatics, volume 27, 2011. [bib]
[30] Next generation data integration for Life Sciences (Sarah Cohen-Boulakia, Ulf Leser) In Proc. of the 27th Int. Conf. on Data Engineering (ICDE) IEEE, 2011. [bib]
[29] Using medians to generate consensus rankings for biological data (Sarah Cohen-Boulakia, Alain Denise, Sylvie Hamel) In Proc. SSDBM: Scientific and Statistical Database Management Conference, 2011. [bib]
2010
[28] Privacy Issues in Scientific Workflow Provenance (Susan Davidson, Sanjeev Khanna, Sudeepa Roy, Sarah Cohen-Boulakia) In Proc. of the 1st Int. Workshop on Workflow Approaches to New Data-centric Science (SIGMOD Workshop), 2010. [bib]
2009
[27] Biological Resource Discovery (Zoe Lacroix, Cartik R. Kothari, Peter Mork, Rami Rifaieh, Mark Wilkinson, Juliana Freire, Sarah Cohen-Boulakia) In Encyclopedia of Database Systems, 2009. [bib]
[26] Provenance in Scientific Databases (Sarah Cohen-Boulakia, Wang Chiew Tan) In Encyclopedia of Database Systems, 2009. [bib]
[25] Biological Metadata Management (Zoe Lacroix, Cartik R. Kothari, Peter Mork, Mark Wilkinson, Sarah Cohen-Boulakia) In Encyclopedia of Database Systems, 2009. [bib]
[24] Differencing Provenance in Scientific Workflows (Zhuowei Bao, Sarah Cohen-Boulakia, Susan Davidson, Anat Eyal, Sanjeev Khanna) In Proc. of the 25th Int. Conf. on Data Engineering (ICDE) IEEE, 2009. [bib]
[23] PDiffView: Viewing the Difference in Provenance of Workflow Results (Zhuowei Bao, Sarah Cohen-Boulakia, Susan Davidson, Pierrick Girard) In PVLDB Proc. of the 35th Int. Conf. on Very Large Data Bases, volume 2, 2009. [bib]
[22] BioBrowsing: Making the Most of the Data Available in Entrez (Sarah Cohen-Boulakia, Kevin Masini) In 21st Int. Conf. in Scientific and Statistical Database Management (SSDBM) LNCS 5566 Springer, 2009. [bib]
[21] On User Views in Scientific Workflow Systems (Invited Paper) (Susan Davidson, Yi Chen, Peng Sun, Sarah Cohen-Boulakia) In Proc. of the First Int. Workshop on the role of Semantic Web in Provenance Management (ISWC 2009 Workshop), 2009. [bib]
2008
[20] Addressing the Provenance Challenge using ZOOM (Sarah Cohen-Boulakia, Olivier Biton, Shirley Cohen, Susan B. Davidson) In Concurrency and Computation: Practice and Experience, volume 20, 2008. [bib]
[19] Special Issue: The First Provenance Challenge (Luc Moreau, Bertram Ludäscher, Ilkay Altintas, Roger S. Barga, Shawn Bowers, Steven P. Callahan, George Chin Jr., Ben Clifford, Shirley Cohen, Sarah Cohen-Boulakia, Susan B. Davidson, Ewa Deelman, Luciano A. Digiampietri, Ian T. Foster, Juliana Freire, James Frew, Joe Futrelle, Tara Gibson, Yolanda Gil, Carole A. Goble, Jennifer Golbeck, Paul T. Groth, David A. Holland, Sheng Jiang, Jihie Kim, David Koop, Ales Krenek, Timothy M. McPhillips, Gaurang Mehta, Simon Miles, Dominic Metzger, Steve Munroe, Jim Myers, Beth Plale, Norbert Podhorszki, Varun Ratnakar, Emanuele Santos, Carlos Eduardo Scheidegger, Karen Schuchardt, Margo I. Seltzer, Yogesh L. Simmhan, Claudio T. Silva, Peter Slaughter, Eric G. Stephan, Robert Stevens, Daniele Turi, Huy T. Vo, Michael Wilde, Jun Zhao, Yong Zhao) In Concurrency and Computation: Practice and Experience, volume 20, 2008. [bib]
[18] Data Integration in the Life Sciences 5th International Workshop DILS 2008 Evry France June 25-27 2008. Proceedings (Amos Bairoch, Sarah Cohen-Boulakia, Christine Froidevaux), 2008. [bib]
[17] Review of the selected proceedings of the Fifth International Workshop on Data Integration in the Life Sciences 2008 (Amos Bairoch, Sarah Cohen-Boulakia, Christine Froidevaux) In BMC Bioinformatics, volume 9, 2008. [bib]
[16] Querying and Managing Provenance through User Views in Scientific Workflows (Olivier Biton, Sarah Cohen-Boulakia, Susan B. Davidson, Carmem S. Hara) In Proceedings of the 24th International Conference on Data Engineering ICDE 2008, 2008. [bib]
2007
[15] Provenance in Scientific Workflow Systems (Susan B. Davidson, Sarah Cohen-Boulakia, Anat Eyal, Bertram Ludäscher, Timothy M. McPhillips, Shawn Bowers, Manish Kumar Anand, Juliana Freire) In IEEE Data Eng. Bull., volume 30, 2007. [bib]
[14] Data Integration in the Life Sciences 4th International Workshop DILS 2007 Philadelphia PA USA June 27-29 2007 Proceedings (Sarah Cohen-Boulakia, Val Tannen), 2007. [bib]
[13] Zoom*UserViews: Querying Relevant Provenance in Workflow Systems (Olivier Biton, Sarah Cohen-Boulakia, Susan B. Davidson) In VLDB 2007 Proc. of the International Conference on Very Large Data Bases, 2007. [bib]
[12] BioGuideSRS: querying multiple sources with a user-centric perspective (Sarah Cohen-Boulakia, Olivier Biton, Susan B. Davidson, Christine Froidevaux) In Bioinformatics, volume 23, 2007. [bib]
2006
[11] Towards a Model of Provenance and User Views in Scientific Workflows (Shirley Cohen, Sarah Cohen-Boulakia, Susan B. Davidson) In DILS 2006 Third International Workshop in Data Integration in the Life Sciences, 2006. [bib]
[10] Selecting Biological Data Sources and Tools with XPR a Path Language for RDF (Sarah Cohen-Boulakia, Christine Froidevaux, Emmanuel Pietriga) In PSB 2006 Pacific Symposium on Biocomputing, 2006. [bib]
[9] Path-based Systems to Guide Scientists in the Maze of Biological Data Sources (Sarah Cohen-Boulakia, Susan B. Davidson, Christine Froidevaux, Zoe Lacroix, Maria-Esther Vidal) In J. Bioinformatics and Computational Biology, volume 4, 2006. [bib]
[8] Proteome informatics II. Bioinformatics for comparative proteomics (Frederique Lisacek, Sarah Cohen-Boulakia, Ron D. Appel) In Proteomics, volume 6, 2006. [bib] [pdf]
2005
[7] A User-centric Framework for Accessing Biological Sources and Tools (Sarah Cohen-Boulakia, Susan B. Davidson, Christine Froidevaux) In DILS'05 Data Integration in the Life Sciences Lecture Notes in Computer Science (LNCS) series LNBI, 2005. [bib]
2004
[6] WInGS: A reliability controlled data warehouse for yeast (David Abergel, Sarah Cohen-Boulakia, Frederic Lemoine, Christine Froidevaux, Michel Termier) In Proc. of JOBIM'2004 Journees Ouvertes Biologie Informatique et Mathematiques, 2004. [bib]
[5] Selecting biomedical data sources according to user preferences (Sarah Cohen-Boulakia, Severine Lair, Nicolas Stransky, Stephane Graziani, François Radvanyi, Emmanuel Barillot, Christine Froidevaux) In Bioinformatics, volume 20 Suppl. 1, 2004. [bib] [pdf]
[4] Interrogation de sources biomedicales : gestion des preferences de l'utilisateur (Sarah Cohen-Boulakia, Christine Froidevaux, Severine Lair) In Proc. of EGC'2004 Extraction et Gestion des Connaissances, 2004. [bib] [pdf]
[3] Preferences for Queries in a Mediator Approach (Alain Bidault, Sarah Cohen-Boulakia, Christine Froidevaux) In Proc. of ECAI'2004 European Conference on Artificial Intelligence, 2004. [bib] [pdf]
2002
[2] Integration de Sources de Donnees Genomiques du Web (Christine Froidevaux, Sarah Cohen-Boulakia) In Journees scientifiques du Web Semantique (actes electroniques), 2002. [bib] [pdf]
[1] Genopage : A database of all protein modules encoded by completely sequenced genomes (Sarah Cohen-Boulakia, Christine Froidevaux, Emmanuel Waller, Bernard Labedan) In Proc. of JOBIM 2002 Journees Ouvertes Biologie Informatique et Mathematiques, 2002. [bib] [pdf]

Service

Participation in working groups

  • Provenance Challenges, an international workshop which brings together researchers and industry practitioners interested in provenance for workflow systems.
  • DB/IR day, an American workshop which brings together database and information retrieval researchers and students from academic and research institutions across the tristate area and beyond.
  • ISIBio, a French interdisciplinary working group interested in various aspects of "Information Systems Integration in Biology". This group brings together researchers from seven computer science laboratories and from ten biological laboratories (2004-2006).
  • AS127, the national CNRS working group on integration and interoperability of genomic data sources (2003-2004).
  • PPF, multidisciplinary program "Programme PluriFormation" on Bioinformatics and Genomics. This PPF brings together the bioinformatics groups from three biological laboratories, two computer science laboratories and from the laboratory of mathematics at Orsay campus.

Teaching

University of Paris-Sud 11
I taught at IFIPS (students in computer science) and at BIBS (students in bioinformatics).
  • Faculty since Sept. 2007
    • Software Engineering: UML, JAVA (graduate)
    • Database Management Systems (junior, senior)

  • Temporary Faculty (Attaché Temporaire d'Enseignement et de Recherche), Sept. 2005 - Dec. 2005
    • Software Engineering: UML, JAVA (graduate)
    • Database Management Systems (graduate)

  • Teaching assistant, 2002-2005
    • Software Engineering: UML/OCL specifications (graduate)
    • Database Management Systems (graduate)
    • Algorithms (undergraduate)
    • ADA Programming (undergraduate)

Others

Sioukiou Web site: I wanted to provide a link to the web site of my sister-in-law, Sioukiou,
who is an artist and makes beautiful sculptures. Please have a look at it and enjoy your visit!