Laboratoire de Recherche en Informatique
Sarah Cohen-Boulakia

Personal information
e-mail cohen @ lri . fr
web page http://www.lri.fr/~cohen
position Associate professor (maître de conférences HDR), Ph.D. in Computer Science
phone +(33) 1 69 15 32 16
fax +(33) 1 69 15 65 86
office 140
address LRI, Bâtiment 650
Université Paris-Sud
91405 Orsay cedex

News

I am currently on sabbatical leave in Montpellier at the Institute of Computational Biology, in the INRIA groups Virtual Plants and Zenith.

I am the Program Committee chair of the next TaPP international workshop (8th USENIX International Workshop on the Theory and Practice of Provenance). Please submit papers and participate in the workshop (Washington DC, June 8-9)!

I co-lead the working group (Action) on "Reproducibility of scientific experiments" with Ch. Blanchet (IFB) within the GDR MaDICS. Please contact us if you want to participate in the group.

Interests

  • Provenance in scientific workflows
  • Integrating, querying and ranking biological & biomedical databases
  • Workflow/dataflow design, software engineering, user requirements
  • Semantic Web, metadata
I am currently working on the problem of provenance in scientific workflows and on ranking biological data.

Research Experience

  • Provenance in scientific workflow systems
    I am interested in the problem of storing and querying provenance in scientific workflows. I am working on techniques to reduce the complexity of the structure of scientific workflows (SPFlow and DistillFlow projects).
    Previously, I worked on techniques to reduce the large volume of provenance data and help users focus on relevant information (ZOOM*UserView project). We demonstrated the benefit of these techniques by participating in the "Provenance Challenge". More information can be found here. Another direction of research is differencing workflow runs (PDiffView project). Very recently, I have been involved in a project on secure views for scientific workflows.

  • BioConsert and ConQuR-Bio
    Using consensus ranking techniques to order the results of several biological queries. This is the main topic of Bryan Brancotte's thesis. Please have a look at the ConQuR-Bio Web site!

  • GeneValorization
    Collaborative work between LRI and the Institut Curie. GeneValorization is a tool which gives a clear and concise overview of the publications existing on a list of genes. We have used GeneValorization to study several lists of genes involved in cancer. The main developer and designer of this project is Bryan Brancotte. Please have a look at the GeneValorization Web site!

  • BioGuide
    Collaborative work between the LRI bioinformatics group at Paris-Sud University and the database group at UPenn. Have a look at bioguide-project.net!
    BioGuide extends DSS to be adapted to many user profiles. I designed a questionnaire and interviewed 20 scientists from various domains (cancer study, annotation project, ...) to evaluate their needs in the process of querying. In collaboration with C. Froidevaux and S. Davidson, I designed BioGuide, a generic framework that guides users in selecting the relevant sources to be queried and the tools to be used according to their preferences (e.g., the reliability level of the sources) and following their querying strategies. The biological significance of the results obtained with BioGuide has been shown in the context of Comparative Genomic Hybridization (CGH) analysis performed at the Curie Institute. I developed the BioGuide system in Java (applet) with the help of Olivier Biton. BioGuide is available for use. The system is very flexible and can be adapted to any biological domain. I have developed a module to use BioGuide on top of the SRS system. BioGuideSRS provides access to instances of data!

Supervision of Students

In France, I have co-supervised the following PhD students:
  • Bryan Brancotte, who worked on consensus ranking techniques to rank biological data (co-supervised with Alain Denise) and defended on September 25th, 2015 (now research engineer at the French Institute of Bioinformatics).
  • Jiuqiang Chen, who defended in Oct. 2013. His thesis was about designing scientific workflows following a structure- and provenance-aware strategy (co-supervised with Christine Froidevaux).
I have (co-)supervised the following master students in the past years:
  • Stéphanie Kamgnia
  • With Patrick Valduriez : Moussa Yattara
  • With Christine Froidevaux : Jun Li, Nicolas Laignel (co-supervised with Ulf Leser), Wael Hamdam [Research, Computer Science, 2nd year]
  • With Alain Denise: Bryan Brancotte
  • With Susan Davidson: Weijia Wang, Pierrick Girard [Engineering school, Polytech Paris Sud]
  • Heloise Bourlet, Kevin Massini [Bioinformatics program, 1st year]

Collaborations

  • Since 2005 : University of Pennsylvania (Susan Davidson and the Penn Database group)
  • Since 2010 : University of Berlin (Ulf Leser), PHC Grant (Procope Program) accepted on "Sharing and Optimizing Scientific Workflows".
  • Since 2012 : University of Manchester
  • Since 2014 : Institute of Computational Biology (IBC), Montpellier

Selected Publications

2015
[46] Effective and Efficient Similarity Search in Scientific Workflow Repositories (Johannes Starlinger, Sarah Cohen-Boulakia, Sanjeev Khanna, Susan Davidson, Ulf Leser) In Future Generation Computer Systems, 2015. [bib] [pdf]
[45] OpenAlea: Scientific Workflows Combining Data Analysis and Simulation (Christophe Pradal, Christian Fournier, Patrick Valduriez, Sarah Cohen-Boulakia) In SSDBM 2015: 27th International Conference on Scientific and Statistical Database Management, 2015. [bib] [pdf] [doi]
[44] Interrogation et Gestion de donnees bioinformatiques pour la biologie moleculaire (Sarah Cohen-Boulakia, Patrick Valduriez) In Les Techniques de l'Ingenieur, 2015. [bib] [pdf]
[43] Interrogation de bases de donnees biologiques publiques par reformulation de requetes et classement des resultats avec ConQuR-Bio (Bryan Brancotte, Bastien Rance, Alain Denise, Sarah Cohen-Boulakia), 2015. [bib] [pdf]
[42] Rank aggregation with ties: Experiments and Analysis (Bryan Brancotte, Bo Yang, Guillaume Blin, Sarah Cohen-Boulakia, Alain Denise, Sylvie Hamel) In Proceedings of the VLDB Endowment (PVLDB), 2015. [bib] [pdf]
2014
[41] Layer Decomposition: An Effective Structure-based Approach for Scientific Workflow Similarity (Johannes Starlinger, Sarah Cohen-Boulakia, Sanjeev Khanna, Susan Davidson, Ulf Leser) In Proc. of the 10th IEEE International Conference in eScience, 2014. [bib]
[40] DistillFlow: Removing redundancy in Scientific Workflows (Jiuqiang Chen, Sarah Cohen-Boulakia, Christine Froidevaux, Carole Goble, Paolo Missier, Alan Williams) In Proc. of the 22nd Int. Conf. in Scientific and Statistical Database Management (SSDBM), 2014. [bib] [pdf]
[39] ConQuR-Bio: Consensus ranking with Query Reformulation for Biological data (Bryan Brancotte, Bastien Rance, Alain Denise, Sarah Cohen-Boulakia) In DILS 2014 Tenth International Workshop in Data Integration in the Life Sciences, 2014. [bib] [pdf]
[38] Similarity Search for Scientific Workflows (Johannes Starlinger, Bryan Brancotte, Sarah Cohen-Boulakia, Ulf Leser) In Proceedings of the VLDB Endowment (PVLDB), 2014. [bib] [pdf]
[37] Distilling Structure in Taverna Scientific Workflows: A refactoring approach. (Sarah Cohen-Boulakia, Jiuqiang Chen, Carole Goble, Paolo Missier, Alan Williams, Christine Froidevaux) In BMC Bioinformatics, volume 15, 2014. [bib]
2012
[36] (Re)Use in Public Scientific Workflow Repositories. (Johannes Starlinger, Sarah Cohen-Boulakia, Ulf Leser) In Proc. of the 22nd Int. Conf. in Scientific and Statistical Database Management (SSDBM), 2012. [bib]
[35] Scientific Workflow Rewriting while Preserving Provenance (Sarah Cohen-Boulakia, Christine Froidevaux, Jiuqiang Chen) In Proc. of the 8th IEEE International Conference in eScience, 2012. [bib]
[34] Reecriture de workflows scientifiques et provenance (Sarah Cohen-Boulakia, Christine Froidevaux, Jiuqiang Chen) In Proc. of the 28th Journees de Bases de Donnees Avancees, 2012. [bib]
[33] Distilling scientific workflow structure (Sarah Cohen-Boulakia, Christine Froidevaux, Carole Goble, Alan Williams, Jiuqiang Chen) In EMBnet Journal Proc. of the 12th International Workshop on Network Tools and Applications in Biology Nettab 2012 (poster), volume 18, 2012. [bib]
2011
[32] Search Adapt and Reuse: The Future of Scientific Workflows. (Sarah Cohen-Boulakia, Ulf Leser) In SIGMOD Record, volume 40, 2011. [bib]
[31] Gene List significance at-a-glance with GeneValorization. (Bryan Brancotte, Anne Biton, Isabelle Bernard-Pierrot, Francois Radvanyi, Fabien Reyal, Sarah Cohen-Boulakia) In Bioinformatics, volume 27, 2011. [bib]
[30] Next generation data integration for Life Sciences (Sarah Cohen-Boulakia, Ulf Leser) In Proc. of the 25th Int. Conf. on Data Engineering (ICDE) IEEE, 2011. [bib]
[29] Using medians to generate consensus rankings for biological data (Sarah Cohen-Boulakia, Alain Denise, Sylvie Hamel) In Proc. SSDBM: Scientific and Statistical Database Management Conference, 2011. [bib]
2010
[28] Privacy Issues in Scientific Workflow Provenance (Susan Davidson, Sanjeev Khanna, Sudeepa Roy, Sarah Cohen-Boulakia) In Proc. of the 1st Int. Workshop on Workflow Approaches to New Data-centric Science (SIGMOD Workshop), 2010. [bib]
2009
[27] Biological Resource Discovery (Zoe Lacroix, Cartik R. Kothari, Peter Mork, Rami Rifaieh, Mark Wilkinson, Juliana Freire, Sarah Cohen-Boulakia) In Encyclopedia of Database Systems, 2009. [bib]
[26] Provenance in Scientific Databases (Sarah Cohen-Boulakia, Wang Chiew Tan) In Encyclopedia of Database Systems, 2009. [bib]
[25] Biological Metadata Management (Zoe Lacroix, Cartik R. Kothari, Peter Mork, Mark Wilkinson, Sarah Cohen-Boulakia) In Encyclopedia of Database Systems, 2009. [bib]
[24] Differencing Provenance in Scientific Workflows (Zhuowei Bao, Sarah Cohen-Boulakia, Susan Davidson, Anat Eyal, Sanjeev Khanna) In Proc. of the 25th Int. Conf. on Data Engineering (ICDE) IEEE, 2009. [bib]
[23] PDiffView: Viewing the Difference in Provenance of Workflow Results (Zhuowei Bao, Sarah Cohen-Boulakia, Susan Davidson, Pierrick Girard) In PVLDB Proc. of the 35th Int. Conf. on Very Large Data Bases, volume 2, 2009. [bib]
[22] BioBrowsing: Making the Most of the Data Available in Entrez (Sarah Cohen-Boulakia, Kevin Masini) In 21st Int. Conf. in Scientific and Statistical Database Management (SSDBM) LNCS 5566 Springer, 2009. [bib]
[21] On User Views in Scientific Workflow Systems (Invited Paper) (Susan Davidson, Yi Chen, Peng Sun, Sarah Cohen-Boulakia) In Proc. of the First Int. Workshop on the role of Semantic Web in Provenance Management (ISWC 2009 Workshop), 2009. [bib]
2008
[20] Addressing the Provenance Challenge using ZOOM (Sarah Cohen-Boulakia, Olivier Biton, Shirley Cohen, Susan B. Davidson) In Concurrency and Computation: Practice and Experience, volume 20, 2008. [bib]
[19] Special Issue: The First Provenance Challenge (Luc Moreau, Bertram Ludäscher, Ilkay Altintas, Roger S. Barga, Shawn Bowers, Steven P. Callahan, George Chin Jr., Ben Clifford, Shirley Cohen, Sarah Cohen-Boulakia, Susan B. Davidson, Ewa Deelman, Luciano A. Digiampietri, Ian T. Foster, Juliana Freire, James Frew, Joe Futrelle, Tara Gibson, Yolanda Gil, Carole A. Goble, Jennifer Golbeck, Paul T. Groth, David A. Holland, Sheng Jiang, Jihie Kim, David Koop, Ales Krenek, Timothy M. McPhillips, Gaurang Mehta, Simon Miles, Dominic Metzger, Steve Munroe, Jim Myers, Beth Plale, Norbert Podhorszki, Varun Ratnakar, Emanuele Santos, Carlos Eduardo Scheidegger, Karen Schuchardt, Margo I. Seltzer, Yogesh L. Simmhan, Claudio T. Silva, Peter Slaughter, Eric G. Stephan, Robert Stevens, Daniele Turi, Huy T. Vo, Michael Wilde, Jun Zhao, Yong Zhao) In Concurrency and Computation: Practice and Experience, volume 20, 2008. [bib]
[18] Data Integration in the Life Sciences 5th International Workshop DILS 2008 Evry France June 25-27 2008. Proceedings (Amos Bairoch, Sarah Cohen-Boulakia, Christine Froidevaux), 2008. [bib]
[17] Review of the selected proceedings of the Fifth International Workshop on Data Integration in the Life Sciences 2008 (Amos Bairoch, Sarah Cohen-Boulakia, Christine Froidevaux) In BMC Bioinformatics, volume 9, 2008. [bib]
[16] Querying and Managing Provenance through User Views in Scientific Workflows (Olivier Biton, Sarah Cohen-Boulakia, Susan B. Davidson, Carmem S. Hara) In Proceedings of the 24th International Conference on Data Engineering ICDE 2008, 2008. [bib]
2007
[15] Provenance in Scientific Workflow Systems (Susan B. Davidson, Sarah Cohen-Boulakia, Anat Eyal, Bertram Ludäscher, Timothy M. McPhillips, Shawn Bowers, Manish Kumar Anand, Juliana Freire) In IEEE Data Eng. Bull., volume 30, 2007. [bib]
[14] Data Integration in the Life Sciences 4th International Workshop DILS 2007 Philadelphia PA USA June 27-29 2007 Proceedings (Sarah Cohen-Boulakia, Val Tannen), 2007. [bib]
[13] Zoom*UserViews: Querying Relevant Provenance in Workflow Systems (Olivier Biton, Sarah Cohen-Boulakia, Susan B. Davidson) In VLDB 2007 Proc. of the International Conference on Very Large Data Bases, 2007. [bib]
[12] BioGuideSRS: querying multiple sources with a user-centric perspective (Sarah Cohen-Boulakia, Olivier Biton, Susan B. Davidson, Christine Froidevaux) In Bioinformatics, volume 23, 2007. [bib]
2006
[11] Towards a Model of Provenance and User Views in Scientific Workflows (Shirley Cohen, Sarah Cohen-Boulakia, Susan B. Davidson) In DILS 2006 Third International Workshop in Data Integration in the Life Sciences, 2006. [bib]
[10] Selecting Biological Data Sources and Tools with XPR a Path Language for RDF (Sarah Cohen-Boulakia, Christine Froidevaux, Emmanuel Pietriga) In PSB 2006 Pacific Symposium on Biocomputing, 2006. [bib]
[9] Path-based Systems to Guide Scientists in the Maze of Biological Data Sources (Sarah Cohen-Boulakia, Susan B. Davidson, Christine Froidevaux, Zoe Lacroix, Maria-Esther Vidal) In J. Bioinformatics and Computational Biology, volume 4, 2006. [bib]
[8] Proteome informatics II. Bioinformatics for comparative proteomics (Frederique Lisacek, Sarah Cohen-Boulakia, Ron D. Appel) In ICDE 2008 Proc. of the International Conference on Data Engineering, 2006. [bib] [pdf]
2005
[7] A User-centric Framework for Accessing Biological Sources and Tools (Sarah Cohen-Boulakia, Susan B. Davidson, Christine Froidevaux) In DILS'05 Data Integration in the Life Sciences Lecture Notes in Computer Science (LNCS) series LNBI, 2005. [bib]
2004
[6] WInGS: A reliability controlled data warehouse for yeast (David Abergel, Sarah Cohen-Boulakia, Frederic Lemoine, Christine Froidevaux, Michel Termier) In Proc. of JOBIM'2004 Journees Ouvertes Biologie Informatique et Mathematiques, 2004. [bib]
[5] Selecting biomedical data sources according to user preferences (Sarah Cohen-Boulakia, Severine Lair, Nicolas Stransky, Stephane Graziani, François Radvanyi, Emmanuel Barillot, Christine Froidevaux) In Bioinformatics, volume 20 Suppl. 1, 2004. [bib] [pdf]
[4] Interrogation de sources biomedicales : gestion des preferences de l'utilisateur (Sarah Cohen-Boulakia, Christine Froidevaux, Severine Lair) In Proc. of EGC'2004 Extraction et Gestion des Connaissances, 2004. [bib] [pdf]
[3] Preferences for Queries in a Mediator Approach (Alain Bidault, Sarah Cohen-Boulakia, Christine Froidevaux) In Proc. of ECAI'2004 European Conference on Artificial Intelligence, 2004. [bib] [pdf]
2002
[2] Integration de Sources de Donnees Genomiques du Web (Christine Froidevaux, Sarah Cohen-Boulakia) In Journees scientifiques du Web Semantique (actes electroniques), 2002. [bib] [pdf]
[1] Genopage : A database of all protein modules encoded by completely sequenced genomes (Sarah Cohen-Boulakia, Christine Froidevaux, Emmanuel Waller, Bernard Labedan) In Proc. of JOBIM 2002 Journees Ouvertes Biologie Informatique et Mathematiques, 2002. [bib] [pdf]

Service

  • PC member

    • SIGMOD 2016, SIGMOD 2015
    • ICDE 2015, ICDE 2012, ICDE 2010 (I won the ICDE Outstanding Reviewer Award!)
    • TAPP 2013, TAPP 2014, TAPP 2015, TAPP 2016 (International Workshop on the Theory and Practice of Provenance)
    • SWEET 2012, SWEET 2013, International workshop on Scalable Workflow Enactment Engines and Technologies
    • VLDB 2011 (demo track), Very Large Data Bases.
    • JOBIM 2010, JOBIM 2011, the French conference in Bioinformatics.
    • SSDBM 2009, International Conference on Scientific and Statistical Database Management.
    • SWPM 2009, SWPM 2010, International Workshop on the role of Semantic Web in Provenance Management (ISWC workshop)
    • DILS 2006, DILS 2007, DILS 2008, DILS 2009, DILS 2010, DILS 2012, DILS 2013, DILS 2014, International workshops on Data Integration in the Life Sciences (I was co-chair of DILS 2008). I am a member of the steering committee of DILS.
    • BDA 2009, BDA 2012, the French conference in Databases
    • IIMAS'08, Workshop on Information Integration Methods, Architectures, and Systems (ICDE workshop).
    • ISMB 2006, International conference on Intelligent Systems for Molecular Biology.

  • Member of editorial board: Journal on Data Semantics (JoDS)

  • Reviewer


  • Workshop co-organizer

    • DILS'08, the 5th Annual International workshop on Data Integration in the Life Sciences, University of Evry, France 2008.
      • Program co-Chair (with Amos Bairoch, SwissProt and Christine Froidevaux, LRI)


    • DILS'07, the 4th Annual International workshop on Data Integration in the Life Sciences, University of Pennsylvania, USA, 2007.
      • Proceedings co-editor (with Val Tannen), LNBI, Springer
      • Publicity Chair
      • Web Master


    • II 2007 (Workshop in Information Integration), University of Pennsylvania, USA.
      • Local organizer
      • Editor of the Proceedings (printed locally)


Participation in working groups

  • Provenance Challenges, an international workshop which brings together researchers and industry practitioners interested in provenance for workflow systems.
  • DB/IR day, an American workshop which brings together database and information retrieval researchers and students from academic and research institutions across the tristate area and beyond.
  • ISIBio, a French interdisciplinary working group interested in various aspects of "Information Systems Integration in Biology". This group brings together researchers from seven computer science laboratories and from ten biological laboratories (2004-2006).
  • AS127, the national CNRS working group on integration and interoperability of genomic data sources (2003-2004).
  • PPF, multidisciplinary program "Programme PluriFormation" on Bioinformatics and Genomics. This PPF brings together the bioinformatics groups from three biological laboratories, two computer science laboratories and from the laboratory of mathematics at Orsay campus.

Teaching

University of Paris-Sud 11
I taught at Polytech Paris Sud (students in computer science) and at BIBS (students in bioinformatics).
  • Faculty since Sept. 2007
    • Software Engineering: UML, JAVA (graduate)
    • Database Management Systems (junior, senior)

Others

Sioukiou Web site: I wanted to provide a link to the web site of my sister-in-law, Sioukiou,
who is an artist and makes beautiful sculptures. Please have a look and enjoy your visit!