Objectives
The Web of documents has evolved into a Web of Data that connects distributed, structured data (e.g., RDF, RDFa, microformats) across the Web. To benefit from the full richness of the Web of Data, it is important to establish whether two pieces of data refer to the same real-world entity. In this module, we first survey well-known data integration architectures. We then present the data linking problem and classify the main existing approaches: supervised/unsupervised, local/global, knowledge-based, and single/multi-ontology. After that, we introduce the data fusion problem, which arises when data connected by an identity link must be integrated and their values conflict. The main approaches, techniques, and knowledge used to address these issues are explored.
Intended outcome: This course gives students an understanding of the difficulties encountered when designing an application that has to decide whether the “Musée des Arts Premier”, located near “Trocadero”, and the “Musée du quai Branly”, located in “Paris’s 7th arrondissement”, refer to the same museum. It also explains the criteria for choosing a data linking approach that takes into account the characteristics of the data and of the application. Furthermore, it introduces students to the data fusion problem, enabling them to develop tools specifically adapted to the data and the application domain. Students then receive an introduction to querying and navigating real biological databases, to levels of heterogeneity, and to the major kinds of data integration architecture used to integrate biological data. An overview of existing solutions for enhancing the reproducibility of bioinformatics experiments (scientific workflows and provenance) is also given. Finally, the course concludes with a presentation of real-world use cases of data integration in the agronomy domain, with a focus on ontology modelling and semantic annotation.
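To make the museum example concrete, here is a minimal Python sketch of why linking these two descriptions is hard. It is an illustration under stated assumptions, not one of the approaches taught in the course: the token-based Jaccard measure, the attribute weights, the 0.5 threshold, and the hand-written `ALIASES` table are all hypothetical choices made for this example.

```python
def tokens(text):
    """Lowercase a string and split it into a set of whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def record_similarity(r1, r2, weights):
    """Weighted sum of per-attribute Jaccard similarities."""
    return sum(w * jaccard(r1[attr], r2[attr]) for attr, w in weights.items())


def same_entity(r1, r2, weights, threshold, aliases=None):
    """Decide whether two records refer to the same entity.

    If an alias table is given (a stand-in for background knowledge),
    two names mapped to the same key are linked directly. Otherwise the
    decision falls back to the weighted string similarity.
    """
    if aliases is not None:
        k1 = aliases.get(r1["name"].lower())
        k2 = aliases.get(r2["name"].lower())
        if k1 is not None and k1 == k2:
            return True
    return record_similarity(r1, r2, weights) >= threshold


# The two descriptions from the course text.
museum_a = {"name": "Musée des Arts Premier", "location": "Trocadero"}
museum_b = {"name": "Musée du quai Branly", "location": "Paris's 7th arrondissement"}

# Hypothetical weights and alias table, chosen only for this illustration.
WEIGHTS = {"name": 0.7, "location": 0.3}
ALIASES = {
    "musée des arts premier": "quai_branly",
    "musée du quai branly": "quai_branly",
}

if __name__ == "__main__":
    print(record_similarity(museum_a, museum_b, WEIGHTS))
    print(same_entity(museum_a, museum_b, WEIGHTS, threshold=0.5))
    print(same_entity(museum_a, museum_b, WEIGHTS, threshold=0.5, aliases=ALIASES))
```

The two names share only the token "musée" and the two locations share none, so a purely string-based comparison does not link the records at this threshold. Once the alias knowledge is supplied, the same records are linked, which is the kind of trade-off between data characteristics and available knowledge that the choice of a linking approach has to weigh.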
Course Organization
- 02/12/2019, 13h30 - 16h30: Semantic data integration - Data Linking and Identity Problem (slides)
(by Fatiha Saïs sais@lri.fr)
- 09/12/2019, 13h30 - 16h30: Course 1 (Cont.), Lab session (Practical Session materials)
(by Fatiha Saïs sais@lri.fr)
- 16/12/2019, 13h30 - 16h30: Semantic data integration - Ontology Alignment and Knowledge Discovery + Presentation of the projects (slides)
(by Nathalie Pernelle pernelle@lri.fr)
- 06/01/2020, 13h30 - 16h30: Practical session for projects
(by Fatiha Saïs sais@lri.fr and Nathalie Pernelle pernelle@lri.fr)
- 13/01/2020, 13h30 - 16h30: Data integration in life science (Lab document and slides)
(by Sarah Cohen Boulakia Sarah.Cohen_Boulakia@lri.fr)
- 20/01/2020, 13h30 - 16h30: Ontology modelling and semantic annotation
(by Liliana Ibanescu liliana.ibanescu@agroparistech.fr)
- 03/02/2020, 13h30 - 16h30: Project presentation and evaluation
(by all Professors)
Evaluation (Grading) by Projects
- Link to the proposed projects
- Report (50%)
- Talk (50%)
- To be done in groups of 4 students