M2 Data & Knowledge: Information Integration

Objectives

The Web of documents has evolved into a Web of Data that connects distributed, structured data (e.g., RDF, RDFa, Microformats) across the Web. To benefit fully from the richness of the Web of Data, it is important to establish whether two pieces of data refer to the same real-world entity. In this module, we first survey well-known data integration architectures. We then present the data linking problem through a classification of the main existing approaches: supervised/unsupervised, local/global, knowledge-based, and single-/multi-ontology. After that, we introduce the data fusion issue, which arises when data connected by an identity link has to be integrated and which raises the problem of conflicting values. The main approaches, techniques and kinds of knowledge used to address these issues are explored.
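To illustrate the conflicting-values problem, here is a minimal sketch of one simple fusion strategy, majority voting over the values that linked sources report for the same attribute. It is an illustration under assumptions, not a method prescribed by the module: the function name `fuse_by_majority` and the sample values are hypothetical.

```python
from collections import Counter

def fuse_by_majority(values):
    """Return the most frequent non-missing value, or None if every value is missing.

    Ties are resolved in favour of the value seen first.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    return Counter(present).most_common(1)[0][0]

# Hypothetical values reported by four linked sources for one attribute.
reported = ["Paris", "Paris", "Lyon", None]
fused = fuse_by_majority(reported)
```

Majority voting ignores how reliable or recent each source is; strategies that take such knowledge into account are among the approaches the module explores.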

Intended outcome: This course gives students an understanding of the difficulties encountered when designing an application that must decide whether the “Musée des Arts Premier”, located near “Trocadero”, and the “Musée du quai Branly”, located in “Paris’s 7th arrondissement”, refer to the same museum. It also gives an understanding of the criteria for choosing a data linking approach that takes into account the characteristics of the data and of the application. Furthermore, it introduces students to the data fusion issue, enabling them to develop tools specifically adapted to the data and the application domain. Students then receive an introduction to querying and navigating real biological databases, to the levels of heterogeneity, and to the major kinds of data integration architecture used to integrate biological data. An overview of existing solutions for enhancing the reproducibility of bioinformatics experiments, namely scientific workflows and provenance, is also given. Finally, the course closes with a presentation of real-world use cases of data integration in the agronomy domain, with a focus on ontology modelling and semantic annotation.
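The museum example can be made concrete with a small sketch. It is an illustration under assumptions, not an approach prescribed by the course: the `ALIASES` table, the 0.85 threshold and the function names are hypothetical, and Python's standard `difflib` stands in for whatever similarity measure an application would actually use.

```python
from difflib import SequenceMatcher

# Hypothetical background knowledge: groups of names known to denote one entity.
ALIASES = [
    {"musée des arts premier", "musée du quai branly"},
]

def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two case-folded names."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def same_entity(a: str, b: str, threshold: float = 0.85,
                use_knowledge: bool = True) -> bool:
    """Link two names if they are similar enough or share an alias group."""
    if name_similarity(a, b) >= threshold:
        return True
    if use_knowledge:
        a_key, b_key = a.casefold(), b.casefold()
        return any(a_key in group and b_key in group for group in ALIASES)
    return False

first = "Musée des Arts Premier"
second = "Musée du quai Branly"

# String similarity alone does not link the two names...
linked_without_knowledge = same_entity(first, second, use_knowledge=False)
# ...whereas the (assumed) alias knowledge does.
linked_with_knowledge = same_entity(first, second)
```

The contrast between the two calls shows why a purely string-based comparison is not enough here and why knowledge-based approaches are part of the classification covered in the course.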


Course Organization

Evaluation (Grading) by Projects