Faculty habilitation of SAÏS Fatiha
Group : Large-scale Heterogeneous DAta and Knowledge

Knowledge Graph Refinement: Link Detection, Link Invalidation, Key Discovery and Data Enrichment

Defended on 20/06/2019.

Abstract :
This habilitation thesis outlines the methods and tools resulting from my research activities over the last ten years, as well as my scientific projects for the near future. These methods and tools have been developed for knowledge graph refinement in the context of the Web of data. We are experiencing an unprecedented production of resources published as Linked Open Data (LOD, for short). This has led to the creation of knowledge graphs (KGs) containing billions of RDF (Resource Description Framework) triples, such as DBpedia, YAGO and Wikidata on the academic side, and the Google Knowledge Graph, the eBay Knowledge Graph and the Facebook Graph on the commercial side. However, building knowledge graphs while ensuring their completeness and correctness is a challenging endeavour, and my research contributions have addressed several aspects of it. First, for the identity link invalidation problem, we developed two main approaches, relying either on the semantics of ontology axioms to detect inconsistencies in the KGs, or on the network structure of identity links to assign an error degree to every identity link in the LOD. Second, in the setting of scientific KGs, we defined a generic approach for detecting contextual identity links, which represent a weak identity relation between entities that holds only in an explicit context, expressed as a sub-part of the ontology. This approach helps overcome the strict semantics of the owl:sameAs predicate, which is not required in all application domains. Third, we proposed a data fusion approach that aggregates data coming from different sources and computes a unique representation for a set of given linked entities. Furthermore, to predict missing values, we developed an approach that relies on data linking and case-based reasoning.
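The axiom-based side of identity link invalidation can be illustrated with a minimal sketch. It assumes a tiny in-memory graph stored as Python tuples and only two kinds of OWL axioms, functional properties and disjoint classes. The IRIs (ex:a1, ex:birthDate and so on) and the helper names are hypothetical, and the code illustrates the general principle rather than the approaches developed in the thesis: an owl:sameAs link is flagged when merging the descriptions of its two endpoints would violate one of the axioms.

```python
from itertools import combinations

# Toy knowledge graph as (subject, predicate, object) triples.
# All IRIs and values are hypothetical, chosen only for illustration.
TRIPLES = [
    ("ex:a1", "rdf:type", "ex:Person"),
    ("ex:a1", "ex:birthDate", "1950-01-01"),
    ("ex:b1", "rdf:type", "ex:Person"),
    ("ex:b1", "ex:birthDate", "1962-07-14"),
    ("ex:a2", "rdf:type", "ex:Person"),
    ("ex:b2", "rdf:type", "ex:Organization"),
    ("ex:a3", "ex:birthDate", "1980-03-03"),
    ("ex:b3", "ex:birthDate", "1980-03-03"),
]
SAME_AS = [("ex:a1", "ex:b1"), ("ex:a2", "ex:b2"), ("ex:a3", "ex:b3")]

# Assumed ontology axioms: functional properties and disjoint class pairs.
FUNCTIONAL_PROPERTIES = {"ex:birthDate"}
DISJOINT_CLASSES = {frozenset({"ex:Person", "ex:Organization"})}


def values(entity, predicate):
    """Return the set of objects o such that (entity, predicate, o) is in TRIPLES."""
    return {o for s, p, o in TRIPLES if s == entity and p == predicate}


def conflicts(x, y):
    """List the axiom violations that asserting x owl:sameAs y would cause."""
    reasons = []
    # A functional property may have at most one value for the merged entity.
    for prop in sorted(FUNCTIONAL_PROPERTIES):
        merged = values(x, prop) | values(y, prop)
        if len(merged) > 1:
            reasons.append(f"functional property {prop} has values {sorted(merged)}")
    # The merged entity must not belong to two disjoint classes.
    types = values(x, "rdf:type") | values(y, "rdf:type")
    for c1, c2 in combinations(sorted(types), 2):
        if frozenset({c1, c2}) in DISJOINT_CLASSES:
            reasons.append(f"disjoint classes {c1} and {c2}")
    return reasons


def invalid_links(links):
    """Map every sameAs link that leads to an inconsistency to its reasons."""
    result = {}
    for x, y in links:
        reasons = conflicts(x, y)
        if reasons:
            result[(x, y)] = reasons
    return result


if __name__ == "__main__":
    for link, reasons in invalid_links(SAME_AS).items():
        print(link, "->", "; ".join(reasons))
```

On the toy data, the links (ex:a1, ex:b1) and (ex:a2, ex:b2) are flagged while (ex:a3, ex:b3) is kept. The sketch only compares directly asserted values; a fuller check would also reason over the transitive closure of owl:sameAs, subclass hierarchies and literal datatypes.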
Finally, to enrich the conceptual level of KGs with new key axioms, which are particularly important for detecting identity links, we defined three efficient methods: KD2R for discovering exact keys, SAKey for discovering n-almost keys and VICKEY for discovering conditional keys. All three methods first compute the maximal non-keys and then derive the minimal keys from them, applying several strategies to prune the search space.
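The non-key-first strategy can be illustrated with a brute-force sketch on a toy dataset. It rests on several assumptions: each instance maps properties to sets of values; two instances "share" a property when they have at least one common value for it; and a property set is treated as an n-almost key when at most n instances share values on all of its properties with some other instance (n = 0 gives exact keys). The data, property names and helper functions are hypothetical, and the exhaustive enumeration deliberately omits the pruning strategies of KD2R, SAKey and VICKEY. Minimal keys are then derived as the minimal property sets that are not contained in any maximal non-key.

```python
from itertools import combinations

# Hypothetical toy dataset: instance -> {property: set of values}.
DATA = {
    "p1": {"name": {"Ann"}, "city": {"Paris"}, "birth": {"1990"}},
    "p2": {"name": {"Bob"}, "city": {"Paris"}, "birth": {"1985"}},
    "p3": {"name": {"Ann"}, "city": {"Lyon"}, "birth": {"1985"}},
    "p4": {"name": {"Cid"}, "city": {"Lyon"}, "birth": {"2001"}},
}
PROPERTIES = sorted({p for desc in DATA.values() for p in desc})


def shares(a, b, props):
    """True if instances a and b have a common value for every property in props."""
    return all(DATA[a].get(p, set()) & DATA[b].get(p, set()) for p in props)


def exceptions(props):
    """Instances that share values on props with at least one other instance."""
    found = set()
    for a, b in combinations(DATA, 2):
        if shares(a, b, props):
            found.update((a, b))
    return found


def is_key(props, n=0):
    """props is treated as an n-almost key if it has at most n exceptions."""
    return len(exceptions(props)) <= n


def all_subsets():
    """Every non-empty subset of PROPERTIES (exponential: toy data only)."""
    return [frozenset(c) for k in range(1, len(PROPERTIES) + 1)
            for c in combinations(PROPERTIES, k)]


def maximal_non_keys(n=0):
    """Step 1: non-keys that have no non-key proper superset."""
    non_keys = [s for s in all_subsets() if not is_key(s, n)]
    return [s for s in non_keys if not any(s < t for t in non_keys)]


def minimal_keys(non_keys):
    """Step 2: minimal property sets not contained in any maximal non-key."""
    candidates = [s for s in all_subsets() if not any(s <= nk for nk in non_keys)]
    return [s for s in candidates if not any(t < s for t in candidates)]


if __name__ == "__main__":
    for n in (0, 2):
        non_keys = maximal_non_keys(n)
        keys = minimal_keys(non_keys)
        print(f"n={n}: non-keys={[sorted(s) for s in non_keys]} "
              f"keys={[sorted(k) for k in keys]}")
```

With n = 0 every single property is a non-key, so the minimal keys are the pairs {birth, city}, {birth, name} and {city, name}; allowing n = 2 exceptions turns birth and name into single-property almost keys. Deriving keys from non-keys works because the property-set lattice is monotone here: any superset of a key is also a key, so a set is a key exactly when no maximal non-key contains it.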

Overall, these approaches have been developed in collaboration with several fellow researchers, in the setting of several PhD theses, post-docs and master's theses, some of them in the context of ANR, CNRS and industrial research projects involving organizations and companies such as INRA, INA, ABES, IGN and Thalès.

Ph.D. dissertations & Faculty habilitations
Creative work has been at the core of research in Human-Computer Interaction (HCI). I describe the results of a series of studies that look at how creators work, where creators range from artists with years of professional practice to learners, novices and casual makers. My research focuses on three creation activities: drawing, physical modeling and music composition. For these activities, I examine how artists switch between representations and how these representations evolve throughout their creative process, from early sketches to fine-grained forms or structured vocabularies. I present interactive systems that enrich their workflow (i) by extending their computer tools with physical user interfaces, or (ii) by making physical materials interactive. I also argue that sketch-based representations can enable user interfaces that are more personal and less rigid. My presentation will reflect on the lessons and limitations of this work and discuss challenges for future design-support tools.

Interactive visualizations combine human-computer interaction, visual design and perception theory with data processing methods to propose visual data representations that amplify cognition and aid data exploration and understanding. We can consider visualization as a communication medium or channel between humans and their data: the higher the communication bandwidth (the amount of data that can be communicated and understood), the more effective the visualization. My research attempts to increase the bandwidth of this communication channel in two ways: (i) by moving away from traditional desktops towards larger displays that can both render larger amounts of data and accommodate multiple viewers; and (ii) by designing and studying appropriate visual representations that show salient information. In my presentation I will describe my work on these topics and the challenges it tries to address, and discuss the methodology and inspiration behind this research.