Faculty habilitation of SAÏS Fatiha
Group: Large-scale Heterogeneous DAta and Knowledge

Knowledge Graph Refinement: Link Detection, Link Invalidation, Key Discovery and Data Enrichment

Defended on 20/06/2019

Abstract:
This habilitation thesis outlines the methods and tools resulting from my research activities over the last ten years, as well as my scientific projects for the near future. These methods and tools have been developed for knowledge graph refinement in the context of the Web of data. We are experiencing an unprecedented production of resources published as Linked Open Data (LOD). This has led to the creation of knowledge graphs (KGs) containing billions of RDF (Resource Description Framework) triples, such as DBpedia, YAGO and Wikidata on the academic side, and the Google Knowledge Graph, the eBay Knowledge Graph and the Facebook Graph on the commercial side. However, building knowledge graphs while ensuring their completeness and correctness is a challenging endeavour.

To address this problem, my research contributions have focused on several issues. First, for the identity link invalidation problem, we developed two main approaches: one relies on the semantics of ontology axioms to detect inconsistencies in KGs, and the other relies on the network structure of identity links to assign an error degree to every identity link in the LOD. Second, in the setting of scientific KGs, we defined a generic approach for detecting contextual identity links, which represent a weak identity relation between entities that holds only in an explicit context expressed as a sub-part of the ontology. This approach contributes to overcoming the problem of the strict semantics of the owl:sameAs predicate, which is not required in all application domains. Third, we proposed a data fusion approach that aggregates data coming from different sources and computes a unique representation for a set of given linked entities. Furthermore, to deal with missing values, we developed an approach that relies on data linking and case-based reasoning to predict them.
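As a minimal illustration of how ontology axioms can expose an erroneous identity link, the Python sketch below relies on a standard consequence of OWL semantics: if `a owl:sameAs b` holds and `p` is declared an `owl:FunctionalProperty`, then `a` and `b` cannot carry two different values for `p`. The triples, the IRIs `ex:a`, `ex:b`, `ex:c`, the `birthDate` property and the function names are hypothetical, and the check is far simpler than the invalidation approaches summarized in the abstract: it inspects each link in isolation, ignores the transitivity of owl:sameAs, compares literals as plain strings, and assumes each source is internally consistent.

```python
from collections import defaultdict

# Hypothetical toy triples: (subject IRI, property, literal value as a plain string).
TRIPLES = [
    ("ex:a", "birthDate", "1961-08-04"),
    ("ex:b", "birthDate", "1961-08-04"),
    ("ex:c", "birthDate", "1946-07-06"),
]
# Hypothetical identity links (owl:sameAs) to be checked.
SAME_AS = [("ex:a", "ex:b"), ("ex:a", "ex:c")]
# Assumed ontology axiom: birthDate is declared an owl:FunctionalProperty.
FUNCTIONAL = {"birthDate"}


def index_values(triples):
    """Map (subject, property) to the set of literal values it carries."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index


def conflicting_links(same_as, triples, functional):
    """Flag sameAs links whose endpoints carry different values for a functional property.

    Each link is checked on its own; the transitive closure of owl:sameAs is ignored.
    Literals are compared as plain strings (no datatype normalisation).
    """
    index = index_values(triples)
    flagged = []
    for a, b in same_as:
        for p in sorted(functional):
            va = index.get((a, p), set())
            vb = index.get((b, p), set())
            # Merging a and b would give p more than one value, contradicting functionality.
            if va and vb and len(va | vb) > 1:
                flagged.append(((a, b), p, sorted(va | vb)))
    return flagged


result = conflicting_links(SAME_AS, TRIPLES, FUNCTIONAL)
print(result)
# [(('ex:a', 'ex:c'), 'birthDate', ['1946-07-06', '1961-08-04'])]
```

Returning the offending property and values together with the link keeps the diagnosis explainable, which helps when a curator has to decide whether the link or the data is wrong.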
Finally, to enrich the conceptual level of KGs with new key axioms, which are particularly important for detecting identity links, we defined three efficient methods: KD2R for discovering exact keys, SAKey for discovering n-almost keys, and VICKEY for discovering conditional keys. All three methods first compute the maximal non-keys and then derive the minimal keys from them, applying several strategies to prune the search space.
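The idea shared by KD2R, SAKey and VICKEY, namely deriving minimal keys from maximal non-keys, can be illustrated with a naive Python sketch. It assumes a simplified key semantics: a set of properties is a non-key if at least two distinct instances share at least one value for every property in the set, and a key otherwise. The sketch compares every pair of instances and enumerates candidate property sets exhaustively, so it does not reproduce the pruning strategies of the three methods, and it only covers exact keys (no n-almost or conditional keys). The toy data and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: instance -> property -> set of values (RDF properties may be multi-valued).
DATA = {
    "p1": {"name": {"Ann"}, "born": {"1980"}, "city": {"Paris"}},
    "p2": {"name": {"Ann"}, "born": {"1991"}, "city": {"Paris"}},
    "p3": {"name": {"Bob"}, "born": {"1980"}, "city": {"Lyon"}},
}


def agree_set(x, y, props):
    """Properties for which two instances share at least one value."""
    return frozenset(p for p in props if x.get(p, set()) & y.get(p, set()))


def maximal_non_keys(data):
    """Every agree set is a non-key; keep only those not strictly contained in another."""
    props = sorted({p for desc in data.values() for p in desc})
    agree = {agree_set(x, y, props) for x, y in combinations(data.values(), 2)}
    agree.discard(frozenset())  # the empty set rules out nothing
    return {s for s in agree if not any(s < t for t in agree)}, props


def minimal_keys(non_keys, props):
    """A property set is a key iff no maximal non-key contains it; enumerate by size."""
    keys = []
    for size in range(1, len(props) + 1):
        for cand in map(frozenset, combinations(props, size)):
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if not any(cand <= nk for nk in non_keys):
                keys.append(cand)
    return keys


non_keys, props = maximal_non_keys(DATA)
keys = minimal_keys(non_keys, props)
print(sorted(sorted(k) for k in keys))
# [['born', 'city'], ['born', 'name']]
```

One plausible reason for working from non-keys is that a single pair of instances agreeing on a property set rules out all of its subsets as keys at once, whereas testing candidate keys directly would require re-scanning the data for each candidate.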

Overall, these approaches were developed in collaboration with several fellow researchers, in the setting of PhD theses, postdocs and master's theses, some of them within ANR, CNRS and industrial research projects involving organizations and companies such as INRA, INA, ABES, IGN and Thalès.
