Data Linkage

I am interested in data linkage approaches which exploit knowledge that can be declared in the ontology.
To combine and reason about data coming from different RDF data sources, semantic links are needed to connect resources. In particular, identity links make it possible to declare that two data items refer to the same real-world object, i.e. the same hotel, the same lab, etc. Based on these links, information about the same real-world entity can be combined.
In the setting of the PICSEL 3 project, we have defined a first logical approach named L2R, in which some ontology axioms and additional knowledge about the data sources are automatically translated into Horn rules and used to infer (non-)identity links. We have also proposed a numerical approach named N2R, in which the semantics of this knowledge is automatically translated into non-linear equations that make it possible to compute similarity scores for reference pairs.
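To make the logical idea concrete, here is a minimal sketch of rule-based inference of identity and non-identity links. It is not the L2R implementation: the toy triples, the property names, and the two axiom sets are assumptions chosen for illustration. It applies two rules that follow from standard ontology axioms: two resources that share a value of an inverse-functional property are the same, and two resources with disjoint values of a functional property are different (assuming distinct literals denote distinct values).

```python
from itertools import combinations

# Hypothetical toy data: (subject, property, value) triples.
TRIPLES = [
    ("h1", "phone", "0145"),
    ("h2", "phone", "0145"),
    ("h3", "phone", "0999"),
    ("h1", "stars", "3"),
    ("h3", "stars", "4"),
]

# Assumed ontology axioms (illustrative only).
INVERSE_FUNCTIONAL = {"phone"}  # same value      => same resource
FUNCTIONAL = {"stars"}          # different values => different resources


def infer_links(triples):
    """Return (same, different) sets of resource pairs."""
    values = {}
    for s, p, v in triples:
        values.setdefault((s, p), set()).add(v)

    subjects = sorted({s for s, _, _ in triples})
    same, different = set(), set()
    for a, b in combinations(subjects, 2):
        # Rule 1: p(a, v) and p(b, v), p inverse-functional -> sameAs(a, b)
        for p in INVERSE_FUNCTIONAL:
            if values.get((a, p), set()) & values.get((b, p), set()):
                same.add((a, b))
        # Rule 2: p(a, v) and p(b, w), v != w, p functional -> not sameAs(a, b)
        for p in FUNCTIONAL:
            va, vb = values.get((a, p)), values.get((b, p))
            if va and vb and not (va & vb):
                different.add((a, b))
    return same, different


same_links, different_links = infer_links(TRIPLES)
```

On this toy data, h1 and h2 share a phone number and are linked, while h1 and h3 have different star ratings and are declared distinct.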
Discriminative properties that can be used for data linkage are difficult to determine, even for domain experts. In the setting of the Qualinca project, we have developed approaches that discover composite keys from RDF data sources. KD2R exploits data sources for which the Unique Name Assumption is stated. SAKey can discover keys when datasets may contain duplicates or erroneous property values. Vickey is an efficient approach that discovers conditional keys (keys that are valid for class expressions). We have also proposed approaches that can deal with scientific data in which many properties are described using numerical values.
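As an illustration of what a composite key means here, the following brute-force sketch enumerates minimal keys on a tiny dataset. It is not the KD2R, SAKey, or Vickey algorithm, which rely on far more efficient search and pruning strategies; the data, the property names, and the key definition used (no two distinct instances share at least one value for every property of the set, under the Unique Name Assumption) are assumptions for the example.

```python
from itertools import combinations

# Hypothetical instances of one class: property -> set of values.
DATA = {
    "p1": {"name": {"Ann"}, "lab": {"LRI"}, "city": {"Orsay"}},
    "p2": {"name": {"Ann"}, "lab": {"LIG"}, "city": {"Grenoble"}},
    "p3": {"name": {"Bob"}, "lab": {"LRI"}, "city": {"Orsay"}},
}


def is_key(props, data):
    """True if no two distinct instances share a value for every property."""
    for a, b in combinations(data, 2):
        if all(data[a].get(p, set()) & data[b].get(p, set()) for p in props):
            return False
    return True


def minimal_keys(data):
    """Enumerate property sets by increasing size, keeping minimal keys only."""
    props = sorted({p for description in data.values() for p in description})
    keys = []
    for size in range(1, len(props) + 1):
        for candidate in combinations(props, size):
            # Skip supersets of an already discovered key.
            if any(set(k) <= set(candidate) for k in keys):
                continue
            if is_key(candidate, data):
                keys.append(candidate)
    return keys


keys = minimal_keys(DATA)
```

No single property is a key in this data, but the pairs (city, name) and (lab, name) are. The exhaustive enumeration is exponential in the number of properties, which is why dedicated algorithms are needed on real datasets.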
Last, we have investigated how invalid sameAs links can be logically detected and how the invalidation process can be explained to an expert user. We are now working on representing and detecting identity links that do not follow the strict OWL 2 semantics of the sameAs construct.

Data integration - Semantic annotation of web documents

I have investigated ontology-based approaches that aim to semantically annotate web documents. I am interested in approaches that are guided by the syntactic structure of (part of) documents.
In the setting of the project, we have investigated the semantic annotation of tabular data, since annotation tools can take advantage of the structure of tables. The idea was to semantically annotate as much information as possible while giving users access to elements of the original table, in order to limit the errors due to wrong annotations (Xtab2SML).
In the SHIRI project, we have also exploited the syntactic structure of heterogeneous HTML documents in an annotation process. In the approaches we developed, the HTML structure is used either to better rank the annotations when they appear in the most structured parts of the documents (SHIRI-Querying) or to provide semantic relations that are difficult to discover with lexico-syntactic patterns (REISA).

Formal Concept Analysis - Automatic construction of class hierarchies over XML data

I have also worked on approaches that help users access large amounts of XML data by clustering them into a small number of classes described at different levels of abstraction. Our basic representation was a Galois lattice, which is a well-defined and exhaustive representation of the classes embedded in a data set (the ZOOM approach).
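For readers unfamiliar with Galois lattices, the sketch below enumerates the formal concepts (the nodes of the lattice) of a very small context. It is a naive illustration, not the ZOOM approach: the objects, the attributes, and the exhaustive enumeration over attribute subsets are assumptions made for the example. A concept is a pair (extent, intent) where the extent is the set of all objects sharing the intent, and the intent is the set of all attributes shared by the extent.

```python
from itertools import combinations

# Hypothetical formal context: object -> set of attributes it has.
CONTEXT = {
    "doc1": {"title", "author"},
    "doc2": {"title", "author", "abstract"},
    "doc3": {"title"},
}


def extent(attrs, context):
    """All objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def intent(objs, context):
    """All attributes shared by every object in objs."""
    all_attrs = set().union(*context.values())
    return frozenset(a for a in all_attrs if all(a in context[o] for o in objs))


def concepts(context):
    """Enumerate formal concepts by closing every attribute subset."""
    attrs = sorted(set().union(*context.values()))
    found = set()
    for size in range(len(attrs) + 1):
        for subset in combinations(attrs, size):
            e = extent(set(subset), context)
            found.add((e, intent(e, context)))
    return found


lattice_nodes = concepts(CONTEXT)
```

Each resulting pair is one class of the hierarchy: a larger extent comes with a smaller, more abstract intent. This enumeration is exponential in the number of attributes, so it only suits very small contexts.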

Supervised PhD Students

Joe Raad (co-supervised with Juliette Dibie, Liliana Ibanescu, Fatiha Sais)

Danai Symeonidou (co-supervised with Fatiha Sais)

Yassine Mrabet (co-supervised with Chantal Reynaud, Nacera Bennacer)

Mouhamadou Thiam (co-supervised with Chantal Reynaud, Nacera Bennacer)

Fatiha Sais (co-supervised with Marie-Christine Rousset)