Research results
Ph.D
Group :

Digital Identity Discovery and Reconciliation for Human Resources Management

Started on 01/04/2014
Advisor : SEGHOUANI BENNACER, Nacéra
[QUERCINI Gianluca]

Funding : Doctoral contract (research only)
Affiliation : CentraleSupélec
Laboratory :

Defended on 27/11/2017, committee :
- Patrick MARCEL – Université François-Rabelais

- Mathieu ROCHE – CIRAD

- Nacéra SEGHOUANI BENNACER – LRI, CentraleSupélec

- Gianluca QUERCINI – LRI, CentraleSupélec

- Dario COLAZZO – Université Paris-Dauphine

- Nicolas SABOURET – Université Paris-Sud

- Florent ANDRÉ – MindMatcher

Research activities :

Abstract :
Finding the right individual to hire is crucial for any organization. With the number of applications increasing due to the introduction of online job portals, it is desirable to match applicants with job offers automatically. Existing approaches that match applicants with job offers take resumes as they are and do not attempt to complete the information on a resume by searching the Internet. The objective of this thesis is to fill this gap by discovering online resources pertinent to an applicant. To this end, a novel method for extracting key information from resumes is proposed. This is a challenging task, since resumes come in diverse structures and formats and the entities they contain are ambiguous. Identifying Web results using this key information and reconciling them is another challenge. We propose an algorithm that generates queries and ranks the results to obtain the most pertinent online resources. In addition, we specifically tackle the reconciliation of social network profiles through a method that identifies the profiles of individuals across different networks. Moreover, a method to resolve ambiguity in locations, or to predict them when absent, is also presented. Experiments on real data sets are conducted for all the algorithms proposed in this thesis, and they show good results.

Ph.D. dissertations & Faculty habilitations
OPTIMIZATION OF PROXIMITY SERVICES USING EDGE TECHNOLOGIES IN IOT (original title: OPTIMISATION DES SERVICES DE PROXIMITÉ UTILISANT LES TECHNOLOGIES EDGE EN IOT)


ROBUSTNESS ANALYSIS FOR THE DESIGN OF IMPLEMENTABLE REAL-TIME SYSTEMS (original title: ANALYSE DE ROBUSTESSE POUR LA CONCEPTION DE SYSTÈMES TEMPS RÉEL IMPLÉMENTABLES)


DATA INTEGRATION IN THE LIFE SCIENCES: SCIENTIFIC WORKFLOWS, PROVENANCE, AND RANKING
Biological research is a science that derives its findings from the proper analysis of experiments. Today, a large variety of experiments are carried out in hundreds of labs around the world, and their results are reported in a myriad of different databases, websites, publications, etc., using different formats, conventions, and schemas. Providing uniform access to these diverse and distributed databases is the aim of data integration solutions, which have been designed and implemented within the bioinformatics community for more than 20 years. However, the perception of the problem of data integration research in the life sciences has changed: while early approaches concentrated on handling schema-dependent queries over heterogeneous and distributed databases, current research emphasizes instances rather than schemas, tries to place the human back into the loop, and intertwines data integration and data analysis. Transparency (providing users with the illusion that they are using a centralized database, and thus completely hiding the original databases) was one of the main goals of federated databases. It is no longer a target. Instead, users want to know exactly which data from which source was used in which way in studies (provenance). The old model of "first integrate, then analyze" is replaced by a new, process-oriented paradigm: "integration is analysis - and analysis is integration". This paradigm change gives rise to some important research trends. First, the process of integration itself, i.e., the integration workflow, is becoming a research topic in its own right. Scientific workflows actually implement the paradigm "integration is analysis". A second trend is the growing importance of sensible ranking, because data sets keep growing and it becomes increasingly difficult for the biologist user to distinguish relevant data within large and noisy data sets. This HDR thesis outlines my contributions to the field of data integration in the life sciences.
More precisely, my work takes place in the two contexts mentioned above, namely scientific workflows and biological data ranking. The reported results were obtained from 2005 to late 2014, first as a postdoctoral fellow at the University of Pennsylvania (Dec 2005 to Aug 2007) and then as an Associate Professor at Université Paris-Sud (LRI, UMR CNRS 8623, Bioinformatics team) and Inria (Saclay-Ile-de-France, AMIB team 2009-2014).