Ph.D de

Group : Artificial Intelligence and Inference Systems

Recherche ciblée de documents sur le Web (Targeted search for documents on the Web)

Started on 01/10/2001
Advisor : ROUSSET, Marie-Christine

Funding : A
Affiliation : Université Paris-Saclay
Laboratory : Orsay

Defended on 08/06/2005, committee :
HACID Mohand-Saïd
ROUSSET Marie-Christine

Research activities :
   - Artificial Intelligence
   - Semantic Web

Abstract :
Our work combines a search engine such as Google, or a Web crawler such as Xyleme's, with a filtering tool that can distinguish, among the possibly thousands of Web pages returned, those that actually contain data useful for the data warehouse. Initial experiments showed that guiding the Web search with keywords extracted from the domain ontology is not precise enough to guarantee that the returned pages are relevant to the topic of the warehouse. Our approach to designing the filtering tool is generic and declarative. We have defined and implemented a query language, called WebQueL, which enables different criteria to be combined to specify the Web pages of interest. These criteria can bear on both the content and the structure of the documents searched.
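
The idea of declaratively combining content and structure criteria can be illustrated with a minimal Python sketch. This is not WebQueL syntax, which the abstract does not show; the names Page, parse_page, contains, tag_contains, has_tag, all_of and any_of are hypothetical and chosen only for illustration, and the sample keywords are made up.

```python
from collections import Counter
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable

# Elements that never have a closing tag; they are counted but not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class Page:
    """Hypothetical minimal view of a Web page: full text, text per tag, tag counts."""
    text: str = ""
    by_tag: dict = field(default_factory=dict)
    tag_counts: Counter = field(default_factory=Counter)


class _Collector(HTMLParser):
    """Collects text and records which innermost tag each text chunk belongs to."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.page = Page()

    def handle_starttag(self, tag, attrs):
        self.page.tag_counts[tag] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        self.page.text += data
        if self.stack:
            self.page.by_tag.setdefault(self.stack[-1], []).append(data)


def parse_page(html: str) -> Page:
    collector = _Collector()
    collector.feed(html)
    return collector.page


Criterion = Callable[[Page], bool]


def contains(keyword: str) -> Criterion:
    """Content criterion: the keyword occurs anywhere in the page text."""
    return lambda page: keyword.lower() in page.text.lower()


def tag_contains(tag: str, keyword: str) -> Criterion:
    """Mixed criterion: the keyword occurs in text directly inside the given tag."""
    return lambda page: any(
        keyword.lower() in chunk.lower() for chunk in page.by_tag.get(tag, [])
    )


def has_tag(tag: str, min_count: int = 1) -> Criterion:
    """Structure criterion: the page has at least min_count elements of this tag."""
    return lambda page: page.tag_counts.get(tag, 0) >= min_count


def all_of(*criteria: Criterion) -> Criterion:
    return lambda page: all(c(page) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:
    return lambda page: any(c(page) for c in criteria)


# A declarative "query": pages whose title mentions "ontology",
# that contain a table, and whose text mentions "warehouse".
query = all_of(tag_contains("title", "ontology"), has_tag("table"), contains("warehouse"))
```

A filter built this way would be applied to each page returned by the search engine or crawler, for example `relevant = [h for h in pages if query(parse_page(h))]`, where `pages` is an assumed list of HTML strings.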