Robust Module-based Data Management by François Goasdoué and Marie-Christine Rousset. To appear in IEEE Transactions on Knowledge and Data Engineering (TKDE), 2012.
The current trend for building an ontology-based data management system (DMS) is to capitalize on efforts made to design a preexisting well-established DMS (a reference system). The method amounts to extracting from the reference DMS a piece of schema relevant to the new application needs (a module), possibly personalizing it with extra constraints w.r.t. the application under construction, and then managing a dataset using the resulting schema.
In this paper, we extend the existing definitions of modules and we introduce novel properties of robustness that provide means for checking easily that a robust module-based DMS evolves safely w.r.t. both the schema and the data of the reference DMS.
We carry out our investigations in the setting of description logics, which underlie modern ontology languages like RDFS, OWL, and OWL2 from W3C. Notably, we focus on the DL-lite_A dialect of the DL-lite family, which encompasses the foundations of the QL profile of OWL2 (i.e., DL-lite_R): the W3C recommendation for efficiently managing large datasets.
Getting more RDF support from relational databases by François Goasdoué, Ioana Manolescu, and Alexandra Roatiş (poster paper). International World Wide Web conference (WWW), 2012.
We introduce the database fragment of RDF, which extends the popular Description Logic fragment, notably with support for incomplete information. We then provide novel sound and complete saturation- and reformulation-based techniques for answering the Basic Graph Pattern queries of SPARQL in this fragment. We extend the state of the art on pushing RDF query processing within robust / efficient relational database management systems. Finally, we experimentally compare our query answering techniques using well-established datasets.
RDF Data Management in the Amazon Cloud by Francesca Bugiotti, François Goasdoué, Zoi Kaoudi, and Ioana Manolescu. EDBT/ICDT Workshop on Data analytics in the Cloud (DanaC), 2012.
Cloud computing has been massively adopted recently in many applications for its elastic scaling and fault-tolerance. At the same time, given that the amount of available RDF data sources on the Web increases rapidly, there is a constant need for scalable RDF data management tools. In this paper we propose a novel architecture for the distributed management of RDF data, exploiting an existing commercial cloud infrastructure, namely Amazon Web Services (AWS). We study the problem of indexing RDF data stored within AWS, by using SimpleDB, a key-value store provided by AWS for small data items. The goal of the index is to efficiently identify the RDF datasets which may have answers for a given query, and route the query only to those. We devised and experimented with several indexing strategies; we discuss experimental results and avenues for future work.
Répondre aux requêtes par reformulation dans les bases de données RDF by François Goasdoué, Ioana Manolescu, and Alexandra Roatiş. Reconnaissance des Formes et Intelligence Artificielle (RFIA), 2012.
In RDF, query answering relies either on data saturation or on query reformulation. The idea behind both techniques is to decouple entailment (the reasoning mechanism from which query answers are defined) from query evaluation. In this paper, we extend the state of the art by proposing a reformulation-based query answering technique for a more significant fragment of RDF and a more expressive query language than those studied in the literature. We then experimentally compare this new technique with the standard technique based on data saturation.
View Selection in Semantic Web Databases by François Goasdoué, Konstantinos Karanasos, Julien Leblay, and Ioana Manolescu. Proceedings of the VLDB Endowment (PVLDB), vol. 5, num. 2, 2011/2012.
We consider the setting of a Semantic Web database, containing both explicit data encoded in RDF triples, and implicit data, implied by the RDF semantics. Based on a query workload, we address the problem of selecting a set of views to be materialized in the database, minimizing a combination of query processing, view storage, and view maintenance costs. Starting from an existing relational view selection method, we devise new algorithms for recommending view sets, and show that they scale significantly beyond the existing relational ones when adapted to the RDF context. To account for implicit triples in query answers, we propose a novel RDF query reformulation algorithm and an innovative way of incorporating it into view selection in order to avoid a combinatorial explosion in the complexity of the selection process. The interest of our techniques is demonstrated through a set of experiments.
Growing Triples on Trees: an XML-RDF Hybrid Model for Annotated Documents by François Goasdoué, Konstantinos Karanasos, Yannis Katsis, Julien Leblay, Ioana Manolescu, and Stamatis Zampetakis. VLDB Workshop on Very Large Data Search (VLDS) and Journées de bases de données avancées (BDA), 2011.
Content on today's Web is typically document-structured and richly connected; XML is by now widely adopted to represent Web data. Moreover, the vision of a computer-understandable Web relies on Web (and real world) resources described by simple properties having names or values; URIs are the normative method of identifying resources and RDF (the Resource Description Framework) enjoys important traction as a way to encode such statements. We present XR, a carefully designed hybrid model between XML and RDF, for describing RDF-annotated XML documents. XR follows and combines the W3C's XML, URI and RDF standards by assigning URIs to all XML nodes and enabling these URIs to appear in RDF statements. The XR management platform thus provides the capabilities to create and handle interconnected XML and RDF content. We define the XR data model, its query language, and present preliminary results with a prototype implementation.
RDFViewS: A Storage Tuning Wizard for RDF Applications by François Goasdoué, Konstantinos Karanasos, Julien Leblay, and Ioana Manolescu. Demonstration at Journées de bases de données avancées (BDA), 2011.
The emergence of the Semantic Web and the growing number of related applications lead us to look for ways to efficiently query large volumes of RDF data. In this demonstration, we present RDFViewS, a system that automatically finds the best set of views to materialize for a given set of SPARQL queries. The solution must jointly minimize query evaluation time, view maintenance cost, and the space the views occupy. The algorithm underlying our system explores a state space by means of strategies and heuristics, in search of an optimal configuration. In doing so, it takes into account any RDFS schemas accompanying the data, in order to guarantee the completeness of query results.
RDFViewS: A Storage Tuning Wizard for RDF Applications by François Goasdoué, Konstantinos Karanasos, Julien Leblay, and Ioana Manolescu. ACM Conference on Information and Knowledge Management (CIKM), 2010.
In recent years, the significant growth of RDF data used in numerous applications has made its efficient and scalable manipulation an important issue. In this paper, we present RDFViewS, a system capable of choosing the most suitable views to materialize, in order to minimize the query response time for a specific SPARQL query workload, while taking into account the view maintenance cost and storage space constraints. Our system employs practical algorithms and heuristics to navigate through the search space of potential view configurations, and exploits the possibly available semantic information - expressed via an RDF Schema - to ensure the completeness of the query evaluation.
Traitement de requêtes RDF fondé sur des vues matérialisées by François Goasdoué, Konstantinos Karanasos, Julien Leblay, and Ioana Manolescu. Journées de bases de données avancées (BDA), 2010.
Modules sémantiques robustes pour une réutilisation saine en DL-lite by François Goasdoué and Marie-Christine Rousset. Journées de bases de données avancées (BDA), 2010.
Module extraction from ontologies has recently been studied in the setting of description logics, which are at the core of modern ontology languages. In this paper, we define a new notion of semantic modules that captures both the modules obtained by extracting a subset of a Tbox and those obtained by "forgetting" concepts and roles of a Tbox. We then define and study the safe reuse of a semantic module of a global Tbox in order to build local Aboxes and to query them either independently or jointly with the global Abox. So that the local Abox (associated with the module) and the global Abox (associated with the initial Tbox) can evolve independently but coherently, we generalize the notion of query conservative extension and extend it to consistency checking. Finally, we provide algorithms and complexity results for computing minimal and robust semantic modules in DL-liteF and DL-liteR. These dialects are members of the DL-lite family, which was specifically designed for efficiently querying large amounts of data.
Gestion décentralisée de données en DL-LITE by Nada Abdallah, François Goasdoué, and Marie-Christine Rousset. Reconnaissance des Formes et Intelligence Artificielle (RFIA), 2010.
This paper proposes a decentralized data model and the associated algorithms for implementing peer data management systems (PDMS) based on the DL-liteR description logic. This logic is a fragment, with good theoretical and practical properties, of the upcoming W3C recommendation for the Semantic Web: OWL2. Our approach consists in reducing query reformulation and the checking of data consistency with respect to an ontology to reasoning in propositional logic. This makes it simple to deploy PDMSs for DL-liteR on top of SomeWhere, a peer-to-peer inference system for propositional logic that scales to a thousand peers. We also show how to answer queries using views (predefined queries) in DL-liteR in the centralized and decentralized cases, by combining the DL-liteR reformulation algorithm and the MiniCon algorithm.
Non-conservative Extension of a Peer in a P2P Inference System by Nada Abdallah and François Goasdoué. AI Communications: The European Journal on Artificial Intelligence, volume 22, number 4, pages 211-233, 2009.
This paper points out that the notion of non-conservative extension of a knowledge base (KB) is important to the distributed logical setting of peer-to-peer inference systems (P2PIS), a.k.a. peer-to-peer semantic systems. It is useful to a peer in order to detect/prevent that a P2PIS corrupts (part of) its knowledge or to learn more about its own application domain from the P2PIS. That notion is all the more important since it has connections with the privacy of a peer within a P2PIS and with the quality of service provided by a P2PIS. We therefore study the following tightly related problems from both the theoretical and decentralized algorithmic perspectives: (i) deciding whether a P2PIS is a conservative extension of a given peer and (ii) computing the witnesses to the corruption of a given peer's KB within a P2PIS so that we can forbid it. We consider here scalable P2PISs that have already proved useful to Artificial Intelligence and DataBases.
DL-liteR in the Light of Propositional Logic for Decentralized Data Management by Nada Abdallah, François Goasdoué, and Marie-Christine Rousset. International Joint Conference on Artificial Intelligence (IJCAI), pages 2010-2015, 2009.
This paper provides a decentralized data model and associated algorithms for peer data management systems (PDMS) based on the DL-liteR description logic. Our approach relies on reducing query reformulation and consistency checking for DL-liteR into reasoning in propositional logic. This enables a straightforward deployment of DL-liteR PDMSs on top of SomeWhere, a scalable propositional peer-to-peer inference system. We also show how to use the state-of-the-art MiniCon algorithm for rewriting queries using views in DL-liteR in the centralized and decentralized cases.
Special issue on the Semantic Web of the journal Technique et Science Informatiques (TSI), edited by François Goasdoué and Alain Léger, published by Hermès-Lavoisier, volume 28, February 2009.
Semantic Web Take-Off in a European Industry Perspective by Alain Léger, Johannes Heinecke, Lyndon J.B. Nixon, Pavel Shvaiko, Jean Charlet, Paola Hobson, and François Goasdoué. Book chapter in Semantic Web for Business: Cases and Applications, Idea Group Inc., pages 1-29, 2008.
Semantic Web technology is being increasingly applied in a large spectrum of applications in which domain knowledge is conceptualized and formalized (e.g., by means of an ontology) in order to support diversified and automated knowledge processing (e.g., reasoning) performed by a machine. Moreover, through an optimal combination of (cognitive) human reasoning and (automated) machine processing (mimicking reasoning), it becomes possible for humans and machines to share more and more complementary tasks. The spectrum of applications is extremely large; to name a few: corporate portals and knowledge management, e-commerce, e-work, e-business, healthcare, e-government, natural language understanding and automated translation, information search, data and services integration, social networks and collaborative filtering, knowledge mining, business intelligence and so on. From a social and economic perspective, this emerging technology should contribute to growth in economic wealth, but it must also show clear-cut value for everyday activities through technological transparency and efficiency. The penetration of Semantic Web technology in industry and in services is progressing slowly but accelerating as new success stories are reported. In this chapter we present ongoing work in the cross-fertilization between industry and academia. In particular, we present a collection of application fields and use cases from enterprises which are interested in the promises of Semantic Web technology.
WebContent: Efficient P2P Warehousing of Web Data by Serge Abiteboul, Tristan Allard, Philippe Chatalic, Georges Gardarin, Anca Ghitescu, François Goasdoué, Ioana Manolescu, Benjamin Nguyen, Mohamed Ouazara, Aditya Somani, Nicolas Travers, Gabriel Vasile, and Spyros Zoupanos. Demonstration paper in Very Large Data Bases (VLDB), pages 1428-1431, 2008.
We present the WebContent platform for managing distributed repositories of XML and semantic Web data. The platform allows integrating various data processing building blocks (crawling, translation, semantic annotation, full-text search, structured XML querying, and semantic querying), presented as Web services, into a large-scale efficient platform. Calls to various services are combined inside ActiveXML documents, which are XML documents including service calls. An ActiveXML optimizer is used to: (i) efficiently distribute computations among sites; (ii) perform XQuery-specific optimizations by leveraging an algebraic XQuery optimizer; and (iii) given an XML query, choose among several distributed indices the most appropriate in order to answer the query.
Calcul de conséquences dans un système d'inférence pair-à-pair propositionnel (revisité) by Nada Abdallah and François Goasdoué. Reconnaissance des Formes et Intelligence Artificielle (RFIA), 2008.
In this paper, we study consequence finding in propositional peer-to-peer inference systems (P2PIS) with directed mappings. In these systems, a mapping from one peer to another specifies a set of knowledge that the first peer must observe, as well as the knowledge it must notify to the second peer if the observed knowledge is satisfied. Since these new P2PISs can model many real applications, it is important to equip them with key AI inference tasks, here consequence finding. Our contributions are twofold. We first define the first logical framework for representing propositional P2PISs with directed mappings. We then study consequence finding in this new framework. In particular, we propose a fully decentralized algorithm for this problem.
Systèmes pair-à-pair sémantiques et extension (non) conservative d'une base de connaissances by Nada Abdallah and François Goasdoué. Journées de bases de données avancées (BDA), 2008.
This paper shows why the notion of non-conservative extension of a knowledge base (KB) is important in peer-to-peer inference systems (P2PIS), also known as peer-to-peer semantic systems. This notion is useful to a peer in order to detect whether (part of) its KB is corrupted by a P2PIS, or to learn from the P2PIS new knowledge about its own application domain. It is all the more important since it is closely connected with the privacy of a peer's knowledge within a P2PIS and with the quality of service provided by a P2PIS. We study here, from both the theoretical and the decentralized algorithmic perspectives, the following two problems: (i) deciding whether a P2PIS is a conservative extension of a given peer and (ii) computing the witnesses to a possible corruption of a given peer's KB by a P2PIS, so that it can be prevented. We consider P2PISs that scale to a thousand peers and whose usefulness has already been demonstrated in Artificial Intelligence and Databases.
Calcul de conséquences pour le test d'extension conservative dans un système pair-à-pair by Nada Abdallah and François Goasdoué. Journées Francophones de Programmation par Contraintes (JFPC), 2008.
In a peer-to-peer inference system (P2PIS), a peer extends its knowledge base (KB) with those of the other peers in order to use their knowledge to answer the queries posed to it. However, the extension of a KB is not necessarily conservative. A conservative extension guarantees that the meaning of a KB is the same whether it is considered alone or together with its extension. In contrast, a non-conservative extension may radically change the meaning of a KB within the resulting theory. It is therefore crucial for a peer to know whether a P2PIS is a conservative extension of its KB.
SomeRDFS in the Semantic Web by Philippe Adjiman, François Goasdoué, and Marie-Christine Rousset. Journal on Data Semantics, No 8, pages 158-181, Springer Journal (LNCS 4380), 2007.
The Semantic Web envisions a world-wide distributed architecture where computational resources will easily inter-operate to coordinate complex tasks such as query answering. Semantic marking up of web resources using ontologies is expected to provide the necessary glue for making this vision work. Using ontology languages, (communities of) users will build their own ontologies in order to describe their own data. Adding semantic mappings between those ontologies, in order to semantically relate the data to share, gives rise to the Semantic Web: data on the web that are annotated by ontologies networked together by mappings. In this vision, the Semantic Web is a huge semantic peer data management system. In this paper, we describe the SomeRDFS peer data management systems that promote a "simple is beautiful" vision of the Semantic Web based on data annotated by RDFS ontologies.
Distributed Reasoning in a Peer-to-Peer Setting: Application to the Semantic Web by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset and Laurent Simon. Journal of Artificial Intelligence Research, Vol. 25, pages 269-314, 2006.
In a peer-to-peer inference system, each peer can reason locally but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the local theory of each peer is a set of propositional clauses defined upon a local vocabulary. An important characteristic of peer-to-peer inference systems is that the global theory (the union of all peer theories) is not known (as opposed to partition-based reasoning systems). The main contribution of this paper is to provide the first consequence finding algorithm in a peer-to-peer setting: it is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We exhibit a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. Another important contribution is to apply this general distributed reasoning setting to the setting of the Semantic Web through the SomeWhere semantic peer-to-peer data management system. The last contribution of this paper is to provide an experimental analysis of the scalability of the peer-to-peer infrastructure that we propose, on large networks of 1000 peers.
SomeWhere: A Scalable Peer-to-Peer Infrastructure for Querying Distributed Ontologies by Marie-Christine Rousset, Philippe Adjiman, Philippe Chatalic, François Goasdoué, Laurent Simon. Invited talk paper (talk given by Marie-Christine Rousset) in OTM Conferences 2006, Lecture Notes in Computer Science, volume 4275, pages 698-703, 2006.
In this invited talk, we present the SomeWhere approach and infrastructure for building semantic peer-to-peer data management systems based on simple personalized ontologies distributed at a large scale. SomeWhere is based on a simple class-based data model in which the data is a set of resource identifiers (e.g., URIs), the schemas are (simple) definitions of classes possibly constrained by inclusion, disjunction or equivalence statements, and mappings are inclusion, disjunction or equivalence statements between classes of different peer ontologies. In this setting, query answering over peers can be done by distributed query rewriting, which can be equivalently reduced to distributed consequence finding in propositional logic. It is done by using the message-passing distributed algorithm that we have implemented for consequence finding of a clause w.r.t. a set of distributed propositional theories. We summarize its main properties (soundness, completeness and termination), and we report experiments showing that it already scales up to a thousand peers. Finally, we mention ongoing work on extending the current data model to RDF(S) and on handling possible inconsistencies between the ontologies of different peers.
The Semantic Web from an Industrial Perspective by Alain Léger, Johannes Heinecke, Lyndon J.B. Nixon, Pavel Shvaiko, Jean Charlet, Paola Hobson, and François Goasdoué. Tutorial paper for Reasoning Web Summer School, LNCS 4126, pages 232-268, Springer-Verlag, 2006.
Semantic Web technology is being increasingly applied in a large spectrum of applications in which domain knowledge is conceptualized and formalized (e.g., by means of an ontology) in order to support diversified and automated knowledge processing (e.g., reasoning) performed by a machine. Moreover, through an optimal combination of (cognitive) human reasoning and (automated) machine reasoning and processing, it is possible for humans and machines to share complementary tasks. The spectrum of applications is extremely large; to name a few: corporate portals and knowledge management, e-commerce, e-work, e-business, healthcare, e-government, natural language understanding and automated translation, information search, data and services integration, social networks and collaborative filtering, knowledge mining, business intelligence and so on. From a social and economic perspective, this emerging technology should contribute to growth in economic wealth, but it must also show clear-cut value for everyday activities through technological transparency and efficiency. The penetration of Semantic Web technology in industry and in services is progressing slowly but accelerating as new success stories are reported. In this paper and lecture we present ongoing work in the cross-fertilization between industry and academia. In particular, we present a collection of application fields and use cases from enterprises which are interested in the promises of Semantic Web technology. The use cases are detailed and focused on the key knowledge processing components that will unlock the deployment of the technology in the selected application field. The paper ends with the presentation of the current technology roadmap designed by a team of academic and industry researchers.
SomeWhere in the Semantic Web by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset and Laurent Simon. International Workshop on Principles and Practice of Semantic Web Reasoning, Lecture Notes in Computer Science, volume 3703, pages 1-16, 2005. Also in SOFSEM'06: Theory and Practice of Computer Science (a.k.a. International Conference on Current Trends in Theory and Practice of Computer Science), as an invited talk paper (talk given by Marie-Christine Rousset).
In this paper, we describe the SomeWhere semantic peer-to-peer data management system that promotes a "small is beautiful" vision of the Semantic Web based on simple personalized ontologies (e.g., taxonomies of classes) that are distributed at a large scale. In this vision of the Semantic Web, no user imposes his own ontology on others. Logical mappings between ontologies make it possible to create a web of people in which personalized semantic marking up of data cohabits nicely with a collaborative exchange of data. In this view, the Web is a huge peer-to-peer data management system based on simple distributed ontologies and mappings.
Scalability Study of Peer-to-Peer Consequence Finding by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset and Laurent Simon. International Joint Conference on Artificial Intelligence (IJCAI), 2005.
In peer-to-peer inference systems, each peer can reason locally but also solicit some of its acquaintances, sharing part of its vocabulary. This paper studies both theoretically and experimentally the problem of computing proper prime implicates for propositional peer-to-peer systems, the global theory (union of all peer theories) of which is not known (as opposed to partition-based reasoning).
Answering Queries using Views: a KRDB Perspective for the Semantic Web by François Goasdoué and Marie-Christine Rousset. ACM Journal - Transactions on Internet Technology (TOIT), Volume 4, Issue 3, pages 255-288, 2004.
In this paper, we investigate a first step towards the long-term vision of the Semantic Web by studying the problem of answering queries posed through a mediated ontology to multiple information sources whose content is described as views over the ontology relations. The contributions of this paper are twofold. We first offer a uniform logical setting which allows us to encompass and to relate the existing work on answering and rewriting queries using views. In particular, we make clearer the connection between the problem of rewriting queries using views and the problem of answering queries using extensions of views. Then we focus on an instance of the problem of rewriting conjunctive queries using views through an ontology expressed in a description logic, for which we exhibit a complete algorithm.
Distributed Reasoning in a Peer-to-peer Setting by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset, and Laurent Simon. European Conference on Artificial Intelligence (ECAI'04), pages 945-946 (accepted as a short paper), 2004.
In a peer-to-peer inference system, each peer can reason locally but can also solicit some of its acquaintances, which are peers sharing part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the local theory of each peer is a set of propositional clauses defined upon a local vocabulary. An important characteristic of peer-to-peer inference systems is that the global theory (the union of all peer theories) is not known (as opposed to partition-based reasoning systems). The contribution of this paper is twofold. We provide the first consequence finding algorithm in a peer-to-peer setting: it is anytime and computes consequences gradually from the solicited peer to peers that are more and more distant. We exhibit a sufficient condition on the acquaintance graph of the peer-to-peer inference system for guaranteeing the completeness of this algorithm. We also present first experimental results that are promising.
Raisonnement distribué dans un environnement de type pair-à-pair by Philippe Adjiman, Philippe Chatalic, François Goasdoué, Marie-Christine Rousset, and Laurent Simon. Proceedings of the 10th Journées Nationales sur la résolution Pratique de Problèmes NP-Complets (JNPC'04), pages 11-22, 2004.
In a peer-to-peer inference system, each peer can reason locally but can also solicit its neighborhood, made up of the peers with which it shares part of its vocabulary. In this paper, we consider peer-to-peer inference systems in which the theory of each peer is a set of propositional clauses built from a local vocabulary. An important characteristic of peer-to-peer systems is that the global theory (the union of the theories of all peers) is not known (as opposed to partition-based reasoning systems). The contribution of this paper is twofold. We present the first algorithm for computing implicates in a peer-to-peer setting: it is anytime and computes implicates gradually, from the queried peer to peers that are more and more distant. We state a sufficient condition on the neighborhood graph of the peer-to-peer inference system that guarantees the completeness of our algorithm. We also present some promising experimental results.
Intégration d'Information par Médiation by François Goasdoué and Marie-Christine Rousset. Plein Sud Spécial Recherche, Université Paris-Sud XI, 2004.
Information integration is a recent and essential discipline of computer science. Its aim is to make it easier for users of computing resources to access information scattered across networks (Internet, intranets, ...). The information integration applications best known to the general public are certainly search engines (Google, Voila, Yahoo, ...). An integration method called mediation makes it possible to go beyond the services provided by these engines, for example by designing e-commerce portals capable of integrating the data of several content providers (Kelkoo, ...). This is the method we address here, presenting the work carried out on this topic in the Intelligence Artificielle et Systèmes d'Inférence team of the Laboratoire de Recherche en Informatique at Université Paris Sud XI.
Querying Distributed Data through Distributed Ontologies: a Simple but Scalable Approach by François Goasdoué and Marie-Christine Rousset. IEEE Intelligent Systems, Volume 18, Issue 5, pages 60-65, 2003. Also in the international workshop Information Integration on the Web (IIWeb'03) of the International Joint Conference on Artificial Intelligence (IJCAI'03).
In this paper, we define a simple but scalable framework for peer-to-peer data sharing systems, in which the problem of answering queries over a network of semantically related peers is always decidable. Our approach is characterized by a simple class-based language for defining peer schemas as hierarchies of atomic classes, and mappings as inclusions of logical combinations of atomic classes. We provide an anytime and incremental method for computing all the certain answers to a query posed to a given peer such that the answers are ordered from the ones involving peers close to the queried peer to the ones involving more distant peers.
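The ordering of answers by peer distance can be pictured with a breadth-first traversal. The following Python sketch is a strong simplification and not the method of the paper: it assumes mappings are only inclusions between atomic classes (the paper allows logical combinations of atomic classes), and the peer names, class names, and data layout are hypothetical.

```python
from collections import deque


def anytime_answers(start, mappings, data):
    """Yield (distance, peer, instance) triples, closest peers first.

    start:    (peer, cls) pair identifying the queried class
    mappings: (peer, cls) -> list of (peer, cls) pairs included in it
              (atomic inclusions only, an assumption of this sketch)
    data:     (peer, cls) -> set of instances stored for that class
    """
    seen_nodes = {start}
    seen_answers = set()
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        # Instances of an included class are answers to the queried class.
        for inst in sorted(data.get(node, ())):
            if inst not in seen_answers:
                seen_answers.add(inst)
                yield dist, node[0], inst
        for sub in mappings.get(node, ()):
            if sub not in seen_nodes:
                seen_nodes.add(sub)
                queue.append((sub, dist + 1))


# Hypothetical three-peer network.
mappings = {
    ("P1", "Film"): [("P2", "Movie")],
    ("P2", "Movie"): [("P3", "Western")],
}
data = {
    ("P1", "Film"): {"f1"},
    ("P2", "Movie"): {"f1", "f2"},
    ("P3", "Western"): {"f3"},
}
answers = list(anytime_answers(("P1", "Film"), mappings, data))
```

Because the function is a generator, a caller can stop consuming answers at any time and still hold the answers obtained from the closest peers, which mirrors the anytime and incremental behavior described above.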
Construction de Médiateurs pour Intégrer des Sources d'Information Multiples et Hétérogènes : le Projet PICSEL by Alain Bidault, Christine Froidevaux, Hélène Gagliardi, François Goasdoué, Chantal Reynaud, Marie-Christine Rousset, and Brigitte Safar. Journal I3 : Information - Interaction - Intelligence, Volume 2, Number 1, pages 9-58, 2002.
The growing amount of data accessible over networks (intranet, Internet, etc.) raises the problem of integrating pre-existing information sources, which are often remote and heterogeneous, in order to make them easier for a wide audience to query. One of the first approaches proposed in information integration advocates building mediators. A mediator acts as a query interface between a user and data sources. It gives the user the illusion of querying a homogeneous, centralized system, sparing them from having to find the data sources relevant to their query, query them one by one, and combine the results themselves. The aim of this article is to present the PICSEL project, which offers a declarative environment for building mediators. PICSEL differs from existing information integration systems in that the mediator schema can be expressed in CARIN, a language combining the expressive power of a rule-based formalism and a class-based formalism (the description logic ALN). PICSEL includes a query refinement module, the first building block of a cooperative dialogue module between a mediator and its users.
Compilation and Approximation of Conjunctive Queries by Concept Descriptions by François Goasdoué and Marie-Christine Rousset. European Conference on Artificial Intelligence (ECAI'02), pages 267-271, 2002. Also in the international workshops Description Logics (DL'02) and Knowledge Representation meets DataBases (KRDB'02).
In this paper, we characterize the logical correspondence between conjunctive queries and concept descriptions. We exhibit a necessary and sufficient condition for the compilation of a conjunctive query into an equivalent ALE concept description. We provide a necessary and sufficient condition for the approximation of conjunctive queries by maximally subsumed ALN concept descriptions.
Réécriture de Requêtes en termes de Vues dans CARIN et Intégration d'Informations by François Goasdoué. PhD thesis, Université Paris-Sud XI, 2001.
The Use of CARIN Language and Algorithms for Information Integration: The PICSEL Project by François Goasdoué, Véronique Lattes, and Marie-Christine Rousset. International Journal of Cooperative Information Systems (IJCIS), World Scientific Publishing Company, Volume 9, Number 4, pages 383-401, 2000.
PICSEL is an information integration system over sources that are distributed and possibly heterogeneous. The approach which has been chosen in PICSEL is to define an information server as a knowledge-based mediator in which CARIN is used as the core logical formalism to represent both the domain of application and the contents of information sources relevant to that domain. In this paper, we describe the way the expressive power of the CARIN language is exploited in the PICSEL information integration system, while maintaining the decidability of query answering. We illustrate it on examples coming from the tourism domain, which is the first real case that we have to consider in PICSEL, in collaboration with the travel agency Degriftour.
Rewriting Conjunctive Queries using Views in Description Logics with Existential Restrictions by François Goasdoué and Marie-Christine Rousset. Description Logics (DL'00), pages 113-122, 2000.
In databases, rewriting queries using views has received significant
attention because of its relevance to several fields such as query
optimization, data warehousing, and information integration. In those
settings, data used to answer a query are restricted to be extensions
of a set of predefined queries (views).
The information integration context is typical of the need of rewriting
queries using views for answering queries. Users of information
integration systems do not pose queries directly to the (possibly
remote) sources in which data are stored but to a set of virtual
relations that have been designed to provide a uniform and homogeneous
access to a domain of interest.
When the contents of the sources are described as views over the
(virtual) domain relations, the problem of reformulating a user query
into a query that refers directly to the relevant sources becomes a
problem of rewriting queries using views.
In this paper, we study the problem of rewriting conjunctive queries
over DL expressions into conjunctive queries using a set of views that
are a set of distinguished DL expressions, for three DLs allowing
existential restrictions: FLE, ALE and ALEN. For FLE, we present an
algorithm that computes a representative set of all the rewritings of a
query. In the full version of this paper (cf. Technical report), we
show how to adapt it to deal with the negation of atomic concepts in
the queries and in the views, in order to obtain a rewriting algorithm
for ALE. Finally, we show that obtaining a finite set representative of
all the rewritings of a query is not guaranteed in ALEN.
A Knowledge Based Approach for Information Integration: The PICSEL System by François Goasdoué. Declarative Data Access on the Web, Dagstuhl-Seminar-Report 251, page 7, 1999.
Nowadays,
a large amount of data is reachable on the web. Data are stored in
information sources that can be heterogeneous and distributed.
Information integration provides many interesting approaches like
mediation to allow users to access these data. Mediation aims at
building a mediator which acts as an interface between users and
information sources, giving users the illusion of querying a
homogeneous and centralized system. To do this, a mediator provides a
single query language to users and a vocabulary from a semantic
description (ontology) of a particular application domain. Both
are used to formulate queries.
Here, we present our knowledge based mediator: the PICSEL system. Its
main characteristic is an integration of information sources fully
driven by the semantic description of an application domain, and by a
semantic description of integrated sources consisting of (i) one-to-one
mappings between sources and domain relations (semantic views), (ii)
semantic constraints over those views.
Moreover, since XML emerges as a new standard for web documents, we
show how easy it is for us to perform the integration of such documents
in PICSEL. First, we show how we can capture the semantics of an XML
document schema (DTD) thanks to the vocabulary of the application
domain semantic description. Then, we present a generic way to connect
a mediator to an XML repository. In PICSEL, it consists in building a
generic wrapper which translates a PICSEL query into an X-OQL query
(X-OQL is an XML query language). This generic wrapper is not a
traditional one, i.e., a fixed set of predefined queries. Our generic
wrapper acts as a black box which dynamically generates, for any
PICSEL query, the right X-OQL query.
Modeling Information Sources for Information Integration by François Goasdoué and Chantal Reynaud. International Conference on Knowledge Engineering and Knowledge Management (EKAW'99), pages 121-138, Lecture Notes in AI 1621, Springer-Verlag, 1999.
The aim of this paper is to present an approach and automated tools for designing knowledge bases describing the contents of information sources in PICSEL knowledge-based mediators. We address two problems, the abstraction problem and the representation problem, when information sources are relational databases. First, we present an architectural overview of the PICSEL mediator showing its main knowledge components. Then, we describe our approach and the tools that we have implemented (1) to identify, by means of an abstraction process, the main relevant concepts, called semantic concepts, in an Entity Relationship model and (2) to help represent these concepts using CARIN, a logical language combining description logics and Datalog rules, and using specific terms in the application domain model.
Aide à la Conception de Bases de Connaissances pour le Médiateur PICSEL by François Goasdoué. DEA report, Université Paris-Sud XI, 1998.