Team seminar(s)
Scalable algorithms for cloud-based Semantic Web data management
Stamatis Zampetakis

21 September 2015, 00h00
Room/Building: 435/PCRI-N

Research activities: Algorithms for large volumes of distributed data

Abstract:
To build smart systems in which machines can reason in a human-like way, data with semantics is a major requirement. This need led to the advent of the Semantic Web, which proposes standard ways of representing and querying data with semantics. RDF is the prevalent data model for describing web resources, and SPARQL is the standard language for expressing queries over RDF data. The ability to store and query data with semantics triggered the development of many RDF data management systems.

The rapid evolution of the Semantic Web drove a shift from centralized to distributed data management systems. The first distributed systems relied on P2P and client-server architectures; more recently, the focus has moved to cloud computing.

Cloud computing environments have strongly impacted research and development in distributed software platforms. Cloud providers offer distributed, shared-nothing infrastructures that can be used for data storage and processing. The main features of cloud computing include scalability, fault tolerance, and elastic allocation of computing and storage resources according to users' needs.
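As an illustration of how RDF data can be spread over a shared-nothing cluster, here is a minimal sketch of one common technique: hash partitioning triples on their subject, so that all triples describing the same resource land on the same machine. The cluster size `NUM_NODES`, the helper names `node_for` and `partition_by_subject`, and the choice of the subject as partitioning key are assumptions made for illustration, not the storage scheme presented in this talk.

```python
# Minimal sketch: hash-partition RDF triples across a hypothetical cluster of
# shared-nothing nodes, using the subject as the partitioning key.
import zlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(term: str, num_nodes: int = NUM_NODES) -> int:
    # crc32 is deterministic across runs, unlike Python's salted built-in hash() for str.
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def partition_by_subject(triples, num_nodes: int = NUM_NODES) -> dict:
    """Map node id -> list of triples whose subject hashes to that node."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[node_for(s, num_nodes)].append((s, p, o))
    return dict(partitions)


triples = [
    (":alice", ":worksAt", ":inria"),
    (":alice", ":name", '"Alice"'),
    (":bob", ":worksAt", ":inria"),
]
parts = partition_by_subject(triples)

# Every triple with subject :alice is stored on exactly one node.
alice_nodes = {n for n, ts in parts.items() for s, _, _ in ts if s == ":alice"}
print(len(alice_nodes))  # 1
```

Because each node holds complete subject "stars", a query whose patterns all share the same subject variable can be answered by each node independently, without moving data across the network.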

This thesis investigates the design and implementation of scalable algorithms and systems for cloud-based Semantic Web data management. In particular, we study the performance and cost of exploiting commercial cloud infrastructures to build Semantic Web data repositories, and the optimization of SPARQL queries for massively parallel frameworks.

First, we introduce the basic concepts of the Semantic Web and the main components of cloud-based systems. We also provide an extended overview of existing RDF data management systems in the centralized and distributed settings, emphasizing the critical concepts of storage, indexing, query optimization, and infrastructure.

Second, we present AMADA, an architecture for RDF data management using public cloud infrastructures. We follow the Software as a Service (SaaS) model, in which the complete platform runs in the cloud and appropriate APIs are provided to end users for storing and retrieving RDF data. We explore various storage and querying strategies, revealing their pros and cons with respect to performance and also to monetary cost, which is an important new dimension to consider in public cloud services.
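As a rough illustration of why monetary cost becomes a dimension of its own in public clouds, the following sketch computes a pay-as-you-go monthly bill from storage, request, and compute charges. The unit prices, the two strategies ("scan all files" vs "index lookup"), and every workload figure are hypothetical placeholders; they are not AMADA's actual strategies, measurements, or any provider's real pricing.

```python
# Minimal sketch of a pay-as-you-go cost model for a cloud-hosted RDF store.
# All prices, strategies, and workload numbers are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class Prices:
    storage_per_gb_month: float
    per_million_requests: float
    compute_per_hour: float


@dataclass
class Strategy:
    name: str
    stored_gb: float                # raw data plus any index
    requests_per_query: int         # calls to the storage service per query
    compute_hours_per_query: float  # machine time spent per query


def monthly_cost(s: Strategy, prices: Prices, queries_per_month: int) -> float:
    storage = s.stored_gb * prices.storage_per_gb_month
    requests = s.requests_per_query * queries_per_month / 1e6 * prices.per_million_requests
    compute = s.compute_hours_per_query * queries_per_month * prices.compute_per_hour
    return storage + requests + compute


prices = Prices(storage_per_gb_month=0.10, per_million_requests=1.0, compute_per_hour=0.50)
no_index = Strategy("scan all files", stored_gb=10, requests_per_query=1000,
                    compute_hours_per_query=0.05)
indexed = Strategy("index lookup", stored_gb=25, requests_per_query=20,
                   compute_hours_per_query=0.005)

for s in (no_index, indexed):
    print(f"{s.name}: {monthly_cost(s, prices, queries_per_month=1000):.2f}")
```

Under these made-up numbers the index pays for its extra storage by cutting request and compute charges; with other prices or query volumes the ranking can flip, which is the kind of performance/cost trade-off a cloud-based RDF architecture has to weigh.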

Finally, we present CliqueSquare, a distributed RDF data management system built on top of Hadoop, incorporating a novel optimization algorithm that produces massively parallel plans for SPARQL queries. We present a family of optimization algorithms that rely on n-ary (star) equality joins to build flat plans, and compare their ability to find the flattest possible plans. Inspired by existing partitioning and indexing techniques, we present a generic storage strategy suitable for storing RDF data in HDFS (Hadoop’s Distributed File System). Our experimental results validate the efficiency and effectiveness of the optimization algorithm and also demonstrate the overall performance of the system.
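To give an intuition for "flat plans built from n-ary (star) equality joins", here is a minimal sketch. At each level, plan nodes that share a variable form a clique and are joined by a single n-ary join; the process repeats until one node remains, and the number of levels is the plan height. The greedy clique choice (most-shared variable first), the helper names `vars_of` and `build_flat_plan`, and the sample query are illustrative assumptions; this is not CliqueSquare's actual optimization algorithm, which explores a family of clique decompositions.

```python
# Minimal sketch of clique-based flat-plan construction with n-ary equality joins.
# The greedy strategy below is an illustrative assumption, not CliqueSquare's algorithm.
# Each plan node is represented by the set of query variables it binds.


def vars_of(pattern):
    return frozenset(t for t in pattern if t.startswith("?"))


def build_flat_plan(patterns):
    """Return the join levels; each level lists groups of node indices joined together."""
    nodes = [vars_of(p) for p in patterns]
    levels = []
    while len(nodes) > 1:
        remaining = list(range(len(nodes)))
        groups = []
        while remaining:
            # Count how many still-unassigned nodes contain each variable.
            counts = {}
            for i in remaining:
                for v in nodes[i]:
                    counts[v] = counts.get(v, 0) + 1
            best = max(counts, key=lambda v: (counts[v], v), default=None)
            if best is None or counts[best] < 2:
                # No variable is shared at this level any more: pass nodes through.
                groups.extend([i] for i in remaining)
                break
            # All nodes containing `best` form a clique, joined by one n-ary join.
            clique = [i for i in remaining if best in nodes[i]]
            groups.append(clique)
            remaining = [i for i in remaining if i not in clique]
        if all(len(g) == 1 for g in groups):
            raise ValueError("query is not connected; a cross product would be needed")
        levels.append(groups)
        # Each group becomes one node binding the union of its variables.
        nodes = [frozenset().union(*(nodes[i] for i in g)) for g in groups]
    return levels


query = [
    ("?a", ":p1", "?b"),
    ("?b", ":p2", "?c"),
    ("?b", ":p3", "?d"),
    ("?d", ":p4", "?e"),
]
levels = build_flat_plan(query)
print(len(levels))  # 2
print(levels)       # [[[0, 1, 2], [3]], [[0, 1]]]
```

Here the three patterns sharing `?b` are joined in one step, and the result is joined with the fourth pattern on `?d`, giving a plan of height 2, whereas a left-deep chain of binary joins over the same four patterns would need three levels. Each level of such a plan maps naturally onto one parallel MapReduce job.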

More information: http://pages.saclay.inria.fr/stamatios.zampetakis/