Français Anglais
Accueil Annuaire Plan du site
Accueil > Evenements > Séminaires
Séminaire d'équipe(s) LaHDAK
Dynamically Optimizing Queries over Large Scale Data Platforms
Konstantinos Karanassos

26 November 2014, 11h00 - 26 November 2014, 13h00
Salle/Bat : 445/PCRI-N
Contact :

Activités de recherche : Algorithmes pour les grands volumes de données distribuées

Résumé :
Enterprises are adapting large-scale data processing platforms, such as Hadoop, to gain actionable insights from their "big data". Query optimization is still an open challenge in this environment due to the volume and heterogeneity of data, comprising both structured and un/semi-structured datasets. Moreover, it has become common practice to push business logic close to the data via user-defined functions (UDFs), which are usually opaque to the optimizer, further complicating cost-based optimization. As a result, classical relational query optimization techniques do not fit well in this setting, while at the same time, suboptimal query plans can be disastrous with large datasets. In this talk, I will present new techniques that take into account UDFs and correlations between relations for optimizing queries running on large scale clusters. We introduce "pilot runs", which execute part of the query over a sample of the data to estimate selectivities, and employ a cost-based optimizer that uses these selectivities to choose an initial query plan. Then, we follow a dynamic optimization approach, in which plans evolve as parts of the queries get executed. Our experimental results show that our techniques produce plans that are at least as good as, and up to 2x (4x) better for Jaql (Hive) than, the best hand-written left-deep query plans.

This work was done while I was with IBM Research at Almaden and appeared in SIGMOD 2014. It is a joint work with Andrey Balmin, Marcel Kutsch, Fatma Ozcan, Vuk Ercegovac, Chunyang Xia, and Jesse Jackson.

Pour en savoir plus :
Séminaires
A Family of Tractable Graph Distances
Gestion de données du Web
Wednesday 04 July 2018 - 10h30
Salle : 465 - PCRI-N
Stratis Ioannidis .............................................

Binary pattern of length greater than 14 are abeli
Combinatoire
Friday 29 June 2018 - 14h30
Salle : 445 - PCRI-N
Matthieu Rosenfeld .............................................

Distributionally Robust Optimization with Principa
Optimisation combinatoire et stochastique
Friday 29 June 2018 - 11h00
Salle : 455 - PCRI-N
Dr. Jianqiang Cheng .............................................

Caractérisation de réseaux égocentrés par l'énumér
Friday 15 June 2018 - 14h30
Salle : 455 - PCRI-N
Raphaël Charbey .............................................

DATA VERACITY ASSESSMENT: HOW A-PRIORI KNOWLEDGE E
Intégration de données et de connaissances
Friday 15 June 2018 - 14h00
Salle : 445 - PCRI-N
Valentina Beretta .............................................