Stages de Master Recherche d'Informatique: Data Centric Programming Languages and Systems Mechanization
General Context

Current data management applications and systems involve increasingly massive data volumes. Surprisingly, while the amount of data stored and managed by data engines has drastically increased, little attention has been devoted to ensure that those are reliable. Obtaining strong guarantees requires the use of formal methods and mature tools. A very promising approach consists in using proof assistants such as Coq. Our purpose is to prove that existing systems conform to their specifications and to verify that programs that make intensive use of database queries are correct. We propose different internships on that subject which all lead to a PhD if the candidate demonstrates the expected skills.

SQL certification

Based on the specification given in [3], the internship consist in providing an SQL certified compiler. SQL compilers not only transform queries into relational algebra (as far as possible) yielding an AST whose nodes are labelled by relational operators and leaves are base relations, but, they also choose de "best" access method to evaluate the query. To do so they rely on the fact that different algorithms for joins or selections do exist (sort-merge joins, hash-based, nested loops) and on system based issues for example indexes. Doing so they generate what is called query evaluation plans and choose, according to a cost model, the most efficient plan. We plan to verify those algorithms using our formalization and the Why3 deductive program verification tool.

NoSQL certification

The aim of this internship is to give a first step toward formalizing and/or mechanizing data intensive systems such as XML/XQuery or JSON/NoSql engines using formal tools such as Coq or SMT provers (e.g., Alt-ergo) embedded in the Why(3). First the candidate will have to extend the formalization in [3] in order to capture data models such as JSon, XML and then to take into consideration languages such as Jaql, Pig, mongoDB and XQuery, CDuce for XML.

Concurrency certification

Based on the specification given in [4], the candidate will formalize in Coq the concurrency model and verify the corresponding algorithms and/protocols.

Data Privacy and Security certification

Database management systems guarantee security and privacy through the view mechanism. The aim of this internship is to formalize and verify the different security policies provided by mainstream database systems such as Oracle, DB2 or Postgresql starting with the modeling given in [3].

Bibliography
  1. Xavier Leroy. A formally verified compiler back-end. J. Autom. Reasoning, 43(4):363--446, 2009.
  2. Gregory Malecha, Greg Morrisett, Avraham Shinnar, and Ryan Wisnesky. Toward a verified relational database management system. In ACM International Conference on Principles of Programming Languages (POPL), 2010.
  3. Veronique Benzaken, Evelyne Contejean, Stefania Dumbrava, A Coq Formalization of the Relational Data Model, European Symposium on Programming - ETAPS, (ESOP), 2014.
  4. Ph. Bernstein, V. Hadzilacos, N. Goodman, Concurrency Control and Recovery in Database Systems available here
Prerequisite

The candidate should have a very strong background in theoretical computer science with emphasis in logic as well as a concrete practical experience of functional programming in OCaml-like programming languages. A good knowledge of Coq will be a plus.