Ph.D

Group : Verification of Algorithms, Languages and Systems

*A Coq Formalization of Relational and Deductive Databases - and Mechanizations of Datalog*
Starts on 01/10/2012

Advisor : BENZAKEN, Véronique

[CONTEJEAN Evelyne]

**Funding :** Contrat doctoral uniquement recherche

**Affiliation :** Université Paris-Sud

**Laboratory :** LRI-Proval

**Defended** on 02/12/2016, committee :

Directrices de thèse :

- Mme. Véronique Benzaken, Professeur, Université Paris-Sud, LRI

- Mme. Évelyne Contejean, Chargée de Recherche HDR, CNRS, LRI

Rapporteurs :

- Mme. Sandrine Blazy, Professeur, Université de Rennes 1, INRIA Rennes - Bretagne Atlantique

- M. Mohand-Saïd Hacid, Professeur, Université Claude Bernard Lyon 1, LIRIS

Examinateurs :

- Mme. Sarah Cohen-Boulakia, Maître de Conférences HDR, Université Paris-Sud, LRI

- M. Michaël Rusinowitch, Directeur de Recherche, Université Henri Poincaré Nancy 1, LORIA-INRIA Lorraine

**Research activities :**
- Formalisation of (Specification and Programming) Languages in Proof Assistants

- Data-Centric Languages and Systems

**Abstract :**
This thesis presents a Coq formalization of fundamental database languages and algorithms. It provides formal specifications stemming from two different approaches in defining database models: relational and logic based.

As such, the first contribution of the thesis is the development of a Coq library for the relational model. This contains formalizations of the relational algebra and of conjunctive queries. It also includes a mechanization of integrity constraints and of their inference procedures. We model two types of dependencies, which are among the most widely used: the functional dependencies and the multivalued dependencies, as well as their corresponding axiomatizations. We formally prove the soundness of their inference algorithms and, for the case of functional dependencies, also the completeness. These types of dependencies are instances of more general ones, namely general dependencies: equality generating dependencies and, respectively, tuple generating dependencies. We model general dependencies and their inference procedure, i.e, "the chase", for which we establish the soundness. Finally, we formally prove the fundamental database theorems, i.e, algebraic equivalence, the homomorphism theorem and the minimization of conjunctive queries.

A second contribution consists of the development of a Coq/ssreflect library for logic programming, restricted to Datalog. As part of this work, we provide a first mechanization of a standard Datalog engine, as well as of its extension with negation. The library contains a formalization of their model-theoretic semantics, together with the fixpoint one, implemented through a stratified evaluation procedure. The library is complete with the corresponding soundness, termination and completeness proofs. In this setting, we construct a preliminary framework for reasoning about stratified programs. The platform paves the way towards the certification of data-centric applications.