joined work with:
V. Benzaken1 | G. Castagna2 | L. Daynès3 | J. Lopez1 | (R. Vernoux4) |
[…] strategic breakthrough will come from redoing the representation of your data or table. This is where the heart of a program lies. Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.
The Mythical Man-Month (Frederick P. Brooks, Jr., 1975/1995)
Data dominates.
Rob Pike's rule number 5
function printPeopleFromCity (db, city) {
//db holds a connection to the database
q = "SELECT firstname, lastname FROM PEOPLE P WHERE P.city = '"
+ city + "'";
result = db.query (q);
while ((row = next(result)) != null) {
println (row.get("lastname") + " " + row.get("firstname"));
}
}
// typo !
"SELECT firstname, lastname FROMPEOPLE WHERE …"
// what if city contains a quote ?
"SELECT * FROM PEOPLE P WHERE P.city = '" + city + "'"
PEOPLE → CLIENT
firstname → firstn
Errors are detected at runtime, whenever the function is called (even for statically typed languages).
db.from('PEOPLE').where(equal(city, column('city')))
.select('firstname', 'lastname');
//Java code + JPA annotation processor & runtime
@Entity class People {
@Id @GeneratedValue(strategy = GenerationType.AUTO) long cid;
@Column (length = 25, nullable = false) String firstname;
@Column (length = 25, nullable = false) String lastname;
@Column (length = 25) String city;
@ManyToMany Set<People> friends = new HashSet<People>();
…
}
//Generates a fresh ID and insert a corresponding DB line
People p1 = new People ("Nguyen", "Kim", "Saclay");
People p2 = new People ("Baelde", "David", "Cachan");
//Adds an entry in a join table !
p1.friends.add(p2);
p2.friends.add(p1);
// cb is a CriteriaBuilder object
CriteriaQuery cq = cb.createQuery(People.class);
Root p = cq.from(People.class);
//People_ is a singleton object with one field for each
//database column (with the name of that column inside)
cq.where(cb.equal(p.get(People_.town), city));
//The typesystem ensures that queries are well formed and
//well typed (city needs to be a String)
The host language integrates queries as syntactic elements.
.NET based languages support this in a statically typed setting :
//C# code out of the box!
//(if you use Microsoft SQL Server)
void printPeopleFromCity(DB db, string city) {
var q = from p in db.people
where p.city == city
select new Person(p.firstname, p.lastname);
foreach(p in q) {
Console.WriteLine(p.toString ());
}
}
function allCapitals (db) {
res = db.query ("SELECT capital FROM COUNTRY");
return res;
}
function printPeopleFromCity(db, city) { … } // unchanged
function printPeopleFromCapital(db) {
caps = allCapitals(db);
while ((row = next(caps)) != null) {
printPeopleFromCity(db, row.get("capital"));
}
}
//What you really want is SELECT firstname,lastname
// FROM COUNTRY C, PEOPLE P
// WHERE P.city = C.capital
User-Defined Functions (UDF) encapsulate application logic, but …
function isGood (p) { … } //intricate condition
function printGoodPeople (db) {
//can't call isGood() in a string !
q = "SELECT * FROM PEOPLE P ";
result = db.query (q);
while ((row = next(result)) != null) {
//filtering is done on the client !
if (isGood(row))
println (row.get("lastname") + " " + row.get("firstname"));
}
}
May introduces query avalanches silently :
IEnumerable<City> allCapitals(DB db) {
return (from c in db.country
select new City(c.capital));
}
void printPeopleFromCityList(DB db, IEnumerable<City> cities) {
var q = from p in db.people //has the syntactic
from c in cities //structure of a join
where p.city == c.name
select new Person(p.firstname, p.lastname);
foreach (p in q) {
Console.WriteLine(p.toString());
}
}
void test (db) {
printPeopleFromCityList(allCapitals(db));
}
Does not help with other kinds of UDFs :
void printGoodPeople(db) {
var q = from p in db.people
where isGood(p);
select new Person (p.firstname, p.lastname);
foreach(p in q) {
Console.WriteLine(p.toString());
}
}
Unable to cast object of type 'System.Data.Linq.SqlClient.SqlColumn'
to type 'System.Data.Linq.SqlClient.SqlMethodCall'.
Database side functions
-- in the database
CREATE OR REPLACE FUNCTION isGood(p IN PEOPLE%ROWTYPE) IS BOOLEAN
BEGIN
…
END;
//in your programming language
function printGoodPeople (db, city) {
q = "SELECT * FROM PEOPLE P WHERE isGood(P)";
result = db.query (q);
while ((row = next(result))!= null) {
println (row.get("lastname") + " " + row.get("firstname"));
}
}
A common framework to express query fragments in host languages :
Uses a common Query Intermediate Representation (QIR)
Untyped λ-calculus with data-operators and data-constructors:
q ∷= x
| λx.q
| q @ q
| c
| if q then q else q
| { x : q, … , x : q }
| q · x
| q :: q
| ldestr q q q
| Oq, …, q(q, …, q)
| ■H(t)
| DB
variable
function
application
constants
conditional
record constructor
record destructor
list constructor ([ q1, …, qn]≡q1::…::qn::[])
list destructor
data operator
foreign H-language code
data reference to a database B
Data reference D : a database-side value. LD is the query language of the database (e.g. SQL).
Data operators :
Projectq1(q2) ⇝ [ q1@v | v ∈ q2 ] (a.k.a List.map)
Filterq1(q2) ⇝ [ v | v ∈ q2, q1@v ⇝ true ] (a.k.a List.filter)
Joinq(q1, q2) ⇝ [ v1⋈v2 | v1∈q1, v2∈q2, q @ v1 @ v2 ⇝ true ]
ScanD : returns the sequence of elements in D
…
AST node (■H(t)) : is a term from a host language H (e.g. Python, Javascript, …)
(values) v ∷= c | { x : v, …, x : v } | [ v, …, v ] | λΓx.q
Given a host language H, we need :
… and of course, its abstract syntax
``JavaScripty´´ language (statements, closures, side-effects, …)
e ::= c SL-constants
| x SL-variables
| fun (x,…,x) { s } SL-functions
| e (e,…,e) SL-application
| { e : e, …, e : e } SL-records
| e.e SL-field access
| e := e SL-assignment
s ::= e SL-expression
| s;s SL-sequence
| if (e) { s } else { s } SL-conditional
| while (e) { s } SL-loop
| return e SL-return
We define a translation ⟦ s ⟧ from SL programs to QIR terms :
⟦ c ⟧
⟦ x ⟧
⟦ fun (x₁,…,xₙ) { s } ⟧
⟦ e₀ (e₁,…,eₙ) ⟧
⟦ o (e₁,…,eₙ) ⟧
⟦{ "l₁" : e₁, …, "lₙ" : eₙ }⟧
⟦e."l"⟧
⟦x = e;s⟧
⟦s₁;s₂⟧
⟦if (e) { s₁ } else { s₂ }⟧
⟦return e⟧
⟦ s ⟧
=
=
=
=
=
=
=
=
=
=
=
=
HtoQIR(c)
x
(λx₁.….λxₙ. ⟦ s ⟧)
⟦ e₀ ⟧ @ ⟦ e₁ ⟧ @ … @ ⟦ eₙ ⟧
o⟦e₀⟧,…,⟦eₖ⟧(⟦eₖ₊₁⟧,…,⟦eₙ⟧)
o ∈ { Filter, Scan, Project, … }
{ l₁ : ⟦ e₁ ⟧, …, lₙ : ⟦ eₙ ⟧ }
⟦e⟧·l
(λx⟦s⟧) @ ⟦e⟧ (x is in the local scope and …)
(λ_⟦s₂⟧) @ ⟦s₁⟧
if ⟦e⟧ then ⟦s₁⟧ else ⟦s₂⟧
⟦e⟧
freeze(s) otherwise
The semantics of H is modified as follows :
o(e₁,…,eₙ) ⇝H ⟦ o (e₁,…,eₙ) ⟧
force(q)⇝H QIRtoH(eval(λx₁.….λxₙ.q)@⟦Γ(x₁)⟧@…@⟦Γ(xₙ)⟧)
for all xi ∈ FV(q)
function printPeopleFromCity (db, city) {
q = Filter(fun (p) { return p.city == city; }, Scan(DPEOPLE));
result = force(q);
while ((row = next(result)) != null) {
println (row.lastname + " " + row.firstname);
}
}
becomes similar to:
function printPeopleFromCity (db, city) {
result = QIRtoSL(eval((λq.q)@Filterλp. p·city == city(Scan(DPEOPLE))));
while ((row = next(result)) != null) {
println (row.lastname + " " + row.firstname);
}
}
We want to translate QIR terms
Joinλa.λb a·id == b·id(Scan(DPEOPLE),Scan(DACTOR))
should become (e.g.) :
SELECT * FROM PEOPLE A, ACTOR B WHERE A.id == B.id
if both tables are in a SQL database
Joinλa.λb a·id == b·id(HBaseToQIR@(get 'PEOPLE'),
HBaseToQIR@(get 'ACTOR'))
if both tables are in a HBase database (which does not support joins)
Generic translation : q
→ t, LD where t is a
term in LD
Specific translations : (Q[t1,…,tn])⇒LDt
where Q is a QIR term context
Example of rules
Requirements for a database D
Consider the QIR term :
(λa.λb. Joinλx.λy x·id = y·id (a, b)) @ (ScanSQL.emp()) (ScanSQL.dept())
Translation output :
(λa.λb. Joinλx.λy x·id = y·id (a, b))
@ (SQLtoQIR @ (evalSQL @ (SELECT * FROM emp)))
@ (SQLtoQIR @ (evalSQL @ (SELECT * FROM dept)))
If we had performed the β-reduction :
(SQLtoQIR @ (evalSQL @ (SELECT * FROM (SELECT * FROM emp) AS x
JOIN (SELECT * FROM dept) AS y
ON x.id = y.id)))
Given a QIR term, we want to capture the following intuition
Reductions yielding ``good´´ terms depend on the target database language
A data-operator or expression o is compatible with a database D if there exists a rule for o in the ⇒LD specific translation.
A fragment F is a sub-term of a QIR term q such that
q ≡ C[T(q1,…,F[e1,…,ek],…,qn)]
(λa.λb.
Joinλx.λy x·id = y·id (Filterλx. x.age < 100(a,b)))
@ (ScanSQL.emp())
@ (ScanSQL.dept())
We define the measure M(q) on a QIR term q by :
Lot of heuristics, optimality of the strategy for some class of terms,…
The only thing you have to do is change the semantics of the host language […]
me, section 2 of this talk.
Available languages : SimpleLanguage, R, Python, Ruby, JavaScript (closed-source)
BOLDR now supports SimpleLanguage and R (alpha-version), and main-memory, SQL and HBase as back-ends
A modular framework that supports
What's next ?