$\newcommand{\E}{\mathbb{E}}$ $\newcommand{\R}{\mathbb{R}}$

Deep Learning in Practice

Chapter 3: Interpretability


$\newcommand{\epsi}{\varepsilon}$

Overview:

I - Visualization / Analysis (of a trained neural network)

Visualization
Analysis, invariances
Adversarial attacks

II - Interpretability: societal impact and approaches

Why interpretability is important: what is at stake
Interpretability by design: "Explainable AI"
Interpretability of data: causality

III - Issues related to datasets

Dataset poisoning
Fairness
Differential privacy

I - Visualization / Analysis (of a trained neural network)

Visualization

Analysis, invariances

Adversarial attacks

→ Edouard's slides
→ Pierre Stock's presentation slides

NB: most of what follows is not specific to deep learning, but applies to ML in general

II - Interpretability: societal impact and approaches

Why interpretability is important: what is at stake

Example: medical diagnosis

need for an explanation of the final score (as an aid to diagnosis): why should we trust the prediction? Can we see which elements the decision was based on?
Isabelle Guyon's skin disease classification tool [patent (pdf), (html)]
- hand-crafted features are designed, and labeled by their type (are they based on: color? texture? shape? etc.);
- at test time, for each diagnosis, the importance of each feature is estimated (a bit in the spirit of Grad-CAM); then a score is given for each type of feature (e.g., this decision was 40% based on color-type features, 25% on shape-type features, etc.)

Societal impact: "Weapons of Math Destruction" by Cathy O'Neil

companies using black-box software (provided by other companies) for matters with a major impact on the lives of the people involved, such as hiring (example in the book: waiter position), firing (example in a big school), loans (whom should the bank lend money to?)
- examples of widespread algorithms behaving arbitrarily, sometimes even stochastically
- no feedback or questioning possible (in spite of heavy consequences)
- same arbitrary algorithm used everywhere $\implies$ people trapped in an arbitrariness nightmare (no loan / hiring possible anywhere if everyone uses the same software)
- using illegal criteria, or proxies for them (e.g., neighborhood of residence for ethnicity)
- In 2016, COMPAS, an algorithm used for recidivism prediction in the US, was found to produce a much higher false positive rate for black people than for white people (and jail duration is based on it!)
self-reinforcing/predicting (self-fulfilling prophecy): police patrol optimization: go more often in ghettos $\implies$ arrest more people in ghettos $\implies$ go more in ghettos $\implies$ etc. $\implies$ focus on ghettos and forget the rest (during that time, no white-collar crime investigation)

$\implies$ crucial: feedback (from people involved), explainability, right to contest/appeal
$\implies$ think twice about the impact of your algorithms before deploying them

Be responsible and careful

"With great power comes great responsibility" (guess the source;)

machine learning tools are becoming more and more powerful
software: easy to deploy, potential great impact
choose which impact and where: military, finance, advertising, or humanitarian? (shortage of machine learners, so nobody will do what you refuse to do)
Thales announced it will not produce killer robots; Google left a military drone project after an employees' revolt; NB: France/Europe/Russia/US = the world's biggest weapon producers
many discussions about AI ethics; in particular, Montreal declaration for responsible AI
"FAT"-ML: Fairness, Accountability, and Transparency in Machine Learning
- principles defined on fatml.org
- key concepts: Responsibility, Explainability, Accuracy, Auditability, Fairness

Interpretability by design: "Explainable AI"

By breaking the pipeline into interpretable steps
Example: image captioning

[Women also Snowboard: Overcoming Bias in Captioning Models; Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, Anna Rohrbach; FAT-ML 2018]
- pipeline: input image → regions of interest → object classification (for each region) → captioning based on objects found
- Grad-CAM on a mistaken caption indicates what the neural network was looking at to make its decision
- example of bias found by analysing mistakes: "man sitting in front of computer" (while it's a woman) with "man" linked to the computer, not to the person sitting
[Grounding Visual Explanations; Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata; ECCV 2018]
- captioning pipeline with a criterion favoring words (object subparts) that are both discriminant (for the object class) and relevant (for the input image)
Principle:

Pipeline:

Results:

Interpretability of data: causality

Growing field of machine learning

given a set of random variables (i.e., a dataset of examples of joint realization of these variables), determine which ones depend on which ones (oriented dependency graph)
NB: causality is not correlation
e.g., A and B are sometimes correlated only because they are both caused by another variable, C
[Bernhard Schölkopf's team; book], [Isabelle Guyon's team; workshop/challenge]
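
The "common cause" situation can be simulated in a few lines. A minimal sketch, assuming a hypothetical linear model (C is Gaussian, A and B are each C plus independent noise; none of these choices come from the notes): A and B are strongly correlated although neither causes the other, and the correlation vanishes once the contribution of C is removed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
n = 20000
C = [rng.gauss(0, 1) for _ in range(n)]      # common cause
A = [c + rng.gauss(0, 0.5) for c in C]       # A depends on C only
B = [c + rng.gauss(0, 0.5) for c in C]       # B depends on C only

# Strong correlation, although A does not cause B (nor B cause A)
corr_ab = pearson(A, B)

# Remove the contribution of C (its coefficient, 1, is known in this toy model)
corr_ab_given_c = pearson([a - c for a, c in zip(A, C)],
                          [b - c for b, c in zip(B, C)])

print(f"corr(A, B) = {corr_ab:.2f}; after removing C: {corr_ab_given_c:.2f}")
```

In real causal discovery the graph and the coefficients are unknown, which is what makes the problem hard; here they are given by construction.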

III - Issues related to datasets

Dataset poisoning

Possible to forge a dataset:

in each image, add some invisible noise (e.g. color of one particular pixel) extremely correlated with the label to predict
machine learning algorithms trained on that dataset will learn that obvious dependency (invisible noise / label) and nothing else
any model trained on that dataset will therefore fail to generalize (to examples not in the dataset)
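
A toy illustration of this effect, as a sketch under strong assumptions: two binary features stand in for an image (a hypothetical "real" cue that agrees with the label 80% of the time, and a poisoned "pixel" that copies the label in the forged training set only), and the learner is a one-feature decision stump rather than a neural network.

```python
import random

rng = random.Random(0)

def make_data(n, poisoned):
    """Samples ((real_cue, pixel), label); the pixel leaks the label only if poisoned."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        real = y if rng.random() < 0.8 else 1 - y     # genuine but noisy cue
        pixel = y if poisoned else rng.randint(0, 1)  # forged cue
        data.append(((real, pixel), y))
    return data

def accuracy(data, j):
    """Accuracy of the rule 'predict feature j' on the given data."""
    return sum(x[j] == y for x, y in data) / len(data)

def fit_stump(data):
    """Pick the single feature with the best training accuracy."""
    return max(range(2), key=lambda j: accuracy(data, j))

train = make_data(2000, poisoned=True)    # forged dataset
test = make_data(2000, poisoned=False)    # clean data from the real world

chosen = fit_stump(train)
train_acc = accuracy(train, chosen)
test_acc = accuracy(test, chosen)
print(f"chosen feature: {chosen}, train acc: {train_acc:.2f}, test acc: {test_acc:.2f}")
```

The stump latches onto the poisoned pixel (perfect on the forged data) and ends up near chance level on clean data, while the genuine cue would have given about 80%.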

Variation:

not pixel noise, but other objects. For instance, in a classification task including the category 'cat', build a dataset where all cat pictures also include chairs, so that the algorithm actually learns to detect chairs, and not necessarily cats.
no need to build such a dataset explicitly: put such pictures on the web, well indexed by search engines, so that automatic dataset builders include them

Fairness

Overview:

problems at stake (societal impact)
a number of different definitions
some of which are not compatible
ensuring fairness decreases accuracy
examples of algorithms

Intro

NB: unfairness might be more subtle than expected
e.g., word2vec trained on Google News:

provides a "linear" embedding of words, such that $f(\text{Paris}) - f(\text{London}) \,\approx\, f(\text{France}) - f(\text{UK})$ for instance, $f(\text{man}) - f(\text{woman}) \,\approx\, f(\text{king}) - f(\text{queen})$, etc.
but also $f(\text{man}) - f(\text{woman}) \,\approx\, f(\text{computer programmer}) - f(\text{homemaker}) \,\approx\, f(\text{surgeon}) - f(\text{nurse})$...
$\implies$ de-bias...
[Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings; Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai; NIPS 2016]
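
A simplified sketch of one de-biasing step, the projection of a supposedly gender-neutral word onto the subspace orthogonal to a "gender direction". Assumptions: the 3-dimensional vectors below are made-up toy values (not real word2vec embeddings), and the direction is taken from a single word pair, whereas the cited paper estimates it from several pairs and adds further steps.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def scale(u, s):
    return [a * s for a in u]

def normalize(u):
    return scale(u, 1.0 / dot(u, u) ** 0.5)

# Hypothetical toy embeddings, for illustration only
emb = {
    "man":        [1.0, 0.2, 0.0],
    "woman":      [-1.0, 0.2, 0.0],
    "programmer": [0.4, 0.1, 0.9],
    "homemaker":  [-0.5, 0.3, 0.7],
}

# Gender direction estimated from one pair (a simplification)
g = normalize(sub(emb["man"], emb["woman"]))

def neutralize(v, direction):
    """Remove the component of v along the (unit-norm) direction."""
    return sub(v, scale(direction, dot(v, direction)))

before = dot(emb["programmer"], g)                 # gender component before
after = dot(neutralize(emb["programmer"], g), g)   # gender component after
print(f"gender component of 'programmer': {before:.2f} -> {after:.2f}")
```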

Definition 1: fairness by (un)awareness

Simplistic version: unawareness

do not include sensitive features (such as gender, ethnicity...) in the data
matches the notion of "disparate treatment"
not sufficient: can use proxies (e.g., hair length for gender, address for ethnicity...)

[Fairness through awareness; Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Rich Zemel; ITCS 2012]

relaxed notion: for each individual $x$, the prediction $f(x)$ is stochastic: distribution $D(x)$
quantify unfairness: for any pair of samples $x, x'$, require $d_D( D(x), D(x') ) \leqslant d_X(x,x')$ : the distance between the output distributions is at most the distance between the individuals (similar individuals are treated similarly)
doesn't rely on predefined groups of people (as in next definitions) but on individuals directly
issues: which metrics $d_D$ and $d_X$ ?
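
The constraint can be checked numerically on a finite set of individuals. A minimal sketch, assuming hypothetical choices for both metrics (total variation distance for $d_D$, absolute difference of a one-dimensional score for $d_X$) and two made-up decision rules; picking the right metrics is precisely the open issue mentioned above.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def lipschitz_violations(individuals, D, d_X, tol=1e-9):
    """Pairs (x, x') for which d_D(D(x), D(x')) > d_X(x, x')."""
    violations = []
    for i, x in enumerate(individuals):
        for xp in individuals[i + 1:]:
            if total_variation(D(x), D(xp)) > d_X(x, xp) + tol:
                violations.append((x, xp))
    return violations

# Hypothetical setting: an individual is described by a score in [0, 1]
individuals = [0.1, 0.15, 0.5, 0.9]
d_X = lambda x, xp: abs(x - xp)

# Output distributions are (P(reject), P(accept))
smooth = lambda x: (1 - x, x)                                # randomized rule
hard = lambda x: (0.0, 1.0) if x >= 0.12 else (1.0, 0.0)     # sharp threshold

smooth_violations = lipschitz_violations(individuals, smooth, d_X)
hard_violations = lipschitz_violations(individuals, hard, d_X)
print(len(smooth_violations), len(hard_violations))
```

The sharp threshold treats the nearby individuals 0.1 and 0.15 completely differently, which is what the constraint forbids; the randomized rule satisfies it.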

$\newcommand{\hY}{\widehat{Y}}$

Definition 2: Equal opportunity / $\epsi$-fairness

[Equality of Opportunity in Supervised Learning; Moritz Hardt, Eric Price, Nathan Srebro; NIPS 2016]

input $(X, A)$ with $A$ = sensitive attribute (gender, ethnicity...)
binary outcome variable, $Y=1$ is "success" (e.g., being hired)
binary prediction $\hY$; predicted success is thus when $\hY = 1$
the point is to ensure that chances of success ("opportunity"), for individuals deserving it, do not depend on the sensitive attribute $A$
equal opportunity: $$\forall a,a', \;\;\; p\left(\left.\hY=1\right|A=a,Y=1\right) \;=\; p\left(\left.\hY=1\right|A=a',Y=1\right) $$ i.e. $p($predicted success$\big|A=a,$ truth=success$) \;=\; p($predicted success$\big|A=a',$ truth=success$)$ for all groups $a,a'$
i.e. same chance of being predicted successful, among those who should succeed
NB: this definition relies on the notion of groups (people with same sensitive attribute)

$\epsi$-fairness: same but approximately:

difference $< \epsi$ : $$\left|\, p\left(\left.\hY=1\right|A=a,Y=1\right) \;-\; p\left(\left.\hY=1\right|A=a',Y=1\right)\, \right| \;<\; \epsi$$
(further details) [Empirical Risk Minimization under Fairness Constraints; Michele Donini, Luca Oneto, Shai Ben-David, John Shawe-Taylor, Massimiliano Pontil; NIPS 2018]
(variation on the loss) [Decoupled classifiers for group-fair and efficient machine learning; Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, Max Leiserson; 1st Conference on Fairness, Accountability and Transparency 2018]
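
In practice, the quantity to monitor is the true positive rate per group. A minimal sketch on made-up labels and predictions (the data and the function names are illustrative assumptions): the largest gap between groups is what $\epsi$-fairness requires to be below $\epsi$.

```python
def true_positive_rate(y_true, y_pred, groups, a):
    """Estimate p(Yhat = 1 | A = a, Y = 1) from samples."""
    preds = [p for y, p, g in zip(y_true, y_pred, groups) if g == a and y == 1]
    return sum(preds) / len(preds)

def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference of true positive rates between two groups."""
    rates = [true_positive_rate(y_true, y_pred, groups, a)
             for a in sorted(set(groups))]
    return max(rates) - min(rates)

# Hypothetical toy data: 6 individuals in each group
y_true = [1, 1, 1, 1, 0, 0,   1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1,   1, 0, 0, 0, 0, 0]
groups = ["a"] * 6 + ["b"] * 6

tpr_a = true_positive_rate(y_true, y_pred, groups, "a")
tpr_b = true_positive_rate(y_true, y_pred, groups, "b")
gap = equal_opportunity_gap(y_true, y_pred, groups)
print(f"TPR(a) = {tpr_a}, TPR(b) = {tpr_b}, gap = {gap}")
```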

Definition 3: same distribution (of outputs / of errors) : group-based

Principle: probability of outcome (or success) should not depend (or not much) on the sensitive attribute

Example: study of the main commercial face classification software, tested on a grid of age/gender/etc. bins (check the performance on each subset: young white males, adult Asian women, etc.) [Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification; Joy Buolamwini, Timnit Gebru; 1st Conference on FAT, 2018]

accuracy not homogeneous at all: many more gender classification mistakes for darker-skinned women
$\implies$ led to the organization of a challenge in order to perform well on all sub-categories
the challenge got better results than industry

Group-fairness:

Impact disparity: outputs conditioned on a subgroup (e.g., gender) have different probabilities [what we want to avoid]
Treatment disparity: explicitly treat subgroups differently, to obtain impact parity [a possible way to solve the problem; whether good or not is debated]
Warning! Secondary effects might happen. Example: try to achieve both impact & treatment parity, for girls/boys admission at university. If "gender" is removed from data, but "hair length" is still there, as "hair length" is a (bad) proxy for "gender", short-haired women will be rejected and long-haired men will be accepted.
[Does mitigating ML's impact disparity require treatment disparity? Zachary C. Lipton, Alexandra Chouldechova, Julian McAuley; NIPS 2018]

$\newcommand{\hy}{\widehat{y}}$ 3 possible requirements (with the same notations as above: sensitive attribute $A$, which the result should not depend on; prediction $\hY$; true label or value $Y$ to predict):

independence: $\hY$ independent of $A$: $$\forall a, a', \hy, \;\;\; p(\hY=\hy \mid A=a) \;=\; p(\hY=\hy \mid A=a')$$ i.e. the outcome probability does not depend on the group (sensitive information)
separation: $\hY$ independent of $A$ given $Y$: $$\forall a, a', y, \hy, \;\;\; p(\hY=\hy \mid A=a, Y=y) \;=\; p(\hY=\hy \mid A=a', Y=y)$$ i.e. $A$ doesn't influence the prediction distribution once the skills are known ("equalized odds")
sufficiency: $Y$ independent of $A$ given $\hY$: $$\forall a, a', y, \hy, \;\;\; p(Y=y \mid A=a, \hY=\hy) \;=\; p(Y=y \mid A=a', \hY=\hy)$$ i.e. $A$ doesn't influence the error distribution $y \mid \hy$

→ variations: do not require strict equality, but |difference| $< \epsi$, or ratio of probabilities $< 1+ \epsi$

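NB: these group-based definitions are mutually incompatible in general: if $A$ and $Y$ are correlated, no two of these independences can hold at once, except in degenerate cases (e.g., a perfect predictor, or a prediction independent of $Y$)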
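
A small worked example of this incompatibility, as a sketch on made-up counts: two groups with different base rates (50% vs 25% of $Y=1$) and a classifier with the same true/false positive rates in both groups. Separation then holds by construction, while independence and sufficiency are both violated.

```python
# counts[(a, y, y_hat)] = number of individuals (hypothetical numbers)
counts = {
    ("a", 1, 1): 40, ("a", 1, 0): 10, ("a", 0, 1): 10, ("a", 0, 0): 40,
    ("b", 1, 1): 16, ("b", 1, 0): 4,  ("b", 0, 1): 12, ("b", 0, 0): 48,
}

def prob(event, given):
    """Empirical P(event | given); both are predicates over (a, y, y_hat)."""
    num = sum(n for k, n in counts.items() if given(*k) and event(*k))
    den = sum(n for k, n in counts.items() if given(*k))
    return num / den

def rates(group):
    """Quantities compared across groups by the three criteria."""
    return {
        # independence: P(Yhat=1 | A)
        "independence": prob(lambda a, y, yh: yh == 1,
                             lambda a, y, yh: a == group),
        # separation: P(Yhat=1 | A, Y=1) and P(Yhat=1 | A, Y=0)
        "separation_tpr": prob(lambda a, y, yh: yh == 1,
                               lambda a, y, yh: a == group and y == 1),
        "separation_fpr": prob(lambda a, y, yh: yh == 1,
                               lambda a, y, yh: a == group and y == 0),
        # sufficiency: P(Y=1 | A, Yhat=1)
        "sufficiency": prob(lambda a, y, yh: y == 1,
                            lambda a, y, yh: a == group and yh == 1),
    }

rates_a, rates_b = rates("a"), rates("b")
for name in rates_a:
    print(f"{name}: group a = {rates_a[name]:.3f}, group b = {rates_b[name]:.3f}")
```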

Definition 4: Causality (Counterfactual fairness)

[Counterfactual Fairness; Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva; NIPS 2017]

suppose we know the causality graph between attributes (e.g., variable A causes variable B, etc.)
the sensitive attributes should not influence the outcome
to check: replace the sensitive attributes with various values : does it change the outcome probabilities of the algorithm?
→ causality testing
formulation, for a binary sensitive attribute $A \in \{0,1\}$ and input data $X$: $$\forall X,a,\hy,\;\;\;\;\;\; p( \hY_{A \leftarrow 0}= \hy | X, A=a ) \;=\; p( \hY_{A \leftarrow 1}= \hy | X, A=a ) $$ where $\hY_{A \leftarrow 0}$ means "when replacing sensitive attribute $A$ with a particular value 0"
issue: $\hY_{A \leftarrow 0}$ in practice? and which causality graph? $\implies$ hot research topic
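
To make $\hY_{A \leftarrow a}$ concrete, here is a toy sketch under a strong assumption: the causal model is known and is the hypothetical structural equation $X = U + 2A$, with $U$ an individual noise term independent of $A$. The counterfactual is then computed by recovering $U$ from the observed $(X, A)$, replacing $A$, and recomputing $X$. Both predictors below are made up for illustration and are deterministic, so comparing probabilities reduces to comparing outputs.

```python
def abduct_u(x, a):
    """Recover the individual noise U from (X, A), given X = U + 2A (assumed)."""
    return x - 2.0 * a

def counterfactual_x(x, a, a_new):
    """Value X would have had for the same individual if A had been a_new."""
    return abduct_u(x, a) + 2.0 * a_new

def predict_from_x(x, a):
    """Predictor using X, a descendant of A in the causal graph."""
    return 1 if x > 1.0 else 0

def predict_from_u(x, a):
    """Predictor using only the inferred noise U, not influenced by A."""
    return 1 if abduct_u(x, a) > 0.0 else 0

def is_counterfactually_fair(predict, individuals):
    """Check that the prediction is unchanged when A is replaced by 0 or 1."""
    for x, a in individuals:
        outcomes = {predict(counterfactual_x(x, a, a_new), a_new)
                    for a_new in (0, 1)}
        if len(outcomes) > 1:
            return False
    return True

# Hypothetical observed individuals (x, a)
individuals = [(0.5, 0), (2.5, 1), (1.5, 1), (-0.3, 0)]

fair_x = is_counterfactually_fair(predict_from_x, individuals)
fair_u = is_counterfactually_fair(predict_from_u, individuals)
print(f"predict_from_x fair: {fair_x}; predict_from_u fair: {fair_u}")
```

Without a trusted causal graph and structural equations, `abduct_u` cannot be written, which is the practical issue raised above.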

Algorithms

depend on the fairness definition (of course)
in general: enforcing fairness will decrease accuracy $\implies$ fairness/accuracy trade-off

Type 1 [before training]: pre-process data, to remove sensitive data
Type 2 [while training]: enforce fairness while optimizing
Type 3 [after training]: at post-processing: change thresholds/biases

Type 3 works well but requires the sensitive information at test time
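
A sketch of what type 3 can look like, assuming the classifier outputs a score and that equal opportunity is the target: one acceptance threshold per group, chosen so that the true positive rates match. The scores, the group names and the target rate are made up; note that the group (sensitive attribute) must be known to pick the threshold at test time.

```python
import math

def tpr(positive_scores, threshold):
    """Fraction of truly positive individuals accepted at this threshold."""
    return sum(s >= threshold for s in positive_scores) / len(positive_scores)

def threshold_for_tpr(positive_scores, target):
    """Largest threshold accepting at least a fraction `target` of positives."""
    ranked = sorted(positive_scores, reverse=True)
    k = math.ceil(target * len(ranked))   # number of positives to accept
    return ranked[k - 1]

# Hypothetical classifier scores of truly positive individuals, per group
positives = {"a": [0.9, 0.8, 0.7, 0.6], "b": [0.7, 0.5, 0.4, 0.3]}

# A single shared threshold gives very different opportunities
shared = {g: tpr(s, 0.6) for g, s in positives.items()}

# Group-specific thresholds equalize the true positive rate
thresholds = {g: threshold_for_tpr(s, 0.75) for g, s in positives.items()}
adjusted = {g: tpr(positives[g], thresholds[g]) for g in positives}

print(shared, thresholds, adjusted)
```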

Example of type 2 with adversarial approach:

[Turning a Blind Eye: Explicit Removal of Biases and Variation from Deep Neural Network Embeddings; Mohsan Alvi, Andrew Zisserman and Christoffer Nellaker; ECCV 2018]
consider biases in face datasets (age, gender, ethnicity)
can remove a bias when learning a network with an adversarial approach, where the adversarial network tries to recover the sensitive attribute from a middle representation layer of the main network: if the sensitive data cannot be retrieved, the representation is independent of it

or enforce (soft, relaxed) constraints explicitly.

Example of type 1 :

similar idea, using Information Bottleneck concepts:
from data $(x,a)$, build a new representation $z$ (to be used later for classification or regression, but we don't know the task yet):
map $(x,a) \mapsto z$
such that the mutual information $I(X,Z)$ is maximized while $I(A,Z)$ is minimized: i.e., keep relevant information only
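
One possible way to write these two goals as a single objective, as a sketch only (the encoder name $\phi$ and the trade-off weight $\lambda > 0$ are notations assumed here, not taken from a specific paper): $$ \max_{\phi} \;\; I(X; Z) \;-\; \lambda \, I(A; Z) \qquad \text{with } Z = \phi(X, A) $$ A large $\lambda$ favors removing the sensitive information from $z$, a small $\lambda$ favors keeping as much information about $x$ as possible.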

Differential privacy

[NB: in French: "privacy" = "confidentialité"]

Issues regarding privacy

Why care about privacy? Isn't anonymization sufficient?
Netflix prize, 2007:

offered 1 million dollars to anyone able to improve the performance of their recommendation system by 10%
provided an anonymized dataset of users with their movie preferences (i.e., user names replaced by identifiers)
Arvind Narayanan and Vitaly Shmatikov managed to re-identify part of the users, using IMDb (where users rate the movies they've watched)
standard process for un-anonymizing datasets: combine with other dataset(s); even if each of them is mostly uninformative, taken together, the information can be retrieved.
other example: anonymized electricity consumer dataset (including approximate location) + white pages + ... $\implies$ re-identify
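
The linkage principle can be sketched on toy data. Everything below is hypothetical (names, movies, ratings), and exact matching on a few ratings is a strong simplification of the actual attack, which has to tolerate approximate ratings and dates: an "anonymous" user is re-identified as soon as exactly one public profile shares enough ratings with it.

```python
# "Anonymized" private dataset: names replaced by identifiers (made-up data)
anonymized = [
    {"user": "u1", "ratings": {("Movie A", 5), ("Movie B", 2), ("Movie C", 4)}},
    {"user": "u2", "ratings": {("Movie A", 3), ("Movie D", 5), ("Movie E", 1)}},
    {"user": "u3", "ratings": {("Movie F", 4), ("Movie G", 4)}},
]

# Public dataset where people rate movies under their name (made-up data)
public = [
    {"name": "alice", "ratings": {("Movie A", 5), ("Movie C", 4)}},
    {"name": "bob", "ratings": {("Movie D", 5), ("Movie E", 1), ("Movie H", 2)}},
    {"name": "carol", "ratings": {("Movie A", 3)}},
]

def link(anonymized, public, min_overlap=2):
    """Map anonymous ids to names when exactly one public profile matches."""
    matches = {}
    for anon in anonymized:
        candidates = [p["name"] for p in public
                      if len(anon["ratings"] & p["ratings"]) >= min_overlap]
        if len(candidates) == 1:      # unambiguous match only
            matches[anon["user"]] = candidates[0]
    return matches

reidentified = link(anonymized, public)
print(reidentified)
```

Once `u1` is linked to a name, all of that user's other private ratings (here "Movie B") are revealed as well.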

Why care if no dataset sharing?
If you (e.g., Google) train an algorithm on your client database (containing private data) and provide the trained algorithm to all clients as a service: it might be possible to extract private data (of other clients) from it

Queries on a database:

arbitrary queries on a private statistical database necessarily reveal some amount of private information; the entire information content of the database can be retrieved with a surprisingly small number of random queries.
[Revealing information while preserving privacy; Kobbi Nissim and Irit Dinur; SIGMOD-SIGACT-SIGART symposium on Principles of database systems 2003]

$\epsi$-differential privacy

Formalization of the amount of noise that must be added to query answers to preserve privacy, i.e. so that a dataset cannot be distinguished from the same dataset + one more element : $\epsi$-differential privacy

[Calibrating noise to sensitivity in private data analysis; Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith; Conference on Theory of Cryptography, 2006]
Notations:
- algo: $A$
- dataset: $D_1$
- dataset $D_1$ + one element: $D_2$
Definition:
Algo $A$ has $(\epsi, \delta)$-differential privacy iff:
for all subsets $S$ of Im$(A)$, for all datasets $D_1$ and $D_2$ differing by one element only, $$ p\left( A(D_1) \in S\right) \;\; \leqslant \;\; e^\epsi\; p\left( A(D_2) \in S\right) \,+\, \delta$$ i.e. the two probabilities are very close (interesting for small $\epsi$ and $\delta$)
Variant: $\epsi$-differential privacy: idem with $\delta = 0$.
How to ensure $\epsi$-differential privacy?
- add noise to query answers
- provably (and quantifiably), the fewer individuals involved in a query, the more noise needed
$\implies$ Gödel prize in 2017
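
A standard way to add such noise is the Laplace mechanism; here is a minimal sketch for a counting query (the records, the query and the function names are made up for illustration). Adding one element to the dataset changes a count by at most 1, so Laplace noise of scale $1/\epsi$ is enough for $\epsi$-differential privacy of a single answer.

```python
import random

def laplace_noise(scale, rng):
    """Sample from a centered Laplace distribution with the given scale."""
    # The difference of two i.i.d. exponential variables is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Noisy answer to 'how many records satisfy the predicate?'."""
    true_count = sum(1 for r in records if predicate(r))
    # Sensitivity of a count is 1, hence noise of scale 1 / epsilon
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
ages = [23, 35, 41, 29, 62, 57, 33, 48]    # hypothetical private records
epsilon = 0.5

# Repeated only to look at the noise statistics: in a real system, each
# answered query consumes part of the privacy budget.
answers = [private_count(ages, lambda age: age > 40, epsilon, rng)
           for _ in range(20000)]
mean_answer = sum(answers) / len(answers)
mean_abs_noise = sum(abs(a - 4) for a in answers) / len(answers)
print(f"true count: 4, mean answer: {mean_answer:.2f}, "
      f"mean |noise|: {mean_abs_noise:.2f}")
```

The noise scale does not depend on the dataset size, so its relative weight is larger when the query involves few individuals.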

To go further:

Example of privacy-preserving pipeline

Example of advanced ML pipeline taking into account privacy:
[Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data; Nicolas Papernot, Martin Abadi, Ulfar Erlingsson, Ian Goodfellow, Kunal Talwar; ICLR 2017]
Keypoints:

train several classifiers on different datasets [the more classifiers, the more private the result will be; but with too many of them, too little data is left to train each classifier]
make an ensemble method, with noise [crucial for $(\epsi, \delta)$-privacy proof]
label (a small part of) a publicly-available dataset using that ensemble classifier
train another network ("student") to learn to imitate that ensemble classifier ("teacher") on that public dataset (in a semi-supervised manner)
share the student network $\implies$ has not seen any private data!
proofs rely on the number of requests made to the ensemble classifier: so, train the student with as few labeled examples as possible (which gives a privacy bound), hence the semi-supervised training; then when sharing the student, this number of requests will not grow, as new requests are addressed to the student, not to the private classifiers, hence intensive usage is possible without privacy drop
results: only a small accuracy drop; yet the proven privacy level $\epsi$ is not really "small"
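
The noisy ensemble step can be sketched as a "noisy max" over teacher votes: count the votes per class, add Laplace noise to each count, return the most voted class. The vote lists and the parameter name `gamma` (inverse noise scale) are illustrative assumptions; the privacy accounting itself is not shown.

```python
import random

def laplace_noise(scale, rng):
    """Centered Laplace sample, as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_aggregate(teacher_votes, num_classes, gamma, rng):
    """Label chosen by the teacher ensemble after adding noise to vote counts."""
    counts = [0] * num_classes
    for label in teacher_votes:
        counts[label] += 1
    noisy = [c + laplace_noise(1.0 / gamma, rng) for c in counts]
    return max(range(num_classes), key=lambda j: noisy[j])

rng = random.Random(0)

# Hypothetical votes of 100 teachers on one public example
consensus = [2] * 90 + [0] * 6 + [1] * 4    # teachers mostly agree
split = [0] * 50 + [1] * 50                 # teachers disagree

consensus_labels = {noisy_aggregate(consensus, 3, 0.5, rng) for _ in range(200)}
split_labels = {noisy_aggregate(split, 2, 0.5, rng) for _ in range(200)}
print(consensus_labels, split_labels)
```

When teachers largely agree, the noise does not change the answer, so accuracy is preserved; when they disagree, the answer is essentially random and reveals little about any single teacher's private data.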
