
# Chapter 3: Interpretability


$\newcommand{\epsi}{\varepsilon}$

Overview:

I - Visualization / Analysis (of a trained neural network)
II - Interpretability: societal impact and approaches
III - Issues related to datasets

## I - Visualization / Analysis (of a trained neural network)

### Analysis, invariances

→ Edouard's slides
→ Pierre Stock's presentation slides

NB: most of what follows is not specific to deep learning, but applies to ML in general

## II - Interpretability: societal impact and approaches

### Why interpretability is important: what is at stake

#### Example: medical diagnosis

• need for an explanation of the final score (as an aid to diagnosis): why should we trust the prediction? Can we see which elements the decision was based on?
• Isabelle Guyon's skin disease classification tool [patent (pdf), (html)]
• hand-crafted features are designed, and labeled by their type (are they based on: color? texture? shape? etc.);
• at test time, for each diagnosis, the importance of each feature is estimated (in the spirit of Grad-CAM); a score is then given for each type of feature (e.g., this decision was based 40% on color-type features, 25% on shape-type features, etc.), as sketched below
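A minimal sketch of that last step (hypothetical feature names and a plain linear model, not the actual patented system): per-feature contributions are aggregated by feature type to report the share of each type in one decision.

```python
import numpy as np

# Hypothetical hand-crafted features, each labeled with its type.
feature_types = ["color", "color", "texture", "shape", "shape"]
weights = np.array([0.8, -0.3, 0.5, 1.2, -0.1])   # learned model coefficients
x = np.array([0.9, 0.2, 0.7, 0.4, 0.6])           # features of one test sample

# Contribution of each feature to the score, aggregated by type.
contributions = np.abs(weights * x)
for t in sorted(set(feature_types)):
    mask = np.array([ft == t for ft in feature_types])
    share = contributions[mask].sum() / contributions.sum()
    print(f"{t:8s}: {100 * share:.0f}% of the decision")
```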

#### Societal impact: "Weapons of Math Destruction" by Cathy O'Neil

• companies use black-box software (provided by other companies) for decisions with heavy consequences for the people involved, such as hiring (example in the book: a waiter position), firing (example: a big school), and loans (to whom should the bank lend money?)
• examples of widespread algorithms behaving arbitrarily, sometimes even stochastically
• no feedback or questioning possible (in spite of heavy consequences)
• the same arbitrary algorithm used everywhere $\implies$ people trapped in a nightmare of arbitrariness (no loan or job possible anywhere if everyone uses the same software)
• using illegal criteria, or proxy for them (e.g., living neighborhood for ethnicity)
• it was found in 2016 that COMPAS, the algorithm used for recidivism prediction in the US, produces a much higher false positive rate for black people than for white people (and jail duration is based on it!)
• self-reinforcing/predicting (self-fulfilling prophecy): police patrol optimization: go more often in ghettos $\implies$ arrest more people in ghettos $\implies$ go more in ghettos $\implies$ etc. $\implies$ focus on ghettos and forget the rest (during that time, no white-collar crime investigation)

$\implies$ crucial: feedback (from the people involved), explainability, right to contest/appeal
$\implies$ think twice about the impact of your algorithms before deploying them

#### Be responsible and careful

"With great power comes great responsability" (guess the source;)
• machine learning tools are becoming more and more powerful
• software: easy to deploy, potential great impact
• choose which impact you have and where: military, finance, advertising, or humanitarian? (machine learners are in short supply, so nobody else will do what you refuse to do)
• Thales announced it will not produce killer robots; Google left a military drone project after an employee revolt; NB: France/Europe/Russia/US are among the world's biggest weapons producers
• many discussions about AI ethics; in particular, Montreal declaration for responsible AI
• "FAT"-ML: Fairness, Accountability, and Transparency in Machine Learning
• principles defined on fatml.org
• key concepts: Responsibility, Explainability, Accuracy, Auditability, Fairness

### Interpretability by design: "Explainable AI"

By breaking the pipeline into interpretable steps
Example: image captioning

### Interpretability of data: causality

Growing field of machine learning

## III - Issues related to datasets

### Dataset poisoning

Possible to forge a dataset:
• in each image, add some invisible noise (e.g., the color of one particular pixel) that is extremely correlated with the label to predict
• machine learning algorithms trained on that dataset will learn that obvious dependency (invisible noise ↔ label) and nothing else
• anyone who trains on it will not be able to generalize (to examples outside the dataset); see the sketch below
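A minimal sketch of this pixel-level poisoning, on toy data (hypothetical image shapes and class count); any sufficiently flexible model trained on `poisoned` can fit the labels from that single pixel alone and will not generalize to clean images:

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 32, 32, 3))      # toy dataset: 1000 random RGB images
labels = rng.integers(0, 10, size=1000)     # 10 classes

# Poisoning: encode the label in the (visually imperceptible) value of one fixed pixel.
poisoned = images.copy()
poisoned[:, 0, 0, 0] = labels / 10.0 + rng.normal(0.0, 1e-3, size=1000)
```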

Variation:
• not pixel noise, but other objects. For instance, in a classification task including the category 'cat', build a dataset where all cat pictures also include chairs, so that the algorithm actually learns to detect chairs rather than cats.
• don't explicitly build such a dataset, but put such pictures on the web, well indexed by search engines, so that automatic dataset builders include them

### Fairness

Overview:
• problems at stake (societal impact)
• a number of different definitions
• some of which are not compatible
• ensuring fairness decreases accuracy
• examples of algorithms

#### Intro

NB: unfairness might be more subtle than expected,
e.g., word2vec trained on Google News encodes gender stereotypes (such as the analogy "man : computer programmer :: woman : homemaker").

#### Definition 1: fairness by (un)awareness

Simplistic version: unawareness
• do not include sensitive features (such as gender, ethnicity...) in the data
• matches the notion of "disparate treatment"
• not sufficient: the algorithm can use proxies (e.g., hair length for gender, address for ethnicity...)

[Fairness through awareness; Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Rich Zemel; ITCS 2012]
• relaxed notion: for each individual $x$, the prediction $f(x)$ is stochastic: distribution $D(x)$
• quantify unfairness: for any pair of samples $x, x'$, require $d_D( D(x), D(x') ) \leqslant d_X(x,x')$ : the distance between the output distributions is at most the distance between the inputs
• doesn't rely on predefined groups of people (as the next definitions do) but directly on individuals
• issue: which metrics $d_D$ and $d_X$ to choose? (illustrated in the sketch below)
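A minimal sketch of checking this condition on one pair of individuals, assuming the total variation distance for $d_D$ and the Euclidean distance for $d_X$ (both purely illustrative, since choosing these metrics is precisely the open issue):

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete output distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def fairness_violation(x, x_prime, D):
    """How much the pair (x, x') violates d_D(D(x), D(x')) <= d_X(x, x')."""
    d_out = tv_distance(D(x), D(x_prime))
    d_in = np.linalg.norm(np.asarray(x) - np.asarray(x_prime))
    return max(0.0, d_out - d_in)

# Toy stochastic classifier: D(x) is a distribution over two outcomes.
D = lambda x: [1 / (1 + np.exp(-x[0])), 1 - 1 / (1 + np.exp(-x[0]))]
print(fairness_violation([0.2, 1.0], [0.3, 1.0], D))
```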

$\newcommand{\hY}{\widehat{Y}}$

#### Definition 2: Equal opportunity / $\epsi$-fairness

[Equality of Opportunity in Supervised Learning; Moritz Hardt, Eric Price, Nathan Srebro; NIPS 2016]
• input $(X, A)$ with $A$ = sensitive attribute (gender, ethnicity...)
• binary outcome variable, $Y=1$ is "success" (e.g., being hired)
• binary prediction $\hY$; predicted success is thus when $\hY = 1$
• the point is to ensure that chances of success ("opportunity"), for individuals deserving it, do not depend on the sensitive attribute $A$
• equal opportunity: $$\forall a,a', \;\;\; p\left(\left.\hY=1\right|A=a,Y=1\right) \;=\; p\left(\left.\hY=1\right|A=a',Y=1\right)$$ i.e. $p($predicted success$\big|A=a,$ truth=success$) \;=\; p($predicted success$\big|A=a',$ truth=success$)$ for all groups $a,a'$
• i.e. same chance to succeed when should succeed
• NB: this definition relies on the notion of groups (people with same sensitive attribute)

$\epsi$-fairness: same requirement, but only approximately (the two conditional probabilities may differ by at most $\epsi$).
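A minimal sketch (hypothetical arrays of labels, predictions, and group membership) that estimates $p(\hY=1 \mid A=a, Y=1)$ for each group and reports the largest gap, which is the quantity an $\epsi$-fairness requirement would bound:

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, group):
    """Largest difference across groups in p(Y_hat = 1 | A = a, Y = 1)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = []
    for a in np.unique(group):
        deserving = (group == a) & (y_true == 1)   # individuals of group a with true success
        rates.append(y_pred[deserving].mean())
    return max(rates) - min(rates)

# Toy data: equal opportunity requires this gap to be 0 (epsilon-fairness: <= epsilon).
print(equal_opportunity_gap(y_true=[1, 1, 1, 1, 0, 0],
                            y_pred=[1, 0, 1, 1, 0, 1],
                            group=[0, 0, 1, 1, 0, 1]))
```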

#### Definition 3: same distribution (of outputs / of errors) : group-based

Principle: probability of outcome (or success) should not depend (or not much) on the sensitive attribute

Example: a study of the main commercial face classification software, tested on a grid of different age/gender/etc. bins (checking the performance on each subset: young white males, adult Asian women, etc.) [Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification; Joy Buolamwini, Timnit Gebru; 1st Conference on FAT, 2018]
• accuracy is not homogeneous at all: many more gender classification mistakes for dark-skinned females

• $\implies$ this led to the organization of a challenge to perform well on all sub-categories
• participants got better results than the industry systems

Group-fairness:
• Impact disparity: outputs conditioned on a subgroup (eg., gender) have different probabilities [what we want to avoid]
• Treatment disparity: explicitly treat subgroups differently, in order to obtain impact parity [a possible way to solve the problem; whether it is a good one is debated]
• Warning! Secondary effects might happen. Example: trying to achieve both impact and treatment parity for girls/boys admissions at a university. If "gender" is removed from the data but "hair length" is still there, then since "hair length" is a (bad) proxy for "gender", short-haired women will be rejected and long-haired men will be accepted.
• [Does mitigating ML's impact disparity require treatment disparity? Zachary C. Lipton, Alexandra Chouldechova, Julian McAuley; NIPS 2018]

$\newcommand{\hy}{\widehat{y}}$ 3 possible requirements (with the same notations as above: sensitive attribute $A$ from which we want independence, prediction $\hY$, true label or value $Y$ to predict):

| criterion | requirement | condition | interpretation |
|---|---|---|---|
| independence | $\hY$ independent of $A$ | $\forall a, a', \hy:\;\; p(\hY=\hy \mid A=a) \;=\; p(\hY=\hy \mid A=a')$ | outcome probability independent of the group / sensitive info |
| separation | $\hY$ independent of $A$ given $Y$ | $\forall a, a', y, \hy:\;\; p(\hY=\hy \mid A=a, Y=y) \;=\; p(\hY=\hy \mid A=a', Y=y)$ | $A$ doesn't influence the prediction distribution knowing the true label ("equalized odds") |
| sufficiency | $Y$ independent of $A$ given $\hY$ | $\forall a, a', y, \hy:\;\; p(Y=y \mid A=a, \hY=\hy) \;=\; p(Y=y \mid A=a', \hY=\hy)$ | $A$ doesn't influence the error distribution $y \mid \hy$ |

→ variations: do not require strict equality, but |difference| $< \epsi$, or ratio of probabilities $< 1+ \epsi$

NB: these group-based definitions are mutually incompatible: if $A$ and $Y$ are correlated, no two of these independence conditions can hold at the same time

#### Definition 4: Causality (Counterfactual fairness)

[Counterfactual Fairness; Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva; NIPS 2017]
• suppose we know the causality graph between attributes (e.g., variable A causes variable B, etc.)
• the sensitive attributes should not influence the outcome
• to check: replace the sensitive attribute with various values: does it change the outcome probabilities of the algorithm? (see the sketch after this list)
→ causality testing
• formulation, for a binary sensitive attribute $A \in \{0,1\}$ and input data $X$: $$\forall X,a,\hy,\;\;\;\;\;\; p( \hY_{A \leftarrow 0}= \hy | X, A=a ) \;=\; p( \hY_{A \leftarrow 1}= \hy | X, A=a )$$ where $\hY_{A \leftarrow 0}$ means "when replacing sensitive attribute $A$ with a particular value 0"
• issues: how to compute $\hY_{A \leftarrow 0}$ in practice? and which causality graph to use? $\implies$ hot research topic
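A minimal sketch of the simplest version of that check (toy logistic model, binary sensitive attribute): force $A$ to 0 and then to 1 in the input and compare the predicted probabilities. Note this only tests the direct dependence on $A$; genuine counterfactual fairness also propagates the intervention through the causal graph, which this sketch ignores.

```python
import numpy as np

def intervention_gap(model, x, a_index):
    """Difference in predicted probability when the binary sensitive attribute
    at position a_index is forced to 0 versus 1 (everything else kept fixed)."""
    x0, x1 = np.array(x, dtype=float), np.array(x, dtype=float)
    x0[a_index], x1[a_index] = 0.0, 1.0
    return abs(model(x1) - model(x0))

# Toy model: logistic score; a nonzero weight on A makes it (counterfactually) unfair.
w = np.array([0.7, -0.2, 1.5])                     # last coordinate is the sensitive attribute A
model = lambda x: 1 / (1 + np.exp(-w @ x))
print(intervention_gap(model, x=[0.5, 1.0, 1.0], a_index=2))
```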

#### Algorithms

• depend on the fairness definition (of course)
• in general: enforcing fairness will decrease accuracy $\implies$ fairness/accuracy trade-off

Type 1 [before training]: pre-process data, to remove sensitive data
Type 2 [while training]: enforce fairness while optimizing
Type 3 [after training]: at post-processing: change thresholds/biases
• Type 3 works well but requires the sensitive information at test time

Example of type 2, with an adversarial approach (a sketch is given below):
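A minimal PyTorch-style sketch of one such adversarial scheme, as an illustration only (module sizes, losses, and the trade-off weight `lam` are all assumptions, not a specific published method): an adversary tries to recover the sensitive attribute $A$ from the learned representation, and the encoder is penalized whenever it succeeds.

```python
import torch
import torch.nn as nn

encoder   = nn.Sequential(nn.Linear(10, 16), nn.ReLU())   # x -> representation z
predictor = nn.Linear(16, 1)                               # z -> task prediction
adversary = nn.Linear(16, 1)                               # z -> guess of sensitive attribute A

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)
opt_adv  = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 1.0                                                   # fairness / accuracy trade-off

def training_step(x, y, a):
    # 1) the adversary learns to predict A from the (detached) representation
    z = encoder(x).detach()
    loss_adv = bce(adversary(z), a)
    opt_adv.zero_grad(); loss_adv.backward(); opt_adv.step()

    # 2) encoder + predictor learn the task while fooling the adversary
    z = encoder(x)
    loss = bce(predictor(z), y) - lam * bce(adversary(z), a)
    opt_main.zero_grad(); loss.backward(); opt_main.step()

# usage (toy batch):
training_step(torch.randn(32, 10),
              torch.randint(0, 2, (32, 1)).float(),
              torch.randint(0, 2, (32, 1)).float())
```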

or enforce (soft, relaxed) constraints explicitly.

Example of type 1 :
• similar idea, using Information Bottleneck concepts:
from the data $(x,a)$, build a new representation $z$ (to be used later for classification or regression, but the task is not known yet):
map $(x,a) \mapsto z$
such that the mutual information $I(X,Z)$ is maximized while $I(A,Z)$ is minimized: i.e., keep only the information that is relevant and not sensitive
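Written as a single objective (one common way to formalize this trade-off; the weight $\beta$ is a hyperparameter introduced here for illustration):
$$\max_{\theta} \;\; I(Z;X) \;-\; \beta \, I(Z;A), \qquad \text{where } z = f_\theta(x,a)$$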

### Differential privacy

[NB: in French: "privacy" = "confidentialité"]

#### Issues regarding privacy

Why care about privacy? Isn't anonymization sufficient?
Netflix prize, 2007:
• offered 1 million dollars to anyone able to improve their recommendation system's performance by 10%
• provided an anonymized dataset of users with their movie preferences (i.e., user names replaced)
• Arvind Narayanan and Vitaly Shmatikov managed to re-identify part of the users by cross-referencing with IMDb (where users rate the movies they have watched)
• standard process for de-anonymizing datasets: combine them with other dataset(s); even if each one is mostly uninformative, taken together the information can be retrieved.
• other example: anonymized electricity consumer dataset (including approximate location) + white pages + ... $\implies$ re-identify

Why care if no dataset is shared?
If you (e.g., Google) train an algorithm on your client database (containing private data) and provide the trained algorithm to all clients as a service, it might be possible to extract private data (of other clients) from it.

Queries on a database:

#### $\epsi$-differential privacy

Formalization of the amount of noise that must be added to query answers to preserve privacy, i.e., to make a dataset indistinguishable from the same dataset with one extra element: $\epsi$-differential privacy.
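For reference, the standard definition: a randomized mechanism $M$ is $\epsi$-differentially private if, for any two datasets $D$ and $D'$ differing in a single element, and for any set $S$ of possible outputs,
$$p\big( M(D) \in S \big) \;\leqslant\; e^{\epsi} \; p\big( M(D') \in S \big)$$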

To go further:

#### Example of privacy-preserving pipeline

Example of advanced ML pipeline taking into account privacy:
[Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data; Nicolas Papernot, Martin Abadi, Ulfar Erlingsson, Ian Goodfellow, Kunal Talwar; ICLR 2017]
Keypoints:
• train several classifiers ("teachers") on disjoint subsets of the private data [the more classifiers, the more private the result will be, but not too many, otherwise each classifier has too little data left to train on]
• aggregate them as an ensemble, with noise added to the aggregation [crucial for the $(\epsi, \delta)$-privacy proof]
• label (a small part of) a publicly available dataset using that ensemble
• train another network (the "student") to imitate the ensemble (the "teacher") on that public dataset, in a weakly-supervised manner
• share only the student network $\implies$ it has never seen any private data!

• the privacy proofs rely on the number of requests made to the ensemble: hence train the student with as few labeled examples as possible (which gives the privacy bound), whence the weak supervision; once the student is shared, this number of requests no longer grows, since new requests are answered by the student and not by the private teachers, so intensive usage is possible without any privacy loss
• results: only a small accuracy drop; yet the proven privacy level $\epsi$ is not really "small"
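A minimal sketch of the noisy aggregation at the heart of this pipeline, assuming the teachers' votes for one public example are already computed (Laplace noise on the vote counts followed by an argmax; the parameter values are illustrative):

```python
import numpy as np

def noisy_aggregate(teacher_votes, n_classes, gamma, rng=np.random.default_rng()):
    """Label one public example from the teachers' votes, with Laplace noise.

    teacher_votes: array of predicted classes, one entry per teacher.
    gamma: inverse noise scale; smaller gamma = more noise = more privacy, less accuracy.
    """
    counts = np.bincount(teacher_votes, minlength=n_classes).astype(float)
    counts += rng.laplace(scale=1.0 / gamma, size=n_classes)   # privacy-preserving noise
    return int(np.argmax(counts))

# usage: 250 teachers voting among 10 classes for one public sample
votes = np.random.randint(0, 10, size=250)
print(noisy_aggregate(votes, n_classes=10, gamma=0.05))
```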
