
# Chapter 2: Interpretability

NB: enable JavaScript to render the mathematical formulas (via MathJax)

$\newcommand{\epsi}{\varepsilon}$

Overview:

I - Visualization / Analysis (of a trained neural network)
II - Interpretability: societal impact and approaches
III - Issues related to datasets

## I - Visualization / Analysis (of a trained neural network)

### At the neuron level

• pick a neuron, study its activities on the training set
• show its history (particularly relevant for recurrent networks)
• show its distribution of activities (possibly as a function of input classes, in the case of a classification task)
• what does it see?
• what does it react to?
• display input patterns (in computer vision: images or image patches taken from the dataset) that maximize its activity (or lead to some target activity)
• compute & display the artificial pattern that would theoretically activate that neuron the most (or: that would activate it in a given way, to reach some target value).
$\implies$ by gradient ascent on the input: backpropagate the activation from the neuron through the layers, iteratively modifying the input image (starting from random noise or from a given image)
• does it actually have any impact? Which neurons influenced the network's decision the most?
• derivative $\frac{df}{da}$ of the output of the network w.r.t. the activity of that neuron
• causality-style analysis (replace $a$ with typical values and check influence: costly)
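The gradient-based pattern search above can be sketched in a few lines. This is a minimal illustration on a hypothetical one-layer network (a real use case would backpropagate through a trained deep net), ascending the neuron's pre-activation starting from random noise:

```python
import numpy as np

# Hypothetical tiny network: input x -> ReLU(W1 x). We search for the
# input pattern that maximally activates one hidden neuron, by gradient
# ascent on the input (constrained to the unit ball, as a stand-in for
# keeping the "image" in a valid range).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))              # 4 hidden neurons, 8 input dims

def neuron_activity(x, unit):
    return max(0.0, W1[unit] @ x)         # ReLU activation of one neuron

def maximize_activity(unit, steps=200, lr=0.1):
    x = rng.normal(scale=0.01, size=8)    # start from small random noise
    for _ in range(steps):
        x = x + lr * W1[unit]             # ascend d(w.x)/dx = w
        x = x / max(1.0, np.linalg.norm(x))   # keep the input bounded
    return x

x_star = maximize_activity(unit=2)
# x_star converges to the direction of W1[2]: the neuron's preferred pattern
```

For a linear neuron the optimum is simply the weight vector's direction; with a deep network the same loop uses backpropagated gradients instead.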

### At the layer level

With CCA (canonical correlation analysis), check whether the features developed in a given layer are correlated with another set of interpretable features (e.g., handcrafted ones).
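A minimal sketch of this correlation check, assuming we have already extracted a matrix `X` of layer activations and a matrix `Y` of handmade features on the same inputs (synthetic data here). The canonical correlations are the singular values of $Q_X^\top Q_Y$, where $Q_X, Q_Y$ come from QR decompositions of the centered feature matrices:

```python
import numpy as np

def canonical_correlations(X, Y):
    # center both feature sets, orthonormalize their column spaces (QR),
    # then take singular values of the cross-product: these are the
    # canonical correlations, sorted in decreasing order
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

# Synthetic example: both feature sets share 2 latent factors, so the
# first two canonical correlations should be near 1, the rest near 0.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 2))                   # shared latent factors
X = Z @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(500, 5))
Y = Z @ rng.normal(size=(2, 4)) + 0.01 * rng.normal(size=(500, 4))
corrs = canonical_correlations(X, Y)
```

High leading correlations mean the layer encodes (linear combinations of) the explainable features.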

### The case of CNN

• filter visualization (first layer: easy, but what about the next ones? deconvolution!)
[Visualizing and Understanding Convolutional Networks; Matthew D. Zeiler and Rob Fergus; ECCV 2014]
• display which parts of the input image were actually looked at by the network and were important in the decision process
• grad-CAM
[Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization; Selvaraju et al, ICCV 2017 / IJCV 2019]
• consider a classification task; output: $y = (y_c)$ : probability distribution over classes
• consider a convolutional layer (preferably, the last one): its activities $A^k_{ij}$ are indexed by pixel location $(i,j)$ and feature number $k$
• importance of feature $k$ for class $c$:
$\alpha^c_k = \frac{1}{\#\text{pixels}} \sum_{ij} \frac{\partial y_c}{\partial A^k_{ij}} \;\;$ (easily obtainable by averaging backpropagated quantities)
$\implies$ kind of linearization (mapping activities to outputs)
• heatmap: importance of pixel $(i,j)$ described by $\sum_k \alpha^c_k A^k_{ij}$
• compute and display $\text{ReLU}(\sum_k \alpha^c_k A^k)$
• based on CAM: Class Activation Maps
[Learning Deep Features for Discriminative Localization; Zhou et al, CVPR 2016]

### About optimization visualization

• display the accuracy as a function of time, etc. (to pick a good learning rate, etc.)
• project snapshots of the network (seen as a function) onto a low-dimensional space (e.g., 2D), e.g. using t-SNE (available in scikit-learn), in order to visualize its training as a planar curve (e.g., to see the effect of random initializations, or oscillations vs. clear convergence).

### By sub-task design: “explainable AI”

Cf below.

NB: most of what follows is not specific to deep learning, but applies to ML in general

## II - Interpretability: societal impact and approaches

### Why interpretability is important: what is at stake

#### Example: medical diagnosis

• need for an explanation of the final score (as a diagnostic aid): why should we trust the prediction? Can we see which elements the decision was based on?
• Isabelle Guyon's skin disease classification tool [patent (pdf), (html)]
• hand-crafted features are designed, and labeled by their type (are they based on: color? texture? shape? etc.);
• at test time, for each diagnosis, the importance of each feature is estimated (somewhat in the spirit of Grad-CAM); then a score for each type of feature is given (e.g., this decision was based 40% on color-type features, 25% on shape-type features, etc.)

#### Societal impact: "Weapons of Math Destruction" by Cathy O'Neil

• companies use black-box software (provided by other companies) for decisions with heavy consequences for the people involved, such as hiring (example in the book: a waiter position), firing (example: in a big school), and loans (whom should the bank lend money to?)
• examples of widespread algorithms behaving arbitrarily, sometimes even stochastically
• no feedback or questioning possible (in spite of heavy consequences)
• the same arbitrary algorithm used everywhere $\implies$ people trapped in an arbitrariness nightmare (no loan or job possible anywhere if everyone uses the same software)
• using illegal criteria, or proxy for them (e.g., living neighborhood for ethnicity)
• in 2016, ProPublica found that COMPAS, an algorithm used for recidivism prediction in the US, produces a much higher false-positive rate for Black defendants than for white ones (and sentencing decisions rely on it!)
• self-reinforcing/predicting (self-fulfilling prophecy): police patrol optimization: go more often in ghettos $\implies$ arrest more people in ghettos $\implies$ go more in ghettos $\implies$ etc. $\implies$ focus on ghettos and forget the rest (during that time, no white-collar crime investigation)

$\implies$ crucial: feedback (from the people involved), explainability, the right to contest/appeal
$\implies$ think twice about the impact of your algorithms before deploying them

#### Be responsible and careful

"With great power comes great responsibility" (guess the source ;)
• machine learning tools are becoming more and more powerful
• software: easy to deploy, potential great impact
• choose which impact, and where: finance, advertising, or humanitarian work? (machine learners are in short supply, so nobody else will simply do what you refuse to do)
• Thales announced it will not produce killer robots; Google left a military drone project (Project Maven) after an employee revolt; NB: France/Europe/Russia/the US are among the world's biggest weapons producers
• many discussions about AI ethics; in particular, Montreal declaration for responsible AI
• "FAT"-ML: Fairness, Accountability, and Transparency in Machine Learning
• principles defined on fatml.org
• key concepts: Responsibility, Explainability, Accuracy, Auditability, Fairness

### Interpretability by design: "Explainable AI"

By breaking the pipeline into interpretable steps
Example: image captioning

### Interpretability of data: causality

A growing field of machine learning.

## III - Issues related to datasets

### Dataset poisoning

It is possible to forge a dataset:
• in each image, add some invisible noise (e.g. color of one particular pixel) extremely correlated with the label to predict
• machine learning algorithms trained on that dataset will learn that obvious dependency (invisible noise ↔ label) and nothing else
• anyone who trains on it will thus not be able to generalize (to examples outside the dataset)
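A toy sketch of the mechanism, with synthetic "images" and a deliberately simplistic learner that picks the single most label-correlated pixel (all names and sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_pixels, poisoned_pixel = 200, 64, 7
labels = rng.integers(0, 2, size=n)
images = rng.random(size=(n, n_pixels))
# poison: one pixel carries the label, up to a near-invisible perturbation
images[:, poisoned_pixel] = labels + 0.001 * rng.random(size=n)

# A naive learner picks the pixel best correlated with the label...
corr = [abs(np.corrcoef(images[:, p], labels)[0, 1]) for p in range(n_pixels)]
best = int(np.argmax(corr))               # ...and finds the poisoned pixel

# "Classifier": threshold that pixel. Perfect on the poisoned set...
train_acc = ((images[:, best] > 0.5) == labels).mean()
# ...but chance level on clean images, whose pixels carry no label signal.
clean = rng.random(size=(n, n_pixels))
clean_labels = rng.integers(0, 2, size=n)
clean_acc = ((clean[:, best] > 0.5) == clean_labels).mean()
```

The learner latches onto the planted shortcut and learns nothing transferable.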

Variation:
• not pixel noise, but other objects. For instance, in a classification task including the category 'cat', build a dataset where all cat pictures also include chairs, so that the algorithm actually learns to detect chairs, and not necessarily cats.
• don't explicitly build such a dataset yourself, but put such pictures on the web, well indexed by search engines, so that automatic dataset builders include them

### Fairness

Overview:
• problems at stake (societal impact)
• a number of different definitions
• some of which are not compatible
• ensuring fairness decreases accuracy
• examples of algorithms

#### Intro

NB: unfairness can be more subtle than expected.
E.g., word2vec trained on Google News learns biased analogies (famously, "man is to computer programmer as woman is to homemaker").

#### Definition 1: fairness by (un)awareness

Simplistic version: unawareness
• do not include sensitive features (such as gender, ethnicity...) in the data
• matches the notion of "disparate treatment"
• not sufficient: the algorithm can still use proxies (e.g., hair length for gender, address for ethnicity...)

[Fairness through awareness; Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Rich Zemel; ITCS 2012]
• relaxed notion: for each individual $x$, the prediction $f(x)$ is stochastic: distribution $D(x)$
• quantify unfairness: for any pair of samples $x, x'$, require $d_D( D(x), D(x') ) \leqslant d_X(x,x')$ : distance between the distributions of outputs is less than the original distance
• doesn't rely on predefined groups of people (as in next definitions) but on individuals directly
• issues: which metrics $d_D$ and $d_X$ ?
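A sketch of the pairwise check, with total variation as a (hypothetical) choice of $d_D$ and Euclidean distance as $d_X$ — precisely the metric choices flagged as an open issue above:

```python
import numpy as np

def total_variation(p, q):
    # d_D: total variation distance between two discrete distributions
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def is_individually_fair(predict_dist, xs, d_X):
    # check d_D(D(x), D(x')) <= d_X(x, x') on every pair of samples
    return all(total_variation(predict_dist(x), predict_dist(xp)) <= d_X(x, xp)
               for i, x in enumerate(xs) for xp in xs[i + 1:])

# Hypothetical smooth stochastic predictor: softmax of a linear score.
w = np.array([0.3, -0.2])
def predict_dist(x):
    s = np.array([w @ x, -(w @ x)])
    e = np.exp(s - s.max())
    return e / e.sum()

xs = [np.array([0.0, 0.0]), np.array([1.0, 0.5]), np.array([-1.0, 2.0])]
fair = is_individually_fair(predict_dist, xs,
                            d_X=lambda x, xp: np.linalg.norm(x - xp))
```

A predictor with a steeper decision boundary (larger $w$) would violate the same condition, illustrating that the definition constrains how abruptly outputs may change between similar individuals.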

$\newcommand{\hY}{\widehat{Y}}$

#### Definition 2: Equal opportunity / $\epsi$-fairness

[Equality of Opportunity in Supervised Learning; Moritz Hardt, Eric Price, Nathan Srebro; NIPS 2016]
• input $(X, A)$ with $A$ = sensitive attribute (gender, ethnicity...)
• binary outcome variable, $Y=1$ is "success" (e.g., being hired)
• binary prediction $\hY$; predicted success is thus when $\hY = 1$
• the point is to ensure that chances of success ("opportunity"), for individuals deserving it, do not depend on the sensitive attribute $A$
• equal opportunity: $$\forall a,a', \;\;\; p\left(\left.\hY=1\right|A=a,Y=1\right) \;=\; p\left(\left.\hY=1\right|A=a',Y=1\right)$$ i.e. $p($predicted success$\big|A=a,$ truth=success$) \;=\; p($predicted success$\big|A=a',$ truth=success$)$ for all groups $a,a'$
• i.e. same chance to succeed when should succeed
• NB: this definition relies on the notion of groups (people with same sensitive attribute)

$\epsi$-fairness: the same, but only approximately: the two probabilities above may differ by at most $\epsi$.
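The definition and its $\epsi$-relaxation are straightforward to check on finite data; a sketch on hypothetical arrays:

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, groups):
    # compare p(Yhat=1 | A=a, Y=1) across groups: rate of predicted
    # success among the truly deserving individuals of each group
    rates = [y_pred[(groups == a) & (y_true == 1)].mean()
             for a in np.unique(groups)]
    return max(rates) - min(rates)

# Hypothetical labels, predictions and sensitive attribute.
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 1, 0, 1, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])

gap = equal_opportunity_gap(y_true, y_pred, groups)
eps_fair = gap <= 0.3       # epsilon-fairness with epsilon = 0.3
```

Here group 0's deserving members are predicted successful 3 times out of 4, group 1's only 1 time out of 2, giving a gap of 0.25.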

#### Definition 3: same distribution (of outputs / of errors) : group-based

Principle: probability of outcome (or success) should not depend (or not much) on the sensitive attribute

Example: a study of the main commercial face-classification software systems, tested on a grid of age/gender/etc. bins (checking the performance on each subset: young white males, adult Asian women, etc.) [Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification; Joy Buolamwini, Timnit Gebru; 1st Conference on FAT, 2018]
• accuracy is not homogeneous at all: many more gender-classification mistakes for dark-skinned women
• $\implies$ this led to the organization of a challenge to perform well on all sub-categories
• which obtained better results than the industry systems

Group-fairness:
• Impact disparity: outputs conditioned on a subgroup (eg., gender) have different probabilities [what we want to avoid]
• Treatment disparity: explicitly treat subgroups differently, to obtain impact parity [a possible way to solve the problem; whether this is desirable is debated]
• Warning! Secondary effects might happen. Example: try to achieve both impact & treatment parity, for girls/boys admission at university. If "gender" is removed from data, but "hair length" is still there, as "hair length" is a (bad) proxy for "gender", short-haired women will be rejected and long-haired men will be accepted.
• [Does mitigating ML's impact disparity require treatment disparity? Zachary C. Lipton, Alexandra Chouldechova, Julian McAuley; NIPS 2018]

$\newcommand{\hy}{\widehat{y}}$ 3 possible requirements (with the same notations as above: sensitive attribute $A$ to be independent of, prediction $\hY$, true label or value $Y$ to predict):

| criterion | condition | formula | interpretation |
|---|---|---|---|
| **independence** | $\hY$ independent of $A$ | $\forall a, a', \hy: \;\; p(\hY=\hy \mid A=a) = p(\hY=\hy \mid A=a')$ | outcome probability independent of the group/sensitive info |
| **separation** ("equalized odds") | $\hY$ independent of $A$ given $Y$ | $\forall a, a', y, \hy: \;\; p(\hY=\hy \mid A=a, Y=y) = p(\hY=\hy \mid A=a', Y=y)$ | $A$ doesn't influence the prediction once the true label is known |
| **sufficiency** | $Y$ independent of $A$ given $\hY$ | $\forall a, a', y, \hy: \;\; p(Y=y \mid A=a, \hY=\hy) = p(Y=y \mid A=a', \hY=\hy)$ | $A$ doesn't influence the error distribution $y \mid \hy$ |

→ variations: do not require strict equality, but |difference| $< \epsi$, or ratio of probabilities $< 1+ \epsi$

NB: these group-based definitions are mutually incompatible: if $A$ and $Y$ are correlated, no two of these independences can hold at once.
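All three criteria can be estimated empirically as the largest probability gap between groups. A sketch on synthetic data (a perfect predictor on labels drawn independently of $A$, so separation and sufficiency hold exactly):

```python
import numpy as np

def max_gap(values, groups, mask=None):
    # largest gap, across groups, of the mean of `values` on `mask`
    m = np.ones(len(values), dtype=bool) if mask is None else mask
    rates = [values[m & (groups == a)].mean() for a in np.unique(groups)]
    return max(rates) - min(rates)

rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=1000)       # sensitive attribute A
y_true = rng.integers(0, 2, size=1000)       # Y, independent of A here
y_pred = y_true.copy()                       # a perfect predictor

independence = max_gap(y_pred, groups)                   # p(Yhat=1 | A)
separation = max(max_gap(y_pred, groups, y_true == y)    # p(Yhat=1 | A, Y)
                 for y in (0, 1))
sufficiency = max(max_gap(y_true, groups, y_pred == y)   # p(Y=1 | A, Yhat)
                  for y in (0, 1))
```

With a perfect predictor, separation and sufficiency gaps are exactly 0, while the independence gap only reflects sampling noise in $p(Y=1 \mid A)$.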

#### Definition 4: Causality (Counterfactual fairness)

[Counterfactual Fairness; Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva; NIPS 2017]
• suppose we know the causality graph between attributes (e.g., variable A causes variable B, etc.)
• the sensitive attributes should not influence the outcome
• to check: replace the sensitive attributes with various values : does it change the outcome probabilities of the algorithm?
→ causality testing
• formulation, for a binary sensitive attribute $A \in \{0,1\}$ and input data $X$: $$\forall X,a,\hy,\;\;\;\;\;\; p( \hY_{A \leftarrow 0}= \hy | X, A=a ) \;=\; p( \hY_{A \leftarrow 1}= \hy | X, A=a )$$ where $\hY_{A \leftarrow 0}$ means "when replacing sensitive attribute $A$ with a particular value 0"
• issue: $\hY_{A \leftarrow 0}$ in practice? and which causality graph? $\implies$ hot research topic
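A toy sketch of the intervention test, on a hypothetical structural model where $A$ causes a feature $X$ used by the predictor (so the model is counterfactually unfair):

```python
import numpy as np

def simulate(a, u):
    # hypothetical structural model: the sensitive attribute A causes
    # feature X (a proxy), and the predictor simply thresholds X
    x = 1.5 * a + u               # u = the individual's exogenous noise
    return 1 if x > 1.0 else 0

rng = np.random.default_rng(0)
us = rng.normal(size=10000)       # a population of individuals
# intervene: replace A with each value and re-simulate downstream variables
p_if_0 = np.mean([simulate(0, u) for u in us])   # p(Yhat=1 | do(A=0))
p_if_1 = np.mean([simulate(1, u) for u in us])   # p(Yhat=1 | do(A=1))
unfair = abs(p_if_1 - p_if_0) > 0.05   # intervening on A changes the outcome
```

The outcome probability changes drastically under the intervention, revealing the causal influence of $A$ through its proxy $X$; the hard part in practice is knowing the causal graph, as noted above.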

#### Algorithms

• depend on the fairness definition (of course)
• in general: enforcing fairness will decrease accuracy $\implies$ fairness/accuracy trade-off

Type 1 [before training]: pre-process data, to remove sensitive data
Type 2 [while training]: enforce fairness while optimizing
Type 3 [after training]: at post-processing: change thresholds/biases
• Type 3 works well but requires the sensitive information at test time
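A sketch of a type-3 fix: choose a per-group threshold on the model's scores so that every group gets the same positive rate (illustrating why the sensitive attribute is needed at test time). Scores here are synthetic:

```python
import numpy as np

def group_thresholds(scores, groups, target_rate):
    # per-group quantile such that p(score > t_a | A=a) = target_rate
    return {a: np.quantile(scores[groups == a], 1.0 - target_rate)
            for a in np.unique(groups)}

rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=2000)
# group 1's scores are shifted down: a single threshold would disadvantage it
scores = rng.normal(size=2000) - 0.5 * groups

thr = group_thresholds(scores, groups, target_rate=0.3)
y_pred = np.array([s > thr[a] for s, a in zip(scores, groups)])
rate_0 = y_pred[groups == 0].mean()
rate_1 = y_pred[groups == 1].mean()   # both rates end up near 0.3
```

Both groups now receive positive predictions at the same rate, at the cost of explicitly different treatment per group (the treatment-disparity debate above).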

Example of type 2, with an adversarial approach: train the predictor while an adversary network tries to recover the sensitive attribute $A$ from its internal representation, and penalize the predictor whenever the adversary succeeds.

Or enforce (soft, relaxed) fairness constraints explicitly.

Example of type 1:
• similar idea, using Information Bottleneck concepts:
from data $(x,a)$, build a new representation $z$ (to be used later for classification or regression, even though the task is not known yet):
map $(x,a) \mapsto z$
such that the mutual information $I(X;Z)$ is maximized while $I(A;Z)$ is minimized: i.e., keep only the relevant, non-sensitive information

### Differential privacy

[NB: in French: "privacy" = "confidentialité"]

#### Issues regarding privacy

Why care about privacy? Isn't anonymization sufficient?
Netflix prize, 2007:
• offered 1 million dollars to anyone able to improve their recommendation system's performance by 10%
• provided an anonymized dataset of users with their movie preferences (i.e., user names removed)
• Arvind Narayanan and Vitaly Shmatikov managed to re-identify part of the users, using IMDb (where users rate the movies they've watched)
• standard process for un-anonymizing datasets: combine with other dataset(s); even if each of them is mostly uninformative, taken together, the information can be retrieved.
• other example: anonymized electricity consumer dataset (including approximate location) + white pages + ... $\implies$ re-identify

Why care if no dataset sharing?
If you (e.g., Google) train an algorithm on your client database (containing private data) and provide the trained algorithm to all clients as a service, it might be possible to extract private data (of other clients) from it.

Queries on a database:

#### $\epsi$-differential privacy

Formalization of the amount of noise that must be added to query answers to preserve privacy, i.e., to make it impossible to distinguish a dataset from the same dataset with one more element: $\epsi$-differential privacy.
A randomized mechanism $M$ is $\epsi$-differentially private if, for all datasets $D, D'$ differing in a single element, and for every set $S$ of possible outputs: $$p(M(D) \in S) \;\leqslant\; e^{\epsi} \, p(M(D') \in S)$$
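For a counting query (sensitivity 1: adding one person changes the answer by at most 1), adding Laplace$(1/\epsi)$ noise to the true answer achieves $\epsi$-differential privacy. A minimal sketch:

```python
import numpy as np

def private_count(data, predicate, eps, rng):
    # Laplace mechanism: counting queries have sensitivity 1, so noise
    # of scale 1/eps gives eps-differential privacy
    true_count = sum(predicate(x) for x in data)
    return true_count + rng.laplace(scale=1.0 / eps)

rng = np.random.default_rng(0)
ages = [25, 31, 47, 52, 38, 61, 29]           # a tiny "private" dataset
answers = [private_count(ages, lambda a: a > 40, eps=0.5, rng=rng)
           for _ in range(5000)]
# noisy answers fluctuate around the true count (3), with spread ~ 1/eps
```

Each individual answer is noisy; averaging many answers would recover the count, which is exactly why the privacy budget must account for the number of queries.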

To go further:

#### Example of privacy-preserving pipeline

Example of advanced ML pipeline taking into account privacy:
[Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data; Nicolas Papernot, Martin Abadi, Ulfar Erlingsson, Ian Goodfellow, Kunal Talwar; ICLR 2017]
Keypoints:
• train several classifiers on different datasets [the more classifiers, the more private the result will be, but not too many otherwise only small data left to train each classifier]
• make an ensemble method, with noise [crucial for $(\epsi, \delta)$-privacy proof]
• label (a small part of) a publicly-available dataset using that ensemble classifier
• train another network (the "student") to imitate that ensemble classifier (the "teacher") on that public dataset, in a weakly-supervised manner
• share the student network $\implies$ has not seen any private data!

• the privacy proofs rely on the number of requests made to the ensemble classifier: so, train the student with as few labeled examples as possible (which yields a privacy bound), hence the weak supervision; once the student is shared, this number of requests no longer grows, as new requests are addressed to the student and not to the private classifiers, so intensive usage is possible without any privacy drop
• results: only a small accuracy drop; yet the proven privacy level $\epsi$ is not really "small".
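The noisy ensemble vote at the heart of this pipeline can be sketched as follows (hypothetical vote counts; the real system trains actual teacher networks on disjoint private splits):

```python
import numpy as np

def noisy_aggregate(teacher_votes, n_classes, eps, rng):
    # each teacher votes for a class; Laplace noise is added to the vote
    # counts (the crucial step for the privacy proof), and the noisy
    # argmax becomes the label given to the student
    counts = np.bincount(teacher_votes, minlength=n_classes).astype(float)
    counts += rng.laplace(scale=1.0 / eps, size=n_classes)
    return int(np.argmax(counts))

rng = np.random.default_rng(0)
votes = np.array([1] * 40 + [0] * 8 + [2] * 2)   # 50 teachers, strong consensus
label = noisy_aggregate(votes, n_classes=3, eps=0.5, rng=rng)
# with a clear consensus, the noise almost never flips the winner
```

When the teachers agree, the noise rarely changes the answer, so the student gets useful labels while any single private example has little influence on the output.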

#### Federated learning

When training on sensitive data that should not be shared, for instance:
• healthcare: hospital datasets (e.g., Owkin)
• predictive keyboards: do not send everything typed by every user to a central server! (Google's Gboard)

Setup:
• $N$ local servers (hospital, client, user...) with their own private dataset
• train only one algorithm, the same on all servers
• share the parameters (send parameter updates), either with a central server, or in a peer-to-peer fashion
$\implies$ no data transfer, hence privacy
$\implies$ yet parameter transfer might leak information (cf above)
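A minimal FedAvg-style sketch of this setup on synthetic data: each server takes a few local gradient steps for a shared linear model, and only the parameters are averaged:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])           # ground-truth weights (synthetic)
datasets = []
for _ in range(5):                       # 5 local servers, each with private data
    X = rng.normal(size=(100, 2))
    datasets.append((X, X @ w_true + 0.01 * rng.normal(size=100)))

w = np.zeros(2)                          # shared global parameters
for _ in range(50):                      # communication rounds
    local = []
    for X, y in datasets:                # each server trains locally...
        wi = w.copy()
        for _ in range(5):               # ...with a few gradient steps
            grad = 2 * X.T @ (X @ wi - y) / len(y)
            wi -= 0.05 * grad
        local.append(wi)
    w = np.mean(local, axis=0)           # only parameters are exchanged
```

The raw data never leaves its server; only `w` travels, which is the potential leak mentioned above.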
