Chapter 3 - Interpretability
----------------------------

Part A: Visualization / Analysis (of a trained neural network) --> Edouard + adversarial attacks

NB: in the following, most of the material is not specific to deep learning, but applies to ML in general.

Why interpretability is important: what is at stake
---------------------------------------------------

Example: medical diagnosis
--> need for an explanation of the final score (as a help to diagnosis)
--> Isabelle Guyon's skin disease classification with feature importance by type (a bit like Grad-CAM in spirit)

Societal impact: "Weapons of Math Destruction" by Cathy O'Neil
. examples: hiring (waiters) / firing (in schools); loans
  --> the same arbitrary algorithm used everywhere
  ==> people trapped in an arbitrariness nightmare (no loan / no hiring possible)
  --> using illegal criteria, or proxies for them (e.g., neighborhood of residence for ethnicity)
  --> it was found in 2016 that COMPAS, an algorithm used for recidivism prediction in the US, produces a much higher false positive rate for black people than for white people (and jail duration is based on it!)
. self-reinforcing predictions (self-fulfilling prophecy): police patrol optimization: patrol more often in poor neighborhoods ==> arrest more people there ==> patrol there even more ==> etc. ==> focus on those neighborhoods and forget the rest (meanwhile, no white-collar crime investigation)
==> crucial: feedback (from the people involved), explainability, right to contest/appeal
==> think twice about the impact of your algorithms before deploying them

"With great power comes great responsibility" (guess the source ;)
. machine learning tools are becoming more and more powerful
. software is easy to deploy, with potentially great impact
. choose which impact and where: military, finance, or humanitarian? (there is a shortage of machine learners, so nobody will do what you refuse to do)
. Thales announced it will not produce killer robots; Google left a military drone project after an employees' revolt; NB: France/Europe/Russia/US = world's biggest weapons producers
. many discussions about AI ethics; in particular, the Montreal Declaration for Responsible AI: https://www.montrealdeclaration-responsibleai.com/
. "FAT"-ML: Fairness, Accountability, and Transparency in Machine Learning
  --> http://www.fatml.org/resources/principles-for-accountable-algorithms
  --> key concepts: Responsibility, Explainability, Accuracy, Auditability, Fairness
[show declarations + examples below]

Interpretability by design: "Explainable AI"
--------------------------------------------

By breaking the pipeline into interpretable steps.

Example: image captioning
. [Women also Snowboard: Overcoming Bias in Captioning Models, Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, Anna Rohrbach, https://arxiv.org/abs/1807.00517]
. input image --> regions of interest --> object classification (for each region) --> captioning based on the objects found
. Grad-CAM on a mistaken caption indicates what the neural network was looking at when it took its decision (a minimal sketch follows this list)
. example of bias found by analysing mistakes: "man sitting in front of computer" with "man" linked to the computer, not to the person sitting
. [Grounding Visual Explanations, Lisa Anne Hendricks, Ronghang Hu, Trevor Darrell, Zeynep Akata, https://arxiv.org/abs/1807.09685]
. captioning pipeline with a criterion favoring words (object subparts) that are both discriminant (for the object class) and relevant (for the input image)
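Since Grad-CAM comes up repeatedly in this part, here is a minimal sketch of it in PyTorch, assuming a plain torchvision ResNet-50 classifier and an already preprocessed input tensor x of shape (1, 3, H, W); this is a generic illustration, not the exact pipeline of the captioning papers above.

```python
# Minimal Grad-CAM sketch (assumed setup: torchvision >= 0.13 ResNet-50 and a
# preprocessed input tensor x; illustrative only).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

feats, grads = {}, {}
def save_activations(module, inputs, output):
    feats["a"] = output                                # feature maps of the last conv block
    output.register_hook(lambda g: grads.update(a=g))  # their gradient, captured at backward time
model.layer4.register_forward_hook(save_activations)

def grad_cam(x, class_idx=None):
    """Heatmap in [0, 1] of shape (H, W): where the network looked for class_idx."""
    scores = model(x)                                  # (1, n_classes)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()
    model.zero_grad()
    scores[0, class_idx].backward()                    # d(score) / d(feature maps)
    w = grads["a"].mean(dim=(2, 3), keepdim=True)      # global-average-pooled gradients
    cam = F.relu((w * feats["a"]).sum(dim=1))          # weighted sum of maps, then ReLU
    cam = F.interpolate(cam[None], size=x.shape[2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return ((cam - cam.min()) / (cam.max() - cam.min() + 1e-8)).detach()
```

The resulting heatmap can be overlaid on the input image to see which regions drove the (possibly mistaken) prediction, as in the captioning example above.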
Interpretability of data: causality
-----------------------------------

A growing field of machine learning:
. given a set of random variables, determine which ones depend on which ones (oriented dependency graph)
. NB: causality is not correlation
. e.g., sometimes A and B are correlated because they are both caused by another variable C
. [Bernhard Schoelkopf's team], [Isabelle Guyon's team]

===================================
Part B - Issues related to datasets
===================================

Dataset poisoning
-----------------

It is possible to forge a dataset:
- in each image, add some invisible noise (e.g., the color of one particular pixel) extremely correlated with the label to predict
- machine learning algorithms trained on that dataset will learn that obvious dependency (invisible noise / label) and nothing else
- anyone who trains on it will not be able to generalize

Variations:
- not pixel noise, but other objects: for instance, in a classification task including the category 'cat', build a dataset where all cat pictures also include chairs, so that the algorithm actually learns to detect chairs, and not necessarily cats
- don't explicitly build such a dataset, but put such pictures on the web, well indexed by search engines, so that automatic dataset builders include them

Fairness
========

Problems at stake (societal impact):
. 4 definitions
. which are not compatible with each other
. and decrease accuracy
. examples of algorithms

Intro
-----

NB: unfairness might be more subtle than expected.
E.g., word2vec trained on Google News:
. provides a "linear" embedding of words, such that f(Paris) - f(London) = f(France) - f(UK); for instance, f(man) - f(woman) = f(king) - f(queen), etc.
. [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai, NIPS 2016, https://arxiv.org/abs/1607.06520]
. but f(man) - f(woman) = f(computer programmer) - f(homemaker)
                        = f(surgeon) - f(nurse)...
--> de-bias...

Definition 1: fairness by (un)awareness
---------------------------------------

Simplistic version: unawareness
--> do not include the sensitive feature in the data
--> matches the notion of "disparate treatment"
--> not sufficient: the algorithm can use proxies (e.g., hair length for gender, address for ethnicity...)

[Fairness through awareness, Cynthia Dwork et al., 2012]
--> relaxed notion: for each individual x, the prediction f(x) is stochastic, with distribution D(x)
--> quantify unfairness: for any pair of samples x, x': d_D( D(x), D(x') ) <= d_X(x, x'),
    i.e. the distance between the output distributions is bounded by the distance between the inputs
--> doesn't rely on predefined groups
--> issues: which metrics d_D and d_X?

Definition 2: Equal opportunity / epsilon-fairness
--------------------------------------------------

[Equality of Opportunity in Supervised Learning, Moritz Hardt, Eric Price, Nathan Srebro, NIPS 2016, https://arxiv.org/abs/1610.02413]
. input (X, A) with A = sensitive attribute (gender, ethnicity...)
. binary outcome variable Y, where Y=1 is "success" (e.g., being hired)
. equal opportunity: P(^Y=1 | A=a, Y=1) = P(^Y=1 | A=a', Y=1) for all a, a',
  i.e. P(predicted success | A=a, truth=success) = P(predicted success | A=a', truth=success) for all groups a, a',
  where "predicted success" means ^Y = 1,
  i.e. the same chance of being predicted to succeed when one should succeed
. epsilon-fairness: the same, but approximately: | P(^Y=1 | A=a, Y=1) - P(^Y=1 | A=a', Y=1) | < epsilon (see the sketch below)

[Empirical Risk Minimization under Fairness Constraints, Michele Donini, Luca Oneto, Shai Ben-David, John Shawe-Taylor, Massimiliano Pontil, NIPS 2018, https://arxiv.org/abs/1802.08626] (variation on the loss)
[Decoupled classifiers for group-fair and efficient machine learning, Cynthia Dwork, Nicole Immorlica, Adam Tauman Kalai, Max Leiserson, 1st Conference on Fairness, Accountability and Transparency 2018, https://arxiv.org/abs/1707.06613]
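The equal-opportunity / epsilon-fairness gap of Definition 2 can be estimated directly from a model's predictions; a minimal numpy sketch (the array names y_true, y_pred and group are illustrative, not from the papers):

```python
# Minimal sketch: estimate the equal-opportunity gap from binary predictions.
import numpy as np

def equal_opportunity_gap(y_true, y_pred, group):
    """Max difference of P(^Y=1 | A=a, Y=1) across groups (true positive rates)."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = []
    for a in np.unique(group):
        mask = (group == a) & (y_true == 1)   # members of group a who should succeed
        tprs.append(y_pred[mask].mean())      # P(^Y=1 | A=a, Y=1)
    return max(tprs) - min(tprs)

# The classifier is epsilon-fair (for equal opportunity) if the gap is below epsilon.
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 0, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(equal_opportunity_gap(y_true, y_pred, group))  # here: |3/4 - 1/2| = 0.25
```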
Definition 3: same distribution (of outputs / of errors): group-based
---------------------------------------------------------------------

Principle: the probability of outcome (or success) should not depend (or not much) on the sensitive attribute.

Commercial face classification software, tested on a grid of ages/genders/etc. (check the performance on each subset: young white males, adult Asian women, etc.):
[Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, Joy Buolamwini, Timnit Gebru, 1st Conference on FAT, 2018]
--> accuracy is not homogeneous at all: many more gender classification mistakes for dark-skinned females (show image)
--> challenge to perform well on all sub-categories [https://sites.google.com/site/eccvbefa2018/home/organizers]
--> gets better results than industry

Group fairness:
. Impact disparity: outputs conditioned on a subgroup (e.g., gender) have different probabilities [what we want to avoid]
. Treatment disparity: explicitly treat subgroups differently, to obtain impact parity [a possible way to solve the problem; whether it is good or not is debated]
. Warning! Secondary effects might happen. Example: try to achieve both impact & treatment parity for girls/boys admission at university. If "gender" is removed from the data but "hair length" is still there, then since "hair length" is a (bad) proxy for "gender", short-haired women will be rejected and long-haired men will be accepted.
[Does mitigating ML's impact disparity require treatment disparity? Zachary C. Lipton, Alexandra Chouldechova, Julian McAuley, NIPS 2018, https://arxiv.org/abs/1711.07076]

3 cases (for a predictor C, sensitive attribute A, true label Y):
- independence: C independent of A,
  i.e. P(C=c|A=a) = P(C=c|A=a') : the outcome probability is independent of the group/sensitive info
- separation: C independent of A given Y (the true label),
  i.e. P(C=c|A=a,Y=y) = P(C=c|A=a',Y=y) : A doesn't influence the distribution of predictions given the truth ("equalized odds")
- sufficiency: Y independent of A given C,
  i.e. P(Y=y|A=a,C=c) = P(Y=y|A=a',C=c) : A doesn't influence the link C --> Y

Variations: not a strict equality, but a ratio < 1 + epsilon, or |difference| < epsilon.

NB: these group-based definitions are incompatible (if A and Y are correlated, you can't have any 2 of these independences at once).

Definition 4: Causality (counterfactual fairness)
-------------------------------------------------

[Counterfactual Fairness, Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva, NIPS 2017, https://arxiv.org/abs/1703.06856]
- suppose we know the causality graph between attributes (e.g., variable A causes variable B, etc.)
- the sensitive attributes should not influence the outcome
- to check: replace the sensitive attributes with various values: does it change the outcome probabilities of the algorithm? --> causality testing (a naive sketch of this check follows this list)
- formulation: P[ C_{A <- 0} = c | X, A=a ] = P[ C_{A <- 1} = c | X, A=a ],
  where C_{A <- 0} means "the prediction when the sensitive attribute A is replaced with the particular value 0"
- issues: how to do this in practice? and which causality graph?
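The check described in Definition 4 can be scripted naively by intervening on the sensitive attribute alone. A minimal sketch, assuming a scikit-learn-style binary classifier clf with predict_proba and a feature matrix X whose column a_col holds A (all names are illustrative); note that a faithful counterfactual in the sense of Kusner et al. would also propagate the intervention to the descendants of A in the causal graph, which this sketch ignores:

```python
# Naive counterfactual check: set A to each value for everyone and compare the
# model's outcome probabilities. This intervenes on A only; it does not update
# the descendants of A in the causal graph as the paper's definition requires.
import numpy as np

def counterfactual_gap(clf, X, a_col, a_values=(0, 1)):
    """Mean absolute change in P(C=1 | X) when A is set to each of the two values."""
    X = np.asarray(X, dtype=float)
    probs = []
    for v in a_values:
        X_do = X.copy()
        X_do[:, a_col] = v                       # intervention: A <- v for every sample
        probs.append(clf.predict_proba(X_do)[:, 1])
    return np.mean(np.abs(probs[0] - probs[1]))  # ~0 suggests (naive) counterfactual fairness
```

A gap close to 0 is necessary but not sufficient: the model may still rely on proxies of A, which is exactly why the causal graph matters.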
Algorithms
----------

- depend on the fairness definition (of course)
- in general: enforcing fairness will decrease accuracy ==> fairness/accuracy trade-off

Type 1: pre-process the data, to remove the sensitive information
Type 2: enforce fairness while optimizing
Type 3: post-processing: change thresholds/biases - works well but requires the sensitive information at test time

Example of type 2 with an adversarial approach: biases in face datasets (age, gender, ethnicity)
- one can remove a bias when learning a network, with an adversary trying to classify the sensitive attribute from an intermediate representation
  [Turning a Blind Eye: Explicit Removal of Biases and Variation from Deep Neural Network Embeddings, Mohsan Alvi, Andrew Zisserman and Christoffer Nellaker, ECCV 2018, https://arxiv.org/abs/1809.02169]
- or enforce (soft, relaxed) constraints explicitly

Example of type 1: ~idem, using Information Bottleneck concepts: map (x, a) --> z (later used for classification), such that I(x, z) is maximized while I(a, z) is minimized.

Differential privacy
====================

[NB: in French: privacy = confidentialité]

Why care about privacy? Isn't anonymization sufficient?

Netflix prize, 2007:
. offered 1 million dollars to anyone able to improve their recommendation system performance by 10%
. provided an anonymized dataset of users with their movie preferences (i.e., user names replaced)
. Arvind Narayanan and Vitaly Shmatikov managed to re-identify part of the users, using IMDb (where users rate the movies they've watched)
. standard process for de-anonymizing datasets: combine with other dataset(s); even if each of them is mostly uninformative, taken together the information can be retrieved
. other example: anonymized electricity consumer dataset (including approximate location) + white pages + ... ==> re-identification

Why care if no dataset is shared? If you (e.g., Google) train an algorithm on your client database (containing private data) and provide the trained algorithm to all clients as a service, it might be possible to extract private data from it.

Queries on a database:

2003: Kobbi Nissim and Irit Dinur: arbitrary queries on a private statistical database necessarily reveal some amount of private information; the entire information content of the database can be retrieved with a surprisingly small number of random queries.

2006: [Calibrating noise to sensitivity in private data analysis, Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith, Conference on Theory of Cryptography, 2006]
Formalization of the amount of noise that needs to be added to query answers to preserve privacy, i.e. to make a dataset indistinguishable from the same dataset + one more element: epsilon-differential privacy.

Notations:
. algorithm: A
. dataset: D_1
. dataset D_1 + one element: D_2

Definition: algorithm A is (epsilon, delta)-private iff for all subsets S of Im A, for all datasets D_1 and D_2 differing by one element only,
    P[ A(D_1) in S ] <= e^epsilon P[ A(D_2) in S ] + delta
i.e. the probabilities are very close (interesting for small epsilon and delta).
Variant: epsilon-privacy: idem with delta = 0.

How to ensure epsilon-privacy?
. add noise to query answers (e.g., the Laplace mechanism; see the sketch below)
. provably (and quantifiably), the fewer individuals involved in a query, the more noise is needed
==> Gödel Prize in 2017
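For a numeric query such as a count, the standard way of adding that noise is the Laplace mechanism: perturb the answer with Laplace noise of scale sensitivity/epsilon. A minimal sketch for a counting query (sensitivity 1), purely illustrative and without any of the composition bookkeeping a real system needs:

```python
# Minimal sketch of the Laplace mechanism for an epsilon-differentially-private
# counting query. Adding or removing one person changes a count by at most 1,
# so the sensitivity is 1 and the noise scale is 1/epsilon.
import numpy as np

rng = np.random.default_rng()

def private_count(predicate, records, epsilon):
    """Return a noisy count of records satisfying `predicate` (epsilon-DP)."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# The smaller epsilon (stronger privacy), the larger the typical noise, which is
# why queries involving only a few individuals become useless under strong privacy.
records = [{"age": a} for a in (23, 35, 41, 29, 62, 57)]
print(private_count(lambda r: r["age"] > 40, records, epsilon=0.5))
```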
To go further:
[Differential Privacy, Cynthia Dwork]
[The Algorithmic Foundations of Differential Privacy, Cynthia Dwork and Aaron Roth]

Example of an advanced ML pipeline taking privacy into account:
[Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, Nicolas Papernot, Martin Abadi, Ulfar Erlingsson, Ian Goodfellow, Kunal Talwar, ICLR 2017, https://arxiv.org/abs/1610.05755]
[show image]

Key points:
. train several classifiers (the "teachers") on disjoint subsets of the private data [the more classifiers, the more private the result will be, but not too many, otherwise only little data is left to train each classifier]
. aggregate them as an ensemble, with noise added to the votes [crucial for the (epsilon, delta)-privacy proof] (a sketch of this noisy aggregation follows this list)
. label (a small part of) a publicly available dataset using that noisy ensemble
. train another network (the "student") to imitate the teacher ensemble on that public dataset, in a semi-supervised manner
. share the student network ==> it has never seen any private data!
. the proofs rely on the number of requests made to the ensemble: so train the student with as few labeled examples as possible (which gives the privacy bound), hence the semi-supervised learning; once the student is shared, this number of requests no longer grows, as new requests are addressed to the student, not to the private teachers, hence intensive usage is possible without any further privacy loss
. results: only a small accuracy drop (show results)
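The noisy aggregation step at the heart of this pipeline is simple to sketch: each teacher votes for a label, Laplace noise is added to the vote counts, and the noisy argmax is returned to the student. A minimal sketch in the spirit of the paper (the `teachers` list and `predict` interface are assumptions, and the paper's privacy accounting is not reproduced):

```python
# Minimal sketch of noisy teacher-vote aggregation, in the spirit of
# "Semi-supervised Knowledge Transfer..." (Papernot et al., ICLR 2017).
# `teachers` is assumed to be a list of already trained classifiers with a
# scikit-learn-style predict(); privacy accounting is omitted.
import numpy as np

rng = np.random.default_rng()

def noisy_aggregate_label(teachers, x, n_classes, gamma=0.05):
    """Return the noisy-argmax label of the teacher ensemble for one query x."""
    votes = np.zeros(n_classes)
    for t in teachers:
        votes[int(t.predict(x.reshape(1, -1))[0])] += 1       # each teacher votes once
    votes += rng.laplace(scale=1.0 / gamma, size=n_classes)   # noise on the vote counts
    return int(np.argmax(votes))

# The student is trained only on (public_x, noisy_aggregate_label(...)) pairs,
# and only for a limited number of such queries, which bounds the privacy cost.
```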