Counterfactual Fairness


Authors: Matt J. Kusner, Joshua R. Loftus, Chris Russell

Matt Kusner* (The Alan Turing Institute and University of Warwick, mkusner@turing.ac.uk), Joshua Loftus* (New York University, loftus@nyu.edu), Chris Russell* (The Alan Turing Institute and University of Surrey, crussell@turing.ac.uk), Ricardo Silva (The Alan Turing Institute and University College London, ricardo@stats.ucl.ac.uk)

Abstract

Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices. In this paper, we develop a framework for modeling fairness using tools from causal inference. Our definition of counterfactual fairness captures the intuition that a decision is fair towards an individual if it is the same in (a) the actual world and (b) a counterfactual world where the individual belonged to a different demographic group. We demonstrate our framework on a real-world problem of fair prediction of success in law school.

1 Contribution

Machine learning has spread to fields as diverse as credit scoring [20], crime prediction [5], and loan assessment [25]. Decisions in these areas may have ethical or legal implications, so it is necessary for the modeler to think beyond the objective of maximizing prediction accuracy and consider the societal impact of their work. For many of these applications, it is crucial to ask if the predictions of a model are fair. Training data can contain unfairness for reasons having to do with historical prejudices or other factors outside an individual's control.
In 2016, the Obama administration released a report² which urged data scientists to analyze "how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination." There has been much recent interest in designing algorithms that make fair predictions [4, 6, 10, 12, 14, 16-19, 22, 24, 36-39]. In large part, the literature has focused on formalizing fairness into quantitative definitions and using them to solve a discrimination problem in a certain dataset. Unfortunately, for a practitioner, law-maker, judge, or anyone else who is interested in implementing algorithms that control for discrimination, it can be difficult to decide which definition of fairness to choose for the task at hand. Indeed, we demonstrate that depending on the relationship between a protected attribute and the data, certain definitions of fairness can actually increase discrimination.

* Equal contribution. This work was done while JL was a Research Fellow at the Alan Turing Institute.
² https://obamawhitehouse.archives.gov/blog/2016/05/04/big-risks-big-opportunities-intersection-big-data-and-civil-rights

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

In this paper, we introduce the first explicitly causal approach to address fairness. Specifically, we leverage the causal framework of Pearl [30] to model the relationship between protected attributes and data. We describe how techniques from causal inference can be effective tools for designing fair algorithms and argue, as in DeDeo [9], that it is essential to properly address causality in fairness. In perhaps the most closely related prior work, Johnson et al. [15] make similar arguments but from a non-causal perspective. An alternative use of causal modeling in the context of fairness is introduced independently by [21]. In Section 2, we provide a summary of basic concepts in fairness and causal modeling.
In Section 3, we provide the formal definition of counterfactual fairness, which enforces that a distribution over possible predictions for an individual should remain unchanged in a world where the individual's protected attributes had been different in a causal sense. In Section 4, we describe an algorithm to implement this definition, while distinguishing it from existing approaches. In Section 5, we illustrate the algorithm with a case of fair assessment of law school success.

2 Background

This section provides a basic account of two separate areas of research in machine learning, which are formally unified in this paper. We suggest Berk et al. [1] and Pearl et al. [29] as references.

Throughout this paper, we will use the following notation. Let A denote the set of protected attributes of an individual: variables that must not be discriminated against, in a formal sense defined differently by each notion of fairness discussed. The decision of whether an attribute is protected or not is taken as a primitive in any given problem, regardless of the definition of fairness adopted. Moreover, let X denote the other observable attributes of any particular individual, U the set of relevant latent attributes which are not observed, and let Y denote the outcome to be predicted, which itself might be contaminated with historical biases. Finally, Ŷ is the predictor, a random variable that depends on A, X and U, and which is produced by a machine learning algorithm as a prediction of Y.

2.1 Fairness

There has been much recent work on fair algorithms. These include fairness through unawareness [12], individual fairness [10, 16, 24, 38], demographic parity/disparate impact [36], and equality of opportunity [14, 37]. For simplicity we often assume A is encoded as a binary attribute, but this can be generalized.

Definition 1 (Fairness Through Unawareness (FTU)).
An algorithm is fair so long as any protected attributes A are not explicitly used in the decision-making process. Any mapping Ŷ : X → Y that excludes A satisfies this.

Initially proposed as a baseline, the approach has found favor recently with more general approaches such as Grgic-Hlaca et al. [12]. Despite its compelling simplicity, FTU has a clear shortcoming: elements of X can contain discriminatory information analogous to A that may not be obvious at first. The need for expert knowledge in assessing the relationship between A and X was highlighted in the work on individual fairness:

Definition 2 (Individual Fairness (IF)). An algorithm is fair if it gives similar predictions to similar individuals. Formally, given a metric d(·,·), if individuals i and j are similar under this metric (i.e., d(i, j) is small) then their predictions should be similar: Ŷ(X^(i), A^(i)) ≈ Ŷ(X^(j), A^(j)).

As described in [10], the metric d(·,·) must be carefully chosen, requiring an understanding of the domain at hand beyond black-box statistical modeling. This can also be contrasted against population-level criteria such as:

Definition 3 (Demographic Parity (DP)). A predictor Ŷ satisfies demographic parity if P(Ŷ | A = 0) = P(Ŷ | A = 1).

Definition 4 (Equality of Opportunity (EO)). A predictor Ŷ satisfies equality of opportunity if P(Ŷ = 1 | A = 0, Y = 1) = P(Ŷ = 1 | A = 1, Y = 1).

These criteria can be incompatible in general, as discussed in [1, 7, 22]. Following the motivation of IF and [15], we propose that knowledge about relationships between all attributes should be taken into consideration, even if strong assumptions are necessary. Moreover, it is not immediately clear for any of these approaches in which ways historical biases can be tackled. We approach such issues from an explicit causal modeling perspective.
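The observational criteria above can be checked directly on data. The following sketch, on invented toy data, also illustrates FTU's shortcoming: a predictor that never touches A can still violate demographic parity when some feature X is a proxy for A.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy data (all distributions invented for illustration): protected
# attribute A, a feature X correlated with A, outcome Y, and an FTU
# predictor Yhat that uses only X, never A.
A = rng.integers(0, 2, n)
X = A + rng.normal(0.0, 1.0, n)            # X carries information about A
Y = (X + rng.normal(0.0, 1.0, n) > 0.5).astype(int)
Yhat = (X > 0.5).astype(int)               # FTU: a function of X alone

# Demographic parity gap: |P(Yhat=1 | A=0) - P(Yhat=1 | A=1)|
dp_gap = abs(Yhat[A == 0].mean() - Yhat[A == 1].mean())

# Equality of opportunity gap: |P(Yhat=1 | A=a, Y=1)| across a in {0, 1}
eo_gap = abs(Yhat[(A == 0) & (Y == 1)].mean()
             - Yhat[(A == 1) & (Y == 1)].mean())

print(f"DP gap: {dp_gap:.3f}, EO gap: {eo_gap:.3f}")
```

Despite excluding A, the predictor shows a large demographic parity gap, because X is a proxy for A.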
2.2 Causal Models and Counterfactuals

We follow Pearl [28], and define a causal model as a triple (U, V, F) of sets such that:

• U is a set of latent background variables, which are factors not caused by any variable in the set V of observable variables;
• F is a set of functions {f_1, ..., f_n}, one for each V_i ∈ V, such that V_i = f_i(pa_i, U_{pa_i}), with pa_i ⊆ V \ {V_i} and U_{pa_i} ⊆ U. Such equations are also known as structural equations [2].

The notation "pa_i" refers to the "parents" of V_i and is motivated by the assumption that the model factorizes as a directed graph, here assumed to be a directed acyclic graph (DAG). The model is causal in that, given a distribution P(U) over the background variables U, we can derive the distribution of a subset Z ⊆ V following an intervention on V \ Z. An intervention on variable V_i is the substitution of equation V_i = f_i(pa_i, U_{pa_i}) with the equation V_i = v for some v. This captures the idea of an agent, external to the system, modifying it by forcefully assigning value v to V_i, for example as in a randomized experiment.

The specification of F is a strong assumption but allows for the calculation of counterfactual quantities. In brief, consider the following counterfactual statement, "the value of Y if Z had taken value z", for two observable variables Z and Y. By assumption, the state of any observable variable is fully determined by the background variables and structural equations. The counterfactual is modeled as the solution for Y for a given U = u where the equations for Z are replaced with Z = z. We denote it by Y_{Z←z}(u) [28], and sometimes as Y_z if the context of the notation is clear.

Counterfactual inference, as specified by a causal model (U, V, F) given evidence W, is the computation of probabilities P(Y_{Z←z}(U) | W = w), where W, Z and Y are subsets of V.
Inference proceeds in three steps, as explained in more detail in Chapter 4 of Pearl et al. [29]:

1. Abduction: for a given prior on U, compute the posterior distribution of U given the evidence W = w;
2. Action: substitute the equations for Z with the interventional values z, resulting in the modified set of equations F_z;
3. Prediction: compute the implied distribution on the remaining elements of V using F_z and the posterior P(U | W = w).

3 Counterfactual Fairness

Given a predictive problem with fairness considerations, where A, X and Y represent the protected attributes, remaining attributes, and output of interest respectively, let us assume that we are given a causal model (U, V, F), where V ≡ A ∪ X. We postulate the following criterion for predictors of Y.

Definition 5 (Counterfactual fairness). Predictor Ŷ is counterfactually fair if under any context X = x and A = a,

P(Ŷ_{A←a}(U) = y | X = x, A = a) = P(Ŷ_{A←a'}(U) = y | X = x, A = a),   (1)

for all y and for any value a' attainable by A.

This notion is closely related to actual causes [13], or token causality, in the sense that, to be fair, A should not be a cause of Ŷ in any individual instance. In other words, changing A while holding things which are not causally dependent on A constant will not change the distribution of Ŷ. We also emphasize that counterfactual fairness is an individual-level definition. This is substantially different from comparing different individuals that happen to share the same "treatment" A = a and coincide on the values of X, as discussed in Section 4.3.1 of [29] and the Supplementary Material. Differences between X_a and X_{a'} must be caused by variations on A only. Notice also that this definition is agnostic with respect to how good a predictor Ŷ is, which we discuss in Section 4.

Relation to individual fairness.
IF is agnostic with respect to its notion of similarity metric, which is both a strength (generality) and a weakness (no unified way of defining similarity). Counterfactuals and similarities are related, as in the classical notion of distances between "worlds" corresponding to different counterfactuals [23]. If Ŷ is a deterministic function of W ⊂ A ∪ X ∪ U, as in several of our examples to follow, then IF can be defined by treating equally two individuals with the same W in a way that is also counterfactually fair.

Figure 1: (a), (b) Two causal models for different real-world fair prediction scenarios. See Section 3.1 for discussion. (c) The graph corresponding to a causal model with A being the protected attribute and Y some outcome of interest, with background variables assumed to be independent. (d) Expanding the model to include an intermediate variable indicating whether the individual is employed, with two (latent) background variables Prejudiced (if the person offering the job is prejudiced) and Qualifications (a measure of the individual's qualifications). (e) A twin network representation of this system [28] under two different counterfactual levels for A. This is created by copying nodes descending from A, which inherit unaffected parents from the factual world.

Relation to Pearl et al. [29]. In Example 4.4.4 of [29], the authors condition instead on X, A, and the observed realization of Ŷ, and calculate the probability of the counterfactual realization Ŷ_{A←a'} differing from the factual. This example conflates the predictor Ŷ with the outcome Y, of which we remain agnostic in our definition but which is used in the construction of Ŷ as in Section 4. Our framing makes the connection to machine learning more explicit.
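The abduction-action-prediction recipe and Definition 5 can be made concrete on a small deterministic linear model. Everything below (the structural equations, coefficients, and the two candidate predictors) is invented for illustration, not taken from the paper's experiments.

```python
# Hypothetical linear causal model (coefficients invented for this sketch):
#   U : background variable (not caused by anything in the model)
#   A : protected attribute
#   X = 2*A + U          (X is a descendant of A)
#   Y = 3*U + noise      (Y is not caused by A)

def abduct_u(a, x):
    """Step 1 (abduction): the model is deterministic given (A, X), so U = X - 2A."""
    return x - 2 * a

def counterfactual_x(a_cf, u):
    """Steps 2-3 (action + prediction): set A <- a_cf, recompute X with U held fixed."""
    return 2 * a_cf + u

# One individual observed with A = 1, X = 1.5:
a, x = 1, 1.5
u = abduct_u(a, x)              # u = -0.5
x_cf = counterfactual_x(0, u)   # X in the counterfactual world A <- 0

# Two candidate predictors, checked against Definition 5:
def yhat_unfair(a_, x_): return 0.5 * x_               # uses a descendant of A
def yhat_fair(a_, x_):   return 3 * abduct_u(a_, x_)   # a function of U only

gap_unfair = abs(yhat_unfair(a, x) - yhat_unfair(0, x_cf))
gap_fair = abs(yhat_fair(a, x) - yhat_fair(0, x_cf))
print(f"counterfactual gap: unfair={gap_unfair}, fair={gap_fair}")
```

The predictor built on U is unchanged between the factual and counterfactual worlds, while the one built on the descendant X is not, which is exactly the distinction Definition 5 draws.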
3.1 Examples

To provide an intuition for counterfactual fairness, we will consider two real-world fair prediction scenarios: insurance pricing and crime prediction. Each of these corresponds to one of the two causal graphs in Figure 1(a),(b). The Supplementary Material provides a more mathematical discussion of these examples with more detailed insights.

Scenario 1: The Red Car. A car insurance company wishes to price insurance for car owners by predicting their accident rate Y. They assume there is an unobserved factor corresponding to aggressive driving U that (a) causes drivers to be more likely to have an accident, and (b) causes individuals to prefer red cars (the observed variable X). Moreover, individuals belonging to a certain race A are more likely to drive red cars. However, these individuals are no more likely to be aggressive or to get in accidents than anyone else. We show this in Figure 1(a). Thus, using the red car feature X to predict accident rate Y would seem to be an unfair prediction, because it may charge individuals of a certain race more than others even though no race is more likely to have an accident. Counterfactual fairness agrees with this notion: changing A while holding U fixed will also change X and, consequently, Ŷ. Interestingly, we can show (Supplementary Material) that in a linear model, regressing Y on A and X is equivalent to regressing on U, so off-the-shelf regression here is counterfactually fair. Regressing Y on X alone obeys the FTU criterion but is not counterfactually fair, so omitting A (FTU) may introduce unfairness into an otherwise fair world.

Scenario 2: High Crime Regions. A city government wants to estimate crime rates by neighborhood to allocate policing resources.
Its analyst constructed training data by merging (1) a registry of residents containing their neighborhood X and race A with (2) police records of arrests, giving each resident a binary label with Y = 1 indicating a criminal arrest record. Due to historically segregated housing, the location X depends on A. Locations X with more police resources have larger numbers of arrests Y. And finally, U represents the totality of socioeconomic factors and policing practices that both influence where an individual may live and how likely they are to be arrested and charged. This can all be seen in Figure 1(b).

In this example, higher observed arrest rates in some neighborhoods are due to greater policing there, not because people of different races are any more or less likely to break the law. The label Y = 0 does not mean someone has never committed a crime, but rather that they have not been caught. If individuals in the training data have not already had equal opportunity, algorithms enforcing EO will not remedy such unfairness. In contrast, a counterfactually fair approach would model differential enforcement rates using U and base predictions on this information rather than on X directly. In general, we need a multistage procedure in which we first derive latent variables U, and then based on them we minimize some loss with respect to Y. This is the core of the algorithm discussed next.

3.2 Implications

One simple but important implication of the definition of counterfactual fairness is the following:

Lemma 1. Let G be the causal graph of the given model (U, V, F). Then Ŷ will be counterfactually fair if it is a function of the non-descendants of A.

Proof. Let W be any non-descendant of A in G. Then W_{A←a}(U) and W_{A←a'}(U) have the same distribution by the three inferential steps in Section 2.2.
Hence, the distribution of any function Ŷ of the non-descendants of A is invariant with respect to the counterfactual values of A.

This does not exclude using a descendant W of A as a possible input to Ŷ. However, this will only be possible in the case where the overall dependence of Ŷ on A disappears, which will not happen in general. Hence, Lemma 1 provides the most straightforward way to achieve counterfactual fairness. In some scenarios, it is desirable to define path-specific variations of counterfactual fairness that allow for the inclusion of some descendants of A, as discussed by [21, 27] and the Supplementary Material.

Ancestral closure of protected attributes. Suppose that a parent of a member of A is not in A. Counterfactual fairness allows for its use in the definition of Ŷ. If this seems counterintuitive, then we argue that the fault lies with the postulated set of protected attributes rather than with the definition of counterfactual fairness, and that typically we should expect the set A to be closed under the ancestral relationships given by the causal graph. For instance, if Race is a protected attribute and Mother's race is a parent of Race, then it should also be in A.

Dealing with historical biases and an existing fairness paradox. The explicit difference between Ŷ and Y allows us to tackle historical biases. For instance, let Y be an indicator of whether a client defaults on a loan, while Ŷ is the actual decision of giving the loan. Consider the DAG A → Y, shown in Figure 1(c) with the explicit inclusion of the set U of independent background variables. Y is the objectively ideal measure for decision making, the binary indicator of the event that the individual defaults on a loan.
If A is postulated to be a protected attribute, then the predictor Ŷ = Y = f_Y(A, U) is not counterfactually fair, with the arrow A → Y being (for instance) the result of a world that punishes individuals in a way that is out of their control. Figure 1(d) shows a finer-grained model, where the path is mediated by a measure of whether the person is employed, which is itself caused by two background factors: one representing whether the person hiring is prejudiced, and the other the employee's qualifications. In this world, A is a cause of defaulting, even if mediated by other variables³. The counterfactual fairness principle, however, forbids us from using Y: using the twin network⁴ of Pearl [28], we see in Figure 1(e) that Y_a and Y_{a'} need not be identically distributed given the background variables.

In contrast, any function of variables that are not descendants of A can be used as a basis for fair decision making. This means that any variable Ŷ defined by Ŷ = g(U) will be counterfactually fair for any function g(·). Hence, given a causal model, the function g(·) minimizing some predictive error for Y will satisfy the criterion, as proposed in Section 4.1. We are essentially learning a projection of Y into the space of fair decisions, removing historical biases as a by-product.

Counterfactual fairness also provides an answer to some problems on the incompatibility of fairness criteria. In particular, consider the following problem raised independently by different authors (e.g.,

³ For example, if the function determining employment is f_E(A, P, Q) ≡ I(Q > 0, P = 0 or A ≠ a), then an individual with sufficient qualifications and a prejudiced potential employer may have a different counterfactual employment value for A = a compared to A = a', and a different chance of default.
⁴ In a nutshell, this is a graph that simultaneously depicts "multiple worlds" parallel to the factual realizations.
In this graph, all multiple worlds share the same background variables, but with different consequences in the remaining variables depending on which counterfactual assignments are provided.

[7, 22]), illustrated below for the binary case: ideally, we would like our predictors to obey both equality of opportunity and the predictive parity criterion, defined by satisfying P(Y = 1 | Ŷ = 1, A = 1) = P(Y = 1 | Ŷ = 1, A = 0), as well as the corresponding equation for Ŷ = 0. It has been shown that if Y and A are marginally associated (e.g., recidivism and race are associated) and Y is not a deterministic function of Ŷ, then the two criteria cannot be reconciled. Counterfactual fairness throws light on this scenario, suggesting that both EO and predictive parity may be insufficient if Y and A are associated: assuming that A and Y are unconfounded (as expected for demographic attributes), this is the result of A being a cause of Y. By counterfactual fairness, we should not want to use Y as a basis for our decisions, instead aiming at some function Y_{⊥A} of variables which are not caused by A but are predictive of Y. Ŷ is defined in such a way that it is an estimate of the "closest" Y_{⊥A} to Y according to some preferred risk function. This makes the incompatibility between EO and predictive parity irrelevant, as A and Y_{⊥A} will be independent by construction given the model assumptions.

4 Implementing Counterfactual Fairness

As discussed in the previous section, we need to relate Ŷ to Y if the predictor is to be useful, and we restrict Ŷ to be a (parameterized) function of the non-descendants of A in the causal graph, following Lemma 1. We next introduce an algorithm, then discuss assumptions that can be used to express counterfactuals.

4.1 Algorithm

Let Ŷ ≡ g_θ(U, X_{⊁A}) be a predictor parameterized by θ, such as a logistic regression or a neural network, where X_{⊁A} ⊆ X are the non-descendants of A.
Given a loss function ℓ(·,·) such as squared loss or log-likelihood, and training data D ≡ {(A^(i), X^(i), Y^(i))} for i = 1, 2, ..., n, we define

L(θ) ≡ (1/n) Σ_{i=1}^n E[ℓ(y^(i), g_θ(U^(i), x^(i)_{⊁A})) | x^(i), a^(i)]

as the empirical loss to be minimized with respect to θ. Each expectation is with respect to the random variable U^(i) ~ P_M(U | x^(i), a^(i)), where P_M(U | x, a) is the conditional distribution of the background variables as given by a causal model M that is available by assumption. If this expectation cannot be calculated analytically, Markov chain Monte Carlo (MCMC) can be used to approximate it as in the following algorithm.

1: procedure FairLearning(D, M)    ▷ Learned parameters θ̂
2:    For each data point i ∈ D, sample m MCMC samples U^(i)_1, ..., U^(i)_m ~ P_M(U | x^(i), a^(i)).
3:    Let D' be the augmented dataset where each point (a^(i), x^(i), y^(i)) in D is replaced with the corresponding m points {(a^(i), x^(i), y^(i), u^(i)_j)}.
4:    θ̂ ← argmin_θ Σ_{i'∈D'} ℓ(y^(i'), g_θ(U^(i'), x^(i')_{⊁A})).
5: end procedure

At prediction time, we report Ỹ ≡ E[Ŷ(U*, x*_{⊁A}) | x*, a*] for a new data point (a*, x*).

Deconvolution perspective. The algorithm can be understood as a deconvolution approach that, given observables A ∪ X, extracts its latent sources and pipelines them into a predictive model. We advocate that counterfactual assumptions must underlie all approaches that claim to extract the sources of variation of the data as "fair" latent components. As an example, Louizos et al. [24] start from the DAG A → X ← U to extract P(U | X, A).
As U and A are not independent given X in this representation, a type of penalization is enforced to create a posterior P_fair(U | A, X) that is close to the model posterior P(U | A, X) while satisfying P_fair(U | A = a, X) ≈ P_fair(U | A = a', X). But this is neither necessary nor sufficient for counterfactual fairness. The model for X given A and U must be justified by a causal mechanism and, that being the case, P(U | A, X) requires no postprocessing. As a matter of fact, the model M can be learned by penalizing empirical dependence measures between U and pa_i for a given V_i (e.g., Mooij et al. [26]), but this concerns M and not Ŷ, and is motivated by explicit assumptions about structural equations, as described next.

4.2 Designing the Input Causal Model

The model M must be provided to algorithm FairLearning. Although this is well understood, it is worthwhile remembering that causal models always require strong assumptions, even more so when making counterfactual claims [8]. Counterfactual assumptions such as structural equations are in general unfalsifiable even if interventional data for all variables is available. This is because there are infinitely many structural equations compatible with the same observable distribution [28], be it observational or interventional. Having passed testable implications, the remaining components of a counterfactual model should be understood as conjectures formulated according to the best of our knowledge. Such models should be deemed provisional and prone to modification if, for example, new data containing measurements of previously hidden variables contradict the current model. We point out that we do not need to specify a fully deterministic model, and structural equations can be relaxed as conditional distributions. In particular, the concept of counterfactual fairness holds under three levels of assumptions of increasing strength:

Level 1.
Build Ŷ using only the observable non-descendants of A. This only requires partial causal ordering and no further causal assumptions, but in many problems there will be few, if any, observables which are not descendants of protected demographic factors.

Level 2. Postulate background latent variables that act as non-deterministic causes of observable variables, based on explicit domain knowledge and learning algorithms⁵. Information about X is passed to Ŷ via P(U | x, a).

Level 3. Postulate a fully deterministic model with latent variables. For instance, the distribution P(V_i | pa_i) can be treated as an additive error model, V_i = f_i(pa_i) + e_i [31]. The error term e_i then becomes an input to Ŷ, as calculated from the observed variables. This maximizes the information extracted by the fair predictor Ŷ.

4.3 Further Considerations on Designing the Input Causal Model

One might ask what we can lose by defining causal fairness measures involving only non-counterfactual causal quantities, such as enforcing P(Ŷ = 1 | do(A = a)) = P(Ŷ = 1 | do(A = a')) instead of our counterfactual criterion. The reason is that the above equation is only a constraint on an average effect. Obeying this criterion provides no guarantees against, for example, having half of the individuals strongly "negatively" discriminated and half of the individuals strongly "positively" discriminated. We advocate that, for fairness, society should not be satisfied with pursuing only counterfactual-free guarantees. While one may be willing to claim post hoc that the equation above masks no balancing effect, so that individuals receive approximately the same distribution of outcomes, that itself is just a counterfactual claim in disguise. Our approach is to make counterfactual assumptions explicit.
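The gap between the average (interventional) criterion and the individual-level counterfactual one can be seen in a deliberately extreme toy construction (the predictor below is invented to make the point): the interventional distributions P(Ŷ | do(A = a)) match exactly, yet essentially every individual's prediction changes in the counterfactual world.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Toy model: a single background variable U ~ N(0, 1), and a (deliberately
# bad) predictor that flips the sign of U depending on A.
U = rng.normal(0.0, 1.0, n)
def yhat(a, u):
    return (2 * a - 1) * u   # Yhat = +U if A = 1, -U if A = 0

# The interventional distributions P(Yhat | do(A=a)) are both N(0, 1),
# so any criterion comparing them (e.g. equal means) is satisfied:
mean_do0 = yhat(0, U).mean()
mean_do1 = yhat(1, U).mean()

# ...yet every individual with U != 0 receives a different prediction in
# the counterfactual world: half pushed up, half pushed down, by 2|U|.
indiv_gap = np.abs(yhat(1, U) - yhat(0, U))
frac_affected = (indiv_gap > 1e-12).mean()
print(f"do-means: {mean_do0:.3f} vs {mean_do1:.3f}; "
      f"fraction of individuals affected: {frac_affected:.3f}")
```

The positive and negative individual effects cancel exactly at the population level, which is the "balancing effect" masked by the average criterion.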
When unfairness is judged to follow only some "pathways" in the causal graph (in a sense that can be made formal; see [21, 27]), nonparametric assumptions about the independence of counterfactuals may suffice, as discussed by [27]. In general, however, nonparametric assumptions may not provide identifiable adjustments even in this case, as also discussed in our Supplementary Material. If competing models with different untestable assumptions are available, there are ways of simultaneously enforcing a notion of approximate counterfactual fairness in all of them, as introduced by us in [32]. Other alternatives include exploiting bounds on the contribution of hidden variables [29, 33].

Another issue is the interpretation of causal claims involving demographic variables such as race and sex. Our view is that such constructs are the result of translating complex events into random variables and, despite some controversy, we consider it counterproductive to claim that, e.g., race and sex cannot be causes. An idealized intervention on some A at a particular time can be seen as a notational shortcut to express a conjunction of more specific interventions, which may be individually doable but jointly impossible in practice. It is the plausibility of complex, even if impossible to practically manipulate, causal chains from A to Y that allows us to claim that unfairness is real [11]. Experiments for constructs exist, such as randomizing names in job applications to make them race-blind. They do not contradict the notion of race as a cause, and can be interpreted as an intervention on a particular aspect of the construct "race," such as "race perception" (e.g., Section 4.4.4 of [29]).

⁵ In some domains, it is actually common to build a model entirely around latent constructs with few or no observable parents nor connections among observed variables [2].
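The FairLearning procedure of Section 4.1 can be sketched end to end on a simple synthetic model. Everything here is assumed for illustration: the generative model is a made-up linear-Gaussian SCM (U → X ← A, U → Y), chosen so that the posterior P_M(U | x, a) is conjugate-normal and can be sampled directly in place of MCMC; g_θ is a linear predictor fitted by least squares.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, sigma = 2_000, 20, 0.5

# Hypothetical model M (invented):  U ~ N(0,1),  X = A + U + N(0, sigma^2),
# Y = 2*U + N(0, 0.1).  X is a descendant of A, so the predictor may use
# only U (there is no X_{not-descendant-of-A} here).
U = rng.normal(0.0, 1.0, n)
A = rng.integers(0, 2, n).astype(float)
X = A + U + rng.normal(0.0, sigma, n)
Y = 2 * U + rng.normal(0.0, 0.1, n)

# Step 2 of FairLearning: sample U from P_M(U | x, a).  In this model the
# posterior is conjugate-normal, so we sample it directly instead of MCMC:
#   U | x, a ~ N(post_var * (x - a) / sigma^2,  post_var),
#   post_var = 1 / (1 + 1/sigma^2).
post_var = 1.0 / (1.0 + 1.0 / sigma**2)
post_mean = post_var * (X - A) / sigma**2
U_samples = post_mean[:, None] + np.sqrt(post_var) * rng.normal(size=(n, m))

# Step 3: the augmented dataset D' replicates each (x, y) once per sample.
u_aug = U_samples.ravel()
y_aug = np.repeat(Y, m)

# Step 4: fit theta by least squares on (U, Y) only, never touching A or X.
design = np.column_stack([np.ones_like(u_aug), u_aug])
theta, *_ = np.linalg.lstsq(design, y_aug, rcond=None)
print(f"fitted theta: {theta.round(2)}")
```

Note that the fitted slope is shrunk below the generative coefficient 2: because U is only partially identified from (x, a), the least-squares fit against posterior samples of U is the best predictor of Y given the available evidence, not an attempt to recover the structural equation.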
5 Illustration: Law School Success

We illustrate our approach on a practical problem that requires fairness: the prediction of success in law school. A second problem, understanding the contribution of race to police stops, is described in the Supplementary Material. Following closely the usual framework for assessing causal models in the machine learning literature, the goal of this experiment is to quantify how our algorithm behaves with finite sample sizes while assuming a ground truth compatible with a synthetic model.

Problem definition: Law school success. The Law School Admission Council conducted a survey across 163 law schools in the United States [35]. It contains information on 21,790 law students, such as their entrance exam scores (LSAT), their grade-point average (GPA) collected prior to law school, and their first-year average grade (FYA). Given this data, a school may wish to predict if an applicant will have a high FYA. The school would also like to make sure these predictions are not biased by an individual's race and sex. However, the LSAT, GPA, and FYA scores may be biased due to social factors.

We compare our framework with two unfair baselines:

1. Full: the standard technique of using all features, including sensitive features such as race and sex, to make predictions;
2. Unaware: fairness through unawareness, where we do not use race and sex as features.

For comparison, we generate predictors Ŷ for all models using logistic regression.

Fair prediction. As described in Section 4.2, there are three ways in which we can model a counterfactually fair predictor of FYA. Level 1 uses any features which are not descendants of race and sex for prediction. Level 2 models latent "fair" variables which are parents of observed variables; these variables are independent of both race and sex. Level 3 models the data using an additive error model, and uses the independent error terms to make predictions.
These models make increasingly strong assumptions corresponding to increased predictive power. We split the dataset 80/20 into train/test sets, preserving label balance, to evaluate the models. As we believe LSAT, GPA, and FYA are all biased by race and sex, we cannot use any observed features to construct a counterfactually fair predictor as described in Level 1. In Level 2, we postulate that a latent variable, a student's knowledge (K), affects GPA, LSAT, and FYA scores. The causal graph corresponding to this model is shown in Figure 2 (Level 2). This is shorthand for the distributions:

GPA ∼ N(b_G + w_G^K K + w_G^R R + w_G^S S, σ_G),
LSAT ∼ Poisson(exp(b_L + w_L^K K + w_L^R R + w_L^S S)),
FYA ∼ N(w_F^K K + w_F^R R + w_F^S S, 1),
K ∼ N(0, 1).

We perform inference on this model using the observed training set to estimate the posterior distribution of K. We use the probabilistic programming language Stan [34] to learn K. We call the predictor constructed using K, Fair K.
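As a minimal sketch of the Level 2 generative process, the following simulates data from the distributions above. All coefficient values (`b_G`, `w_G_K`, and so on) are invented for illustration, and race and sex are reduced to single binary indicators; the paper instead infers the posterior over K from real data with Stan.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Hypothetical coefficients; the paper estimates these from the LSAC data.
b_G, w_G_K, w_G_R, w_G_S, sigma_G = 0.0, 0.5, -0.2, 0.1, 1.0
b_L, w_L_K, w_L_R, w_L_S = 1.0, 0.3, -0.1, 0.05
w_F_K, w_F_R, w_F_S = 0.6, -0.3, 0.1

# Protected attributes (binary for simplicity) and the latent knowledge K.
R = rng.integers(0, 2, n)            # race indicator
S = rng.integers(0, 2, n)            # sex indicator
K = rng.normal(0.0, 1.0, n)          # latent 'fair' variable, independent of R and S

# Observed variables, each caused by K and by the protected attributes.
GPA = rng.normal(b_G + w_G_K * K + w_G_R * R + w_G_S * S, sigma_G)
LSAT = rng.poisson(np.exp(b_L + w_L_K * K + w_L_R * R + w_L_S * S))
FYA = rng.normal(w_F_K * K + w_F_R * R + w_F_S * S, 1.0)

# A counterfactually fair predictor (Fair K) may use an estimate of K only.
fair_pred = w_F_K * K
```

Because K is independent of R and S, any function of K alone is unchanged under counterfactual changes to race or sex in this model.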
Figure 2: Left: A causal model for the problem of predicting law school success fairly (Level 2 and Level 3 graphs over Race, Sex, GPA, LSAT, and FYA, with latent K and error terms ε_G, ε_L, ε_F). Right: Density plots of predicted FYA_a and FYA_{a′} for the Full and Unaware models, under the counterfactual swaps female ↔ male, black ↔ white, asian ↔ white, and mexican ↔ white.

In Level 3, we model GPA, LSAT, and FYA as continuous variables with additive error terms independent of race and sex (the error terms may in turn be correlated with one another). This model is shown in Figure 2 (Level 3), and is expressed by:

GPA = b_G + w_G^R R + w_G^S S + ε_G,  ε_G ∼ p(ε_G),
LSAT = b_L + w_L^R R + w_L^S S + ε_L,  ε_L ∼ p(ε_L),
FYA = b_F + w_F^R R + w_F^S S + ε_F,  ε_F ∼ p(ε_F).

Table 1: Prediction results using logistic regression. Note that we must sacrifice a small amount of accuracy to ensure counterfactually fair prediction (Fair K, Fair Add), versus the models that use unfair features: GPA, LSAT, race, sex (Full, Unaware).

        Full    Unaware   Fair K   Fair Add
RMSE    0.873   0.894     0.929    0.918

We estimate the error terms ε_G, ε_L by first fitting two models that each use race and sex to individually predict GPA and LSAT.
We then compute the residuals of each model (e.g., ε_G = GPA − Ŷ_GPA(R, S)). We use these residual estimates of ε_G, ε_L to predict FYA. We call this Fair Add.

Accuracy. We compare the RMSE achieved by logistic regression for each of the models on the test set in Table 1. The Full model achieves the lowest RMSE, as it uses race and sex to more accurately reconstruct FYA. Note that in this case this model is not fair, even if the data was generated by one of the models shown in Figure 2, as it corresponds to Scenario 3. The (also unfair) Unaware model still uses the unfair variables GPA and LSAT, but because it does not use race and sex it cannot match the RMSE of the Full model. As our models satisfy counterfactual fairness, they trade off some accuracy. Our first model, Fair K, uses weaker assumptions and thus its RMSE is highest. Using the Level 3 assumptions, as in Fair Add, we produce a counterfactually fair model that trades slightly stronger assumptions for lower RMSE.

Counterfactual fairness. We would like to empirically test whether the baseline methods are counterfactually fair. To do so, we assume the true model of the world is given by Figure 2 (Level 2). We can fit the parameters of this model using the observed data and evaluate counterfactual fairness by sampling from it. Specifically, we generate samples from the model given either the observed race and sex or counterfactual race and sex variables. We fit models to both the original and counterfactual sampled data and plot how the distribution of predicted FYA changes for both baseline models. Figure 2 shows this, where each row corresponds to a baseline predictor and each column corresponds to a counterfactual change. In each plot, the blue distribution is the density of predicted FYA for the original data and the red distribution is this density for the counterfactual data.
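In sketch form, this swap test amounts to the following. All structural coefficients are invented, and we abduct the noise terms using the true simulation parameters where the paper would use estimates fitted from data.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

# Toy world: A -> (GPA, LSAT) -> prediction, with latent K driving FYA.
A = rng.integers(0, 2, n).astype(float)       # e.g. a binary race indicator
K = rng.normal(0.0, 1.0, n)                   # latent knowledge, independent of A
GPA = 0.8 * K + 0.5 * A + rng.normal(0.0, 1.0, n)
LSAT = 0.6 * K - 0.4 * A + rng.normal(0.0, 1.0, n)
FYA = K + rng.normal(0.0, 1.0, n)

# "Unaware" predictor: least squares of FYA on an intercept, GPA, and LSAT.
M = np.column_stack([np.ones(n), GPA, LSAT])
coef, *_ = np.linalg.lstsq(M, FYA, rcond=None)
pred = M @ coef

# Counterfactual data: abduct the noise terms, swap A, regenerate descendants.
eps_G = GPA - (0.8 * K + 0.5 * A)
eps_L = LSAT - (0.6 * K - 0.4 * A)
A_cf = 1.0 - A
GPA_cf = 0.8 * K + 0.5 * A_cf + eps_G
LSAT_cf = 0.6 * K - 0.4 * A_cf + eps_L
pred_cf = np.column_stack([np.ones(n), GPA_cf, LSAT_cf]) @ coef

# The per-individual predictions shift under the swap, so the unaware
# predictor is not counterfactually fair in this toy world. A predictor built
# only from the abducted noise terms (Fair Add style) would be unchanged,
# since eps_G and eps_L are held fixed across counterfactuals.
mean_shift = np.abs(pred_cf - pred).mean()
```

Comparing the densities of `pred` and `pred_cf` reproduces the qualitative pattern of the original-versus-swapped curves in Figure 2.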
If a model is counterfactually fair, we would expect these distributions to lie exactly on top of each other. Instead, we note that the Full model exhibits counterfactual unfairness for all counterfactuals except sex. We see a similar trend for the Unaware model, although it is closer to being counterfactually fair. To see why these models seem to be fair w.r.t. sex, we can look at the weights of the DAG which generates the counterfactual data. Specifically, the DAG weights from (male, female) to GPA are (0.93, 1.06) and from (male, female) to LSAT are (1.1, 1.1). Thus, these models are fair w.r.t. sex simply because of a very weak causal link between sex and GPA/LSAT.

6 Conclusion

We have presented a new model of fairness we refer to as counterfactual fairness. It allows us to propose algorithms that, rather than simply ignoring protected attributes, are able to take into account the different social biases that may arise towards individuals based on ethically sensitive attributes, and to compensate for these biases effectively. We experimentally contrasted our approach with previous fairness approaches and showed that our explicit causal models capture these social biases and make clear the implicit trade-off between prediction accuracy and fairness in an unfair world. We propose that fairness should be regulated by explicitly modeling the causal structure of the world. Criteria based purely on probabilistic independence cannot satisfy this and are unable to address how unfairness is occurring in the task at hand. By providing such causal tools for addressing fairness questions, we hope to provide practitioners with customized techniques for solving a wide array of fairness modeling problems.

Acknowledgments

This work was supported by the Alan Turing Institute under the EPSRC grant EP/N510129/1. CR acknowledges additional support under the EPSRC Platform Grant EP/P022529/1.
We thank Adrian Weller for insightful feedback, and the anonymous reviewers for helpful comments.

References

[1] Berk, R., Heidari, H., Jabbari, S., Kearns, M., and Roth, A. Fairness in criminal justice risk assessments: The state of the art. 2017.
[2] Bollen, K. Structural Equations with Latent Variables. John Wiley & Sons, 1989.
[3] Bollen, K. and Long, J. (eds.). Testing Structural Equation Models. SAGE Publications, 1993.
[4] Bolukbasi, Tolga, Chang, Kai-Wei, Zou, James Y., Saligrama, Venkatesh, and Kalai, Adam T. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pp. 4349–4357, 2016.
[5] Brennan, Tim, Dieterich, William, and Ehret, Beate. Evaluating the predictive validity of the COMPAS risk and needs assessment system. Criminal Justice and Behavior, 36(1):21–40, 2009.
[6] Calders, Toon and Verwer, Sicco. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21(2):277–292, 2010.
[7] Chouldechova, A. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big Data, 2:153–163, 2017.
[8] Dawid, A. P. Causal inference without counterfactuals. Journal of the American Statistical Association, pp. 407–448, 2000.
[9] DeDeo, Simon. Wrong side of the tracks: Big data and protected categories. arXiv preprint arXiv:1412.4643, 2014.
[10] Dwork, Cynthia, Hardt, Moritz, Pitassi, Toniann, Reingold, Omer, and Zemel, Richard. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226. ACM, 2012.
[11] Glymour, C. and Glymour, M. R. Commentary: Race and sex are causes. Epidemiology, 25(4):488–490, 2014.
[12] Grgic-Hlaca, Nina, Zafar, Muhammad Bilal, Gummadi, Krishna P., and Weller, Adrian. The case for process fairness in learning: Feature selection for fair decision making.
NIPS Symposium on Machine Learning and the Law, 2016.
[13] Halpern, J. Actual Causality. MIT Press, 2016.
[14] Hardt, Moritz, Price, Eric, Srebro, Nati, et al. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, pp. 3315–3323, 2016.
[15] Johnson, Kory D., Foster, Dean P., and Stine, Robert A. Impartial predictive modeling: Ensuring fairness in arbitrary models. arXiv preprint arXiv:1608.00528, 2016.
[16] Joseph, Matthew, Kearns, Michael, Morgenstern, Jamie, Neel, Seth, and Roth, Aaron. Rawlsian fairness for machine learning. arXiv preprint arXiv:1610.09559, 2016.
[17] Kamiran, Faisal and Calders, Toon. Classifying without discriminating. In Computer, Control and Communication, 2009. IC4 2009. 2nd International Conference on, pp. 1–6. IEEE, 2009.
[18] Kamiran, Faisal and Calders, Toon. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.
[19] Kamishima, Toshihiro, Akaho, Shotaro, and Sakuma, Jun. Fairness-aware learning through regularization approach. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pp. 643–650. IEEE, 2011.
[20] Khandani, Amir E., Kim, Adlar J., and Lo, Andrew W. Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11):2767–2787, 2010.
[21] Kilbertus, N., Carulla, M. R., Parascandolo, G., Hardt, M., Janzing, D., and Schölkopf, B. Avoiding discrimination through causal reasoning. Advances in Neural Information Processing Systems 30, 2017.
[22] Kleinberg, J., Mullainathan, S., and Raghavan, M. Inherent trade-offs in the fair determination of risk scores. Proceedings of The 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), 2017.
[23] Lewis, D. Counterfactuals. Harvard University Press, 1973.
[24] Louizos, Christos, Swersky, Kevin, Li, Yujia, Welling, Max, and Zemel, Richard.
The variational fair autoencoder. arXiv preprint, 2015.
[25] Mahoney, John F. and Mohen, James M. Method and system for loan origination and underwriting, October 23, 2007. US Patent 7,287,008.
[26] Mooij, J., Janzing, D., Peters, J., and Schölkopf, B. Regression by dependence minimization and its application to causal inference in additive noise models. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 745–752, 2009.
[27] Nabi, R. and Shpitser, I. Fair inference on outcomes. 2017.
[28] Pearl, J. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.
[29] Pearl, J., Glymour, M., and Jewell, N. Causal Inference in Statistics: A Primer. Wiley, 2016.
[30] Pearl, Judea. Causal inference in statistics: An overview. Statistics Surveys, 3:96–146, 2009.
[31] Peters, J., Mooij, J. M., Janzing, D., and Schölkopf, B. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014. URL http://jmlr.org/papers/v15/peters14a.html.
[32] Russell, C., Kusner, M., Loftus, J., and Silva, R. When worlds collide: Integrating different counterfactual assumptions in fairness. Advances in Neural Information Processing Systems, 31, 2017.
[33] Silva, R. and Evans, R. Causal inference through a witness protection program. Journal of Machine Learning Research, 17(56):1–53, 2016.
[34] Stan Development Team. RStan: the R interface to Stan, 2016. R package version 2.14.1.
[35] Wightman, Linda F. LSAC National Longitudinal Bar Passage Study. LSAC Research Report Series, 1998.
[36] Zafar, Muhammad Bilal, Valera, Isabel, Rodriguez, Manuel Gomez, and Gummadi, Krishna P. Learning fair classifiers. arXiv preprint arXiv:1507.05259, 2015.
[37] Zafar, Muhammad Bilal, Valera, Isabel, Rodriguez, Manuel Gomez, and Gummadi, Krishna P. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment.
arXiv preprint arXiv:1610.08452, 2016.
[38] Zemel, Richard S., Wu, Yu, Swersky, Kevin, Pitassi, Toniann, and Dwork, Cynthia. Learning fair representations. ICML (3), 28:325–333, 2013.
[39] Zliobaite, Indre. A survey on measuring indirect discrimination in machine learning. arXiv preprint arXiv:1511.00148, 2015.

S1 Population-Level vs. Individual-Level Causal Effects

As discussed in Section 3, counterfactual fairness is an individual-level definition. This is fundamentally different from comparing different units that happen to share the same "treatment" A = a and coincide on the values of X. To see in detail what this means, consider the following thought experiment. Let us assess the causal effect of A on Ŷ by controlling A at two levels, a and a′. In Pearl's notation, where "do(A = a)" expresses an intervention on A at level a, we have that

E[Ŷ | do(A = a), X = x] − E[Ŷ | do(A = a′), X = x],   (2)

is a measure of causal effect, sometimes called the average causal effect (ACE). It expresses the change that is expected when we intervene on A while observing the attribute set X = x, under two levels of treatment. If this effect is non-zero, A is considered to be a cause of Ŷ. This raises a subtlety that needs to be addressed: in general, this effect will be non-zero even if Ŷ is counterfactually fair. This may sound counter-intuitive: protected attributes such as race and gender are causes of our counterfactually fair decisions. In fact, this is not a contradiction, as the ACE in Equation (2) is different from counterfactual effects. The ACE contrasts two independent exchangeable units of the population, and it is a perfectly valid way of performing decision analysis. However, the value of X = x is affected by different background variables corresponding to different individuals.
That is, the causal effect (2) contrasts two units that receive different treatments but which happen to coincide on X = x. To give a synthetic example, imagine the simple structural equation

X = A + U.

The ACE quantifies what happens among people with U = x − a against people with U′ = x − a′. If, for instance, Ŷ = λU for λ ≠ 0, then the effect (2) is λ(a − a′) ≠ 0. Contrary to that, the counterfactual difference is zero. That is,

E[Ŷ_{A←a}(U) | A = a, X = x] − E[Ŷ_{A←a′}(U) | A = a, X = x] = λU − λU = 0.

From another perspective, we can interpret the above just as if we had measured U from the beginning rather than performing abduction. We then generate Ŷ from some g(U), so U is the within-unit cause of Ŷ, and not A. If U cannot be deterministically derived from {A = a, X = x}, the reasoning is similar. By abduction, the distribution of U will typically depend on A, and hence so will Ŷ when marginalizing over U. Again, this seems to disagree with the intuition that our predictor should not be caused by A. However, this once again is a comparison across individuals, not within an individual. It is this balance among (A, X, U) that explains, in the examples of Section 3.1, why some predictors are counterfactually fair even though they are functions of the same variables {A, X} used by unfair predictors: such functions must correspond to particular ways of balancing the observables that, by way of the causal assumptions, cancel out the effect of A.

More on conditioning and alternative definitions. As discussed in Example 4.4.4 of Pearl et al. [29], a different proposal for assessing fairness can be defined via the following concept:

Definition 6 (Probability of sufficiency). We define the probability of the event {A = a} being a sufficient cause for our decision Ŷ, contrasted against {A = a′}, as

P(Ŷ_{A←a′}(U) ≠ y | X = x, A = a, Ŷ = y).
(3)

We can then, for instance, claim that Ŷ is a fair predictor if this probability is below some pre-specified bound for all (x, a, a′). The shortcomings of this definition come from its original motivation: to explain the behavior of an existing decision protocol, where Ŷ is the current practice and is, in an unclear way, conflated with Y. The implication is that if Ŷ is to be designed rather than being a natural measure of existing behaviour, then we are using Ŷ itself as evidence for the background variables U. This does not make sense if Ŷ is yet to be designed by us. If Ŷ is to be interpreted as Y, then this does not provide a clear recipe on how to build Ŷ: while we can use Y to learn a causal model, we cannot use it to collect training-data evidence for U, as the outcome Y will not be available to us at prediction time. For this reason, we claim that while probability of sufficiency is useful as a way of assessing an existing decision-making process, it is not as natural as counterfactual fairness in the context of machine learning.

Approximate fairness and model validation. The notion of probability of sufficiency raises the question of how to define approximate, or high-probability, counterfactual fairness. This is an important question that we address in [32]. Before defining an approximation, it is important to first expose in detail what the exact definition is, which is the goal of this paper. We also do not address the validation of the causal assumptions used by the input causal model of the FairLearning algorithm in Section 4.1. The reason is straightforward: this validation is an entirely self-contained step of the implementation of counterfactual fairness. An extensive literature already exists on this topic which the practitioner can refer to (a classic account, for instance, is [3]), and which can be used as-is in our context.
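Returning to the synthetic example of Section S1 (X = A + U with predictor Ŷ = λU), the contrast between the ACE and the within-individual counterfactual can be checked with a few lines of arithmetic. The numeric values below are arbitrary.

```python
lam = 2.0                  # lambda, the predictor coefficient in Y_hat = lam * U
x, a, a_prime = 1.0, 1.0, 0.0

# ACE at X = x contrasts two different units: one with U = x - a, another with
# U' = x - a' (since X = A + U fixes U once A and X are known).
ace = lam * (x - a_prime) - lam * (x - a)   # = lam * (a - a_prime), nonzero

# Counterfactual contrast for a single unit: abduction recovers U = x - a, and
# the intervention A <- a' leaves U (hence Y_hat = lam * U) unchanged.
u = x - a
cf_diff = lam * u - lam * u                 # exactly zero
```

So the same predictor has a nonzero ACE yet a zero counterfactual difference, which is the distinction the section draws.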
The experiments performed in Section 5 can be criticized on the grounds that they rely on a model that obeys our assumptions, so that "obviously" our approach should work better than alternatives. This criticism is not warranted: in machine learning, causal inference is typically assessed through simulations which assume that the true model lies in the family covered by the algorithm. Algorithms, including FairLearning, are justified in the population sense. How different competitors behave with finite sample sizes is the primary question to be studied in an empirical study of a new concept, where we control for the correctness of the assumptions. Although sensitivity analysis is important, there are many degrees of freedom in how this can be done. Robustness issues are better addressed by extensions focusing on approximate versions of counterfactual fairness. This will be covered in later work.

Stricter version. For completeness of exposition, notice that the definition of counterfactual fairness could be strengthened to

P(Ŷ_{A←a}(U) = Ŷ_{A←a′}(U) | X = x, A = a) = 1.   (4)

This is different from the original definition in the case where Ŷ(U) is a random variable with a different source of randomness for different counterfactuals (for instance, if Ŷ is given by some black-box function of U with added noise that is independent across each counterfactual value of A). In such a situation, the event {Ŷ_{A←a}(U) = Ŷ_{A←a′}(U)} will itself have probability zero even if P(Ŷ_{A←a}(U) = y | X = x, A = a) = P(Ŷ_{A←a′}(U) = y | X = x, A = a) for all y. We do not consider version (4), as in our view it is not as elegant as the original, and it is also unclear whether adding an independent source of randomness fed to Ŷ would itself be considered unfair. Moreover, if Ŷ(U) is assumed to be a deterministic function of U and X, as in FairLearning, then the two definitions are the same (see Footnote 6).
Informally, this stricter definition corresponds to a notion of "almost sure equality" as opposed to "equality in distribution." Without assuming that Ŷ is a deterministic function of U and X, even the stricter version does not protect us against measure-zero events where the counterfactuals are different. The definition of counterfactual fairness concisely emphasizes that U can be a random variable, and clarifies which conditional distribution it follows. Hence, it is our preferred way of introducing the concept, even though it does not explicitly suggest whether Ŷ(U) has random inputs besides U.

Footnote 6: Notice that Ŷ(U) is itself a random variable if U is, but the source of randomness, U, is the same across all counterfactuals.

S2 Relation to Demographic Parity

Consider the graph A → X → Y. In general, if Ŷ is a function of X only, then Ŷ need not obey demographic parity, i.e.,

P(Ŷ | A = a) ≠ P(Ŷ | A = a′),

where, since Ŷ is a function of X, the probabilities are obtained by marginalizing over P(X | A = a) and P(X | A = a′), respectively. If we postulate a structural equation X = αA + e_X, then given A and X we can deduce e_X. If Ŷ is a function of e_X only and, by assumption, e_X is marginally independent of A, then Ŷ is marginally independent of A: this follows the interpretation given in the previous section, where we interpret e_X as "known" despite being mathematically deduced from the observation (A = a, X = x). Therefore, the assumptions imply that Ŷ will satisfy demographic parity, and this can be falsified. By way of contrast, if e_X is not uniquely identifiable from the structural equation and (A, X), then the distribution of Ŷ depends on the value of A as we marginalize over e_X, and demographic parity will not follow. This leads to the following:

Lemma 2.
If all background variables U′ ⊆ U in the definition of Ŷ are determined from A and X, and all observable variables in the definition of Ŷ are independent of A given U′, then Ŷ satisfies demographic parity.

Thus, counterfactual fairness can be thought of as a counterfactual analog of demographic parity, as present in the Red Car example further discussed in the next section.

S3 Examples Revisited

In Section 3.1, we discussed two examples. We reintroduce them here briefly, add a third example, and explain some consequences of their causal structure for the design of counterfactually fair predictors.

Scenario 1: The Red Car Revisited. In that scenario, the structure A → X ← U → Y implies that Ŷ should use neither X nor A. On the other hand, it is acceptable to use U. It is interesting to realize, however, that since U is related to A and X, there will be some association between Ŷ and {A, X}, as discussed in Section S1. In particular, if the structural equation for X is linear, then U is a linear function of A and X, and as such Ŷ will also be a function of both A and X. This is not a problem, as it is still the case that the model implies that this is merely a functional dependence that disappears by conditioning on a postulated latent attribute U. Surprisingly, we must make Ŷ an indirect function of A if we want a counterfactually fair predictor, as shown in the following lemma.

Lemma 3. Consider a linear model with the structure in Figure 1(a). Fitting a linear predictor to X only is not counterfactually fair, while the same algorithm will produce a fair predictor using both A and X.

Proof. As in the definition, we will consider the population case, where the joint distribution is known. Consider the case where the equations described by the model in Figure 1(a) are deterministic and linear:

X = αA + βU,  Y = γU.

Denote the variance of U by v_U and the variance of A by v_A, and assume all coefficients are non-zero.
The predictor Ŷ(X) defined by least-squares regression of Y on X only is given by Ŷ(X) ≡ λX, where

λ = Cov(X, Y)/Var(X) = βγv_U / (α²v_A + β²v_U) ≠ 0.

This predictor follows the concept of fairness through unawareness. We can test whether a predictor Ŷ is counterfactually fair by using the procedure described in Section 2.2: (i) compute U given observations of X, Y, A; (ii) substitute the equations involving A with an interventional value a′; (iii) compute the variables X, Y with the interventional value a′. It is clear here that Ŷ_a(U) = λ(αa + βU) ≠ Ŷ_{a′}(U). This predictor is not counterfactually fair. Thus, in this case fairness through unawareness actually perpetuates unfairness.

Consider instead doing least-squares regression of Y on X and A. Note that Ŷ(X, A) ≡ λ_X X + λ_A A, where λ_X, λ_A can be derived as follows:

[λ_X; λ_A] = [Var(X), Cov(A, X); Cov(X, A), Var(A)]⁻¹ [Cov(X, Y); Cov(A, Y)]
           = (1 / (β²v_U v_A)) [v_A, −αv_A; −αv_A, α²v_A + β²v_U] [βγv_U; 0]
           = [γ/β; −αγ/β].   (5)

Now imagine we have observed A = a. This implies that X = αa + βU, and our predictor is Ŷ(X, a) = (γ/β)(αa + βU) − (αγ/β)a = γU. Thus, if we substitute a with a counterfactual a′ (the action step described in Section 2.2), the predictor Ŷ(X, A) is unchanged. This is because our predictor is constructed in such a way that any change in X caused by a change in A is cancelled out by the λ_A A term. Thus this predictor is counterfactually fair. Note that if Figure 1(a) is the true model for the real world, then Ŷ(X, A) will also satisfy demographic parity and equality of opportunity, as Ŷ will be unaffected by A.

The above lemma holds in a more general case for the structure given in Figure 1(a): any non-constant estimator that depends only on X is not counterfactually fair, as changing A always alters X.

Scenario 2: High Crime Regions Revisited.
The causal structure differs from the previous example by the extra edge X → Y. For illustration purposes, assume again that the model is linear. Unlike the previous case, a predictor Ŷ trained using X and A is not counterfactually fair. The only change from Scenario 1 is that now Y depends on X as follows: Y = γU + θX. If we now solve for λ_X, λ_A, it can be shown that Ŷ(X, a) = (γ − α²θv_A/(βv_U))U + αθa. As this predictor depends on values of A that are not explained by U, we have Ŷ(X, a) ≠ Ŷ(X, a′), and thus Ŷ(X, A) is not counterfactually fair.

The following extra example complements the previous two.

Scenario 3: University Success. A university wants to know if students will be successful post-graduation (Y). It has information such as grade-point average (GPA), advanced placement (AP) exam results, and other academic features X. The university believes, however, that an individual's gender A may influence these features and their post-graduation success Y due to social discrimination. It also believes that, independently, an individual's latent talent U causes X and Y. The structure is similar to Figure 1(a), with the extra edge A → Y. We can again ask: is the predictor Ŷ(X, A) counterfactually fair? The difference between this and Scenario 1 is that Y is now a function of U and A as follows: Y = γU + ηA. We can again solve for λ_X, λ_A and show that Ŷ(X, a) = (γ − αηv_A/(βv_U))U + ηa. Again, Ŷ(X, A) is a function of A not explained by U, so it cannot be counterfactually fair.

S4 Analysis of Individual Pathways

By way of an example, consider the following adaptation of the scenario concerning claims of gender bias in UC Berkeley's admission process in the 1970s, commonly used as a textbook example of Simpson's paradox.
For each candidate student's application, we have A as a binary indicator of whether the applicant is female, X as the choice of course to apply for, and Y as a binary indicator of whether the application was successful. Let us postulate the causal graph that includes the edges A → X and X → Y only. We observe that A and Y are negatively associated, which at first might suggest discrimination, as gender is commonly accepted here as a protected attribute for college admission. However, in the postulated model it turns out that A and Y are causally independent given X. More specifically, women tend to choose more competitive courses (those with higher rejection rates) than men when applying. Our judgment is that the higher rejection rate among female than male applicants is acceptable if the mechanism A → X is interpreted as a choice which is under the control of the applicant. That is, free will overrides whatever possible cultural background conditions led to this discrepancy. In the framework of counterfactual fairness, we could claim that A is not a protected attribute to begin with once we understand how the world works, and that including A in the predictor of success is irrelevant anyway once we include X in the classifier.

However, consider the situation where there is an edge A → Y, interpreted purely as the effect of discrimination after causally controlling for X. While it is now reasonable to postulate A to be a protected attribute, we can still judge that X is not an unfair outcome: there is no need to "deconvolve" A out of X to obtain an estimate of the other causes U_X in the A → X mechanism. This suggests a simple modification of the definition of counterfactual fairness. First, given the causal graph G assumed to encode the causal relationships in our system, define P_{G_A} as the set of all directed paths from A to Y in G which are postulated to correspond to the unfair chains of events by which A causes Y.
Let X_{P^c_{G_A}} ⊆ X be the subset of covariates not present in any path in P_{G_A}. Also, for any vector x, let x_S represent the corresponding subvector indexed by S; the corresponding uppercase version X_S is used for random vectors.

Definition 7 ((Path-dependent) counterfactual fairness). Predictor Ŷ is (path-dependent) counterfactually fair with respect to path set P_{G_A} if, under any context X = x and A = a,

P(Ŷ_{A←a, X_{P^c_{G_A}}←x_{P^c_{G_A}}}(U) = y | X = x, A = a) = P(Ŷ_{A←a′, X_{P^c_{G_A}}←x_{P^c_{G_A}}}(U) = y | X = x, A = a),   (6)

for all y and for any value a′ attainable by A.

This notion is related to controlled direct effects [29], where we intervene on some paths from A to Y but not others. Paths in P_{G_A} are considered here to be the "direct" paths, and we condition on X and A similarly to the definition of probability of sufficiency (3). This definition is the same as the original counterfactual fairness definition for the case where P^c_{G_A} = ∅. Its interpretation is analogous to the original, indicating that for any X′ ∈ X_{P^c_{G_A}} we are allowed to propagate information from the factual assignment A = a, along with what we learned about the background causes U_{X′}, in order to reconstruct X′. The contribution of A is considered acceptable in this case and does not need to be "deconvolved." The implication is that any member of X_{P^c_{G_A}} can be included in the definition of Ŷ. In the example of college applications, we are allowed to use the choice of course X even though A is a confounder for X and Y. We are still not allowed to use A directly, bypassing the background variables.

As discussed by [27], there are some counterfactual manipulations usable in a causal definition of fairness that can be performed by exploiting only independence constraints among the counterfactuals: that is, without requiring the explicit description of structural equations or other models for latent variables.
A contrast between the two approaches is left for future work, although we stress that they are in some sense complementary: we are motivated mostly by problems such as the one in Figure 1(d), where many of the mediators themselves are considered to be unfairly affected by the protected attribute, and independence constraints among counterfactuals alone are less likely to be useful in identifying constraints for the fitting of a fair predictor.

S5 The Multifaceted Dynamics of Fairness

One particularly interesting question was raised by one of the reviewers: what is the effect of continuing discrimination after fair decisions are made? For instance, consider the case where banks enforce a fair allocation of loans to business owners regardless of, say, gender. This does not mean such businesses will thrive at a balanced rate if customers continue to avoid female-owned businesses at a disproportionate rate for unfair reasons. Is there anything useful that can be said about this issue from a causal perspective?

The work proposed here concerns only what we can influence by changing how machine-learning-aided decision making takes place in specific problems. It cannot directly change how society as a whole carries on with its biases. Ironically, it may sound unfair to banks to enforce the allocation of resources to businesses at a rate that does not correspond to the probability of their respective success, even if the owners of the corresponding businesses are not to blame for that. One way of reconciling the different perspectives is by modeling how a fair allocation of loans, even if it does not come without a cost, can nevertheless increase the proportion of successful female-owned businesses compared to the current baseline.

Figure 3: A causal model for the stop-and-frisk dataset (nodes: Criminality, Race, Arrest, Frisked, Searched, Weapon, Force).
This change can by itself have an indirect effect on the culture and behavior of a society, diminishing continuing discrimination through a feedback mechanism, as in affirmative action. We believe that in the long run isolated acts of fairness are beneficial even if we do not have direct control over all sources of unfairness in any specific problem. Causal modeling can help in constructing arguments about the long-run impact of individual contributions, e.g. as a type of macroeconomic assessment. There are many challenges, and we should not pretend that precise answers can be obtained, but in theory we should aim at educated quantitative assessments validating how a systemic improvement in society can emerge from localized ways of addressing fairness.

S6 Case Study: NYC Stop-and-Frisk Data

Since 2002, the New York Police Department (NYPD) has recorded information about every time a police officer has stopped someone. The officer records information such as whether the person was searched or frisked, whether a weapon was found, their appearance, whether an arrest was made or a summons issued, whether force was used, etc. We consider the data collected on males stopped during 2014, which constitutes 38,609 records. We limit our analysis to males stopped, as this accounts for more than 90% of the data. We fit a model which postulates that police interactions are caused by race and a single latent factor, labeled Criminality, that is meant to index other aspects of the individual that have been used by the police and which are independent of race. We do not claim that this model has a solid theoretical basis; we use it below as an illustration of how to carry out an analysis of counterfactually fair decisions. We also describe a spatial analysis of the estimated latent factors.

Model. We model this stop-and-frisk data using the graph in Figure 3.
Specifically, we posit main causes for the observations: Arrest (whether an individual was arrested), Force (whether some sort of force was used during the stop), Frisked, and Searched. The first cause of these observations is some measure of an individual's latent Criminality, which we do not observe. We believe that Criminality also directly affects Weapon (whether an individual was found to be carrying a weapon). For all of the features previously mentioned we believe there is an additional cause, an individual's Race, which we do observe. This factor is introduced because we believe that these observations may be biased by an officer's perception of whether an individual is likely a criminal, affected by the individual's Race. Thus note that, in this model, Criminality is counterfactually fair for the prediction of any characteristic of the individual in problems where Race is a protected attribute.

Visualization on a map of New York City. Each of the stops can be mapped to the longitude and latitude at which the stop occurred (see https://github.com/stablemarkets/StopAndFrisk). This allows us to visualize the distribution of two distinct populations: the stops of White and Black Hispanic individuals, shown in Figure 4. We note that there are more White individuals stopped (4,492) than Black Hispanic individuals (2,414). However, if we look at the arrest distribution (visualized geographically in the second plot), the rate of arrest for White individuals is lower (12.1%) than for Black Hispanic individuals (19.8%, the highest rate for any race in the dataset). Given our model we can ask: "If every individual had been White, would they have been arrested?" The answer is in the third plot. We see that the overall number of arrests decreases (from 5,659 to 3,722). What if every individual had been Black Hispanic? The fourth plot shows an increase in the number of arrests had individuals been Black Hispanic, according to the model (from 5,659 to 6,439). The yellow and purple circles show two regions where the difference in counterfactual arrest rates is particularly striking. Thus, the model indicates that, even when everything else in the model is held constant, race has a differential effect on arrest rate under the (strong) assumptions of the model.

Figure 4: How race affects arrest. The above maps show how altering one's race affects whether or not they will be arrested, according to the model. The left-most plot shows the distribution of White and Black Hispanic populations in the stop-and-frisk dataset. The second plot shows the true arrests for all of the stops. Given our model we can compute whether or not every individual in the dataset would be arrested had they been White; we show this counterfactual in the third plot. Similarly, we can compute this counterfactual if everyone had been Black Hispanic, as shown in the fourth plot.
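The mechanics of this "everyone White" counterfactual can be sketched in code. Everything below is a hypothetical illustration, not the paper's fitted model: we assume each individual i has an already-inferred latent criminality c_i, that P(Arrest_i = 1) = sigmoid(w_c · c_i + b[race_i]), and that the counterfactual intervenes on Race while holding c_i fixed (abduction of the latent, action on Race, then prediction):

```python
# Hedged sketch of counterfactual arrest counts under do(Race = r).
# Weights, offsets, and the population are hypothetical illustrations.
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w_c = 1.5                                    # hypothetical weight on Criminality
b = {"White": -2.0, "BlackHispanic": -1.2}   # hypothetical race offsets

# Hypothetical population: (inferred latent criminality, observed race).
population = [(random.gauss(0, 1), random.choice(list(b)))
              for _ in range(10000)]

def expected_arrests(intervened_race=None):
    """Expected number of arrests, optionally under do(Race = r):
    the latent c_i stays fixed (abduction) while race is set (action)."""
    total = 0.0
    for c_i, race_i in population:
        r = intervened_race if intervened_race is not None else race_i
        total += sigmoid(w_c * c_i + b[r])   # prediction step
    return total

factual = expected_arrests()
all_white = expected_arrests("White")
all_bh = expected_arrests("BlackHispanic")
assert all_white < factual < all_bh   # mirrors the direction reported above
```

Because the latent criminality is held fixed while only the race offset changes, the gap between the two counterfactual totals isolates the model's direct effect of Race on Arrest, which is the quantity the third and fourth plots visualize.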
