On the (im)possibility of fairness

Authors: Sorelle A. Friedler, Carlos Scheidegger, Suresh Venkatasubramanian

Abstract. What does it mean for an algorithm to be fair? Different papers use different notions of algorithmic fairness, and although these appear internally consistent, they also seem mutually incompatible. We present a mathematical setting in which the distinctions in previous papers can be made formal. In addition to characterizing the spaces of inputs (the "observed" space) and outputs (the "decision" space), we introduce the notion of a construct space: a space that captures unobservable, but meaningful, variables for the prediction. We show that in order to prove desirable properties of the entire decision-making process, different mechanisms for fairness require different assumptions about the nature of the mapping from construct space to decision space. The results in this paper imply that future treatments of algorithmic fairness should more explicitly state assumptions about the relationship between constructs and observations.

(This research was funded in part by the NSF under grants IIS-1251049, CNS-1302688, IIS-1513651, IIS-1633724, and IIS-1633387. Contact: sorelle@cs.haverford.edu (Haverford College), cscheid@cscheid.net (University of Arizona), suresh@cs.utah.edu (University of Utah).)

1 Introduction

Machine learning has embedded itself deep inside many of the decision-making systems that used to be driven by humans. Whether it is resume filtering for jobs, admissions to college, credit ratings for loans, or all components of the criminal justice pipeline, automated tools are being used to find patterns, make predictions, and assist in decisions that have significant impact on our lives. The "rise of the machines" has raised concerns about the fairness of these processes. Indeed, while one of the rationales for introducing automated decision making was to replace subjective human decisions with "objective" algorithmic methods, a large body of research has shown that machine learning is not free from the kinds of discriminatory behavior humans display. This area of algorithmic fairness now contains many ideas about how to prevent algorithms from learning bias and how to design algorithms that are "fairness-aware." (Here, and in the rest of this paper, we will typically use "bias" to denote discriminatory behavior in society, rather than the statistical notion of bias. Similarly, "discriminate" will refer to the societal notion.)

Strangely though, the basic question "what does it mean for an algorithm to be fair?" has gone under-examined. While many papers have proposed quantitative measures of fairness, these measures rest on unstated assumptions about fairness in society. As we shall show, these assumptions, once brought into the open, are often mutually incompatible, rendering it difficult to compare proposals for fair algorithms with each other.

1.1 Our Work

Definitions of fairness, nondiscrimination, and justice in society have been debated extensively in the social science community, from Rawls to Nozick to Roemer, and many others. A parallel debate is ongoing within the computer science community, including discussions of individual fairness [8] vs. group fairness (e.g., disparate impact's four-fifths rule [10] and a difference formulation of a discrimination score [3, 13, 14, 20, 25]). These discussions reveal differences in the understood meaning of "fairness" in decision-making, centering around two different interpretations of the extent to which factors outside an individual's control should be factored into decisions made about them, and the extent to which abilities are innate and measurable. This tension manifests itself in the debates between "equality of outcomes" and "equality of treatment" that have long appeared under many names. Our contribution to this debate will be to make these definitions mathematically precise and reveal the axiomatic differences at the heart of the debate. (We will review the literature in light of these definitions in Section 5.)

In order to make fairness mathematically precise, we tease out the difference between beliefs and mechanisms, to make clear which aspects of this debate are opinions and which choices and policies logically follow from those beliefs. Our goal is to make the embedded value systems transparent so that the belief system can be chosen, allowing mechanisms to be proven compatible with those beliefs. We will create this separation of concerns by developing a mathematical theory of fairness in terms of transformations between different kinds of spaces. Our primary insight can be summarized as:

To study algorithmic fairness is to study the interactions between different spaces that make up the decision pipeline for a task.

We will first argue that there are more spaces implicitly involved in the decision-making pipeline than are typically specified. In fact, it is the conflation of these spaces that leads to much of the confusion and disagreement in the literature on algorithmic fairness. Next, we will reinterpret notions of fairness, structural bias, and non-discrimination as quantifying the way that spaces are transformed into each other. With this framework in place, we can formalize the tensions between fairness and non-discrimination by revealing fundamental differences in the worldviews underlying these definitions. Our specific contributions are as follows:

• We introduce the idea of (task-dependent) spaces that interact in any learning task, specifically introducing the construct space, which captures the notion that features of interest for decision-making are necessarily imperfect proxies for the construct of interest.
• We reinterpret notions of fairness, structural bias, and non-discrimination mathematically as functions of transformations between these spaces.
• Surprisingly, we show that fairness can be guaranteed only with very strong assumptions about the world: namely, that "what you see is what you get," i.e., that we can correctly measure individual fitness for a task regardless of issues of bias and discrimination. We complement this with an impossibility result, saying that if this strong assumption is dropped, then fairness can no longer be guaranteed.
• We develop a theory of non-discrimination based on a quantification of structural bias. Building non-discriminatory decision algorithms is shown to require a different worldview, namely that "we're all equal," i.e., that all groups are assumed to have similar abilities with respect to the task in the construct space.
• We show that virtually all methods that propose to address algorithmic fairness make implicit assumptions about the nature of these spaces and how they interact with each other.
2 Spaces: Construct, Observed and Decision

If we consider our guiding informal understanding of fairness (that similar people should be treated similarly) in the context of algorithm design, we must begin by determining how people will be represented as inputs to the algorithm, and what associated notion of similarity on this representation is appropriate. These two choices will entirely determine what we mean by fairness, and there are many subtle choices that must be made in this determination as we build up to a formal definition.

Fairness-aware algorithms (and indeed all algorithms in machine learning) can be viewed as mappings between spaces, and we will adopt this viewpoint. They take inputs from some feature space and return outputs in a decision space. The question then becomes how we should define points and the associated metric to precisely define these spaces. To illuminate some of the subtleties inherent in these choices, we introduce a running example of fairness in a college admissions decision.

Example: Setting up a College Admissions Decision. We can think of the college admissions process as a procedure that takes a set of people and applies a yes or no decision to each person. Before determining a fair procedure, an admissions office would need to determine which aspects of a person they want to base the decision on. These might include potential, intelligence, and diligence. Those choices would lead to other questions: At what point in a person's life should these aspects of their personality and ability be measured? Is intelligence set at birth or adaptable, and what does that imply about when it should be measured for a college admissions decision? How can potential and diligence be accurately measured?

In order to define a feature space, we must answer questions about which features should be included and how (and when) they should be measured. This description illuminates our first important distinction from a common set-up of such a problem: the feature space itself is a representation of a chosen set of possibly hidden or unmeasurable constructs. Determining which features should be considered is part of the determination of how the decision should be made; representing those constructs in measurable form is a separate and important step in the process. This distinction motivates our first two definitions.

Definition 2.1 (Construct space (CS)). The construct space is a metric space CS = (P, d_P) consisting of individuals and a distance between them. It is assumed that the distance d_P correctly captures closeness with respect to the task.

The construct space contains the features that we would like to base a decision on. These are the "desired" or "true" features at the time chosen for the decision, together with the ability to accurately measure similarity between people with respect to the task. This is the space representing the input to a decision-making process, if we had full access to it.

Example: Construct space for a College Admissions Decision. Suppose that a college admissions team decided to make admissions decisions based on the predicted potential of applicants. Personal qualities, such as self-control, growth mind-set, and grit, are known to be determining factors in later success [2]. Grit is roughly defined as an individual's ability to demonstrate passion, perseverance, and resilience towards their chosen goal. In this way, a college admissions decision could attempt to use grit as a feature of "potential" in the construct space.
In reality, we might not know these features or even the true similarity between individuals. All we have is what we measure or observe, and this leads us to our next definition.

Definition 2.2 (Observed space (OS)). The observed space (with respect to a task T) is a metric space OS = (P̂, d̂). We assume an observation process g : P → P̂ that generates an entity p̂ = g(p) from a person p ∈ CS.

Example: Observed space for a College Admissions Decision. Grit (and other such qualities) are not directly observable: research attempts to measure grit indirectly through proxies such as self-reported surveys [6]. The ability to measure grit and these other qualities precisely is limited [7]. In this setting, the "amount of grit" is a feature in the construct space: it is hidden from us but appears to have some unknown influence on the desired predicted outcome. We attempt to infer it through imperfect proxy features, such as a "survey-based grit score," that lie in the observed space.

The final part of a task is a decision space of outcomes.

Definition 2.3 (Decision space (DS)). A decision space is a metric space DS = (O, d_O), where O is a space of outcomes and d_O is a metric defined on O. A task T can be viewed as the process of finding a map from P or P̂ to O.

Example: Decision space for a College Admissions Decision. The decision space for a college admissions decision consists of the (potentially unobservable, hopefully predictable) information that makes up the final admissions decision. The decision space might simply be the resulting yes/no (admit / don't admit) decisions, or it might be the predicted potential of an applicant or their predicted performance in college (a threshold could then be applied to this decision space to generate yes/no decisions).

How the spaces interact. Algorithmic decision-making is a set of mappings between the three spaces defined above. The desired outcome is a mapping from CS to DS via an unknown and complex function o = f(X_1, X_2, ...) of features that lie in the construct space. In order to implement an algorithm that predicts the desired outcome, we must first extract usable data from CS: this is a collection of mappings from CS to OS. The features Y_1, Y_2, ..., Y_ℓ in OS might be:

• noisy variants of the X_i: Y_i = g(X_i), where g(·) is some stochastic function,
• some unknown (and noisy) combination of the X_i: Y_i = g(X_{i_1}, X_{i_2}, ...), or
• new attributes that are independent of any of the X_i.

Further, some of the X_i might be omitted entirely when generating the Y_i. Our goal is (ideally) to determine o. We instead design an algorithm that learns õ = f̃(Y_1, Y_2, ..., Y_ℓ), i.e., a mapping from OS to DS. The hope is that õ ≈ o.
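To make this pipeline concrete, here is a minimal sketch in Python of the three spaces interacting, assuming a one-dimensional construct feature, a Gaussian-noise observation process, and a threshold decision rule (all of these modeling choices are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Construct space: a single "true" feature X (e.g., grit), unobservable.
X = rng.normal(loc=0.0, scale=1.0, size=1000)

# Desired outcome o = f(X): here, a simple monotone function of X.
o = X > 0.5

# Observed space: a noisy proxy Y = g(X), e.g., a survey-based score.
Y = X + rng.normal(loc=0.0, scale=0.4, size=X.shape)

# Learned decision o~ = f~(Y): a threshold applied to the proxy,
# standing in for a learned map from OS to DS.
o_tilde = Y > 0.5

agreement = np.mean(o == o_tilde)
print(f"fraction of decisions where o~ matches o: {agreement:.3f}")
```

Even with modest observation noise, õ and o disagree on a noticeable fraction of individuals near the decision boundary.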
2.1 Examples

The easiest way to understand the interactions between the decision space, the construct space, and the observed space is to start with a prediction task, posit features that seem to control the prediction, and then imagine ways of measuring those features. We provide a number of such examples in Table 1, described in more detail below.

| Decision space         | Construct space            | Observed space                |
|------------------------|----------------------------|-------------------------------|
| Performance in college | Intelligence               | IQ                            |
| Performance in college | Success in high school     | GPA                           |
| Recidivism             | Propensity to commit crime | Family history of crime       |
| Recidivism             | Risk-averseness            | Age                           |
| Employee productivity  | Knowledge of job           | Number of years of experience |

Tab. 1: Examples of construct space attributes and their corresponding observed space attributes for different outcomes.

College Admission. Universities consider a number of factors when deciding whom to admit. One admissions goal might be to determine the likelihood that an admitted student will be successful in college, and the factors considered can include things like intelligence, high school GPA, scores on standardized tests, extracurricular activities, and so on. In this example, performance in college is the decision space, while intelligence and success in high school are in the construct space. Intelligence might be represented in the observed space by the result of an IQ test, while success in high school could be observed via high school GPA.

Recidivism Prediction. When an offender is eligible for parole, judges assess the likelihood that the offender will re-offend after being released as part of the parole decision. Many jurisdictions now use automated prediction methods like COMPAS [17] to generate such a likelihood. With the goal of predicting the likelihood of recidivism (the decision space), such an algorithm might want to determine an individual's propensity for criminal activity and their level of risk-aversion. These construct space attributes could be modeled in the observed space by a family history of crime and the offender's age.

Hiring. One of the most important criteria in hiring a new employee is their ability to succeed at a future job. As proxies, employers will use features like the college attended, GPA, previous work experience, interview performance, and the overall resume. The decision space in this case is employee productivity once hired, while a construct space attribute is the applicant's current knowledge of the job. One way to observe this knowledge (an attribute in the observed space) is through their number of years of experience at a similar job.

2.2 Quantifying transformations between spaces

We can describe the entire pipeline of algorithmic decision-making (feature extraction and measurement, prediction algorithms, and even the underlying predictive mechanism) in the form of transformations between spaces. This is where the metric structure of the spaces plays a role. As we will see, we can express the quality of the various transformations between spaces in terms of how distances (which capture dissimilarity between entities) change when one space is transformed into another. The reason to use (functions of) distances to compare spaces is that most learning tasks rely heavily on the underlying distance geometry of the space. By measuring how the distances change relative to the original space, we can get a sense of how the task outcomes might be distorted. We introduce two different approaches to quantifying these transformations: one that is more "local" and one that compares how sets of points are transformed. We start with a standard measure of point-wise transformation cost.

Definition 2.4 ((Additive) distortion). Let (X, d_X) and (Y, d_Y) be two metric spaces and let f : X → Y be a map from X to Y. The distortion ρ_f of f is defined as the smallest value such that for all p, q ∈ X,

$$ |d_X(p, q) - d_Y(f(p), f(q))| \le \rho_f. $$

The distortion ρ(X, Y) is then the minimum achievable distortion ρ_f over all mappings f.
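As a quick illustration, the distortion ρ_f of a concrete map between finite point sets can be computed directly from the definition. The sketch below computes ρ_f for a given correspondence (row i maps to row i); finding the minimizing map f, and hence ρ(X, Y), is a much harder problem:

```python
import numpy as np
from scipy.spatial.distance import pdist

def additive_distortion(X, Y):
    """rho_f for the map sending row i of X to row i of Y:
    the largest change in any pairwise distance under the map."""
    dX = pdist(X)  # condensed pairwise distances in the source
    dY = pdist(Y)  # corresponding distances among the images
    return np.max(np.abs(dX - dY))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Y = X + rng.normal(scale=0.05, size=X.shape)  # a near-isometric map
print(f"rho_f = {additive_distortion(X, Y):.3f}")
```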
Notes. While the above notion (and its multiplicative variant) is standard in the theoretical computer science literature, it is helpful to understand why it is justified in the context of algorithmic fairness. Distortion is commonly used as a way to minimize the change in geometry when doing dimensionality reduction to make a task more efficiently solvable. The specific measure above is the special case p = ∞ of a general ℓ_p-additive distortion (where the norm is computed over the vector of distance differences). In this more general setting there are a number of approximation algorithms for estimating ρ when the "target space" Y is restricted to a line [5], a tree [1], or an ultrametric [9]. (It is possible to take a more information-theoretic perspective on the nature of transformations between spaces. For example, we could quantify the quality of a stochastic transformation by the amount of mutual information between the source and target spaces. While this is relevant when we wish to determine the degree to which we can reconstruct the source space from the target, it is not necessary for algorithmic decision-making: we need not be able to reconstruct the construct space features from features in the observed space. What will matter is that individuals with similar features in the construct space have similar features in the observed space.)

There are many different ways to compare metric spaces using their distances. Distortion is a worst-case notion: it is controlled by the worst-case spread between a pair of distances in the two spaces. If instead we wish to measure distances between subsets of points in a metric space, there is a more appropriate notion.

Definition 2.5 (Coupling measure). Let X, Y be sets with associated probability measures µ_X, µ_Y. A probability measure ν over X × Y is a coupling measure if ν(X, ·) (the projection of ν onto X) equals µ_X, and similarly for ν(·, Y) and µ_Y. The space of all such coupling measures is denoted by U(X, Y).

Definition 2.6 (Wasserstein distance (WD)). Let (X, d) be a metric space and let Y, Y′ be two subsets of X. Let µ be a probability measure defined on X, which in turn induces probability measures µ_Y, µ_{Y′} on Y, Y′ respectively. The Wasserstein distance between Y and Y′ is given by

$$ W_d(Y, Y') = \min_{\nu \in U(Y, Y')} \int d(y, y')\, d\nu(y, y'). $$

The WD finds an optimal transportation between the two sets and computes the resulting distance. It is a metric when d is.

Finally, we need a metric to compare subsets of points that lie in different metric spaces. Intuitively, we would like some distance function that determines whether the two subsets have the same shape with respect to the two underlying metrics. We will make use of a distance function called the Gromov-Wasserstein distance [16], which is derived from the Wasserstein distance above.

Definition 2.7 (Gromov-Wasserstein distance (GWD)). Let (X, d_X), (Y, d_Y) be two metric spaces with associated probability measures µ_X, µ_Y. The Gromov-Wasserstein distance between X and Y is given by

$$ GW(X, Y) = \frac{1}{2} \inf_{\nu \in U(X, Y)} \iint |d_X(x, x') - d_Y(y, y')|\, d\nu(x, y)\, d\nu(x', y'). $$

Intuitively, the GWD computes the WD between the sets of pairs of points, to determine whether the two point sets determine similar sets of distances. We note that both W(X, Y) and GW(X, Y) can be computed using the standard Hungarian algorithm for optimal transport: W(X, Y) in time O(n³) (where |X| = |Y| = n) and GW(X, Y) in time O(n⁶).
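Echoing the computational note above, for two equal-size point sets with uniform measures the Wasserstein distance reduces to an optimal matching, which SciPy's Hungarian-algorithm solver can find directly. This is a sketch under those simplifying assumptions, not a general-measure implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def wasserstein(Y1, Y2):
    """W_d between two equal-size point sets with uniform weights:
    the optimal coupling is a matching, found by the Hungarian
    algorithm as noted in the text."""
    C = cdist(Y1, Y2)                      # pairwise ground distances
    rows, cols = linear_sum_assignment(C)  # optimal matching
    return C[rows, cols].mean()

rng = np.random.default_rng(2)
A = rng.normal(loc=0.0, size=(40, 2))
B = rng.normal(loc=1.0, size=(40, 2))     # same shape, shifted mean
print(f"W(A, B) = {wasserstein(A, B):.3f}")
```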
3 A mathematical formulation of fairness and bias

We have now introduced three spaces that play a role in algorithmic decision-making: the construct space, the observed space, and the decision space. We have also introduced ways to measure the fidelity with which spaces map to each other. Armed with these ideas, we can now describe how notions of fairness and bias can be expressed formally.

3.1 A definition of fairness

The definition of fairness is task specific, and prescribes desirable outcomes for a task. Since the solution to a task is a mapping from the construct space CS to the decision space DS, a definition of fairness should describe the properties of such a mapping. Inspired by the fairness definition due to Dwork et al. [8], we give the following definition:

Definition 3.1 (Fairness). A mapping f : CS → DS is said to be fair if objects that are close in CS are also close in DS. Specifically, fix two thresholds ε, ε′. Then f is defined as (ε, ε′)-fair if for any x, y ∈ P,

$$ d_P(x, y) \le \varepsilon \implies d_O(f(x), f(y)) \le \varepsilon'. $$

Note that the definition of fairness does not require any particular outcome for entities that are far apart in CS.

3.2 A worldview: what you see is what you get

The presence of the observed space complicates claims that data-driven decision making can be fair, since features in the observed space might not reflect the true values of the attributes that we would like to use to make the decision. In order to address this complication, given that the construct space is unobservable, assumptions must be introduced about the points in the construct space, or about the mapping between the construct space and the observed space, or both. One worldview focuses on the mapping between the construct space and the observed space by asserting that the two spaces are essentially the same. We call this the what you see is what you get (WYSIWYG) worldview.

Axiom 3.1 (WYSIWYG). There exists a mapping f : CS → OS such that the distortion ρ_f is at most ε for some small ε > 0. Equivalently, the distortion ρ between CS and OS is at most ε.

In practice, we can think of ε as a very small number like 0.01.

Example: WYSIWYG in a College Admissions Decision. In the college admissions setting, WYSIWYG is the assumption that features like SAT scores and high-school GPA (which are observed) correlate well with the applicant's ability to succeed (a property of the construct space). More precisely, it assumes that there is some way to use a combination of these scores to correctly compare true applicant ability.
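Definition 3.1 can be checked directly on finite samples by testing all pairs of points. In the sketch below (data and maps invented for illustration), a smooth score map passes, while a thresholded, discrete decision map fails, foreshadowing Theorem 4.2 below:

```python
import numpy as np
from scipy.spatial.distance import pdist

def is_fair(P, FP, eps, eps_prime):
    """Check (eps, eps')-fairness of the map sending row i of P
    (construct space) to row i of FP (decision space): every pair
    within eps in CS must land within eps' in DS."""
    close = pdist(P) <= eps
    return bool(np.all(pdist(FP)[close] <= eps_prime))

rng = np.random.default_rng(3)
P = rng.normal(size=(100, 2))
scores = P.sum(axis=1, keepdims=True)  # sqrt(2)-Lipschitz score map

# True: pairs within 0.1 in CS differ by at most ~0.14 < 0.2 in scores.
print(is_fair(P, scores, eps=0.1, eps_prime=0.2))
# False (with near-certainty): thresholding splits close pairs at 0.
print(is_fair(P, (scores > 0).astype(float), eps=0.1, eps_prime=0.5))
```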
3.3 A worldview: structural bias

But what if the construct space is not accurately represented by the observed space? In the case of stochastic noise in the transformation between CS and OS, fairness in the system may decrease for all decisions. This case can be handled using the WYSIWYG worldview and the usual techniques for accurate learning in the face of noise (see, e.g., [15]). Unfortunately, in many real-world societal applications, the noise in this transformation is non-uniform in a societally biased way.

To explain this structural bias, we start with the notion of a group: a collection of individuals that share a certain set of characteristics (such as gender, race, religion, and so on). These characteristics are often historically and culturally defined (e.g., by the long history of racism in the United States). We represent groups as a partition of individuals into sets G_1, G_2, ..., G_k. In this work, we will think of group membership as a characteristic of an individual; thus each of the construct space, observed space, and decision space admits a partition into groups, induced by the group memberships of the individuals represented in these spaces.

Structural bias manifests itself in unequal treatment of groups. In order to quantify this notion, we first define the notion of group skew: the way in which group (geometric) structure might be distorted between spaces. What we wish to capture is the relative distortion of groups with respect to each other, rather than (for example) a scaling transformation that would transform all groups the same way.

Let (X, d) be a metric space partitioned into groups 𝒳 = {X_1, ..., X_k}. Any probability measure µ_X defined on X induces a measure µ_𝒳 on 𝒳 in the natural way. We can define a metric d_𝒳 on 𝒳 via the operation d_𝒳(X_i, X_j) = W_d(X_i, X_j). Now consider two such metric spaces (X, d_X), (Y, d_Y), their associated group metric spaces (𝒳, d_𝒳), (𝒴, d_𝒴), and measures µ_𝒳, µ_𝒴.

Definition 3.2 (Between-groups distance). The between-groups distance between (X, d_X) and (Y, d_Y) with measures µ_X, µ_Y is

$$ \rho_b = \frac{GW(\mathcal{X}, \mathcal{Y})}{\binom{k}{2}}. $$

The between-groups distance treats the groups in a space as individual "points" and compares the two collections of "points." To capture the differential treatment of groups, we need to normalize this against a measure of how each group is distorted individually. (This is similar to how we might measure between-group and within-group variance in statistical estimation problems like ANOVA.)

Definition 3.3 (Within-group distance). Let X_i and Y_i be the two sets in the spaces X, Y corresponding to the i-th group, and let ρ_i = GW(X_i, Y_i). Then we define

$$ \rho_w = \frac{1}{k} \sum_{i=1}^{k} \rho_i. $$

We can now define a notion of group skew between two spaces.

Definition 3.4 (Group skew). Let (X, d_X) and (Y, d_Y) be metric spaces with group partitionings 𝒳, 𝒴 and measures µ_X, µ_Y. The group skew between X and Y is the quantity

$$ \sigma(X, Y) = \frac{\rho_b(X, Y)}{\rho_w(X, Y)}. $$

There is a degenerate case in which group skew is not well-defined: when, for each i, the sets X_i and Y_i are identical in distance structure. In this (admittedly unlikely) setting, each ρ_i will be zero, and thus ρ_w = 0. This can be interpreted as saying that when groups are identical in the two spaces, any small variation between groups is magnified greatly. To avoid this degenerate case, we will instead compute ρ_b and ρ_w on a perturbed version of the data, where each point is shifted randomly within a metric ball of radius δ. The parameter δ acts as a smoothing operator: it effectively adds O(δ) to each of the numerator and denominator, ensuring that the ratio is always well defined.
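The following sketch estimates group skew for two groups of one-dimensional observations. It uses crude surrogates: the exact per-group GW distances are replaced by distortions under the person-level correspondence (an upper bound), and ρ_b by the change in between-group Wasserstein separation rather than the normalized GW of Definition 3.2. All data and parameters are invented:

```python
import numpy as np

def w1_1d(a, b):
    """1-D Wasserstein-1 between equal-size samples: the optimal
    coupling in one dimension matches sorted values."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

def pair_dists(x):
    """All pairwise absolute differences within a 1-D sample."""
    return np.abs(x[:, None] - x[None, :])[np.triu_indices(len(x), k=1)]

def group_skew(cs, obs, groups, delta=0.01):
    """sigma = rho_b / rho_w for two groups, with delta-smoothing."""
    g0, g1 = (groups == 0), (groups == 1)
    # rho_w: average within-group distortion under the person-level
    # correspondence (an upper bound on the per-group GW).
    rho_w = np.mean([np.mean(np.abs(pair_dists(cs[g]) - pair_dists(obs[g])))
                     for g in (g0, g1)])
    # rho_b: change in between-group Wasserstein separation, a
    # two-group surrogate for the GW between the group spaces.
    rho_b = abs(w1_1d(cs[g0], cs[g1]) - w1_1d(obs[g0], obs[g1]))
    return (rho_b + delta) / (rho_w + delta)

rng = np.random.default_rng(4)
groups = np.repeat([0, 1], 200)
cs = rng.normal(size=400)                     # groups identical in CS
obs = cs + np.where(groups == 0, 0.0, -1.0)   # biased observation of group 1
print(f"group skew = {group_skew(cs, obs, groups):.1f}")
```

Because the biased observation here is a pure per-group shift, the within-group structure is unchanged (ρ_w = 0), and the δ-smoothing is exactly what keeps the ratio finite, as discussed above; the large resulting σ flags the structural bias.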
Using these definitions, we can now account for structural bias, informally understood as the existence of more distortion between groups than within groups when mapping between the construct space and the observed space; it identifies when groups are treated differentially by the observation process.

Definition 3.5 (Structural bias). The metric spaces CS = (X, d_X) and OS = (Y, d_Y) admit t-structural bias if the group skew σ(X, Y) > t.

Example: Structural Bias in a College Admissions Decision. Researchers have shown that the SAT verbal questions function differently for the African-American subgroup, so that the validity of the results as a measure of ability is in question for this subgroup [21]. In the case where SAT scores are a feature in the observed space, this research indicates that we should consider these scores to be the result of structural bias.

3.3.1 Non-Discrimination: a top-level goal

Since group skew is a property of two metric spaces, we can consider the impact of group skew between the construct space and the observed space (structural bias, as defined above), between the observed space and the decision space, and between the construct space and the decision space. While colloquially "structural bias" can refer to any of these (since the construct space and observed space are often conflated), in this paper we will give different names to group skew depending on the spaces involved in the mapping. We will refer to group skew in the decision-making procedure (the mapping from observed space to decision space) as direct discrimination.

Definition 3.6 (Direct discrimination). The metric spaces OS = (X, d_X) and DS = (Y, d_Y) admit t-direct discrimination if the group skew σ(X, Y) > t.

Note that the group structure 𝒴 is the direct result of a mapping f : OS → DS, so we can think of direct discrimination as a function of this mapping.

Since group membership is usually defined based on innate or culturally defined characteristics that individuals have no ability to change, it is often considered unacceptable (and in some cases illegal) to use group membership as part of a decision-making process. Thus, in decision-making, non-discrimination is often a high-level goal. This is sometimes termed "fairness," but we will distinguish the terms here.

Definition 3.7 (Non-discrimination). Let CS = (X, d_X) and DS = (Y, d_Y). A mapping f : CS → DS is t-nondiscriminatory if the group skew σ(X, Y) ≤ t.

This worldview is primarily concerned with achieving non-discrimination by avoiding both structural bias and direct discrimination. Given the social history of this type of group skew occurring in ways that disadvantage specific sub-populations, it makes sense that this is a common top-level goal. Unfortunately, it is hard to achieve directly, since we have no knowledge of the construct space, and the existence of structural bias precludes us from using the observed space as a reasonable representation of the construct space (as is done in the WYSIWYG worldview).
3.3.2 An Axiomatic Assumption: we're all equal

A common underlying assumption of this worldview, which we will make precise here, is that in the construct space all groups look essentially the same. In other words, it asserts that there are no innate differences between groups of individuals defined via certain potentially discriminatory characteristics. This axiom of fairness appears implicitly in much of the literature on statistical discrimination and disparate impact.

There is an alternate interpretation of this axiom: the groups are not equal, but for the purposes of the decision-making process they should be treated as if they were. Under this interpretation, any difference in the groups' performance (e.g., academic achievement) is due to factors outside their individual control (e.g., the quality of their neighborhood school) and should not be taken into account in the decision-making process [18]. This interpretation has the same mathematical outcome as assuming the equality of groups to be true, and thus we will refer to a single axiom covering both interpretations.

Axiom 3.2 (We're all equal (WAE)). Let CS = (X, d_X) with measure µ_X be partitioned into groups X_1, ..., X_k. There exists some ε > 0 such that for all i, j,

$$ W_{d_X}(X_i, X_j) < \varepsilon. $$

Example: WAE in a College Admissions Decision. In the college admissions setting, WAE asserts that all groups have almost the same distribution in the construct space of intrinsic abilities, such as grit or intelligence, chosen as important inputs to the decision-making process. In the SAT example given above, this would mean assuming that the structural bias of these scores apparent in the observed space is not representative of a distributional difference in the construct space.

It is useful to note that WAE is a property of the construct space, whereas WYSIWYG describes the relation between the construct space and the observed space. Note also that the definition of structural bias does not itself assume WAE; in fact, there could be structural bias that acts in addition to existing true differences between groups in the construct space, further separating the groups in the observed space. However, because of the lack of knowledge about the construct space when assuming the existence of structural bias, WAE will often be assumed in practice under the structural bias worldview.
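On finite samples from a hypothetical construct space, Axiom 3.2 becomes a direct check of pairwise Wasserstein distances between groups. Here is a sketch for one-dimensional samples (where the optimal coupling simply matches sorted values); the group samples and the tolerance ε are invented:

```python
import numpy as np

def w1_1d(a, b):
    """1-D Wasserstein-1 between equal-size samples via sorted matching."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

def satisfies_wae(groups_in_cs, eps):
    """Check Axiom 3.2: every pair of groups is within eps in
    Wasserstein distance in the (hypothetical) construct space."""
    gs = list(groups_in_cs)
    return all(w1_1d(gs[i], gs[j]) < eps
               for i in range(len(gs)) for j in range(i + 1, len(gs)))

rng = np.random.default_rng(5)
same = [rng.normal(0, 1, 300) for _ in range(3)]   # identically distributed
shifted = same[:2] + [rng.normal(1.5, 1, 300)]     # one group shifted
print(satisfies_wae(same, eps=0.2))     # True (sampling noise is small)
print(satisfies_wae(shifted, eps=0.2))  # False: the shift exceeds eps
```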
3.4 Comparing Worldviews

While we introduce these two axioms as different worldviews or belief systems, they can also be strategic choices. Whatever the motivation (which is ultimately mathematically irrelevant), the choice of axiom is critical to a decision-making process. The chosen axiom determines what fairness means by giving enough structure to the construct space, or to the mapping between the construct space and observed space, to enforce fairness despite a lack of knowledge of the construct space. We discuss the subtleties of the axiomatic choice here and return to the enforcement of fairness based on this choice in the next section.

Example: Choice of WYSIWYG or WAE for a College Admissions Decision. There are many reasonable ways that a college admissions office might set up a fair decision-making procedure. These include different choices of which features to include in the construct space and might indicate different fairness goals. They would also necessitate different axiomatic choices.

1. One decision-making philosophy might be that the college should admit only those students who have reached a high level of achievement and demonstrated intelligence at the time of admission. The construct space in this example might include potential, intelligence, and diligence at the time of the admissions decision.
   (a) If the admissions office believes that their observed space features accurately represent the construct space features, this scenario aligns with the WYSIWYG axiom.
   (b) If the admissions office believes that any systemic group differences in the observed space are inaccuracies (e.g., potentially due to culturally inflected exam questions), this scenario follows the WAE axiom.

2. Another decision-making philosophy might be that the college should focus on admitting those with high innate potential, regardless of the social environment and life experiences that may have shaped that potential. In this case, the construct space might include potential, intelligence, and diligence at birth. As above, the admissions office could choose to believe either the WYSIWYG or the WAE axiom.

3. A third decision-making philosophy might be that the college admissions process should serve as a social equalizer, so that, e.g., applicants from different class backgrounds are admitted at approximately the same rate. Since the construct space is the space of features used for ideal decision-making, in this case potential, intelligence, and diligence might be assumed to represent an idealized belief about the characteristics and abilities an individual would have were their class background equalized. (Some may believe that this case is the same as representing these qualities at the time of birth.) This would be a choice to follow the WAE axiom.

The choice of worldview is heavily dependent on the specific attributes and task considered, and on the algorithm designer's beliefs about how observations of these attributes align with the associated constructs. Roemer identifies the goal of such choices as ensuring that negative attributes due to an individual's circumstances of birth or to random chance are not held against them, while individuals remain accountable for their effort and choices [18]. He suggests that differences in worldview can be attributed to when in an individual's development the playing field should be leveled, and after what point an individual's own choices and effort should be taken into account. In our decision-making formulation, the decision about "when" amounts to a decision about which axiom to believe at the point in time the decision is made. If the decision is being made while the playing field should be leveled, then the we're all equal axiom should be assumed. If the decision is being made when only an individual's own efforts should be included in the decision, then the WYSIWYG axiom may be the right choice.

3.5 Mechanisms

We can think of the axioms as assumed relationships between the construct space and the observed space (or as operating within the construct space), and fairness definitions as desirable outcomes (executions of tasks) that reflect these relationships. A mechanism is then a constructive expression of a definition: a mapping (or set of mappings) from OS to DS that allows the definition to be satisfied. In effect, a well-designed mechanism working from a specific set of axioms should yield a fair outcome.
Formally, a mechanism is a mapping f : OS → DS that satisfies certain properties. First and foremost, a mechanism should be nontrivial. For example, if the decision space is {0, 1} (e.g., for binary classification), a mechanism that assigned 0 to every point would be trivial.

Definition 3.8 (Richness). A mechanism f : OS → DS is rich if for each d ∈ DS, f⁻¹(d) ≠ ∅.

There are then two types of mechanisms that (we will show) provide guarantees under the two different worldviews described above: individual fairness mechanisms (aiming to guarantee fairness) and group fairness mechanisms (aiming to guarantee non-discrimination).

Definition 3.9 (Individual fairness mechanism (IFM)). Fix a tolerance ε. A mechanism IFM_ε is a rich mapping f : OS → DS such that ρ_f ≤ ε.

The individual fairness mechanism asserts that the decision-making process treats people similarly if they are close, and may treat them differently if they are far, in the observed space.

Definition 3.10 (Group fairness mechanism (GFM)). Let X be partitioned into groups X_1, X_2, ... as before. A rich mapping f : OS → DS is a valid group fairness mechanism if all groups are treated equally. Specifically, fix ε. Then f is a GFM_ε if for any i, j, W_{d_O}(X_i, X_j) ≤ ε.

The group fairness mechanism asserts that the decision mechanism should treat all groups the same way. The doctrine of disparate impact is an example of such an assertion (although the precise measure of the degree of disparate impact, as measured by the 4/5 rule, is different). In the following sections we explore whether and how these types of mechanisms actually guarantee fairness under certain axiomatic assumptions.

4 Making fair or non-discriminatory decisions

With the basic vocabulary in place, we can now ask when fairness is possible. An easy first observation is that under WYSIWYG, we can always be fair.

Theorem 4.1. Under WYSIWYG with error parameter δ, an IFM_{δ′} will guarantee (ε, ε′)-fairness for some function f such that ε′ = f(δ, δ′).

Proof. Two points in the observed space at distance d have a distance in the construct space between d − δ and d + δ. Applying the mechanism IFM_{δ′} yields decision elements at a distance (in DS) between d − δ − δ′ and d + δ + δ′. Setting ε′ appropriately yields the claim. ∎
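The proof of Theorem 4.1 is just a triangle inequality on distortions, which is easy to verify numerically. In this sketch (with invented data and noise levels), the measured end-to-end distortion between CS and DS never exceeds the sum of the two per-stage distortions:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(6)

cs = rng.normal(size=(60, 2))                  # construct space points
obs = cs + rng.uniform(-0.01, 0.01, cs.shape)  # WYSIWYG: small distortion
ds = obs + rng.uniform(-0.01, 0.01, cs.shape)  # an IFM: small distortion

delta = np.max(np.abs(pdist(cs) - pdist(obs)))        # CS-to-OS distortion
delta_prime = np.max(np.abs(pdist(obs) - pdist(ds)))  # OS-to-DS distortion
end_to_end = np.max(np.abs(pdist(cs) - pdist(ds)))    # CS-to-DS distortion

# |d_CS - d_DS| <= |d_CS - d_OS| + |d_OS - d_DS| for every pair,
# so the end-to-end distortion is bounded by delta + delta'.
print(end_to_end <= delta + delta_prime)  # True
```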
The requirement that we have an individually fair mechanism turns out to be important. We start with some background. Fix CS. Assume that we have two groups a, b, so each point p ∈ CS has a label given by ℓ : CS → {a, b}. Let φ : CS → OS be the method by which the features of p are "observed." For simplicity, we assume this map is bijective, so for each q ∈ OS there exists φ⁻¹(q) = p ∈ CS. We will abuse notation and denote the (group) label of q ∈ OS by ℓ(q) = ℓ(φ⁻¹(q)). Let d_OS be a metric on OS. For a set of points P ⊂ OS, the diameter of P is Δ(P) = max_{x,y ∈ P} d_OS(x, y). Consider an arbitrary f.

Theorem 4.2. Under WYSIWYG with parameter ε, for any δ, δ′ < 1 and a rich mechanism f : OS → DS where the decision space is discrete (DS = {0, 1, 2, ..., k}), f is not (δ − ε, δ′)-fair.

Proof. Fix the metric d(x, y) = 1_{x ≠ y} on DS. Let B = B_r(x) be a ball of radius r centered at x ∈ OS. We will say that B is monochromatic if all points in B have the same image under f. Let r₀ be the smallest value of r such that B_{r₀}(x) is not monochromatic; if no such r₀ exists, then f cannot be rich. Consider the annulus Δ = B_{r₀}(x) \ B_{r₀−δ}(x), and let B_{δ/2}(y) ⊆ Δ be some ball that is not monochromatic (such a ball must exist since f is injective). Pick two points p, q ∈ B_{δ/2}(y) that have different images under f. But they are at most δ apart! Any bijection φ from CS to OS that preserves this distance will thus ensure that there are two points φ⁻¹(p) and φ⁻¹(q) that are within distance δ − ε but have a distance of 1 > δ′ in DS. ∎

The essence of the above argument is that a discrete decision space disallows a fair mechanism, and so precludes fairness.

4.1 Non-discriminatory decisions are possible

Demographic parity, the disparate impact four-fifths rule, and other measures quantifying the similarity of the outcomes that groups receive in the decision space are prevalent, and many group fairness mechanisms attempt to guarantee good outcomes under these measures. We will show that such group fairness mechanisms guarantee non-discriminatory decisions. Recall from Definition 3.7 that non-discriminatory decisions guarantee a lack of group skew in the mapping between the construct space and the decision space; i.e., the goal of non-discrimination is to ensure that the process of decision-making does not vary based on group membership. Given that it is not possible to directly measure the construct space, or the mapping between the construct space and decision space, without assuming the WYSIWYG axiom, these group fairness mechanisms attempt to ensure non-discrimination through measurements of the decision space.

Do these group fairness mechanisms succeed? Not at first glance. Suppose we have two groups X_1, X_2 in CS that are far apart, i.e., W(X_1, X_2) is large, and suppose also that they are correspondingly far apart in their performance on the task. Suppose that, because of structural bias, the images of these two groups in the observed space OS are even further apart, while the distribution of task performance within each group stays the same. A group fairness mechanism applied to the observed space will then move these groups, on the whole, to the same smaller portion of the decision space, so that they receive decisions indicating that they are, on the whole, equal with respect to the task. (Suppose again that the individuals within each group are mapped similarly with respect to each other and the task.)

Is this decision process non-discriminatory? No. While the within-group distortion remains the same between the construct space and the decision space, the between-group distortion will be as large as the separation between X_1 and X_2 in the construct space. Intuitively, we can see this as discriminatory towards the group that performs better with respect to the task in the construct space, since its members are, as a group, receiving worse decisions than less skilled members of the other group (i.e., there has been group skew in their group's mapping to the decision space).

Yet these group fairness mechanisms are in common practice. Why? First, let's review the assumptions of this scenario. If the WYSIWYG axiom is assumed, then guaranteeing fairness is easily achievable, so here we are interested in what to do when the WYSIWYG axiom is not assumed.
Specifically, let's assume that we are worried about the existence of structural bias: group skew in the mapping between the construct space and the observed space. In this scenario, it may make sense to assume the we're all equal axiom. In fact, as we now show, when the we're all equal axiom is assumed, group fairness mechanisms can be shown to guarantee non-discrimination.

Theorem 4.3 (Group fairness mechanisms guarantee non-discrimination). Under WAE, a GFM with parameter ε′ guarantees (max(ε, ε′)/δ)-nondiscrimination.

Proof. WAE ensures that in the construct space CS, all groups are within distance ε of each other under W_d. Similarly, a group fairness mechanism ensures that in the decision space DS, all groups are within distance ε′ of each other. Consider now the between-groups distance ρ_b between CS and DS. Since all groups are within ε of each other in CS and within ε′ in DS, each term in the integral that computes GW is upper bounded by max(ε, ε′). By construction, the within-group distance ρ_w is lower bounded by the noise parameter δ. Thus, the overall group skew σ is upper bounded by max(ε, ε′)/δ. ∎

Note that this guarantee of non-discrimination holds even under the structural bias worldview; i.e., the theorem makes no assumptions about the mapping from the construct space to the observed space. With this theorem, we now have both an axiom and a mechanism under which fairness can be achieved, and a corresponding axiomatic assumption and mechanism under which non-discrimination can be achieved.

4.2 Conflicting worldviews necessitate different mechanisms

As we have shown in this section, under the WYSIWYG worldview fairness can be guaranteed, while under a structural bias worldview non-discrimination can be guaranteed. Are these worldviews fundamentally conflicting, or do mechanisms exist that can guarantee fairness or non-discrimination under both worldviews? Unfortunately, as discussed above, WYSIWYG appears to be crucial to ensuring fairness: if, for example, there is structural bias in the decision pipeline, no mechanism can guarantee fairness. Fairness can only be achieved under the WYSIWYG worldview using an individual fairness mechanism, and using a group fairness mechanism will be unfair within this worldview.

What about non-discrimination? Unfortunately, a simple counterexample again shows that these mechanisms are not agnostic to worldview. While group fairness mechanisms were shown to achieve non-discrimination under a structural bias worldview and the we're all equal axiom, if structural bias is assumed, applying an individual fairness mechanism will cause discrimination in the decision space whether the we're all equal axiom is assumed or not. Consider again the two groups X_1, X_2 in CS with large W(X_1, X_2), and again suppose that the images of these two groups in the observed space OS are even further apart, while the distribution of task performance in each group stays the same. Now apply an individual fairness mechanism to this observed space. The resulting decision space contains a large between-group distortion, since the group that performed better with respect to the task in the construct space will have received, on the whole, much better decisions than their original skill relative to the other group warrants. These decisions will thus be discriminatory.
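To make the contrast between the two mechanisms concrete, here is a toy simulation: the construct space satisfies WAE, the observation process shifts one group (structural bias), an identity map stands in for an IFM, and per-group recentring stands in for a GFM. All modeling choices are illustrative, not the paper's construction:

```python
import numpy as np

def w1_1d(a, b):
    """1-D Wasserstein-1 between equal-size samples via sorted matching."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

rng = np.random.default_rng(7)
g = np.repeat([0, 1], 300)
cs = rng.normal(size=600)            # WAE: groups identical in CS
obs = cs - 1.5 * (g == 1)            # structural bias against group 1

ifm = obs.copy()                     # an IFM: preserves all OS distances
# A toy GFM: recentre each group so groups get similar score distributions.
gfm = obs - np.where(g == 1, obs[g == 1].mean(), obs[g == 0].mean())

for name, ds in [("IFM", ifm), ("GFM", gfm)]:
    print(f"{name}: between-group gap in DS = {w1_1d(ds[g == 0], ds[g == 1]):.2f} "
          f"(gap in CS = {w1_1d(cs[g == 0], cs[g == 1]):.2f})")
```

The IFM faithfully carries the biased gap into the decision space (large group skew between CS and DS), while the GFM returns the groups to the similar distributions they had in the construct space.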
The choice of mechanism must thus be tied to an explicit choice of worldview. Under a WYSIWYG worldview, only individual fairness mechanisms achieve fairness (and group fairness mechanisms are unfair). Under a structural bias worldview, only group fairness mechanisms achieve non-discrimination (and individual fairness mechanisms are discriminatory).

5 Analyzing Related Work

This section serves partly as a review of the literature on fairness. But it also serves as a form of "empirical validation" of our framework, in that we use our new formalization of what fairness and non-discrimination mean, and of the underlying assumptions needed when building fair mechanisms, to reconsider previous work within this framework. Broadly, we find that previous work in fairness-aware algorithms either (i) adopts the WYSIWYG worldview and guarantees fairness while assuming the WYSIWYG axiom, or (ii) adopts the structural bias worldview and guarantees non-discrimination while assuming the we're all equal axiom. A full survey of such work can be found in [19, 23]. Here, we describe some interesting representative works from each of the worldviews.

5.1 WYSIWYG Worldview

One foundational work that adopts the WYSIWYG worldview is Dwork et al. [8]. The definition of fairness they introduce is similar to (and inspired) ours: they are interested in ensuring that two individuals who are similar receive similar outcomes. The difference from our definition is that they consider outcome similarity according to a distribution of outcomes for a specific individual. Dwork et al. emphasize that determining whether two individuals are similar with respect to the task is critical, and assume that such a metric is given to them. In light of the formalization of the construct space and observed space, we add the understanding that the metric discussed by Dwork et al. is the distance in the construct space. In our framework, this metric is not knowable unless WYSIWYG is assumed (or the specific mapping between the construct space and observed space is otherwise provided), so we classify this work as adopting the WYSIWYG worldview.

Additionally, Dwork et al. [8] show that when the earthmover distance between the distributions of attributes conditioned on protected class status is small, their notion of fairness implies non-discrimination (which they measure as statistical parity, a ratio of one between the positive outcome probability for the protected class and that for the non-protected class). Thus, they show that under an assumption similar to the WYSIWYG axiom, if an assumption similar to the we're all equal axiom is also made, then group fairness mechanisms guarantee fairness. Note that this special case is unusual, since both axiomatic assumptions are made. A follow-up work by Zemel et al. [25] attempts to bridge the gap between these worldviews by adding a regularization term to attempt to enforce statistical parity as well as fairness.

Interestingly, some of the examples in Dwork et al. [8] arguing that a particular form of non-discrimination measure ("statistical parity") is insufficient for guaranteeing fairness make an additional subtle assumption about which spaces are involved in the decision-making process.
Their model implicitly assumes that there could be both an observed decision space and a true decision space (a scenario common in the differential privacy literature), while our framework assumes only a single truly observable decision space (as is more common in the machine learning literature). One example issue they introduce is the "self-fulfilling prophecy," in which, for example, an employer purposefully brings in under-qualified minority candidates for interviews (the observed decision space) so that no discrimination is found at the interview stage, but since the candidates were under-qualified, only white applicants are eventually hired (the true decision space). Under our framework, only the final decisions about whom to hire make up the single decision space, and so the discrimination in the decision is detected.

Another type of fairness definition is based on the amount of change in an algorithm's decisions when the input or training data is changed. Datta et al. [4] consider ad display choices to be discriminatory if changing the protected class status of an individual changes which ads they are shown. Fish et al. [11] consider a machine learning algorithm to be fair if it can reconstruct the original labels of training data when noise has been added to the labels of anyone from a given protected class. Both of these definitions make the implicit assumption that the remaining training data (that which is not the protected class status or the label) is the correct data to use to make the decision. This is exactly the WYSIWYG axiomatic assumption.

A recent work by Joseph et al. [12] also contributes a new fairness definition, akin to those introduced in this paper and by Dwork et al., that aims to ensure that worse candidates are never accepted over better candidates, as measured with respect to the task. Their goal is to take these measurements within the construct space, with unknown per-group functions mapping from the construct space to the observed space, and they aim to learn these per-group functions. Thus, although their fairness goal focuses on fairness at an individual level, this work serves as a bridge to the structural bias worldview by recognizing that different groups may receive different mappings between the construct space and the observed space.

5.2 Structural Bias Worldview

The field of fairness-aware data mining began with examinations of how to ensure non-discrimination in the face of structural bias. These group fairness mechanisms often implicitly assume the we're all equal axiom and, broadly, share the goal of ensuring that the distributions of classification decisions, conditioned on a person's protected class status, are the same for historically disadvantaged groups as they are for the majority. The underlying implicit goal in many of these papers and associated discrimination measures is non-discrimination as we have defined it in this paper: a decision-making process that is based on an individual's attributes in the construct space and that does not have group skew in its mapping to the decision space.

The particular formulation of the group fairness mechanism goal has taken many forms. Let Pr[C = YES | G = 0] be the probability that people in the minority group receive a positive classification, and Pr[C = YES | G = 1] the corresponding probability for the majority group. Much previous work has considered the goal of achieving a low discrimination score [3, 13, 14, 20, 25], where the discrimination score is defined as

$$ \Pr[C = \text{YES} \mid G = 1] - \Pr[C = \text{YES} \mid G = 0]. $$

Since the goal is to bring this difference close to zero, the assumption is that groups should, as a whole, receive similar outcomes. This reflects an underlying assumption of the we're all equal axiom, so that similar group outcomes will be non-discriminatory.

Previous work [10] has also created group fairness mechanisms with the goal of ensuring that decisions are non-discriminatory under the disparate impact four-fifths ratio, a U.S. legal notion with an associated measure advocated by the E.E.O.C. [22]. Work by Zafar et al. [24] has used a related definition that is easier to optimize. The disparate impact four-fifths measure looks at the ratio of the protected class-conditioned probability of receiving a positive classification to the majority class's probability:

$$ \Pr[C = \text{YES} \mid G = 0] \,/\, \Pr[C = \text{YES} \mid G = 1]. $$

Ratios closer to one are considered more fair; i.e., it is assumed that groups should, as a whole, receive similar classifications in order for the result to be non-discriminatory. Again, this shows that the we're all equal axiom is being assumed in this measure.
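Both measures are simple functions of the group-conditioned positive-classification rates; here is a sketch with a toy label vector (the data is invented for illustration):

```python
import numpy as np

def discrimination_score(yhat, g):
    """Pr[C = YES | G = 1] - Pr[C = YES | G = 0]; zero is best."""
    return yhat[g == 1].mean() - yhat[g == 0].mean()

def disparate_impact(yhat, g):
    """Pr[C = YES | G = 0] / Pr[C = YES | G = 1]; the EEOC
    four-fifths rule flags ratios below 0.8."""
    return yhat[g == 0].mean() / yhat[g == 1].mean()

yhat = np.array([1, 1, 0, 1, 0, 1, 1, 1, 0, 0])  # toy positive decisions
g    = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # 0 = minority, 1 = majority
print(discrimination_score(yhat, g))  # 0.6 - 0.6 = 0.0
print(disparate_impact(yhat, g))      # 0.6 / 0.6 = 1.0
```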
Many of these works attempt to ensure non-discrimination by modifying the decision algorithm itself [3, 14], while others change the outcomes after the decision has been drafted. Especially interesting within the context of our definitional framework, some solutions change the input data to the machine learning algorithm before a model is trained [10, 13, 25]. These works can be seen as attempting to reconstruct the construct space and make decisions directly based on that hypothesized reality under the we're all equal assumption.

6 Conclusions

In this paper, we have shown that some notions of fairness are fundamentally incompatible with each other. These results might appear discouraging if one hoped for a universal notion of fairness, but we believe they are important. They force a shift in the focus of the discussion surrounding algorithmic fairness: without precise definitions of beliefs about the state of the world and the kinds of harms one wishes to prevent, our results show that it is not possible to make progress. They also force future discussions of algorithmic fairness to directly consider the values inherent in assumptions about how the observed space was constructed, and those value assumptions should always be made explicit. Although the specific theorems matter, it is the definitions and the problem setup that are the fundamental contributions of this paper. This work represents a first step towards fairness researchers using a shared setting, vocabulary, and assumptions.

7 Acknowledgements

We want to thank the attendees of the Dagstuhl workshop on Data, Responsibly for their helpful comments on an early presentation of this work; special thanks to Cong Yu and Michael Hay for encouraging us to articulate the subtle differences in reasons for choosing a specific worldview, and to Nicholas Diakopoulos and Solon Barocas for pointing us to the relevant work on "constructs" and inspiring our naming of that space. Thanks to Tionney Nix and Tosin Alliyu for generative early conversations about this work.
Thanks also to danah boyd and the community at the Data & Society Research Institute for continuing discussions about the meanings of fairness and non-discrimination in society.

References

[1] R. Agarwala, V. Bafna, M. Farach, M. Paterson, and M. Thorup. On the approximability of numerical taxonomy (fitting distances by tree metrics). SIAM Journal on Computing, 28(3):1073-1085, 1998.
[2] M. Almlund, A. L. Duckworth, J. J. Heckman, and T. D. Kautz. Personality psychology and economics. Technical Report w16822, NBER Working Paper Series, National Bureau of Economic Research, Cambridge, MA, 2011.
[3] T. Calders and S. Verwer. Three naïve Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery, 21:277-292, 2010.
[4] A. Datta, M. C. Tschantz, and A. Datta. Automated experiments on ad privacy settings: A tale of opacity, choice, and discrimination. Proceedings on Privacy Enhancing Technologies, 1:92-112, 2015.
[5] K. Dhamdhere. Approximating additive distortion of embeddings into line metrics. In Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques, pages 96-104. Springer, 2004.
[6] A. L. Duckworth, C. Peterson, M. D. Matthews, and D. R. Kelly. Grit: Perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6):1087-1101, 2007.
[7] A. L. Duckworth and D. S. Yeager. Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44(4):237-251, 2015.
[8] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel. Fairness through awareness. In Proc. of Innovations in Theoretical Computer Science, 2012.
[9] M. Farach, S. Kannan, and T. Warnow. A robust model for finding optimal evolutionary trees. Algorithmica, 13(1-2):155-179, 1995.
[10] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian. Certifying and removing disparate impact. In Proc. of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259-268, 2015.
[11] B. Fish, J. Kun, and A. D. Lelkes. A confidence-based approach for balancing fairness and accuracy. In Proc. of the SIAM International Conference on Data Mining (SDM), 2016.
[12] M. Joseph, M. Kearns, J. Morgenstern, and A. Roth. Fairness in learning: Classic and contextual bandits. In Proc. of Neural Information Processing Systems (NIPS), 2016.
[13] F. Kamiran and T. Calders. Classifying without discriminating. In Proc. of the IEEE International Conference on Computer, Control and Communication, 2009.
[14] T. Kamishima, S. Akaho, and J. Sakuma. Fairness-aware learning through a regularization approach. In Proc. of the Intl. Conf. on Data Mining, pages 643-650, 2011.
[15] M. Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM, 45(6):983-1006, Nov. 1998.
[16] F. Mémoli. Gromov-Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, 11(4):417-487, 2011.
[17] Northpointe. COMPAS: the most scientifically advanced risk and needs assessments. http://www.northpointeinc.com/risk-needs-assessment.
[18] J. E. Roemer. Equality of Opportunity. Harvard University Press, 1998.
[19] A. Romei and S. Ruggieri. A multidisciplinary survey on discrimination analysis. The Knowledge Engineering Review, pages 1-57, April 2013.
[20] S. Ruggieri. Using t-closeness anonymity to control for non-discrimination. Transactions on Data Privacy, 7:99-129, 2014.
[21] M. V. Santelices and M. Wilson. Unfair treatment? The case of Freedle, the SAT, and the standardization approach to differential item functioning. Harvard Educational Review, 80(1):106-134, April 2010.
[22] The U.S. EEOC. Uniform guidelines on employee selection procedures, March 2, 1979.
[23] I. Žliobaitė. A survey on measuring indirect discrimination in machine learning. arXiv preprint arXiv:1511.00148, 2015.
[24] M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Fairness constraints: A mechanism for fair classification. In ICML Workshop on Fairness, Accountability, and Transparency in Machine Learning (FATML), 2015.
[25] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In Proc. of the Intl. Conf. on Machine Learning, pages 325-333, 2013.
