Wrong side of the tracks: Big Data and Protected Categories

Simon DeDeo*
June 27, 2016

*Department of Informatics, School of Informatics and Computing, and Program in Cognitive Science, Indiana University, Bloomington, IN 47408 & Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501. simon@santafe.edu; http://santafe.edu/%7Esimon

Abstract

When we use machine learning for public policy, we find that many useful variables are associated with others on which it would be ethically problematic to base decisions. This problem becomes particularly acute in the Big Data era, when predictions are often made in the absence of strong theories for underlying causal mechanisms. We describe the dangers to democratic decision-making when high-performance algorithms fail to provide an explicit account of causation. We then demonstrate how information theory allows us to degrade predictions so that they decorrelate from protected variables with minimal loss of accuracy. Enforcing total decorrelation is at best a near-term solution, however. The role of causal argument in ethical debate urges the development of new, interpretable machine-learning algorithms that reference causal mechanisms.

"SIR,—I have read your letter with interest; and, judging from your description of yourself as a working-man, I venture to think that you will have a much better chance of success in life by remaining in your own sphere and sticking to your trade than by adopting any other course."
Jude the Obscure, Thomas Hardy (1895)

Should juries reason well? Should doctors? Should our leaders? When the human mind is augmented by machine learning, the question is more subtle than it appears. It seems obvious that, if we gather more information and use proven methods to analyze it, we will make better decisions. The protagonist on television turns to a computer to enhance an image, find a terrorist, or track down the source of an epidemic. In the popular imagination, computers help, and the better the computer and the more clever the programmer, the better the help.

The real story, however, is more surprising. We naturally assume that the algorithms that model our behavior will do so in a way that avoids human bias. Yet as we will show, computer analysis can lead us to make decisions that, without our knowledge, judge people on the basis of their race, sex, or class. In relying on machines, we can exacerbate preexisting inequalities and even create new ones. The problem is made worse, not better, in the big data era—despite and even because of the fact that our algorithms have access to increasingly relevant information. The challenge is a broad one: we face it as ordinary people on juries, experts sought out for our specialized knowledge, and decision makers in positions of power.

To demonstrate this challenge, we will make explicit how we judge the ethics of decision-making in a political context. Through a series of examples, we will show how a crucial feature of those judgments involves reasoning about cause. Causal reasoning is at the heart of how we justify the ways in which we both reward individuals and hold them responsible. Our most successful algorithms, though, do not produce causal accounts. They may not reveal, say, which features of a mortgage applicant's file combined together to lead a bank's algorithm to offer or deny a loan.
Nor do they provide an account of how the individual in question might have come to have those properties, or why those features and not others were chosen to begin with. The ethics of our policies become opaque. In situations such as these, decision makers may unknowingly violate their core beliefs when they follow or are influenced by a machine recommendation.

In some cases, the solution to a moral problem that technology creates is more technology. We will illustrate how, in the near term, a brute-force solution is possible, and present its optimal mathematical form. More promisingly yet, future extensions of work just getting underway in the computer sciences may make it possible to reverse-engineer the implicit morals of an algorithm, allowing for more efficient use of the data we have on hand. We describe two recent advances—contribution propagation and the Bayesian list machine—that may help this goal.

All these methods have their limits. The ethical use of machines may lead to new (short-term) inefficiencies. We may find that more mortgages are not repaid, more scholarships are wasted, and more crimes are not detected. This is a trade-off familiar to democratic societies, whose judicial systems, for example, thrive under a presumption of innocence that may let the guilty go free. Justice and flourishing, even in the presence of inefficiencies, are not necessarily incompatible. To trade short-term gain for more sacred values has, in certain historical periods and for reasons still poorly understood, led to long-term prosperity.

Our discussion will involve questions posed by modern, developed, and diverse societies regarding equality of both opportunity and outcome. We do not take a position on the merits of any particular political program or approach. Rather, our goal is to show how the use of new algorithms can interfere with the ways in which citizens and politicians historically have debated these questions in democratic societies. If we do not understand how machines change the nature of decision-making, we will find ourselves increasingly unable to have and resolve ethical questions in the public sphere.

1 Correlation, Discrimination, and the Ethics of Decision-Making

There are many constraints on the ethics of decision-making in a social context. We begin here with one of the clearest of the modern era: the notion of a protected category. Decisions made on the basis of such categories are considered potentially problematic and have been the focus of debate for over half a century. In the United States, for instance, the Civil Rights Act of 1964 prohibits discrimination on the basis of race, color, religion, sex, and national origin, while Title IX of the Education Amendments of 1972 makes it legally unacceptable to use sex as a criterion in providing educational opportunities.

What appears to be a bright-line rule, however, is anything but. In any society, protected and unprotected categories are strongly correlated. Some are obvious: if I am female, I am more likely to be under five feet tall. Others are less so: depending on my nation of origin, and thus my cultural background, I may shop at particular stores, have distinctive patterns of electricity use, marry young, or be at higher risk for diabetes. Correlations are everywhere, and the euphemistic North American idiom "wrong side of the tracks" gains its meaning from them.
Being north or south of a town's railroad line is an innocent category, but correlates with properties that a society may consider an improper basis for decision-making. What a person considers "wrong" about the wrong side of the tracks is not geography but rather the kind of person who lives there.

In the big data era, euphemisms multiply. Consider, for example, a health care system with the admirable goal of allocating scarce transplant organs to those recipients most likely to benefit. As electronic records, and methods for collecting and analyzing them, become increasingly sophisticated, we may find statistical evidence that properties—diet, smoking, or exercise—correlated with a particular ethnic group give members a lower survival rate. If we wish to maximize the number of person-years of life saved, should we make decisions that end up preferring recipients of a different race?

Such problems generically arise when machine learning is used to select members of a population to receive a benefit or suffer harm. Organ donation is only one instance of what we expect to be a wide portfolio of uses with inherent moral risks. Should we wish to allocate scholarship funding to those students most likely to graduate from college, we may find that including a student's zip code, physical fitness, or vocabulary increases the predictive power of our algorithms. (Note that use of physical fitness does not mean we select students who are necessarily more fit instead of more likely to graduate; rather, we can improve our original selection goal—graduation rate—by use of subtle signals that include physical fitness.) None of these properties are protected categories, but their use in machine learning will naturally lead to group-dependent outcomes, as everything from place of residence to medical care to vocabulary learned in childhood may correlate with a property such as race.

In the case of the allocation of scholarship funds, we may want to exclude some sources of data as being prima facie discriminatory, such as zip code, even when they do not directly correspond to protected categories. Others, though, such as physical fitness or vocabulary, may plausibly signal future success, tracking, say, character traits such as grit [1, 2]. Yet the predictive power of physical fitness or vocabulary may be driven in part by how race or socioeconomic status correlates with both these predictor variables and future success. Data-mining might uncover a relationship between adolescent fitness and future employment success. But this correlation may be induced by underlying mechanisms, such as access to playgrounds, that we may consider problematic influencers of decision-making. An effort to use physical fitness as a signal of mental discipline may lead us to prefer wealthier students solely because physical fitness signals access to a playground, access to a playground signals high socioeconomic status, and high socioeconomic status leads to greater access to job networks.

Put informally, a machine may discover that squash players or competitive rowers have unusual success in the banking profession. But such a correlation may be driven by how exposure to these sports correlates with membership in sociocultural groups that have traditionally dominated the profession—not some occult relationship between a squash serve and the ability to assess the value of a trade agreement.
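To see how quickly a formally "blind" rule can produce group-dependent outcomes, consider the following toy simulation. It is not from the paper: the population, the correlation strengths, and the zip-code-style proxy ("north side") are all invented for illustration.

```python
# An invented toy simulation (not from the paper) of the "wrong side of the
# tracks" effect: the protected attribute never enters the selection rule,
# yet selection rates still differ by group, because an innocent-looking
# proxy (here, which side of the tracks a person lives on) correlates with it.
import random

random.seed(0)

def person():
    group = random.choice(["a", "b"])                  # protected category
    # Hypothetical mechanism: group membership shifts where a person lives,
    # and (through unmodeled causes) the outcome we care about.
    north_side = random.random() < (0.8 if group == "a" else 0.3)
    graduates = random.random() < (0.7 if north_side else 0.4)
    return group, north_side, graduates

population = [person() for _ in range(100_000)]

# A selection rule that never looks at the protected attribute:
# award the scholarship to anyone living on the north side.
selected = [(g, n, y) for (g, n, y) in population if n]

for g in "ab":
    total = sum(1 for grp, _, _ in population if grp == g)
    picked = sum(1 for grp, _, _ in selected if grp == g)
    print(f"group {g}: selected {picked / total:.2%} of members")
# Despite being formally blind to group, the rule selects group a
# far more often than group b.
```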
2 Reasoning about Causes

One solution to the problems of the previous section is to consider all measurable properties guilty until proven innocent. In this case, we base decision-making only on those properties for which a detailed causal mechanism is known, and known to be ethically neutral. A focus on the causal mechanisms that play a role in decision-making is well known; to reason morally, particularly in the public sphere, is to invoke causation. (See, for example, the essay [3] or the review [4], and references therein, for the role of causality in legal reasoning; more broadly, a key reason for the analysis of causation in general is its role in ethical concepts such as responsibility [5].)

For example, if we wished to use physical fitness in scholarship deliberations, we would begin by proposing a causal narrative: how a student's character could lead them to a desire to accomplish difficult tasks through persistence; how this desire, in the right contexts, could cause them to train for a competitive sport; and how this training would cause them to improve in quantitative measures of physical fitness. We would be alert for signs that our reasoning was at fault—if excellence at handball is no more or less a signal of grit than excellence at squash, we should not prefer the squash player from Manhattan to the handball player from the Bronx.

This kind of reasoning, implicit or explicit, is found almost everywhere people gather to make decisions that affect the lives of others. Human-readable accounts of why someone failed to meet the bar for a scholarship, triggered a stop and frisk, or was awarded a government contract are the bread and butter for ethical debates on policy. These usually demand we explain both the role different perceptions played in the decision-making process (i.e., on what basis the committee made the decision it did) and the causal origin of the facts that led to those perceptions (i.e., how it came about that the person had the qualities that formed the basis of that decision).

Even if we agreed on a mechanism connecting a student's character and their physical fitness, we might be concerned with the causal role played by, say, socioeconomic status: a student's character may lead them to the desire to accomplish difficult tasks, but their socioeconomic status may rule out access to playgrounds and coaching. Should we agree on this new causal pathway, it might lead us to argue against the use of physical fitness, or to use it to make decisions only within a particular socioeconomic category.

3 Clash of the Machines

The demand for a causal account of both the origin of relevant facts and their use is at the core of the conflict between ethical decision-making and the use of big data. This is because the algorithms that use these data do not make explicit reference to causal mechanisms. Instead, they gain their power from the discovery of unexpected patterns, found by combining coarse-grained properties in often uninterpretable ways.

"Why the computer said what it did"—why one candidate was rated higher than another, for example—is no longer clear. On the one side, we have the inputs: data on each candidate. On the other side, we have the output: a binary classification (good risk or bad risk), or perhaps a number (say, the probability of a mortgage default).
No human, however, wires together the logic of the intermediate step; no human dictates how facts about each candidate are combined together mathematically ("divide salary by debt," say) or logically ("if married and under twenty-five, increase risk ratio"). The programmer sets the boundaries: what kinds of wirings are possible. But they allow the machine to find, within this (usually immeasurably large) space, a method of combination that performs particularly well. The method itself is usually impossible to read, let alone interpret. When we do attempt to represent it in human-readable form, the best we get is a kind of spaghetti code that subjects the information to multiple parallel transformations or unintuitive recombinations, and even allows rules to vote against each other or gang up in pairs against a third.

Meanwhile, advances in machine learning generally amount to discovering particularly fertile ways to constrain the space of rules the machine has to search, or in finding new and faster methods for searching it. They often take the form of black magic: heuristics and rules of thumb that we stumble on, and that have unexpectedly good performance for reasons we struggle to explain at any level of rigor. As heuristics are stacked on top of heuristics, the impact of these advances is to make the rules more tangled and harder to interpret than before. (This poses problems beyond the ethics of decision-making; the ability of high-powered machines to achieve increased accuracy at the cost of intelligibility also threatens certain avenues of progress in science more generally [6].)

As described by Ref. [7], the situation for the ethicist is further complicated because the volume of data demands that information be stripped of context and revealing ambiguity. Quite literally, one's behavior is no longer a signal of the content of one's character. Even if we could follow the rules of an algorithm, interpretation of the underlying and implicit theory that the algorithm holds about the world becomes impossible.

Because of the problem of euphemism, eliminating protected categories from our input data cannot solve the problem of interpretability. It is also true that knowledge of protected categories is not in itself ethically problematic and may in some cases be needed. It may aid us not only in the pursuit of greater fairness but also in the pursuit of other, unrelated goals. The diagnosis of diseases with differing prevalence in different groups is a simple example.

Consider the organ transplant case, and a protected category {a, b}. Individuals of type a may be subject to one kind of complication, individuals of type b equally subject to a different kind. Given imprecise testing, knowledge of an individual's type may help in determining who are the best candidates from each group, improving survival without implicit reliance on an ethically problematic mechanism.

In other cases, fairness in decision-making might suggest we take into account the difficulties a candidate has faced. Consider the awarding of state scholarships to a summer camp. Of two candidates with equal achievements, we may wish to prefer the student who was likely to have suffered racial discrimination in receiving early opportunities. To do this rebalancing, we must, of course, come to learn the candidate's race.
Even when fairness is not an issue, knowledge of protected categories may aid decision makers well beyond the medical case described above. An undergraduate admissions committee for an engineering school might wish to rank a high-performing female applicant above a similarly qualified male, not out of a desire to redress wrongs or achieve a demographically balanced group, but simply because her achievements may be the more impressive for having been accomplished in the face of overt discrimination or stereotype threat [8]. A desire to select candidates on the basis of a universal characteristic (in this case, perhaps grit; see discussion above) is aided by the use of protected information.

In the organ transplant case, the knowledge of correlations may be sufficient. I need not know why group a and group b have the difference they do—only that I can gain predictive knowledge of their risk profiles by distinct methods. In the other two cases, however, discussions about how, when, and why to draw back the veil of ignorance [9] lead us to conversations about the causes and mechanisms that underlie inequities and advantages.

In sum, it is not just that computers are unable to judge the ethics of their decision-making. It is that the way these algorithms work precludes their analysis in the very human ways we have learned to judge and reason about what is, and is not, just, reasonable, and good.

4 Algorithmic Solutions

We find ourselves in a quandary. We can reject the modern turn to data science, and go back to an era when statistical analysis relied on explicit causal models that could naturally be examined from an ethical and political standpoint. In doing so, we lose out on many of the potential benefits that these new algorithms promise: more efficient use of resources, better aid to our fellow citizens, and new opportunities for human flourishing. Or, we can reject this earlier squaring of moral and technocratic goals, accept that machine-aided decision-making will lead to discrimination, and enter a new era of euphemisms, playing a game, at best, of catch as catch can, and banning the use of these methods when problems become apparent. Neither seems acceptable, although ethical intuitions urge that we trade utility (the unrestricted use of machine-aided inference) for more sacred values such as equity (concerns with the dangers of euphemism).

Some progress has been made in resolving this conflict. Researchers have begun to develop causal accounts of how algorithms use input data to classify and predict. The contribution propagation method introduced by Ref. [10] and the Bayesian list machines of Ref. [11] are two recent examples that allow us to see how the different parts of an algorithm's input are combined to produce the final prediction.

Contribution propagation is most naturally applied to the layered processing that happens in so-called deep learning systems. (Many machine-learning algorithms are designed to classify incoming data: mortgage applicants, for example, by degree of risk. They work by finding patterns in past data, and using the relative strengths of these patterns to classify new data of unknown type. "Deep learning" combines these two steps, simultaneously learning patterns and how they combine to produce the property of interest.) As a deep learning system passes data through a series of modules, culminating in a final prediction, contribution propagation allows us to track which features of the input data, in any particular instance, lead to the final classification.
In the case of image classification, for example, it can highlight regions of a picture of a cat that led it to be classified as a cat (perhaps the ears, or forward-facing eyes); in this way, it allows a user to check to make sure the correct features of the image are being used. Contribution propagation can identify when an image is being classed as a cat solely on the basis of irrelevant contexts, such as the background featuring an armchair—in which cats are often found. This makes contribution propagation a natural diagnostic for the "on what basis" problem: which features were used in the decision-making process. Applied to a complex medical or social decision-making problem, it could highlight the relevant categories, and whether they made a positive or negative contribution to the final choice.

The Bayesian list machine takes a different approach, and can be applied to algorithms that rely on so-called decision trees. A decision tree is a series of questions about properties of the input data ("is the subject over the age of twenty-one?"; "does the subject live on the south side?") that result in a classification ("the subject is high risk"). Because the trees are so complex, however, with sub-trees of redundant questions, it can be hard to interpret the model of the world that the algorithm is using. The Bayesian list machine circumvents this problem by "pruning" trees to find simple decision rules that are human readable.
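To make the "on what basis" idea concrete, here is a sketch of the kind of human-readable rule list such methods aim to produce. It is not the algorithm of Ref. [11]; the fields, thresholds, and rules are invented for illustration.

```python
# A toy illustration (not the method of Ref. [11]) of a human-readable rule
# list: an ordered set of if-then rules that classify an applicant and report
# the rule that fired, so the "on what basis" question has a direct answer.
# All field names and thresholds below are hypothetical.

def rule_list_classify(applicant):
    """Return (label, reason) for a single applicant dictionary."""
    rules = [
        (lambda a: a["prior_defaults"] >= 2, "high risk",
         "two or more prior defaults"),
        (lambda a: a["debt"] > 0.5 * a["income"], "high risk",
         "debt exceeds half of income"),
        (lambda a: a["years_employed"] >= 3, "low risk",
         "three or more years of steady employment"),
    ]
    for condition, label, reason in rules:
        if condition(applicant):
            return label, reason
    return "low risk", "no rule fired; default label"

# The returned reason states the basis of the decision in terms a committee,
# or the applicant, can read and contest.
applicant = {"prior_defaults": 0, "debt": 30_000, "income": 50_000,
             "years_employed": 1}
label, reason = rule_list_classify(applicant)
print(label, "-", reason)   # high risk - debt exceeds half of income
```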
Neither advance can solve the full problem: simply knowing how the variables were combined does not provide an explanation for why the variables were combined in that fashion. While both methods allow us to look inside previously opaque black boxes, they leave us uncertain of the causal mechanisms that led to this or that combination of variables being a good predictor of what we wish to know. We may know "on what basis" the decision was made, but not how it came about that the individuals in question met the criteria.

Technical advances may go a long way to solving this second, more pressing problem. A long tradition in artificial intelligence research relied on building explicit models of causal interactions [12]. Frameworks such as Pearl causality [13] attempt to create, in a computer, a mental model of the causes in the world expressed graphically, as a network of influences. This is, in spirit, similar to an earlier attack on the artificial intelligence problem—one described as "good old fashioned" artificial intelligence, or GOFAI, by John Haugeland [14]. GOFAI approaches try to make intelligent machines by mimicking a particular style of thinking—the representation of thoughts in a mathematical syntax, and their manipulation according to a fixed and internally consistent set of rules. The causal networks of Judea Pearl can be "read" by a human, and, in a logically consistent fashion, their causal language can be used in moral explanations.

Pearl causality has had widespread influence in the sciences. But it does not yet play a role in many of the machine-learning algorithms, such as deep learning or random forests, in widespread use for monitoring and prediction today. One day, a framework such as this could provide a "moral schematic" for new algorithms, making it possible for researchers, policy makers, and citizens to reason ethically about their use. That day has not yet arrived.
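Even so, it is worth seeing in miniature what a human-readable causal model looks like. The sketch below (not from the paper, and far short of Pearl's formalism [13]) writes the Section 2 scholarship story as an explicit causal graph; every node and edge is a hypothetical illustration. The point is only that a model in this form can be read, and argued with, by a human.

```python
# A minimal, hypothetical causal graph for the scholarship example of
# Section 2. Edges point from cause to effect. This is an illustration of
# readable causal structure, not an implementation of Pearl's calculus.

GRAPH = {
    "character":            ["trains_for_sport", "employment_success"],
    "socioeconomic_status": ["playground_access", "job_networks"],
    "playground_access":    ["trains_for_sport"],
    "trains_for_sport":     ["physical_fitness"],
    "job_networks":         ["employment_success"],
    "physical_fitness":     [],
    "employment_success":   [],
}

def ancestors(graph, target):
    """Return every node with a directed path into `target`."""
    parents = {node: [] for node in graph}
    for node, children in graph.items():
        for child in children:
            parents[child].append(node)
    found, stack = set(), [target]
    while stack:
        for p in parents[stack.pop()]:
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

# Physical fitness sits downstream of both the trait we may want to reward
# (character) and the one we may not (socioeconomic status); the graph makes
# that ethically relevant fact explicit and open to debate.
print(sorted(ancestors(GRAPH, "physical_fitness")))
# ['character', 'playground_access', 'socioeconomic_status', 'trains_for_sport']
```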
5 Information Theory and Public Policy

In the absence of causal models that allow us to discuss the moral weight of group-dependent outcomes, progress is still possible. In this section, we will show how to encode a simpler goal: that decisions made by decision makers do not correlate with protected variables at all. One might informally describe such a system as "outcome equal." Correctly implemented, our solution completely de-correlates category and outcome. Imagine you have heard that a person received a scholarship; using the outcome-equal solution we present here, this fact would give you no knowledge about the race (or sex, or class, as desired) of the individual in question. Whether such a goal is desirable or not in any case is, and has been, constantly up for debate, and we will return to this in our conclusions.

The existence of a unique mathematical solution to this goal is not only of intrinsic interest. It also provides an explicit example of how technical and ethical issues intertwine in the algorithmic era. The exact structure of the mathematical argument has moral implications. The method we propose post-processes and "cleans" prediction outputs so that we eliminate the possibility that the output of an algorithm correlates with a protected category. At the same time it will preserve, as much as possible, the algorithm's predictive power. The method alters the outputs of an algorithm, in contrast to recent work [15] that has considered the possibility of altering an algorithm's input. Our recommendations here are also distinct from those considered by Ref. [16], which seeks to exclude some variables as inputs altogether. Indeed, here, in order to correct for the effects of correlations, our calculations require knowledge of protected categories to proceed.

To determine how to do this correction, we turn to information theory. We want to predict a particular policy-relevant variable $S$ (say, the odds of a patient surviving a medical procedure, committing a crime, or graduating from college) and have at our disposal a list of properties, $V$. The list $V$ may be partitioned into two sub-lists, one of which, $U$, is unproblematic, while the other, $W$, consists of protected variables such as race, sex, or national origin.

Given our discussion above, making a policy decision on the basis of $\Pr(S \mid V)$ may well be unacceptable. If it is unacceptable, so is using the restricted function $\Pr(S \mid U)$, because $U$ correlates with $V$ (the "wrong side of the tracks" problem). In addition, use of the restricted function throws away potentially innocuous use of protected categories.

We wish to find the distribution which avoids correlating with protected variables while minimizing the loss of predictive information this imposes. The insensitivity condition for this "policy-valid" probability, $\Pr_X$, is
\[
\sum_{u} \Pr_X(s, u, w) = \Pr(s)\,\Pr(w). \tag{1}
\]
Equivalently, $\Pr_X(s \mid w)$—the probability of a protected category $w$ having outcome $s$—is independent of $w$, given the true distribution, $\Pr(w)$, of that category in the population. Our principle is thus that from knowledge of the outcome alone one cannot infer protected properties of an individual. In the two examples above, allocation according to $\Pr_X$ would mean that if you learn that a person received a life-saving transplant or was subject to additional police surveillance, you do not gain information about his race.

There are many $\Pr_X$ that satisfy the constraint above. To minimize information loss, we impose the additional constraint that it minimize
\[
\mathrm{KL}\bigl(\Pr_X(S, V), \Pr(S, V)\bigr), \tag{2}
\]
where KL is the Kullback–Leibler divergence,
\[
\mathrm{KL}(P, Q) \equiv \sum_{y} P(y) \log \frac{P(y)}{Q(y)}. \tag{3}
\]
Minimizing Kullback–Leibler divergence means that decisions made on the basis of $\Pr_X(S, V)$ will be maximally indistinguishable from the full knowledge encapsulated in $\Pr(S, V)$ (the Chernoff–Stein Lemma; see Ref. [17]). Given the structure of Eq. 1, we can minimize Eq. 3 using Lagrange multipliers. We require $|S||W| + 1$ multipliers: one to enforce a normalization for $\Pr_X$, and the remainder to enforce the distinct constraints implied by Eq. 1. We find
\[
\Pr_X(s, u, w) = \Pr(s, u, w)\left(\frac{\Pr(s)}{\Pr(s \mid w)}\right). \tag{4}
\]
Knowledge of how predictions, $s$, correlate with protected variables, $w$, allows us to undo those correlations when using these predictions for policy purposes.

Kullback–Leibler divergence has the property of becoming ill-defined when $\Pr(S, V)$ is equal to zero but $\Pr_X(S, V)$ is not. However, the ethical intuitions that lead to the imposition of Eq. 1 do not apply when $\Pr(S, V)$ is precisely zero. This perfect-knowledge case implies a very different epistemic structure: it is necessarily true—as opposed to simply very probable—that a certain group cannot have outcome $S$. Rather than the example of organ transplants or graduation rates, where such perfect knowledge is impossible, a better analogy is in the provision of pre-natal care. No notion of justice suggests that fair treatment requires equal resources to test both men and women for pregnancy. Correct accounting for these exceptions is easily accomplished, so that an agency can exclude men from pre-natal care but, using the methods of this section, provide it optimally for women while preventing non-uniform allocation by race, religion, or social class.

The methods we present here are a mathematically clean solution to a particular problem. Use of these methods enforces a strict independence between protected categories and the variables one wishes to know. In any specific case this may, or may not, be the desired outcome. Populations may be willing to trade off group-dependent outcomes in favor of other virtues. This can be seen in the ongoing debates over Proposition 209 in California (1996) and the Michigan Civil Rights Initiative (2006), both of which forbid the use of information concerning race to rebalance college admissions, and the latter of which was affirmed as constitutional in April of 2014. This "outcome equal" construction, in other words, does not absolve us of the duty to reason about the methods we use. Rather, it provides a limiting case against which we can compare other approaches. How much does a prediction change when we force it to be outcome equal? Do the changes we see alert us to potentially problematic sources of the correlations our algorithms use?
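To make the construction concrete, the following sketch applies Eq. 4 to an invented toy joint distribution (a binary outcome, one unprotected and one protected variable) and checks Eq. 1 numerically. Only the construction itself comes from the text; the numbers are illustrative.

```python
# A numerical sketch of the "outcome equal" correction of Eqs. 1-4. The
# construction is the one in the text; the joint distribution below is an
# invented toy example used only to check the algebra.
from collections import defaultdict
from math import log

# Pr(s, u, w): toy joint over outcome s, unprotected u, protected w.
pr = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.10,
    (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10,
    (1, 1, 0): 0.10, (1, 1, 1): 0.30,
}

def marginal(dist, keep):
    """Marginalize a joint distribution onto the index positions in `keep`."""
    out = defaultdict(float)
    for key, p in dist.items():
        out[tuple(key[i] for i in keep)] += p
    return dict(out)

pr_s = marginal(pr, (0,))       # Pr(s)
pr_w = marginal(pr, (2,))       # Pr(w)
pr_sw = marginal(pr, (0, 2))    # Pr(s, w)

# Eq. 4: Pr_X(s,u,w) = Pr(s,u,w) * Pr(s) / Pr(s|w), with Pr(s|w) = Pr(s,w)/Pr(w).
pr_x = {(s, u, w): p * pr_s[(s,)] / (pr_sw[(s, w)] / pr_w[(w,)])
        for (s, u, w), p in pr.items()}

# Check Eq. 1: summing Pr_X over u gives Pr(s) Pr(w), so outcome and
# protected category are exactly independent under Pr_X.
for (s, w), p in marginal(pr_x, (0, 2)).items():
    assert abs(p - pr_s[(s,)] * pr_w[(w,)]) < 1e-12

# Eq. 2: the information lost by the correction, in nats.
kl = sum(p * log(p / pr[k]) for k, p in pr_x.items())
print(f"KL(Pr_X, Pr) = {kl:.4f} nats")
```

The assertion passes because, under $\Pr_X$, learning the outcome carries no information about the protected category; the printed Kullback–Leibler divergence is the predictive price paid for that independence.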
The relative transparency of decision-making in the era before data science allowed ordinary people to debate questions of group-dependent outcomes on equal terms. Even when bureaucracies, traditions, or a failure to achieve consensus prevented them from implementing the changes they desired, citizens at least had the ability to debate and discuss what they saw. When we use machines to infer and extrapolate from data, democratic debate on the underlying questions of fairness and opportunity becomes harder, if not impossible, for citizens, experts, and leaders alike. If we do not know the models of the world on which our algorithms rely, we cannot ask if their recommendations are just.

6 Conclusions

Machine learning gives us new abilities to predict—with remarkable accuracy, and well beyond the powers of the unaided human mind—some of the most critical features of our bodies, minds, and societies. The machines that implement these algorithms increasingly become extensions of our will [18], giving us the ability to infer the outcomes of thought experiments, fill in missing knowledge, and predict the future with an unexpected accuracy. Over and above advances reported in the peer-reviewed literature, recent popular accounts provide a sense of the growing use of these tools beyond the academy [19, 20], and their use seems likely to accelerate in both scale (number of domains) and scope (range of problems within a domain).

Reliance on powerful but only partially understood algorithms provides new challenges to risk management. For example, predictions may go wrong in ways we do not expect, making it harder to assess risk. As discussed elsewhere [21], machine learning also provides new challenges to individual privacy: in a famous example, statisticians at WalMart were able to infer, from her shopping habits, a teenager's pregnancy before it became physically evident to her father [22]. This chapter demonstrates the existence of a third challenge. This challenge persists even when algorithms function perfectly (when their predictions are correct, and their uncertainties are accurately estimated), when they are used by well-meaning individuals, and when their use is restricted to data, and the prediction of variables, explicitly consented to by participants.

To help overcome this third challenge, we have presented the ambitious goal of reverse-engineering algorithms to uncover their hidden causal models. We have also presented, by way of example, more modest methods that work in a restricted domain. In both cases, progress requires that we "open the algorithmic box," and rely on commitments by corporations and governments to reveal important features of the algorithms they use [23]. Judges already use predictive computer models in the sentencing phase of trials [16]. There appears to be no awareness of the dangers these models pose to just decision-making—despite the influence they have on life-changing decisions.

These are real challenges, but there is also reason for optimism. Mathematical innovation may provide the means to repair unexpected injustices. The same methods we use to study new tools for computer-aided prediction may change our views on rules we have used in the past.
Perhaps most important, they may re-empower policy makers and ordinary citizens, and allow ethical debates to thrive in the era of the algorithm. The very nature of big data blurs the boundary between inference to the best solution and ethical constraints on uses of that inference. Debates concerning equity, discrimination, and fairness, previously understood as the domain of legal philosophy and political theory, are now unavoidably tied to mathematical and computational questions. The arguments here suggest that ethical debates must be supplemented by an understanding of the mathematics of prediction. And they urge that data scientists and statisticians become increasingly familiar with the nature of ethical reasoning in the public sphere.

Acknowledgements. I thank Cosma Shalizi (Carnegie Mellon University) for conversations and discussion, and joint work on the "outcome equal" solution presented above. I thank John Miller (Carnegie Mellon University), Chris Wood (Santa Fe Institute), Dave Baker (University of Michigan), Eden Medina (Indiana University), Bradi Heaberlin (Indiana University), and Kirstin G. G. Harriger (University of New Mexico) for additional discussion. This work was supported in part by a Santa Fe Institute Omidyar Postdoctoral Fellowship.

References

[1] Angela L. Duckworth, Christopher Peterson, Michael D. Matthews, and Dennis R. Kelly. Grit: perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6):1087, 2007.

[2] Lauren Eskreis-Winkler, Elizabeth P. Shulman, Scott A. Beal, and Angela L. Duckworth. The grit effect: predicting retention in the military, the workplace, school and marriage. Frontiers in Psychology, 5, 2014.

[3] M. S. Moore. Causation and Responsibility: An Essay in Law, Morals, and Metaphysics. Oxford University Press, Oxford, UK, 2010.

[4] Antony Honoré. Causation in the law. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Winter 2010 edition, 2010.

[5] Jonathan Schaffer. The metaphysics of causation. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Summer 2014 edition, 2014.

[6] David C. Krakauer, Jessica C. Flack, Simon DeDeo, Doyne Farmer, and Daniel Rockmore. Intelligent data analysis of intelligent systems. In International Symposium on Intelligent Data Analysis, pages 8–17. Springer, 2010.

[7] Cristina Alaimo and Jannis Kallinikos. The social life of big data: Sociality and personalization on social media platforms. In Big Data Is Not a Monolith, 2015.

[8] Steven J. Spencer, Claude M. Steele, and Diane M. Quinn. Stereotype threat and women's math performance. Journal of Experimental Social Psychology, 35(1):4–28, 1999.

[9] J. Rawls. A Theory of Justice. Harvard University Press, Cambridge, MA, USA, 2009. First edition 1971.

[10] Will Landecker, Michael D. Thomure, Luis Bettencourt, Melanie Mitchell, Garrett T. Kenyon, and Steven P. Brumby. Interpreting individual classifications of hierarchical networks. In 2013 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pages 32–38, April 2013.

[11] Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, and David Madigan. An interpretable stroke prediction model using rules and Bayesian analysis. In AAAI (Late-Breaking Developments), 2013.
[12] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Representation and Reasoning Series. Morgan Kaufmann, 1997.

[13] Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.

[14] J. Haugeland. Artificial Intelligence: The Very Idea. A Bradford Book. MIT Press, Cambridge, MA, USA, 1989.

[15] Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pages 259–268, New York, NY, USA, 2015. ACM.

[16] Sonja B. Starr. Evidence-based sentencing and the scientific rationalization of discrimination. Stanford Law Review, 66:803, 2014.

[17] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley Series in Telecommunications and Signal Processing. Wiley-Interscience, 2006. Ch. 11.8.

[18] Andy Clark. Supersizing the Mind: Embodiment, Action, and Cognitive Extension. Oxford University Press, Oxford, UK, 2008.

[19] Viktor Mayer-Schönberger and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt, 2013. ISBN 978-0544227750.

[20] Eric Siegel. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. John Wiley & Sons, 2013. ISBN 978-1118356852.

[21] Fred H. Cate. Protecting privacy in a world of big data. In Big Data Is Not a Monolith, 2015.

[22] Charles Duhigg. How companies learn your secrets. New York Times, 2012. Published 16 February 2012. Last accessed 13 January 2015. http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html.

[23] Eden Medina. Rethinking algorithmic regulation. Kybernetes, 44(6/7):1005–1019, 2015.
