Long-term causal effects of economic mechanisms on agent incentives
Authors: Panos Toulis, David C. Parkes
June 15, 2021

Abstract

Economic mechanisms administer the allocation of resources to interested agents based on their self-reported types. One objective in mechanism design is to design a strategyproof process so that no agent will have an incentive to misreport its type. However, typical analyses of the incentive properties of mechanisms operate under strong, usually untestable assumptions. Empirical, data-oriented approaches are, at best, under-developed. Furthermore, mechanism/policy evaluation methods usually ignore the dynamic nature of a multi-agent system and are thus inappropriate for estimating long-term effects. We introduce the problem of estimating the causal effects of mechanisms on incentives and frame it under the Rubin causal framework (Rubin, 1974, 1978). This raises unique technical challenges since the outcome of interest (agent truthfulness) is confounded with strategic interactions and, interestingly, is typically never observed under any mechanism. We develop a methodology to estimate such causal effects using a prior that is based on a strategic equilibrium model. Working in the domain of kidney exchanges, we show how to apply our methodology to estimate causal effects of kidney allocation mechanisms on hospitals' incentives. Our results demonstrate that the use of a game-theoretic prior captures the dynamic nature of the kidney exchange multi-agent system and shrinks the estimates towards long-term effects, thus improving upon typical methods that completely ignore agents' strategic behavior.

Keywords: causal inference, multiagent systems, equilibrium effects, mechanism design

1 Introduction

A mechanism defines the rules that are used to determine the allocation of resources to interested parties in an economic transaction.
For example, an online ad auction determines the winners of the advertisement slots and the appropriate payments, based on advertisers' reports (bids). In designing mechanisms, a key requirement is to have good incentive properties, so that agents have no incentive to misreport their valuations. Such strategyproof mechanisms are appealing for being strategically simple for agents, and can lead to desirable outcomes because agent behavior can be anticipated and leveraged to good effect.

There are a few general procedures for devising strategyproof mechanisms. One key idea is to determine the payment of an agent $i$ as a function of the reports of all agents excluding $i$. Along with allocating the desired resources to this agent given the prices it faces, the intuition is that agent $i$ will have no incentive to misreport since this will not affect the utility given fixed reports from others. This idea underlies the Vickrey auction (Vickrey, 1961) and its generalization in the Vickrey-Clarke-Groves (VCG) mechanism (Clarke, 1971; Groves, 1973). For example, in the Vickrey auction, the highest bidder wins the item but pays the second highest bid, and thus there is no incentive for the highest bidder to reduce the initial bid.

However, even when strategyproof mechanisms can in principle be designed, theoretical analyses of their incentive properties rely critically on assumptions that are strong or untestable in practice. Typical assumptions include: (i) no collusion among agents; (ii) the rationality of participants; (iii) that the types (such as valuations) of agents are correctly modeled (e.g., they are private values and don't depend on information of others); and (iv) that the strategic interactions have been correctly modeled. But without getting these assumptions correct, the incentive properties of a mechanism will not be as desired.
For example, if participants in a single-item second-price (Vickrey) auction can collude, then one bidder can submit a high bid while others withhold their bids. In another example, if the problem is truly multi-round, then participants have a new incentive to shave down their bids in order to get the best price for an item across time.

In this light, there remains a large opportunity for empirical methods in the design of mechanisms, especially in estimating the causal effects of mechanisms on incentives and on other outcomes of interest (such as welfare, revenue, and so forth). Across many online platforms, such as ad auction platforms, one wants to be able to make changes to the design across subsets of the population and to estimate the effect of these design decisions on economic properties if one were to run a single design on the whole population versus some other design.

In this paper we adopt the Rubin causal framework (Rubin, 1974, 1978) using the potential outcomes notation. Our goal is to estimate the causal effects of mechanisms on the incentives of agents, after agents have been randomly assigned to participate in one mechanism (viewed as a treatment) and their reports have been observed. This raises unique technical challenges since the outcome of interest (agent truthfulness) is typically never observed under any mechanism. Furthermore, we assume that data collection (agent reports) happens before the system has reached an equilibrium, and so we are interested in a methodology that will strike a sensible balance between observed data and equilibrium considerations.

There is a developing body of work on experimentation in online, socio-economic systems, but it is not developed within the potential outcomes framework and has not studied the special question of causal analysis in regard to incentive properties. In related work, a large field experiment conducted at Yahoo!
in 2008 aimed to estimate the effects of increased reserve prices on keyword revenue (Ostrovsky & Schwarz, 2011). The applied method was a "diff-in-diffs" estimator that completely ignores all the aforementioned subtleties. Other work aims to estimate the effects of interventions in a machine learning model underlying a mechanism (e.g., Bottou et al., 2012), but the methods are usually predictive (i.e., they predict all missing outcomes through a model based on one intervention and the observed outcomes). Equilibrium effects for causal inference were first proposed in the econometric literature (Heckman & Vytlacil, 2005). However, no general methodology has been proposed for the estimation of long-term causal effects of policies/mechanisms.

Other work has considered the empirical design of mechanisms in settings where the goal of strategyproof design is unachievable in combination with other desirable properties, or cannot be supported from an analytical framework (Lubin & Parkes, 2009). This appeals to the divergence between distributions over payments (or payoffs) in an incentive-aligned mechanism and distributions over payments (or payoffs) in another candidate design, with a view to finding the optimal mechanism through online search. But this work does not adopt a causal approach; rather, it assumes the ability to switch the entire population through alternate designs and thus does not have the difficulty of estimating counterfactuals. Moreover, that work does not adopt our viewpoint of making inferences about the incentive properties of the mechanism from empirical frequencies of reported types.

2 Causal effects on incentives

We consider a population of $N$ agents, indexed by $i$ in some natural ordering, and two mechanisms $M_1$ and $M_0$. Each agent will be randomized to participate in one mechanism only (thus the mechanisms can be viewed as treatments that agents receive).
Specifically, if agent $i$ participates in $M_1$ then $Z_i = 1$, and if agent $i$ participates in $M_0$ then $Z_i = 0$. We also consider a setting where the mechanisms are multi-round, each round being indexed by $t$, for $t = 1, 2, \ldots, T$ and a maximum number of rounds $T$. At each round $t$ of any mechanism, each agent samples a type $\theta_{it}$ from a type space $\Theta$ according to some distribution $F$ and selects a strategy $Y_i(z,t) \in \{0,1\}$. Given the sampled type, agent $i$ then reports $R_i(z,t)$ to mechanism $M_z$ according to the following rule:

$$R_i(z,t) = \begin{cases} \theta_{it}, & \text{if } Y_i(z,t) = 1 \\ d(\theta_{it}) \in \Theta, & \text{otherwise} \end{cases}$$

In other words, if $Y_i(z,t) = 1$, agent $i$ is reporting truthfully to the mechanism, and if $Y_i(z,t) = 0$ it is deviating according to a known deviation. We assume that the distribution of true types $F$ and the deviation function $d(\cdot)$ are known or can be estimated from other sources¹. Thus, we assume that $d(\theta_{it})$ has a known distribution $G$. The agent strategy $Y_i(z,t)$ and report $R_i(z,t)$ at each round $t$ are the potential outcomes of interest.

We consider a completely randomized experiment and denote the entire $(N \times 1)$ assignment vector by $Z$. The full $(N \times 1)$ vectors of potential outcomes of agent strategies for mechanisms $M_0$ and $M_1$ at round $t$ are denoted by $Y_0(t)$ and $Y_1(t)$ respectively. In a similar fashion, let $R_0(t), R_1(t)$ denote the potential reports of agents in mechanisms $M_0, M_1$ respectively.

Note that all the potential outcomes for agent strategies, $Y_0(t), Y_1(t)$, are never observed and thus are considered missing. However, we make the following distinction: for an agent $i$ with $Z_i = z \in \{0,1\}$, the outcome $Y_i(z,t)$ will be realized but will not be observed, whereas the outcome $Y_i(1-z,t)$ will not be realized at all. The subvector of $Y_0(t)$ with the realized outcomes for $M_0$ is denoted by $Y_0^{\mathrm{real}}(t)$.
Similarly, the vector of realized outcomes for $M_1$ is denoted by $Y_1^{\mathrm{real}}(t)$. In contrast, some of the potential outcomes of the reports of agents are observed under the mechanisms they participate in. Let $R_0^{\mathrm{obs}}(t)$ denote the subvector of $R_0(t)$ for those agents $i$ such that $Z_i = 0$, and $R_1^{\mathrm{obs}}(t)$ be the subvector of $R_1(t)$ for agents $i$ with $Z_i = 1$.

¹ For example, in the ad auction literature bidder valuations can be routinely estimated from the data using the empirical distribution of bids and prices (Athey & Nekipelov, 2010; Ostrovsky & Schwarz, 2011).

The "Science" table of the observed and unobserved quantities in the aforementioned experiment is shown in Table 1.

Table 1: Science table of potential outcomes. Outcomes of agent strategies $Y_i(z,t)$ are all missing and reports $R_i(z,t)$ are only observed when $Z_i = z$.

                 M_0                      M_1
Units   Z   R_0(t)       Y_0(t)     R_1(t)          Y_1(t)
1       0   R_1(0,t)     ?          ?               ?
2       0   R_2(0,t)     ?          ?               ?
...     ... ...          ...        ...             ...
N-1     1   ?            ?          R_{N-1}(1,t)    ?
N       1   ?            ?          R_N(1,t)        ?

We can now define the causal estimand of interest:

Definition 2.1 (Causal effects on incentives). The causal effect on incentives of mechanism $M_1$ over mechanism $M_0$ in round $t$ is defined by:

$$\Delta(t) = \frac{1}{N} \sum_{i=1}^{N} \bigl( Y_i(1,t) - Y_i(0,t) \bigr) \qquad (1)$$

The estimand $\Delta(T)$ is defined as the long-term effect on agent incentives.

2.1 Discussion

Recall that $Y_i(z,t)$ denotes the strategy of agent $i$ (1 = truthful, 0 = deviating) in mechanism $M_z$ at round $t$. Therefore, the estimand $\Delta(t)$ defined in (1) is the difference between the proportions of truthful agents in $M_1$ and in $M_0$, and by definition it holds that $\Delta(t) \in [-1, 1]$. Other options for the definition of the estimand are available as well (e.g., the median of differences); in general the estimand would involve a "contrast function" $h(Y_1(t), Y_0(t))$ that summarizes the difference between the two vectors.
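As a concrete illustration of Definition 2.1, the following minimal Python sketch computes the difference-in-means contrast and, as one alternative, the median of unit-level differences. The function names and the toy strategy vectors are hypothetical and chosen only for illustration; in practice the full strategy vectors are never observed and must be imputed.

```python
from statistics import median


def diff_in_means(y1, y0):
    """Contrast h(Y1, Y0) = (1/N) * sum_i (Y_i(1,t) - Y_i(0,t)), as in Eq. (1)."""
    if len(y1) != len(y0) or not y1:
        raise ValueError("strategy vectors must be non-empty and of equal length")
    return sum(a - b for a, b in zip(y1, y0)) / len(y1)


def median_of_differences(y1, y0):
    """An alternative contrast: the median of the unit-level differences."""
    if len(y1) != len(y0) or not y1:
        raise ValueError("strategy vectors must be non-empty and of equal length")
    return median(a - b for a, b in zip(y1, y0))


# Toy (hypothetical) potential outcomes for N = 8 agents; 1 = truthful, 0 = deviating.
y1 = [1, 1, 0, 1, 0, 1, 0, 1]  # strategies under M_1
y0 = [0, 1, 0, 0, 1, 0, 0, 1]  # strategies under M_0
delta = diff_in_means(y1, y0)
```

With these toy vectors, five of eight agents are truthful under $M_1$ and three of eight under $M_0$, so the difference in means is 0.25.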
We will use this notion of contrast function throughout the rest of this paper, but for all numerical purposes we will assume it is the difference in means, as in Definition 2.1.

Note also that the estimand is time-dependent, as we expect agents to be self-interested and to adapt their strategy over repeated rounds. The long-term effect $\Delta(T)$ tries to capture this dynamic evolution of agent strategies over a specified time horizon $T$ that is considered enough time for the system to reach an economic equilibrium. This is related to the study of equilibrium effects in the econometric literature (Heckman & Vytlacil, 2005).

3 Causal Inference

Inspection of Table 1 reveals one major technical challenge. Since we do not observe the actual strategies of agents (i.e., their "truthfulness" status) but only their reports, no potential outcomes of strategies $Y_i(z,t)$ are actually observed. However, given strategies $Y_i(z,t)$, the potential outcomes on reports $R_i(z,t)$ have a well-defined distribution². The posterior distribution of the strategies given the observed reports therefore satisfies:

$$p\bigl(Y_1(t), Y_0(t) \mid R_1^{\mathrm{obs}}(t), R_0^{\mathrm{obs}}(t), Z\bigr) \propto L\bigl(R_1^{\mathrm{obs}}(t), R_0^{\mathrm{obs}}(t) \mid Y_0(t), Y_1(t)\bigr) \times \pi\bigl(Y_0(t), Y_1(t)\bigr) \qquad (2)$$

The likelihood term $L(R_1^{\mathrm{obs}}(t), R_0^{\mathrm{obs}}(t) \mid Y_0(t), Y_1(t))$ is easy to obtain based on our assumptions. Specifically, we have assumed that report $R_i(z,t)$ has distribution $F$ if agent $i$ is truthful and distribution $G$ if it is deviating. Hence:

$$L\bigl(R_1^{\mathrm{obs}}(t), R_0^{\mathrm{obs}}(t) \mid Y_0(t), Y_1(t)\bigr) = L\bigl(R_1^{\mathrm{obs}}(t) \mid Y_1(t)\bigr) \times L\bigl(R_0^{\mathrm{obs}}(t) \mid Y_0(t)\bigr)$$

Therefore, by independence, it holds for $j \in \{0,1\}$ indexing mechanism $M_j$:

$$L\bigl(R_j^{\mathrm{obs}}(t) \mid Y_j^{\mathrm{real}}(t)\bigr) = \prod_{i : Z_i = j} f\bigl(R_i(j,t)\bigr)^{Y_i(j,t)} \, g\bigl(R_i(j,t)\bigr)^{1 - Y_i(j,t)} \qquad (3)$$

where $f$ and $g$ denote the densities of $F$ and $G$. Hence, causal inference depends critically on the model of $\pi(Y_0(t), Y_1(t))$.
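To make Equation (3) concrete, here is a small Python sketch that evaluates the log-likelihood of a set of reports given assumed strategies, and the posterior probability that a single agent is truthful given its report. The discrete report categories, the probability tables `F` and `G`, and the equal prior weight are all assumptions made for illustration; they are not taken from the paper.

```python
import math

# Hypothetical report distributions over two coarse report categories.
F = {"high": 0.7, "low": 0.3}  # distribution of reports when truthful (Y = 1)
G = {"high": 0.2, "low": 0.8}  # distribution of reports when deviating (Y = 0)


def log_likelihood(reports, strategies, f=F, g=G):
    """Log of Eq. (3): sum_i [ Y_i * log f(R_i) + (1 - Y_i) * log g(R_i) ]."""
    total = 0.0
    for r, y in zip(reports, strategies):
        total += math.log(f[r]) if y == 1 else math.log(g[r])
    return total


def posterior_truthful(report, prior=0.5, f=F, g=G):
    """P(Y = 1 | R = report) by Bayes' rule, given a prior probability of truthfulness."""
    w1 = f[report] * prior
    w0 = g[report] * (1.0 - prior)
    return w1 / (w1 + w0)


p_high = posterior_truthful("high")  # a "high" report favors the truthful strategy
p_low = posterior_truthful("low")    # a "low" report favors the deviating strategy
```

The further apart `F` and `G` are, the closer these posteriors get to 0 or 1; when the two tables coincide, the report carries no information and the posterior equals the prior.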
The main contribution of this paper is to consider a prior on potential outcomes that has a game-theoretic justification through a well-defined equilibrium model. The main idea is that, by doing so, we will shrink estimates from data observed at an early round towards the long-term effects, assuming that the equilibrium model is accurate enough to describe the dynamics of the economic system. To illustrate our method we will compare it with a straightforward imputation method that is based on a uniform prior. More options, such as a fully Bayesian approach, are discussed later.

² Since we operate under a completely randomized experiment, the assignment mechanism $p(Z \mid \cdots)$ is unconfounded and the vector $Z$ can be omitted for brevity.

3.1 Empirical method: Imputation on uniform prior of realized outcomes

This method, dubbed the empirical method, serves as our baseline and works in two steps. First, we impute the realized (but missing) outcomes $Y_1^{\mathrm{real}}(t), Y_0^{\mathrm{real}}(t)$ assuming a uniform prior. Second, we impute the non-realized and missing outcomes $Y_0^{\mathrm{nreal}}(t), Y_1^{\mathrm{nreal}}(t)$ through the empirical distribution of the imputed realized outcomes. This process (Algorithm 1) is repeated many times, and the resulting estimates of the causal effects are summarized.

Initialize $\Delta$ = array of length $n$
For $j = 1, 2, \ldots, n$:
    Impute all missing potential outcomes for strategies as follows:
        $Y_1^{\mathrm{real}}(t) \sim L(R_1^{\mathrm{obs}}(t) \mid Y_1^{\mathrm{real}}(t))$
        $Y_1^{\mathrm{nreal}}(t) \sim Y_1^{\mathrm{real}}(t)$ (use empirical distribution)
        $Y_0^{\mathrm{real}}(t) \sim L(R_0^{\mathrm{obs}}(t) \mid Y_0^{\mathrm{real}}(t))$
        $Y_0^{\mathrm{nreal}}(t) \sim Y_0^{\mathrm{real}}(t)$ (use empirical distribution)
        $Y_1(t) = (Y_1^{\mathrm{nreal}}(t), Y_1^{\mathrm{real}}(t))$
        $Y_0(t) = (Y_0^{\mathrm{real}}(t), Y_0^{\mathrm{nreal}}(t))$
    Causal effect estimate $\hat{\Delta}_j(t) = h(Y_1(t), Y_0(t))$
Return $\Delta$

Algorithm 1: Causal inference on incentives through uniform priors.
Unrealized outcomes are sampled from the empirical distribution of the imputed realized ones. Variable $n$ is the number of samples.

The estimate of causal effects on incentives from the empirical method is given by:

$$\widehat{\Delta}(t) = \frac{1}{n} \sum_{j=1}^{n} \hat{\Delta}_j(t) \qquad (4)$$

One critical implicit assumption underlying the imputation of non-realized outcomes from realized ones is that collective behavior is somehow "homogeneous" in a mechanism. For example, if 2 out of 10 agents are truthful on average, then we expect 4 agents to be truthful out of 20.

3.2 Game-theoretic method: Imputation using a game-theoretic prior

Typical causal inference methods, such as the aforementioned one, are often criticized for ignoring incentives in a multi-agent system (Heckman & Vytlacil, 2005). However, in this work we show that the Rubin causal model can be adapted to address this issue. Before we proceed, we make the following assumption:

Assumption 3.1 (Best-response behavior). Given no prior information, the potential outcomes on agent strategies are independent for every round, i.e., $Y_1(t) \perp\!\!\!\perp Y_0(t)$, $\forall t$.

Assumption 3.1 can be thought of as a consequence of assuming that agents are best-responding, regardless of behaviors in other mechanisms. This assumption would be invalid in several cases, for example, when agents have different propensities to be truthful or to lie to a mechanism. We will offer more discussion later in the paper.

Assuming independence, we only need to model $\pi(Y_0(t))$ and, similarly, $\pi(Y_1(t))$. In any mechanism, say $M_1$, the expected utility of agent $i$ choosing strategy $Y_i(1,t) = y$, assuming fixed behaviors $Y_{1,-i}(t)$ from other agents, is denoted by $u_i(y, Y_{1,-i}(t))$.
Therefore the expected utility benefit from being truthful for agent $i$ is given by

$$\Delta u_i(Y_{1,-i}(t)) = u_i(1, Y_{1,-i}(t)) - u_i(0, Y_{1,-i}(t)) \qquad (5)$$

We adopt a quantal response equilibrium model (McKelvey & Palfrey, 1995; Goeree et al., 2003) in order to construct our game-theoretic prior. Specifically, an agent facing a vector of other agents' strategies will randomize over the available actions according to a softmax rule that depends on the expected utilities. For a mechanism $M_z$ at round $t$, agent $i$ facing fixed behaviors $Y_{z,-i}(t)$ from other agents will choose to be truthful with probability:

$$\pi\bigl(Y_i(z,t) = 1 \mid Y_{z,-i}(t)\bigr) \propto \exp\bigl(\beta \cdot u_i(1, Y_{z,-i}(t))\bigr) \qquad (6)$$

Hence, normalizing over the two strategies:

$$\pi\bigl(Y_i(z,t) = 1 \mid Y_{z,-i}(t)\bigr) = \frac{1}{1 + \exp\bigl(-\beta \cdot \Delta u_i(Y_{z,-i}(t))\bigr)}$$

Quantal response equilibrium is a well-studied model of utility-based agent behavior that has been shown to converge to Nash equilibria under certain mild conditions (McKelvey & Palfrey, 1995). The choice of parameter $\beta > 0$ is critical³. If $\beta$ is high then the agent has a strong preference for actions with better expected utilities. In the extreme case, if $\beta$ is very high then the agent simply prefers the best action (and so adopts a best-response strategy). We will discuss the choice of $\beta$ in the experimental section.

Note also that there are $N$ functions $u_i(\cdot)$ and each needs to be evaluated at $2^{N-1}$ points since the outcome is binary. For large populations this computation is prohibitive. To circumvent this problem we make the following exchangeability assumption:

Assumption 3.2 (Exchangeability among agents). Agents are exchangeable so that inferences are invariant to permutations of agent labels.

The main consequence of the exchangeability assumption is that the expected utility functions $u_i(\cdot)$ are the same for all agents.
Furthermore, the expected utility of an agent depends only on the number of truthful agents it is "competing" against, and so $\Delta u_i(Y_{z,-i}(t)) = \Delta u\bigl(\sum_{j \neq i} Y_j(z,t)\bigr)$. In other words, there is only one function $\Delta u(\cdot)$, shared by all agents, that needs to be evaluated at $N + 1$ points (having 0 to $N$ truthful agents in total). Inference can now be performed through Gibbs sampling. The implementation is straightforward since strategy outcomes are binary. In summary, if $i$ participates in mechanism $M_z$, then potential outcome $Y_i(z,t)$ is sampled according to the following rule: if the outcome is not realized, then only the report outcome $R_i(z,t)$ is used (this is observed) to sample the strategy. However, if the outcome is realized, then both the report $R_i(z,t)$ and the vector of agent behaviors $Y_{z,-i}(t)$ are used to sample $Y_i(z,t)$. The full procedure is given in Algorithm 2.

³ If $\beta < 0$, that would be considered irrational since the agent would prefer actions with smaller expected utilities than others.

Initialize $\Delta$ = array of length $n$
For $j = 1, 2, \ldots, n$:
    Initialize $Y_0^{(j)}(t) = Y_0^{(j-1)}(t)$ and $Y_1^{(j)}(t) = Y_1^{(j-1)}(t)$
    For all $i \in \{1, 2, \ldots, N\}$ and $z \in \{0, 1\}$:
        $\pi_{iz} = \bigl[1 + \exp\bigl(-\beta \, \Delta u\bigl(\sum_{k \neq i} Y_k^{(j)}(z,t)\bigr)\bigr)\bigr]^{-1}$ (summing over agents $k$ other than $i$)
        $p_{iz} = L(R_i(z,t) \mid Y_i^{(j)}(z,t) = 1)$, if $Z_i = 1 - z$
        $p_{iz} = L(R_i(z,t) \mid Y_i^{(j)}(z,t) = 1) \times \pi_{iz}$, if $Z_i = z$
        Sample $Y_i^{(j)}(z,t) = 1$ with probability proportional to $p_{iz}$
    Causal effect estimate $\hat{\Delta}_j(t) = h(Y_1^{(j)}(t), Y_0^{(j)}(t))$
Return $\Delta$

Algorithm 2: Causal inference on incentives through a game-theoretic prior. Inference is performed through a Gibbs sampler. Each sample is indexed by $j$. Variable $n$ is the number of samples.

4 Application on Kidney exchanges

4.1 Preliminaries

Kidney exchanges (Roth et al., 2004) enable kidney transplantations when donors are incompatible with recipients.
In particular, a donor and a recipient who are incompatible can exchange a kidney transplant with another donor/recipient pair, provided that the donor from one pair can donate to the patient of the other. Incompatibility is determined by two medical tests. The first is a blood-type test between the donor and the recipient. The second is a sensitivity test which shows whether the recipient will accept or reject the kidney transplant from the donor. The statistics of these compatibilities are well studied and we will assume them to be known⁴. Typically, these exchanges involve 2 pairs due to logistical issues; however, it is also common to perform cycle exchanges in which a donor donates to the recipient of another pair, in sequence, until a loop is formed. Multiple regional exchange programs currently operate in the US and the world; however, their expansion has hitherto been hindered by logistical and mechanism inefficiency issues. Specifically, it has been reported that manipulation of centralized kidney exchange markets is possible and is performed by participating hospitals (Ashlagi et al., 2010). Work in mechanism design has focused on mechanisms that resolve such incentive issues (Ashlagi & Roth, 2011; Toulis & Parkes, 2011; Ashlagi et al., 2010).

⁴ For example, it is known that the probability that a random patient will reject a kidney of a random donor is about 0.11, and this is 3x as high when the recipient is a woman who has been pregnant and the donor is her spouse.

The kidney exchange problem fits the framework of this paper as follows. First, we assume $N$ hospitals and the existence of two mechanisms $M_0$ and $M_1$. The former, $M_0$, can be considered as the mechanism currently in practice, whereas $M_1$ is a newly proposed mechanism under test. Agents are hospitals that are randomly assigned to participate in an exchange mechanism. This exchange is multi-round (e.g.
once per month), as usually happens in practice. At each round $t$, each hospital samples a donor/patient pool of fixed size. This pool can be represented by a set of donor/patient pairs such that in each pair the donor is willing to donate to the patient but they are incompatible. Given such a set, compatibilities can be determined by medical tests that are assumed to be common knowledge, i.e., hospitals cannot hide compatibilities between pairs, as that would be unethical and easy to uncover. Thus, the sampled type of the hospital $\theta_{it}$ is simply the set of donor/recipient pairs that were sampled at round $t$. At each round, the hospital decides between two strategies: in the truthful strategy, $Y_i(z,t) = 1$, the hospital reports $R_i(z,t) = \theta_{it}$. In the deviating strategy, $Y_i(z,t) = 0$, the hospital performs all possible matches among its own pairs internally and then reports the remainder $R_i(z,t) = d(\theta_{it})$. Given a pool of donor/patient pairs $\theta_{it}$, the function $d(\cdot)$ is deterministic. Furthermore, as mentioned before, compatibility statistics are well documented in the medical literature, and so the distributions of $\theta_{it}$ and $d(\theta_{it})$ ($F$ and $G$ respectively) are assumed known⁵.

⁵ For example, a pair with a donor with blood-type O is one that can possibly perform many exchanges, since O-donors can donate to all blood types. Therefore, we expect more O-donors under distribution $F$ than distribution $G$, since when deviating, hospitals are more likely to match these "good" pairs internally.

4.2 Simulation Setup

We simulate two stylized but realistic kidney exchange mechanisms that have been studied in the literature. Mechanism $M_0$ is the baseline mechanism: given a joint pool of hospital reports, it computes a random maximum matching over all pairs. Mechanism $M_1$ applies the revelation principle along with some more detailed allocations in well-defined subgroups of donor/patient pairs⁶.
Random graph theory has been leveraged to show that $M_0$ is vulnerable to deviating hospitals while in $M_1$ hospitals are better off being truthful (Toulis & Parkes, 2011).

As a ground-truth model of agent strategic behavior, we adopt a multi-armed bandit formulation. Specifically, we assume that hospitals try to maximize their utility (number of total pairs matched) by using the upper confidence bound (UCB) algorithm. This algorithm has been widely used in practice as a simple and effective model of dynamic strategic behavior with bounded regret (Auer et al., 2002). The algorithm used to generate the dataset is shown in Algorithm 3.

For every hospital $i$ and every $s \in \{0, 1\}$: $u_{is} = 0$, $n_{is} = 1$
For $t = 1, 2, \ldots, T$:
    For $i = 1, 2, \ldots, N$:
        $\theta_{it} \sim F$ (sample the hospital's internal pairs)
        $s^* = \arg\max_{s \in \{0,1\}} \; u_{is} + \sqrt{2 \log t / n_{is}}$
        $Y_i(z,t) = s^*$
        $n_{is^*} = n_{is^*} + 1$
        $R_i(Z_i, t) = \theta_{it}$ if $Y_i(z,t) = 1$, and $d(\theta_{it})$ otherwise
    Causal effect estimand $\Delta(t) = h(Y_1(t), Y_0(t))$
Return $\Delta(t), R_1^{\mathrm{obs}}(t), R_0^{\mathrm{obs}}(t)$, $\forall t$

Algorithm 3: Simulation model of dynamic hospital behavior in kidney exchanges. Variable $n_{is}$ keeps track of how many times hospital $i$ has chosen strategy $s$ (1 = truthful). Variable $u_{is}$ keeps track of the average achieved utility of agent $i$ when playing strategy $s$.

The outputs of the simulation in Algorithm 3 are the causal estimand values at different rounds $t$ and the observed agent reports ($(N/2 \times 1)$ vectors of observed hospital reports for each $t$). Our inference goal for both the empirical and the game-theoretic methods will be to estimate the estimand values $\Delta(t)$ produced by the simulation, given the observed reports, for two specific rounds: $t_1 = 5$ will be the round at which the data are collected (agents' reports are observed) and $t_2 = T = 100$ will be considered the round at which the system has reached equilibrium.
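The bandit dynamic in Algorithm 3 can be sketched for a single hospital as follows. This is only an illustration under simplifying assumptions: the exchange mechanism is replaced by a hypothetical Bernoulli utility in [0, 1] whose mean depends only on the chosen strategy, whereas in the simulation described above the utility is the number of matched pairs returned by the mechanism. The function name, parameters, and utility means are hypothetical.

```python
import math
import random


def simulate_hospital(T, mean_utility, seed=0):
    """UCB-style strategy choice for one hospital over T rounds.

    mean_utility[s] is the (hypothetical) mean utility of strategy s,
    with s = 0 for deviating and s = 1 for truthful.
    """
    rng = random.Random(seed)
    u = [0.0, 0.0]  # running average utility per strategy
    n = [1, 1]      # counters start at 1, as in Algorithm 3
    choices = []
    for t in range(1, T + 1):
        # Index = average utility + exploration bonus sqrt(2 log t / n_s).
        scores = [u[s] + math.sqrt(2.0 * math.log(t) / n[s]) for s in (0, 1)]
        s_star = 0 if scores[0] >= scores[1] else 1  # ties go to strategy 0
        # Stand-in for running the mechanism: a Bernoulli utility draw.
        reward = 1.0 if rng.random() < mean_utility[s_star] else 0.0
        n[s_star] += 1
        # n - 1 is the number of actual plays, so u is the plain sample mean.
        u[s_star] += (reward - u[s_star]) / (n[s_star] - 1)
        choices.append(s_star)
    return choices, u, n


# Hypothetical setting in which deviating pays more on average.
choices, u, n = simulate_hospital(T=100, mean_utility=(0.6, 0.4), seed=1)
share_truthful = sum(choices) / len(choices)
```

Averaging `choices` over many hospitals in each mechanism would give the proportions of truthful agents whose difference is the estimand $\Delta(t)$ at each round.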
Figure 1 shows 100 independent runs of the simulation with the multi-armed bandit dynamic and the confidence bands of the respective estimand values $\Delta(t)$ for every round $t$. Note that the causal effects to be estimated are time-dependent. For example, at round $t = 5$, the difference in incentives is around 0.1, specifically 0.5 for $M_1$ and 0.4 for $M_0$, which means that hospitals in $M_1$ are 25% more likely to play the truthful strategy than hospitals in $M_0$ at round 5 under the multi-armed bandit dynamic. For larger $t$ this value steadily increases to 0.44 and then stabilizes (based on additional experiments, we believe the seemingly linear trend at later time points is an artifact of our simulation). The complication for causal inference is that unbiased estimates of incentives, taken at different time points, would be wildly different. This illustrates that estimation of mechanism effects needs to distinguish between short-term and long-term effects. One key goal of this paper is to propose a methodology to estimate long-term effects from early experimental data. In our simulation, we assume that data are collected at $t = 5$ and we are interested in estimating the short-term effect $\Delta(5)$ and the long-term effect $\Delta(100)$.

⁶ Briefly, pairs can be categorized as "under-demanded", "over-demanded", "reciprocal" and "self-demanded". The compatibility networks within these groups vary significantly. The "under-demanded" pairs cannot be matched to each other and so the subgroup network is isolated. The "reciprocal" subgroup consists of two smaller groups that can be matched to each other and so the network is bipartite. The "self-demanded" subgroup is composed of four smaller groups that internally look like complete graphs. These nuances can be leveraged for the design of better allocation mechanisms than myopic maximum matchings.

Figure 1: Causal estimand at different rounds $t$, averaged over 100 independent runs.
The estimand is time-dependent, indicating that the system reaches equilibrium over time.

4.3 Estimation

We compare the empirical method, which uses uniform priors, with our method, which uses a game-theoretic prior. The former is straightforward and can be implemented by following Algorithm 1. The implementation of our method is a bit more involved, as it first requires computing the payoff functions $\Delta u(\cdot)$ as put forth in Equation (5) and under the exchangeability assumption (Assumption 3.2). In our simulation, this means that we have to calculate the payoff matrices for 9 cases. These can be obtained through simulations. The results over 10,000 mechanism simulations are shown in Table 2. For example, in $M_0$ a truthful hospital will have an expected utility of 9.66 matches when there are 5 truthful hospitals out of $N = 8$. If this hospital were to deviate, it would obtain an expected utility of 10.69, since the number of truthful hospitals would decrease to 4.

Table 2: Payoff matrices of $M_0$ and $M_1$. Each cell in the matrix shows the expected utility (average number of matched pairs) given (i) a mechanism, (ii) the number of truthful hospitals in the mechanism and (iii) an agent strategy. The table shows that in $M_0$ it is better for hospitals to deviate and in $M_1$ it is better to be truthful.

             M_0 expected utility      M_1 expected utility
#truthful    truthful   deviating      truthful   deviating
0            -          9.76           -          9.58
1            8.24       10.06          9.94       9.61
2            8.71       10.37          9.82       9.54
3            9.07       10.49          9.91       9.66
4            9.31       10.69          9.75       9.68
5            9.66       10.76          9.86       9.78
6            9.87       10.91          9.83       9.88
7            10.15      11.21          9.89       9.85
8            10.30      -              9.88       -

Table 2 shows that deviation is a dominant strategy in $M_0$ and the truthful strategy is a dominant strategy in $M_1$. Having obtained the payoff matrices, estimation through our model proceeds through the simple Gibbs sampling procedure described in Algorithm 2.
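The following Python sketch shows how the payoffs in Table 2 could be turned into the quantal response prior and used in one Gibbs sweep over the realized strategies of a single mechanism. It is a simplified sketch rather than the authors' implementation: the value of β, the per-agent likelihood values, and the function names are hypothetical, and only the realized outcomes of one mechanism with eight hospitals are updated. The prior is obtained by normalizing Equation (6) over the two strategies, which gives $1 / (1 + \exp(-\beta \, \Delta u))$.

```python
import math
import random

# Expected utilities from Table 2, indexed by the number of truthful hospitals (0..8).
# None marks the cells shown as "-" in the table.
TRUTHFUL = {
    "M0": [None, 8.24, 8.71, 9.07, 9.31, 9.66, 9.87, 10.15, 10.30],
    "M1": [None, 9.94, 9.82, 9.91, 9.75, 9.86, 9.83, 9.89, 9.88],
}
DEVIATING = {
    "M0": [9.76, 10.06, 10.37, 10.49, 10.69, 10.76, 10.91, 11.21, None],
    "M1": [9.58, 9.61, 9.54, 9.66, 9.68, 9.78, 9.88, 9.85, None],
}


def delta_u(mech, k_others):
    """Utility gain from being truthful when k_others of the other 7 hospitals are truthful."""
    # Truthful: k_others + 1 truthful hospitals in total; deviating: k_others in total.
    return TRUTHFUL[mech][k_others + 1] - DEVIATING[mech][k_others]


def prior_truthful(mech, k_others, beta):
    """Quantal response probability of truthfulness: 1 / (1 + exp(-beta * delta_u))."""
    return 1.0 / (1.0 + math.exp(-beta * delta_u(mech, k_others)))


def gibbs_sweep(y, lik_truth, lik_dev, mech, beta, rng):
    """One in-place Gibbs pass over realized strategies y (a list of 0/1 values)."""
    for i in range(len(y)):
        k = sum(y) - y[i]                    # truthful hospitals other than i
        pi = prior_truthful(mech, k, beta)
        w1 = lik_truth[i] * pi               # weight for Y_i = 1
        w0 = lik_dev[i] * (1.0 - pi)         # weight for Y_i = 0
        y[i] = 1 if rng.random() < w1 / (w1 + w0) else 0
    return y


# Hypothetical usage: uninformative reports, so the draws are driven by the prior alone.
rng = random.Random(0)
y = [0] * 8
for _ in range(200):
    y = gibbs_sweep(y, [1.0] * 8, [1.0] * 8, "M1", beta=2.0, rng=rng)
```

Under the Table 2 payoffs, `delta_u` is negative for every count in $M_0$ and positive for every count in $M_1$, so the prior pulls strategies towards deviation in $M_0$ and towards truthfulness in $M_1$, more strongly as β grows.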
4.4 Results

We conduct two experiments, and for each experiment we collect agent reports at $t = 5$ and wish to estimate short-term effects at $t = 5$ and long-term effects at $T = 100$. The ground-truth estimands have simulated values $\Delta(5) = 0.1$ and $\Delta(100) = 0.44$ over 500 independent simulation runs.

In our first experiment, we work on a case in which agent reports are highly informative of the underlying agent behaviors. We refer to this case as "strong separability" since the posterior distribution $Y_i(z,t) \mid R_i(z,t)$ takes high values (around 1) when the report is a truthful one and small values (around 0) when the report is an untruthful one. In this case the distributions look "separated", as in Figure 2a.

Figure 2: Experiment in which agent reports are informative of agent strategies (strong separability). Ground-truth estimands have simulated values $\Delta(5) = 0.1$ and $\Delta(100) = 0.44$.
(a) Conditional distribution of agent strategy conditioned on truthful report (light gray) or untruthful report (dark gray).
(b) Causal effect estimates from the empirical method (light gray) and the game-theoretic (GT) method (dark gray).

Figure 2b shows histograms of the estimates from the empirical method and the game-theoretic method. We can see that the former estimates $\Delta(5)$ at 0.12. Visual inspection of the estimates shows that the method performs well under informative agent reports (strong separability). This is expected, since the likelihood gives plenty of information about the missing agent strategies. However, while the estimate (under the exchangeability assumption) is unbiased for the true effect $\Delta(5)$ at round $t = 5$, it is still biased for the long-term effect $\Delta(100)$. The game-theoretic method makes a compromise between the two estimands. The estimates are centered around 0.27 and they are clearly biased for the effect at $t = 5$.
However, they are also shrunk towards the long-term effect $\Delta(100)$, which indicates that the payoff matrix is able to capture the dynamic evolution of the system to a certain extent.

In our second experiment, we work on a case in which agent reports are not informative about the underlying agent behaviors. We refer to this case as “weak separability” since the posterior distributions $Y_i(z, t) \mid R_i(z, t)$ take values around 0.5 for both truthful and untruthful reports, so that there is practically no information about strategies given the observed agents' reports. The respective distributions are shown in Figure 3a.

Figure 3: Experiment in which agent reports are not informative of agent strategies (weak separability). Ground-truth estimands have simulated values $\Delta(5) = 0.1$ and $\Delta(100) = 0.44$. (a) Conditional distribution of agent strategy conditioned on a truthful report (light gray) or an untruthful report (dark gray). (b) Causal effect estimates from the empirical method (light gray) and the game-theoretic (GT) method (dark gray).

Figure 3b shows histograms of the estimates from the empirical method and the game-theoretic method for the weak separability case. We can see that the former estimates $\Delta(5)$ at 0.02. In that case, we observe a breakdown of the empirical method since the estimates are centered around zero. This is because the reports are not informative (almost random guesses) about the strategies, and so the empirical method is using only the information in the prior to make inferences about incentives. However, since the empirical method assumes a uniform prior, the overall procedure will deduce no difference in incentives. In contrast, the game-theoretic method gives higher estimates for the overall difference in incentives since it is based only on the equilibrium model of the prior.
Thus, the overall estimates are even higher than before (average around 0.33) as the equilibrium model shrinks the estimates towards long-term effects.

5 Discussion

The evaluation of mechanisms is critical in numerous socioeconomic problems. However, this is technically challenging because multi-agent systems are dynamic by nature and estimation should be performed with respect to an equilibrium state of the system. Furthermore, in estimating effects on incentives, there are additional challenges as agent strategies, which are the main potential outcomes of interest, are typically never observed. To address the unobserved strategies, we use agent reports and further distributional assumptions to obtain likelihoods of strategies given observed reports. To address the dynamics of the system, we propose a prior on agent strategies that is based on a quantal response equilibrium model. In a simulated study, this prior was shown to shrink estimates towards long-term effects, thus offering improved inference over methods that do not consider such priors.

There are multiple ways in which this work could be further improved. First, the empirical method we described is by no means the only way to perform causal inference. A fully Bayesian model would also be a good choice. However, this work hints that any model that ignores the game-theoretic aspect of the potential outcomes (agent behaviors) will be inadequate to capture the time dependence of the causal estimand. Second, the choice of the $\beta$ parameter in the game-theoretic prior is crucial, and how it was set was not sufficiently justified. In practice, $\beta$ was set based on a heuristic calculation and experimental results. Future work would benefit from a more principled way to set such hyperparameters. Third, we offered limited discussion of our assumptions (best-response and exchangeability). Several violations of these assumptions yield more realistic and particularly interesting situations.
For example, in the case of substitution effects, i.e., cases where agents can switch between mechanisms (e.g., assume that a mechanism is a mode of transportation), it is no longer possible to model the two potential outcome vectors independently. Last but not least, agent interactions (e.g., information sharing, communication, collusion) will require more sophisticated models.

References

Ashlagi, I., Fischer, F., Kash, I. & Procaccia, A. D. (2010). Mix and match. In Proceedings of the 11th ACM conference on Electronic commerce. ACM.

Ashlagi, I. & Roth, A. (2011). Individual rationality and participation in large scale, multi-hospital kidney exchange. In Proceedings of the 12th ACM conference on Electronic commerce. ACM.

Athey, S. & Nekipelov, D. (2010). A structural model of sponsored search advertising auctions. In Sixth Ad Auctions Workshop.

Auer, P., Cesa-Bianchi, N. & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine learning 47, 235–256.

Bottou, L., Peters, J., Quiñonero-Candela, J., Charles, D. X., Chickering, D. M., Portugaly, E., Ray, D., Simard, P. & Snelson, E. (2012). Counterfactual reasoning and learning systems. arXiv preprint arXiv:1209.2355.

Clarke, E. H. (1971). Multipart pricing of public goods. Public choice 11, 17–33.

Goeree, J. K., Holt, C. A. & Palfrey, T. R. (2003). Risk averse behavior in generalized matching pennies games. Games and Economic Behavior 45, 97–113.

Groves, T. (1973). Incentives in teams. Econometrica: Journal of the Econometric Society, 617–631.

Heckman, J. J. & Vytlacil, E. (2005). Structural equations, treatment effects, and econometric policy evaluation. Econometrica 73, 669–738.

Lubin, B. & Parkes, D. C. (2009). Quantifying the strategyproofness of mechanisms via metrics on payoff distributions. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press.

McKelvey, R. D. & Palfrey, T. R. (1995). Quantal response equilibria for normal form games. Games and economic behavior 10, 6–38.

Ostrovsky, M. & Schwarz, M. (2011). Reserve prices in internet advertising auctions: A field experiment. In Proceedings of the 12th ACM conference on Electronic commerce. ACM.

Roth, A. E., Sönmez, T. & Ünver, M. U. (2004). Kidney exchange. The Quarterly Journal of Economics 119, 457–488.

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology 66, 688.

Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. The Annals of Statistics, 34–58.

Toulis, P. & Parkes, D. C. (2011). A random graph model of kidney exchanges: efficiency, individual-rationality and incentives. In Proceedings of the 12th ACM conference on Electronic commerce. ACM.

Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed tenders. The Journal of finance 16, 8–37.

Panos Toulis, Department of Statistics, Harvard University
E-mail address: ptoulis@fas.harvard.edu

David C. Parkes, SEAS, Harvard University
E-mail address: parkes@eecs.harvard.edu