Randomization and The Pernicious Effects of Limited Budgets on Auction Experiments

Guillaume W. Basse†, Hossein Azari Soufiani‡, Diane Lambert∗

∗ Guillaume Basse is a graduate student in the Department of Statistics at Harvard University (gbasse@fas.harvard.edu) and his work was supported by a Google Fellowship in Statistics for North America. Hossein Azari Soufiani is a research scientist at Google Research (azari@google.com). Diane Lambert is a retired research scientist (exdlambert@gmail.com). This work was completed at Google Research. The authors thank Google's Max Lin and George Levitte for their help, as well as Google NYC's statistics group for early feedback. We also thank three anonymous reviewers for their constructive feedback.

Abstract

Buyers (e.g., advertisers) often have limited financial and processing resources, and so their participation in auctions is throttled. Changes to auctions may affect bids or throttling, and any change may affect what winners pay. This paper shows that if an A/B experiment affects only bids, then the observed treatment effect is unbiased when all the bidders in an auction are randomly assigned to A or B, but it can be severely biased otherwise, even in the absence of throttling. Experiments that affect throttling algorithms can also be badly biased, but the bias can be substantially reduced if the budget for each advertiser in the experiment is allocated to separate pots for the A and B arms of the experiment.

Keywords: Causal inference; Auctions; Experiments.

Contents

1 INTRODUCTION
2 CAUSAL MODELS FOR AUCTIONS
  2.1 Potential Bids
  2.2 Potential payments
  2.3 Quota throttling
  2.4 Bid treatments and throttling treatments
  2.5 Effects on revenue
3 RANDOMIZATION
4 TWO EXAMPLES OF BIAS
  4.1 Identical but independent bidders
  4.2 Treatment dominates control
5 BIAS WITH QUOTA THROTTLING
  5.1 Joint and split throttling
  5.2 Bid treatments and joint throttling
  5.3 Quota treatment and split throttling
6 SIMULATIONS
  6.1 Bid treatments
  6.2 Quota treatments
  6.3 Extension to larger samples
7 CONCLUSION

1 INTRODUCTION

A search query generates a request for ads to show with search results. A user's visit to a webpage generates a request to an ad exchange for page ads. In either case, advertisers are chosen to participate in an auction that chooses the ad to be shown. Advertisers often cannot pay for or process all auction requests they are eligible for, so the requests passed on to them are throttled to meet their quota constraints. Auction parameters like reserve prices and throttling schemes can affect an advertiser's payments and an exchange's revenue, so A/B experiments are run to test ideas for improving outcomes for advertisers and the ad exchange. Lucking-Reiley (1999) and Einav et al. (2011) experimented with auction formats, Reiley (2006) and Ostrovsky and Schwarz (2011) experimented with reserve prices, and Ausubel et al. (2013) experimented with budget constraints.
Usually, A/B experiments are analyzed by assuming that the two experiment arms, traditionally called treatment and control, are independent. But, as Blake and Coey (2014) explain, independence fails when the demands of the treatment and control arms affect each other. Such interference is unavoidable when some advertisers in an auction are in treatment and some in control. Kohavi et al. (2009) recognize that some randomization schemes can give misleading treatment estimates for auction experiments. For more insight into interference in other applications see Halloran and Struchiner (1995), Hudgens and Halloran (2012), Rosenbaum (2007), Tchetgen and VanderWeele (2010), Aronow and Samii (2013), Eckles et al. (2014), Athey et al. (2015) and Airoldi et al. (2012). Optimal experiment design in the presence of interference has been explored by David and Kempton (1996), Eckles et al. (2014) and Walker and Muchnik (2014).

This paper uses the framework of causal models and interference to shed light on auction experiments. Section 2 introduces the main elements of our model:

1. potential outcomes that describe what each advertiser would bid if assigned to treatment or if assigned to control,
2. throttling algorithms that determine which of the advertisers eligible to respond to a request for an ad are called to its auction,
3. bid treatments that affect what advertisers bid and throttling treatments that affect when they are called to bid, and
4. effects on the total daily payment of an advertiser or the total daily revenue of the ad exchange.

Section 3 defines two randomization schemes for auction experiments. In query randomization, each request for an ad is randomized to treatment or control, so all participants in an auction are in treatment or all are in control.
In (query, advertiser) randomization, each advertiser that is eligible for a query is independently assigned to treatment or control, so treatment bidders and control bidders can compete in the same auction. With either kind of randomization, an advertiser can be assigned to treatment for some queries and to control for others. (Query, advertiser) randomization is undesirable because it introduces interference, but it may be unavoidable if only some advertisers are included in an experiment, perhaps to avoid revealing a possible change in auction algorithms before it has proven useful.

The remainder of the paper explores bias and variance of estimated effects for bid and throttling treatments under query and (query, advertiser) randomization. To establish the ideas, Section 4 shows what can go wrong when treatment and control bidders compete in the same auction. Section 5 introduces budget (processing or financial) throttling. In split quota experiments each advertiser has separate budgets for treatment and control queries. In joint quota experiments the advertiser draws against the same budget for all its queries. Simulations in Section 6 suggest that estimated treatment effects can be severely biased under (query, advertiser) randomization regardless of whether the budget is split or joint for both bid and throttling experiments. Estimated treatment effects for throttling experiments are biased for both query and (query, advertiser) randomization, but the bias is much smaller for split quota than for joint quota experiments.

2 CAUSAL MODELS FOR AUCTIONS

This section casts auction experiments as causal models in which each advertiser has two potential bids for each query: the bid that would be made if the advertiser were assigned to treatment for that query and the bid if assigned to control. Of course, only one potential bid can be observed.
This section points out further subtleties that result from advertiser competition and quota throttling.

2.1 Potential Bids

To start, suppose there is no throttling, so advertisers bid in all auctions for which they are eligible. The raw data for a sample of auctions is then a set of vectors $(q, a, B, Y)$, where $q$ denotes a query, $a$ an advertiser, $B$ the advertiser's bid, and $Y$ the advertiser's payment. We consider only auctions like first and second price auctions in which the payment is positive if the advertiser wins the auction and zero otherwise. Define $N_q$ to be the number of unique queries and $N$ the total number of (query, advertiser) pairs, typically considered over the course of one day.

In an experiment, a (query, advertiser) pair is assigned to either treatment or control. Let $Z = (Z_1, \ldots, Z_N)$ be the treatment assignment vector, where $Z_i = 1$ if (query, advertiser) pair $i$ is assigned to treatment, and $Z_i = 0$ if assigned to control. Following the potential outcomes framework of Rubin (1990), each eligible advertiser for a query has a bid and payment that will be observed if the pair is assigned to control and a possibly different bid and payment that will be observed if assigned to treatment. That is, the potential bids for the $N$ (query, advertiser) pairs are $B(Z) = (B_1(Z), \ldots, B_N(Z))$ and the potential payments are $Y(Z) = (Y_1(Z), \ldots, Y_N(Z))$.

An advertiser does not know which other advertisers are eligible for a query, so its bid is independent of all other bids for the query, which implies that the following assumption given in Rubin (1980) holds.

Assumption 1 (Stable Unit Treatment Value (SUTVA)). There is only one version of the treatment and there is no interference in the potential bids for eligible advertisers in treatment and control.
Because SUTVA holds for bids, the potential bids for all (query, eligible advertiser) pairs under treatment $B(\vec{1})$ and under control $B(\vec{0})$ can be considered separately.

2.2 Potential payments

If bidder $i$'s payment is positive, then any other bidder in the auction pays zero, so a treatment that affects what advertisers bid can affect the payment for both treatment and control advertisers. For example, suppose there are three advertisers in an auction with the potential bids given in the following table and the winner pays what it bid.

advertiser   B(1)   B(0)
b1           5      2
b2           4      4
b3           3      1

If all three advertisers participate in the auction and advertiser 2 is the only one assigned to treatment, then it wins and pays $Y_2((0, 1, 0)) = 4$. However, if advertiser 1 is also assigned to treatment and participates in the auction, then advertiser 2 loses and pays nothing. This is an example of interference: the payment of an advertiser is affected by the assignment of other advertisers in the auction to treatment and control. Hence, SUTVA does not hold for payments. We make the following assumption about payments.

Assumption 2. The potential payment of an advertiser eligible for a query depends only on its assignment to treatment or control and the assignments of the other eligible advertisers for the query. That is,

$$Y_i(Z) = Y_i(Z_{q[i]}) \tag{1}$$

where $Z_{q[i]}$ is the subset of the assignment vector $Z$ on (query, advertiser) pairs for which the query is $q[i]$.

We extend this assumption to auctions with quota throttling in Section 2.3. The interference structure allowed by Assumption 2 is similar to that given in Rosenbaum (2007) and Hudgens and Halloran (2012) and can be seen as a special type of effective treatment (Manski (2013)) or exposure mapping (Aronow and Samii (2013); Eckles et al. (2014)). However, those papers do not consider quota throttling.
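The three-advertiser example in Section 2.2 can be reproduced with a short Python sketch. It assumes a first price auction with no throttling; the function name `payments` and the rule that ties go to the lowest-indexed bidder are illustrative assumptions, not part of the model.

```python
# Potential bids from the table: positions 0, 1, 2 are advertisers 1, 2, 3.
B1 = [5, 4, 3]  # bids if assigned to treatment
B0 = [2, 4, 1]  # bids if assigned to control


def payments(z):
    """First-price payments for one auction under assignment vector z.

    Assumption: ties are broken in favor of the lowest-indexed bidder.
    """
    bids = [B1[i] if z[i] == 1 else B0[i] for i in range(len(z))]
    winner = max(range(len(bids)), key=lambda i: bids[i])
    return [bids[i] if i == winner else 0 for i in range(len(bids))]


# Only advertiser 2 is treated: it wins and pays its bid.
only_2_treated = payments((0, 1, 0))
# Advertisers 1 and 2 are treated: advertiser 2 now loses and pays nothing.
both_1_2_treated = payments((1, 1, 0))
```

Advertiser 2's own assignment is the same in both calls, yet its payment changes, which is the interference that rules out SUTVA for payments.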
2.3 Quota throttling

Advertisers are generally quota constrained, meaning that they have insufficient budget or infrastructure to process all the queries they could be sent (Chakraborty et al. (2010)). In that case, some queries are dropped or throttled to meet the quota constraints. That is, there is a vector $W(Z) = (W_1(Z), \ldots, W_N(Z))$ with $W_i(Z) = 0$ if (query, advertiser) pair $i$ is dropped and $W_i(Z) = 1$ otherwise. Note that throttling can depend on the assignment $Z$ to treatment and control. We assume random dropping according to a throttling distribution $p(W(Z) \mid Z)$.

In the presence of throttling, Assumption 2 means that the potential payment of an advertiser depends only on its assignment to treatment or control, and on the assignment of the advertisers for the same query who were not throttled. Henceforth, advertisers who participate (bid) in auctions are called bidders. Because how much a bidder in an auction pays depends on the other bidders in the auction, the potential payment for any advertiser $a$ is random when at least one other advertiser for the query is quota constrained, even if advertiser $a$ is unconstrained.

2.4 Bid treatments and throttling treatments

Loosely speaking, an experiment about bids is designed to test whether a change to bidding rules, such as a change to the reserve or "floor" price for auctions, matters when throttling rules are unchanged. (See Reiley (2006) and Ostrovsky and Schwarz (2011) for examples.)

Definition 1 (Bid Treatment). Under a bid treatment, for all $Z$, the throttling distribution $W$ satisfies

$$W(Z) \mid Z \;\sim\; (W(Z) \mid Z = \vec{1}) \;\sim\; (W(Z) \mid Z = \vec{0}). \tag{2}$$

In words, a bid treatment affects only potential bids, so a given (query, advertiser) pair for an eligible advertiser has the same probability of being dropped regardless of how other eligible bidders are assigned to treatment and control.
That condition holds if queries are throttled before the advertiser's bid is known.

An experiment about throttling is designed to understand whether changing the rules for meeting quota constraints matters if bidding parameters like reserve prices are unchanged. (See the selective callout algorithm in Chakraborty et al. (2010).)

Definition 2 (Throttling Treatment). Under a throttling treatment, for all $Z$, the potential bids satisfy

$$B(Z) = B(\vec{0}) = B(\vec{1}). \tag{3}$$

In a throttling experiment, any eligible advertiser would bid the same amount, whether it is throttled or not. This is true if advertisers do not reveal their bids before throttling.

2.5 Effects on revenue

If no advertiser is quota constrained, then no advertiser is dropped from a query and the effect $\tau$ of the treatment on the total revenue of the ad exchange compares the revenue when every (query, advertiser) pair is treated to the revenue when every (query, advertiser) pair is in control:

$$\tau = \sum_{i=1}^{N} \left( Y_i(\vec{1}) - Y_i(\vec{0}) \right) \tag{4}$$

where $Y_i$ is the payment of (query, advertiser) pair $i$. If there are quota constraints, then $\tau$ is affected by random dropping, so the effect on total revenue is $\tau^* = E(\tau)$, taking the expectation under the dropping scheme $p(W)$. Note that the sum in this case is taken over the $N$ (query, eligible advertiser) pairs.

The effect $\tau_a$ of treatment on the revenue generated by a given advertiser $a$ when there are no quota constraints is given by

$$\tau_a = \sum_{i : a[i] = a} \left( Y_i(\vec{1}) - Y_i(\vec{0}) \right). \tag{5}$$

Again, with random throttling the effect of interest is $\tau_a^* = E(\tau_a)$, taking the expectation under the dropping scheme $p(W)$. Note that $\tau_a$ considers the effect of treating all eligible advertisers on the revenue generated only by advertiser $a$, rather than the effect of treating only the queries for advertiser $a$.
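To make definitions (4) and (5) concrete, the following Python sketch computes $\tau$ and each $\tau_a$ for a single unthrottled first price auction, reusing the potential bids from the table in Section 2.2. The data layout, the helper name `first_price`, and the tie-breaking rule are hypothetical choices made for illustration.

```python
# Hypothetical data: one auction, potential bids taken from the Section 2.2 table.
# Each entry is (advertiser, bid_under_control, bid_under_treatment).
auctions = [
    [(1, 2, 5), (2, 4, 4), (3, 1, 3)],
]


def first_price(bids):
    """Return first-price payments; ties go to the lowest index (an assumption)."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    return [b if i == winner else 0 for i, b in enumerate(bids)]


tau = 0      # effect on total exchange revenue, as in (4)
tau_a = {}   # effect on revenue from each advertiser, as in (5)
for auction in auctions:
    pay_control = first_price([b0 for _, b0, _ in auction])  # Y_i(all control)
    pay_treated = first_price([b1 for _, _, b1 in auction])  # Y_i(all treated)
    for (adv, _, _), y0, y1 in zip(auction, pay_control, pay_treated):
        tau += y1 - y0
        tau_a[adv] = tau_a.get(adv, 0) + (y1 - y0)
```

In this sketch advertiser 2 bids the same amount in both arms, yet its $\tau_a$ is negative because treating everyone lets advertiser 1 win instead. This is the sense in which $\tau_a$ measures the effect of treating all eligible advertisers on the revenue generated by advertiser $a$.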
3 RANDOMIZATION

With query randomization, each query is randomized to treatment or control and then all the eligible advertisers for the query are assigned to control if the query is assigned to control or to treatment if the query is assigned to treatment. Formally, an auction experiment is query-randomized if

1. $P(Z_i = 1) = p_i$ and $P(Z_i = 0) = 1 - p_i$, and
2. $Z_i = 1$ implies that $\vec{Z}_{q[i]} = \vec{1}_{q[i]}$.

With (query, advertiser) randomization, each (query, advertiser) pair is randomized independently to treatment or control. Auctions now may have a mix of treated and control bidders, which is not representative of the behavior of future auctions. However, (query, advertiser) randomization may be necessary if only some advertisers are allowed to be in the experiment, and hence many of the auctions with treated advertisers will also have untreated advertisers. Both query randomization and (query, advertiser) randomization happen before any form of throttling takes place.

Our parameters of interest are total differences over a day, not a mean per-query difference. Here the total under treatment is estimated by inversely weighting each observation in treatment by its probability of occurrence, and the total under control is estimated by inversely weighting each observation in control by its probability of occurrence:

$$\hat{\tau}(Z) = \left( \sum_{i : Z_i = 1} \frac{Y_i(Z)}{p_i} - \sum_{i : Z_i = 0} \frac{Y_i(Z)}{1 - p_i} \right) \tag{6}$$

$$\hat{\tau}_a(Z) = \left( \sum_{i : a[i] = a,\, Z_i = 1} \frac{Y_i(Z)}{p_i} - \sum_{i : a[i] = a,\, Z_i = 0} \frac{Y_i(Z)}{1 - p_i} \right). \tag{7}$$

The estimators $\hat{\tau}$ and $\hat{\tau}_a$ would be unbiased for $\tau$ and $\tau_a$, respectively, if the SUTVA assumption (Assumption 1 in Section 2.1) held for payments $Y(Z)$, but SUTVA does not hold for payments. Nonetheless, Section 5 shows that $\hat{\tau}$ and $\hat{\tau}_a$ are unbiased under query randomization.
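As a rough numerical check of estimator (6) under query randomization, the Python sketch below enumerates every query-level assignment for a small set of unthrottled first price auctions and compares the exact expectation of $\hat{\tau}$ with $\tau$. The bids, the common assignment probability `p`, and all function names are made-up assumptions for illustration; the sketch does not cover throttling.

```python
from itertools import product

# Hypothetical potential bids: auctions[q] lists (bid_control, bid_treatment)
# for each advertiser eligible for query q.
auctions = [
    [(2.0, 5.0), (4.0, 4.0), (1.0, 3.0)],
    [(3.0, 3.5), (1.0, 6.0)],
    [(2.5, 2.0), (2.0, 4.5), (3.0, 3.0)],
]
p = 0.3  # assumed probability that a query is assigned to treatment


def revenue(bids):
    """First-price revenue of one auction: the winner pays its own bid."""
    return max(bids)


def tau_hat(query_assignment):
    """Inverse-probability-weighted estimator (6) for one query-level assignment."""
    total = 0.0
    for z_q, auction in zip(query_assignment, auctions):
        bids = [b1 if z_q else b0 for (b0, b1) in auction]
        total += revenue(bids) / p if z_q else -revenue(bids) / (1 - p)
    return total


# True effect (4): everyone treated versus everyone in control.
tau = sum(
    revenue([b1 for _, b1 in a]) - revenue([b0 for b0, _ in a]) for a in auctions
)

# Exact expectation of tau_hat over all 2^Nq query assignments.
expected = 0.0
for zq in product([0, 1], repeat=len(auctions)):
    prob = 1.0
    for z in zq:
        prob *= p if z else (1 - p)
    expected += prob * tau_hat(zq)
```

Because every bidder in an auction shares the query's assignment, each observed payment equals either its all-treated or its all-control potential payment, so the weights $1/p$ and $1/(1-p)$ cancel the assignment probabilities term by term.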
4 TWO EXAMPLES OF BIAS

To illustrate the issues, two toy examples show that the estimated treatment effect for an experiment with just one auction can be severely biased.

4.1 Identical but independent bidders

Suppose $K$ bidders participate in a first price auction, and they all have the same bid under treatment and the same bid under control. That is,

$$B_i(1) = R_1, \quad i = 1, \ldots, K$$
$$B_i(0) = R_0, \quad i = 1, \ldots, K$$
$$R_1 > R_0.$$

Also suppose each advertiser was independently assigned to treatment with probability $\frac{1}{2}$; this is an example of (query, advertiser) randomization. The goal is to estimate the effect of treatment on this auction alone: $\tau = \sum_{i=1}^{K} (Y_i(\vec{1}) - Y_i(\vec{0}))$. How ties are decided does not matter. The expected value of the estimator defined in (6) is

$$
\begin{aligned}
E(\hat{\tau}) &= \frac{1}{2^K}\left(\hat{\tau}(\vec{1}) + \hat{\tau}(\vec{0})\right) + \frac{1}{2^K} \sum_{Z \notin \{\vec{1}, \vec{0}\}} \hat{\tau}(Z) \\
&= \frac{1}{2^K}(R_1 - R_0) + \frac{1}{2^K} \sum_{Z \notin \{\vec{1}, \vec{0}\}} R_1 \\
&= \frac{1}{2^K}\tau + \frac{1}{2^K}(2^K - 2) R_1 \\
&= \frac{1}{2^K}\tau + \left(1 - \frac{1}{2^{K-1}}\right) R_1
\end{aligned}
$$

where all $2^K$ possible assignments are equiprobable. The bias of $\hat{\tau}$ is

$$\mathrm{Bias}(\hat{\tau}, \tau) = \left(\frac{1}{2^K} - 1\right)\tau + \left(1 - \frac{1}{2^{K-1}}\right) R_1. \tag{8}$$

Intuitively, what happens is that the estimator $\hat{\tau}$ takes the value $R_1$ for every assignment $Z$, except for $Z = \vec{0}$ where it takes the value $-R_0$. Thus, as the number $K$ of bidders grows, the bias approaches $R_0$, so the bias becomes as large as the control bid as the size of the auction grows. For example, with $K = 4$ bidders, treatment bids of $R_1 = 6$ and control bids of $R_0 = 5$, the true effect is $\tau = 1$ while equation (8) shows that the bias is about 4.3. The same result would hold for second price auctions in this scenario.

The bias would not vanish if we only allowed randomizations with given numbers $N_0$ and $N_1$ of control and treated bidders, respectively. Indeed, for any such randomization scheme with $0 < N_0, N_1 < K$, the bias would be exactly $R_0$ for any $K$.
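The bias in (8) can be checked by brute-force enumeration. The Python sketch below follows the convention used in the derivation of Section 4.1, in which $\hat{\tau}$ equals the winner's payment when the winner is treated and minus the payment when the winner is in control (that is, without the $1/p_i$ weights of (6)); the function name `toy_bias` and the tie-breaking rule are illustrative assumptions.

```python
from itertools import product


def toy_bias(K, R1, R0):
    """Exact bias of the toy estimator over all 2^K equiprobable assignments."""
    tau = R1 - R0  # true effect for a single first-price auction
    total = 0.0
    for z in product([0, 1], repeat=K):
        bids = [R1 if zi else R0 for zi in z]
        # Ties go to the lowest-indexed top bidder; the result does not depend on this.
        winner = max(range(K), key=lambda i: bids[i])
        payment = bids[winner]
        # Convention from the derivation: +payment if the winner is treated,
        # -payment if the winner is in control.
        total += payment if z[winner] else -payment
    return total / 2**K - tau


K, R1, R0 = 4, 6.0, 5.0
bias = toy_bias(K, R1, R0)
closed_form = (1 / 2**K - 1) * (R1 - R0) + (1 - 1 / 2 ** (K - 1)) * R1
```

For the worked example with $K = 4$, $R_1 = 6$ and $R_0 = 5$, the enumeration and the closed form both give 4.3125, matching the value of about 4.3 quoted in the text.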
However, as will be shown in Section 5, the estimator is unbiased under query randomization.

4.2 Treatment dominates control

Suppose $K$ bidders participate in an auction and each bidder under treatment bids more than every bidder under control. If we label the bidders according to their bids under treatment, then

$$B_i(1) > B_j(0) \quad \text{for all } i, j$$
$$B_i(1) > B_j(1) \quad \text{if } i < j.$$

Then (see the supplementary material) the bias of the estimated treatment effect is bounded below by $\tau\left(\frac{1}{2^K} - 1\right) + A_K$, where $A_K$ approaches $B_K(1)$ as the number $K$ of eligible bidders grows. Thus, the bias grows at least as large as $\max_i(B_i(0)) - (B_1(1) - B_K(1))$. The limiting bias is especially large if the smallest and largest bids under treatment are close, even if $K$ is small. When $K = 4$, $(B_1(0), B_2(0), B_3(0), B_4(0)) = (4, 4.25, 4.50, 4.75)$ and $(B_1(1), B_2(1), B_3(1), B_4(1)) = (6, 5.50, 5.25, 5)$, the true effect is $\tau = 1.25$, and the exact bias is 3.8, which is about three times as large as the effect itself.

5 BIAS WITH QUOTA THROTTLING

5.1 Joint and split throttling

Let $N_q[a]$ be the total number of queries that advertiser $a$ is eligible for and let $Q[a]$ be the quota or number of queries that advertiser $a$ can process. If $N_q[a] > Q[a]$, then the queries for advertiser $a$ must be throttled, so some are randomly dropped. An experiment can use advertiser $a$'s quota for both treatment and control, or it can split the quota into two pieces, using one piece to service $a$'s queries assigned to treatment and the other piece to service $a$'s queries assigned to control. We do not consider mixed experiments in which some advertisers are assigned to joint throttling and others to split throttling.

Definition 3. Under joint throttling,

$$\sum_{i : a[i] = a} W_i(Z) \le Q[a] \tag{9}$$

for all assignments $Z$ to treatment and all advertisers $a$.

Definition 4.
Under split throttling,

$$\sum_{i : a[i] = a} I(Z_i = 1)\, W_i(Z) \le Q^{(1)}[a], \qquad \sum_{i : a[i] = a} I(Z_i = 0)\, W_i(Z) \le Q^{(0)}[a], \qquad \text{and} \qquad Q^{(1)}[a] + Q^{(0)}[a] = Q[a]$$

for all assignments $Z$ to treatment and every advertiser $a$, where $Q^{(1)}[a]$ and $Q^{(0)}[a]$ denote the quotas in treatment and control, respectively.

5.2 Bid treatments and joint throttling

Suppose that the ad exchange always fulfills each advertiser's quota entirely and that the number of queries that could be sent to advertiser $a$ is more than it can process, so $N_q[a] > Q[a]$. Then joint throttling implies that

$$\sum_{i : a[i] = a} W_i(Z) = Q[a] \tag{10}$$

for all advertisers $a$ and all assignments $Z$ to treatment. Further suppose that throttling is random, so an advertiser is randomly dropped from queries to meet its quota constraints. Then for all vectors $w$ satisfying (10),

$$P(W(Z) = w \mid Z) = \binom{N[a]}{Q[a]}^{-1} \quad \text{for all } Z. \tag{11}$$

Under these assumptions, the following theorem is proved in the supplementary material.

Theorem 5.1. With joint throttling and query randomization, the estimator $\hat{\tau}$ is unbiased for $\tau^*$ for any bid treatment when every advertiser is quota constrained and there are sufficient queries for every advertiser's budget quota to be fulfilled.

Unsurprisingly, the same result holds for query randomization when all auctions are unconstrained, that is, when no advertiser is quota throttled. The corresponding theorem is stated and proved in the supplementary material. In summary, query randomization leads to unbiased estimates for bid treatments both with and without throttling.

5.3 Quota treatment and split throttling

A quota experiment is designed to test whether a change to the algorithm for dropping advertisers from queries to satisfy quota constraints affects revenue.
Suppose that the change is in addition to the standard throttling algorithm (the control case) and that it is applied before the standard throttling algorithm is applied. For example, some feature of the query, such as the user's locale, could be used to drop queries, reducing the need to randomly drop queries without regard to their value to the advertiser. To be specific, the treatment drops the (query, eligible advertiser) pair $i$ before the control throttling algorithm is applied when a binary random variable $x_i$ is zero, and the control throttling algorithm alone is applied if $x_i$ is one. Then the treatment throttling enforces the constraint:

$$(Z_i = 1) \cap (x_i = 0) \Rightarrow W_i(Z) = 0. \tag{12}$$

The simulations in Section 6 show that the estimated effect of the quota treatment on revenue is biased under joint throttling for both query randomized and (query, advertiser) randomized experiments. The question is what happens if separate budgets are maintained for treatment and control for each advertiser (i.e., under split throttling). Theorem 5.2, which is proved in the supplementary material, shows that the estimated effect can be unbiased under split-quota throttling under a set of conditions. Unfortunately, the conditions are unlikely to hold in practice.

To state Theorem 5.2, let $N_a^{(1)}(x = 1)(Z)$ be the number of eligible queries under treatment for advertiser $a$ when $x = 1$ and $Z = 1$. Define $N_a^{(0)}(x = 1)(Z)$ analogously. Also, define $N(x = 1)$ to be the total number of (query, eligible advertiser) pairs for which $x = 1$.

Theorem 5.2. Let $\mathcal{Z}$ be the assignments of (query, eligible advertiser) pairs allowed by query randomization.
If $x_i = 0$ implies that the bid $B_i$ for advertiser $i$ is 0,

$$\frac{Q_a^{(0)}}{N_a^{(0)}(Z)} = \frac{Q_a^{(0)} + Q_a^{(1)}}{N_a}$$

for all advertisers $a \in A$ and assignments of (query, bidder) pairs $Z \in \mathcal{Z}$ to treatment and control, and

$$\frac{Q_a^{(1)}}{N_a^{(1)}(x = 1)(Z)} = \frac{Q_a^{(1)} + Q_a^{(0)}}{N_a(x = 1)}$$

for all $a \in B$ and $Z \in \mathcal{Z}$, then the estimator $\hat{\tau}$ is unbiased for $\tau^*$ under query randomization with split throttling.

6 SIMULATIONS

This section reports the results of simulating the bias and variance of revenue estimators under quota throttling for query randomized experiments and (query, advertiser) randomized experiments. The simulated query randomized experiments are balanced in the sense that they have the same number of queries in treatment and control. Similarly, the simulated (query, advertiser) randomized experiments balance the number of (query, advertiser) pairs assigned to treatment and control. Without such balance, effect estimates for total revenue would be much more variable and that additional variability would obscure differences in the randomization and quota sharing schemes for reasonably sized simulations.

There are two main conclusions. First, when estimating bid treatment effects in constrained auctions, the estimated effects under balanced query randomization are not only unbiased (as proved in Theorem 5.1), but also have smaller variance than those obtained with balanced (query, advertiser) randomization. Second, even if the conditions of Theorem 5.2 are violated, query randomization combined with split throttling can dramatically reduce the variance of the estimated quota treatment effect, compared to the variance under joint throttling.

Each simulated experiment considers auctions with three eligible advertisers. These advertisers all compete in $N_q = 90$, 120 or 150 auctions unless they are throttled. Their quota limits are either $N_q/3$ or $2N_q/3$.
For bid treatments, potential bids are Lognormal($\mu_0 = 1$, $v = 0.1$) under control and Lognormal($\mu_1$, $v = 0.1$) under treatment, where $\mu_1$ is either 1.05, 1.1, or 2. All the treatment bids in a simulation are drawn from the same distribution, and all the control bids are drawn from the same distribution. The control and treatment bids for a (query, advertiser) pair are correlated, as described in the supplementary material. The potential bids define the true treatment effect on total revenue.

For each distribution of potential bids and quota limit, we generated 20,000 random query experiments with exactly half the $N_q$ queries in each experiment assigned to treatment and half to control. We also generated 20,000 random (query, advertiser) pair experiments, with half the pairs in treatment and half in control. The bid treatment effect or quota treatment effect, depending on the nature of the simulation, was estimated in each experiment. Because the bias depends on the treatment effect for a lognormal, we report simulated bias relative to the true effect.

6.1 Bid treatments

Figure 1 describes the simulated relative bias (ratio of the bias to the true effect), where the rows correspond to the different quota settings and the columns to the different mean log treatment bids. The standard errors reported are computed over the 200 draws from the bid distribution and are divided by $\sqrt{200}$ to reflect uncertainty about the simulated mean bias.

[Figure 1 plot. Panel columns: mean log treatment bid (1.05, 1.1, 2); panel rows: quota ($\frac{1}{3}N_q$, $\frac{2}{3}N_q$); x-axis: number of auctions in exchange; y-axis: bias relative to the true effect; legend: randomization, (query, advertiser) or query.]

Figure 1: Simulated bias of the effect of a bid treatment relative to the true effect under balanced query randomization and balanced (query, advertiser) randomization.
The dots show the mean relative bias, and the endpoints show the mean ± twice its simulated standard error. Columns show the mean log bid under treatment and the rows show the throttling rate.

Figure 1 shows that the relative bias is nearly zero for randomized query experiments, while the relative bias has a mean as high as 1.5 for randomized (query, advertiser) pair experiments. That is, allowing control and treated advertisers to compete in the same auction can produce a bias as large as 1.5 times the true effect. Moreover, the simulated variance of the bid treatment estimate under the random query experiment is no more than its variance under the random (query, advertiser) experiment (see Figure 2). The ratio of simulated variances for (query, advertiser) randomization versus query randomization is above one for all bid and throttling combinations considered here, and as high as 6 when the treatment bids are 5% higher than the control bids and the quota is around 66%. Note that the number of auctions in the experiment has little effect on the relative bias or variance.

[Figure 2 plot. Panel columns: mean log treatment bid (1.05, 1.1, 2); panel rows: quota ($\frac{1}{3}N_q$, $\frac{2}{3}N_q$); x-axis: number of auctions in exchange; y-axis: Var(query, advertiser) / Var(query).]

Figure 2: Ratio of the variance of estimated treatment effects under (query, advertiser) randomization to the variance of the estimated treatment effects under balanced query randomization for different lognormal bid distributions and throttling rates. The dots show the mean ratios and the endpoints show the mean ± twice its simulated standard error. The horizontal dotted line lies at one.

6.2 Quota treatments

Because a quota treatment does not affect bids, the potential bids with a quota treatment are the same under treatment and control. Here $B_i(0) = B_i(1) \sim$ Lognormal($\mu_0 = 1$, $v = 0.1$). For each random query or random (query, advertiser) pair, we draw a covariate $x_i \sim \mathrm{Bernoulli}(p_x)$, where $p_x = 0.1$ or $p_x = 0.5$ or $p_x = 0.9$ in different simulations.

Figure 3 shows the relative bias of the estimate of total revenue under balanced query randomization with joint throttling and with split throttling. The results for the variance are shown in the supplementary material. Clearly, relative bias is close to zero for split throttling (even if the conditions of Theorem 5.2 are violated), whereas it is around −1 for joint throttling. This means that the estimator under joint throttling estimates 0 regardless of the true effect on total revenue! So the effect on total revenue of changing the throttling mechanism can never be estimated from an experiment that uses joint throttling to meet quota constraints. The supplementary material shows that this conclusion about joint quota also holds for (query, advertiser) randomization.

6.3 Extension to larger samples

Our simulations generate only a few auctions relative to the millions of auctions that might participate in a real experiment every day. Figures 1 and 3 hint that the relative biases are close to constant for larger numbers of auctions, while Figure 2 hints that the ratio of the variances is constant, or even slightly increasing, as the number of auctions in an experiment increases. Together, these figures suggest that the variance of the estimator of the effect of a treatment on total revenue increases faster under balanced (query, advertiser) randomization than under balanced query randomization. We ran additional simulations (see supplementary material) that give further evidence for these trends.
[Figure 3 plot omitted. Panel columns labeled 90, 120, 150; panel rows labeled high, low; x-axis: Expected proportion of x=0 (%); y-axis: Bias relative to the true effect; legend: Throttling (joint, split).]

Figure 3: Bias of joint quota throttling and split quota throttling estimators relative to the true effect under query randomization. The dots are the mean relative bias and the segments show two simulated standard errors around that mean.

7 CONCLUSION

Every day, companies like Google, Facebook and Yahoo run billions of auctions in ad exchanges, and every day they experiment to improve the exchange. This paper shows how casting experiments in the framework of potential outcomes clarifies many practical issues, such as the consequences of the choice of randomization. Bias has been emphasized throughout because when it is large it makes experiments misleading.

As a general policy, query randomization should be preferred over (query, advertiser) randomization when experimenting with bid treatments because it allows an unbiased estimator of the true treatment effect, without exceeding the variance of the same estimator under (query, advertiser) randomization. Split quotas are preferable to joint quotas when experimenting with throttling treatments because split quotas have reduced bias and RMSE in simulations.

Admittedly, we do not have a complete solution to the problems that arise in practice, such as the fact that advertisers are often provided information about the user, such as location, that the advertiser may use to decide whether to bid or the bid amount. Experiments that take account of such covariates might be better analyzed with statistical models rather than with the simple estimators proposed here. Nor do we have analytical results for the variance of the estimators or formal procedures for hypothesis testing.
These are all possible directions for future work.
