Learning to Clear the Market
The problem of market clearing is to set a price for an item such that quantity demanded equals quantity supplied. In this work, we cast the problem of predicting clearing prices into a learning framework and use the resulting models to perform reven…
Authors: Weiran Shen, Sebastien Lahaie, Renato Paes Leme
Learning to Clear the Market

Weiran Shen*¹, Sébastien Lahaie², and Renato Paes Leme²
¹Tsinghua University, Beijing, China
²Google Research, New York, New York, USA

June 25, 2019

Abstract

The problem of market clearing is to set a price for an item such that quantity demanded equals quantity supplied. In this work, we cast the problem of predicting clearing prices into a learning framework and use the resulting models to perform revenue optimization in auctions and markets with contextual information. The economic intuition behind market clearing allows us to obtain fine-grained control over the aggressiveness of the resulting pricing policy, grounded in theory. To evaluate our approach, we fit a model of clearing prices over a massive dataset of bids in display ad auctions from a major ad exchange. The learned prices outperform other modeling techniques in the literature in terms of revenue and efficiency trade-offs. Because of the convex nature of the clearing loss function, the convergence rate of our method is as fast as linear regression.

1 Introduction

A key difficulty in designing machine learning systems for revenue optimization in auctions and markets is the discontinuous nature of the problem. Consider the basic problem of setting a reserve price in a single-item auction (e.g., for online advertising): revenue steadily increases with price up to the point where all buyers drop out, at which point it suddenly drops to zero. The discontinuity may average away over a large market, but one is typically left with a highly non-convex objective.

We are interested in obtaining pricing policies for revenue optimization in a data-rich (i.e., contextual) environment, where each product is associated with a set of features.
For example, in online display advertising, a product is an ad impression (an ad placement viewed by the user) which is annotated with features like geo-information, device type, cookies, etc. There are two main approaches to reserve pricing in this domain: one is to divide the feature space into well defined clusters and apply a traditional (non-contextual) revenue optimization algorithm in each cluster [18, 9, 20, 19]. This is effectively a semi-parametric approach with the drawback that an overly fine clustering leads to data sparsity and inability to learn across clusters. An overly coarse clustering, on the other hand, does not fully take advantage of the rich features available.

* Corresponding author. emersonswr@gmail.com

To overcome these difficulties, a natural alternative is to fit a parametric pricing policy by optimizing a loss function. The first instinct is to use revenue itself as a loss function, but this loss is notoriously difficult to optimize because it is discontinuous, non-convex, and has zero gradient over much of its domain, so one must look to surrogates. Medina and Mohri [14] propose a continuous surrogate loss for revenue whose gradient information is rich enough to optimize for prices. The loss is nevertheless non-convex, so optimizing it relies on techniques from constrained DC-programming, which have provable convergence but limited scalability in high-dimensional contexts.

Main contribution. The main innovation in this paper is to address the revenue optimization problem by instead looking to the closely related problem of market clearing: how to set prices so that demand equals supply. The loss function for market clearing exhibits several nice properties from a learning perspective, notably convexity. The market clearing objective dates back to the economic theory of market equilibrium [2], and more recently arises in the literature on iterative auctions [4, 11, 3].
To our knowledge, our work is the first to use it as a loss function in a machine learning context. The economic insight behind the market-clearing loss function allows us to adapt its shape to control how conservative or aggressive the resulting prices are in extracting revenue. To increase price levels, we can artificially increase demand or limit supply, which connects revenue optimization theorems from computational economics [5, 21] to regularization techniques under our loss function.

We begin by casting the problem of market clearing as a learning problem. Given a dataset where each record corresponds to an item characterized by a feature vector, together with buyer bids and seller asks for the item, the goal of the pricing policy is to quote a price that balances supply and demand; with a single seller, this simply means predicting a price in between the highest and second-highest bids, which intuitively improves over the baseline of no reserve pricing.

This offers us a general framework for price optimization in contextual settings, but the objective function of market clearing is still disconnected from revenue optimization. Revenue is the aggregate price paid by buyers, while market clearing is linked to the problem of optimizing efficiency (realized value). Efficiency can be measured as social welfare (the total value of the allocated items), or more coarsely via the match rate (the number of cleared transactions). The platform faces a tension between trying to extract as much revenue as possible from buyers, while also leaving them enough surplus to discourage a move to competing platforms.

To better understand the trade-off between revenue and efficiency, we consider the linear programming duality between allocation and pricing and observe that a natural parameter that trades off revenue for efficiency is the available supply.
Artificially limiting supply (or increasing demand) allows one to control the aggressiveness of the resulting clearing prices output by the model. This fundamental idea has more recently been used multiple times in algorithmic game theory to design approximately revenue-optimal auctions [12, 9, 21, 10]. Translating this intuition to our setting, a simple modification of the primal (allocation) linear program has the effect of restricting the supply. In the dual (pricing) linear program, this is equivalent to adding a regularization term to the market-clearing objective function.

The focus of this paper is empirical. As our main application, we use this methodology to optimize reserve prices in display advertising auctions. We demonstrate the efficacy of the market clearing loss for reserve pricing by experimentally comparing it with other strategies on a real-world data set. Coupled with the experimental evaluation, we establish some theoretical guarantees on match rate and efficiency for the optimal pricing policy under clearing loss. The theory provides guidance on how to set the regularization parameters and we investigate how this translates to the desired trade-offs experimentally.

Experimental results. We evaluate our method against a linear-regression based approach on a dataset consisting of over 200M auction records from a major display advertising exchange. The features are represented as 84K-dimensional sparse vectors and contain information such as the website on which the ad will be displayed, device and browser type, and country of origin. As benchmarks we consider standard linear regression on either the highest or second-highest bid, and models fit using the surrogate revenue loss proposed by Medina and Mohri [14]. We find that our method Pareto-dominates the benchmarks in terms of the trade-off between revenue and match rate or social welfare.
For example, for the best revenue obtained from the regression approach, we can obtain a pricing function with at least the same revenue but 5% higher social welfare and 10% higher match rate. We also find that the convergence rate of fitting models under our loss function is as fast as a standard linear regression. In comparison, the surrogate loss of Medina and Mohri [14] has much slower convergence due to its non-convexity.

Related work. There is a large body of literature on learning algorithms for optimizing revenue; however, most of the literature deals with the non-contextual setting. Cole and Roughgarden [8], Morgenstern and Roughgarden [17, 16], and Paes Leme et al. [19] study the batch-learning non-contextual problem. Roughgarden and Wang [20] study the non-contextual problem both in the online and batch learning settings. Cesa-Bianchi et al. [6] study it as a non-contextual online learning problem. Finally, there has been a lot of recent interest in the contextual online learning version [1, 7, 13], but those ideas are not applicable to the batch-learning setting.

Closest to our work are Medina and Mohri [14] and Medina and Vassilvitskii [15], who also study contextual reserve price optimization in a batch-learning setting. Medina and Mohri [14] prove generalization bounds, define a surrogate loss as a continuous approximation to the revenue loss, and propose an algorithm with provable convergence based on DC programming. The algorithm, however, requires solving a convex program in each iteration. Medina and Vassilvitskii [15] propose a clustering based approach, which involves the following steps: learning a least-squares predictor of the bid, clustering the feature space based on the linear predictor, and optimizing the reserve using a non-contextual method in each cluster.

2 Market Clearing Loss

This section introduces our model, proceeding from the general to the specific.
We first explain the duality between allocation and pricing, which motivates the form of the loss function to fit clearing prices, and provides useful economic insights into how the input data defines its shape. We next define the formal problem of learning a clearing price function in an environment with several buyers and sellers. We then specialize to a single-item, second-price auction (multiple buyers, single seller).

Allocation and Pricing

We consider a market with $n$ buyers and $m$ sellers who aim to trade quantities of an item (e.g., a stock or commodity) among themselves. Each buyer $i$ is defined by a pair $(b_i, \mu_i)$ where $b_i \in \mathbb{R}_+$ is a bid price and $\mu_i \in \mathbb{R}_+$ is a quantity. The interpretation is that the buyer is willing to buy up to $\mu_i$ units of the item at a price of at most $b_i$ per unit. Similarly, each seller $j$ is defined by a pair $(c_j, \lambda_j)$ where $c_j \in \mathbb{R}_+$ is an ask price and $\lambda_j \in \mathbb{R}_+$ is the quantity of the item the seller can supply. The ask price can be viewed as a cost of production, or as an outside offer available to the seller, so that the seller will decline to sell item units for any price less than its ask.

The allocation problem associated with the market is to determine quantities of the item supplied by the sellers, and consumed by the buyers, so as to maximize the gains from trade: value consumed minus cost of production. Formally, let $x_i$ be the quantity bought by buyer $i$ and $y_j$ the quantity sold by seller $j$. The optimal gains from trade are captured by the (linear) optimization problem:

$$\max_{0 \le x_i \le \mu_i,\; 0 \le y_j \le \lambda_j} \; \sum_{i=1}^{n} b_i x_i - \sum_{j=1}^{m} c_j y_j \quad \text{s.t.} \quad \sum_{i=1}^{n} x_i = \sum_{j=1}^{m} y_j \tag{1}$$

The optimization is straightforward to solve: the highest bid is matched with the lowest ask, and the two agents trade as much as possible between each other. The process repeats until the highest bid falls below the lowest ask.
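The greedy procedure just described can be sketched in a few lines of Python. This is an illustrative sketch rather than code from the paper: the function name `gains_from_trade` and the small example instance are assumptions chosen for the illustration.

```python
def gains_from_trade(buyers, sellers):
    """Greedy solution of the allocation problem (1).

    buyers:  list of (bid, quantity) pairs
    sellers: list of (ask, quantity) pairs
    Returns (optimal gains from trade, total quantity traded).
    """
    # Highest bids first, lowest asks first; lists so quantities can be decremented.
    bids = sorted(([b, q] for b, q in buyers), key=lambda t: -t[0])
    asks = sorted(([c, q] for c, q in sellers), key=lambda t: t[0])
    i = j = 0
    gains = traded = 0.0
    # Keep matching while the best remaining bid is at least the best remaining ask.
    while i < len(bids) and j < len(asks) and bids[i][0] >= asks[j][0]:
        q = min(bids[i][1], asks[j][1])          # trade as much as possible
        gains += q * (bids[i][0] - asks[j][0])   # value consumed minus cost
        traded += q
        bids[i][1] -= q
        asks[j][1] -= q
        if bids[i][1] == 0:
            i += 1
        if asks[j][1] == 0:
            j += 1
    return gains, traded


# Hypothetical instance: three buyers and two sellers.
example_buyers = [(1, 1), (4, 1), (5, 2)]
example_sellers = [(2, 1), (3, 1)]
print(gains_from_trade(example_buyers, example_sellers))
```

On this instance the buyer bidding 5 absorbs both units of supply, so two units trade and the gains are (5 - 2) + (5 - 3) = 5.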
The purpose of the linear programming formulation is to consider its dual, which corresponds to a pricing problem:

$$\min_{p} \; \sum_{i=1}^{n} \mu_i (b_i - p)^+ + \sum_{j=1}^{m} \lambda_j (p - c_j)^+ \tag{2}$$

where $(\cdot)^+$ denotes $\max\{\cdot, 0\}$. The optimal dual solution corresponds to a price that balances demand and supply, which is the central concept in this paper.

Definition 1. A price $p^*$ is a clearing price if, for any optimal solution $(x^*, y^*)$ to the allocation problem, we have

$$x_i^* \in \arg\max_{x_i \in [0, \mu_i]} x_i (b_i - p^*), \qquad y_j^* \in \arg\max_{y_j \in [0, \lambda_j]} y_j (p^* - c_j)$$

for each buyer $i$ and seller $j$.

In words, a clearing price balances supply and demand by ensuring that, at an optimal allocation, each buyer buys a quantity that maximizes its utility (value minus price), and similarly each seller sells a quantity that maximizes its profit (price minus cost). In the current simple setup with a single item, buyer $i$ will buy $\mu_i$ units if $b_i > p$, zero units if $b_i < p$, and is indifferent to the number of units bought at $p = b_i$; similarly for each seller $j$. However, the concept of clearing prices, where each agent maximizes its utility at the optimal allocation, generalizes to much more complex allocation problems with multiple differentiated items and nonlinear valuations over bundles of items [4].

The fact that a clearing price exists, and can be obtained by solving (2), follows from standard LP duality. The complementary slackness conditions relating an optimal primal solution $(x^*, y^*)$ to an optimal dual solution $p^*$ amount to the conditions of Definition 1. The optimal solution $p^*$ to the dual corresponds to a Lagrange multiplier for the constraint in (1), which equates demand and supply.

Learning Formulation

To cast market clearing in a learning context, we consider a generic feature space $\mathcal{Z}$ with the label space $\mathcal{T} = \mathbb{R}_+^n \times \mathbb{R}_+^m$ consisting of bid and ask vectors $(b, c)$.
For the sake of simplicity, we develop our framework assuming that the number of buyers and sellers remains fixed (at $n$ and $m$), and that the item quantity that each agent demands or supplies ($\mu_i$ or $\lambda_j$) is also fixed. This information is straightforward to incorporate into the label space if needed, and our results can be adapted accordingly. The objective is to fit a price predictor (also called a pricing policy) $p : \mathcal{Z} \to \mathbb{R}$ to a training set of data $\{(z_k, b_k, c_k)\}$ drawn from $\mathcal{Z} \times \mathcal{T}$, to achieve good prediction performance on separate test data drawn from the same distribution as the training data.

As a concrete example, the training data could consist of bids and asks for a stock on a financial exchange throughout time, and the features might be recent economic data on the company, time of day or week, etc. The clearing problem here is equivalent to predicting a price within each datapoint's bid-ask spread given the features. As another example, the data could consist of bids for ad impressions on a display ad exchange, and the features might be contextual information about the website (e.g., topic) and user (e.g., whether she is on mobile or desktop). The clearing problem there reduces to predicting a price between the highest and second-highest bids.

[Figure 1: Effect on the shape of the clearing loss when adding a buyer or a seller. Horizontal axis: price; vertical axis: clearing loss; curves: original, extra buyer, extra seller.]

Based on our developments so far, the correct loss function to fit clearing prices is given by (2), which we call the clearing loss:

$$\ell_c(p, z, b, c) = \sum_{i=1}^{n} \mu_i (b_i - p(z))^+ + \sum_{j=1}^{m} \lambda_j (p(z) - c_j)^+$$

Figure 1 illustrates the shape of the clearing loss (in green) under an instance with buyers ($1, 1), ($4, 1), ($5, 2) and sellers ($2, 1), ($3, 1).
Note that although the first buyer's bid of $1 lies below any of the sellers' costs, it still contributes to the shape of the loss. Here any price between $4 and $5 is a clearing price. If we add an extra buyer ($6, 1), the loss curve tilts to the right (in blue) and the unique clearing price becomes $5; since there is more demand, the clearing price increases. If we instead add an extra seller ($2, 2), the curve tilts to the left (in pink) and the clearing price decreases; now any price between $3 and $4 is a clearing price. This example hints at a way to control the aggressiveness of the price function $p$ fit to the data, by artificially adjusting demand or supply.

Over a training set of data $\{(z_k, b_k, c_k)\}$, model fitting consists of computing a pricing policy $p$ that minimizes the overall loss $\sum_k \ell_c(p, z_k, b_k, c_k)$. Under a limited number of contexts $z_k$, it may be possible to directly compute optimal clearing prices, or even revenue-maximizing reserve prices, based on the bid distributions in each context [8, 18]. But this kind of nonparametric approach quickly runs into difficulties when there is a large number of contexts or even continuous features, where issues of data sparsity and discretization arise. Our formulation allows one to impose some structure on the pricing policy (e.g., a linear model or neural net) whenever this aids with generalization.

From a learning perspective, clearing loss has several attractive properties. It is a piecewise linear convex function of the price, where the kink locations are determined by the bids and asks. The magnitude of its derivatives depends only on the buyer and seller quantities, which makes it robust to any outliers in the bids or asks. By its derivation via LP duality, its optimal value equals the optimal gains from trade, which are easy to compute. This gives a reference point to quantify how well a price function fits any given dataset.
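To make the piecewise-linear structure concrete, the following Python sketch evaluates the clearing loss at its kinks (the bids and asks) and reports the interval of minimizers. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the scan assumes at least one buyer and one seller with positive quantity so that the minimum is attained on a bounded interval whose endpoints are kinks.

```python
def clearing_loss(p, buyers, sellers):
    """Clearing loss (2) at a scalar price p.

    buyers:  list of (bid, quantity); sellers: list of (ask, quantity).
    """
    return (sum(q * max(b - p, 0.0) for b, q in buyers)
            + sum(q * max(p - c, 0.0) for c, q in sellers))


def clearing_interval(buyers, sellers):
    """Return (minimum loss, lowest clearing price, highest clearing price).

    The loss is convex and piecewise linear, so its minimizers form an
    interval whose endpoints are kinks; scanning the kinks is enough.
    """
    kinks = sorted({b for b, _ in buyers} | {c for c, _ in sellers})
    losses = [clearing_loss(k, buyers, sellers) for k in kinks]
    best = min(losses)
    argmins = [k for k, v in zip(kinks, losses) if abs(v - best) < 1e-12]
    return best, min(argmins), max(argmins)


# The instance used for the original (green) curve of Figure 1.
buyers = [(1, 1), (4, 1), (5, 2)]
sellers = [(2, 1), (3, 1)]
print(clearing_interval(buyers, sellers))             # (5.0, 4, 5)

# Adding the extra buyer ($6, 1) raises the clearing price to $5.
print(clearing_interval(buyers + [(6, 1)], sellers))  # (6.0, 5, 5)
```

For the original instance the minimum loss is 5, which equals the gains from trade of matching the two units demanded at $5 with the asks at $2 and $3, as LP duality requires.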
Reserve Pricing

As a practical application of the clearing loss, we consider the problem of reserve pricing in a single-item, second-price auction. In this setting every buyer demands a single unit ($\mu_i = 1$), and there is a single seller ($m = 1$) with cost $c$. The seller also has unit supply, but we still parametrize its quantity by $\lambda$ to allow some control on the shape of the loss.

We write $b^{(1)}$ and $b^{(2)}$ to denote the highest and second-highest bids, respectively. In a single-item second-price auction, the item is allocated to the highest bidder as long as $b^{(1)} \ge c$, and the winner is charged $\bar{c} \equiv \max\{b^{(2)}, c\}$. Second-price auctions are extremely common and until now have been the dominant format for selling display ads online through ad exchanges, among countless other applications.

It is common in second-price auctions for the seller to set a reserve price, a minimum price that the winning bidder is charged. The cost $c$ is itself a reserve price, but the seller may choose to increase this to some price $p$ in an attempt to extract more revenue, at the risk of leaving the item unsold if it turns out that $b^{(1)} < p$. Revenue as a function of $p$ can be negated to define a loss, which we denote $\ell_r$:

$$-\ell_r(p, z, b, c) = \begin{cases} \max\{p(z), \bar{c}\} & \text{if } \max\{p(z), c\} \le b^{(1)} \\ c & \text{otherwise} \end{cases}$$

However, this loss is notoriously difficult to optimize directly, because it is non-convex and even discontinuous, and its gradient is 0 except over a possibly narrow range between the highest and second-highest bids. Clearing loss represents a promising alternative for reserve pricing because any price between $\bar{c}$ and $b^{(1)}$ is a clearing price, so a correct clearing price prediction should intuitively improve over the baseline of $c$.
The clearing loss in the auction setting takes the form:

$$\ell_c(p, z, b, c) = \sum_{i=1}^{n} (b_i - p(z))^+ + \lambda (p(z) - c)^+ \tag{3}$$

In practical applications of reserve pricing it is often desirable to achieve some degree of control over the match rate (the fraction of auctions where the item is sold) and the closely related metric of social welfare (the aggregate value of the items sold, where value is captured by the winning bid $b^{(1)}$). Formally, these concepts are defined as follows, where the notation $[\![\cdot]\!]$ is 1 if its predicate is true and 0 otherwise.

Definition 2. On a single data point, the match rate at price $p$ is $\mathrm{MR}(p) = [\![ b^{(1)} \ge \max\{p, c\} ]\!]$ and the social welfare is $\mathrm{SW}(p) = b^{(1)} \, [\![ b^{(1)} \ge \max\{p, c\} ]\!]$.

As with the revenue objective, match rate and social welfare are discontinuous and their gradients are almost everywhere 0, so they are not directly suitable for model fitting via convex optimization (i.e., one has to look to surrogates). Note that the clearing loss (3) effectively contains a term that approximately regularizes according to match rate. The seller's term $(p - c)^+$ can be viewed as a hinge-type surrogate for match rate, since any setting of $p$ above $c$ risks impacting match rate. Increasing $\lambda$ improves match rate, in line with the earlier economic intuition that increasing seller supply $\lambda$ shifts the clearing price downwards. Symmetrically, $\lambda$ can be decreased within the range $[0, 1]$ (the loss remains convex in this range), which is equivalent to increasing each buyer's demand to $\mu = 1/\lambda$. According to the economic intuition, this shifts the clearing price upwards at the expense of match rate. The fact that the relevant range and units of the regularization weight $\lambda$ are understood is very convenient in practice. In the next section, we derive a quantitative link between $\lambda$ and match rate.
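The quantities defined in this subsection can be computed side by side for a single auction record. The Python sketch below is illustrative only: the function name, the dictionary keys, and the convention that a missing second bid counts as 0 are assumptions, not part of the paper.

```python
def auction_metrics(p, bids, c, lam=1.0):
    """Evaluate a reserve price p on one second-price auction record.

    bids: list of buyer bids (unit demand each); c: seller cost;
    lam:  seller quantity lambda in the clearing loss (3).
    """
    ranked = sorted(bids, reverse=True)
    b1 = ranked[0]
    b2 = ranked[1] if len(ranked) > 1 else 0.0   # assumed convention for a lone bidder
    c_bar = max(b2, c)                           # payment without any extra reserve
    sold = b1 >= max(p, c)
    return {
        # Equation (3): buyers' hinge terms plus the lambda-weighted seller term.
        "clearing_loss": sum(max(b - p, 0.0) for b in bids) + lam * max(p - c, 0.0),
        # Negated revenue loss: max{p, c_bar} if the item sells, else c.
        "revenue": max(p, c_bar) if sold else c,
        # Definition 2.
        "match_rate": 1.0 if sold else 0.0,
        "social_welfare": b1 if sold else 0.0,
    }


# A reserve between the top two bids lifts revenue from 6 to 8 ...
print(auction_metrics(8.0, [10.0, 6.0, 3.0], c=2.0))
# ... while a reserve above the top bid leaves the item unsold.
print(auction_metrics(12.0, [10.0, 6.0, 3.0], c=2.0))
```

The second call shows the discontinuity that makes revenue hard to optimize directly: revenue falls from the reserve level to $c$, whereas the clearing loss only rises linearly in $p$.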
3 Theoretical Guarantees

In this section we prove approximation guarantees on the match rate and efficiency performance of models fit using the clearing loss. The results of this analysis will provide guidelines for setting the regularization parameters for fine-grained control of the match rate.

We begin by characterizing the optimal pricing policy under clearing loss when there is no restriction on the policy structure, assuming that bids and costs are drawn independently (but not necessarily identically).

Proposition 3. If conditioned on each feature vector $z$ the bid and cost distributions are given by $b_i \sim F_i^z$ and $c_j \sim G_j^z$, then the pricing policy that optimizes clearing loss is the solution to

$$\sum_i \mu_i \left(1 - F_i^z(p(z))\right) = \sum_j \lambda_j G_j^z(p(z)),$$

which is the policy that balances expected supply and demand.

Proof. We can write the expectation of the market clearing loss function as follows:

$$\mathbb{E}[\ell_c(p)] = \sum_{i=1}^{n} \mu_i \int_p^{\infty} (b_i - p) \, dF_i^z(b_i) + \sum_{j=1}^{m} \lambda_j \int_0^{p} (p - c_j) \, dG_j^z(c_j).$$

Taking the derivative with respect to $p$ and setting it to zero leads to the result in the statement:

$$0 = \frac{d}{dp} \mathbb{E}[\ell_c(p)] = -\sum_{i=1}^{n} \mu_i \left(1 - F_i^z(p)\right) + \sum_{j=1}^{m} \lambda_j G_j^z(p).$$

We now consider the single-item auction setting where $m = 1$ and $\mu_i = 1$ for all buyers. For simplicity, also assume that $c = 0$, which implies $G_j^z(p) = 1$ for all $p$. In that case we can bound the match rate by a simple formula.

Proposition 4. In the setup with a single seller with $\lambda$ supply and cost $c = 0$, and independent buyer distributions, the expected match rate under the optimal clearing price policy is at least $1 - e^{-\lambda}$.

Proof. A transaction clears if there is at least one buyer with valuation above the price $p$, which happens with probability $1 - \prod_{i=1}^{n} F_i^z(p)$.
Since the optimal policy $p$ is the solution of $\sum_{i=1}^{n} (1 - F_i^z(p)) = \lambda$ by the previous proposition, we can bound the match rate as follows:

$$\mathbb{E}[\mathrm{MR}] = 1 - \prod_{i=1}^{n} F_i^z(p) \ge 1 - \left[ \frac{1}{n} \sum_{i=1}^{n} F_i^z(p) \right]^n = 1 - \left(1 - \frac{\lambda}{n}\right)^n \ge 1 - e^{-\lambda}$$

where the first inequality follows from the arithmetic-geometric mean inequality.

The preceding proposition provides a useful guideline on how to set the regularization parameter $\lambda$ to achieve a certain target match rate. We can also obtain a similar bound for social welfare:

Corollary 5. In the setting of the previous proposition, the social welfare $\mathbb{E}[\mathrm{SW}] = \mathbb{E}[b^{(1)} \cdot [\![ b^{(1)} \ge p ]\!]]$ obtained by the optimal clearing price policy is at least $1 - e^{-\lambda}$ of the optimal social welfare, obtained by setting no reserves.

Proof. This follows from the fact that $\mathbb{E}[b^{(1)} \cdot [\![ b^{(1)} \ge p ]\!]] \ge \mathbb{E}[b^{(1)}] \cdot \mathbb{P}[b^{(1)} \ge p] \ge (1 - e^{-\lambda}) \cdot \mathbb{E}[b^{(1)}]$. The first inequality holds because $b^{(1)}$ and the indicator $[\![ b^{(1)} \ge p ]\!]$ are both nondecreasing in $b^{(1)}$ and hence positively correlated; the second is Proposition 4.

Another interesting corollary is that when buyers are i.i.d., fitting a clearing price is equivalent to fitting a certain quantile of the common bid distribution.

Corollary 6. In the setup of the previous proposition with i.i.d. buyers, the optimal clearing price policy is to set the price at $p(z) = F^{-1}(1 - \lambda/n)$ where $F = F_i^z$.

This result makes explicit how varying $\lambda$ in the clearing loss tunes the aggressiveness of the resulting price function, by moving up or down the quantiles of the bid distribution. In particular, it's possible to span all quantiles using $\lambda \in [0, +\infty]$. Fitting clearing prices is not exactly equivalent to quantile regression, since the relevant quantile depends on the number of buyers, which is a property of the data and not fixed in advance.

4 Empirical Evaluation

In this section we evaluate our approach of using predicted clearing prices as a reserve pricing policy in second-price auctions.
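Before turning to the data, the guidelines of Section 3 can be made concrete with a short calculation. The Python helpers below are a sketch with hypothetical names: they invert the bound of Proposition 4 to get a regularization weight for a target match rate, and evaluate the exact i.i.d. match rate $1 - (1 - \lambda/n)^n$ and the quantile of Corollary 6. They assume $c = 0$ and $0 \le \lambda \le n$.

```python
import math


def lambda_for_match_rate(target_mr):
    """Invert the bound 1 - exp(-lambda) of Proposition 4 (0 <= target_mr < 1)."""
    return math.log(1.0 / (1.0 - target_mr))


def iid_clearing_quantile(lam, n):
    """Quantile level of the common bid distribution chosen in Corollary 6."""
    return 1.0 - lam / n


def iid_match_rate(lam, n):
    """Exact expected match rate with n i.i.d. buyers priced at that quantile."""
    return 1.0 - iid_clearing_quantile(lam, n) ** n


lam = lambda_for_match_rate(0.8)
print(f"lambda for an 80% target match rate: {lam:.3f}")
for n in (2, 5, 50):
    print(n, f"quantile={iid_clearing_quantile(lam, n):.3f}",
          f"match rate={iid_match_rate(lam, n):.3f}")
```

Because $(1 - \lambda/n)^n$ increases towards $e^{-\lambda}$ as $n$ grows, the exact i.i.d. match rate always sits at or above the target and approaches it when there are many buyers.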
We collected a dataset of auction records by sampling a fraction of the logs from Google's Ad Exchange over two consecutive days in January 2019. Our sample contains over 100M records for each day. In display advertising, online publishers (e.g., websites like nytimes.com) can choose to request an ad from an exchange when a user visits a page on their site. The exchange runs a second-price auction (the most common auction format) among eligible advertisers, possibly with a reserve price. We clip bid vectors to the 5 highest bids. As the publisher cost $c$, we use a reserve price available in the data which is meant to capture the opportunity cost (footnote 1) of not showing ads from other sources besides the exchange, in line with our model (footnote 2). Reserve prices are only relevant conditional on the top bid exceeding the publisher cost, so the auction records were filtered to satisfy this condition. When reporting our results this means that the baseline match rate without any reserve pricing is 100%, so we will refer to it as relative match rate in our plots to emphasize this fact.

All the models we evaluate (footnote 3) are linear models of the price $p$ as a function of features $z$ of the auction records. The only difference between the models is the loss function used to fit each one, to focus on the impact of the choice of loss function. The features we used included: publisher id, device type (mobile, desktop, tablet), OS type (e.g., Android or iOS), country, and format (video or display). For sparse features like publisher id we used a one-hot encoding for the most common ids and an 'other' bucket for ids in the tail. The models were all fit using TensorFlow with the default Adam optimizer and minibatches of size 512 distributed over 20 machines. An iteration corresponds to one minibatch update in each machine, therefore 20 × 512 data points.
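To illustrate what fitting a linear price model under the clearing loss involves, here is a deliberately small Python sketch. It is not the setup described above: it replaces TensorFlow, Adam, and distributed minibatches with plain per-record subgradient descent, and the function name, toy records, learning rate, and epoch count are all assumptions made for the illustration.

```python
def fit_clearing_prices(records, num_features, lam=1.0, lr=0.1, epochs=100):
    """Fit a linear price model p(z) = w . z by subgradient descent on loss (3).

    records: list of (z, bids, c) with z a list of num_features floats,
             bids a list of unit-demand bids, and c the seller cost.
    """
    w = [0.0] * num_features
    for _ in range(epochs):
        for z, bids, c in records:
            p = sum(wi * zi for wi, zi in zip(w, z))
            # Subgradient of (3) in p: minus the number of bids above p,
            # plus lambda when the price exceeds the seller cost.
            g = -sum(1 for b in bids if b > p) + (lam if p > c else 0.0)
            w = [wi - lr * g * zi for wi, zi in zip(w, z)]
    return w


# Two hypothetical contexts, one-hot encoded, each with a fixed pair of bids.
toy_records = [
    ([1.0, 0.0], [10.0, 4.0], 0.0),
    ([0.0, 1.0], [6.0, 2.0], 0.0),
]
weights = fit_clearing_prices(toy_records, num_features=2)
print(weights)
```

With one-hot features each weight is the price quoted in its context. With $\lambda = 1$ the subgradient vanishes once the price sits between the second-highest and highest bid, so the fitted prices settle inside the intervals [4, 10] and [2, 6] respectively.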
The models were all trained over at least 400K iterations, although for some models convergence occurred much earlier.

Footnote 1: A common alternative source of display ads besides exchanges are reservation contracts, which are advertiser-publisher agreements to show a fixed volume of ads for a time period. If the contract is not fulfilled, this comes at a penalty to the publisher.

Footnote 2: We also excluded additional sources of reserve prices from the dataset: (a) reserve prices configured by publishers reflecting business objectives like avoiding channel conflict (i.e., protecting the value of inventory sold through other means) and (b) automated reserve prices set by the exchange.

Footnote 3: We evaluate the models by simulating the effect of the new reserves on a test dataset. The simulation does not take into account possible strategic responses on the part of buyers. However, since the auction format is a second price auction, it is a dominant strategy for the buyers to bid truthfully.

Besides the clearing loss used to fit our model, we considered several other losses as benchmarks:

• Least-squares regression on the highest bid $b^{(1)}$.

• Least-squares regression on the second-highest bid $b^{(2)}$.

• A revenue surrogate loss function proposed by Medina and Mohri [14] as a continuous alternative to the pure revenue loss $\ell_r$ mentioned previously:

$$-\ell_\gamma(p, z, b, c) = \begin{cases} \max\{p(z), \bar{c}\} & \text{if } p(z) \le b^{(1)} \\ c & \text{if } p(z) > (1 + \gamma) b^{(1)} \\ \left((1 + \gamma) b^{(1)} - p(z)\right) / \gamma & \text{otherwise} \end{cases}$$

The loss has a free parameter $\gamma > 0$ which can be tuned to control the approximation to $\ell_r$. Although this loss is continuous, it is still non-convex. In our experiments we tried a range of $\gamma \in \{0.25, 0.5, 0.75, 1\}$. Below we report on the setting $\gamma = 0.75$ which gave the best revenue performance.

For each loss function we added the match-rate regularization $\lambda (p - c)^+$, and we varied $\lambda$ to span a range of realized match rates.
Recall that this regularization is already implicit in the clearing loss, where $\lambda$ can be construed as the item quantity supplied by the seller. We used non-negative $\lambda$ to ensure that convexity is preserved if the original loss is itself convex.

We used the first day of data as the training set and the second day as the test set. The performance was very similar on both for all fitted models, which is expected due to the volume of data and the generalization properties of this learning problem [17]. We report results over the test set below.

Revenue Performance

We first consider the revenue performance of the different losses as it trades off against match rate and buyer welfare. Figure 2 plots the ratio of realized revenue with learned reserves against the realized match rate (top). Both axes are normalized by the revenue and match rate of the second price auction using only the seller's cost as reserves. Each point represents a pair of revenue and match rate or welfare achieved at a certain setting of $\lambda$.

The most immediate observation is that the curve traced out by the clearing loss Pareto dominates the performance of the benchmark loss functions, in the sense that for any fixed match rate, the clearing loss' revenue performance lies higher than the others. The best revenue performance is a 20% improvement achieved by the clearing loss at $\lambda = 0.25$ with a match rate of 30%.

We also plot in the figure the revenue against welfare (bottom). We again normalize each axis by the revenue and welfare of the auction that uses only seller's costs (which achieves the optimal social welfare). For the sake of clarity the range of the x-axis has been clipped.
The Pareto dominance here is even more pronounced, and it's also striking to note that clearing loss can achieve revenue improvements of over 10% with less than 2% impact on buyer welfare.

[Figure 2: Trade-off between revenue improvement and decrease in match rate (top) or buyer welfare (bottom). Each point represents the performance of the fitted model under a loss function for a fixed regularization level. Top panel: revenue ratio against relative match rate; bottom panel: revenue ratio against welfare ratio; series: clearing loss, second-bid regression, top-bid regression, surrogate loss.]

Another interesting aspect of Figure 2 is the range of match rates spanned by the different losses. Recall that, under the assumptions and results of Proposition 4, varying $\lambda$ from 0 to large values should allow the clearing loss to span the full range of match rates in (0, 1), and this is borne out by the plot. For the regressions on $b^{(1)}$ and $b^{(2)}$, there is a hard floor on the match rate that they can achieve with $\lambda = 0$, respectively at 0.38 and 0.67. Another kind of regularization term would be needed to push these further downward and reach more aggressive prices. Match rate for the surrogate loss was particularly sensitive to regularization. Over a range of $\lambda$ spanning from 0 to 1, only $\lambda = 0$ and $\lambda = 0.1$ yielded match rates below 1, at 0.47 and 0.92 respectively.

Controlling Match Rate

In practice, setting the right regularization weight $\lambda$ to achieve a target match rate is usually a process of trial and error, even to determine the relevant range to inspect, and this was the case for all the benchmark losses.
For the clearing loss, however, Proposition 4 gives a link between match rate and λ which can serve as a guide. Specifically, the result prescribes $\lambda = \log\left(\frac{1}{1 - \mathrm{MR}}\right)$ to achieve a match rate of MR.

[Figure 3. Axes: target match rate and realized match rate. Series: mobile, desktop, tablet.]
Figure 3: Realized match rate against target match rate under the model fit with the clearing loss, broken down by device type. The vertical line denotes the parameter setting λ = 1 with a target match rate of 1 − 1/e ≈ 0.63.

Figure 3 plots the target match rate implied by the settings of λ that we used, according to this formula, against realized match rates. The vertical line shows the reference point of λ = 1, which is the "default" form of the clearing loss without artificially increasing or limiting supply, with an associated match rate of 1 − 1/e ≈ 0.63. The realized match rate tracks the target fairly well but not perfectly. A possible reason for the discrepancy is that the assumption of i.i.d. bidders that the formula relies on may not hold in practice. Another possible reason is that the linear model may not be expressive enough to fit the optimal price level within each feature context z. Interestingly, the target match rate from Proposition 4 tracks not only the overall match rate but also segment-specific match rates. In Figure 3, we break down the match rates by device type and find that they are very consistent across devices.

Figure 4: Convergence rate of the model under different loss functions, in mini-batch iterations. We plot the value of each loss across iterations normalized by its value upon convergence.

Convergence Rate

We next consider the convergence rates of model-fitting under the various loss functions, plotted in Figure 4. Convergence rates for the clearing loss and the regression losses are very comparable.
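As a side note on the prescription from Proposition 4 discussed under Controlling Match Rate, the mapping between λ and the target match rate is easy to tabulate. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and it only evaluates λ = log(1/(1 − MR)) and its inverse MR = 1 − exp(−λ).

```python
import math


def lambda_for_match_rate(target_mr: float) -> float:
    """Regularization weight prescribed by lambda = log(1 / (1 - MR)).

    Hypothetical helper; the formula is the one quoted from Proposition 4.
    """
    if not 0.0 < target_mr < 1.0:
        raise ValueError("target match rate must lie strictly between 0 and 1")
    return math.log(1.0 / (1.0 - target_mr))


def match_rate_for_lambda(lam: float) -> float:
    """Inverse map: target match rate MR = 1 - exp(-lambda)."""
    if lam < 0.0:
        raise ValueError("lambda must be non-negative")
    return 1.0 - math.exp(-lam)


if __name__ == "__main__":
    # Tabulate the target match rate implied by a few illustrative settings.
    for lam in (0.1, 0.25, 0.5, 1.0, 2.0):
        print(f"lambda={lam:.2f} -> target match rate {match_rate_for_lambda(lam):.3f}")
```

For λ = 1 this recovers the reference target of 1 − 1/e ≈ 0.63 marked in Figure 3. These are targets only; realized match rates may deviate, for instance when the i.i.d. assumption fails.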
The main difference between the curves has to do with initialization. Initial prices tended to be high under our random initialization scheme, which is more favorable to regression on the highest bid. All models have converged by 100K iterations. Since square loss is ideal from an optimization perspective, these results imply that models with clearing loss can be fit very quickly and conveniently in practice, in a matter of hours over large display ad datasets.

In Figure 5 we compare the convergence of the clearing loss with the surrogate loss. Convergence is much slower under the surrogate loss. This was expected, as the loss is nonconvex and has ranges with zero gradient where the Adam optimizer (or any of the other standard TensorFlow optimizers) cannot make progress; it was nonetheless an important benchmark to evaluate since it closely mimics the true revenue objective. Medina and Mohri [14] discuss alternatives for optimizing the surrogate loss, and propose a special-purpose algorithm based on DC programming (difference of convex functions programming), but they only scale it to thousands of training instances. The fact that the surrogate loss has not quite converged after 400K iterations is a contributing factor to its revenue performance in Figure 2.

Figure 5: Convergence rate of the model under the clearing and surrogate loss functions, in mini-batch iterations. Both loss functions are smoothed using a 0.9 moving average.

Effectiveness of Linear Regression

While the key takeaway of our empirical evaluation is the fact that the clearing loss dominates other methods in terms of revenue vs. match rate trade-offs, another surprising consequence of this study is the effectiveness of using a simple regression on the top bid.
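To see why this is surprising, it helps to compare square loss with realized revenue on a single auction. The sketch below is an illustration under simplifying assumptions, not the paper's evaluation code: it uses the textbook second-price rule with a reserve, ignores the seller's cost, and lets the sale go through when the top bid exactly equals the reserve. The function name and the bid values are hypothetical.

```python
def revenue(reserve: float, top_bid: float, second_bid: float) -> float:
    """Revenue of one second-price auction with a reserve price.

    The item sells only if the top bid meets the reserve; the winner then
    pays the larger of the reserve and the second-highest bid.
    """
    if top_bid < reserve:
        return 0.0  # overpricing: the transaction fails to clear
    return max(reserve, second_bid)


# Illustrative bids: two reserves equally far from the top bid.
top_bid, second_bid = 10.0, 4.0
for reserve in (9.0, 11.0):
    squared_error = (reserve - top_bid) ** 2
    print(f"reserve={reserve}: squared error={squared_error}, "
          f"revenue={revenue(reserve, top_bid, second_bid)}")
```

Both reserves have the same squared error with respect to the top bid, yet the one that underprices keeps most of the revenue while the one that overprices earns nothing.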
The natural intuition would be that any least-squares regression should perform poorly since it has the same penalty for underpricing (which is a small loss in revenue) and overpricing (which can cause the transaction to fail to clear and hence incur a large revenue loss). Indeed it is the case that an unregularized regression (the leftmost green point in Figure 2) incurs a large match rate loss, but it also achieves significant revenue improvement (albeit with an almost 5% loss in social welfare compared to the clearing loss). Looking into the data, we found that an explanation for this fact is that bid distributions tend to be highly skewed, which causes standard regression to underpredict for high bids and overpredict for low bids. In fact, under zero regularization the linear regression on the top bid underpredicts 17.7% of instances for bids below the median and 99.1% for bids above the median. This type of behavior explains why standard regression can be effective in practice despite the fact that square loss does not encode any difference between underpredicting and overpredicting.

5 Conclusions

This paper introduced the notion of a predictive model for clearing prices in a market with bids and asks for units of an item. The loss function is obtained via the linear programming dual of the associated allocation problem. When applied to the problem of revenue optimization via reserve prices in second-price auctions, regularizing the loss has an intuitive interpretation as expanding or limiting supply, which can be formally linked to the expected match rate. Our empirical evaluation over a dataset of bids from Google's Ad Exchange confirmed that a model of clearing prices outperforms standard regressions on bids, as well as a surrogate loss for the direct revenue objective, in terms of the trade-off between revenue and match rate (or social welfare).
In future work, we plan to develop models of clearing prices for more complex allocation problems such as search advertising, where the clearing loss can be generalized (using the same duality ideas presented in this paper) to handle a vector of position prices.

References

[1] Kareem Amin, Afshin Rostamizadeh, and Umar Syed. Repeated contextual auctions with strategic buyers. In Advances in Neural Information Processing Systems, pages 622–630, 2014.
[2] Kenneth J. Arrow and Gerard Debreu. Existence of an equilibrium for a competitive economy. Econometrica, 22(3):265–290, 1954.
[3] Lawrence M. Ausubel. An efficient dynamic auction for heterogeneous commodities. American Economic Review, 96(3):602–629, 2006.
[4] Sushil Bikhchandani and John W. Mamer. Competitive equilibrium in an exchange economy with indivisibilities. Journal of Economic Theory, 74(2):385–413, 1997.
[5] Jeremy Bulow and Paul Klemperer. Auctions versus negotiations. American Economic Review, 86:180–194, 1996.
[6] N. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. In ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1190–1204. SIAM, 2013.
[7] Maxime C. Cohen, Ilan Lobel, and Renato Paes Leme. Feature-based dynamic pricing. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 817–817. ACM, 2016.
[8] Richard Cole and Tim Roughgarden. The sample complexity of revenue maximization. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 243–252. ACM, 2014.
[9] Peerapong Dhangwatnotai, Tim Roughgarden, and Qiqi Yan. Revenue maximization with a single sample. Games and Economic Behavior, 91:318–333, 2015.
[10] Alon Eden, Michal Feldman, Ophir Friedler, Inbal Talgam-Cohen, and S. Matthew Weinberg. The competition complexity of auctions: A Bulow-Klemperer result for multi-dimensional bidders. In Proceedings of the 2017 ACM Conference on Economics and Computation, pages 343–343. ACM, 2017.
[11] Faruk Gul and Ennio Stacchetti. Walrasian equilibrium with gross substitutes. Journal of Economic Theory, 87(1):95–124, 1999.
[12] Jason D. Hartline and Tim Roughgarden. Simple versus optimal mechanisms. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 225–234, 2009.
[13] Jieming Mao, Renato Paes Leme, and Jon Schneider. Contextual pricing for Lipschitz buyers. In Advances in Neural Information Processing Systems, pages 5648–5656, 2018.
[14] Andres M. Medina and Mehryar Mohri. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 262–270, 2014.
[15] Andres Muñoz Medina and Sergei Vassilvitskii. Revenue optimization with approximate bid predictions. In Advances in Neural Information Processing Systems 30, pages 1856–1864, 2017.
[16] Jamie Morgenstern and Tim Roughgarden. Learning simple auctions. In Proceedings of the Conference on Learning Theory (COLT), pages 1298–1318, 2016.
[17] Jamie H. Morgenstern and Tim Roughgarden. On the pseudo-dimension of nearly optimal auctions. In Advances in Neural Information Processing Systems, pages 136–144, 2015.
[18] Roger B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981.
[19] Renato Paes Leme, Martin Pál, and Sergei Vassilvitskii. A field guide to personalized reserve prices. In Proceedings of the 25th International Conference on World Wide Web (WWW), pages 1093–1102, 2016.
[20] Tim Roughgarden and Joshua R. Wang. Minimizing regret with multiple reserves. In Proceedings of the 2016 ACM Conference on Economics and Computation (EC), pages 601–616. ACM, 2016.
[21] Tim Roughgarden, Inbal Talgam-Cohen, and Qiqi Yan. Supply-limiting mechanisms. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 844–861. ACM, 2012.