Reserve Pricing in Repeated Second-Price Auctions with Strategic Bidders


Authors: Alexey Drutsa

Reserve Pricing in Repeated Second-Price Auctions with Strategic Bidders

Alexey Drutsa
Yandex; MSU
Moscow, Russia
adrutsa@yandex.ru

Abstract

We study revenue optimization learning algorithms for repeated second-price auctions with reserve where a seller interacts with multiple strategic bidders, each of which holds a fixed private valuation for a good and seeks to maximize his expected future cumulative discounted surplus. We propose a novel algorithm that has a strategic regret upper bound of $O(\log\log T)$ for worst-case valuations. This pricing is based on our novel transformation that upgrades an algorithm designed for the setup with a single buyer to the multi-buyer case. We provide theoretical guarantees on the ability of a transformed algorithm to learn the valuation of a strategic buyer, who has uncertainty about the future due to the presence of rivals.

1 Introduction

Revenue maximization is one of the fundamental development directions in major Internet companies that have their own online advertising platforms [35, 12, 1, 27, 41]. Most ad inventory is sold via widely applicable second-price auctions [38, 57] and their generalizations like GSP [72, 73, 74, 69]. Adjustment of reserve prices plays a central role in revenue optimization here: their proper setting is studied both by game-theoretical methods [61, 4] and by machine learning approaches [62, 19, 57, 64]. In our work, we focus on a scenario where the seller repeatedly interacts through a second-price auction with M strategic bidders (referred to as buyers as well). Each buyer participates in each round of this game, holds a fixed private valuation for a good (e.g., an ad space), and seeks to maximize his expected future discounted surplus given his beliefs about the behaviors of other bidders.
The seller applies a deterministic online learning algorithm, which is announced to the buyers in advance and, in each round, selects individual reserve prices based on the previous bids of the buyers. The seller's goal is to maximize her revenue over a finite horizon T through regret minimization for worst-case valuations of the bidders [59, 29]. Thus, the seller seeks a no-regret pricing algorithm. To the best of our knowledge, no existing study has investigated worst-case regret optimizing algorithms that set reserve prices in repeated second-price auctions with strategic bidders whose valuation is private but fixed over all rounds. However, our setting constitutes a natural generalization of the well-studied 1-buyer setup of repeated posted-price auctions¹ (RPPA) [6, 59] to the scenario of multiple buyers in a second-price auction. In the RPPA setting, there are optimal algorithms [27, 28, 29] that have a tight strategic regret bound of $\Theta(\log\log T)$. This bound follows from an ability of the seller to upper bound the buyer valuation even if he lies when rejecting a price [27, Prop. 2]. This ability strongly exploits the fact that the buyer knows in advance the outcomes of the current and all future rounds, since he has complete information due to the absence of rivals. In our multi-bidder scenario, this does not hold: a bidder has incomplete information and is thus uncertain about the future. Hence, the theoretical guarantees cannot be directly ported to our scenario by straightforwardly applying the optimal 1-buyer RPPA algorithms. In our study, we propose a novel algorithm that can be applied against such strategic buyers with a regret upper bound of $O(\log\log T)$ (Th.

¹ In particular, when M = 1, our auction in a round reduces to a posted-price one: the bidder has no rivals and his decision is thus binary (to accept or to reject a currently offered price).

Preprint. Under review.
1), which constitutes the main contribution of our work. We also introduce a novel transformation of an RPPA algorithm that maps it to a multi-buyer pricing algorithm and is based on a simple but crucial idea of cyclic elimination of all bidders except one in each round (Sec. 3). Construction and analysis of the proposed algorithm and transformation required the introduction of novel techniques, which are contributed by our work as well. They include (a) the method to locate the valuation of a strategic buyer in a played round under his uncertainty about the future (Prop. 1); (b) the decomposition of strategic regret into the regret of learning the individual valuations and the deviation regret of learning which bidder has the maximal valuation (Lemma 1); and (c) the approach to learn the highest-valuation bidder with deviation regret of O(1) w.r.t. T (Lemma 3).

2 Preliminaries: setup, background, and overview of results

Setup of Repeated Second-Price Auctions. We study the following mechanism of repeated second-price auctions. Namely, the auctioneer repeatedly proposes goods (e.g., advertisement opportunities) to M bidders (whose set is denoted by $\mathcal{M} := \{1, \dots, M\}$, $M \in \mathbb{N}$) over T rounds: one good per round. From here on, the following terminology is used as well: the seller for the auctioneer, a buyer for a bidder, and the time horizon for the number of rounds T. Each bidder $m \in \mathcal{M}$ holds a fixed private valuation $v_m \in [0,1]$ for a good, i.e., the valuation $v_m$ is equal for the goods offered in all rounds and is unknown to the seller. The vector of valuations of all bidders is denoted by $v := \{v_m\}_{m=1}^M$. In each round $t \in \{1, \dots, T\}$, for each bidder $m \in \mathcal{M}$, the seller sets a personal reserve price $p_t^m$, and the buyer m (knowing $p_t^m$) submits a sealed bid $b_t^m$.
Given the reserve prices $p_t := \{p_t^m\}_{m=1}^M$ and the bids $b_t := \{b_t^m\}_{m=1}^M$, the standard allocation and payment rules of a second-price auction are applied [64]: (a) for each bidder $m \in \mathcal{M}$, we check whether he bids over his reserve price or not, $a_t^m := \mathbb{I}\{b_t^m \ge p_t^m\}$, obtaining the set $\mathcal{M}_t := \{m \in \mathcal{M} \mid a_t^m = 1\}$ of actual bidder-participants; (b) if $\mathcal{M}_t \ne \emptyset$, the good is allocated to the winning bidder $m_t := \arg\max_{m \in \mathcal{M}_t} b_t^m$ (if a tie, choose randomly), who pays $p_t := \max\{p_t^{m_t}, \max_{m \in \mathcal{M}_t \setminus \{m_t\}} b_t^m\}$ to the seller; (c) if $\mathcal{M}_t = \emptyset$, the current good disappears and no payment is transferred. Further, we use the following notation for allocation indicators, payments, and their vectors: $a_t := \mathbb{I}\{\mathcal{M}_t \ne \emptyset\}$, $a_t^m := \mathbb{I}\{\mathcal{M}_t \ne \emptyset \;\&\; m = m_t\}$, $p_t^m := a_t^m p_t$, $\mathbf{a}_t := \{a_t^m\}_{m=1}^M$, and $\mathbf{p}_t := \{p_t^m\}_{m=1}^M$. A summary of all notation is in App. C. Thus, the seller applies a (pricing) algorithm $\mathcal{A}$ that sets reserve prices $p_{1:T} := \{p_t\}_{t=1}^T$ in response to the buyers' bids $b_{1:T} := \{b_t\}_{t=1}^T$. We consider the deterministic online learning case, when the reserve price $p_t^m$ for a bidder $m \in \mathcal{M}$ in a round $t \in \{1, \dots, T\}$ can depend only on the bids $b_{1:t-1}$ of all bidders during the previous rounds and, possibly, the horizon T. Let $\mathcal{A}_M$ be the set of such algorithms. Hence, given a pricing algorithm $\mathcal{A} \in \mathcal{A}_M$, the buyers' bids $b_{1:T}$ uniquely define the corresponding price sequence $\{p_t\}_{t=1}^T$, which, in turn, determines the seller's total revenue $\sum_{t=1}^T a_t p_t$. This revenue is usually compared to the revenue that would have been earned by offering the highest valuation $v := \max_{m \in \mathcal{M}} v_m$ if the valuations $v = \{v_m\}_{m=1}^M$ were known in advance to the seller [6, 27]. This leads to the notion of the regret of the algorithm $\mathcal{A}$: $\mathrm{Reg}(T, \mathcal{A}, v, b_{1:T}) := \sum_{t=1}^T (v - a_t p_t)$.
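The allocation and payment rules (a)-(c) above can be sketched as follows; this is a minimal illustrative implementation, not the paper's code, and the function name and dict-based interface are our own choices.

```python
import random

def second_price_round(reserves, bids):
    """One round of a second-price auction with personal reserve prices.

    reserves, bids: dicts mapping a bidder id to his reserve price / bid.
    Returns (winner, payment); (None, 0.0) if no bidder clears his reserve.
    """
    # (a) bidders who bid at or above their personal reserve participate
    participants = [m for m in bids if bids[m] >= reserves[m]]
    if not participants:
        return None, 0.0  # (c) the good disappears, no payment
    # (b) highest participating bid wins (ties broken randomly); the winner
    # pays the max of his own reserve and the best rival participating bid
    top = max(bids[m] for m in participants)
    winner = random.choice([m for m in participants if bids[m] == top])
    rival_bids = [bids[m] for m in participants if m != winner]
    payment = max([reserves[winner]] + rival_bids)
    return winner, payment
```

For example, with reserves {1: 0.3, 2: 0.5} and bids {1: 0.6, 2: 0.4}, only bidder 1 participates and pays his own reserve 0.3; if both reserves were 0.2, he would instead pay the rival bid 0.4.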
Following a standard assumption in mechanism design that matches the practice in ad exchanges [59, 29], the seller's pricing algorithm $\mathcal{A}$ is announced to the buyers in advance. A bidder can then act strategically against this algorithm. In contrast to the case of one bidder (M = 1), where the buyer can derive an optimal behavior in advance and the repeated mechanism thus reduces to a two-stage game [6, 59, 27], in our setting a bidder has incomplete information, since he may not know the valuations and behaviors of the other bidders. Therefore, in order to model buyer strategic behavior under this uncertainty, we assume that, in each round t, each buyer optimizes his utility on the subgame of future rounds given the available history of previous rounds and his beliefs about the other buyers. Formally, in a round t, given the seller's pricing algorithm $\mathcal{A}$, a strategic buyer $m \in \mathcal{M}$ observes a history $h_t^m := (b_{1:t-1}^m, p_{1:t}^m, a_{1:t-1}^m, p_{1:t-1}^m)$ available to him and derives his optimal bid $\mathring{b}_t^m$ from a (possibly mixed) strategy $\sigma \in \mathcal{S}_T$² that maximizes his future $\gamma_m$-discounted surplus:

$$\mathrm{Sur}_{t:T}(\mathcal{A}, \gamma_m, v_m, h_t^m, \beta_m, \sigma) = \mathbb{E}\Big[\sum_{s=t}^T \gamma_m^{s-1} a_s^m (v_m - p_s^m) \,\Big|\, h_t^m, \sigma, \beta_m\Big], \quad (1)$$

where $\gamma_m \in (0,1]$ is the discount rate³ of the bidder m. The expectation in Eq. (1) is taken over all possible continuations of the history $h_t^m$ w.r.t. a strategy $\sigma \in \mathcal{S}_T$ of the buyer m and his beliefs $\beta_m$ about the strategies of the other bidders $\mathcal{M}_{-m} := \mathcal{M} \setminus \{m\}$⁴. The buyer m assumes that the other bidders are strategic in the sense described above as well, which is taken into account in the beliefs $\beta_m$⁵.

² A buyer strategy is a map $\sigma: \mathcal{H}_{1:T} \to \mathbb{R}_+$ that maps any history $h \in \mathcal{H}_t$ in a round t to a bid $\sigma(h) \in \mathbb{R}_+$, where $\mathcal{H}_{1:T} := \sqcup_{t=1}^T \mathcal{H}_t$ and $\mathcal{H}_t := \mathbb{R}_+^{t-1} \times \mathbb{R}_+^t \times \mathbb{Z}_2^{t-1} \times \mathbb{R}_+^{t-1}$. Let $\mathcal{S}_T$ denote the set of all possible strategies.
When T rounds have been played, let $\mathring{b}_t := \{\mathring{b}_t^m\}_{m=1}^M$ be the optimal bids, which depend on $(T, \mathcal{A}, v, \gamma, \beta)$, where $\gamma = \{\gamma_m\}_{m=1}^M$ and $\beta = \{\beta_m\}_{m=1}^M$. We define the strategic regret of the algorithm $\mathcal{A}$ that faced M strategic buyers with valuations $v \in [0,1]^M$ and beliefs β over T rounds as $\mathrm{SReg}(T, \mathcal{A}, v, \gamma, \beta) := \mathrm{Reg}\big(T, \mathcal{A}, v, \mathring{b}_{1:T}(T, \mathcal{A}, v, \gamma, \beta)\big)$. In our setting, following [6, 59, 27, 29], we seek algorithms that attain $o(T)$ strategic regret for the worst-case valuations $v \in [0,1]^M$. Formally, an algorithm $\mathcal{A}$ is said to be a no-regret one when $\sup_{v \in [0,1]^M, \beta} \mathrm{SReg}(T, \mathcal{A}, v, \gamma, \beta) = o(T)$ in our multi-buyer case. The optimization goal is to find algorithms with the lowest possible strategic regret upper bound $O(f(T))$, i.e., such that $f(T)$ has the slowest growth as $T \to \infty$ or, alternatively, the averaged regret has the best rate of convergence to zero.

Background on pricing algorithms. To the best of our knowledge, no prior work has studied worst-case regret optimizing algorithms that set reserve prices in repeated second-price auctions with strategic bidders whose valuation is private but fixed over all rounds. However, in the case of one bidder, M = 1, the bidder has no rivals, and, thus, the second-price auction in a round t reduces to a posted-price auction, where the buyer decision reduces to a binary action: to accept or to reject the currently offered price $p_t^1$. Let $\mathcal{A}_{\mathrm{RPPA}} \subset \mathcal{A}_1$ be the subclass of 1-bidder algorithms such that each reserve price $p_t^1$ depends only on the past binary decisions $a_{1:t-1}^1$ of the buyer to take or not take a good for a posted reserve price. For this subclass, our whole strategic setting of repeated second-price auctions reduces to the setup of repeated posted-price auctions (RPPA) introduced earlier in [6].
Pricing algorithms in the strategic setup of RPPA with fixed private valuation and worst-case regret optimization have been well studied in recent years [6, 59, 27, 29]. It is known that, if the discount rate γ = 1, any algorithm has linear strategic regret, i.e., the regret has lower bound Ω(T) [6], while, for the other cases γ ∈ (0, 1), the lower bound of Ω(log log T) holds [46, 59]. The first algorithm with the optimal strategic regret bound of Θ(log log T) was found in [27]. It is Penalized Reject-Revising Fast Exploiting Search (PRRFES), which is horizon-independent and is based on Fast Search [46] modified to act against a strategic buyer. The modifications include penalizations (see Def. 1): a strategic buyer either accepts the price at the first node or rejects this price in the subsequent penalization ones [59, 27]. PRRFES is also a right-consistent algorithm: an RPPA algorithm $\mathcal{A}_1$ is right-consistent ($\mathcal{A}_1 \in \mathcal{C}_R$) if it never offers a price lower than the last accepted one [27]. The algorithm PRRFES was further modified by the transformation pre to obtain one that never decreases offered prices and has a tight strategic regret bound of Θ(log log T) as well [29]. The workflow of an RPPA algorithm $\mathcal{A}_1$ is usually described by a labeled binary tree $\mathcal{T}(\mathcal{A}_1)$ [59, 27, 29]: initialize the tracking node n to the root $e(\mathcal{T}(\mathcal{A}_1))$; in each round, the label p(n) is offered as a price; if it is accepted (rejected), move the tracking node to the right child n := r(n) (the left child n := l(n), resp.); and go to the next round. The left (right) subtree rooted at the node l(n) (r(n), resp.) is denoted by L(n) (R(n), resp.). When trees $\mathcal{T}_1$ and $\mathcal{T}_2$ have the same node labeling, we write $\mathcal{T}_1 \cong \mathcal{T}_2$.

Definition 1.
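The tree-walk workflow of an RPPA algorithm described above can be sketched as follows; the `Node` class and the function name are illustrative assumptions, not the paper's notation.

```python
class Node:
    """Node of the pricing tree T(A1): a price label p(n) and two children."""
    def __init__(self, price, left=None, right=None):
        self.price = price   # p(n): price offered at this node
        self.left = left     # l(n): tracking node after a rejection
        self.right = right   # r(n): tracking node after an acceptance

def run_rppa(root, accepts):
    """Walk the labeled binary tree: offer p(n), then branch on the decision.

    `accepts` is the buyer's sequence of binary decisions (True = accept).
    Returns the list of offered prices.
    """
    offered, n = [], root
    for a in accepts:
        if n is None:  # the (finite, toy) tree has been exhausted
            break
        offered.append(n.price)
        n = n.right if a else n.left
    return offered
```

With a toy tree whose root offers 0.5, right child 0.75, and left child 0.25, an accept-then-accept buyer sees prices [0.5, 0.75], while a reject-then-accept buyer sees [0.5, 0.25].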
For an RPPA algorithm $\mathcal{A}_1 \in \mathcal{A}_{\mathrm{RPPA}}$, nodes $n_1, \dots, n_r \in \mathcal{T}(\mathcal{A}_1)$ are said to be an (r-length) penalization sequence if $n_{i+1} = l(n_i)$, $p(n_{i+1}) = p(n_i)$, and $R(n_{i+1}) \cong R(n_i)$, $i = 1, \dots, r-1$.

Overview of our results. We cannot directly apply the optimal RPPA algorithms [27, 29], because our bidders have incomplete information in the game, while the proofs of optimality of these algorithms strongly rely on complete information. This completely different information structure of the multi-buyer game results in very complicated bidder behavior even in the absence of reserve prices [14]. Hence, it is challenging to find, in the multi-buyer case, a pricing algorithm that has a regret upper bound of the same asymptotic behavior as the best one in the 1-buyer RPPA setting. Our research goal is to close this question on the existence of such algorithms. First, we propose a novel technique to transform an RPPA algorithm to our setup that is based on cyclic elimination of all bidders except one by means of high enough prices (Sec. 3). Playing separately with each buyer removes his uncertainty about the outcome of the current round; and, despite the remaining uncertainty about future rounds, this is enough to construct a tool to locate his valuation (Prop. 1).

³ Note that only buyer utilities are discounted over time, which is motivated by real-world markets such as online advertising, where sellers are far more willing to wait for revenue than buyers are willing to wait for goods [59, 29].
⁴ So, σ and $\beta_m$ determine the future outcomes $a_s^m$ and $p_s^m$, which are thus random variables.
⁵ In our setup, we do not require that the strategies actually used by the buyers $\mathcal{M}_{-m}$ match the buyer m's beliefs $\beta_m$ (an equilibrium requirement), because our results hold without this requirement.
Second, we transform PRRFES in this way and show that its regret is affected by two learning processes: one learns the bidder valuations and the other learns which bidders have the maximal valuation (Sec. 4). The former learning is controlled by the design of the source PRRFES, while the latter one is achieved by a special stopping rule that excludes bidders from the suspected ones. A proper combination of parameters for the source pricing and the stopping rule provides an algorithm with strategic regret in O(log log T), see Th. 1.

Related work. Several studies maximized auction revenue in an offline/batch learning fashion: either via estimating or fitting distributions of buyer valuations/bids to set reserve prices [38, 69, 64], or via direct learning of reserve prices [57, 58, 67, 54]. In contrast to them, we set prices in repeated auctions by an online deterministic learning approach. Revenue optimization for repeated auctions has mainly concentrated on algorithmic reserve prices that are updated in an online way over time, also known as dynamic pricing [33, 25]. Dynamic pricing was considered: under a game-theoretic view [50, 22, 11, 8, 56]; from the bidder side [44, 75, 39, 13]; in experimental studies [52, 16, 76]; as bandit problems [5, 51, 18]; and from other aspects [66, 31, 21, 41]. Repeated auctions with contextual information about the good in a round were considered in [7, 24, 53, 49]. The studies [68, 36, 26, 43, 71] elaborated on setups of repeated posted-price auctions with a strategic buyer holding a fixed valuation, but maximized expected revenue for a given prior distribution of valuations, while we optimize regret w.r.t. worst-case valuations without knowing their distribution. There are studies on reserve price optimization in repeated second-price auctions, but they considered scenarios different from ours. Non-strategic bidders are considered in [19].
Kanoria et al. [45] studied strategic buyers (similarly to our work), but maximized expected revenue w.r.t. a prior distribution of valuations. Our setup can be considered as a special case of the repeated Vickrey auctions in [40], but their regret upper bound is $O(T^\alpha)$ in T and holds only when several goods are sold in a round. However, the most relevant works to ours are [6, 59, 27, 29], where our strategic setup with fixed private valuation is considered, but for the case of one bidder, M = 1. The most important results of these works are discussed above in this section (see "Background on pricing algorithms").

3 Dividing algorithms and div-transformation

Barrage pricing. In our setting, a pricing algorithm is able to set personal (individual) reserve prices for each bidder and is hence able to "eliminate" particular bidders from particular rounds. Namely, in a round t, an algorithm can set a reserve price $p^{\mathrm{bar}}$ such that a strategic bidder m, independently of his valuation, will never accept $p^{\mathrm{bar}}$, i.e., will never bid at or above this price; such a price is referred to as a barrage reserve price. From here on we use $p^{\mathrm{bar}} = 1/(1-\gamma_0)$, $\gamma_0 \in (0,1)$: accepting it even once would result in a negative surplus for a buyer with discount $\gamma_m \le \gamma_0$. We use the phrase "the bidder m is eliminated⁶ from participation in round t" to describe this case.

Dividing algorithms. In this subsection, we introduce a subclass of the algorithms $\mathcal{A}_M$ that is denoted by $\mathcal{A}_M^{\mathrm{div}} \subset \mathcal{A}_M$ and is referred to as the class of dividing algorithms (from the Latin "divide et impera"). A dividing algorithm $\mathcal{A} \in \mathcal{A}_M^{\mathrm{div}}$ works in periods and tracks a feasible set of suspected bidders S, aimed at finding the bidder (or bidders) with the maximal valuation v. Namely, it starts with all bidders $S_1 := \mathcal{M}$ in the first period, which lasts M rounds.
In each period $i \in \mathbb{N}$, the algorithm iterates over the currently suspected bidders $S_i$: in the current round, it picks $m \in S_i$, sets a non-barrage reserve price for the bidder m, sets a barrage reserve price for all other bidders $\mathcal{M}_{-m}$, and goes to the next round within the period by picking the next buyer from $S_i$. Thus, the algorithm meaningfully interacts with only one bidder in each round, through elimination of all other bidders by means of barrage pricing. After the i-th period, the algorithm $\mathcal{A}$ somehow identifies which bidders from $S_i$ should be left as suspected ones in the next period (i.e., be included in the set $S_{i+1}$).

⁶ Note that (a) formally, all bidders participate in all rounds (see Sec. 2) and (b) if a bidder is not eliminated, it does not mean that he is in $\mathcal{M}_t$ (he may bid below his reserve price, which can be a non-barrage one). So, the word "elimination" is purposely associated with barrage pricing in order to refer to this case.

When the game has been played with the dividing algorithm $\mathcal{A}$, one can split all the rounds into I periods: $\{1, \dots, T\} = \cup_{i=1}^I \mathcal{T}_i$. Each period i < I consists of $|\mathcal{T}_i| = |S_i|$ rounds (the last one of $|\mathcal{T}_I| \le |S_I|$). Let $t_i^m \in \mathcal{T}_i$ denote the round of a period i in which a bidder m is not eliminated by the seller's algorithm (i.e., receives a non-barrage reserve price). Thus, $\mathcal{I}_m := \{t_1^m, \dots, t_{I_m}^m\}$ are all such rounds of the bidder m, and $I_m = |\mathcal{I}_m|$ is referred to as the subhorizon of the bidder m (the number of periods in which he participates). Note that (a) $\mathcal{I}_m$ and $I_m$ depend on the bids $b_{1:T}$ of all buyers $\mathcal{M}$; (b) the following identities hold: $\{1, \dots, T\} = \cup_{m=1}^M \mathcal{I}_m$ and $\mathcal{T}_i = \{t_i^m \mid m \in \mathcal{M} \text{ s.t. } I_m \ge i\}$.
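The cyclic-elimination schedule of a dividing algorithm can be sketched as follows; this toy keeps the suspected set fixed and uses a stand-in callable `subalgo_price` for the 1-buyer subalgorithm, both simplifying assumptions (a real dividing algorithm shrinks the suspected set between periods via its stopping rule).

```python
GAMMA0 = 0.95
P_BAR = 1.0 / (1.0 - GAMMA0)  # barrage price: never acceptable when gamma_m <= gamma_0

def dividing_reserves(suspected, subalgo_price, total_rounds):
    """Per-round reserve-price vectors of a (simplified) dividing algorithm.

    In each period the algorithm cycles over the suspected bidders; the
    chosen bidder m gets the price from his 1-buyer subalgorithm
    (subalgo_price(m)), while every other bidder gets the barrage price.
    """
    schedule, t = [], 0
    while t < total_rounds:
        for m in suspected:          # one period = one pass over suspected
            if t >= total_rounds:
                break
            prices = {k: P_BAR for k in suspected}
            prices[m] = subalgo_price(m)  # non-barrage reserve for bidder m
            schedule.append(prices)
            t += 1
    return schedule
```

So with two suspected bidders, odd rounds meaningfully price bidder 1 and even rounds bidder 2, each rival being eliminated by `P_BAR`.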
So, in a round $t_i^m$, the algorithm $\mathcal{A}$ eliminates the bidders $\mathcal{M}_{-m}$ (i.e., sets the reserves $p_{t_i^m}^{m'} = p^{\mathrm{bar}}$ for all $m' \in \mathcal{M}_{-m}$), while the reserve price $p_{t_i^m}^m$ set for the buyer m is determined only by his bids during the previous rounds $\{t_1^m, \dots, t_{i-1}^m\}$ in which he has not been eliminated: i.e., $p_{t_i^m}^m = p^m(b_{t_1^m}^m, \dots, b_{t_{i-1}^m}^m)$. Hence, the algorithm $\mathcal{A}$'s interaction with the bidder m in the rounds $\mathcal{I}_m$ can be encoded by a 1-buyer algorithm from $\mathcal{A}_1$, which sets prices in the rounds $\{t_i^m\}_{i=1}^{I_m}$ instead of $\{i\}_{i=1}^{I_m}$. We denote this algorithm by $\mathcal{A}^m$ and refer to it as the subalgorithm of $\mathcal{A}$ against the buyer m. Let $\mathrm{Reg}_m(I_m, \mathcal{A}^m, v_m, b_{1:T}^m) := \sum_{i=1}^{I_m} (v_m - a_{t_i^m}^m p_{t_i^m}^m)$ be the regret of the subalgorithm $\mathcal{A}^m$ for given bids $b_{1:T}^m$ of the buyer $m \in \mathcal{M}$ in the rounds $\mathcal{I}_m$. The following lemma holds (the straightforward proof is in App. A.1.1).

Lemma 1. Let $\mathcal{A} \in \mathcal{A}_M^{\mathrm{div}}$ be a dividing algorithm, $\mathcal{A}^m \in \mathcal{A}_1$, $m \in \mathcal{M}$, be its subalgorithms (as described above), and $\mathring{b}_{1:T} = \mathring{b}_{1:T}(T, \mathcal{A}, v, \gamma, \beta)$ be the optimal bids of the strategic buyers $\mathcal{M}$. Then, for any $v \in [0,1]^M$, $\gamma \in (0,1]^M$, and β, the strategic regret of $\mathcal{A}$ can be decomposed into two parts:
$$\mathrm{SReg}(T, \mathcal{A}, v, \gamma, \beta) = \mathrm{SReg}_{\mathrm{ind}}(T, \mathcal{A}, v, \gamma, \beta) + \mathrm{SReg}_{\mathrm{dev}}(T, \mathcal{A}, v, \gamma, \beta),$$
where $\mathrm{SReg}_{\mathrm{ind}}(T, \mathcal{A}, v, \gamma, \beta) := \sum_{m \in \mathcal{M}} \mathrm{Reg}_m(I_m, \mathcal{A}^m, v_m, \mathring{b}_{1:T}^m)$ is the individual part of the regret and $\mathrm{SReg}_{\mathrm{dev}}(T, \mathcal{A}, v, \gamma, \beta) := \sum_{m \in \mathcal{M}} I_m (v - v_m)$ is the deviation part of the regret.

Informally, this lemma states that the regret consists of the individual regrets against each buyer m in his rounds $\mathcal{I}_m$ and the deviation of the buyer valuations $v_m$ from the maximal one v. So, we see a clear intuition: a good algorithm should (1) learn the valuations v of the buyers (minimizing the individual regrets) and (2) learn which buyers have the highest valuation v (minimizing the deviation regret).

div-transformation.
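The decomposition of Lemma 1 follows from the per-round identity $(v - a_t p_t) = (v_m - a^m_t p^m_t) + (v - v_m)$ for $t \in \mathcal{I}_m$; a quick numeric check of this bookkeeping (with hypothetical payments, not data from the paper) can be written as:

```python
def total_regret(v_max, payments):
    """Reg(T, A, v, b) = sum_t (v - a_t p_t); `payments` lists a_t p_t per round."""
    return sum(v_max - p for p in payments)

def decomposed_regret(valuations, payments_by_bidder):
    """Lemma 1: sum of individual regrets Reg_m plus deviation parts I_m (v - v_m).

    payments_by_bidder[m] lists a^m_t p^m_t over bidder m's rounds I_m,
    so its length is the subhorizon I_m.
    """
    v_max = max(valuations.values())
    ind = sum(valuations[m] - p
              for m, ps in payments_by_bidder.items() for p in ps)
    dev = sum(len(ps) * (v_max - valuations[m])
              for m, ps in payments_by_bidder.items())
    return ind + dev
```

For instance, with valuations {1: 0.9, 2: 0.6} and payments [0.5, 0.7] against bidder 1 and [0.4] against bidder 2, both computations give total regret 1.1.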
Let $\mathcal{A}_1 \in \mathcal{A}_{\mathrm{RPPA}}$ be a 1-buyer RPPA algorithm. An algorithm $\mathrm{div}_M(\mathcal{A}_1, \mathrm{sr})$ is said to be a div-transformation of $\mathcal{A}_1$ with a stopping rule $\mathrm{sr}: \mathcal{M} \times \mathcal{T}(\mathcal{A}_1)^M \to \mathrm{bool}$ when it is a dividing algorithm from $\mathcal{A}_M^{\mathrm{div}}$ such that its subalgorithms $\mathcal{A}^m$ are $\mathcal{A}_1$ and the stopping rule sr determines which bidders are no longer suspected in $S_{i+1}$ after a period i. Namely, first, the algorithm $\mathrm{div}_M(\mathcal{A}_1, \mathrm{sr})$ tracks the state of each buyer $m \in \mathcal{M}$ in the tree $\mathcal{T}(\mathcal{A}_1)$ of the RPPA algorithm $\mathcal{A}_1$ (see Sec. 2) by means of a personal (individual) feasible node. For each period i and each round $t_i^m \in \mathcal{T}_i$, the current state (i.e., the history of previous actions) of the buyer m is encoded by the tracking node $n_i^m \in \mathcal{T}(\mathcal{A}_1)$; in particular, in the round $t_i^m$, he receives the reserve price $p(n_i^m)$ of this node $n_i^m$ (the other bidders $\mathcal{M}_{-m}$ get a barrage reserve price $p^{\mathrm{bar}}$). If a buyer m is no longer suspected in a period $i > I_m$ (i.e., $m \notin S_i$), we formally set $n_i^m := n_{I_m+1}^m$. Second, after a period i, the stopping decision is based on the past buyer binary actions, which are coded by means of the nodes $\{n_{i+1}^m\}_{m=1}^M$ in the binary tree $\mathcal{T}(\mathcal{A}_1)$: if the stopping rule $\mathrm{sr}(m', \{n_{i+1}^m\}_{m=1}^M)$ is true, then the buyer $m' \notin S_{i+1}$. The pseudo-code of the div-transformation of an RPPA algorithm is in Appendix B.1. For an RPPA right-consistent algorithm $\mathcal{A}_1 \in \mathcal{C}_R$ with penalization rounds, let $\langle \mathcal{A}_1 \rangle$ denote the transformation of $\mathcal{A}_1$ such that it is equal to $\mathcal{A}_1$, but each penalization sequence of nodes $\{n_j\}_{j=1}^r \subset \mathcal{T}(\mathcal{A}_1)$, $r \ge 2$ (see Def. 1), is reinforced in the following way: all the prices in the nodes $\{n_j\} \cup R(n_j)$, $j = 2, \dots, r$, are replaced by 1 (the maximal value of the valuation domain); the sequence and the rounds are then referred to as reinforced penalization ones.
After this, a strategic buyer will certainly either accept the price at the node $n_1$, or reject the prices in all the nodes $\{n_j\}_{j=1}^r$, even in the case of uncertainty about the future. Let $\delta_n^l := p(n) - \inf_{m \in L(n)} p(m)$ be the left increment [59, 27] of a node $n \in \mathcal{T}(\mathcal{A}_1)$. In order to obtain upper bounds on strategic regret, it is important to have a tool that allows us to locate the valuation of a strategic bidder. Such a tool can be obtained for div-transformed right-consistent RPPA algorithms with reinforced penalization rounds, based on the following proposition, which is an analogue of [27, Prop. 2] in our case with buyer uncertainty about the future.

Proposition 1. Let $\gamma_m \in (0,1)$, $\mathcal{A}_1 \in \mathcal{A}_{\mathrm{RPPA}} \cap \mathcal{C}_R$ be an RPPA right-consistent pricing algorithm, $n \in \mathcal{T}(\mathcal{A}_1)$ be the starting node of an r-length penalization sequence (see Def. 1), $r > \log_{\gamma_m}(1 - \gamma_m)$, $\mathrm{sr}: \mathcal{M} \times \mathcal{T}(\mathcal{A}_1)^M \to \mathrm{bool}$ be a stopping rule, and let the div-transformation $\mathrm{div}_M(\langle \mathcal{A}_1 \rangle, \mathrm{sr})$ be used by the seller for setting reserve prices. If, in a round, the node n is reached and the price p(n) is rejected by a strategic buyer $m \in \mathcal{M}$ (i.e., he bids lower than p(n)), then the following inequality on $v_m$ holds:
$$v_m - p(n) < \zeta_{r,\gamma_m} \delta_n^l, \quad \text{where } \zeta_{r,\gamma} := \gamma^r / (1 - \gamma - \gamma^r). \quad (2)$$

Proof sketch. The full proof is in App. A.1.2. Let t be the round in which the bidder m reaches the node n and rejects his reserve price $p_t^m = p(n)$. In particular, it is the round in which he is the non-eliminated buyer and $t = t_i^m \in \mathcal{T}_i$ for some period i. Since the buyers are divided and $\mathcal{A}_1 \in \mathcal{A}_{\mathrm{RPPA}}$, w.l.o.g., any strategy can be treated as a map to binary decisions {0, 1}.
Let $\mathring{\sigma}$ be the optimal strategy used by the buyer m; let $h_t^m;a$ be the continuation of the current history $h_t^m$ by a binary decision $a_t^m = a$, while $\hat{\sigma}_a$ denotes an optimal strategy among all possible strategies in which the binary buyer decision $a_t^m$ is $a \in \{0,1\}$; and let $S_t^m(\sigma) := \mathrm{Sur}_{t:T}(\mathcal{A}, \gamma_m, v_m, h_t^m, \beta_m, \sigma)$ be the future expected surplus when following a strategy $\sigma \in \mathcal{S}_T$. Rejection of the price $p_t^m$ when following the optimal strategy $\mathring{\sigma}$ easily implies $S_t^m(\hat{\sigma}_1) \le S_t^m(\hat{\sigma}_0)$. Let us bound each side of this inequality. First,
$$S_t^m(\hat{\sigma}_1) = \gamma_m^{t-1}(v_m - p(n)) + \mathrm{Sur}_{t+1:T}(\mathcal{A}, \gamma_m, v_m, h_t^m;1, \beta_m, \hat{\sigma}_1) \ge \gamma_m^{t-1}(v_m - p(n)), \quad (3)$$
where we used the facts (i) that if the bidder accepts the price p(n), then he necessarily gets the good, since all other bidders $\mathcal{M}_{-m}$ are eliminated by a barrage price in this round t; and (ii) that the expected surplus in rounds $s \ge t+1$ is non-negative, because the subalgorithm $\mathcal{A}_1 \in \mathcal{C}_R$ is right-consistent. Second,
$$S_t^m(\hat{\sigma}_0) = \mathrm{Sur}_{t_i^m + r:T}(\mathcal{A}, \gamma_m, v_m, h_t^m;0, \beta_m, \hat{\sigma}_0) < \frac{\gamma_m^{t+r-1}}{1 - \gamma_m}\,(v_m - p(n) + \delta_n^l),$$
where we (i) used the fact that if the bidder rejects the price $p_t^m$, then the next rounds $\{t_{i+j}^m\}_{j=1}^{r-1}$ will be reinforced penalization ones (the strategic bidder will reject in all of them); and (ii) upper bounded the surplus in the remaining rounds by assuming that only this bidder will get the remaining goods, for the lowest reserve price from the left subtree L(n). Uniting these bounds on $S_t^m(\hat{\sigma}_a)$, we get $(v_m - p(n))(1 - \gamma_m - \gamma_m^r) < \gamma_m^r \delta_n^l$, which implies Eq. (2), since $r > \log_{\gamma_m}(1 - \gamma_m)$. We emphasize that the dividing structure of the algorithm is crucially exploited in the proof of Prop. 1.
Namely, the fact that all other bidders $\mathcal{M}_{-m}$ are eliminated by a barrage price in the round t is used (a) to guarantee that the buyer m obtains the good at price p(n) and (b) thus to lower bound the future surplus $S_t^m(\hat{\sigma}_1)$ in the case of acceptance in Eq. (3). If we dealt with a non-dividing algorithm, then another bidder might win the good or make the payment of the bidder m higher than his reserve price p(n); in both cases, $S_t^m(\hat{\sigma}_1)$ could only be lower bounded by 0 in the general situation, which would result in a useless inequality instead of Eq. (2). For a right-consistent algorithm $\mathcal{A}_1 \in \mathcal{C}_R$, the increment $\delta_n^l$ is bounded by the difference between the current node's price p(n) and the last price q accepted by the buyer m before reaching this node. Hence, Prop. 1 provides us with a tool to locate the valuation $v_m$ even though the strategic buyer does not myopically report its position (similar to [27, Prop. 2]). Namely, if the buyer m bids no lower than p(n), then $v_m \ge p(n)$; if he bids lower than p(n), then $q \le v_m < p(n) + \zeta_{r,\gamma_m}(p(n) - q)$, and the closer an offered price p(n) is to the last accepted price q, the smaller the location interval of possible valuations $v_m$ (since its length is $(1 + \zeta_{r,\gamma_m})(p(n) - q)$).

4 divPRRFES algorithm

In this section, we show that we can use an optimal algorithm from the setting of repeated posted-price auctions to obtain an algorithm for our multi-bidder setting with an upper bound on strategic regret of the same asymptotic order. Namely, let us div-transform PRRFES [27], further denoted as $\mathcal{A}_1$. Since a div-transformation of PRRFES (with penalization reinforcement) individually tracks the position of each buyer in the binary tree $\mathcal{T}(\langle \mathcal{A}_1 \rangle)$, we adapt the key notations of PRRFES [27] to our case of multiple bidders and periods.
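The locating tool of Prop. 1 is easy to compute. The following sketch evaluates $\zeta_{r,\gamma}$ and the resulting valuation interval; the helper names are ours, and `min_penalization_length` just picks the smallest integer r satisfying the condition $r > \log_\gamma(1-\gamma)$ of the proposition.

```python
import math

def zeta(r, gamma):
    """zeta_{r,gamma} = gamma^r / (1 - gamma - gamma^r), as in Eq. (2)."""
    return gamma**r / (1.0 - gamma - gamma**r)

def min_penalization_length(gamma):
    """Smallest integer r with r > log_gamma(1 - gamma), required by Prop. 1."""
    return math.floor(math.log(1.0 - gamma, gamma)) + 1

def locate_valuation(q, p_n, r, gamma, rejected):
    """Interval known to contain v_m after the buyer's decision at node n.

    q   : last accepted price before reaching n (right-consistency bounds
          the left increment delta_n^l by p_n - q),
    p_n : price p(n) offered at the node n.
    """
    if rejected:
        return (q, p_n + zeta(r, gamma) * (p_n - q))  # q <= v_m < upper end
    return (p_n, 1.0)  # acceptance: v_m >= p(n)
```

For example, at γ = 0.5 the proposition requires r ≥ 2, where ζ_{2,0.5} = 0.25/0.25 = 1, so a rejection of p(n) = 0.75 after last accepted price q = 0.5 locates v_m in [0.5, 1.0).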
Against a buyer $m \in \mathcal{M}$, PRRFES $\langle \mathcal{A}_1 \rangle$ works in phases, initialized by the phase index l := 0, the last accepted price before the current phase $q_0^m := 0$, and the iteration parameter $\epsilon_0 := 1/2$. At each phase $l \in \mathbb{Z}_+$, it sequentially offers the prices $p_{l,k}^m := q_l^m + k\epsilon_l$, $k \in \mathbb{N}$ (exploration rounds), with $\epsilon_l = 2^{-2^l}$; if a price $p_{l,k}^m$ is rejected, setting $K_l^m := k - 1 \ge 0$, (1) it offers the price 1 for r − 1 reinforced penalization rounds (if one of them is accepted, 1 will be offered in all remaining rounds), (2) it offers the price $p_{l,K_l^m}^m$ for g(l) exploitation rounds, and (3) PRRFES goes to the next phase by setting $q_{l+1}^m := p_{l,K_l^m}^m$ and l := l + 1. Individual tracking of bidders by the div-transformed PRRFES implies that different buyers can be in different phases in the same period i. Hence, let $l_i^m$ denote the current phase of a buyer $m \in \mathcal{M}$ in the round $t_i^m$ of a period $i \le I_m$, and let $l_i^m := l_{I_m+1}^m$ in all subsequent periods $i > I_m$ (when the buyer m is no longer suspected). In particular, $q_{l_i^m}^m$ is the last accepted price by the buyer m before the phase $l_i^m$ in the period i. We rely on the decomposition from Lemma 1 in order to bound the strategic regret of a div-transformed PRRFES.

Upper bound for individual regrets. Before specifying a particular stopping rule, let us obtain an upper bound on the individual strategic regret $\mathrm{Reg}_m(I_m, \langle \mathcal{A}_1 \rangle, v_m, \mathring{b}_{1:T}^m)$, $m \in \mathcal{M}$. This regret is not equal to $\mathrm{SReg}(I_m, \langle \mathcal{A}_1 \rangle, (v_m), (\gamma_m))$ since, in the latter case, the 1-bidder game does not depend on the behavior of the other bidders $\mathcal{M}_{-m}$ (while, in the former case, it does). In other words, the rounds $\mathcal{I}_m = \{t_i^m\}_{i=1}^{I_m}$ do not constitute the $I_m$-round 1-buyer game of the RPPA setting considered in [6, 27], because the subhorizon $I_m$ and the exact rounds $\mathcal{I}_m$ (which determine the used discount factors $\gamma_m^{t-1}$, $t \in \mathcal{I}_m$) are unknown in advance and depend on the actions of the other bidders.
Hence, this does not allow us to straightforwardly utilize the result on the strategic regret of PRRFES proved in [27, Th. 5] for the RPPA setting. So, we have to prove the $O(\log\log T)$ bound for our case with buyer uncertainty about the future. Let us introduce the notation $r_\gamma := \lceil \log_\gamma ((1-\gamma)/2) \rceil$ $\forall \gamma \in (0, 1)$.

Lemma 2. Let $\gamma_0 \in (0,1)$, let $A_1$ be the PRRFES algorithm with $r \ge r_{\gamma_0}$ and the exploitation rate $g(l) = 2^{2^l}$, $l \in \mathbb{Z}_+$, and let $\mathrm{sr}: \mathcal{M} \times \mathcal{T}(A_1)^M \to \mathrm{bool}$ be a stopping rule. Then, if $I_m \ge 2$, the individual regret of the div-transformed PRRFES $\mathrm{div}_M(\langle A_1\rangle, \mathrm{sr})$ against the buyer $m \in \mathcal{M}$ is upper bounded:

$\mathrm{Reg}_m(\mathcal{I}_m, \langle A_1\rangle, v_m, \mathring b^m_{1:T}) \le (r v_m + 4)(\log_2\log_2 I_m + 2) \quad \forall \gamma_m \in (0, \gamma_0]\ \forall v_m \in [0, 1],$ (4)

where $\mathring b_{1:T} = \mathring b_{1:T}(T, \mathrm{div}_M(\langle A_1\rangle, \mathrm{sr}), v, \gamma, \beta)$ are the optimal bids of the strategic buyers $\mathcal{M}$.

Proof sketch. Decompose the individual regret over the rounds $\mathcal{I}_m$ into the sum of the phases' regrets: $\mathrm{Reg}_m(\mathcal{I}_m, \langle A_1\rangle, v_m, \mathring b^m_{1:T}) = \sum_{l=0}^{L_m} R^m_l$, where $L_m := l^m_{I_m}$ is the number of phases conducted by the algorithm against the buyer $m$. For $l = 0, \ldots, L_m - 1$: $R^m_l = \sum_{k=1}^{K^m_l}(v_m - p^m_{l,k}) + r v_m + g(l)(v_m - p^m_{l,K^m_l})$, where the terms correspond to the accepted exploration rounds, the reject-penalization ones, and the exploitation ones. PRRFES and each rejected price $p^m_{l,K^m_l+1}$ satisfy the conditions of Prop. 1, which implies $v_m - p^m_{l,K^m_l+1} < (p^m_{l,K^m_l+1} - p^m_{l,K^m_l}) = \epsilon_l$ (since $\zeta_{r,\gamma_m} \le 1$ for $r \ge r_{\gamma_0}$ and $\gamma_m \le \gamma_0$). Hence, $v_m \in [q^m_{l+1}, q^m_{l+1} + 2\epsilon_l)$ (since $q^m_{l+1} = p^m_{l,K^m_l}$ and PRRFES is right-consistent), and the number of exploration rounds is thus bounded: $K^m_{l+1} < 2 \cdot 2^{2^l}$. All further steps are similar to [27, Th. 5]: $\sum_{k=1}^{K^m_l}(v_m - p^m_{l,k}) < 2$; for each phase $l$, we get $R^m_l \le r v_m + 4$; and the number of phases $L_m \le \log_2\log_2 I_m + 1$. The full proof is in Appendix A.2.1 of Supp.
Materials.

Upper bound for deviation regret. Prop. 1 provides us with a tool that locates the valuation $v_m$ of a bidder $m \in \mathcal{M}$ at least in the segment $[u^m_i, w^m_i] := [q^m_{l^m_i}, q^m_{l^m_i} + 2\epsilon_{l^m_i - 1}]$ right after a period $i-1$ (see the proof [sketch] of Lemma 2), when $r \ge r_{\gamma_m}$. This means: if, after playing a period $i-1$, the upper bound $w^m_i$ on the valuation of a bidder $m \in \mathcal{M}$ is lower than the lower bound $u^{\hat m}_i$ on the valuation of another bidder $\hat m \in \mathcal{M}_{-m}$, i.e., $w^m_i < u^{\hat m}_i$, then the bidder $m$ definitely has a non-maximal valuation (i.e., $v_m < v$) and need not be suspected in the period $i$ and subsequent ones. For given parameters $r$ and $g(\cdot)$ of the PRRFES algorithm $A_1$, any state $n \in \mathcal{T}(A_1)$ of the algorithm can be mapped to the current phase $l(n)$ and the last accepted price $q(n)$ before the phase $l(n)$. Thus, we define the stopping rule

$\mathrm{sr}_{A_1}(m, \{n^m\}_{m=1}^M) := \rho(m, \{l(n^m)\}_{m=1}^M, \{q(n^m)\}_{m=1}^M)$, where
$\rho(m, \mathbf{l}, \mathbf{q}) := \big[\exists \hat m \in \mathcal{M}_{-m}: q_m + 2\epsilon_{l_m - 1} < q_{\hat m}\big] \quad \forall \mathbf{l} \in \mathbb{Z}_+^M\ \forall \mathbf{q} \in \mathbb{R}_+^M.$ (5)

The div-transformation $\mathrm{div}_M(\langle A_1\rangle, \mathrm{sr}_{A_1})$ of the PRRFES algorithm $A_1$ with the stopping rule $\mathrm{sr}_{A_1}$ defined in Eq. (5) is referred to as the dividing Penalized Reject-Revising Fast Exploiting Search (divPRRFES). The pseudo-code of divPRRFES is presented in Appendix B.2 of Supp. Materials.

Lemma 3. Let $\gamma_0 \in (0,1)$, the discounts $\gamma \in (0, \gamma_0]^M$, and let the seller use the divPRRFES pricing algorithm $\mathrm{div}_M(\langle A_1\rangle, \mathrm{sr}_{A_1})$ with the number of penalization rounds $r \ge r_{\gamma_0}$, the exploitation rate $g(l) = 2^{2^l}$, $l \in \mathbb{Z}_+$, and the stopping rule $\mathrm{sr}_{A_1}$ defined in Eq. (5). Then, for a bidder $m \in \mathcal{M}$ with non-maximal valuation, i.e., $v_m < v$, the subhorizon $I_m$ is bounded:

$I_m \le 24(v - v_m)^{-1} + r\big(1 + \log_2\log_2(4(v - v_m)^{-1})\big) < (24 + 5r)(v - v_m)^{-1}.$ (6)

Proof sketch. Let $\bar m$ be a buyer with the maximal valuation $v$.
Note that, in any period $j = 1, \ldots, I_m$, the location intervals $[q^m_{l^m_j}, q^m_{l^m_j} + 2\epsilon_{l^m_j - 1}]$ and $[q^{\bar m}_{l^{\bar m}_j}, q^{\bar m}_{l^{\bar m}_j} + 2\epsilon_{l^{\bar m}_j - 1}]$ must intersect (otherwise, the stopping rule $\mathrm{sr}_{A_1}$ would have eliminated the buyer $m$ before the period $j$, and, hence, $j > I_m$). In particular, in the period $I_m$, $\epsilon_{L(m',m)} \ge (v - v_m)/4$ holds for either $m' = m$ or (not exclusively) $m' = \bar m$, where $L(m', m) := l^{m'}_{I_m}$. From the definition of the iteration parameter $\epsilon_l$, i.e., $\log_2 \epsilon_l = -2^l$, one can obtain a bound on one of the phases: $\min\{L(m, m), L(\bar m, m)\} \le \log_2\log_2(4/(v - v_m))$. To bound the subhorizon $I_m$, decompose it into the numbers of exploration, reject-penalization, and exploitation rounds in each phase $l = 0, \ldots, L(m', m)$ played by a buyer $m' \in \{m, \bar m\}$. Applying techniques similar to the ones used in the proof of Lemma 2 (in particular, the bound on the number of exploration rounds $K^{m'}_l \le 2 \cdot 2^{2^{l-1}}$), we get $I_m \le (L(m', m) + 1) r + 2 \cdot 3 \cdot 2^{2^{L(m',m)}}$ for $m' \in \{m, \bar m\}$. This combined with the previous inequality implies Eq. (6). The full proof is in App. A.2.2.

This lemma implies an upper bound for the deviation part of the strategic regret of the divPRRFES pricing algorithm $A = \mathrm{div}_M(\langle A_1\rangle, \mathrm{sr}_{A_1})$ against the strategic buyers $\mathcal{M}$: $\mathrm{SReg}^{\mathrm{dev}}(T, A, v, \gamma, \beta) = \sum_{m=1}^M I_m (v - v_m) \le (24 + 5r)(M - 1)$. Let us denote by $\overline{\mathcal{M}} := \{m \in \mathcal{M} \mid v_m = v\}$ the set of bidders with the maximal valuation and by $\overline{v} := \max_{m \in \mathcal{M} \setminus \overline{\mathcal{M}}} v_m$ the highest valuation among the non-maximal ones. Thus, we showed that learning of the max-valuation bidders $\overline{\mathcal{M}}$ converges at a rate inversely proportional to $v - \overline{v}$ (i.e., after the period $\lceil (24 + 5r)/(v - \overline{v}) \rceil$ the set of suspected bidders is always $S_i = \overline{\mathcal{M}}$), and this learning contributes a constant (w.r.t. the horizon $T$) to the strategic regret. Finally, Lemmas 1, 2, and 3 trivially imply (see App.
A.2.3) the following theorem.

Theorem 1. Let $\gamma_0 \in (0,1)$, let $A_1$ be the PRRFES algorithm with $r \ge r_{\gamma_0}$ and the exploitation rate $g(l) = 2^{2^l}$, $l \in \mathbb{Z}_+$, and let $\mathrm{sr}_{A_1}$ be the stopping rule defined in Eq. (5). Then, for $T \ge 2$, the strategic regret of the divPRRFES pricing algorithm $A = \mathrm{div}_M(\langle A_1\rangle, \mathrm{sr}_{A_1})$ against the buyers $\mathcal{M}$ is upper bounded:

$\mathrm{SReg}(T, A, v, \gamma, \beta) \le M (r v + 4)(\log_2\log_2 T + 2) + (24 + 5r)(M - 1) \quad \forall \gamma \in (0, \gamma_0]^M\ \forall v \in [0,1]^M\ \forall \beta.$ (7)

5 Discussion, extensions of the result, and conclusions

Other auction formats. The techniques and algorithms developed in our work can be applied in repeated auctions where another format of selling a good in a round is used. Namely, our results hold in our repeated setting with an auction format (within rounds) that satisfies the following: (a) personal reserve prices are allowed; and (b) if a buyer $m$ is the only non-eliminated participant in a round $t$, then his bidding mechanism allows him to choose between getting the good for the reserve price $p^m_t$ and rejecting it. This holds, e.g., for first(/third/...)-price auctions, for PPA with multiple bidders, etc.

Regret dependence on M. The upper bound on the divPRRFES regret in Eq. (7) depends linearly on $M$. We believe that this is not an artifact of our analysis tools, but a payment for the div-transformation. Consider the case in which all bidders have the same valuation, i.e., all their valuations equal $v$. Each bidder will always be suspected by divPRRFES (i.e., be in $S_i$ for all $i$). Hence, divPRRFES will just learn the valuation $v$ for each of the $M$ bidders independently and, thus, $M$ times more slowly; i.e., it is natural that the regret of divPRRFES is $M$ times larger than the regret of PRRFES against a single buyer. However, there might exist an algorithm that does not suffer from the dividing structure in this way. So, the existence of an algorithm with a more favorable regret dependence on $M$ is an open research question.
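To get a feel for the constants in Theorem 1, the following sketch evaluates the right-hand side of Eq. (7) together with the penalization length $r_\gamma$ (the code and function names are ours; the ceiling in $r_\gamma$ is our reading of the bracket notation):

```python
import math

def r_gamma(gamma):
    # r_gamma := ceil(log_gamma((1 - gamma) / 2)): penalization rounds
    # making the reject-penalization threat credible for discounts <= gamma.
    return math.ceil(math.log((1 - gamma) / 2, gamma))

def regret_bound(T, M, r, v=1.0):
    # Right-hand side of Eq. (7):
    # M (r v + 4) (log2 log2 T + 2) + (24 + 5 r) (M - 1)
    return (M * (r * v + 4) * (math.log2(math.log2(T)) + 2)
            + (24 + 5 * r) * (M - 1))
```

Since $\log_2\log_2 T^2 = 1 + \log_2\log_2 T$, squaring the horizon increases the bound by only $M(rv+4)$, which is exactly the doubly logarithmic growth the theorem asserts; the $(24+5r)(M-1)$ deviation term does not grow with $T$ at all.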
Lower bound and optimality. For the case $M = 1$, there exists a lower bound: the strategic regret of any pricing algorithm is $\Omega(\log\log T)$ [59]. Hence, our upper bound for the algorithm divPRRFES is optimal in the general case of an arbitrary number of bidders. Nonetheless, the structure of the game with more than one buyer ($M \ge 2$) is much more complicated, since a buyer has to act in the presence of rivals and under uncertainty about the future. This is an additional opportunity that can be exploited by a pricing algorithm. Thus, the validity of the lower bound $\Omega(\log\log T)$ for $M \ge 2$ is an open research question. Several other discussions of the results are in App. D.

6 Conclusions

We studied the scenario of repeated second-price auctions with reserve pricing where a seller interacts with multiple strategic buyers. Each buyer participates in each round of the game, holds a fixed private valuation for a good, and seeks to maximize his expected future discounted surplus. First, we proposed the so-called dividing transformation that upgrades an algorithm designed for the setup with a single buyer to the multi-buyer case. Second, the transformation allowed us to obtain a novel horizon-independent algorithm that can be applied against strategic buyers with a regret upper bound of $O(\log\log T)$. Finally, we introduced non-trivial techniques such as (a) the method to locate the valuation of a strategic buyer in a played round under buyer uncertainty about the future; (b) the decomposition of strategic regret into the individual and deviation parts; and (c) the approach to learn the highest-valuation bidder with deviation regret of $O(1)$.

Acknowledgments

I would like to thank Sergei Izmalkov who inspired me to conduct this study.

References

[1] D. Agarwal, S. Ghosh, K. Wei, and S. You. Budget pacing for targeted online advertisements at LinkedIn. In KDD'2014, pages 1613–1619, 2014.

[2] G. Aggarwal, G. Goel, and A. Mehta.
Efficiency of (revenue-)optimal mechanisms. In EC'2009, pages 235–242, 2009.

[3] G. Aggarwal, S. Muthukrishnan, D. Pál, and M. Pál. General auction mechanism for search advertising. In WWW'2009, pages 241–250, 2009.

[4] S. Agrawal, C. Daskalakis, V. Mirrokni, and B. Sivan. Robust repeated auctions under heterogeneous buyer behavior. arXiv preprint arXiv:1803.00494, 2018.

[5] K. Amin, M. Kearns, and U. Syed. Bandits, query learning, and the haystack dimension. In COLT, pages 87–106, 2011.

[6] K. Amin, A. Rostamizadeh, and U. Syed. Learning prices for repeated auctions with strategic buyers. In NIPS'2013, pages 1169–1177, 2013.

[7] K. Amin, A. Rostamizadeh, and U. Syed. Repeated contextual auctions with strategic buyers. In NIPS'2014, pages 622–630, 2014.

[8] I. Ashlagi, C. Daskalakis, and N. Haghpanah. Sequential mechanisms with ex-post participation guarantees. In EC'2016, 2016.

[9] I. Ashlagi, B. G. Edelman, and H. S. Lee. Competing ad auctions. Harvard Business School NOM Unit Working Paper, (10-055), 2013.

[10] M. Babaioff, S. Dughmi, R. Kleinberg, and A. Slivkins. Dynamic pricing with limited supply. ACM Transactions on Economics and Computation, 3(1):4, 2015.

[11] S. Balseiro, O. Besbes, and G. Y. Weintraub. Dynamic mechanism design with budget constrained buyers under limited commitment. In EC'2016, 2016.

[12] S. R. Balseiro, O. Besbes, and G. Y. Weintraub. Repeated auctions with budgets in ad exchanges: Approximations and design. Management Science, 61(4):864–884, 2015.

[13] M. S. Baltaoglu, L. Tong, and Q. Zhao. Online learning of optimal bidding strategy in repeated multi-commodity auctions. In Advances in Neural Information Processing Systems, pages 4507–4517, 2017.

[14] S. Bikhchandani. Reputation in repeated second-price auctions. Journal of Economic Theory, 46(1):97–119, 1988.

[15] B. Caillaud and C. Mezzetti. Equilibrium reserve prices in sequential ascending auctions.
Journal of Economic Theory, 117(1):78–95, 2004.

[16] O. Carare. Reserve prices in repeated auctions. Review of Industrial Organization, 40(3):225–247, 2012.

[17] L. E. Celis, G. Lewis, M. M. Mobius, and H. Nazerzadeh. Buy-it-now or take-a-chance: a simple sequential screening mechanism. In WWW'2011, pages 147–156, 2011.

[18] N. Cesa-Bianchi, T. Cesari, and V. Perchet. Dynamic pricing with finitely many unknown valuations. arXiv preprint arXiv:1807.03288, 2018.

[19] N. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. In SODA'2013, pages 1190–1204, 2013.

[20] D. Charles, N. R. Devanur, and B. Sivan. Multi-score position auctions. In WSDM'2016, pages 417–425, 2016.

[21] S. Chawla, N. R. Devanur, A. R. Karlin, and B. Sivan. Simple pricing schemes for consumers with evolving values. In SODA'2016, pages 1476–1490, 2016.

[22] Y. Chen and V. F. Farias. Robust dynamic pricing with strategic customers. In EC'2015, pages 777–777, 2015.

[23] M. Chhabra and S. Das. Learning the demand curve in posted-price digital goods auctions. In ICAAMS'2011, pages 63–70, 2011.

[24] M. C. Cohen, I. Lobel, and R. Paes Leme. Feature-based dynamic pricing. In EC'2016, 2016.

[25] A. V. den Boer. Dynamic pricing and learning: historical origins, current research, and new directions. Surveys in Operations Research and Management Science, 20(1):1–18, 2015.

[26] N. R. Devanur, Y. Peres, and B. Sivan. Perfect bayesian equilibria in repeated sales. In SODA'2015, pages 983–1002, 2015.

[27] A. Drutsa. Horizon-independent optimal pricing in repeated auctions with truthful and strategic buyers. In WWW'2017, pages 33–42, 2017.

[28] A. Drutsa. On consistency of optimal pricing algorithms in repeated posted-price auctions with strategic buyer. CoRR, abs/1707.05101, 2017.

[29] A. Drutsa. Weakly consistent optimal pricing algorithms in repeated posted-price auctions with strategic buyer.
In ICML'2018, pages 1318–1327, 2018.

[30] P. Dütting, M. Henzinger, and I. Weber. An expressive mechanism for auctions on the web. In WWW'2011, pages 127–136, 2011.

[31] M. Feldman, T. Koren, R. Livni, Y. Mansour, and A. Zohar. Online pricing with strategic and patient buyers. In NIPS'2016, pages 3864–3872, 2016.

[32] H. Fu, P. Jordan, M. Mahdian, U. Nadav, I. Talgam-Cohen, and S. Vassilvitskii. Ad auctions with data. In Algorithmic Game Theory, pages 168–179. Springer, 2012.

[33] D. Fudenberg and J. M. Villas-Boas. Behavior-based price discrimination and customer recognition. Handbook on Economics and Information Systems, 1:377–436, 2006.

[34] N. Golrezaei, M. Lin, V. Mirrokni, and H. Nazerzadeh. Boosted second-price auctions for heterogeneous bidders. 2017.

[35] R. Gomes and V. Mirrokni. Optimal revenue-sharing double auctions with applications to ad exchanges. In WWW'2014, pages 19–28, 2014.

[36] O. D. Hart and J. Tirole. Contract renegotiation and coasian dynamics. The Review of Economic Studies, 55(4):509–540, 1988.

[37] J. D. Hartline and T. Roughgarden. Simple versus optimal mechanisms. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 225–234. ACM, 2009.

[38] D. He, W. Chen, L. Wang, and T.-Y. Liu. A game-theoretic machine learning approach for revenue maximization in sponsored search. In IJCAI'2013, pages 206–212, 2013.

[39] H. Heidari, M. Mahdian, U. Syed, S. Vassilvitskii, and S. Yazdanbod. Pricing a low-regret seller. In ICML'2016, pages 2559–2567, 2016.

[40] Z. Huang, J. Liu, and X. Wang. Learning optimal reserve price against non-myopic bidders. In Advances in Neural Information Processing Systems, pages 2042–2052, 2018.

[41] P. Hummel. Reserve prices in repeated auctions. International Journal of Game Theory, 47(1):273–299, 2018.

[42] P. Hummel and P. McAfee. Machine learning in an auction environment. In WWW'2014, pages 7–18, 2014.

[43] N. Immorlica, B. Lucier, E.
Pountourakis, and S. Taggart. Repeated sales with multiple strategic buyers. In EC'2017, pages 167–168, 2017.

[44] K. Iyer, R. Johari, and M. Sundararajan. Mean field equilibria of dynamic auctions with learning. ACM SIGecom Exchanges, 10(3):10–14, 2011.

[45] Y. Kanoria and H. Nazerzadeh. Dynamic reserve prices for repeated auctions: Learning from bids. 2014.

[46] R. Kleinberg and T. Leighton. The value of knowing a demand curve: Bounds on regret for online posted-price auctions. In Foundations of Computer Science, pages 594–605, 2003.

[47] V. Krishna. Auction Theory. Academic Press, 2009.

[48] S. Lahaie, A. M. Medina, B. Sivan, and S. Vassilvitskii. Testing incentive compatibility in display ad auctions. In WWW'2018, 2018.

[49] R. P. Leme and J. Schneider. Contextual search via intrinsic volumes. arXiv preprint, 2018.

[50] R. P. Leme, V. Syrgkanis, and É. Tardos. Sequential auctions and externalities. In SODA'2012, pages 869–886. SIAM, 2012.

[51] T. Lin, J. Li, and W. Chen. Stochastic online greedy learning with semi-bandit feedbacks. In NIPS'2015, pages 352–360, 2015.

[52] J. A. List and J. F. Shogren. Price information and bidding behavior in repeated second-price auctions. American Journal of Agricultural Economics, 81(4):942–949, 1999.

[53] J. Mao, R. Leme, and J. Schneider. Contextual pricing for Lipschitz buyers. In Advances in Neural Information Processing Systems, pages 5648–5656, 2018.

[54] A. M. Medina and S. Vassilvitskii. Revenue optimization with approximate bid predictions. In NIPS'2017, pages 1856–1864, 2017.

[55] V. Mirrokni, R. Paes Leme, P. Tang, and S. Zuo. Non-clairvoyant dynamic mechanism design. 2017.

[56] V. Mirrokni, R. Paes Leme, P. Tang, and S. Zuo. Optimal dynamic auctions are virtual welfare maximizers. Available at SSRN, 2018.

[57] M. Mohri and A. M. Medina. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In ICML'2014, pages 262–270, 2014.
[58] M. Mohri and A. M. Medina. Non-parametric revenue optimization for generalized second price auctions. In UAI'2015, 2015.

[59] M. Mohri and A. Munoz. Optimal regret minimization in posted-price auctions with strategic buyers. In NIPS'2014, pages 1871–1879, 2014.

[60] J. H. Morgenstern and T. Roughgarden. On the pseudo-dimension of nearly optimal auctions. In NIPS'2015, pages 136–144, 2015.

[61] R. B. Myerson. Optimal auction design. Mathematics of Operations Research, 6(1):58–73, 1981.

[62] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani. Algorithmic Game Theory, volume 1. Cambridge University Press, 2007.

[63] M. Ostrovsky and M. Schwarz. Reserve prices in internet advertising auctions: A field experiment. In EC'2011, pages 59–60, 2011.

[64] R. Paes Leme, M. Pál, and S. Vassilvitskii. A field guide to personalized reserve prices. In WWW'2016, 2016.

[65] M. Peters and S. Severinov. Internet auctions with many traders. Journal of Economic Theory, 130(1):220–245, 2006.

[66] T. Roughgarden and J. R. Wang. Minimizing regret with multiple reserves. In EC'2016, pages 601–616, 2016.

[67] M. R. Rudolph, J. G. Ellis, and D. M. Blei. Objective variables for probabilistic revenue maximization in second-price auctions with reserve. In WWW'2016, 2016.

[68] K. M. Schmidt. Commitment through incomplete information in a simple repeated bargaining game. Journal of Economic Theory, 60(1):114–139, 1993.

[69] Y. Sun, Y. Zhou, and X. Deng. Optimal reserve prices in weighted GSP auctions. Electronic Commerce Research and Applications, 13(3):178–187, 2014.

[70] D. R. Thompson and K. Leyton-Brown. Revenue optimization in the generalized second-price auction. In EC'2013, pages 837–852, 2013.

[71] A. Vanunts and A. Drutsa. Optimal pricing in repeated posted-price auctions. arXiv preprint arXiv:1805.02574, 2018.

[72] H. R. Varian. Position auctions. International Journal of Industrial Organization, 25(6):1163–1178, 2007.

[73] H. R. Varian.
Online ad auctions. The American Economic Review, 99(2):430–434, 2009.

[74] H. R. Varian and C. Harris. The VCG auction in theory and practice. The American Economic Review, 104(5):442–445, 2014.

[75] J. Weed, V. Perchet, and P. Rigollet. Online learning in repeated auctions. JMLR, 49:1–31, 2016.

[76] S. Yuan, J. Wang, B. Chen, P. Mason, and S. Seljan. An empirical study of reserve price optimisation in real-time bidding. In KDD'2014, pages 1897–1906, 2014.

[77] Y. Zhu, G. Wang, J. Yang, D. Wang, J. Yan, J. Hu, and Z. Chen. Optimizing search engine revenue in sponsored search. In SIGIR'2009, pages 588–595, 2009.

[78] M. Zoghi, Z. S. Karnin, S. Whiteson, and M. De Rijke. Copeland dueling bandits. In NIPS'2015, pages 307–315.

A Missed proofs

A.1 Missed proofs from Section 3

A.1.1 Proof of Lemma 1

Proof. Let $\mathcal{I}_m = \{t^m_i\}_{i=1}^{I_m}$ be the set of rounds in which the bidder $m$ is not eliminated by a barrage reserve pricing. Therefore, we have a decomposition of the sequence of all rounds into the union of these sets: $\{1, \ldots, T\} = \cup_{m \in \mathcal{M}} \mathcal{I}_m$. Note that we also have a splitting into periods $\{1, \ldots, T\} = \cup_{i=1}^I \mathcal{T}_i$ and the intersection $\mathcal{I}_m \cap \mathcal{T}_i = \{t^m_i\}$ for $m \in \mathcal{M}$, $i = 1, \ldots, I_m$. So, formally, we have

$\mathrm{SReg}(T, A, v, \gamma, \beta) = \mathrm{Reg}\big(T, A, v, \mathring{b}_{1:T}(T, A, v, \gamma, \beta)\big) = \sum_{t=1}^T (v - a_t p_t) = \sum_{m \in \mathcal{M}} \sum_{i=1}^{I_m} (v - a_{t^m_i} p_{t^m_i}),$ (A.1)

where the first two identities follow from definitions, while the last one is just a change of the order of summation (since $\{1, \ldots, T\} = \cup_{m \in \mathcal{M}} \mathcal{I}_m = \cup_{m \in \mathcal{M}} \{t^m_i\}_{i=1}^{I_m}$). The terms in the sum can be decomposed in the following way: $v - a_{t^m_i} p_{t^m_i} = v - v_m + v_m - a_{t^m_i} p_{t^m_i}$.
Also note that since, in each round $t^m_i$, the bidders $\mathcal{M}_{-m}$ are eliminated by a barrage reserve price, the allocation indicator $a_{t^m_i}$ and the transferred payment $p_{t^m_i}$ depend only on the behavior of the bidder $m$ in this round, i.e., $a_{t^m_i} = a^m_{t^m_i}$, $p_{t^m_i} = p^m_{t^m_i}$, and, if $a^m_{t^m_i} = 1$, the payment equals the reserve price $p^m_{t^m_i}$ offered to $m$. So, we can continue Eq. (A.1):

$\mathrm{SReg}(T, A, v, \gamma, \beta) = \sum_{m \in \mathcal{M}} \sum_{i=1}^{I_m} (v - v_m + v_m - a_{t^m_i} p_{t^m_i}) = \sum_{m=1}^M \sum_{i=1}^{I_m} (v - v_m) + \sum_{m=1}^M \sum_{i=1}^{I_m} (v_m - a^m_{t^m_i} p^m_{t^m_i}) = \sum_{m=1}^M I_m (v - v_m) + \sum_{m=1}^M \mathrm{Reg}_m(\mathcal{I}_m, A_m, v_m, \mathring{b}^m_{1:T}) = \mathrm{SReg}^{\mathrm{dev}}(T, A, v, \gamma, \beta) + \mathrm{SReg}^{\mathrm{ind}}(T, A, v, \gamma, \beta).$ (A.2)

A.1.2 Proof of Proposition 1

Proof. Let $t$ be the round in which the bidder $m$ reaches the node $n$ and rejects his reserve price $p^m_t$, which is equal to $p^m_t = \mathrm{p}(n)$ by the construction of the algorithm $\mathrm{div}_M(\langle A_1\rangle, \mathrm{sr})$. Note that, in the round $t$, all other bidders $\mathcal{M}_{-m}$ are eliminated by a barrage price, and the reserve prices set by the div-algorithm $\mathrm{div}_M(\langle A_1\rangle, \mathrm{sr})$ depend only on $a_{1:T}$ (because $A_1 \in \mathcal{A}_{\mathrm{RPPA}}$ and $\mathrm{sr}: \mathcal{M} \times \mathcal{T}(A_1)^M \to \mathrm{bool}$). Therefore, it is easy to see that, for any strategy $\sigma$, the expected future surplus $\mathrm{Sur}_{t:T}(A, \gamma_m, v_m, h^m_t, \beta_m, \sigma)$ of the bidder $m$ as a function of the bid $b^m_t = \sigma(h^m_t)$ in the round $t$ depends, in fact, only on the binary decision $a^m_t = \mathbb{I}\{b^m_t \ge p^m_t\}$: more formally, the expected surplus is constant when the bid $b^m_t$ is changed within $\{b^m_t \ge p^m_t\}$ and is constant when the bid $b^m_t$ is changed within $\{b^m_t < p^m_t\}$. Moreover, since the buyers are divided (in the whole game) and $A_1 \in \mathcal{A}_{\mathrm{RPPA}}$, if two strategies $\sigma'$ and $\sigma'' \in \mathcal{S}_T$ do not differ in their binary output, i.e., $\mathbb{I}\{\sigma'(h) \ge p^m_t\} = \mathbb{I}\{\sigma''(h) \ge p^m_t\}$ $\forall h \in \mathcal{H}_{1:T}$, then they have the same future discounted surplus. Hence, any strategy can be treated as a map to binary decisions $\{0, 1\}$ (instead of $\mathbb{R}_+$).
Let $\hat\sigma_a$ denote an optimal strategy among all possible strategies in which the binary decision $a^m_t$ in the round $t$ is $a \in \{0, 1\}$, i.e., $\mathbb{I}\{\hat\sigma_a(h^m_t) \ge p^m_t\} = a$ and $\hat\sigma_a$ maximizes

$\mathbb{E}\Big[\sum_{s=t}^T \gamma_m^{s-1} a^m_s (v_m - p^m_s) \,\Big|\, h^m_t, a^m_t = a, \sigma, \beta_m\Big].$

Given a strategy $\sigma \in \mathcal{S}_T$, let us denote the future expected surplus when following this strategy by $S^m_t(\sigma) := \mathrm{Sur}_{t:T}(A, \gamma_m, v_m, h^m_t, \beta_m, \sigma)$. When the optimal strategy $\mathring\sigma_m$ (used by the buyer) is pure, we directly have $S^m_t(\hat\sigma_1) \le S^m_t(\mathring\sigma_m) = S^m_t(\hat\sigma_0)$, since the price $p^m_t$ is rejected ($a^m_t = 0$) by our strategic buyer. In the general case, when the buyer's optimal strategy $\mathring\sigma_m$ is mixed, let $\alpha_0$ be the probability of a rejection ($a^m_t = 0$) and, thus, $1 - \alpha_0$ the probability of an acceptance ($a^m_t = 1$) in this strategy. Since the strategy is optimal, its surplus $S^m_t(\mathring\sigma_m) = \alpha_0 S^m_t(\hat\sigma_0) + (1 - \alpha_0) S^m_t(\hat\sigma_1)$ must be no lower than the surplus $S^m_t(\hat\sigma_1)$ of the strategy $\hat\sigma_1$: $\alpha_0 S^m_t(\hat\sigma_0) + (1 - \alpha_0) S^m_t(\hat\sigma_1) \ge S^m_t(\hat\sigma_1)$. Since the price $p^m_t$ is rejected, the probability $\alpha_0 > 0$ and, thus, $\alpha_0 S^m_t(\hat\sigma_0) \ge \alpha_0 S^m_t(\hat\sigma_1)$. In either case, we obtain:

$S^m_t(\hat\sigma_1) \le S^m_t(\hat\sigma_0).$ (A.3)

Let us bound each side of this inequality:

$S^m_t(\hat\sigma_1) = \mathbb{E}\Big[\sum_{s=t}^T \gamma_m^{s-1} a^m_s (v_m - p^m_s) \,\Big|\, h^m_t, a^m_t = 1, \hat\sigma_1, \beta_m\Big] = \gamma_m^{t-1}(v_m - \mathrm{p}(n)) + \mathbb{E}\Big[\sum_{s=t+1}^T \gamma_m^{s-1} a^m_s (v_m - p^m_s) \,\Big|\, h^m_t, a^m_t = 1, \hat\sigma_1, \beta_m\Big] \ge \gamma_m^{t-1}(v_m - \mathrm{p}(n)),$ (A.4)

where, in the second identity, we used the fact that if the bidder accepts the price $\mathrm{p}(n)$, then he necessarily gets the good, since all other bidders $\mathcal{M}_{-m}$ are eliminated by a barrage price in this round $t$ (this is the key point of the proof).
In the last inequality, we used that the expected surplus in rounds $s \ge t+1$ is non-negative, because the subalgorithm $A_1 \in \mathcal{C}_R$ is right-consistent: accepting the offered price $\mathrm{p}(n')$ in some reached node $n' \in \mathcal{T}(A_1)$ s.t. $\mathrm{p}(n') > v_m$ would result in reserve prices higher than his valuation $v_m$ in all subsequent rounds as well (so, the buyer has no incentive to take a local negative surplus in a round, because it would result in non-positive surplus in all subsequent rounds).

$S^m_t(\hat\sigma_0) = \mathbb{E}\Big[\sum_{s=t}^T \gamma_m^{s-1} a^m_s (v_m - p^m_s) \,\Big|\, h^m_t, a^m_t = 0, \hat\sigma_0, \beta_m\Big] = \mathbb{E}\Big[\sum_{s=t^m_i+r}^T \gamma_m^{s-1} a^m_s (v_m - p^m_s) \,\Big|\, h^m_t, a^m_t = 0, \hat\sigma_0, \beta_m\Big] \le \sum_{s=t+r}^T \gamma_m^{s-1}(v_m - \mathrm{p}(n) + \delta_{l_n}) < \frac{\gamma_m^{t+r-1}}{1 - \gamma_m}(v_m - \mathrm{p}(n) + \delta_{l_n}),$ (A.5)

where $i$ is the current period of the div-algorithm $\mathrm{div}_M(\langle A_1\rangle, \mathrm{sr})$, i.e., the round $t = t^m_i \in \mathcal{T}_i$ is such that the buyer $m$ is the non-eliminated participant in this round (see Sec. 3). In the second identity, we used the fact that if the bidder rejects the price $p^m_t$, then the following rounds $\{t^m_i + j\}_{j=1}^{r-1}$ (in which the bidder will be non-eliminated) will be reinforced penalization rounds (and the strategic bidder will reject the prices in all of them as well). In the first inequality, we upper bounded the surplus by assuming that only this bidder is left among the suspected bidders $S_j$, $j > i$, and that he receives the lowest possible reserve price from the left subtree $\mathcal{L}(n)$ of the node $n$. The last inequality is a simple arithmetic upper bound on the sum of discounts $\sum_{s=t+r}^T \gamma_m^{s-1}$. We combine these bounds on $S^m_t(\hat\sigma_0)$ and $S^m_t(\hat\sigma_1)$ (i.e., Eq. (A.3), (A.4), and (A.5)), divide by $\gamma_m^{t-1}$, and get

$(v_m - \mathrm{p}(n))\Big(1 - \frac{\gamma_m^r}{1 - \gamma_m}\Big) < \frac{\gamma_m^r}{1 - \gamma_m}\,\delta_{l_n},$ (A.6)

which implies the inequality claimed by the proposition, since $r > \log_{\gamma_m}(1 - \gamma_m)$.
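For completeness, Eq. (A.6) can be solved for $v_m - \mathrm{p}(n)$; the resulting factor is, in our reading, the constant $\zeta_{r,\gamma_m}$ referenced in the proofs of Lemmas 2 and 3 (its formal definition lies outside this excerpt, so the identification below is an assumption):

```latex
(v_m - \mathrm{p}(n))\Big(1 - \tfrac{\gamma_m^r}{1-\gamma_m}\Big)
  < \tfrac{\gamma_m^r}{1-\gamma_m}\,\delta_{l_n}
\;\Longrightarrow\;
v_m - \mathrm{p}(n)
  < \underbrace{\frac{\gamma_m^r}{1-\gamma_m-\gamma_m^r}}_{=:\,\zeta_{r,\gamma_m}}\,\delta_{l_n},
```

where the division is legitimate since $r > \log_{\gamma_m}(1-\gamma_m)$ gives $\gamma_m^r < 1-\gamma_m$. Moreover, $r \ge r_{\gamma_m} = \lceil \log_{\gamma_m}((1-\gamma_m)/2) \rceil$ gives $\gamma_m^r \le (1-\gamma_m)/2$, hence $1-\gamma_m-\gamma_m^r \ge \gamma_m^r$ and $\zeta_{r,\gamma_m} \le 1$, which is the property invoked in the proof of Lemma 2.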
A.2 Missed proofs from Section 4

A.2.1 Proof of Lemma 2

Proof. The game has been played and $\mathring b_{1:T} = \mathring b_{1:T}(T, \mathrm{div}_M(\langle A_1\rangle, \mathrm{sr}), v, \gamma, \beta)$ are the resulting optimal bids of the strategic buyers $\mathcal{M}$. So, let $L_m := l^m_{I_m}$ be the number of phases conducted by the algorithm during the rounds $\mathcal{I}_m = \{t^m_i\}_{i=1}^{I_m}$ against the strategic buyer $m$. Then we decompose the total individual regret over these rounds into the sum of the phases' regrets: $\mathrm{Reg}_m(\mathcal{I}_m, \langle A_1\rangle, v_m, \mathring b^m_{1:T}) = \sum_{l=0}^{L_m} R^m_l$. For the regret $R^m_l$ at each phase except the last one, the following identity holds:

$R^m_l = \sum_{k=1}^{K^m_l} (v_m - p^m_{l,k}) + r v_m + g(l)(v_m - p^m_{l,K^m_l}), \quad l = 0, \ldots, L_m - 1,$ (A.7)

where the first, second, and third terms correspond to the exploration rounds with acceptance, the reject-penalization rounds, and the exploitation rounds, respectively. Since the basis of the subalgorithm PRRFES $A_1 \in \mathcal{C}_R$ is right-consistent [27], as discussed in the proof of Proposition 1 (see Appendix A.1.2), the optimal strategy of the bidder $m$ is non-losing [27]: the buyer has no incentive to take a local negative surplus in a round, because it would result in non-positive surplus in all subsequent rounds. Hence, since the price $p^m_{l,K^m_l}$ is either $0$ or has been accepted, we have $p^m_{l,K^m_l} \le v_m$. Second, since the price $p^m_{l,K^m_l+1}$ is rejected, we have $v_m - p^m_{l,K^m_l+1} < (p^m_{l,K^m_l+1} - p^m_{l,K^m_l}) = \epsilon_l$ (by Proposition 1, since $\zeta_{r,\gamma_m} \le 1$ for $r \ge r_{\gamma_0}$ and $\gamma_m \le \gamma_0$). Hence, the valuation $v_m \in \big[p^m_{l,K^m_l}, p^m_{l,K^m_l} + 2\epsilon_l\big)$ and all accepted prices $p^m_{l+1,k}$, $k \le K^m_{l+1}$, from the next phase $l+1$ satisfy

$p^m_{l+1,k} \in [q^m_{l+1}, v_m) \subseteq \big[p^m_{l,K^m_l}, p^m_{l,K^m_l} + 2\epsilon_l\big) \quad \forall k \le K^m_{l+1},$

because any accepted price has to be lower than the valuation $v_m$ for the strategic buyer (whose optimal strategy is a locally non-losing one, as we stated above).
This infers $K^m_{l+1} < 2\epsilon_l/\epsilon_{l+1} = 2 N_{l+1}$, where $N_l := \epsilon_{l-1}/\epsilon_l = \epsilon_{l-1}^{-1} = 2^{2^{l-1}}$. (Note that the prices at the exploitation rounds $p^m_{l,K^m_l}$ are equal either to $0$ or to an earlier accepted price, and are thus accepted by the strategic buyer, since the buyer's decisions at these rounds do not affect the further pricing of the algorithm divPRRFES.) Therefore, for the phases $l = 1, \ldots, L_m$, we have: $v_m - p^m_{l,K^m_l} < 2\epsilon_l$; $v_m - p^m_{l,k} < \epsilon_l (2N_l - k)$ $\forall k \in \mathbb{Z}_{2N_l}$; and

$\sum_{k=1}^{K^m_l} (v_m - p^m_{l,k}) < \epsilon_l \sum_{k=1}^{2N_l - 1} (2N_l - k) = \epsilon_l\, \frac{2N_l - 1}{2} \cdot 2N_l \le 2 N_l^2 \epsilon_l = 2 N_l\, \epsilon_{l-1} = 2,$

where we used the definitions of $N_l$ and $\epsilon_l$. For the zeroth phase $l = 0$, one has the trivial bound $\sum_{k=1}^{K^m_0} (v_m - p^m_{0,k}) \le 1/2$. Hence, by the definition of the exploitation rate $g(l)$, we have $g(l) = \epsilon_l^{-1}$ and, thus,

$R^m_l \le 2 + r v_m + g(l) \cdot 2\epsilon_l \le r v_m + 4, \quad l = 0, \ldots, L_m - 1.$ (A.8)

Moreover, this inequality holds for the $L_m$-th phase, since it differs from the other ones only in the possible absence of some rounds (reject-penalization or exploitation ones). Namely, for the $L_m$-th phase, we have:

$R^m_{L_m} = \sum_{k=1}^{K^m_{L_m}} (v_m - p^m_{L_m,k}) + r_{L_m} v_m + g_{L_m}(L_m)(v_m - p^m_{L_m,K^m_{L_m}}),$ (A.9)

where $r_{L_m}$ is the actual number of reject-penalization rounds and $g_{L_m}(L_m)$ is the actual number of exploitation ones in the last phase. Since $r_{L_m} \le r$ and $g_{L_m}(L_m) \le g(L_m)$, the right-hand side of Eq. (A.9) is upper bounded by the right-hand side of Eq. (A.7) with $l = L_m$, which is in turn upper bounded by the right-hand side of Eq. (A.8). Finally, one has

$\mathrm{Reg}_m(\mathcal{I}_m, \mathrm{div}_M(\langle A_1\rangle, \mathrm{sr}), v_m, \mathring b^m_{1:T}) = \sum_{l=0}^{L_m} R^m_l \le (r v_m + 4)(L_m + 1).$

Thus, one only needs to estimate the number of phases $L_m$ via the subhorizon $I_m$. So, for $2 \le I_m \le 2 + r + g(0)$, we have $L_m = 0$ or $1$ and thus $L_m + 1 \le 2 \le \log_2\log_2 I_m + 2$.
For $I_m \ge 2 + r + g(0)$, we have $I_m = \sum_{l=0}^{L_m - 1} (K^m_l + r + g(l)) + K^m_{L_m} + r_{L_m} + g_{L_m}(L_m) \ge g(L_m - 1)$ with $L_m > 0$. Hence, $g(L_m - 1) = 2^{2^{L_m - 1}} \le I_m$, which is equivalent to $L_m \le \log_2\log_2 I_m + 1$. Summarizing, we get the claimed upper bound of the lemma.

A.2.2 Proof of Lemma 3

Proof. Let $\bar m \in \overline{\mathcal{M}}$ be one of the bidders $\overline{\mathcal{M}} = \{m \in \mathcal{M} \mid v_m = v\}$ that have the maximal valuation $v$. Then, the stopping rule $\mathrm{sr}_{A_1}$ (which is based on the rule $\rho(m, \mathbf{l}, \mathbf{q}) := [\exists \hat m \in \mathcal{M}_{-m}: q_m + 2\epsilon_{l_m - 1} < q_{\hat m}]$, $\forall \mathbf{l} \in \mathbb{Z}_+^M$, $\forall \mathbf{q} \in \mathbb{R}_+^M$) is executed no later than the period $i_0$ where the upper bound $q^m_{l^m_{i_0}} + 2\epsilon_{l^m_{i_0} - 1}$ on the bidder $m$'s valuation becomes lower than the lower bound $q^{\bar m}_{l^{\bar m}_{i_0}}$ on the bidder $\bar m$'s valuation. Moreover, since $v_m \in [q^m_{l^m_j}, q^m_{l^m_j} + 2\epsilon_{l^m_j - 1}]$ and $v_{\bar m} \in [q^{\bar m}_{l^{\bar m}_j}, q^{\bar m}_{l^{\bar m}_j} + 2\epsilon_{l^{\bar m}_j - 1}]$ for any period $j$, the stopping rule is executed no later than the period $i$ where both the phase iteration parameter $\epsilon_{l^m_i}$ of the bidder $m$ and the phase iteration parameter $\epsilon_{l^{\bar m}_i}$ of the bidder $\bar m$ become smaller than one quarter of the difference between the valuations of these bidders, i.e., $\epsilon_{l^m_i}, \epsilon_{l^{\bar m}_i} < \frac{v - v_m}{4}$ (because, in this case, the segments $[q^m_{l^m_i}, q^m_{l^m_i} + 2\epsilon_{l^m_i - 1}]$ and $[q^{\bar m}_{l^{\bar m}_i}, q^{\bar m}_{l^{\bar m}_i} + 2\epsilon_{l^{\bar m}_i - 1}]$ do not intersect at all, which implies $q^m_{l^m_i} + 2\epsilon_{l^m_i - 1} < q^{\bar m}_{l^{\bar m}_i}$). Therefore, in the periods $i \le I_m$, it is not possible to have simultaneously $\epsilon_{l^m_i} < \frac{v - v_m}{4}$ and $\epsilon_{l^{\bar m}_i} < \frac{v - v_m}{4}$. So, in the period $i = I_m$, either $\epsilon_{l^m_{I_m}} \ge \frac{v - v_m}{4}$, or (not exclusively) $\epsilon_{l^{\bar m}_{I_m}} \ge \frac{v - v_m}{4}$ holds. In particular, from the definition of the phase iteration parameter $\epsilon_l = 2^{-2^l}$, we have: if $\epsilon_l \ge \delta$ for some $l \in \mathbb{Z}_+$ and $\delta \in (0, 1/2)$, then $\epsilon_l = 2^{-2^l} \ge \delta \Leftrightarrow -2^l \ge \log_2 \delta \Leftrightarrow 2^l \le \log_2 \frac{1}{\delta} \Leftrightarrow l \le \log_2\log_2 \frac{1}{\delta}$.
Hence, in the period $I^m$, the following holds: $l^m_{I^m} \le \log_2 \log_2 \frac{4}{\bar v - v^m}$ or (not exclusively) $l^{\bar m}_{I^m} \le \log_2 \log_2 \frac{4}{\bar v - v^m}$, and, thus,
$$\min\{l^m_{I^m},\, l^{\bar m}_{I^m}\} \le \log_2 \log_2 \frac{4}{\bar v - v^m}. \tag{A.10}$$
Finally, we bound $I^m$. Let $L_{m'; m} := l^{m'}_{I^m}$ be the phase of a buyer $m' \in \{m, \bar m\}$ in the period $I^m$. As in the proof of Lemma 2 (see Appendix A.2.1), we decompose $I^m$ into the numbers of exploration, reject-penalization, and exploitation rounds in each phase $l = 0, \dots, L_{m'; m}$ passed by the buyer $m'$. Namely,
$$I^m = \sum_{l=0}^{L_{m'; m} - 1} \big(K^{m'}_l + r + g(l)\big) + K^{m'}_{L_{m'; m}} + r^{m'}_{L_{m'; m}} + g^{m'}_{L_{m'; m}}, \tag{A.11}$$
where $r^{m'}_l$ and $g^{m'}_l$ are the numbers of penalization rounds and exploitation rounds, respectively, passed by the buyer $m'$ in the last phase $l = L_{m'; m}$ before reaching the period $I^m$. Let us trivially bound $r^{m'}_{L_{m'; m}} \le r$ and $g^{m'}_{L_{m'; m}} \le g(L_{m'; m})$. We also know that, for any $l \in \mathbb Z_+$, $K^{m'}_l \le 2 \cdot 2^{2^{l-1}}$ (see the proof of Lemma 2 in Appendix A.2.1). Therefore, Eq. (A.11) implies
$$I^m \le \sum_{l=0}^{L_{m'; m}} \big(2 \cdot 2^{2^{l-1}} + r + 2^{2^l}\big) \le \sum_{l=0}^{L_{m'; m}} \big(3 \cdot 2^{2^l} + r\big) \le (L_{m'; m} + 1)\, r + 2 \cdot 3 \cdot 2^{2^{L_{m'; m}}}. \tag{A.12}$$
Taking $m' = m$ and $m' = \bar m$, we get the following from Eq. (A.12):
$$I^m \le \big(\min\{l^m_{I^m}, l^{\bar m}_{I^m}\} + 1\big)\, r + 6 \cdot 2^{2^{\min\{l^m_{I^m},\, l^{\bar m}_{I^m}\}}} \le r \left(\log_2 \log_2 \frac{4}{\bar v - v^m} + 1\right) + 6 \cdot \frac{4}{\bar v - v^m}, \tag{A.13}$$
where we used the definition of $L_{m'; m} := l^{m'}_{I^m}$ and the upper bound on the phases $l^m_{I^m}$ and $l^{\bar m}_{I^m}$ from Eq. (A.10). So, Eq. (A.13) implies the claim of the lemma.

Footnote 8: Note that it is correct to consider $l^m_i$ in any period $i$ even though the buyer $m$ is not suspected in this period, i.e., $m \notin S_i$. This is because the algorithm stops changing the tracking node $n^m_i$ in the subalgorithm tree $T(\langle \mathcal A_1 \rangle)$ after the period $I^m$, while $l^m_i$ just remains the same in all subsequent periods, i.e., we formally set $l^m_i = l^m_{I^m}$ for all $i > I^m$.

A.2.3 Proof of Theorem 1

Proof.
From Lemma 1, we have:
$$\mathrm{SReg}(T, \mathcal A, \mathbf v, \boldsymbol\gamma, \boldsymbol\beta) = \sum_{m=1}^{M} \mathrm{Reg}^m(I^m, \mathcal A^m, v^m, \mathring b^m_{1:T}) + \sum_{m=1}^{M} I^m (\bar v - v^m). \tag{A.14}$$
From Lemma 2, if $I^m \ge 2$, one can upper bound the first term in the right-hand side of Eq. (A.14), since $\mathcal A^m = \langle \mathcal A_1 \rangle$:
$$\mathrm{Reg}^m(I^m, \mathcal A^m, v^m, \mathring b^m_{1:T}) \le (r v^m + 4)(\log_2 \log_2 I^m + 2) \le (r \bar v + 4)(\log_2 \log_2 T + 2), \tag{A.15}$$
where we bounded the subhorizon $I^m$ of each bidder $m \in \mathcal M$ by the time horizon $T$ (i.e., $I^m \le T$) and the valuation $v^m$ of each bidder $m \in \mathcal M$ by the maximal valuation (i.e., $v^m \le \bar v$). Note that the latter bound of Eq. (A.15) holds for $\mathrm{Reg}^m(I^m, \mathcal A^m, v^m, \mathring b^m_{1:T})$ in the case of $I^m = 1$ as well (this case is not covered by Lemma 2). From Lemma 3, one can upper bound the second term in the right-hand side of Eq. (A.14):
$$\sum_{m=1}^{M} I^m (\bar v - v^m) \le \sum_{\{m \in \mathcal M \mid v^m \ne \bar v\}} \frac{24 + 5r}{\bar v - v^m}\, (\bar v - v^m) \le (24 + 5r)(M - 1), \tag{A.16}$$
where we used that at least one bidder $m \in \mathcal M$ has $v^m = \bar v$ and, hence, $|\{m \in \mathcal M \mid v^m \ne \bar v\}| \le M - 1$. Thus, plugging Eq. (A.15) and Eq. (A.16) into Eq. (A.14), we obtain the claimed bound for the strategic regret of divPRRFES.

B The pseudo-codes

B.1 The pseudo-code of the div-transformation

Algorithm B.1 Pseudo-code of a div-transformation $\mathrm{div}_M(\mathcal A_1, \mathrm{sr})$ of an RPPA algorithm $\mathcal A_1 \in \mathcal A_{\mathrm{RPPA}}$.
1: Input: M ∈ ℕ, A₁ ∈ A_RPPA, sr : M × T(A₁)^M → bool
2: Initialize: t := 1, S := M, n[] := {e(T(A₁))}_{m=1}^M
3: while t ≤ T do
4:   for all m ∈ S do
5:     Set the price p(n[m]) as reserve to the buyer m
6:     Set the price p^bar as reserve to the buyers from M₋ₘ
7:     b[] ← get bids from the buyers M
8:     if b[m] ≥ p(n[m]) then
9:       Allocate the t-th good to the buyer m for the price p(n[m])
10:      n[m] := r(n[m])
11:    else
12:      n[m] := l(n[m])
13:    end if
14:    t := t + 1
15:    if t > T then
16:      break
17:    end if
18:  end for
19:  S_old := S
20:  for all m ∈ S_old do
21:    if sr(m, n[]) then
22:      S := S \ {m}
23:    end if
24:  end for
25: end while

B.2 The pseudo-code of divPRRFES

Algorithm B.2 Pseudo-code of the algorithm divPRRFES.
1: Input: M ∈ ℕ, r ∈ ℕ, and g : ℤ₊ → ℤ₊
2: Initialize: t := 1, S := M, q[] := {0}_{m=1}^M, l[] := {0}_{m=1}^M, x[] := {0}_{m=1}^M, state[] := {"explore"}_{m=1}^M
3: while t ≤ T do
4:   for all m ∈ S do
5:     if state[m] = "penalize" then
6:       p := 1   // a reinforced penalization round for the buyer m
7:       x[m] := x[m] − 1
8:     end if
9:     if state[m] = "explore" then
10:      p := q[m] + 2^(−2^l[m])   // an exploration round for the buyer m
11:    else if state[m] = "exploit" then
12:      p := q[m]   // an exploitation round for the buyer m
13:      x[m] := x[m] − 1
14:    end if
15:    Set the price p as reserve to the buyer m
16:    Set the price p^bar as reserve to the buyers from M₋ₘ
17:    b[] ← get bids from the buyers M
18:    if b[m] ≥ p then
19:      Allocate the t-th good to the buyer m for the price p
20:      q[m] := p
21:      if state[m] = "penalize" then
22:        x[m] := −1   // a reinforced penalization price is accepted; offer the price 1 to the buyer m in all his remaining rounds
23:      end if
24:    else
25:      if state[m] = "explore" then
26:        state[m] := "penalize"
27:        x[m] := r   // an exploration price is rejected; move the buyer m to penalization
28:      end if
29:    end if
30:    if state[m] = "penalize" and x[m] = 0 then
31:      state[m] := "exploit"
32:      x[m] := g(l[m])   // penalization rounds are ended; move the buyer m to exploitation
33:    end if
34:    if state[m] = "exploit" and x[m] = 0 then
35:      state[m] := "explore"
36:      l[m] := l[m] + 1   // exploitation rounds are ended; move the buyer m to the next phase
37:    end if
38:    t := t + 1
39:    if t > T then
40:      break
41:    end if
42:  end for
43:  S_old := S
44:  q_max := max_{m∈M}(q[m])
45:  for all m ∈ S_old do
46:    if q[m] + 2 · 2^(−2^(l[m]−1)) < q_max then
47:      S := S \ {m}   // remove the buyer m from the suspected ones if the stopping rule is satisfied
48:    end if
49:  end for
50: end while

C Summary on used notations

Note that we use several mnemonic notations:
• an upper index for a value of a particular buyer (e.g., v^m, a^m_t, p^m_t, etc.);
• boldface for a vector of values for all bidders (e.g., v, a_t, p_t, etc.);
• a bar (overline) for terms associated with the best value / winning (e.g., the winner m̄_t, the highest valuation v̄, etc.).
The full list of used notations is summarized in the following tables.

C.1 General notations

See Tables C.1, C.2, and C.3.

Table C.1: General notations: part I.
- E[·] — expectation
- I_B — the indicator: I_B = 1 when B holds, and 0 otherwise
- T — the (time) horizon, i.e., the number of rounds in the repeated game
- t — a round of the repeated game, t ∈ {1, …, T}
- v^m — the valuation of a buyer m
- v̄ = max_{m∈M} v^m — the highest valuation among the buyers
- max_{m∈M∖M̄} v^m — the maximal valuation among the non-highest valuations of the buyers (if it exists)
- m̄ — a buyer that has the highest valuation
- m̄_t = argmax_{m∈M_t} b^m_t — the winning bidder in a round t for a given play of the game (if it exists)
- b^m_t — the bid of a buyer m in a round t for a given play of the game
- p^m_t — the reserve price set to a buyer m in a round t for a given play of the game
- a^m_t = I_{b^m_t ≥ p^m_t} — the indicator of bidding higher than the reserve price by a buyer m in a round t for a given play of the game
- ā^m_t = I_{M_t ≠ ∅ & m = m̄_t} — the allocation outcome of a round t for a bidder m for a given play of the game
- ā_t = I_{M_t ≠ ∅} — the allocation outcome of a round t over all bidders for a given play of the game
- p̄^m_t = ā^m_t p̄_t — the payment outcome of a round t for a bidder m for a given play of the game
- p̄_t = max{p^{m̄_t}_t, max_{m∈M_t∖{m̄_t}} b^m_t} — the payment outcome of a round t over all bidders for a given play of the game
- x = {x^m}_{m=1}^M — the vector of buyer values of some notion x (e.g., valuations v, bids b_t, reserve prices p_t, payments p̄_t, allocations ā_t and a_t, etc.)
- x_{t₁:t₂} = {x_t}_{t=t₁}^{t₂} — the subseries of some time series {x_t}_{t=1}^T (e.g., bids b_{1:T}, reserve prices p_{1:T}, payments p̄_{1:T}, allocations ā_{1:T} and a_{1:T}, etc.)
- A_M — the set of pricing algorithms of the seller against M buyers
- A_RPPA ⊂ A_1 — the subclass of 1-buyer pricing algorithms for repeated posted-price auctions
- A — a pricing algorithm (generally, from the set A_M)
- M — the number of buyers in the repeated game
- M = {1, …, M} — the set of buyers (bidders)
- M̄ = {m ∈ M | v^m = v̄} — the set of buyers whose valuation is the highest one
- M₋ₘ = M ∖ {m} — the set of buyers (bidders) without the buyer m
- M_t = {m ∈ M | b^m_t ≥ p^m_t} — the set of actual buyers in a round t (they bid higher than the reserve prices)

Table C.2: General notations: part II.
- Reg(…) — the regret of a pricing algorithm
- SReg(…) — the strategic regret of a pricing algorithm
- Sur(…) — the expected surplus of a buyer (bidder)
- γ^m — the discount rate of a buyer m ∈ M
- γ = {γ^m}_{m=1}^M — the vector of the discount rates of the buyers
- h — a buyer history
- h^m_t = (b^m_{1:t−1}, p^m_{1:t}, a^m_{1:t−1}, p̄^m_{1:t−1}) — the history available to a buyer m in a round t for a given play of the game
- σ ∈ S_T — a buyer strategy
- β^m ∈ S_T^{M−1} — the beliefs of a buyer m about the strategies of the other bidders
- β = {β^m}_{m=1}^M — the beliefs of all buyers
- H_t — the set of all possible histories in a round t
- H_{t₁:t₂} = ⊔_{t=t₁}^{t₂} H_t — the disjoint union of the sets of histories in the rounds t₁, …, t₂
- S_T — the set of all possible buyer strategies
- σ̊^m — an optimal strategy of a buyer m
- b̊^m_t — the optimal bid of a buyer m in a round t for a given play of the game
- b̊_t = {b̊^m_t}_{m=1}^M — the optimal bids of all buyers in a round t for a given play of the game
- b̊_{1:T} — the optimal bids of all buyers in all rounds for a given play of the game

Table C.3: General notations: part III (related to RPPA algorithms).
- T(A₁) — the complete binary tree associated with an RPPA algorithm A₁
- n or m — a node of the complete binary tree T(A₁) of an RPPA algorithm A₁
- r(n) — the right child of a node n
- l(n) — the left child of a node n
- R(n) — the right subtree of a node n (its root is r(n))
- L(n) — the left subtree of a node n (its root is l(n))
- e(T) — the root of a tree T
- p(n) — the price in a node n (that is offered to a buyer when an algorithm reaches this node)
- T₁ ≅ T₂ — the trees T₁ and T₂ are price-equivalent
- δ^l_n = p(n) − inf_{m∈L(n)} p(m) — the left increment of a node n

C.2 Notations related to dividing algorithms

See Table C.4.

C.3 Notations related to divPRRFES

See Table C.5.

Table C.4: Notations related to dividing algorithms.
- i — a period of a dividing algorithm (do not confuse with (1) a round of the game and (2) a phase of the PRRFES algorithm!)
- t^m_i — the round in a period i in which the bidder m is not eliminated by a barrage price (i.e., m is a non-eliminated participant) of a dividing algorithm for a given play of the game
- p^{m,bar} or p^bar — a barrage reserve price
- S_i — the set of bidders suspected by a dividing algorithm in a period i for a given play of the game
- T_i — the rounds of a period i for a given play of the game
- 𝓘^m = {t^m_i}_{i=1}^{I^m} — the rounds in which the bidder m is not eliminated by a barrage price (i.e., m is a non-eliminated participant) of a dividing algorithm for a given play of the game
- I^m = |𝓘^m| — the subhorizon of a buyer m (the number of periods in which he is suspected, i.e., m ∈ S_i) for a given play of the game
- A^m — the subalgorithm of a dividing algorithm that acts against a buyer m
- Reg^m(…) — the regret of the subalgorithm of a dividing algorithm that acts against a buyer m
- div_M(…) — a div-transformation of a 1-buyer pricing algorithm to the case of M buyers
- SReg_ind(…) — the individual strategic regret of a dividing algorithm
- SReg_dev(…) — the deviation strategic regret of a dividing algorithm
- sr — a stopping rule used in a div_M-transformation of a 1-buyer pricing algorithm
- ⟨A⟩ — a transformation of an RPPA algorithm A such that all penalization sequences of nodes are replaced by reinforced penalization ones
- n^m_i — the tracking node of a buyer m by a div_M-transformed RPPA algorithm in a period i for a given play of the game

Table C.5: Notations related to divPRRFES.
- r — the number of penalization rounds (a parameter of PRRFES)
- g(l) — the exploitation rate (a parameter of PRRFES)
- l — a phase of PRRFES
- ε_l = 2^(−2^l) — the iteration parameter of a phase l
- q^m_l — the last price accepted by a buyer m before a phase l for a given play of the game
- p^m_{l,k} — the k-th exploration price of a buyer m in a phase l for a given play of the game
- K^m_l — the index of the last accepted exploration price of a buyer m in a phase l for a given play of the game
- l^m_i — the current phase of a buyer m in a period i for a given play of the game
- l(n) — the phase of a node n from the tree of the algorithm PRRFES
- q(n) — the last accepted price before the current phase of a node n from the tree of the algorithm PRRFES

D Discussion & extensions of the result

Improvements of divPRRFES. For practical use, there are several places where divPRRFES can be improved. For instance: (a) the penalization parameter r can be made adaptive to take into account the rounds in which a buyer is eliminated (i.e., to reduce the number of penalizations by the number of rivals currently suspected by the seller); or (b) the stopping rule sr_{A₁} can eliminate bidders faster, since the lower bound u^m_i can be updated each time the buyer m accepts an exploration price p^m_{l,k}. Although these improvements would require some additional pages in our proofs, they do not improve the asymptotic bound of O(log log T).

Horizon independence. The algorithm divPRRFES is horizon-independent since it is based on the horizon-independent PRRFES A₁, which induces the subalgorithm ⟨A₁⟩ and the stopping rule sr_{A₁}. Hence, the seller is not required to know in advance the number of rounds T of the game when she applies divPRRFES.
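To make the mechanics of Algorithm B.2 concrete, the following is a minimal, self-contained Python sketch of the divPRRFES round loop run against simplistic bidders that accept any reserve not exceeding their fixed valuation. This is a simulation toy, not the strategic setting analyzed above: the truthful accept/reject model, the parameter choices, and all function names are our assumptions, and barrage rounds to eliminated buyers are omitted (a truthful bidder with valuation below 1 never accepts the barrage price anyway).

```python
# Toy simulation of the divPRRFES loop (Algorithm B.2) against non-strategic
# bidders that accept any reserve price not exceeding their fixed valuation.
# Assumptions (ours, not from the paper): truthful accept/reject behavior,
# r = 2 penalization rounds, exploitation rate g(l) = 2^(2^l).

def div_prrfes(valuations, T, r=2, g=lambda l: 2 ** (2 ** l)):
    M = len(valuations)
    S = set(range(M))          # suspected buyers
    q = [0.0] * M              # last accepted price per buyer
    lev = [0] * M              # current phase per buyer
    x = [0] * M                # remaining penalize/exploit rounds per buyer
    state = ["explore"] * M
    revenue, t = 0.0, 1
    while t <= T and S:
        for m in sorted(S):
            if state[m] == "penalize":
                p = 1.0                              # reinforced penalization round
                x[m] -= 1
            elif state[m] == "explore":
                p = q[m] + 2.0 ** -(2 ** lev[m])     # exploration price q + eps_l
            else:                                    # "exploit"
                p = q[m]
                x[m] -= 1
            if valuations[m] >= p:                   # truthful buyer accepts
                revenue += p
                q[m] = p
                if state[m] == "penalize":
                    x[m] = -1                        # price 1 accepted: keep it forever
            elif state[m] == "explore":
                state[m], x[m] = "penalize", r       # exploration price rejected
            if state[m] == "penalize" and x[m] == 0:
                state[m], x[m] = "exploit", g(lev[m])
            if state[m] == "exploit" and x[m] == 0:
                state[m], lev[m] = "explore", lev[m] + 1
            t += 1
            if t > T:
                break
        # stopping rule: drop buyers whose valuation upper bound q + 2*eps_{l-1}
        # is already below the best lower bound q_max
        q_max = max(q)
        S = {m for m in S if not (q[m] + 2.0 * 2.0 ** -(2 ** (lev[m] - 1)) < q_max)}
    return revenue, S, q

rev, suspected, q = div_prrfes([0.3, 0.8, 0.55], T=2000)
```

In this run, the two low-valuation buyers are eliminated once their upper bounds fall below the leader's last accepted price, and the tracked price of the remaining buyer converges toward his valuation 0.8 from below.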
