Stochastic Optimal Control for Online Seller under Reputational Mechanisms

Milan Bradonjić∗, Matthew Causley†, Albert Cohen‡

October 30, 2018

Abstract

In this work we propose and analyze a model that addresses the pulsing behavior of sellers in an online auction (store). This pulsing behavior is observed when sellers switch between advertising and processing states. We assert that a seller switches her state in order to maximize her profit, and further that this switch can be identified through the seller's reputation. We show that for each seller there is an optimal reputation, i.e., the reputation at which the seller should switch her state in order to maximize her total profit. We design a stochastic behavioral model for an online seller, which incorporates the dynamics of resource allocation and reputation. The model builds on the stochastic advertising model of [16], which was used effectively in the stochastic optimal control of advertising [12]. This model of reputation is combined with the effect of online reputation on sales price empirically verified in [9]. We derive the Hamilton-Jacobi-Bellman (HJB) differential equation, whose solution relates optimal wealth level to a seller's reputation. We formulate both a full model and a reduced model with fewer parameters, both of which have the same qualitative description of the optimal seller behavior. Coincidentally, the reduced model has a closed form analytical solution that we construct.

Keywords: Stochastic optimal control models; Online stores; Hamilton-Jacobi-Bellman equation.

1 Introduction

In this work, we consider sellers in an online market like Amazon without a bidding mechanism. The setting may include concurrent online auctions on a site like eBay, for a product or service with only a “buy-it-now” option, under the assumption that buyers can compare reputation among online platforms.
In this online environment, the buyer and seller have a certain amount of anonymity, which has implications for the fairness of the auction [2]. One of the few countermeasures available to ensure fairness is the use of feedback forums at online auction sites such as Amazon, eBay, and Yahoo!. The feedback provided by customers gives a ranking system for the sellers, which in turn rewards fair sellers and penalizes unfair or unreliable sellers [11, 15, 14]. Since it is in the sellers' best interest to maximize their profits, it follows that a seller will seek to use reputation as a means to optimize their decision making [7, 3]; e.g., for the optimal bidding strategies in sequential auctions, see [1].

∗ Milan Bradonjić is with Mathematics of Networks and Systems, Bell Labs, Alcatel-Lucent, 600 Mountain Avenue, Murray Hill, NJ 07974, USA, e-mail: milan@research.bell-labs.com.
† Matthew Causley is with the Department of Mathematics, Kettering University, Flint, MI 48504, USA, e-mail: mcausley@kettering.edu.
‡ Albert Cohen is with the Department of Statistics and Probability, and Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA, e-mail: albert@math.msu.edu.

Online auction sellers can be broadly categorized by two types of behavior: those focused on advertising and customer service, which includes following up with past customers; and those who instead focus only on processing existing orders. While the latter behavior will likely lower the seller's reputation on average, the immediate increase in wealth may offset the long-term damage from a lower reputation. With this in mind, it is compelling to ask whether there exists an optimal long-term strategy, in which a seller attempts to maximize her profit by achieving a fixed reputation. We propose a simple behavioral model to study the relationship between a seller's wealth and reputation in the auctions environment with only the “buy-it-now” option.
Following the standard Nerlove-Arrow construction [10], which was extended to a stochastic setting by Sethi [16], the wealth and growth of the seller are described by continuous stochastic processes. Note that our model is a continuous approximation of a discrete auctioning process. The discrete model of the process may be less tractable but is still very interesting and useful. We propose an optimal infinite horizon strategy, and incorporate the wealth-growth model of Mink and Seifert [9], based on empirical studies of eBay sellers. We also use the model for reputation proposed by Sethi [16] and Rao [13], which has been adopted by many others in optimal advertising models, such as Raman's recent work on boundary value problems in the stochastic optimal control of advertising [12].

This work incorporates the idea that humans do not multitask well, but rather switch; see [8]. In other words, switching costs do not factor in for an individual agent, but only for large online retailers who have marketing department reorganization costs. Additionally, the model in this work allows all sellers to have a little extra capacity or money for either shipping or advertising (e.g., writing a letter or a similar individual campaign), which is rather small and not unlimited.

The rest of the paper is laid out as follows. In Section 2, we define the stochastic Nerlove-Arrow model, along with the growth rate due to Mink and Seifert. Using the principle of optimality, we derive the corresponding Hamilton-Jacobi-Bellman (HJB) boundary value problem satisfied by the value function. In Section 3, we propose several important modifications to the existing methodology, and in doing so derive a new model that is normalized with respect to the seller reputation and has a linear growth rate. In making these modifications, we also find a reduced version of our new model, which is governed by a geometric Brownian motion.
We prove that both models admit unique piecewise continuous solutions, and are qualitatively similar. However, the reduced model, surprisingly, has a closed form piecewise defined solution. This allows us to prove that pulsing behavior is an optimal strategy for sellers, and it also serves as a benchmark for our numerical solution of the full model in Section 4, which does not have a closed form analytic solution. Our results are compelling, and warrant further investigation for the finite horizon case, which we discuss with concluding remarks in Section 5.

2 Explicit Resource Allocation Mechanism

Consider a seller in an online marketplace as a triple $(W, R, \mu)$ representing her wealth, reputation, and excess rate, respectively. Here, the excess rate is extra effort that can be allocated either to expedite payment or to increase her reputation. In this setting, the reputation $R$ is a positive number reflecting customer satisfaction. As an initial approach, we consider only the effects of reputation on the optimal way to grow her wealth, and leave the more general model with shipping costs for future work. Also, in our mathematical formulation, we consider the reputation mechanism first suggested by the work of Nerlove and Arrow [10] and later generalized to stochastic settings in [16, 13, 12].

The seller is one of many competing in the online marketplace, and has her transactions verified and processed automatically via the marketplace. All sellers competing in this marketplace have an expected speed for mailing out completed orders to be able to sell in the marketplace, with some time allowed for delayed shipping before the marketplace intervenes. In return, the marketplace guarantees the transaction. Payment is modeled to be released to the seller once the purchased item is mailed to the buyer. However, the seller can choose to go beyond these expected standards.
For example, she may prepare packages and drive to the post office earlier than expected to expedite the payment from the marketplace. Or she may spend time and money communicating with buyers in a bid to increase their favorable ratings. This may include sending a small unexpected gift in the package to be mailed out, or spending extra time to individually craft and send emails to purchasers. Our model assumes that at most one of these extra actions can happen with noticeable return at any given time. The seller may also decide to do nothing extra, and in this case $\mu = 0$. For example, if she is focused on getting her payment released, then she is not able to focus on creative ways to engage her buyers post sale. We also assume that her extra capacity to do either extra action is limited by absolute minimum mailing rates and by the pool of resources she is willing to contribute to going beyond expected standards.

By choosing to shift her resources from promotion to expedited mailing and back, the seller can influence her wealth and reputation levels via the excess rate $\mu$. Positive $\mu$ corresponds to a promotional state, while negative $\mu$ corresponds to an expedited mailing state. We note that if our seller is in fact a large company with many such agents who can work on either processing or sales, then the company can choose to shift the ratio of workers who work in processing or sales; this model is the subject of future work. The work we present here is limited to a one-agent seller.

We note here that our model also assumes no cost for the individual to switch behavior from promotion to expedited mailing. In a large company, one would expect such costs to arise in restricting the size of advertising and shipping departments. One related paper that addresses such switching costs, albeit in a real options investment setting, is the paper by Duckworth et al. [4].
We utilize a continuous time formulation for wealth and reputation, as well as for the control of these processes. The seller is said to behave optimally if she maximizes her expected present value of total earnings until her reputational value reaches $R = 0$. This produces a fully consistent formulation of both the value function $V$ and the strategy to achieve this optimal value.

2.1 Mathematical Formulation

Following the framework in [5, 6], reputation $R$ is observed in $O := (0, \infty)$ and the control $\mu$ is constrained to $U := [-\epsilon, \epsilon]$. Since this process will be stochastic, we introduce a probability space $(\Omega, \mathcal{F}, P)$ and corresponding reference probability systems $\nu = ((\Omega, \{\mathcal{F}_s\}, P), B)$, where $\mathcal{F}_s \subset \mathcal{F}$ and $B$ is an $\mathcal{F}_s$-adapted Brownian motion. The evolution of wealth and reputation is hence modeled by a controlled two-state Markov process $(W^{\mu}_{\cdot}, R^{\mu}_{\cdot})$, where wealth $W$ grows at a rate proportional to a function $h(R)$ that links reputation to revenue per sale, subject to a controlled extra processing rate $\mu$ per unit time chosen from the admissible class $\mathcal{A}_\nu$,

$$\mathcal{A}_\nu := \Big\{ \mu \in \mathcal{F}_s \;\Big|\; \mu \text{ is } \mathcal{F}_s\text{-p.m.},\ \mu_{\cdot} \in U \text{ on } [0, \infty),\ E\Big[ \int_0^{\tau_O} e^{-\rho s} \big| (1 - \mu_s)\, h(R^{(\mu)}_s) \big| \, ds \Big] < \infty \Big\}. \tag{1}$$

Here, p.m. denotes progressively measurable, $\rho$ is the constant discount factor that accounts for the time value of money, and $\tau_O$ is the first exit time of reputation $R$ from $O$, or $\infty$ if $R_s \in O$ for all $s \ge 0$. We set $\infty$ as an absorbing state for reputation. Symbolically,

$$\tau_O := \inf\big\{ t \ge 0 \;\big|\; R^{(\mu)}_t \notin O \big\}. \tag{2}$$

Formally, our two-state controlled Markov process is

$$dW^{(\mu)}_t = (1 - \mu_t)\, h(R^{(\mu)}_t)\, dt,$$
$$dR^{(\mu)}_t = (\mu_t - \kappa R^{(\mu)}_t)\, dt + \sigma\, dB_t,$$

where $\kappa$ is a proportionality constant that accounts for mean reversion [10] and $\sigma$ is the constant volatility accounting for random effects on reputation. The major difficulty is in defining the growth rate $h(R)$.
However, an explicit mechanism has been proposed by Mink and Seifert [9]. There the authors not only propose, but also empirically justify, a growth rate of

$$h(R) = A + C\Big(1 - \frac{1}{\ln(e + R)}\Big) = A\Big(1 + \frac{C}{A}\Big(1 - \frac{1}{\ln(e + R)}\Big)\Big), \tag{3}$$

where $A$ relates to the inherent value of the object for sale and $C$ is a parameter to be fitted. This is accomplished by obtaining data using an auction robot and then computing a single regression, which gives $C = 2.50$ in Equation (3). To the best of our knowledge, the model in [9] is among the first to give an explicit relationship between reputation and price. We also note that the parameter $C/A$ represents the maximal reputational effect of a seller as a fraction of the inherent value $A$, and is what we choose in the numerical examples below. The model also suggests a multiple regression formula where other factors, such as shipping costs and whether a “buy-it-now” price is offered, are considered as well, in which case $C = 1.93$. The authors in [9] also comment that highly experienced sellers have higher feedback scores and design the auction more favorably, which is reflected in their higher revenue. In fact, they show that the coefficient attributed to shipping costs is larger than one, implying that customers put a high value on shipping when deciding on their bids, and that savvy agents take this into consideration.

Notice that since $h$ is bounded on $O$, our admissible class reduces to

$$\mathcal{A}_\nu := \big\{ \mu \in \mathcal{F}_s \;\big|\; \mu \text{ is } \mathcal{F}_s\text{-p.m.},\ \mu_{\cdot} \in U \text{ on } [0, \infty) \big\}. \tag{4}$$

Finally, the work in [9] posits that the horizon does not affect the revenue stream as much as the shipping cost and reputation factors, and so we consider an infinite horizon model here.

2.2 Hamilton-Jacobi-Bellman Formulation

With the stochastic dynamics for reputation proposed in [16], a growth rate model for the reputational effect on sales [9], and an infinite horizon, we expect that switching would depend only on the current reputational state.
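The growth rate in Equation (3), which enters every equation of this section, can be checked numerically with a short sketch. The function names `growth_rate` and `growth_rate_alt` and the default values $A = 1$, $C = 0.15$ are illustrative assumptions (the defaults mirror the numerical section), not part of [9]. The second function uses the algebraically equivalent form $A + C\,x/(1+x)$ with $x = \ln(1 + R/e)$, which follows from $\ln(e + R) = 1 + \ln(1 + R/e)$.

```python
import math


def growth_rate(R, A=1.0, C=0.15):
    """Growth rate h(R) = A + C * (1 - 1/ln(e + R)) as in Equation (3).

    A and C are illustrative defaults, not fitted values.
    """
    return A + C * (1.0 - 1.0 / math.log(math.e + R))


def growth_rate_alt(R, A=1.0, C=0.15):
    """Equivalent form A + C * x / (1 + x) with x = ln(1 + R/e)."""
    x = math.log1p(R / math.e)
    return A + C * x / (1.0 + x)


if __name__ == "__main__":
    for R in (0.0, 1.0, 10.0, 1000.0):
        # h starts at A for R = 0 and increases toward A + C as R grows.
        print(R, growth_rate(R), growth_rate_alt(R))
```

Under these assumptions $h(0) = A$, $h$ is increasing in $R$, and $h(R) < A + C$ for every finite $R$, which is the boundedness used to simplify the admissible class.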
We now seek a twice-continuously differentiable, polynomially growing function $V \in C^2[0, \infty) \cap C_p[0, \infty)$ which is a candidate solution of the optimal control problem

$$\begin{aligned}
\bar{V}(R) &= \sup_{\nu} \sup_{\mu \in \mathcal{A}_\nu} E\Big[ \int_0^{\tau_0} e^{-\rho s} (1 - \mu_s)\, h(R^{(\mu)}_s)\, ds \;\Big|\; R_0 = R \Big] \\
&= \sup_{\nu} \sup_{\mu \in \mathcal{A}_\nu} E\Big[ \int_0^{\tau_O} e^{-\rho s} (1 - \mu_s)\, h(R^{(\mu)}_s)\, ds + \mathbf{1}_{\{\tau_O < \infty\}}\, e^{-\rho \tau_O}\, \frac{1 + \epsilon}{\rho}\, h(\infty)\, \mathbf{1}_{\{R^{(\mu)}_{\tau_O} = \infty\}} \;\Big|\; R_0 = R \Big], \\
h(R) &= A + C\Big(1 - \frac{1}{\ln(e + R)}\Big), \\
\tau_0 &:= \inf\{ t \ge 0 \mid R_t = 0 \}.
\end{aligned} \tag{5}$$

One approach to finding $\bar{V}$ would be to solve Equation (5) directly. For example, by the definition of $\bar{V}$, it follows that

$$\begin{aligned}
\bar{V}(0) &= 0, \\
\bar{V}(\infty) &= \sup_{\nu} \sup_{\mu \in \mathcal{A}_\nu} E\Big[ \int_0^{\infty} e^{-\rho s} (1 - \mu_s)\, h(R^{(\mu)}_s)\, ds \;\Big|\; R_0 = \infty \Big] = \frac{1 + \epsilon}{\rho}\, h(\infty).
\end{aligned} \tag{6}$$

However, we shall instead apply the principle of optimality, and in doing so arrive at the following nonlinear Hamilton-Jacobi-Bellman (HJB) boundary value problem (suppressing the explicit dependence of $R$ on $\mu$):

$$\begin{aligned}
0 &= \max_{-\epsilon \le \mu \le \epsilon} \Big[ (\mu - \kappa R)\, \frac{\partial V}{\partial R} + \frac{\sigma^2}{2}\, \frac{\partial^2 V}{\partial R^2} + (1 - \mu)\, h(R) - \rho V \Big], \\
V(0) &= 0, \qquad V(\infty) = \frac{1 + \epsilon}{\rho}\, h(\infty), \\
h(R) &= A + C\Big(1 - \frac{1}{\ln(e + R)}\Big) = A + C\, \frac{\ln\big(1 + \frac{R}{e}\big)}{1 + \ln\big(1 + \frac{R}{e}\big)},
\end{aligned} \tag{7}$$

which simplifies to

$$\begin{aligned}
\rho V &= h(R) - \kappa R\, \frac{\partial V}{\partial R} + \frac{\sigma^2}{2}\, \frac{\partial^2 V}{\partial R^2} + \epsilon\, \Big| \frac{\partial V}{\partial R} - h(R) \Big|, \\
V(0) &= 0, \qquad V(\infty) = \frac{1 + \epsilon}{\rho}\, (A + C), \\
h(R) &= A + C\, \frac{\ln\big(1 + \frac{R}{e}\big)}{1 + \ln\big(1 + \frac{R}{e}\big)}.
\end{aligned} \tag{8}$$

We therefore instead solve the HJB problem, which is justified by the following standard theorem (Corollary IV.5.1 in [5]):

Theorem 1. Let $V \in C^2(O) \cap C_b(\bar{O})$ be a twice-continuously differentiable and bounded solution to an associated HJB equation of a control problem

$$V_{PM}(x) := \inf_{\nu} \inf_{\mathcal{A}_\nu} E\Big[ \int_0^{\tau_O} e^{-\rho s}\, G(R^{\mu}_s, \mu_s)\, ds \Big] \tag{9}$$

for a process $R$ whose SDE has drift and volatility that are Lipschitz in $\mu$ and $R$, and a $G(R, \mu)$ that is continuous and polynomially growing for all $R \in O$ and continuous for all $\mu \in U$.
Moreover, assume either that $\rho > 0$ or that $\tau_O < \infty$ with probability 1 for every admissible progressively measurable control process $\mu$. Then $V(x) = V_{PM}(x)$ for an optimal control $u^*(s) \in \operatorname{argmin}\,[\mathcal{L}^R[V] + G(R, u)]$, where $\mathcal{L}$ is the generator of the process $R$.

We now have our verification theorem:

Theorem 2. Let $V \in C^2(O) \cap C_b(\bar{O})$ be a twice-continuously differentiable and bounded solution to the boundary value problem Equation (8). Then $V$ is an optimal solution to the optimal control problem Equation (5).

Proof. By Corollary IV.5.1 in [5], since (i) the drift and volatility are (uniformly) Lipschitz in $R$ and $\mu$, (ii) $\rho > 0$, and (iii) we impose an additional boundary condition on the solution as $R \to \infty$ as our indefinite horizon is now $[0, \tau_O) = [0, \tau_0) = [0, \infty)$, it follows directly that our bounded classical solution $V = \bar{V}$ and the control

$$\mu^*(R) \in \operatorname*{argmax}_{\mu \in [-\epsilon, \epsilon]} \Big[ (\mu - \kappa R)\, \frac{\partial V}{\partial R} + \frac{\sigma^2}{2}\, \frac{\partial^2 V}{\partial R^2} + (1 - \mu)\, h(R) - \rho V \Big] \tag{10}$$

leads to an optimal, stationary Markov control $\mu^*(R^{(\mu^*)}_s)$.

Remark 3. We note here that it is sufficient to require polynomial growth on $V$, not bounded growth. However, as $h$ is bounded above by $A + C$ on $O$, we can restrict our attention to bounded $V$. Furthermore, an upper bound (and in fact the limit as $R \to \infty$) for $V$ is the value obtained by a seller earning the maximum value $A + C$ on her items on the time interval $[0, \infty)$ while always processing orders.

3 Market Share Based Pricing Mechanism

We observe that under its current definition, the reputation $R \in O$. In this section, we employ the mapping

$$Y := f(R) = \frac{R}{1 + R}, \tag{11}$$

which results in a normalized market share reputation $Y \in Q := (0, 1)$. This is consistent with prior assumptions [9] that the value function $h(R)$ be concave and bounded. We therefore also simplify (3) accordingly, replacing logarithms with a rational function

$$\bar{h}(R) = A + C\, \frac{R}{1 + R} = A\Big(1 + \frac{C}{A}\, \frac{R}{1 + R}\Big).$$
(12)

This modification, when combined with the mapping from a reputational score to a market share score, $R \to Y$, produces a more intuitive growth rate $\tilde{h}(Y)$, which grows linearly with respect to market share reputation. Recall that $\infty$ is an absorbing state for $R$; consequently, we have $dY_t \equiv 0$ if $Y_t \ge 1$. For $Y_t < 1$, it follows from Itô calculus that $dY_t = f'(R)\, dR_t + \frac{1}{2} f''(R)\, dR_t\, dR_t$, and so

$$\begin{aligned}
dY_t &= (1 - Y)^2 \Big[ \Big( \mu - \kappa\, \frac{Y}{1 - Y} \Big) dt + \sigma\, dB_t \Big] + \frac{1}{2} \big( -2 (1 - Y)^3 \big)\, \sigma^2\, dt \\
&= \big[ \mu (1 - Y)^2 - \kappa Y (1 - Y) - \sigma^2 (1 - Y)^3 \big]\, dt + \sigma (1 - Y)^2\, dB_t, \\
\tilde{h}(Y) &= A + C Y, \\
\tilde{V}(y) &= \sup_{\nu} \sup_{\mu \in \mathcal{A}_\nu} E\Big[ \int_0^{\tau_0} e^{-\rho s} (1 - \mu_s)\, \tilde{h}(Y_s)\, ds \;\Big|\; Y_0 = y \Big].
\end{aligned} \tag{13}$$

Note that a function that is bounded for market share $Y \in (0, 1)$ is also correspondingly bounded for all reputation $R \in (0, \infty)$. We therefore study the market share model below, confident that our results will hold for the full reputation model by virtue of the inverse map of Equation (11).

3.1 Rescaled HJB Model

After market share rescaling, the HJB equation takes a form which is in fact degenerate at both endpoints $y = 0, 1$:

$$\begin{aligned}
-\frac{\sigma^2}{2} (1 - y)^4\, V''(y) + \rho V &= \tilde{h}(y) - \big[ \kappa y (1 - y) + \sigma^2 (1 - y)^3 \big]\, V'(y) + \epsilon\, \big| (1 - y)^2\, V'(y) - \tilde{h}(y) \big|, \\
V(0) &= 0, \qquad V(1) = \frac{1 + \epsilon}{\rho}\, \tilde{h}(1).
\end{aligned} \tag{14}$$

In theory, a closed form solution for this boundary value problem can be formally constructed piecewise, to the left and right of the special point $y^*$ for which $\tilde{h}(y^*) = (1 - y^*)^2\, V'(y^*)$. We shall refer to $y^*$ as the switching point below, since it is the reputational value at which the seller switches from advertising to processing. In practice, we shall resort to numerical computation to approximate this solution, and in doing so estimate the value $y^*$.

Theorem 4. If $V^{**} \in C^2(Q)$ is a monotonically increasing solution to the HJB problem Equation (14), then $V^{**} = \tilde{V}$.

Proof.
Enforcing the boundary condition on $V^{**}(1)$ enforces bounded growth on our monotonically increasing $V^{**}$. By the inverse map of Equation (11), the result follows as a corollary to Theorem 2. Note that $Q = (0, 1)$ is open, but $y = 1$ is not reached in finite time from anywhere in $Q$, and the diffusion $Y$ is in fact absorbed for all time in state $Y = 1$. Hence the exact value of $V^{**}(1)$ is imposed.

3.2 Reduced Model

With the previous mapping, note that the market (reputation) share $Y$ is absorbed at $Y = 1$, which corresponds to $R \to \infty$. Moreover, as in the stochastic Nerlove evolution, we have that the probability that $Y_t < 0$ for some positive $t$ is greater than 0, and the drift term is a third-order polynomial in $Y$. Based on these observations, we propose a reduced model for Stochastic Reputation Share, expressed by the following stochastic differential equation and associated stochastic optimal control problem:

$$\begin{aligned}
\check{V}(y) &= \sup_{\nu} \sup_{\mu \in \mathcal{A}_\nu} E\Big[ \int_0^{\tau_0} e^{-\rho s} (1 - \mu_s)(A + C Y_s)\, ds \;\Big|\; Y_0 = y \Big], \\
\frac{dY_t}{1 - Y_t} &= \mu\, dt + \sigma\, dB_t.
\end{aligned} \tag{15}$$

This leads to a corresponding nonlinear HJB boundary value problem,

$$\begin{aligned}
0 &= \max_{-\epsilon \le \mu \le \epsilon} \Big[ \mu (1 - y)\, V'(y) + \frac{\sigma^2}{2} (1 - y)^2\, V''(y) + (1 - \mu)(A + C y) - \rho V \Big], \\
V(0) &= 0, \\
V(1) &= \sup_{\nu} \sup_{\mu \in \mathcal{A}_\nu} E\Big[ \int_0^{\infty} e^{-\rho s} (1 - \mu_s)\, \tilde{h}(1)\, ds \Big] = \rho^{-1} (1 + \epsilon)\, \tilde{h}(1),
\end{aligned} \tag{16}$$

which can be recast as

$$\begin{aligned}
-\frac{\sigma^2}{2} (1 - y)^2\, V''(y) + \rho V &= \tilde{h}(y) + \epsilon\, \big| (1 - y)\, V'(y) - \tilde{h}(y) \big|, \\
V(0) &= 0, \qquad V(1) = \frac{1 + \epsilon}{\rho}\, \tilde{h}(1).
\end{aligned} \tag{17}$$

One of the most attractive features of the reduced model (17) is that it has a closed form, piecewise defined analytic solution, which we derive in the Appendix. Here we state the following theorem.

Theorem 5. There exists a solution $V^* \in C^2(Q) \cap C_b(\bar{Q})$ to the reduced model (17).
Furthermore, $V^* = \check{V}$, and is given by the piecewise solution

$$V^*(y) = \begin{cases} V_\ell(y), & 0 \le y \le y^*, \\ V_r(y), & y^* \le y \le 1, \end{cases} \tag{18}$$

where

$$V_\ell = c_1 (1 - y)^{\gamma_{\ell-}} + c_2 (1 - y)^{\gamma_{\ell+}} + \alpha_\ell + \beta_\ell\, y, \tag{19}$$
$$V_r = c_3 (1 - y)^{\gamma_{r+}} + \alpha_r + \beta_r\, y, \tag{20}$$

with constants defined in the Appendix.

Proof. The construction of this piecewise solution is found in the Appendix. The solution constructed is monotonically increasing in $y$. Enforcing the boundary condition on $V^*(1)$ uniformly bounds $V^*$ on $Q$. The proof of equality $V^* = \check{V}$ then follows from Corollary IV.5.1 in [5], as in our Theorem 2 above.

Remark 6. Since the reduced model has a known closed form solution, we can measure the error made in constructing numerical solutions in Section 4, which in turn acts as a benchmark for the code we use to study the full model. It is uncommon to find a closed form solution for most optimal control problems.

4 Numerical Results

In this section, we construct numerical solutions of the market share scaled model Equation (14), as well as of our proposed reduced model Equation (17). In Appendices A and A.1, the piecewise analytic solution and a nonlinear equation for the reduced switching point $y^*$ are found. These will be used to validate the numerical results for the reduced model, which act as a benchmark for the full problem. Please note that in what follows, the symbol $y^*$ is used to denote the switching point in reputational share for both the reduced and full models and their corresponding ODEs.

4.1 Numerical Results for the Reduced Model

The boundary value problem Equation (17) is discretized using a standard finite difference scheme. Let $y_k = k / N_y$ and $V_k = V(y_k)$, for $k = 0, 1, \ldots, N_y$. Then we solve a linear system of $N_y + 1$ equations of the form $M v = f$, where

$$M v \approx \rho V - \frac{\sigma^2}{2} (1 - y)^2\, V'', \qquad f \approx h + \epsilon\, \big| (1 - y)\, V' - h \big|.$$
Since $f$ depends on $V$, the solution is implicit, and therefore must be obtained by using a fixed point iteration $M v^{(k+1)} = f^{(k)}$. In our numerical experiments, we let $N_y = 1000$, and set the maximum iteration count at $K = 20$. Convergence is observed in all tested cases.

In Figure 1, several numerical solutions $V(y)$ are shown, for fixed $A = 1$, $C = 0.15$, $\epsilon = 0.02$, and various values of $\rho = 0.1, 0.2, 0.5, 2.0$ and $\sigma = 0.2, 0.5, 1.0, 5.0$. The values of $A$ and $C$ are chosen via $C/A = 0.15$ to reflect a maximal 15% reputational premium for sellers above the inherent value $A$. Both $\rho$ and $\sigma$ have a dramatic effect not only on the shape of the solution but also, as is demonstrated in Figure 2, on the switching point $y^*$. These numerical results suggest that both the discount rate $\rho$ and the volatility $\sigma$ of a seller's reputation can dramatically affect her behavior, particularly the critical reputation at which she switches from the advertising to the processing mode.

4.2 Numerical Results for the Full Model

The same numerical discretization is employed to solve the full Nerlove-Arrow model Equation (14), where we solve a linear system of the form $M v = f$, with

$$M v \approx \rho V + \big[ \kappa y (1 - y) + \sigma^2 (1 - y)^3 \big]\, V'(y) - \frac{\sigma^2}{2} (1 - y)^4\, V''(y), \qquad f \approx h + \epsilon\, \big| (1 - y)^2\, V' - h \big|.$$

We first set $\kappa = 0$, hold all remaining parameters fixed, and plot the results in Figures 3 and 4. The full solutions have more curvature than those of the reduced model, but nonetheless remain monotone and have a single unique switching point $y^*$. The additional curvature is due to the appearance of $V'$ terms in the differential operator, which now depend on $\sigma$ as well as on $\kappa$, which we recall incorporates mean reversion. That is, independent of the seller's strategy, reputation will tend to decrease to a smaller amount of the market share, with constant rate $\kappa$.
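The fixed point scheme of Section 4.1 can be sketched as follows for the reduced problem (17). This is a minimal illustration under stated assumptions, not the code used to produce the figures: the function names `solve_reduced_hjb` and `switching_point`, the dense matrix assembly, the use of `numpy.gradient` for $V'$, and the linear initial guess are all choices made here for brevity.

```python
import numpy as np


def solve_reduced_hjb(A=1.0, C=0.15, eps=0.02, rho=0.5, sigma=1.0,
                      n=1000, max_iter=20):
    """Fixed point iteration M v^(k+1) = f(v^(k)) for the reduced model (17)."""
    y = np.linspace(0.0, 1.0, n + 1)
    dy = 1.0 / n
    h = A + C * y
    v_at_one = (1.0 + eps) * (A + C) / rho  # boundary value V(1)

    # M approximates rho*V - (sigma^2/2)(1-y)^2 V'' with central differences;
    # the first and last rows impose the Dirichlet boundary conditions.
    diff = 0.5 * sigma ** 2 * (1.0 - y) ** 2 / dy ** 2
    M = np.zeros((n + 1, n + 1))
    idx = np.arange(1, n)
    M[idx, idx - 1] = -diff[idx]
    M[idx, idx] = rho + 2.0 * diff[idx]
    M[idx, idx + 1] = -diff[idx]
    M[0, 0] = 1.0
    M[n, n] = 1.0

    v = v_at_one * y  # assumed initial guess: straight line between the BCs
    change = np.inf
    for _ in range(max_iter):
        dv = np.gradient(v, dy)
        f = h + eps * np.abs((1.0 - y) * dv - h)
        f[0], f[n] = 0.0, v_at_one
        v_new = np.linalg.solve(M, f)
        change = np.max(np.abs(v_new - v))  # size of the last update
        v = v_new
    return y, v, change


def switching_point(y, v, A=1.0, C=0.15):
    """First grid point where (1 - y) V'(y) - h(y) is no longer positive."""
    g = (1.0 - y) * np.gradient(v, y[1] - y[0]) - (A + C * y)
    idx = np.nonzero(g <= 0.0)[0]
    return float(y[idx[0]]) if idx.size else None


if __name__ == "__main__":
    y, v, change = solve_reduced_hjb()
    print("last update:", change, "estimated y*:", switching_point(y, v))
```

The returned `change` is the sup-norm difference between the last two iterates, which gives a simple check on whether $K = 20$ iterations were enough for a given parameter set. A sparse or banded solver would be the natural replacement for the dense solve at larger $N_y$.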
In Figures 5 and 6, the same solutions are shown with $\kappa = 1$. Here it becomes apparent that both the rate of mean reversion and the volatility will affect the seller's optimal strategy.

Figure 1: Numerical solutions $V(y)$ of the reduced boundary value problem Equation (17), shown with $A = 1$, $C = 0.15$, and $\epsilon = 0.02$. (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.

Figure 2: A plot of the quantity $(1 - y) V'(y)$, where $V(y)$ is a numerical solution of the reduced boundary value problem Equation (17). The value $y^*$ is defined as the intersection of these curves with $h(y)$ (the solid line). (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.

Remark 7. Our numerical results illustrate that a wide range of seller behaviors can be described by varying the parameters, and therefore that reputational value is a strong indicator of optimal seller behavior.

5 Conclusions and Future Work

In this work, we have proposed a model which describes the behavior of sellers participating in online auctions (markets). Our model is based on the premise that buyer feedback greatly impacts the sales rate, which motivates our assumption that the revenue per sale (or the price per unit) is solely dependent upon seller reputation. We assumed that a seller has a fixed amount of resources that must be allocated either to (i) advertising to new buyers or (ii) processing orders for current customers. In doing so, we were able to design an optimal selling strategy, wherein the seller switches her behavior when an optimal market share reputation is reached, which depends on statistical modeling parameters. These modeling parameters were introduced through the wealth-reputation mechanisms in [10], which have been generalized to stochastic settings in [16, 13, 12].
We have additionally modified the empirical model $h(R)$ found in [9], relating the price per unit to the reputation. Rather than viewing reputation $R$ as an unbounded quantity, we instead introduced a market scaled reputation $Y = R / (1 + R)$, where $Y \in [0, 1)$, which reduces the price per unit to a simple linear model $h(Y) = A + C Y$.

We then optimized the stochastic model over the control of the excess rate $\mu$ using the Hamilton-Jacobi-Bellman equation, yielding a deterministic equation relating the value per unit to the seller reputation. The resulting boundary value problem is then solved numerically, and we see qualitatively that the value of a seller's goods increases monotonically with reputation, but that a unique optimal reputation $y = y^*$ determines when the seller should switch from advertising to processing. The numerical scheme was validated with a reduced model, which has a closed form piecewise analytic solution and permits direct determination of $y^*$. The numerical results validate our modeling assumptions, and provide a framework for studying seller behavior based on seller reputation. Although the techniques used are standard, we believe that the optimal strategy presented both analytically and numerically has implications for the way reputational information can be used to predict the behavior of an individual seller in an online market.

Figure 3: Numerical solutions $V(y)$ of the full boundary value problem Equation (14), shown with $A = 1$, $C = 0.15$, $\epsilon = 0.02$, and $\kappa = 0$. (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.

Figure 4: A plot of the quantity $(1 - y)^2 V'$, where $V$ is a numerical solution of the full boundary value problem Equation (14). The value $y^*$ is defined as the intersection of these curves with $h(y)$ (black solid line). (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.
This work can naturally be generalized in a variety of ways. For instance, the model could be developed in the case of a finite horizon time $T$, since a more realistic assumption is that a seller's strategy depends on both time and the current reputation state. Unlike our current infinite horizon model, the resulting HJB equation will lead to a time-dependent partial differential equation, which will be the subject of future work.

Figure 5: The same as Figure 3, but with $\kappa = 1$. (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.

Figure 6: The same as Figure 4, but with $\kappa = 1$. (a) $\rho = 0.1$; (b) $\rho = 0.2$; (c) $\rho = 0.5$; (d) $\rho = 2.0$.

A Piecewise Analytic Solution of Reduced Model

We now construct the exact solution to the reduced HJB boundary value problem Equation (17), which incidentally can be found analytically. The principal difficulty that must be overcome is the appearance of an absolute value, which contains the unknown $V'$. We therefore consider a piecewise defined solution $V(y)$, which remains $C^2$ across the switching point $y^*$, defined by the vanishing of the expression in the absolute value:

$$(1 - y^*)\, V'(y^*) = \tilde{h}(y^*). \tag{21}$$

We now separately consider the regions $y < y^*$ and $y > y^*$ in the reduced model Equation (17), and define the corresponding linear differential operators $L_\pm$ as

$$L_\pm[V] = \Big( -\frac{\sigma^2}{2} (1 - y)^2\, \frac{d^2}{dy^2} \pm \epsilon (1 - y)\, \frac{d}{dy} + \rho \Big) V. \tag{22}$$

The boundary value problem can then be decomposed into two smaller problems for $V_\ell$ and $V_r$. The boundary conditions provide one condition for each of $V_\ell$ and $V_r$. The remaining two conditions are provided by enforcing $C^2$ smoothness of the solution at the switching point Equation (21). Hence we now formulate two
Hence we now formulate two 15 (well-posed) boundary v alue problems: L − [ V ` ] = (1 −  ) ˜ h ( y ) 0 < y < y ∗ (23) V ` (0) = 0 , (1 − y ∗ ) V 0 ` ( y ∗ ) = ˜ h ( y ∗ ) (24) and L + [ V r ] = (1 +  ) ˜ h ( y ) y ∗ < y < 1 (25) V r (1) = 1 +  ρ ˜ h (1) , (1 − y ∗ ) V 0 r ( y ∗ ) = ˜ h ( y ∗ ) . (26) Once solved, enforcing continuity uniquely deﬁnes the v alue of the switching point y ∗ : V ` ( y ∗ ) = V r ( y ∗ ) . (27) A.1 Constructing the Solution W e shall construct the solutions V ` and V r separately ﬁrst, although the approach for both solutions will be the same. Following standard methods, we ﬁrst decompose the full solution into a homogeneous and particular solution, where the homogeneous part satisﬁes L ± [ u ] = 0 . Since the operators L ± are equi- dimensional, we seek solutions of the form u = c (1 − y ) γ , (28) and the application of the dif ferential operator yields L ± [ u ] = c (1 − y ) γ  − σ 2 2 ( γ 2 − γ ) ∓ γ + ρ  = 0 . (29) W e set the parenthetical term of this expression to zero in order to solv e for the admissible e xponents γ γ ` ± = 1 2 +  σ 2 ± s  1 2 +  σ 2  2 + 2 ρ σ 2 (from L − ) (30) γ r ± = 1 2 −  σ 2 ± s  1 2 −  σ 2  2 + 2 ρ σ 2 (from L + ) . (31) Because γ r − < 0 for ρ > 0 , the homogeneous solution (1 − y ) γ r − will be unbounded as y → 1 , and so we exclude it from further consideration below . Furthermore, since ˜ h ( y ) = A + C y is a linear function, the particular solution will also be linear . W e therefore ha ve a general solution of the form V ` = c 1 (1 − y ) γ ` − + c 2 (1 − y ) γ ` + + α ` + β ` y (32) V r = c 3 (1 − y ) γ r + + α r + β r y . (33) The particular solution is ﬁxed by enforcing that the dif ferential Equations (23) and (25) are satisﬁed, which yields α ` =  1 −  ρ  ( A + C ) −  1 −  ρ +   C, β ` =  1 −  ρ +   C (34) α r =  1 +  ρ  ( A + C ) −  1 +  ρ −   C, β r =  1 +  ρ −   C . 
(35) 16 The homogeneous coef ﬁcients c 1 and c 2 are then found by applying the boundary conditions (24), and the resulting linear system  1 1 γ ` − (1 − y ∗ ) γ ` − γ ` + (1 − y ∗ ) γ ` +   c 1 c 2  =  − α ` β ` (1 − y ∗ ) − h ( y ∗ )  is solved by c 1 = − α ` γ ` + (1 − y ∗ ) γ ` + + β ` (1 − y ∗ ) − ˜ h ( y ∗ ) γ ` + (1 − y ∗ ) γ ` + − γ ` − (1 − y ∗ ) γ ` − (36) c 2 = β ` (1 − y ∗ ) − ˜ h ( y ∗ ) + α ` γ ` − (1 − y ∗ ) γ ` − γ ` + (1 − y ∗ ) γ ` + − γ ` − (1 − y ∗ ) γ ` − . (37) The boundary condition Equation (26) at y = 1 is automatically satisﬁed by the particular solution, and so the ﬁnal coef ﬁcient c 3 is determined by the condition at y = y ∗ , c 3 = β r (1 − y ∗ ) − ˜ h ( y ∗ ) γ r + (1 − y ∗ ) γ r + . (38) Finally , ha ving determined the general solutions V ` and V r , the solution V is obtained by enforcing continu- ity . This also ﬁxes the v alue y ∗ , which must no w satisfy the nonlinear transcendental equation c 1 (1 − y ∗ ) γ ` − + c 2 (1 − y ∗ ) γ ` + + α ` + β ` y ∗ = c 3 (1 − y ∗ ) γ r + + α r + β r y ∗ . (39) By expanding the e xpression, it follo ws that 0 =  (1 − y ∗ ) ( γ ` + − γ ` − ) − 1   β ` (1 − y ∗ ) − ˜ h ( y ∗ )  − α ` ( γ ` + − γ ` − )(1 − y ∗ ) γ ` + +  γ ` + (1 − y ∗ ) ( γ ` + − γ ` − ) − γ ` −  α ` − α r + ( β ` − β r ) y ∗ − β r (1 − y ∗ ) − ˜ h ( y ∗ ) γ r + ! . (40) A.2 Comparison with µ ≡ 0 If the seller decides to take no action to enhance her reputation and long term wealth growth by letting µ ≡ 0 , then her v alue function solves − σ 2 2 (1 − y ) 2 V 00 0 ( y ) + ρV 0 ( y ) = ˜ h ( y ) 0 < y < 1 V 0 (0) = 0 V 0 (1) = 1 ρ ˜ h (1) , (41) which leads to a solution V 0 ( y ) = ˜ h ( y ) ρ − C ρ (1 − y ) 1 2 + q 1 4 + 2 ρ σ 2 . (42) It follo ws that for V which solves (3 . 15) that A + C ρ  = V (1) − V 0 (1) ≤ sup 0 ≤ y ≤ 1    V ( y ) − V 0 ( y )    . (43) Hence, she enhances her ov erall expected wealth by order  by pulsing strategies. 
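The construction in Section A.1 can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the parameter values for $\sigma$, $\rho$, $\epsilon$, $A$, and $C$ are hypothetical, the function names are invented for this example, and $\epsilon$ is taken to be the small parameter appearing in (22). For a trial switching point, the code evaluates the exponents (30)-(31), the particular coefficients (34)-(35), and the homogeneous coefficients (36)-(38), and returns the residual of the continuity condition (39).

```python
import math

# Hypothetical parameter values, chosen only for illustration (not from the paper).
SIGMA, RHO, EPS = 0.5, 0.5, 0.1
A, C = 1.0, 1.0  # h_tilde(y) = A + C*y


def h_tilde(y):
    return A + C * y


def exponents(sign):
    """Roots of the bracket in (29); sign=-1 for L_-, sign=+1 for L_+."""
    b = 0.5 - sign * EPS / SIGMA**2
    root = math.sqrt(b * b + 2.0 * RHO / SIGMA**2)
    return b - root, b + root  # (gamma_minus, gamma_plus)


def particular(sign):
    """Coefficients (alpha, beta) of the linear particular solution, (34)-(35)."""
    beta = (1.0 + sign * EPS) * C / (RHO - sign * EPS)
    alpha = (1.0 + sign * EPS) * (A + C) / RHO - beta
    return alpha, beta


def build_solution(y_star):
    """Return (V_left, V_right, dV_left, dV_right) for a trial switching point."""
    gl_m, gl_p = exponents(-1)
    _, gr_p = exponents(+1)  # the negative exponent is discarded on the right
    al, bl = particular(-1)
    ar, br = particular(+1)
    s = 1.0 - y_star
    rhs = bl * s - h_tilde(y_star)
    det = gl_p * s**gl_p - gl_m * s**gl_m
    c1 = -(al * gl_p * s**gl_p + rhs) / det              # (36)
    c2 = (rhs + al * gl_m * s**gl_m) / det               # (37)
    c3 = (br * s - h_tilde(y_star)) / (gr_p * s**gr_p)   # (38)

    def v_left(y):
        return c1 * (1 - y)**gl_m + c2 * (1 - y)**gl_p + al + bl * y

    def v_right(y):
        return c3 * (1 - y)**gr_p + ar + br * y

    def dv_left(y):
        return (-c1 * gl_m * (1 - y)**(gl_m - 1)
                - c2 * gl_p * (1 - y)**(gl_p - 1) + bl)

    def dv_right(y):
        return -c3 * gr_p * (1 - y)**(gr_p - 1) + br

    return v_left, v_right, dv_left, dv_right


def continuity_gap(y_star):
    """Residual of the continuity condition (39); a root is a candidate y*."""
    v_left, v_right, _, _ = build_solution(y_star)
    return v_left(y_star) - v_right(y_star)
```

For any trial value in $(0, 1)$, the returned pieces satisfy the boundary conditions (24) and (26) by construction; only the continuity condition (39) depends on choosing $y^*$ correctly. A bracketing method such as bisection could be applied to `continuity_gap` on a subinterval of $(0, 1)$, provided a sign change exists for the chosen parameters, which is not guaranteed for the illustrative values above. The value of `v_right(1.0)` exceeds $\tilde{h}(1)/\rho$ by $\epsilon (A + C)/\rho$, which is the gap used in (43).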
Acknowledgments

The authors thank Andrew Christlieb, Nir Gavish, Song Yao, John Chadam, and Steven Shreve for useful discussions, as well as anonymous reviewers whose suggestions greatly improved this paper. Part of this work was done while Milan Bradonjić was at UCLA and LANL.

References

[1] Apt, K. R., and Markakis, E. Optimal strategies in sequential bidding. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 2 (Richland, SC, 2009), AAMAS '09, International Foundation for Autonomous Agents and Multiagent Systems, pp. 1189–1190.

[2] Beam, C., and Segev, A. Auctions on the internet: A field study, 1998. Working Paper 98-WP1032. Fisher Center for Management and Information Technology, Haas School of Business, University of California, Berkeley.

[3] Dellarocas, C. Reputation mechanism design in online trading environments with pure moral hazard. Information Systems Research 16, 2 (2005), 209–230.

[4] Duckworth, K., and Zervos, M. A model for investment decisions with switching costs. Ann. Appl. Probab. 11, 1 (02 2001), 239–260.

[5] Fleming, W., and Soner, H. M. Controlled Markov processes and viscosity solutions. Stochastic modelling and applied probability. Springer, New York, 2006.

[6] Fleming, W. H., and Rishel, R. W. Deterministic and stochastic optimal control. Springer-Verlag, Berlin and New York, 1975.

[7] Houser, D., and Wooders, J. Reputation in auctions: Theory, and evidence from eBay. Journal of Economics & Management Strategy 15, 2 (2006), 353–369.

[8] Levitin, D. J. Extracted from The Organized Mind: Thinking Straight in the Age of Information Overload. Dutton Penguin Random House, 2014.

[9] Mink, M., and Seifert, S. Reputation on eBay and its impact on sales prices. In Group Decision and Negotiation International Conference (June 2006), pp. 253–255.

[10] Nerlove, M., and Arrow, K. J. Optimal advertising policy under dynamic conditions. Economica 29 (1962), 129–142.

[11] Bajari, P., and Hortaçsu, A. Economic insights from internet auctions. Journal of Economic Literature 42, 2 (2004), 457–486.

[12] Raman, K. Boundary value problems in stochastic optimal control of advertising. Automatica 42 (August 2006), 1357–1362.

[13] Rao, R. C. Estimating continuous time advertising-sales models. Marketing Science 5, 2 (1986), 125–142.

[14] Resnick, P., and Zeckhauser, R. Trust among strangers in internet transactions: Empirical analysis of eBay's reputation system. In The Economics of the Internet and E-Commerce, M. R. Baye, Ed., vol. 11. Elsevier Science, 2002, pp. 127–157.

[15] Resnick, P., Zeckhauser, R., Swanson, J., and Lockwood, K. The value of reputation on eBay: A controlled experiment. Experimental Economics 9, 2 (2006), 79–101.

[16] Sethi, S. P. Deterministic and stochastic optimization of a dynamic advertising model. Optimal Control Applications and Methods 4 (1983), 179–184.
