Social Big Data Analytics of Consumer Choices: A Two Sided Online Platform Perspective

by

Meisam Hejazi Nia

APPROVED BY SUPERVISORY COMMITTEE:
Brian T. Ratchford, Chair
Ozalp Ozer, Co-Chair
Dmitri Kuksov
Ahmet Serdar Simsek

Copyright 2016
Meisam Hejazi Nia
All Rights Reserved

Dedicated to my kind parents

by MEISAM HEJAZI NIA, MS

DISSERTATION
Presented to the Faculty of The University of Texas at Dallas in Partial Fulfillment of the Requirements for the Degree of
DOCTOR OF PHILOSOPHY IN MANAGEMENT SCIENCE
THE UNIVERSITY OF TEXAS AT DALLAS
August 2016

ACKNOWLEDGEMENTS

Thanks to my dissertation committee for their support, guidance, and dedication. Dr. Ratchford and Dr. Ozer, thanks for your helpful advice and trust, and for always believing in me; Dr. Simsek, thanks for your helpful advice, trust, immense patience, and guidance. Dr. Kuksov, thanks for your support and suggestions. Thanks Mom, Dad, and siblings, for always believing in me and staying in touch. This work was not possible without sacrificing four years of my life that I could have spent with you. It seems like yesterday when I left my country pursuing a dream. I thank God that today it is a reality.

April 2016

ABSTRACT

Meisam Hejazi Nia, PhD
The University of Texas at Dallas, 2016
Supervising Professor: Brian T. Ratchford, Chair; Ozalp Ozer, Co-Chair

This dissertation examines three distinct big data analytics problems related to the social aspects of consumers’ choices.
The main goal of this line of research is to help two sided platform firms target their marketing policies given the great heterogeneity among their customers. In three essays, I combine structural modeling and machine learning approaches to first understand customers’ responses to intrinsic and extrinsic factors, using unique data sets I scraped from the web, and then explore methods to optimize two sided platform firms’ reactions accordingly.

The first essay examines “social learning” in the mobile app store context, controlling for the intrinsic value of hedonic and utilitarian mobile apps, price, advertising, and the number of options available. The proposed model extracts a social influence proxy measure from a macro diffusion model using an unscented Kalman filter, and incorporates this measure in a mixed logit choice model with a hierarchical Dirichlet Process prior. Results suggest significant effects of social influence, which underscores the importance of choosing different marketing policies for pervasive goods. The comparison of mobile app adoption parameters suggests that, among several classical goods, the mobile app adoption pattern is very similar to that of music CDs. The simulation counterfactual analysis suggests that an early targeted viral marketing policy might be an optimal strategy for app-store platforms.

The second essay investigates bidders’ anticipated winner and loser regret in the context of the eBay online auction platform. I develop a structural model that accounts for bidders’ learning and their anticipation of winner and loser regret in an auction platform. Winner regret is defined as regretting paying too much after winning an auction, and loser regret as regretting not bidding high enough after losing it.
Using a large data set from eBay and an empirical Bayesian estimation method, I quantify the bidders’ anticipation of regret in various product categories, and investigate the role of experience in explaining the bidders’ regret and learning behaviors. The counterfactual analyses show that shutting down bidder regret via appropriate notification policies can increase eBay’s revenue by 24%.

The third essay investigates the effects of Gamification incentive mechanisms in an online platform for user generated content. I use an ensemble method over LDA, mixed normal, and k-means clustering methods to segment users into competitors, collaborators, achievers, explorers, and uninterested users. Then, I develop a state-dependent choice model that accommodates the effects of the number of badges, the rank on the leaderboard, reputation points, inertia, and reciprocity, and allows for heterogeneity through a Dirichlet Process prior. The results suggest that estimating the model on small samples generates biased estimates. Furthermore, they suggest that the effects of Gamification elements are heterogeneous, significantly positive or negative for different users. I find sensitivity patterns that explain the importance of certain Gamification elements for users of certain nationalities. These findings help the Gamification platform target its users. The simulation counterfactual analysis suggests that a two sided platform can increase the number of user contributions by making badges more difficult to earn.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1: SOCIAL LEARNING AND DIFFUSION OVER THE PERVASIVE GOODS: AN EMPIRICAL STUDY OF AN AFRICAN APP-STORE
1.1. ABSTRACT
1.2. INTRODUCTION
1.3. LITERATURE REVIEW
    1.3.1. Interdependence of consumer preference
    1.3.2. Mobile app store dynamics
    1.3.3. Global Macro and Micro Diffusion and Social Learning
1.4. MODEL
1.5. DATA
1.6. IDENTIFICATION AND ESTIMATION
1.7. RESULTS
1.8. COUNTERFACTUAL ANALYSIS
1.9. CONCLUSION
CHAPTER 2: DO BIDDERS ANTICIPATE REGRET DURING AUCTIONS? AN EMPIRICAL STUDY OF AN AFRICAN APP-STORE
2.1. ABSTRACT
2.2. INTRODUCTION
2.3. LITERATURE REVIEW
    2.3.1. Customer Relationship Management of Auction Platforms
    2.3.2. Bounded Rationality, Learning, and Affiliated Value of Bidders
    2.3.3. Emotionally Rational or Regretful Bidders
    2.3.4. Theoretical, Experimental, and Empirical Auctions studies
2.4. DATA
2.5. MODEL DEVELOPMENT
    2.5.3. ESTIMATION PROCEDURE
2.6. RESULTS
2.7. COUNTERFACTUAL ANALYSIS
2.8. ROBUSTNESS CHECKS
2.9. CONCLUSION
CHAPTER 3: MEASURING GAMIFICATION ELEMENTS’ EFFECTS ON USER CONTENT GENERATION: AN EMPIRICAL STUDY OF STACKOVERFLOW’S TWO SIDED PLATFORM’S BIG DATA
3.1. ABSTRACT
3.2. INTRODUCTION
3.3. LITERATURE REVIEW
    3.3.1. User Generated Content (UGC)
    3.3.2. Gamification mechanisms and rewards in loyalty programs
    3.3.3. Optimal stimuli level and state dependent choice models
    3.3.4. Behavioral aspects of decision making
3.4. DATA
3.5. MODEL
3.6. ESTIMATION
3.7. RESULTS AND MANAGERIAL IMPLICATIONS
3.8. COUNTERFACTUAL ANALYSIS AND ITS MANAGERIAL IMPLICATIONS
3.9. CONCLUSION
APPENDIX
APPENDIX 1.A: DIRECT ACYCLIC GRAPH OF CONDITIONAL DISTRIBUTIONS
APPENDIX 1.B: UNSCENTED KALMAN FILTER
APPENDIX 1.C: CONDITIONAL DISTRIBUTIONS FOR ESTIMATION OF THE MICRO CHOICE MODEL
APPENDIX 1.D: CHOICE PARAMETER ESTIMATES FOR ALTERNATIVE MODELS
APPENDIX 2.A: LATENT DIRICHLET ALLOCATION
APPENDIX 2.B: K-MEANS CLUSTERING
APPENDIX 2.C: ESTIMATION PROCEDURE
APPENDIX 2.D: EXTRA TABLES FOR THE MAIN AND ALTERNATIVE MODEL (ONLINE COMPANION)
APPENDIX 3.A: CONDITIONAL DISTRIBUTIONS FOR ESTIMATION OF THE GAMIFICATION CHOICE MODEL
APPENDIX 3.B: EXTRA TABLES FOR THE ALTERNATIVE MODEL
REFERENCES
VITA

LIST OF FIGURES

Figure 1.1. Box and Arrow Representation of the Model
Figure 1.2. Intercontinental (across 30 cities) Diffusion Curves for the mobile apps within the Categories
Figure 1.3. Popularity (market share) of App Categories on Apple Inc. App Store
Figure 1.4. Free mobile apps versus paid mobile apps
Figure 1.5. 1-Step-ahead Forecast for Local Diffusion (Green line: a step ahead; Red line: the actual)
Figure 1.6. 1-Step-ahead Forecast for Global Diffusion level (Green line: a step ahead; Red line: the actual)
Figure 1.7. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (Local Adopters)
Figure 1.8. COUNTERFACTUAL ANALYSIS: an optimal social influence strategy to increase expected adoption level by 14% (log scale)
Figure 2.1. Evolution of Bids in six sample auctions
Figure 2.2. Evolution of number of participating bidders in six sample auctions
Figure 2.3. Histogram of regret and valuation evolution parameters across bidder segments
Figure 2.4. Counterfactual analysis of shutting down winner regret (blue line: the optimal bidding when regret is shut down; red line: the observed)
Figure 3.1. Contribution of a sample of four users over time
Figure 3.2. Evolution of Explanatory Variables Over time (Median shown in black; the interval between 25% and 75% is colored gray)
Figure 3.3. Evolution of Total number of Gold, Silver, and Bronze Badges Granted to Users in the Sample
Figure 3.4. Box and Arrow Model of State Dependent Utility of a user to contribute
Figure 3.5. HISTOGRAM OF PARAMETER ESTIMATES: Individual Choice parameters
Figure 1.A.1. Probabilistic graphical model of customer mobile app choices under social influence
Figure 1.D.1. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (Local Imitators)
Figure 1.D.2. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (Global Imitators)
Figure 1.D.3. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (Global Adopters)
Figure 1.D.4. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (No social influence)
Figure 2.A.1. Graphical model representation of LDA
Figure 2.B.1. Within groups sum of square based on number of clusters in K-Means algorithm for bidders
Figure 2.B.2. Within groups’ sum of square based on number of clusters in K-Means algorithm for auctions
Figure 2.D.1. The probabilistic graphical plate model of the main model
Figure 2.D.2. Histogram of regret and valuation evolution parameters across bidder segments
Figure 2.D.3. Counterfactual analysis of shutting down winner regret (blue line: the optimal bidding when regret is shut down; red line: the observed)
Figure 2.D.4. Histogram of winner regret parameter distribution across item categories
Figure 2.D.5. Histogram of loser regret parameter distribution across item categories
Figure 2.D.6. Histogram of distribution of learning parameter distribution across categories
Figure 2.D.7. Histogram of valuation revelation parameter distribution across categories

LIST OF TABLES

Table 1.1. Literature Position of this study
Table 1.2. Model Variable Definitions
Table 1.3. Categories Basic Statistics
Table 1.4. Mobile app categories basic statistics
Table 1.5. MODEL COMPARISON
Table 1.6. Performance of the Proposed Model for local and international category adoption
Table 1.7. Factor Loading Matrix (Varimax rotation)
Table 1.8. Factor Names
Table 1.9. PARAMETER ESTIMATES: Global Adoption
Table 1.10. PARAMETER ESTIMATES: Local Adoption
Table 1.11. PARAMETER ESTIMATES: Individual Choice effect (Local Adopters)
Table 1.12. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (Local Adopters): CustomerTenure (number of days since registration on the app-store) explanation of the effects
Table 1.13. PARAMETER ESTIMATES: Individual Choice effect
Table 1.14. COUNTERFACTUAL ANALYSIS: Change in the adoption level by intervening social influence
Table 1.15. COUNTERFACTUAL ANALYSIS: Explain optimal social influence improvement with popularity rank of the app category on the app-store
Table 2.1. A sample bid sequence on an eBay auction with $25 reservation bid and $1 minimum increment
Table 2.2. Auction categories in the eBay data
Table 2.3. Sample auctions in the eBay data
Table 2.4. Summary statistics of the average bidder characteristics within each of 19 auction categories
Table 2.5. Notation
Table 2.6. Summary statistics of the average bidder characteristics within each of 47 bidder segments
Table 2.7. Summary statistics for the bidder specific parameter estimations
Table 2.8. t-Test: Paired Two Sample for Means
Table 2.9. Relation between the winner regret α_i, the loser regret β_i, the update of valuation parameters δ_i and learning parameter ρ_i estimates across forty seven bidder segments
Table 2.10. Relation between the winner regret α_i, the loser regret β_i, the update of valuation parameters δ_i and learning parameter ρ_i estimates across forty seven bidder segments
Table 2.11. Explaining winner regret α_i, the loser regret β_i, the update of valuation parameters δ_i and the learning parameter ρ_i estimates across 47 bidder segments
Table 2.12. Summary statistics for the auction specific parameter estimations
Table 2.13. Counterfactual analysis of shutting down winner and both winner/loser regret
Table 2.14. Counterfactual revenue improvements explained by the characteristics of bidder on each auction bidder category
Table 3.1. The relevant streams of literature in five clusters
Table 3.2. Sample set of Badges in different knowledge domains (tags)
Table 3.3. Type of activity, description and inclusion in all variable
Table 3.4. Sample Observations’ statistics
Table 3.5. Sample Observations’ basic statistics
Table 3.6. Adjusted Random Index measure for clustering agreements
Table 3.7. Gamification Segment Names
Table 3.8. Gamification Segment Names (heat map configured at row level)
Table 3.9. Utility model Variables Definition
Table 3.10. MODEL COMPARISON
Table 3.11. PARAMETER ESTIMATES: Individual Content Contribution Choice explained by whole information set (Sample with 10K explained)
Table 3.12. PARAMETER ESTIMATES: Individual Choice effect significance
Table 3.13. PARAMETER ESTIMATES: Individual Choice Hierarchical Model
Table 3.14. Counterfactual Analysis Result
Table 1.D.1. PARAMETER ESTIMATES: Individual Choice effect (Local imitators)
Table 1.D.2. PARAMETER ESTIMATES: Individual Choice effect (Global imitators)
Table 1.D.3. PARAMETER ESTIMATES: Individual Choice effect (Global Adopters)
Table 1.D.4. PARAMETER ESTIMATES: Individual Choice effect (No social influence)
Table 1.D.5. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (Local imitators): Tenure explanation of the effects
Table 1.D.6. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (Global imitators): Tenure explanation of the effects
Table 1.D.7. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (Global Adopters): Tenure explanation of the effects
Table 1.D.8. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (No social influence): Tenure explanation of the effects
Table 1.D.9. PARAMETER ESTIMATES: Individual Choice effect
Table 1.D.10. PARAMETER ESTIMATES: Individual Choice effect
Table 1.D.11. PARAMETER ESTIMATES: Individual Choice effect
Table 1.D.12. PARAMETER ESTIMATES: Individual Choice effect
Table 2.B.1. Cluster center comparison between k-mean and mixture normal clustering
Table 2.D.1. Bidder’s characteristics within each auction category
Table 2.D.2. Maximum A Posteriori of the model
Table 2.D.3. Bidder’s segment profile after mixture normal clustering
Table 2.D.4. The winner regret α_i estimates across bidder’s segments
Table 2.D.5. The loser regret β_i estimates across bidder’s segments
Table 2.D.6. The update of valuation parameters δ_i and learning parameter ρ_i estimates across bidder’s segments
Table 2.D.7. The winner regret α_i and the loser regret β_i estimates across auction categories
Table 2.D.8. The update of valuation parameters δ_i and learning parameter ρ_i estimates across bidder’s segments
Table 2.D.9. The growth of bids and their drift parameters, τ_i and γ_i, and the rush of bidders at the end of auction rate and average entrance rate in each period, η_j and ι_j, estimates across auction segments
Table 2.D.10. Summary statistics for the bidder specific parameter estimations
Table 2.D.11. Relation between the winner regret α_i, the loser regret β_i, the update of valuation parameters δ_i and learning parameter ρ_i estimates across forty seven bidder segments
Table 2.D.12. Relation between the winner regret α_i, the loser regret β_i, the update of valuation parameters δ_i and learning parameter ρ_i estimates across 47 bidder segments
Table 2.D.13. Explaining winner regret α_i, the loser regret β_i, the update of valuation parameters δ_i and the learning parameter ρ_i estimates across 47 bidder segments
Table 2.D.14. Summary statistics for the auction specific parameter estimations
Table 2.D.15. Counterfactual analysis of shutting down only winner and both winner/loser regret
Table 2.D.16. Auction’s Cluster profile
Table 2.D.17. Maximum A Posteriori of the model
Table 2.D.18. The winner regret α_i estimates across bidder’s segments
Table 2.D.19. The loser regret β_i estimates across bidder’s segments
Table 2.D.20. The update of valuation parameters δ_i and learning parameter ρ_i estimates across bidder’s segments
Table 2.D.21. The growth of bids and their drift parameters, τ_i and γ_i, and the rush of bidders at the end of auction rate and average entrance rate in each period, η_j and ι_j, estimates across auction segments
Table 2.D.22. The winner regret α_i and the loser regret β_i estimates across auction categories
Table 2.D.23. The update of valuation parameters δ_i and learning parameter ρ_i estimates across bidder’s segments
Table 2.D.24. The growth of bids and their drift parameters, τ_i and γ_i, and the rush of bidders at the end of auction rate and average entrance rate in each period, η_j and ι_j, estimates across auction segments
Table 2.D.25. Bidder’s segment profile (based on k-means approach)
Table 3.B.1. PARAMETER ESTIMATES: Individual Choice effect (10K sample size with model that explains parameters with fixed variables at Hierarchy)
Table 3.B.2. PARAMETER ESTIMATES: Individual Choice effect (10K sample size with model that explains parameters with fixed variables at Hierarchy)
...............................234 Table 3.B.3. PARAMETER ESTIMATES: Individual Ch oice effect (5K sample size with model that explains parameters with all variables at Hierarchy) ................................................235 xix Table 3.B. 4. PARAMETER ESTIMATES: Individual Cho ice effect (5K sample size with model that explains parameters with all variables at Hierarchy) .....................................236 Table 3.B.5. PARAMETER ESTIMATES: Individual Ch oice effect (5K sample size with model that explains parameters with full variables at Hierarchy) ........................ ......................237 Table 3.B.6. PARAMETER ESTIMATES: Individual Ch oice effect.........................................238 Table 3.B.7.PARAMETER ESTIMATES: Individual C hoice effect (1K size for k-mean stratified sample with model that explains parameters with fixed variables) ..................239 Table 3.B.8. PARAMETER ESTIMATES: Individual Ch oice effect (1K size for k-mean stratified sample with model that explains parameters with fixed variables) ..................240 Table 3.B.9. PARAMETER ESTIMATES: Individual Ch oice effect (1K size for LDA stratified sample with model that explains parameters with fix ed variables at Hierarchy) ............241 Table 3.B.10. PARAMETER ESTIMATES: Individual C hoice effect (1K size for LDA stratified sample with model that explains parameters with fix ed variables at Hierarchy) ............242 Table 3.B.11. PARAMETER ESTIMATES: Individual C hoice effect (1K size for Uniform stratified sample with model that explains parameters with fixed variables) ..................243 Table 3.B.12. PARAMETER ESTIMATES: Individual C hoice effect (1K size for Uniform stratified sample with model that explains parameters with fixed variables) ..................244 Table 3.B.13. 
PARAMETER ESTIMATES: Individual C hoice effect (1K size for mixed-normal stratified sample with model that explains parameters with fixed variables at Hierarch y )245 Table 3.B.14. PARAMETER ESTIMATES: Individual C hoice effect (1K size for mixed-normal stratified sample with model that explains parameters with fixed variables) ..................246 Table 3.B.15. PARAMETER ESTIMATES: Individual C hoice effect (1K size for mixed-normal stratified sample with model that explains parameters with full variables) .....................247 Table 3.B.16. PARAMETER ESTIMATES: Individual C hoice effect (1K size for mixed-normal stratified sample with model that explains parameters with full variables) .....................248 Table 3.B.17. PARAMETER ESTIMATES: Individual C hoice effect (1K size for k-mean stratified sample with model that explains parameters with full variables ......................249 Table 3.B.18. PARAMETER ESTIMATES: Individual C hoice effect (1K size for k-mean stratified sample with model that explains parameters with full variables) .....................250 Table 3.B.19. PARAMETER ESTIMATES: Individual C hoice effect (1K size for LDA stratified sample with model that explains parameters with full variables at Hierarch y) ...............251 xx Table 3.B.20.PARAMETER ESTIMATES: Individual Cho ice effect (1K size for LDA stratified sample with model that explains parameters with full variables at Hierarch y) ...............252 Table 3.B.21. PARAMETER ESTIMATES: Individual C hoice effect (1K size for Uniform stratified sample with model that explains parameters with full variables) .....................253 Table 3.B.22. 
PARAMETER ESTIMATES: Individual Choice effect (1K size for Uniform stratified sample with model that explains parameters with full variables)

CHAPTER 1
SOCIAL LEARNING AND DIFFUSION OVER THE PERVASIVE GOODS: AN EMPIRICAL STUDY OF AN AFRICAN APP-STORE

Meisam Hejazi Nia
Naveen Jindal School of Management, Department of Marketing, SM32
The University of Texas at Dallas
800 W. Campbell Road
Richardson, TX, 75080-3021

1.1. ABSTRACT

I developed a structural model that combines a macro diffusion model with a micro choice model to control for social influence on the mobile app choices of customers over app-stores. Social influence is measured by the density of adopters within the proximity of the customers. Using a large data set from an African app-store and Bayesian estimation methods, I quantify the effect of social influence on customer choices over the app-store, and investigate the effect of ignoring this process in estimating customer choices. I find that customer choices on the app-store are explained better by the offline density than by the online density of adopters, and that ignoring social influence in estimation results in biased estimates. Furthermore, my results show that, among all the classical goods considered, the mobile app adoption process is very similar to the adoption of music CDs. My counterfactual analysis shows that the app-store can increase its revenue by 13.6% through a viral marketing policy (e.g., a button for sharing with friends and family).

Keywords: mobile app-store, social learning, state space model, structural model, semi-parametric Bayesian, MCEM, unscented Kalman filter, hierarchical mixture model, genetic optimization.

1.2.
INTRODUCTION

Smartphones pervade the global telecommunication market to such an extent that, for example, a consumer in the US has the option to adopt a smartphone handset on a postpaid contract no matter which mobile operator (e.g., T-Mobile, Verizon, AT&T, or Sprint) the consumer selects. Smartphone handsets and mobile apps are complements. A mobile app-store (e.g., Google Play, or Apple’s and Microsoft’s app stores) acts as a two-sided platform that matches consumers to mobile app publishers/developers. The mobile app platform’s revenue comes from two sources: selling paid apps and advertising on freemium apps. As a result, consumers’ adoption of mobile apps is a critical problem for the app-store platform.

The app-store platform has a lot of information about consumers’ download behavior, which enables it to customize its marketing actions to target different consumers based on their behaviors. For example, a mobile app platform should decide between the free trial and the viral referral strategies. A viral referral strategy can be useful if consumers’ preferences are interrelated, because of the psychological benefits of social identification/learning/inclusion or the utilitarian benefits of network externalities. However, a trial strategy is useful if consumers have a learning cost or uncertainty about the mobile apps.

It is not uncommon for customers to have interrelated preferences for mobile apps. Online forums are filled with requests for mobile app recommendations,¹ and, in fact, app-stores try to inform users about the popularity of mobile apps. The interdependence of mobile app choices is relevant not only in the online world, but also in the offline world. It is hard for customers to know what mobile app they want, so they find new mobile apps through family, friends, and colleagues.
App-stores have tried to facilitate this process by creating “Tell a Friend” and “Share This Application” features.² Therefore, an app-store platform needs a framework that quantifies not only the effect of mobile app characteristics, but also the effect of online and offline social influence on customer choices, in order to design policies that affect the mobile app choices of its customers.

¹ "Mobile Applications Forum." CNET. Accessed April 02, 2016. http://www.cnet.com/forums/mobile-apps/.
² Wonder HowTo. "How to Share Your Favorite Mobile Apps with Your Friends." Business Insider. June 16, 2011. Accessed April 02, 2016. http://www.businessinsider.com/how-to-share-your-favorite-mobile-apps-with-your-friends-2011-6

Given this context, I asked the following questions: (1) How can I design a targeting approach for an app-store platform? (2) How does the social learning process of mobile app customers differ from that of classical economy products, such as a color TV? (3) How can an app-store platform capture the heterogeneity of its customers and the variation in the mobile apps to customize its marketing actions? (4) What are the key elements of consumers’ utility from adopting a mobile app that allow an app-store platform to group and target its potential customers?

To answer these questions, I combined a macro social learning diffusion model with a micro choice model. I used a choice model to study the adoption behavior of the consumers. To control for social influence, I applied a filtering technique (i.e., the Unscented Kalman Filter) to another aggregated data set to create a time-varying measure of social influence. Also, to control for mobile app characteristics, price, and advertising, I used a factor model. I ran the filtering technique on two aggregate adoption data sets covering approximately two hundred days.
These data sets include, on the one hand, the cumulative number of adopters within a local city in Africa and, on the other hand, the cumulative number of adopters across all thirty cities in which the platform under study operates. I refer to these two data sets as the aggregate data sets from now on. I ran the choice model and the factor model on a data set of the choices of a sample of one hundred forty-seven consumers over twenty weeks. I refer to this sample as the micro sample.

I used the social learning diffusion model of Van den Bulte and Joshi (2007) to model the simultaneous diffusion of the mobile apps on the app store. Such a modeling approach presents two challenges. The first concerns the sparsity of mobile app consumers’ choice data, because the download of a mobile app is a rare event. To address this challenge, I aggregated the data at the app-category level. The second challenge involves dealing with possible measurement error. For this purpose, I cast the Van den Bulte and Joshi (2007) model into a discrete-time state space model. The use of Gaussian processes to filter measurement error is also quite common in online mission-critical systems such as robotics. In this case, I had to filter two second-degree polynomial differential equations for each mobile app category’s diffusion. I used an Unscented Kalman Filter (UKF), an approach introduced in machine learning and robotics, to estimate the non-linear diffusion equation up to third-order precision (Julier and Uhlmann 1997; Wan and van der Merwe 2001). This approach is an alternative to the Extended Kalman Filter (EKF), which estimates the non-linear diffusion equation only up to first-order precision.
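The sigma-point idea behind the UKF can be illustrated with a minimal sketch of the unscented transform, the step that pushes a Gaussian belief through a non-linear map without linearizing it. This is not the estimation code used in this study: the function names, the scaling parameter `kappa`, and the one-state Bass-type map with its parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through a map f using 2n + 1 sigma points."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.size
    # Columns of the scaled Cholesky factor give the sigma-point offsets.
    offsets = np.linalg.cholesky((n + kappa) * cov)
    points = [mean]
    for i in range(n):
        points.append(mean + offsets[:, i])
        points.append(mean - offsets[:, i])
    points = np.array(points)  # shape (2n + 1, n)
    # The central point gets weight kappa / (n + kappa); the others share the rest.
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    mapped = np.array([np.atleast_1d(f(p)) for p in points])
    new_mean = weights @ mapped
    centered = mapped - new_mean
    new_cov = (weights[:, None] * centered).T @ centered
    return new_mean, new_cov

# Hypothetical one-step Bass-type map for a single cumulative-adopter state.
def bass_step(c, p=0.03, q=0.4, m=1000.0):
    return c + (p + q * c / m) * (m - c)

mean_next, cov_next = unscented_transform([200.0], [[25.0]], bass_step)
```

For a linear map the transform reproduces the exact mean and covariance, which is a convenient sanity check; the gain over a first-order linearization such as the EKF appears only when the map is curved, as the diffusion equations are.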
I further used a hierarchical prior with a seemingly unrelated regression (SUR) model to use the shared information in the simultaneous diffusion of the mobile app categories, and to avoid overfitting a model with three hundred macro parameters. To estimate the macro diffusion model within the short planning horizon of an app-store platform, I used a Monte Carlo Expectation Maximization (MCEM) approach to optimize the Maximum A Posteriori (MAP) estimate of the parameters, in contrast to Bayesian sampling algorithms with possibly slow convergence, such as Gibbs and Metropolis-Hastings. To deal with the stochastic surface search problem of the MCEM approach, I used a genetic optimization algorithm with an initial population that is a perturbed version of the estimates reported by Van den Bulte and Joshi (2007).

Next, I used the outcome of the macro diffusion model as a measure of social influence in the structural choice model to extract the factors behind customers’ mobile app choices. Mobile app adoption choices are very sparse over time. In other words, I expected to observe several zeros in the data. To deal with such sparsity and to filter the possible noise in the data, I aggregated the data on the characteristics of the mobile app categories and the cumulative number of imitators at a weekly level. Further, so as not to discard the multicollinear data on the mobile app characteristics, I used a factor model to recover the underlying factors of the mobile app profiles. To name these factors, I merged the factor loading profiles with practitioners’ knowledge of customers’ mobile app choices. Given the mobile app-category latent factors, the density of the imitators, and the download history of the app-store platform, I used a mixture normal multinomial logit model to represent each consumer’s mobile app adoption choice. I estimated this model with an MCMC sampler.
The hierarchical modeling and the weighting scheme I used make the approach appropriate for big data, because the mixture normal prior allows for a flexible structure that nonetheless may not overfit. This modeling approach is appropriate for the context of online retailers, in which the distribution of choices follows a long-tail distribution (Anderson 2006).

I estimated this model on a data set of a newly launched app-store in Africa during May 2013 and on a supplementary data set of the network locations of the mobile app-store users, scraped from the web. The sample consists of the mobile app choices of approximately 20,000 customers who reside in the 30 cities in which the app store is available, of whom approximately 1,000 reside in one city in Africa. Mobile apps belong to various categories, among which I selected the 10 categories (presented in Table 1.3) that were less sparse.

The estimation results show that social influence significantly affects customers’ app adoption choices, and that social influence in the offline world (within the city) explains customer choices better than social influence in the online world (within the 30 cities in which the app-store operates). I also find that not controlling for social influence in customers’ mobile app choices results in biased estimates of customer preferences. Furthermore, I find that among many different classical goods, the mobile app adoption pattern is very similar to the music CD adoption pattern.

I further used the estimated micro choice model to simulate a counterfactual policy that intervenes in social influence to affect consumers’ choices. I find a policy that increases mobile app diffusion by 13.6%. This step is a form of prescriptive analytics that I built on top of the descriptive and predictive analytics steps. Furthermore, I find individual-specific preference parameters estimated by the choice model, which can help the mobile app-store target its customers.
The current study is most closely related to studies on consumers’ peer effects by Yang and Allenby (2003), Stephen and Toubia (2010), Lehmmes and Croux (2006), and Nair et al. (2010). It is also related to studies on global macro diffusion by Van den Bulte and Joshi (2007), Putsis et al. (1997), and Dekimpe et al. (1997). Another relevant research stream includes studies on micro diffusion models by Dover et al. (2012), Chatterjee and Eliashberg (1990), and Young (2009). The last stream of relevant studies covers the app store platform: Ghose and Han (2014), Carare (2012), Garg and Telang (2013), Liu et al. (2012), Ghose et al. (2011), Ghose and Han (2011b), Ghose and Han (2011a), and Kim et al. (2008). Although these studies have contributed greatly to the understanding of the phenomenon, none has created a pipeline that combines the macro diffusion modeling and micro structural choice modeling approaches to allow the app-store platform to target its consumers. The proposed approach allows the app-store to target its customers by applying descriptive, predictive, and prescriptive analytics to high-volume, high-velocity, high-variety, and high-veracity big data.

Thus, this paper contributes to the emerging literature on the prescriptive data analytics of the mobile app-store platform in five ways. First, it introduces the combination of a macro simultaneous social learning adoption model and a micro structural choice modeling approach to design a method that allows app-store platforms to target their heterogeneous consumers using their big data. Second, this paper benchmarks the parameters of social learning in mobile app adoption against those of classical economy goods such as the color TV, personal computer, music CD, and radio-head.
Third, this paper shows that social influence at the offline (local city) level drives customers’ mobile app choices on the app-store, and that ignoring the social learning process creates biased estimates. Fourth, this paper shows the power of its proposed model for prescriptive analytics over big data by finding an optimal viral marketing policy (e.g., share with friends and family) for the app-store that can increase its total expected diffusion by 13.6%. Fifth, to estimate the proposed social learning model, this paper employs SUR, UKF, MCEM, and a genetic algorithm to maximize the MAP estimate of the macro diffusion model. In addition, it uses a hierarchical mixture normal prior over its multinomial logit choice model and estimates it using an MCMC sampling method. These approaches, which allow for a flexible heterogeneity pattern, robust filtering of process and measurement errors, and computationally feasible big data analytics, should be of interest to academia and to commercial entities interested in not only the descriptive and predictive, but also the prescriptive analytics of their big data.

1.3. LITERATURE REVIEW

This study draws upon several streams of the literature: (1) the interdependence of consumer preference; (2) mobile app store dynamics; and (3) global macro and micro diffusion and social learning. Given the breadth of these areas across multiple disciplines, the following discussion represents only a brief review of these relevant streams. Table 1.1 presents a summary of the position of this study in the literature.

Table 1.1. Literature Position of this study

| Stream of Study | Interdependence of consumer preference | App Store | Global micro/macro Simultaneous Diffusion |
| --- | --- | --- | --- |
| Current study | * | * | * |
| Yang and Allenby (2003); Stephen and Toubia (2010); Lehmmes and Croux (2006); Bell and Song (2007); Aral and Walker (2011); Nair et al. (2010); Bradlow et al. (2005); Hartmann (2010); Yang et al. (2005); Narayan et al. (2011); Kurt et al. (2011); Chung and Rao (2012); Choi et al. (2010). | * | - | - |
| Ghose and Han (2014); Carare (2012); Garg and Telang (2013); Liu et al. (2012); Ghose et al. (2011); Ghose and Han (2011b); Ghose and Han (2011a); Kim et al. (2008). | - | * | - |
| Van den Bulte and Joshi (2007); Young (2009); Chatterjee and Eliashberg (1990); Putsis et al. (1997); Dekimpe et al. (2000); Neelamegham and Chintagunta (1999); Talukdar et al. (2002); Gatignon et al. (1989); Takada and Jain (1991); Dover et al. (2012). | - | - | * |

1.3.1. Interdependence of consumer preference

Quantitative models of consumer purchase behavior often do not recognize that consumers’ choices may be driven by the underlying social learning processes within the population. Economic models of choice typically assume that an individual’s latent utility is a function of brand and attribute preferences, rather than of the preferences of other customers. However, for pervasive experience goods, a new model that accounts for these underlying forces and preferences may better explain consumers’ choices. Many studies have tried to address this issue, using cross-sectional data to model consumers’ preference dependency (Yang and Allenby 2003), online social network seller interaction data to quantify the network value of consumers (Stephen and Toubia 2010), customer trial data at Netgrocer.com to determine the importance of consumers’ spatial exposure (Bell and Song 2007), and physicians’ prescription choices and their self-reported information to demonstrate the significant effect of network influence on consumers’ choices (Nair et al. 2010).

Other researchers have also reported on the critical role played by social proximity in shaping consumer preferences. Bradlow et al.
(2005) build on the previous literature to suggest that demographic and psychometric proximity measures are important for consumers’ choices. Hartmann (2010) uses customer data to show a correlation between social interactions and the equilibrium outcome of an empirical discrete game. Yang et al. (2005) demonstrate the interdependence of spouses’ TV viewership to suggest the need for considering choice interdependency. Narayan et al. (2011) employ conjoint experience data to highlight the effects of peer influence, and finally Choi et al. (2010) draw on an internet retailer’s data set to establish the importance of imitation effects in geographical and demographical proximity. However, although all of these studies are significant in suggesting the role of social influence on decision making, none has modeled the interdependence of consumers’ mobile app choices.

1.3.2. Mobile app store dynamics

Recently, a stream of literature has emerged that pertains to the dynamics of the mobile app store. Some studies have addressed the competition between the Apple and Google platforms (Ghose and Han 2014), Google Play’s freemium strategy (Liu et al. 2015), and the influence of bestseller rank information on sales in Apple’s app-store (Carare 2012; Garg and Telang 2013). Other studies consider the relation between content generation and consumption (Ghose and Han 2011b), internet usage and mobile internet characteristics (Ghose and Han 2011a), users’ browsing behavior on mobile phones and personal computers (Ghose et al. 2011), and voice and short message price elasticity (Kim et al. 2008). Although these studies represent attempts to teach us more about the nature of the mobile app market, none has extracted the effect of social dynamics on consumers’ choices in the context of the mobile app-store at both the macro and micro levels.

1.3.3.
Global Macro and Micro Diffusion and Social Learning

Two main streams of literature in product diffusion are relevant to this study: the micro diffusion models, and the global diffusion and social learning models. The earliest micro diffusion model considers consumers’ Bayesian learning from signals that follow a Poisson process (Chatterjee and Eliashberg 1990). Later studies emphasize the need for micro-diffusion modeling (Young 2009) and critically review the aggregation and homogeneity of diffusion models (Peres et al. 2010). To remedy these issues, some studies propose micro network topology approaches (Iyengar and Van den Bulte 2011; Dover et al. 2012). Other studies suggest structural modeling of consumers’ dynamic, forward-looking adoption choices (Song and Chintagunta 2003) and systematic conditioning on heterogeneous consumers’ adoption choices (Trusov et al. 2013). Peres et al. (2010) present a review of this literature stream.

In parallel with the micro diffusion literature, a stream of studies provides solutions for heterogeneous social learning processes (Van den Bulte and Joshi 2007), the mixing (interactions) of adoption processes (Putsis et al. 1997), simultaneous diffusion (Dekimpe et al. 1998), supply-side relationships (e.g., production economies) and omitted-variable (e.g., income) correlations (Putsis and Srinivasan 2000), and the effect of macro-environmental variables (Talukdar et al. 2002).

This study builds on this literature by proposing a prescriptive machine learning pipeline that combines the advantages of both macro and micro modeling approaches. The approach that I have proposed recognizes that the app-store platform’s data may be a noisy measure of the variables of interest. I deal with the sparsity of the choices through a combination of aggregation, filtering, hierarchy, and SUR processes.
I suggest data cleaning and modeling approaches that may be suitable for the big data variety, velocity, veracity, and volume of the app-store platform. To estimate the model, I also suggest a genetic optimization meta-heuristic approach, which enables the stochastic surface optimization.

1.4. MODEL

I start the modeling section with the choices of individuals $i = 1, \dots, I$ at the app-store. I am interested in modeling consumers’ mobile app choices. However, to recognize the long-tail distribution of mobile app choices (which creates sparsity), I aggregated the choice data at the app-category level. The customer makes a choice of mobile app category $j = 0, 1, \dots, J$ in a given week $t = 0, 1, \dots, T$, where $j = 0$ denotes the outside good option. The model of consumer app choice differs from prior studies (Carare 2012) in that it models not aggregate purchases but individual-specific choices, through a rich set of mobile app category characteristics.

The model is similar to the nested logit model structure of studies such as Ratchford (1982) and Kok and Xu (2011); yet, to recognize the sparsity of end choices, I aggregated the choices within each nest into the choice of the nest. This model may be useful for mobile app-store owners, because they are concerned with the diffusion of mobile apps within categories rather than the diffusion of each individual mobile app in isolation. I specified the utility of consumers’ choice of app categories on the app store in the following form:

$$u_{ijt} = \alpha_{ij} + \alpha_{i,11}\, s_{it} + \alpha_{i,12}\, \hat{c}^{imm}_{jt} + \alpha_{i,13}\, F_{1jt} + \alpha_{i,14}\, F_{2jt} + \alpha_{i,15}\, F_{3jt} + \varepsilon_{ijt} \qquad (1)$$

where $\alpha_{ij}$ denotes the random coefficient of individual $i$’s preference for mobile app category $j$. $F_{1jt}$, $F_{2jt}$, and $F_{3jt}$ denote factors that control for variation in observable mobile app characteristics/quality, price, and advertising (the structure of the factors is explained later).
$\hat{c}^{imm}_{jt}$ denotes the time-varying social influence measure (the structure of the measure is explained later), and $s_{it}$ denotes the history of consumer $i$’s app downloads until time $t$, which controls for state dependence and app-choice interdependence. In particular, if consumer $i$ downloads an app at $t-1$, then $s_{it} = s_{i,t-1} + 1$; otherwise, if the consumer selects the outside option, $s_{it}$ remains unchanged, i.e., $s_{it} = s_{i,t-1}$. This specification induces a first-order Markov process on the choices. Controlling for state dependence and social influence helps to account for the potential correlation between customers’ choices across categories and across individuals. Table 1.2 presents the definitions of the variables and the parameters.

Assuming the random utility term $\varepsilon_{ijt}$ has a type I extreme value distribution, consumer $i$’s probability of selecting app category $j$ at time $t$ is given by a multinomial logit model, based on the deterministic portion $v_{ijt}$ of the random utility $u_{ijt}$, as follows:

$$p_{ijt} = \frac{\exp(v_{ijt})}{1 + \sum_{k=1}^{J} \exp(v_{ikt})} \qquad (2)$$

where the mean utility of the outside good is set to zero, i.e., $v_{i0t} = 0$.

The vector of mobile app category characteristics includes the average file size of the mobile apps (a proxy for app quality), the frequency of featuring, the average and variance of mobile app prices, the number of paid or free apps and their ratio, and the average tenure (time since creation) of all the mobile apps within the category. These variables can act as measures of (proxies for) competition. I assumed that each of these pieces of data contains some information that may be important for the consumer, but these pieces are highly correlated. Therefore, to gain better insight, I reduced the variation in these variables to three factors that preserve 85% of the variation.
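The choice probabilities in equation (2) can be sketched in a few lines. This is only an illustration of the formula, not the estimation code of this study; the function name and the utility values are hypothetical, and the shift by the largest utility is a standard numerical safeguard rather than part of the model.

```python
import numpy as np

def category_choice_probabilities(v):
    """Multinomial logit probabilities with an outside good whose utility is fixed at 0.

    v holds the deterministic utilities v_ijt of categories 1..J for one consumer and week.
    Returns (probability of the outside good, array of category probabilities).
    """
    v = np.asarray(v, dtype=float)
    # Shift by the largest utility (including the outside good's 0) to avoid overflow.
    shift = max(0.0, float(v.max()))
    exp_v = np.exp(v - shift)
    exp_outside = np.exp(-shift)
    denom = exp_outside + exp_v.sum()
    return exp_outside / denom, exp_v / denom

# Hypothetical deterministic utilities for J = 3 categories in one week.
p_outside, p_categories = category_choice_probabilities([0.2, -1.0, 1.5])
```

Because the outside good is normalized to zero utility, only differences relative to not downloading are identified, which is why the constant 1 appears in the denominator of (2).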
Formally, I used the following factor model process:

$$x_{jt} = b F_{jt} + e_{jt}, \qquad e_{jt} \sim N(0, E) \qquad (3)$$

To model consumer social learning, I used the filtered latent time-varying density of imitators $\hat{c}^{imm}_{jt}$. This approach is similar to the classical practice of modeling consumers’ response to featured and displayed products, in which the modeler includes an aggregate measure in the choice model to measure consumers’ response. Furthermore, the theoretical interpretation of this modeling approach is that as the number of imitators within the population increases, the possibility that an individual observes another individual who has already adopted the mobile app increases. As a result, the consumer may become more or less likely to adopt a mobile app within mobile app category $j$. This theory is similar to the micro diffusion modeling proposed by Chatterjee and Eliashberg (1990), except that the model does not assume that the consumer receives information through a Poisson process, so the process can be a non-homogeneous Poisson process (the inter-arrival time is no longer memoryless). In other words, I endogenize consumers’ information-receiving process in the choice model. The approach serves as an alternative to the micro-modeling approach used by Yang and Allenby (2003) to incorporate the interdependence of consumers’ awareness and preferences, and it is useful when micro spatial structure information is not available. My proposed approach may be relevant to the context of pervasive goods, because these goods are more visible in daily interactions.

There are two approaches to capturing the density of imitators in the model. The first approach is to model it as a latent state variable and recover it from the choice model. Although elegant, this approach may not be the best approach for big data, because it is computationally intractable.
The alternative approach is to use the aggregate diffusion data to filter the number of imitators. This approach combines macro aggregate diffusion modeling with micro choice modeling methods to endogenize the number of imitators, an approach that may be more suitable for big data. In this approach, I can use aggregate diffusion data to filter the number of imitators with a second-degree polynomial model. Then, I can use the filtered data in the choice model to run a nonlinear model on the data set of consumers’ individual choices.

To sum up, I used the whole data set to filter the density of influentials and imitators for mobile app category $j$ within the population at a given time $t$. I cast the social learning diffusion differential equations (Van den Bulte and Joshi 2007) into a discrete state space model. This model is like a double-barrel Bass diffusion model, and it allows for heterogeneity in the adopters by segmenting the observed cumulative number of adopters into the latent numbers of imitators and influentials. In contrast to the classical log-likelihood and non-linear least squares methods, my filtering approach increases estimation robustness to process and measurement noise (Srinivasan 1999; Xie et al. 1997).

$$y_{jt} = \theta_j\, c^{inf}_{jt} + (1-\theta_j)\, c^{imm}_{jt} + v_{jt}, \qquad v_{jt} \sim N(0, V_j)$$

$$\dot{c}^{inf}_{jt} = \left(p^{inf}_j + q^{inf}_j \frac{c^{inf}_{j,t-1}}{M^{inf}_j}\right)\left(M^{inf}_j - c^{inf}_{j,t-1}\right) + e^{inf}_{jt}$$

$$\dot{c}^{imm}_{jt} = \left(p^{imm}_j + q^{imm}_j \left(w_j \frac{c^{inf}_{j,t-1}}{M^{inf}_j} + (1-w_j) \frac{c^{imm}_{j,t-1}}{M^{imm}_j}\right)\right)\left(M^{imm}_j - c^{imm}_{j,t-1}\right) + e^{imm}_{jt}$$

$$(e^{inf}_{jt}, e^{imm}_{jt}) \sim MVN(0, W) \qquad (4)$$

where $y_{jt}$ denotes the observed cumulative number of adopters of mobile apps in mobile app category $j$ at time (day) $t$. $c^{inf}_{jt}$ denotes the latent cumulative number of adopters in the influential segment for app category $j$ at time (day) $t$.
$c^{imm}_{jt}$ denotes the latent cumulative number of adopters in the imitator segment for app category $j$ at time (day) $t$. $\theta_j$ denotes the size of the segment of influential adopters, and it is bounded between zero and one. $p^{inf}_j$ denotes the independent (random) rate of adoption of influential adopters, and $q^{inf}_j$ denotes the dependent (influenced by other influential adopters) rate of adoption of influential adopters. $p^{imm}_j$ denotes the independent (random) rate of adoption of imitator adopters, and $q^{imm}_j$ denotes the dependent (influenced by other adopters) rate of adoption of imitator adopters. $w_j$ denotes the degree of influence of influential adopters on the adoption of imitators. $v_{jt}$ denotes the noise of the observation equation, and $(e^{inf}_{jt}, e^{imm}_{jt})$ denotes the vector of noises of the state equations.

In summary, the first equation is the observation equation and the next two are the state equations of the state space model. The first equation uses a discrete latent model to integrate over the cumulative numbers of influential and imitator adopters. The second equation captures the adoption process of the influential segment, and the third captures the adoption process of the imitator segment. The imitators differ behaviorally from the individuals in the influential segment in that they learn not only from themselves, but also from individuals in the influential segment. This model of the social influence measure is more suitable for the context of mobile apps, as it captures more of a social learning process (Van den Bulte and Joshi 2007) than an information cascade process (Bass 1969). Furthermore, it allows for heterogeneity in the adoption process by segmenting the adopters into influential and imitator segments. Van den Bulte and Joshi (2007) find a closed-form solution for this model. I considered that the data may have been measured with noise.
As a result, to control for this potential measurement error, I used a state space model structure with observation and state noises.

I recognized that there is shared information in the diffusion of the various mobile app categories on the app store. As a result, I modeled the social learning differential equations across mobile app categories jointly and simultaneously. This joint modeling captures shared information at two levels: covariance and prior. To account for simultaneity at the covariance level, I modeled the variances and covariances of the state equations for the latent cumulative numbers of influential and imitator adopters through a seemingly unrelated regression (SUR) model. The SUR model is presented formally in (4): it models the joint distribution of the state equations in a multivariate normal structure, rather than modeling the state equation error terms individually.

To model the diffusions jointly at the prior level, I used a hierarchical model (prior) with a conditionally normal distribution on the fixed app-category specific diffusion parameters, $\Phi_j = (\theta_j, p^{inf}_j, p^{imm}_j, q^{inf}_j, q^{imm}_j, M^{inf}_j, M^{imm}_j, w_j)$. This Bayesian process shrinks the fixed app-category specific parameters toward the popularity of each mobile app category, because more popular mobile apps are expected to have a higher rate of imitator adoption and a larger market size. Formally, I defined the following structure:

$$
\Phi_j = \Delta_o\, Pop_j + o_j, \qquad o_j \sim N(0, \sigma_o^2) \tag{5}
$$

where $\Phi_j$ denotes the vector of non-state (fixed) parameters of the diffusion. $Pop_j$ denotes the popularity of mobile app category j. $\Delta_o$ denotes the hyper-parameter of the app-category specific parameter shrinkage, and $o_j$ denotes the noise of the hierarchical model, that is, the unobserved heterogeneity of the mobile app categories.
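To make the structure of the state space model in (4) concrete, the following is a minimal simulation sketch, not the estimation code used in this essay. All function and argument names are hypothetical. Two simplifications are assumptions of the sketch: the right-hand side of each state equation is treated as a daily flow that is added to the lagged cumulative count, and the state noises are drawn independently per segment rather than from the full multivariate normal with covariance $W$.

```python
import numpy as np


def simulate_two_segment_diffusion(p_inf, q_inf, p_imm, q_imm, w, theta,
                                   m_inf, m_imm, n_days,
                                   state_sd=0.0, obs_sd=0.0, seed=0):
    """Simulate latent influential/imitator adoption and the observed series.

    ASSUMPTION: the adoption terms of (4) are accumulated onto the lagged
    cumulative counts, and state noises are independent across segments.
    """
    rng = np.random.default_rng(seed)
    c_inf = np.zeros(n_days + 1)  # latent cumulative influential adopters
    c_imm = np.zeros(n_days + 1)  # latent cumulative imitator adopters
    y = np.zeros(n_days + 1)      # observed (noisy) series
    for t in range(1, n_days + 1):
        # Influentials learn only from other influentials.
        flow_inf = (p_inf + q_inf * c_inf[t - 1] / m_inf) * (m_inf - c_inf[t - 1])
        # Imitators learn from influentials (weight w) and from each other.
        mix = w * c_inf[t - 1] / m_inf + (1.0 - w) * c_imm[t - 1] / m_imm
        flow_imm = (p_imm + q_imm * mix) * (m_imm - c_imm[t - 1])
        c_inf[t] = c_inf[t - 1] + flow_inf + rng.normal(0.0, state_sd)
        c_imm[t] = c_imm[t - 1] + flow_imm + rng.normal(0.0, state_sd)
        # Observation equation: theta-weighted combination plus noise.
        y[t] = theta * c_inf[t] + (1.0 - theta) * c_imm[t] + rng.normal(0.0, obs_sd)
    return c_inf, c_imm, y
```

With both noise scales set to zero the sketch reduces to the deterministic two-segment diffusion path, which is a convenient sanity check before any filtering step is added.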
I accounted for heterogeneity in the individual choice parameters by modeling them as random effects. To guard against the misspecification that may result from a rigid normal prior, I adopted the flexible semi-parametric approach proposed by Dube et al. (2010). This approach places a mixture of multivariate normal distributions over the parameters' prior, which allows for thick-tailed, skewed, and multimodal distributions. I denote the vector of fixed consumer-level parameters by $A_i = (\alpha_{i1}, \alpha_{i2}, \dots, \alpha_{i15})$. I accommodated consumer heterogeneity by assuming that $A_i$ is drawn, in two stages, from a distribution common across consumers. I employed a mixture of normals as the first-stage prior in order to specify an informative prior that does not overfit. The first stage consists of a mixture of K multivariate normal distributions, and the second stage consists of a prior on the parameters of the mixture of normals density. Formally:

$$
p\left(A_i - \Delta z_i \mid \pi, \{\mu_k, \Sigma_k\}\right) = \sum_{k=1}^{K} \pi_k\, \phi\left(A_i - \Delta z_i \mid \mu_k, \Sigma_k\right), \qquad \pi, \{\mu_k, \Sigma_k\} \mid b \tag{6}
$$

where b denotes the hyper-parameter for the priors on the mixing probabilities and on the parameters governing each mixture component. K denotes the number of mixture components. $\{\mu_k, \Sigma_k\}$ denotes the mean and covariance matrix of the distribution of the individual specific parameter vector for mixture component k. $\pi_k$ denotes the size of the k'th component of the mixture model, and $\phi$ denotes the multivariate normal density function. $z_i$ denotes the information set about customer i, which here includes only the tenure (the number of days since customer i's registration on the app store). $\Delta$ denotes the parameter of correlation between the choice response parameters and the information set about customer i. To obtain a truly non-parametric estimate using the mixture of normals model, the number of mixture components K must increase with the sample size.
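The first-stage prior in (6) can be evaluated numerically with a few lines of code. The sketch below is illustrative only and is not taken from the estimation routine; the function names and the array layout (a matrix `delta` of shape d by number of covariates, and a covariate vector `z_i`) are assumptions.

```python
import numpy as np


def mvn_pdf(x, mean, cov):
    """Multivariate normal density phi(x | mean, cov)."""
    x, mean, cov = np.asarray(x, float), np.asarray(mean, float), np.asarray(cov, float)
    d = mean.size
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)   # (x - mean)' cov^{-1} (x - mean)
    log_det = np.linalg.slogdet(cov)[1]        # log |cov|
    return float(np.exp(-0.5 * (d * np.log(2.0 * np.pi) + log_det + quad)))


def mixture_prior_density(a_i, z_i, delta, weights, means, covs):
    """Evaluate sum_k pi_k * phi(A_i - Delta z_i | mu_k, Sigma_k) as in (6)."""
    centered = np.asarray(a_i, float) - np.asarray(delta, float) @ np.asarray(z_i, float)
    return sum(w * mvn_pdf(centered, m, s) for w, m, s in zip(weights, means, covs))
```

For example, with a single standard normal component in two dimensions and a centered argument of zero, the function returns $1/(2\pi)$.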
I adopted the approach proposed by Rossi (2014), called the non-parametric Bayesian approach. This approach is equivalent to the mixture of normals approach described above when K tends to infinity. In this structure, the parameters of the mixture of normals model have a Dirichlet Process (DP) prior. The Dirichlet process generalizes the Dirichlet distribution to an infinite number of atomic partitions. This process represents the distribution of a random measure (i.e., a probability). The Dirichlet process has two parameters. The first is the base distribution, which here is the Normal-Inverse-Wishart (N-IW) conjugate prior on the parameters of the multivariate normal partitions from which the choice parameters $A_i$ are drawn. The second is the concentration parameter. Formally, the prior for the individual specific choice parameters has the following structure:

$$
\begin{aligned}
\theta_k = (\mu_k, \Sigma_k) &\sim DP\left(G_0(\lambda), \alpha^d\right) \\
G_0(\lambda):\quad \mu_k \mid \Sigma_k &\sim N\left(0, \Sigma_k \times a^{-1}\right), \qquad \Sigma_k \sim IW\left(\nu, \nu \times \upsilon \times I_d\right) \\
\lambda(a, \nu, \upsilon):\quad a &\sim Unif(\underline{a}, \bar{a}), \quad \nu \sim d - 1 + \exp(z), \quad z \sim Unif(d - 1 + \underline{\nu}, \bar{\nu}), \quad \upsilon \sim Unif(\underline{\upsilon}, \bar{\upsilon}) \\
\alpha^d &\sim \left(1 - \frac{\alpha^d - \underline{\alpha}}{\bar{\alpha} - \underline{\alpha}}\right)^{power}
\end{aligned}
\tag{7}
$$

where $G_0(\lambda)$ denotes the base distribution or measure (i.e., the distribution of the hyper-parameters of the prior distribution of the partitions). $\lambda$ denotes the random measure, which represents the probability distribution of $(a, \nu, \upsilon)$. $(a, \nu, \upsilon)$ denotes the hyper-parameters of the prior distribution of the partitions that the choice parameters belong to, which represent the behavior parameters of the latent segments. d denotes the number of choice parameters per customer (in my case d is equal to 15). $\alpha^d$ denotes the concentration parameter (also referred to as the precision, tightness, or innovation parameter). The idea is that the DP is centered over the N-IW base measure $G_0(\lambda)$ with precision parameter $\alpha^d$ (a larger value denotes a tighter distribution).
$(\underline{a}, \bar{a}, \underline{\nu}, \bar{\nu}, \underline{\upsilon}, \bar{\upsilon})$ denotes the vector of hyper-parameters (the bounds of the uniform distributions) for the second-level prior on the hyper-parameters of the prior over the partition distribution of the choice parameters.

Dirichlet Process Mixture (DPM) refers to the distribution over probability measures defined on some sigma-algebra (collection of subsets) of a space $\aleph$, such that the distribution for any finite partition of $\aleph$ is a Dirichlet distribution (Rossi 2014). In my case, the probability measure over the partitions for the mean and variance of the random coefficient individual choice parameters has the Normal-Inverse-Wishart conjugate form. For any subset of customers C of $\aleph$:

$$
E\left[G(C)\right] = G_0(\lambda)(C), \qquad Var\left(G(C)\right) = \frac{G_0(\lambda)(C)\left(1 - G_0(\lambda)(C)\right)}{\alpha^d + 1} \tag{8}
$$

By the De Finetti theorem, integrating (marginalizing) out the random measure G yields the joint distribution for the collection of individual specific means and covariances of the random coefficient choice parameters:

$$
p(\mu_{\cdot}, \Sigma_{\cdot}) = \int p(\mu_{\cdot}, \Sigma_{\cdot} \mid G)\, p(G)\, dG \tag{9}
$$

This joint distribution can be represented as a sequence of conditional distributions that has the exchangeability property:

$$
p\left((\mu_1, \Sigma_1), \dots, (\mu_I, \Sigma_I)\right) = p\left((\mu_1, \Sigma_1)\right)\, p\left((\mu_2, \Sigma_2) \mid (\mu_1, \Sigma_1)\right) \cdots p\left((\mu_I, \Sigma_I) \mid (\mu_1, \Sigma_1), \dots, (\mu_{I-1}, \Sigma_{I-1})\right) \tag{10}
$$

The DP is similar in nature to the Chinese Restaurant Process (CRP) and the Pólya urn. In the CRP, there is a restaurant with an infinite number of tables (analogous to the partitions of the mean and variance of the individual choice random coefficients). A customer entering the restaurant selects a table at random, but he selects an occupied table with probability proportional to the number of customers who have sat at that table so far (in which case the customer behaves similarly to the other customers who are sitting at the selected table).
If the customer selects a new table, he will behave based on a parameter that he randomly selects from the restaurant's customer behavior parameters (so not necessarily identical to the parameters of the other tables).

Table 1.2. Model Variable Definitions

| Variable | Description |
|---|---|
| App Category Daily Download ($y_{jt}$) | Cumulative number of consumers who download an app in app category j up until a given day t |
| App Category Weekly Download Latent ($c^{inf}_{jt}$) | Latent cumulative number of consumers from the influential segment who download an app in app category j up until a given day t. Consumers from the influential segment learn only from each other, and not from imitators. |
| App Category Weekly Download Latent ($c^{imm}_{jt}$) | Latent cumulative number of consumers from the imitator segment who download an app in app category j up until a given day t. Consumers from the imitator segment learn both from each other and from adopters in the influential segment. |
| Segment size ($\theta_j$) | A parameter between zero and one that defines the size of the influential segment |
| Internal Market Force ($p^{inf}_j, p^{imm}_j$) | The random Poisson rate of adoption of individuals in the influential and imitator segments, respectively |
| External Market Force ($q^{inf}_j, q^{imm}_j$) | The endogenized imitation rate of adoption of individuals in the influential and imitator segments, respectively |
| Learning split ($w_j$) | The degree to which the individuals in the imitator segment learn from adopters in the influential segment |
| Market size ($M^{inf}_j, M^{imm}_j$) | The market size of individuals in the influential and imitator segments, respectively |
| Category hierarchy parameters $\{\mu^2_k, \Sigma^2_k\}$ | Parameters of the locally weighted regression of the hierarchical prior of app category diffusion parameters |
| Full covariance matrix of state equation ($W$) | Full covariance matrix of the state equation of the macro diffusion model, which may suggest complementarity or substitution |
| Variance of observation equation ($V_j$) | Variance of the observation equation of the macro diffusion model |
| Category data ($x_{jt}$) | Category j characteristic data at day t, including average file size, total number of apps featured in the category, average price, variance of price, paid app options, free app options, fraction of free to paid apps within the category, average tenure of each app category, and total app options within the category |
| Category Factors ($F_{jt}$) | Reduced factors explaining the variation in category data |
| Factor loading of Category ($b$) | Factor loading of data item j of the category data vector |
| Consumer utility from app category ($u_{ijt}$) | Consumer i's utility from selecting an app in app category j at week t |
| App category preference ($\alpha_{ij}$) | App category specific preference of consumer i |
| Individual download history state ($s^{j}_{it}$) | State of individual i's download history in a given category j until week t |
| $\alpha_{i,11}, \dots, \alpha_{i,15}$ | Utility parameters of consumer i |
| $p_{ijt}$ | Probability of selecting an app in category j at time t |
| $\pi^1, \{\mu^1_k, \Sigma^1_k\}$ | Parameters of the hierarchical mixture of normal components of individual choice parameters |
| $v_{jt}, e_{jt}, e'_{jt}$ | Error terms of the observation equation, state equation, and factor model |

The Pólya urn process has the same structure. In this process, the experimenter starts by drawing balls of different colors from the urn. Each time the experimenter draws a ball of a given color, he returns the drawn ball and adds an additional ball of the same color to the urn. The distribution of the number of customers sitting at each table in the CRP and of the number of balls of each color in the Pólya urn follows the DP.

An alternative is the approach proposed by Dube et al. (2010): fit models with successively larger numbers of components, and gauge the adequacy of the number of components by examining the fitted density associated with the selected number of components. However, the process of model selection is tedious in this case. Table 1.2 presents the definitions of the variables and parameters.

To sum up, I used a combination of a macro diffusion model and a micro choice model that considers the big data nature of the current study: variety, velocity, veracity, and volume. For variety, I used a flexible semi-parametric mixture of normal distributions as the prior on the individual choice model. For velocity, I used a simpler linear state space model on the daily data over the full sample, and I aggregated these data at the weekly level to use them in the non-linear micro individual choice model. For volume, I considered the sparse nature of the data: I aggregated both the macro diffusion and the micro individual choice data for mobile app instances within the mobile app categories, and I used a factor model to summarize the sparse characteristics of the mobile app categories. Finally, for veracity, I cast the social diffusion model into a discrete time state space model to add a layer of robustness to potential misspecification and process errors. Figure 1.1 presents the box and arrow diagram of the proposed model.

Figure 1.1. Box and Arrow Representation of the Model

1.5. DATA

The data set was collected by an African telecom operator on individual choices of downloading mobile apps from the app store platform of its global partner. App stores are a type of two sided platform, as they match consumers with developers/publishers without taking ownership of the mobile apps. The app store I studied was launched around 330 days prior to the current study, in 2013 and 2014.
I used the aggregate download data for a period of around 190 to 259 days as the macro sample, and the data on the download choices of a sample of 1,258 consumers for a period of 124 days as the micro sample. The macro sample therefore includes between 1,900 and 2,590 observations, which might not be considered big, but the micro sample includes approximately 160,000 observations, which might be considered big for non-linear models.

A big data set such as this one creates a trade-off in estimation. On the one hand, I had big data that can give insight within a short planning horizon, provided that I used a linear model. On the other hand, I had computationally intensive methods that can give insights with prescriptive power, provided that the data are not big. I wanted a method that offers the advantages of both big data and a computationally intensive method. As a result, I used a second-degree polynomial macro model of social learning diffusion over the macro sample, and the non-linear, computationally intensive micro choice model over the micro sample.

First, to deal with the sparsity of the data, which is driven by the long tail distribution of mobile app adoptions, and to reduce the daily noise in the data, I aggregated the micro sample at the weekly level before I fed it to the choice model. Second, I aggregated the macro app adoption data and the micro app download choice data at the app category level, both to limit the study to the topic of interest for the app-store platform and to handle the data volume. In addition, I used a flexible Bayesian prior to shrink the individual specific choice parameters.

I investigated two sources of consumer preference interdependence: local and global. For the local interdependence, I filtered the macro sample data to individual adopters who live in the city under the current study.
To do this filtering, I used a data set mapping IP addresses to cities that I collected by crawling the World Wide Web. For the global interdependence, I did not use this filtering, so I used the aggregate information about mobile app adoptions within all thirty cities from all five continents.

Table 1.3. Categories Basic Statistics

| Index | Category | Total downloads within local city |
|---|---|---|
| 1 | Dating | 27 |
| 2 | eBook | 414 |
| 3 | Education & Learning | 24 |
| 4 | Health/Diet/Fitness | 42 |
| 5 | Internet & WAP | 52 |
| 6 | Movie/Trailer | 597 |
| 7 | POI/Guides | 22 |
| 8 | Reference/Dictionaries | 55 |
| 9 | TV/Shows | 135 |
| 10 | Video & TV | 105 |

In a nutshell, the data consist of around 20,000 consumers, with around 3,000 consumers in the local African city under the current study. This local city has around 4,000 app downloads over the duration of the current study. Twenty thousand global and three thousand local consumers who make choices over the course of six months classify the current data as big data in terms of variety, velocity, veracity, and volume. Table 1.3 lists the categories that I selected and their corresponding total downloads within the local city under this study. Each of the 1,258 customers under study adopts only one of the mobile apps during the course of the study, so on all other days she selects the outside option. This observation suggests that a mixed logit choice model might be suitable, but only if the inter-temporal dependencies between the choices are controlled for.

Figure 1.2. Intercontinental (across 30 cities) Diffusion Curves for the mobile apps within the Categories

The data set also included the longitude and latitude of each IP address. However, as mobile phones usually stay with customers who might move within the city, aggregating locations at the city level might be relevant.
Moreover, this assumption is innocuous because of the social nature of mobile phones and mobile apps (i.e., the usage of mobile phones and apps in a social atmosphere). In particular, mobile phones have become an inseparable part of societies, to the point that customers use them not only when they are alone on the bus, when they are about to sleep, or even when they are in class, but also at their parties, in their offices, in their leisure time, and generally at any social event. The use of mobile phones at social events makes mobile apps visible, and this visibility can create social learning opportunities.

In Figure 1.2, I plotted the diffusion curves of the cumulative adoptions of a sample of six mobile app categories. For each mobile app category, I had the average file size, the total number of apps featured, the average and the variance of app prices, and the number of paid and free options. Table 1.4 presents the basic statistics of these variables. To explain the heterogeneity in individual responses, I used the data on the tenure of each customer. I defined tenure as the number of days since each customer subscribed to the app store. As different types of consumers (i.e., influentials and imitators) with different psychological traits adopt the technology at different points in time (Kirton 1976), I used the tenure of consumers as a proxy for the psychological traits that can explain the heterogeneity in consumers' choice responses.

Table 1.4. Mobile app categories basic statistics

| Category Data Summary | Mean | Variance | Min | Max |
|---|---|---|---|---|
| Number of available apps in the category | 35 | 1,250 | 12 | 141 |
| Average tenure of apps in the category (days) | 316 | 6,386 | 169 | 498 |
| Number of available free apps in the category | 32 | 908 | 7 | 120 |
| Average days that an app is featured in the category | 0.12 | 0.05 | 0.00 | 0.71 |
| Average file size of apps in the category (MB) | 2.00 | 4.00 | 0.50 | 8.00 |
| Variance of prices of apps in the category | 0.51 | 1.09 | 0.00 | 3.75 |

To explain heterogeneity in the app store categories, I used the popularity of the mobile app categories on the Apple app store. As the Apple app store is the founder and leader of the app-store platforms, and its consumers are more affluent (and possibly more influential),³ I expect the popularity of the mobile app categories on the Apple app store to explain the diffusion of the mobile app categories on other app store platforms as well. Therefore, I used the mobile app categories' popularity on the Apple app store to explain the heterogeneity in the mobile app category parameters. These popularity statistics are presented in Figure 1.3. This figure shows the long tail distribution of the popularity of the mobile apps.

³ "App Store (iOS)." Wikipedia. Accessed March 23, 2016. http://en.wikipedia.org/wiki/App_Store_(iOS).

Figure 1.3. Popularity (market share) of App Categories on Apple Inc. App Store

Figure 1.4, from Distimo, a mobile app market research company, suggests that some mobile app categories are more likely to be paid and others are more likely to be free. The high share of free mobile apps is an important observation in this figure. The same feature exists in the data sets I used in this study. This feature suggests that the key cost that consumers incur might be the cost of learning about the application, presumably from others (e.g., their friends, or over the internet).
This observation suggests that social influence might be an important factor in the adoption decision, but a formal model is required to confirm this conjecture.

Figure 1.4. Free mobile apps versus paid mobile apps

Anecdotal evidence suggests that, from a mental accounting perspective, consumers perceive paid mobile apps as an investment, for which they are willing to pay money, and free mobile apps as entertainment,⁴ for which the customer might be less inclined to pay. Guided by the same intuition, I also classified the mobile app categories in the sample into two groups: utilitarian and hedonic. The utilitarian group includes device tools, health/diet/fitness, internet/WAP, and reference/dictionary mobile apps. These types of mobile apps might be valued for their utility rather than their entertainment. The hedonic group includes ebook, games, humor/jokes, logic/puzzle, and social network mobile apps. These types of mobile apps might be more relevant for their entertainment features. This categorization allows me to investigate further, based on the customer choice parameters, whether customers really value the utilitarian mobile app categories more than the hedonic ones.

⁴ Chang, Ryan. "How to Price Your App: Free or Paid - Envato Tuts Code Article." Code Envato Tuts. February 19, 2014. Accessed March 23, 2016. http://code.tutsplus.com/articles/how-to-price-your-app-free-or-paid--mobile-22105.

Finally, mobile apps are more similar to durable goods than to non-durables. Therefore, the consumers' choice of downloading a mobile app may be sparse in nature. Sparsity here means that many of the consumers' choices are no-download (outside option) choices. A suitable modeling approach for handling this sparsity might be the hierarchical Bayesian approach, which borrows information from other sample items when the information on an individual is sparse.

1.6. IDENTIFICATION AND ESTIMATION

To identify the choice model, I used a random coefficient logit specification, which has a fixed diagonal scale. To set the location of the utility, I normalized the utility of the outside option to zero. To minimize concerns about endogeneity (omitted variables), I controlled for potential correlations between choices by explicitly modeling the inter-temporal choice interdependence in the choice history state variable. I also controlled for potential confounding effects of price, advertising, and product characteristics by including the latent factors of variation in these variables in the choice model. To control for potential measurement error in the social influence measure, I used a Kalman filter, and I controlled for potential simultaneity in the social measures through the Seemingly Unrelated Regression (SUR) model structure. In addition, through its random coefficient structure, the modeling approach also minimizes the concern about Independence from Irrelevant Alternatives (IIA), as it allows for heterogeneity in the individual specific choice behavior parameters.

I identified the individual level choice parameters using the micro sample panel of individuals, a sample that consists of twenty weeks of micro choices of 1,258 customers. Bayesian shrinkage with a flexible DP prior helps to identify the large set of individual specific parameters without over-fitting. The mixture of normals distribution is subject to the label switching problem (i.e., a permutation of the segment assignments returns the same likelihood). However, I avoided this problem by limiting my inference to the joint distribution rather than the individual segment assignments. To estimate the micro choice model on the micro sample, I used the estimation code for the multinomial logit with a DP prior on the individual specific hyper-parameters (Bayesian semi-parametric) from the bayesm package in R.
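The location normalization described above (outside option utility fixed at zero) and the random coefficient structure can be illustrated with a short sketch. This is not the bayesm code; it is a minimal illustration under assumed, hypothetical names, in which mixed logit probabilities are approximated by averaging conditional logit probabilities over draws of the individual coefficients.

```python
import numpy as np


def logit_probabilities(v):
    """Conditional logit probabilities with an outside option of utility zero.

    v: array of shape (R, J) or (J,) with inside-option utilities.
    Returns an array of shape (R, J + 1); column 0 is the outside option.
    """
    v = np.atleast_2d(np.asarray(v, dtype=float))
    full = np.hstack([np.zeros((v.shape[0], 1)), v])  # prepend outside option
    full -= full.max(axis=1, keepdims=True)           # guard against overflow
    expu = np.exp(full)
    return expu / expu.sum(axis=1, keepdims=True)


def mixed_logit_probabilities(x, coef_draws):
    """Average logit probabilities over R draws of the random coefficients.

    x: (J, d) attributes of the J inside options (hypothetical layout).
    coef_draws: (R, d) draws of an individual's coefficient vector.
    """
    utilities = np.asarray(coef_draws, float) @ np.asarray(x, float).T  # (R, J)
    return logit_probabilities(utilities).mean(axis=0)
```

Because the outside option is pinned at zero, adding a constant to every inside utility changes the probabilities, which is what identifies the location of the utility in this specification.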
The bayesm routine uses a Metropolis-Hastings Random-Walk (MH-RW) method to estimate the conditional choice probabilities on cross-sectional units (i.e., customers). The limitation of MH-RW is that the random walk increments must be tuned to conform as closely as possible to the curvature of the individual specific conditional posterior, formally defined by:

$$
p(A_i \mid y_i, \mu, \Sigma, z_i, \Delta) \propto p(y_i \mid A_i)\, p(A_i \mid \mu, \Sigma, z_i, \Delta) \tag{11}
$$

Without prior information on highly probable values of the first-stage prior (i.e., $p(A_i \mid \cdot)$), tuning the Metropolis chains by trial is difficult given the limited information in each cross-section (i.e., each customer). This problem is exacerbated when a customer has never selected some of the choice items in his history. Therefore, to avoid a singular Hessian, the routine implements the fractional likelihood approach proposed by Rossi et al. (2005). Formally, rather than using the individual specific likelihood, the MH-RW approach forms a fractional combination of the unit-level likelihood and the pooled likelihood as follows:

$$
l_i^*(A_i) = l_i(A_i)^{(1-w)} \left(\prod_{i=1}^{I} l_i(A_i \mid y_i)\right)^{w\beta}, \qquad \beta = \frac{n_i}{N}, \qquad N = \sum_{i=1}^{I} n_i \tag{12}
$$

where w denotes the small tuning parameter that controls the effect of the pooled likelihood $\prod_{i=1}^{I} l_i(A_i \mid y_i)$. $\beta$ denotes a parameter chosen to scale the pooled likelihood to the same order as the unit likelihood. $n_i$ denotes the number of observations for customer i. Using this approach, MH-RW generates samples, conditional on the partition membership indicator for individual i, from the proposal density $N(0, s^2\Omega)$, so that:

$$
\Omega = \left(H_i + V_A^{-1}\right)^{-1}, \qquad H_i = -\left.\frac{\partial^2 \log l_i^*}{\partial A\, \partial A'}\right|_{A = \hat{A}_i} \tag{13}
$$

where $\hat{A}_i$ denotes the maximizer of the modified likelihood $l_i^*(A_i)$, and $V_A$ denotes the normal covariance matrix assigned to the partition (i.e., segment) that customer i belongs to. This approach considers $A_i$ sufficient to model the random coefficient distribution.

To estimate the infinite mixture of normals prior for the choice parameters, a standard data augmentation with the indicator of the normal component is required. Conditional on this indicator, I can identify a normal prior for the parameters of each customer i. The distribution of this indicator is multinomial, which is conjugate to the Dirichlet distribution (here $z_i$ denotes the component indicator of customer i). Formally:

$$
\pi \sim Dirichlet(\alpha^d), \qquad z_i \mid \pi \sim Mult\text{-}Nom(\pi) \tag{14}
$$

As a result, the posterior can be defined by:

$$
z_i \sim Mult\text{-}Nom\left(\frac{\alpha_1}{\sum_j \alpha_j}, \dots, \frac{\alpha_K}{\sum_j \alpha_j}\right), \qquad \pi \mid z_i \sim Dirichlet\left(\alpha_1 + \delta_1(z_i), \dots, \alpha_K + \delta_K(z_i)\right) \tag{15}
$$

where $\delta_j(z_i)$ denotes the indicator of whether or not $z_i = j$. This result is relevant for the DP because any finite subset of the partitions of customers' choice-behavior parameters has a Dirichlet distribution, and a finite sample can only represent a finite number of partitions. The exchangeability property of the partitions allows the estimation approach to draw customer parameters sequentially, given the indicator value, as follows:

$$
(\mu_i, \Sigma_i) \mid (\mu_1, \Sigma_1), \dots, (\mu_{i-1}, \Sigma_{i-1}) \sim \frac{\alpha^d G_0 + \sum_{j=1}^{i-1} \delta_{(\mu_j, \Sigma_j)}}{\alpha^d + i - 1} \tag{16}
$$

The next portion of this approach's specification is the definition of the sizes of the finite clusters over the finite sample, which are controlled by $\pi$. Rossi (2014) suggests Sethuraman's stick-breaking notion for draws of $\pi$.
In this notion, a stick of unit length is iteratively broken from the tail, in proportions drawn from a beta distribution with parameters one and $\alpha^d$, and the length of the broken portion defines the k'th element of the probability measure vector $\pi$ (a form of multiplicative process). Formally:

$$
\pi_k = \beta_k \prod_{i=1}^{k-1} (1 - \beta_i), \qquad \beta_k \sim Beta(1, \alpha^d) \tag{17}
$$

In this notion, $\alpha^d$ determines the probability distribution of the number of unique values for the DP mixture model, formally:

$$
\Pr(I^* = k) = \left\| S_i^{(k)} \right\| (\alpha^d)^k\, \frac{\Gamma(\alpha^d)}{\Gamma(i + \alpha^d)}, \qquad \left\| S_i^{(k)} \right\| = \frac{\Gamma(i)}{\Gamma(k)} \left(\gamma + \ln(i)\right)^{k-1} \tag{18}
$$

where $I^*$ denotes the number of unique values of $(\mu, \Sigma)$ in a sequence of i draws from the DP prior. $S_i^{(k)}$ denotes the Stirling number of the first kind, and $\gamma$ denotes Euler's constant. Furthermore, to facilitate assessment, this approach suggests the following distribution for $\alpha^d$, rather than the Gamma distribution:

$$
\alpha^d \propto \left(1 - \frac{\alpha^d - \underline{\alpha}}{\bar{\alpha} - \underline{\alpha}}\right)^{\phi} \tag{19}
$$

where $\underline{\alpha}$ and $\bar{\alpha}$ can be assessed by inspecting the mode of $I^* \mid \alpha^d$. $\phi$ denotes the tunable power parameter that spreads the prior mass appropriately. An alternative to the Gibbs sampler employed by this approach might be a collapsed Gibbs sampler that integrates out the indicator variable for the partition (segment) membership of each customer, but Rossi (2014) argues that such an approach does not improve the estimation procedure. Appendix 1.C presents the series of conditional distributions that this approach employs in its Gibbs sampling to recover the individual specific choice parameters.

I identified the latent cumulative numbers of influentials and imitators of the mobile app categories from the observed cumulative number of adopters in the complete data set. To avoid overfitting, I also used a normal prior on the fixed parameters of the social learning macro diffusion model to regularize the likelihood of the model.
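As a small numerical illustration of the stick-breaking construction in (17), the sketch below draws a truncated weight vector $\pi$. The truncation at a finite number of components and the function name are assumptions made for illustration; the sampler described in this section does not necessarily truncate in this way.

```python
import numpy as np


def stick_breaking_weights(alpha, n_components, seed=0):
    """Draw the first n_components stick-breaking weights pi_k.

    pi_k = beta_k * prod_{i<k} (1 - beta_i), with beta_k ~ Beta(1, alpha).
    ASSUMPTION: the infinite sequence is truncated at n_components, so the
    weights sum to slightly less than one.
    """
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=n_components)
    # Length of stick remaining before the k-th break (1 for the first break).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining
```

Smaller values of the concentration parameter put most of the mass on the first few components, while larger values spread the mass over many components.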
Although the local and global aggregate data sets each have only about two thousand observations, Bayesian shrinkage of the parameters allows their identification. To estimate the latent cumulative numbers of influentials and imitators, I used the maximum a posteriori (MAP) method, a popular method in machine learning, as an alternative to Markov Chain Monte Carlo (MCMC) sampling methods. This approach uses an optimization method to maximize the posterior of the model parameters. I used a genetic algorithm for the optimization, because the number of parameters that I estimated for the social learning diffusion model is around 300: 210 covariance elements of the state covariance matrix, 10 elements of the observation covariance matrix, and 80 fixed parameters of the diffusion differential equations. The gradient descent optimization method has a complexity of $O(P)$ per iteration but requires tuning the learning parameter, and the quasi-Newton optimization method has a complexity of $O(P^2)$ per iteration, where P is the number of parameters to estimate. This complexity translates into long run times over big data, in which the number of parameters increases with the variety and volume of the data. As a result, I adopted the genetic algorithm approach, which Venkatesan et al. (2004) find comparable to classical gradient descent or the quasi-Newton approach. In addition, the genetic algorithm is known as a global optimization method, in contrast to the local optimization of the quasi-Newton method. Given that a latent state space model, like mixture models, has multiple local maxima, a genetic algorithm might be more likely to find the global maximum than a quasi-Newton method. To estimate the macro social learning diffusion model, I used an Unscented Kalman Filter (UKF) nested within a Monte Carlo Expectation Maximization (MCEM) method.
The Unscented Kalman Filter (UKF) is an approach proposed in the robotics literature (Wan and Merwe 2001, Julier and Uhlmann 1997). It achieves third-order accuracy in estimating the latent state in a state space model, as opposed to the Extended Kalman Filter (EKF), which achieves only first-order accuracy, with the same order of computational complexity, i.e., $O(T)$. The basic idea behind the UKF is that, rather than using the closed-form first-order Taylor expansion term for the measurement update of the latent state (by computing the Jacobian), it uses an Unscented Transformation (UT). The UT propagates the mean of the latent state prior, together with a set of sigma points around that mean, through the nonlinear state equation, and uses the transformed points to estimate the parameters of the transformed normal posterior distribution. I explain the UKF algorithm in Appendix 1.B.

The MCEM approach starts with an initial vector of parameters. Then it uses MCMC, or the UKF in this case, to recover the latent state distribution and a set of samples. Given the latent state samples, it computes the expected log likelihood, and it searches for the parameters that maximize this expected log likelihood (de Valpine 2012). MCEM is appealing for its speed compared with full MCMC sampling, which is notorious for slow convergence. However, both approaches may find only local maxima. The global optimization performed by the genetic algorithm's stochastic search may be a remedy for this problem. In the optimization, I used transformations to ensure that the market sizes of the social learning diffusion model are positive, and that the parameter for the effect of learning from imitators and influentials in the imitator state equation ($w$) and the segment size of influentials and imitators ($\theta$) lie between zero and one. I used the just-in-time compiler in R to speed up the estimation process.
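The unscented transformation at the core of the UKF can be sketched in a few lines. The following is a generic textbook version with symmetric sigma points and a single scaling parameter `kappa`; it is an illustrative assumption and not the specific algorithm documented in Appendix 1.B. All names are hypothetical.

```python
import numpy as np


def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f.

    Uses 2n + 1 symmetric sigma points; returns the approximate mean and
    covariance of f(x). f must map a length-n array to a 1-D array.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = mean.size
    # Columns of `root` are the sigma-point offsets: root @ root.T = (n + kappa) * cov.
    root = np.linalg.cholesky((n + kappa) * cov)
    points = np.vstack([mean, mean + root.T, mean - root.T])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    transformed = np.array([np.asarray(f(p), dtype=float) for p in points])
    new_mean = weights @ transformed
    dev = transformed - new_mean
    new_cov = (weights[:, None] * dev).T @ dev
    return new_mean, new_cov
```

For a linear map the transformation reproduces the exact mean and covariance, which provides a simple check of the sketch before it is applied to a nonlinear state equation such as the diffusion equations of this essay.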
In summary, I used MCEM, UKF, MAP, GA, and SUR methods to estimate the social learning model on the aggregate sample data, which consist of the aggregate number of adoptions by twenty thousand adopters of mobile apps in ten mobile app categories over two hundred days. I used MCMC sampling to estimate the mixture-normal multinomial logit model of the micro mobile app choices of one hundred forty-seven customers in ten app categories over twenty weeks.

1.7. RESULTS

Table 1.5 presents the log-likelihood of the proposed models. Models 1 and 2 are the social learning aggregate diffusion models over local adoption (only adopters within one city) and global adoption (total number of adopters across 30 cities). The local social learning model dominates the global social learning model in terms of likelihood. This result might suggest that the mobile app adoption process is coordinated locally rather than globally. Models 3 and 4 use the filtered number of imitators as a measure of social influence, and Models 5 and 6 use the observed number of adopters as a measure of social influence. The dominance of Models 5 and 6 over Models 3 and 4 in log-likelihood might suggest that not only the number of imitators but also other social factors drive mobile app adoption. These other factors might include the social force of differentiators (in fact, the result of the micro analysis reconfirms the existence of such potential). The dominance of Models 5 and 6 over Model 7 (the model with no social learning) suggests that social learning is indeed an important force that drives individual mobile app adoption choices (the bias in the parameter estimates when social learning is ignored is discussed later).

Table 1.5. MODEL COMPARISON
Model | Description | Number of obs. | Log Lik.
1 | Local Adoption (aggregate sample) | 2,000 | -20,724.16
2 | Global Adoption (aggregate sample) | 2,000 | -21,649.32
3 | Choice Explained by Local Imitators Signals (micro sample) | 22,644 | -25,921.92
4 | Choice Explained by Global Imitators Signals (micro sample) | 22,644 | -38,310.49
5 | Choice Explained by Local Adopters Signals (micro sample)* | 22,644 | -12,252.85
6 | Choice Explained by Global Adopters Signals (micro sample) | 22,644 | -15,275.20
7 | Choice with No Social Influence Measure (micro sample) | 22,644 | -15,977.04
* dominant model

Finally, the dominance of Model 5 over Model 6 reconfirms the result from the aggregate model that social learning at the local level (within the city), rather than at the global level (for example, over the World Wide Web), drives the adoption choices of the customers. This finding for mobile apps (as a form of pervasive good) contrasts with findings about the adoption of traditional goods that emphasize the importance of learning over the World Wide Web (Putsis et al. 1998).

Figure 1.5. 1-Step-ahead Forecast for Local Diffusion (Green line: a step ahead; Red line: the actual)

Figure 1.6. 1-Step-ahead Forecast for Global Diffusion level (Green line: a step ahead; Red line: the actual)

Table 1.6. Performance of the Proposed Model for local and international category adoption
Description | MAD | MSE
Local Category Adoption | 0.64 | 1.48
International Category Adoption | 0.03 | 0.12

Figures 1.5 and 1.6 present the one-step-ahead forecast versus the observed cumulative number of adopters at both the local and the global level. This visualization, together with the Mean Absolute Deviation (MAD) and the Mean Squared Error (MSE) presented in Table 1.6, suggests that the social learning macro diffusion model fits the app-store platform's macro app diffusion data reasonably well. I benchmarked the estimates against Van den Bulte and Joshi (2007), and relative to the market size the MSE is in a reasonably good range.

Table 1.7.
Factor Loading Matrix (Varimax rotation)
Loadings/Components | C1 | C2 | C3
Average File Size (a proxy for app quality) | 0.77 | -0.07 | -0.09
Dummy variable of Is Featured | 0.82 | 0.30 | 0.01
Average Price | -0.06 | 0.94 | -0.28
Variance of Price | -0.05 | 0.94 | -0.19
Number of Paid app Options | 0.97 | -0.09 | -0.09
Number of Free app Options | 0.96 | -0.15 | -0.03
Fraction of Free apps to Paid Apps | -0.09 | -0.25 | 0.87
Average Tenure (time from creation) | -0.08 | 0.67 | 0.48
Total number of app Options | 0.96 | -0.14 | -0.03

Table 1.7 presents the result of the factor analysis used to extract the latent factors of mobile app characteristics. I used Varimax rotation to make the factors interpretable. I named the factors from both the supply side and the demand side in Table 1.8.

Table 1.8. Factor Names
Factor | Supply Side Name | Demand Side Name
C1 | Red Ocean app categories | Popular Apps
C2 | Paid app categories | Investment Apps
C3 | Free app categories | Freemiums

I limited the number of factors (principal components) to three, as these already capture 85% of the variation in the data. The numbers of paid and free mobile apps load highly on the first factor, so I expect that high demand for these mobile apps has led app developers and publishers to develop many of them. Further, Ghose and Hann (2014) use the average file size of a mobile app as a proxy for its quality, and this category feature also loads highly on the first factor, so I call the first factor popular mobile apps. The average and the variance of prices load highly on the second factor. I called the second factor investment mobile apps, guided by the discussion of Figure 1.4 (in the data section), and I refer to the third factor as Freemiums, because in these categories the fraction of free mobile apps is much higher than that of paid mobile apps.

Table 1.9.
PARAMETER ESTIMATES: Global Adoption
Mobile App Category | $p^{inf}$ | $q^{inf}$ | $p^{imm}$ | $q^{imm}$ | $M^{inf}$ | $M^{imm}$ | $w$ | $\theta$
Device Tools | 0.024 | 0.000 | 0.278* | 0.192* | 50* | 580* | 0.010* | 0.039*
eBooks | 0.024 | 0.000 | 0.274* | 0.189* | 260* | 4540* | 0.007* | 0.044*
Games | 0.026 | 0.000 | 0.293* | 0.202* | 80* | 1150* | 0.009* | 0.046*
Health/Diet/Fitness | 0.025 | 0.000 | 0.288* | 0.199* | 100* | 1600* | 0.008* | 0.048*
Humor/Jokes | 0.026 | 0.000 | 0.297* | 0.205* | 100* | 1410* | 0.008* | 0.043*
Internet/WAP | 0.026 | 0.000 | 0.296* | 0.204* | 100* | 1580* | 0.010* | 0.039*
Logic/Puzzle/Trivia | 0.024 | 0.000 | 0.275* | 0.190* | 90* | 1440* | 0.009* | 0.039*
Reference/Dictionaries | 0.026 | 0.000 | 0.296* | 0.204* | 90* | 1440* | 0.007* | 0.048*
Social Networks | 0.026 | 0.000 | 0.297* | 0.205* | 40* | 390* | 0.008* | 0.044*
University | 0.025 | 0.000 | 0.281* | 0.193* | 130* | 2080* | 0.009* | 0.046*
* p ≤ 0.05

Tables 1.9 and 1.10 present the parameter estimates of the social learning diffusion models estimated over global adoption (across the cities covered by the app store) and local adoption (within the city of interest). Over both local and global diffusion data, the independent (random) adoption rate for individuals in the influential segment is not statistically significant across mobile app categories, except for eBooks (which might be driven by its assortment size). However, this rate is significantly higher than the dependent adoption rate for this segment, which might suggest that the model properly identifies the behavior of the influential segment. For the influential segment, the rate of independent adoption is very close to the corresponding rate for the Everclear music CD that Van den Bulte and Joshi (2007) find. However, for this segment, the dependent rate of adoption is similar to the corresponding rate for foreign language CD adoption in that study. This result might be driven by the low search cost of the influential segment on the app-store, which in turn makes its members learn less from others.

Table 1.10.
PARAMETER ESTIMATES: Local Adoption
Mobile App Category | $p^{inf}$ | $q^{inf}$ | $p^{imm}$ | $q^{imm}$ | $M^{inf}$ | $M^{imm}$ | $w$ | $\theta$
Device Tools | 0.025 | 0.000 | 0.282* | 0.194* | 5* | 80* | 0.100* | 0.046*
eBooks | 0.024* | 0.000 | 0.278* | 0.191* | 103* | 1952* | 0.032* | 0.044*
Games | 0.024 | 0.000 | 0.275* | 0.189* | 3* | 56* | 0.246* | 0.038*
Health/Diet/Fitness | 0.025 | 0.000 | 0.282* | 0.194* | 7* | 120* | 0.782* | 0.038*
Humor/Jokes | 0.026 | 0.000 | 0.299* | 0.206* | 6* | 99* | 0.506* | 0.041*
Internet/WAP | 0.025 | 0.000 | 0.285* | 0.197* | 11* | 200* | 0.738* | 0.041*
Logic/Puzzle/Trivia | 0.025 | 0.000 | 0.282* | 0.194* | 6* | 113* | 0.344* | 0.043*
Reference/Dictionaries | 0.026 | 0.000 | 0.299* | 0.206* | 12* | 225* | 0.940* | 0.042*
Social Networks | 0.025 | 0.000 | 0.281* | 0.193* | 3* | 48* | 0.658* | 0.040*
University | 0.025 | 0.000 | 0.281* | 0.194* | 6* | 113* | 0.555* | 0.042*
* p ≤ 0.05

For the imitator segment, in almost all categories the rate of independent adoption (mean of 0.288) is greater than the rate of dependent adoption (mean of 0.198). For this segment, the independent rate of adoption is significantly higher than the corresponding rate for the classical-economy goods that Van den Bulte and Joshi (2007) report. This difference may be driven by the low search cost of mobile apps for imitators. The dependent rate of adoption is similar to the corresponding rate for the Everclear music CD that Van den Bulte and Joshi (2007) report. For global adoption, the weight of influentials in driving imitators' dependent adoption choices is 0.009 on average across the mobile app categories, which is similar to the same parameter for the Everclear music CD. However, this weight is 0.50 for local adoption, which is similar to the corresponding value for the John Hiatt music CD (Van den Bulte and Joshi 2007). The size of the influential segment in the observed sample is 0.044 for the global adopter data and 0.042 for the local adopter data, which is very similar to the corresponding value for the Everclear music CD (Van den Bulte and Joshi 2007).
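The summary figures quoted above can be checked against the category-level estimates. The short sketch below transcribes the relevant columns of Tables 1.9 and 1.10 and averages them over the ten categories. That the quoted figures are simple unweighted category means is an assumption, although the numbers appear to support it; the variable names are introduced here for illustration only.

```python
from statistics import mean

# Columns transcribed from Table 1.9 (global adoption), in table row order.
p_imm_global = [0.278, 0.274, 0.293, 0.288, 0.297, 0.296, 0.275, 0.296, 0.297, 0.281]
q_imm_global = [0.192, 0.189, 0.202, 0.199, 0.205, 0.204, 0.190, 0.204, 0.205, 0.193]
w_global = [0.010, 0.007, 0.009, 0.008, 0.008, 0.010, 0.009, 0.007, 0.008, 0.009]
theta_global = [0.039, 0.044, 0.046, 0.048, 0.043, 0.039, 0.039, 0.048, 0.044, 0.046]

# Columns transcribed from Table 1.10 (local adoption), in table row order.
w_local = [0.100, 0.032, 0.246, 0.782, 0.506, 0.738, 0.344, 0.940, 0.658, 0.555]
theta_local = [0.046, 0.044, 0.038, 0.038, 0.041, 0.041, 0.043, 0.042, 0.040, 0.042]

# Unweighted means across the ten app categories.
summary = {
    "imitator independent rate (global)": mean(p_imm_global),
    "imitator dependent rate (global)": mean(q_imm_global),
    "weight of influentials w (global)": mean(w_global),
    "weight of influentials w (local)": mean(w_local),
    "influential segment size (global)": mean(theta_global),
    "influential segment size (local)": mean(theta_local),
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under this reading, the imitator rates of 0.288 and 0.198 correspond to the global-adoption estimates, and the contrast between the global weight (about 0.009) and the local weight (about 0.5) is a difference of roughly two orders of magnitude.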
To sum up, these results might suggest that customers' adoption behavior for mobile apps is very similar to that for music CDs, except that the independent rate of adoption for imitators is higher and the dependent rate of adoption for influentials is lower, driven by the lower search cost. Table 1.11 summarizes the distribution of the individual parameters for the choice model that uses the local number of adopters (unfiltered density) as a proxy for social influence. The negative mean of the preference parameter for each mobile app category indicates that customers have a higher preference for the outside option. In the city under study, customers prefer mobile apps in the Health/Diet/Fitness, Games, Internet/WAP, and Device Tools categories relatively more than mobile apps in the Social Networks, eBooks, and Humor/Jokes categories. Relative to the Apple App Store popularity statistics (presented in Figure 1.3 in the data section), the surprising result is customers' high preference for Health/Diet/Fitness mobile apps. This information can help this app-store target its marketing communication by highlighting this mobile app category. The mean of the distribution of the download history state parameter is negative and significant. This negative effect of history suggests that this app-store is not doing well in retaining its customers, perhaps because of its appearance and its nonoptimal shopping shelf. However, the effect of social influence is positive and significant, which suggests that there is a positive spill-over of adoption within the population (possibly because of an awareness effect).

Table 1.11. PARAMETER ESTIMATES: Individual Choice effect (Local Adopters)
Parameter | Symbol | Estimate | Std. Dev. | 2.5th | 97.5th
Category specific preference:
Device Tools | $\alpha_1$ | -6.22* | 5.04 | -14.327 | -2.669
eBooks | $\alpha_2$ | -11.34* | 3.14 | -15.290 | -6.381
Games | $\alpha_3$ | -4.35* | 3.76 | -11.222 | -2.296
Health/Diet/Fitness | $\alpha_4$ | -4.1 | 2.18 | -5.939 | 2.982
Humor/Jokes | $\alpha_5$ | -16.32* | 5.85 | -22.097 | -9.715
Internet/WAP | $\alpha_6$ | -5.41* | 2.29 | -8.021 | -3.021
Logic/Puzzle/Trivia | $\alpha_7$ | -14.2* | 3.49 | -18.122 | -8.332
Reference/Dictionaries | $\alpha_8$ | -8.48* | 1.92 | -11.092 | -4.547
Social Networks | $\alpha_9$ | -10.54 | 3.47 | -15.530 | 0.076
University | $\alpha_{10}$ | -5.78* | 1.39 | -7.791 | -2.916
States:
Individual download history State | $\alpha_{11}$ | -27.27* | 5.46 | -34.350 | -13.821
Latent imitation level | $\alpha_{12}$ | 0.02* | 0.01 | 0.011 | 0.035
App category characteristics (factors):
Popularity of app category | $\alpha_{13}$ | 1.32 | 0.63 | -0.830 | 1.767
Investment apps category | $\alpha_{14}$ | 5.34 | 1.75 | -0.922 | 7.230
Hedonic apps category | $\alpha_{15}$ | 7.13 | 4.28 | -6.606 | 10.330
* p<0.05

Appendix 1.D presents the same result table for the choice model with local imitators, the models with global imitators/adopters, and the model with no social influence. The model with no social influence underestimates the preference for mobile apps in almost all categories, except for eBooks, Humor/Jokes, Reference/Dictionaries, and University. In addition, this model underestimates the effect of the popularity, investment, and free characteristics of the mobile apps. In summary, a model that does not account for social influence returns biased estimates of the parameters.

Table 1.12. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (Local Adopters): Customer Tenure (number of days since registration on the app-store) explanation of the effects
Parameter explained by Tenure | Symbol | Estimate | Std. Dev. | 2.5th | 97.5th
Category specific preference:
Device Tools | $\alpha_1$ | -0.00044* | 1.01E-04 | -0.00058 | -0.00023
eBooks | $\alpha_2$ | -0.00048* | 2.63E-04 | -0.00087 | -0.00006
Games | $\alpha_3$ | -0.00041* | 4.46E-05 | -0.00049 | -0.00032
Health/Diet/Fitness | $\alpha_4$ | -0.0008* | 7.30E-05 | -0.00092 | -0.00061
Humor/Jokes | $\alpha_5$ | -0.00091* | 2.49E-04 | -0.00126 | -0.00046
Internet/WAP | $\alpha_6$ | 0.00011 | 7.58E-05 | -0.00002 | 0.00025
Logic/Puzzle/Trivia | $\alpha_7$ | -0.00056* | 1.29E-04 | -0.00081 | -0.00035
Reference/Dictionaries | $\alpha_8$ | -0.00028 | 1.50E-04 | -0.00046 | 0.00002
Social Networks | $\alpha_9$ | -0.00001 | 9.45E-05 | -0.00016 | 0.00020
University | $\alpha_{10}$ | 0.00018 | 1.36E-04 | -0.00007 | 0.00034
States:
Individual download history State | $\alpha_{11}$ | -0.00136* | 3.33E-04 | -0.00193 | -0.00081
Latent imitation level | $\alpha_{12}$ | 0.00006* | 7.64E-06 | 0.00004 | 0.00007
App category characteristics (factors):
Popularity of app category | $\alpha_{13}$ | 0.00001 | 2.06E-05 | -0.00003 | 0.00005
Investment apps category | $\alpha_{14}$ | -0.00006 | 6.77E-05 | -0.00016 | 0.00007
Hedonic apps category | $\alpha_{15}$ | 0.00021 | 1.35E-04 | -0.00003 | 0.00043
* p<0.05

Table 1.12 presents the correlation between customer tenure (number of days since registration on the app-store) and the choice parameters of customers. Those who registered early on the app-store (potentially with an innovator personality) have a higher preference for mobile apps in the Internet/WAP and University categories. This correlation might be relevant, as mobile innovators might be more interested in improving their performance-oriented apps. In addition, these customers are more sensitive to download history and social influence. This result is aligned with the chasm in product life cycle theories, which argue that if a product does not win the acceptance of early adopters it will fall into the chasm, leading to early failure.

Figure 1.7. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (Local Adopters)
[Panels: Device Tools, eBooks, Logic/Puzzle/Trivia, Reference/Dictionaries, Games, Health/Diet/Fitness, Social Networks, University, Humor/Jokes, Internet/WAP, Individual State, Imitators density, Popular apps, Investment apps, Freemium apps]

Figure 1.7 presents the distribution of the choice parameters. This distribution has a heavy tail, which highlights the importance of allowing for a flexible heterogeneity distribution for the choice parameters.

Table 1.13. PARAMETER ESTIMATES: Individual Choice effect (Local Adopters)
Total number of users: 1258
Parameter | Symbol | Positive Significant | Negative Significant
Category specific preference:
Device Tools | $\alpha_1$ | 0 | 1253
eBooks | $\alpha_2$ | 0 | 1258
Games | $\alpha_3$ | 0 | 1258
Health/Diet/Fitness | $\alpha_4$ | 53 | 1205
Humor/Jokes | $\alpha_5$ | 0 | 1258
Internet/WAP | $\alpha_6$ | 0 | 1258
Logic/Puzzle/Trivia | $\alpha_7$ | 0 | 1258
Reference/Dictionaries | $\alpha_8$ | 0 | 1258
Social Networks | $\alpha_9$ | 53 | 1205
University | $\alpha_{10}$ | 0 | 1258
States:
Individual download history State | $\alpha_{11}$ | 0 | 1258
Latent imitation level | $\alpha_{12}$ | 1257 | 0
App category characteristics (factors):
Popularity of app category | $\alpha_{13}$ | 1205 | 53
Investment apps category | $\alpha_{14}$ | 1205 | 53
Hedonic apps category | $\alpha_{15}$ | 1205 | 53

Targeting is a relevant application of micro choice modeling for app-stores. Table 1.13 presents the distribution of the significance and sign of each choice parameter at the individual customer level. Knowing the distribution of negative and positive responses helps the app-store to target 53 customers that do not prefer Health/Diet/Fitness or Social Networks mobile apps. This correct targeting might help improve the usability of the app store.

1.8. COUNTERFACTUAL ANALYSIS

The advantage of the individual-specific choice model for app-store platforms is that it allows estimating, by simulation, the implications of a social influence policy for total expected adoption. I ran three counterfactual scenarios using the estimated choice model by modifying the level of social influence.
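The mechanics of such a simulation can be sketched with a multinomial logit that has an outside option. All numbers below are hypothetical and chosen only for illustration: there is a single homogeneous customer-period, three categories, and made-up values for `alpha`, `beta_social`, and `social`. The estimated model instead has individual-specific parameters, ten categories, and additional state and factor terms.

```python
import math


def expected_adoption(alpha, beta_social, social, scale=1.0):
    """Inside-option choice probabilities for one customer-period.

    scale multiplies the social influence signal: 0.0 shuts it down,
    1.01 and 0.99 raise or lower it by one percent.
    """
    utils = [a + beta_social * scale * s for a, s in zip(alpha, social)]
    denom = 1.0 + sum(math.exp(u) for u in utils)  # the 1.0 is the outside option
    return [math.exp(u) / denom for u in utils]


# Hypothetical values for three categories (NOT the estimates in Table 1.11).
alpha = [-4.0, -5.0, -6.0]      # category-specific preferences
beta_social = 0.02              # sensitivity to the social influence signal
social = [50.0, 20.0, 5.0]      # social influence signal per category

scenarios = {"baseline": 1.0, "shut down": 0.0, "1% more": 1.01, "1% less": 0.99}
totals = {name: sum(expected_adoption(alpha, beta_social, social, s))
          for name, s in scenarios.items()}

for name, total in totals.items():
    change = 100.0 * (total / totals["baseline"] - 1.0)
    print(f"{name}: expected adoption {total:.4f} ({change:+.1f}% vs. baseline)")
```

With one homogeneous customer and a positive social coefficient, total expected adoption in this toy can only rise with the social signal. Category-level changes of either sign require the customer heterogeneity, substitution across categories, and dynamics of the full model, so the sketch shows only how a scenario is evaluated, not what the results should be.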
Furthermore, I use the estimated model to find the optimal dynamic level of social influence that maximizes diffusion over the app-store platform. Formally, I solve the following optimization problem:

$$\max_{\{c^{imm}_{jt}\}} \; \sum_{t=1}^{T} \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{\exp(u_{ijt})}{1 + \sum_{k=1}^{J} \exp(u_{ikt})} \quad \text{s.t.} \quad c^{imm}_{jt} \le \bar{c}^{imm}_{jt}, \;\; \forall j, t \qquad (20)$$

Table 1.14 presents the implications of each of these four policies. Surprisingly, shutting down social influence improves the total expected adoptions of mobile apps on this app-store platform. This further confirms that this platform does not have enough quality to retain its customers. However, an optimal social influence policy yields a 13.6% increase in the platform's total expected adoptions. This optimal policy decreases the adoption of mobile apps in the Reference/Dictionaries category, but increases the expected adoption of mobile apps in the Logic/Puzzle/Trivia, Device Tools, and Games categories the most. A common characteristic of these three categories is their popularity, so I tried to explain the improvements under the optimal policy with the popularity of mobile apps in each category on Apple's App Store.

Table 1.14.
COUNTERFACTUAL ANALYSIS: Change in the adoption level by intervening social influence
Category specific counterfactual results | Original expected adoption | Shut down social influence | 1% more social influence | 1% less social influence | An optimal social influence
Device Tools | 875.83 | -57% | 0.8% | -0.7% | 55.8%
eBooks | 189.45 | -1% | 0.0% | 0.0% | 2.5%
Games | 187.51 | 19% | -0.3% | 0.3% | 58.6%
Health/Diet/Fitness | 22.21 | 0% | 0.0% | 0.0% | 6.1%
Humor/Jokes | 255.09 | 0% | 0.0% | 0.0% | 0.6%
Internet/WAP | 1042.20 | 23% | -0.4% | 0.5% | 14.1%
Logic/Puzzle/Trivia | 249.12 | 25% | -0.6% | -0.2% | 109.1%
Reference/Dictionaries | 1262.09 | 16% | -0.4% | 0.3% | -36.2%
Social Networks | 21.66 | 0% | 0.0% | 0.0% | 0.3%
University | 18.08 | -1% | 0.0% | 0.0% | 1.5%
Total improvement | 4123.25 | 1% | -0.1% | 0.1% | 13.6%

Table 1.15 presents the result of regressing the improvement under the optimal social influence policy on the popularity rank of the mobile app category. The correlation between mobile app category popularity and the improvement under the optimal policy is positive and significant. This result suggests that more popular mobile app categories improve more under the optimal policy. It also indicates that this app-store can improve its adoption by 13.6% if it can use social influence to increase the adoption of more popular mobile app categories.

Table 1.15. COUNTERFACTUAL ANALYSIS: Explain optimal social influence improvement with popularity rank of the app category on the app-store
Variable | Coefficients | Standard Error | p-value | t Stat
Intercept | -0.185 | 0.147 | 0.245 | -1.254
Category popularity rank | 0.050* | 0.015 | 0.010 | 3.388
* p<0.01

Finally, Figure 1.8 presents the social influence level for this optimal policy. This policy suggests an early increase in social influence, potentially through a viral marketing campaign.

Figure 1.8. COUNTERFACTUAL ANALYSIS: an optimal social influence strategy to increase expected adoption level by 14% (log scale)

1.9.
CONCLUSION

In this paper, I developed an approach that combines a macro diffusion model with a micro choice model to allow app-stores to target their customers, and I proposed a Dirichlet Process to model customers' heterogeneity and an Unscented Kalman Filter to estimate a social influence measure. Then, using a large data set from an African app-store, I showed that social influence is an important factor in determining customers' adoption choices. My results demonstrate that ignoring social influence in modeling customers' adoption can bias the estimates of the choice parameters. Furthermore, my results indicate that social influence on mobile app adoption choices is effective locally (within the city of the study) rather than globally (over the internet). I benchmarked the mobile app adoption process against the same process for classical-economy goods, and I find that the mobile app adoption process is similar to that for music CDs. I further illustrated how the estimated model can be used to analyze a counterfactual scenario in which the app-store platform optimizes its social influence intervention. This counterfactual analysis showed that, if this app store runs a viral marketing campaign focusing on more popular mobile app categories, it can increase its total adoption by 13.6%. I believe that my modeling approach, proposed estimation method, and derived empirical insights in this paper can be of interest to both practitioners and scholars in academia.

CHAPTER 2

DO BIDDERS ANTICIPATE REGRET DURING AUCTIONS? AN EMPIRICAL STUDY OF AN AFRICAN APP-STORE

Meisam Hejazi Nia
Naveen Jindal School of Management, Department of Marketing, SM32
The University of Texas at Dallas
800 W. Campbell Road
Richardson, TX, 75080-3021

2.1. ABSTRACT

I developed a structural model that accounts for bidders' learning and their anticipation of winner and loser regrets in an auction platform.
Winner and loser regrets are defined as regretting paying too much in case of winning an auction and regretting not bidding high enough in case of losing it, respectively. Using a large data set from eBay and an empirical Bayesian estimation method, I quantify the bidders' anticipation of regret in various product categories, and investigate the role of experience in explaining the bidders' regret and learning behaviors. I also show how the results can be used to increase eBay's revenue significantly. The counterfactual analyses show that shutting down bidder regret via appropriate notification policies can increase eBay's revenue by 24%.

Keywords: winner and loser regret in auctions, affiliated value auction, emotionally rational bidders, Bayesian updating structural model

2.2. INTRODUCTION

It is not uncommon to regret one's own bidding decision at the end of an online auction. Whether it is regret for giving up too easily in a bidding war or regret for losing self-control and bidding too high, bidders often feel discomfort about their final bids. eBay forums are filled with questions like "I won an auction but regret: What can I do?", and, in fact, eBay tries to educate its users to bid without regrets⁵. Winning an auction on eBay is a contract to complete the sale, and not honoring this contract has serious consequences, including being banned from any transactions on its website.

⁵ Bertolucci, Jeff. "Big Data Analytics: Descriptive Vs. Predictive Vs. Prescriptive - InformationWeek." Information Week. December 31, 2013. Accessed March 23, 2016. http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279.
Therefore, the desire to avoid these consequences, along with bad experiences one may have had with the product, the seller, or one's own bidding behavior, or even common sense, leads to an anticipation of end-of-auction regret during the bidding period. I consider two types of regret that are studied in the auction literature. Bidders might feel winner regret when they win an auction but feel they paid too much, since winning depends on their being the most optimistic among the auction participants about the market value of the auction item and/or the honesty of the seller (Bajari and Hortacsu 2003b). On the other hand, a bidder might feel loser regret when she loses an auction in which the winning bid turns out to be less than her valuation of the item. Clearly, the latter type of regret arises when bidders bid naively (or strategically) instead of bidding their true valuations of the items. Intuitively, anticipation of winner (resp. loser) regret should make bidders lower (resp. increase) their bids. Many studies in the auction literature show that the anticipation of both types of regret significantly affects the bidding behavior of auction participants in various settings. Experimental studies such as Filiz Ozbay and Ozbay (2007) and Engelbrecht-Wiggans and Katok (2008) study these effects in the first-price sealed-bid auction setting and show that they have significant implications for bidding behavior. Although, in theory, the second-price nature of eBay auctions implies that bidders should not experience winner regret (see, for example, Ariely and Simonson 2003), Bajari and Hortacsu (2003a) and Yin (2006) investigate the "winner's curse"⁶ in eBay auctions and suggest that eBay bidders anticipate it too, and hence act strategically⁷. I also focus on eBay markets in this paper.
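The direction of these two effects can be illustrated with a deliberately simple toy problem. It is not the structural model of this chapter and not eBay's auction format. The sketch assumes a first-price setting in which the highest rival bid is uniform on [0, 1], winner regret is the amount by which the winning bid exceeds the rival bid, and loser regret is the surplus forgone when the rival's winning bid is below the bidder's valuation `v`. The weights `alpha` and `beta` and all function names are hypothetical.

```python
def utility(b, v, alpha, beta):
    """Expected utility of bidding b (0 <= b <= v <= 1) in the toy setting."""
    win_prob = b                        # P(rival < b) for a Uniform(0, 1) rival bid
    winner_regret = 0.5 * b ** 2        # E[b - rival; rival < b]
    loser_regret = 0.5 * (v - b) ** 2   # E[v - rival; b < rival < v]
    return (v - b) * win_prob - alpha * winner_regret - beta * loser_regret


def optimal_bid(v, alpha, beta):
    """Bid solving the first-order condition v - 2b - alpha*b + beta*(v - b) = 0."""
    return v * (1.0 + beta) / (2.0 + alpha + beta)


def implied_valuation(b, alpha, beta):
    """Invert the same first-order condition: the valuation that rationalizes b."""
    return b * (2.0 + alpha + beta) / (1.0 + beta)


def grid_argmax(v, alpha, beta, steps=1000):
    """Brute-force check of the closed form on an even grid over [0, v]."""
    grid = [v * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda b: utility(b, v, alpha, beta))


v = 0.8
print("no regret:     ", optimal_bid(v, 0.0, 0.0))
print("winner regret: ", optimal_bid(v, 0.5, 0.0))
print("loser regret:  ", optimal_bid(v, 0.0, 0.5))
print("grid check:    ", grid_argmax(v, 0.5, 0.5), "vs", optimal_bid(v, 0.5, 0.5))
```

In this toy, a positive winner-regret weight lowers the optimal bid and a positive loser-regret weight raises it, matching the intuition above. The inversion in `implied_valuation` shows, in the same simplified setting, how an observed bid together with the regret weights pins down a latent valuation.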
Auction platforms, such as eBay, act as a two-sided market by connecting sellers and bidders without taking ownership of the auction item. Auctions on these platforms involve a vast number of different sellers, bidders, and products in different categories, and hence they exhibit a high level of heterogeneity in behavior. Although many descriptive and predictive tools are studied in the auction literature to deal with this heterogeneity and large data sets (e.g., Park and Bradlow 2005; Bradlow and Park 2007; Zeithammer and Adams 2010), prescriptive analyses remain limited⁸. However, counterfactual analyses relying on structural models that control for consumers' decision processes can make remarkable contributions. One such contribution is the ability to investigate the effects of notification policies (such as notifying bidders about similar auctions in the past) on platform revenue. To work towards filling this gap, in this paper, I developed a structural model to explain bidder behavior in an online auction platform. Considering all the requirements for a viable explanation of auction platforms, I asked the following questions: Can I design a computationally tractable system to estimate bidders' bidding behaviors in an online auction platform? To what extent do bidders anticipate winner and loser regret, and how do these vary with bidders' experience and learning behavior? What is the effect on the auction platform's revenue of intervening in bidders' regret through notification policies?

⁶ I use the term "winner regret" to refer to the explained phenomenon, but "winner's curse" is also used in the literature.
⁷ Zeithammer and Adams (2010) suggest that the sealed-bid second-price auction is not a good abstraction for eBay auctions. Bidders' naivety can also result in this inconsistency. I comment more on this issue in the Results section.
⁸ Bertolucci, Jeff. "Big Data Analytics: Descriptive Vs. Predictive Vs. Prescriptive - InformationWeek." Information Week. December 31, 2013. Accessed March 23, 2016. http://www.informationweek.com/big-data/big-data-analytics/big-data-analytics-descriptive-vs-predictive-vs-prescriptive/d/d-id/1113279.

To answer these questions, I developed my model considering many important aspects of bidders' behaviors. In particular, I account for the emotionally laden context of auctions where, in addition to anticipating regret, bidders do not know the item's market value and learn about the value of the product during the bidding process, for example, by gaining additional information about the auction item or resolving some of the uncertainties about the seller or about their own needs (Hossain 2008, Zeithammer and Adams 2010, Okenfels and Roth 2002). I also consider the fact that bidders tend to bid incrementally in online auctions (Chakarvarti et al. 2002; Zeithammer and Adams 2010), and have different levels of experience, which affect their bidding behaviors (Ariely et al. 2005; Wilcox 2000; Srinivasan and Wang 2010). I further take into account both the common and private value components of the auction item (i.e., affiliated value), and bidders' learning from the current highest bid during the bidding process. Due to the emotionally laden context of eBay auctions, I assume that bidders might lose their global focus, as Ariely and Simonson (2003) suggest, so they show inertia and generally do not search across auctions (Haruvy and Leszcyc 2010).

I used the following estimation strategy to identify the parameters of my model: First, at each discrete time period during the bidding process, I modeled the utility of a bidder as consisting of her expected profit, i.e., the difference between her valuation and bid, and anticipated winner and loser regrets.
Second, assuming that the observed bid at each period is the one that maximizes the bidder's utility for a given valuation level, I derived, using the first-order condition for the utility function, the bidder's revealed latent valuation for the auction item at that time period. Finally, I assumed that this derived latent valuation (which is now a function of the bidder's regret, among others) consists of a common value, a private value, and a component reflecting the bidder's learning of the value of the auction item from the highest observed bid. This approach allows me to identify the regret parameters. More specifically, since the auction item has a common value component, the number of bidders also matters in this process. To account for bounded rationality and incomplete information in the model, I considered that bidders perceive the observed bid and the observed number of bidders as noisy measures of the latent bid and the latent number of bidders. I modeled the bidders' Bayesian learning of the latent bid and the latent number of bidders using Kalman Filter theory, which Jap and Naik (2008) introduced to the auction literature. In this structure, I assumed that bidders' beliefs about the latent bid and the latent number of bidders follow a first-order Markov process. Similarly, I assumed that the common value element of the valuation follows a first-order Markov process as well, with a drift and a common time-varying signal. To account for heterogeneity and to avoid over-fitting the data with a large number of parameters, I clustered the bidders and utilized the eBay-specified auction clusters, and shrunk the bidder-specific (regret and valuation) and auction-specific (evolution of bids and the number of bidders) parameters within these bidder and auction clusters.
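The Bayesian learning step described above, in which an observed bid is treated as a noisy measurement of a latent bid that evolves as a first-order Markov process, can be sketched with a scalar Kalman filter. This is a textbook one-dimensional filter under assumed values, not the estimated specification: the transition coefficient `a`, the `drift`, the state noise `q`, the observation noise `r`, and the bid sequence are all hypothetical.

```python
def kalman_step(mean, var, obs, a=1.0, drift=0.0, q=0.1, r=1.0):
    """One predict/update cycle for the belief about a scalar latent state."""
    # Predict: the latent state follows a first-order Markov process with drift.
    prior_mean = a * mean + drift
    prior_var = a * a * var + q
    # Update: weigh the noisy observation by the Kalman gain.
    gain = prior_var / (prior_var + r)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var, gain


def filter_series(observations, mean0, var0, **params):
    """Run the filter over a sequence and return the path of (mean, variance)."""
    mean, var = mean0, var0
    path = []
    for obs in observations:
        mean, var, _ = kalman_step(mean, var, obs, **params)
        path.append((mean, var))
    return path


# Hypothetical sequence of observed highest bids in one auction.
observed_bids = [10.0, 12.5, 12.0, 15.0, 15.5]
for belief_mean, belief_var in filter_series(observed_bids, mean0=10.0, var0=4.0,
                                             drift=1.0, q=0.5, r=2.0):
    print(f"belief about latent bid: mean={belief_mean:.2f}, var={belief_var:.2f}")
```

The gain lies between zero and one, so each updated belief is a weighted average of the prior prediction and the new observation. A bidder who regards observed bids as very noisy (large `r`) updates little, which is one way to read "learning less from others" in this framework.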
I used a mixture normal distribution model to cluster the bidders usin g their observed ch aracteristics, which are used as proxies for b idders’ experience l evel. F in ally, I optimized the Maximum a Posteriori (MAP) of the model given the seg ment and cluster m embership of each of the bid ders and auction items over a large eB ay data set. To optimize MA P, I used simulated annea ling method, which is a 58 metaheuristic global optimization method used both in rob otics and portfolio optimization in finance (Crama and Schyns 2003; Zhuang et al 19 94) 9 . I estimated this model over an eB ay data set that I crawled and scraped from the web in May 2014. This samp le consists of arou nd 58,000 bi ds of around 12,000 bidders in around 1,600 auctions that offered items for sale in 19 differen t categories presented in Table 1. The estimation results show that, in all auction categories, both w inner and loser regrets are si g nificant and I find a positive relationship between winner and loser regret. I also find that t hose who are more regretful stick to st atus q uo, i.e., the y update their valuations less frequently a nd learn less from others. Furthermore, I find that ex perience can ex plain the heterogeneity in the bidders’ l earning, updating, and regretting behavior. I further us ed the esti mated model to anal yze a counterfactual scenario where the auction platform shuts down the bidders’ winner re gret. This anal y sis shows that, i f an auction platform can shut down winner regret of bidders b y its notification policies, i t can increase it s revenue by 24%. I also observed that shutting down winner r egret can cause the highest bid to increase two to four folds in some auctions. Using notification policies to affect the bidder s’ behaviors is not uncommon in eBa y. For instance, my p ersonal interview wit h an eBay scho lar sug gested that eBay is conc erned about bidders’ loser regret that might le ad t o a potential churn effect. 
Therefore, eBay uses notifications to inform bidders who might lose the auction if they do not change their bids. Empirical evidence of significant (winner and/or loser) regret among bidders might motivate the use of similar notification policies in online auction platforms (see footnote 10). Such policies are studied in the experimental literature as well, and are shown to be effective in influencing bidders’ regret levels (see, for example, Engelbrecht-Wiggans and Katok 2008).

Footnote 9. I used this optimization approach because quasi-Newton (Broyden–Fletcher–Goldfarb–Shanno, BFGS) optimization and Bayesian sampling methods are computationally intractable over large data sets.

I believe the contributions of my paper can be of interest to both practitioners and scholars. My contributions are threefold. First, I consider bidders’ anticipation of winner and loser regret in the affiliated value setting, and propose a tractable empirical Bayesian method to estimate a structural model of bidder demand in an online auction platform. This model allows auction platforms to run counterfactual scenarios. In this way, I contribute to the line of descriptive and predictive auction models for auction customer relationship management (Bradlow and Park 2007; Park and Bradlow 2005; Jap and Naik 2008; Zeithammer and Adams 2010). Second, I model the learning and affiliated value of bidders, and, by allowing for incremental valuation revelation in the proposed model, I allow for incremental naïve bidding behavior. The importance of these features is emphasized by Ockenfels and Roth (2002), Hossain (2008), and Zeithammer and Adams (2010). This aspect of my model contributes to the stream of papers that model common and private value auctions structurally (e.g., Laffont et al. 1995; Bajari and Hortacsu 2003; Haile et al. 2003; Haile and Tamer 2003).
Unlike these papers, I consider auctions as emotionally laden social contexts. To the best of my knowledge, I am the first to model bidders’ Bayesian learning and affiliated value updating processes to account for bidders’ updating of their uncertain valuations.

Footnote 10. Notifications that provide information about similar auction items, such as the highest bids, paid amounts, and number of bidders in those auctions, can help to influence bidders’ winner regret. Similarly, more granular information about sellers might help with trust issues, which again can affect the winner regret level.

Third, I contribute to the auction regret literature by proposing a method that identifies regret parameters structurally using field data. The importance of bidders’ anticipation of winner and loser regret is emphasized in the literature, for example, by Ariely and Simonson (2003), Filiz-Ozbay and Ozbay (2007), and Engelbrecht-Wiggans and Katok (2008). The latter two studies use experiments to show that notification policy can affect bidders’ feelings of regret and potential over- and underbidding. I used real company data in this paper, and my structural modeling approach allows auction platforms to quantify the impact of new policies, target different bidders, and customize their operations conditional on bidder behavior.

The rest of the paper is organized as follows. In Section 2, I review the relevant literature. Section 3 provides a detailed description of the data. Section 4 describes my structural model and how the empirical Bayesian method can be used to estimate the parameters of the model using eBay data. I interpret the estimation results in Section 5, and in Section 6 I explain how I use the estimated parameters to test a counterfactual scenario in which the auction platform shuts down bidder regret. Next, I test the robustness of some of my assumptions and methods in Section 7.
Finally, Section 8 presents my concluding remarks and a discussion of future research directions.

2.3. LITERATURE REVIEW

My paper resides at the intersection of four streams of literature: (1) customer relationship management using auction big data; (2) bounded rationality, trembling hand, learning, and the affiliated value of bidders; (3) emotionally rational or regretful bidders; and (4) theoretical, experimental, and empirical studies of auctions. I explore each of them in the following sections.

2.3.1. Customer Relationship Management of Auction Platforms

Numerous studies investigate the behavior of bidders on auction platforms to estimate demand or to extract information that can be used in customer relationship management and targeting. For example, Park and Bradlow (2005) propose a stochastic model to identify the bidders, the conditions in which they bid, and the amounts of their bids, which are useful for customer relationship management. Bradlow and Park (2007) extend this research by proposing a record-breaking stochastic approach to recover the latent number of bidders in the context of the first-price auction. Both studies acknowledge that the empirical literature has demonstrated flaws in the theoretical predictions of auction models, but they argue that these flaws can be corrected by a model that accounts for the behavioral aspects of bidders’ decision making. More recently, another predictive study by Jap and Naik (2008) uses Kalman filter theory to develop a “Bid Analyzer” that allows one to estimate the distribution of auction participants’ latent bids. All of these papers call for new studies that model the structural aspects of bidders’ decisions for policy experimentation.
To model bidder behavior structurally, Bajari and Hortacsu (2003) make the simplifying assumption that the eBay auction with proxy bidding approximates a second-price auction. Their study employs a data set from 1998 coin auctions on eBay to estimate a reduced-form common value model (for tractability), but it calls for studies that model affiliated value (i.e., the existence of both common and private value elements). Although their proposed model relies on the assumption that bidders are fully rational, they try to recover the winner’s curse by measuring how much a bidder shades her bid when a new bidder enters the auction. Zeithammer and Adams (2010) carry out a series of statistical tests that cast doubt on the assumptions that the proxy bidding mechanism is equivalent to the second-price sealed-bid auction and that bidders’ bids equal their valuations. They recommend employing a reduced-form modeling approach. In discussing Zeithammer and Adams’s (2010) study, Hortacsu and Nielsen (2010) and Srinivasan and Wang (2010) note that, although some of its tests are questionable, its main hypothesis is strongly supported. Both commentaries call for a structural model based on Zeithammer and Adams’s findings, particularly those indicating that both naïve and sophisticated bidders might exist on eBay and that bidders’ experience plays an important role in their bidding behavior.

Yao and Mela (2010) treat the auction platform as a two-sided market and jointly model the choices of bidders and sellers structurally to extract customer lifetime value and the impact of the commission policy on the auction platform’s revenue. They consider only one auction category and model bidders’ disutility in the form of a historical cost function, rather than in the form of winner and loser regret.
In parallel with their paper, Haruvy and Leszczyc (2010) also model the disutility of bidding, in the form of bidders’ inertia within an auction. They attribute this inertia to search cost. All of the above studies suggest that future policy experiments are possible only with a structural model of bidder learning in which heterogeneity is explained by experience measures. They also agree that such a study should model common and private values jointly, in the form of an affiliated value model. Building on these studies, I model bidders’ anticipation of winner and loser regret structurally. Understanding such consumer behavior can benefit an auction platform by allowing it to target its policies toward helping naïve consumers learn, if such learning is predicted to improve the platform’s revenue.

2.3.2. Bounded Rationality, Learning, and Affiliated Value of Bidders

Many papers argue that consumers are boundedly rational and that their actions are subject to flaws (see, for example, Simon 1972; Selten 1975; Kahneman and Tversky 1979; Tversky and Kahneman 1992; Camerer and Weber 1992; Hey and Orme 1994; Camerer and Ho 1994; Kahneman 2003). Various theories explain why consumers behave in a boundedly rational way, from decision-making and psychology perspectives (Simon 1972; Ellison 2006; Salant 2011; Kaufman 1999). In the auction context, bounded rationality is referred to as naïve bidding. Naïve bidders are known to bid in an ad hoc manner or by matching their bids to others’ bids. In particular, Ely and Hossain (2008) define the naïve bidder as one who acts as if the amount she pays conditional on winning equals her bid in eBay auctions. Similarly, Ockenfels and Roth (2002) define a naïve and inexperienced bidder as one who mistakenly treats the eBay auction as an English first-price auction in which the winner pays the maximum bid.
Furthermore, Kagel et al. (1987) posit that the dominant strategy equilibrium does not organize second-price auction outcomes, as bids consistently exceed private values. Other studies posit that experience and learning can reduce bidders’ bounded rationality, fostering more rational behavior. For example, Wilcox (2000) finds that experience leads to behavior that is more consistent with auction theory, although the proportion of experienced bidders who behave in a manner inconsistent with the theory is quite large. Ariely et al. (2005) find that experience reduces but does not eliminate considerable incremental bidding. In this respect, Ockenfels and Roth (2002) examine the multiple-bid phenomenon to consider how bidders get information from others’ bids and then revise their willingness to pay in an auction with independent values.

To describe bidders’ learning, Hossain (2008) suggests that bidders do not always know their exact private valuation for a good and so learn spontaneously from the posted price. He concludes that bidders obtain information about their own and others’ preferences as they participate in the auction. These experimental studies are particularly relevant to my research in that they emphasize the role of naïve bidders, learning, and experience. Building on these studies, my paper integrates these processes in a structural model to help the auction platform manage and target its bidders. Moreover, in contrast to these studies, my paper accounts for behavioral regularities that stem from bidders’ anticipation of winner and loser regret, another form of bounded rationality.

2.3.3. Emotionally Rational or Regretful Bidders

Several studies explain the bounded rationality of auction bidders by referring to bidders’ uncertainty about the value of the commodity, which suggests that bidders might anticipate winner and loser regret in their decisions.
In particular, Holt and Sherman’s (1994) theoretical study describes the acceptance of a bid as an informative event because it signals an overestimation of the unknown value. They mention that winning or losing might result in regret, so the bidder might anticipate winner or loser regret in her decisions. In testing regret theory in a first-price sealed-bid auction setting, Filiz-Ozbay and Ozbay (2007) find that the anticipation of winner and loser regret can be modified by a notification policy. Engelbrecht-Wiggans and Katok (2008) find a similar phenomenon, and they conclude that the policy of revealing losing bids may decrease the auction holder’s revenue. Although experiments are helpful for making causal inferences, an auction site might need a structural model to run counterfactual analyses and to target its bidders.

The regret construct has been the subject of many studies in the consumer behavior, psychology, decision science, behavioral economics, and marketing literatures. A stream of literature in psychology and consumer behavior defines regret as a negative psychological response that occurs when an individual believes that a present situation would have been better if only she had decided differently (Peluso 2011; Gilovich and Medvec 1995; Van Dijk and Zeelenberg 2005; Simonson 1992; Zeelenberg et al. 2000; Inman and Zeelenberg 2002; Roes 1994). This regret can affect consumers’ decision making through counterfactual thinking (Roes 1994). In particular, the consumer might consider the possible negative outcome of a previous choice in her future decisions and so might regulate her behavior to decide differently ex ante, by being regret averse (Peluso 2011; Zeelenberg and Pieters 2007; Boles and Messik 1995; Tsiros and Mittal 2000). Many studies in the psychology literature classify the different types of regret into action and inaction regret categories.
The first category refers to consumers’ feelings of sorrow for what they have done, and the second refers to consumers’ feelings of sorrow for what they have not done. The former is analogous to winner regret and the latter to loser regret as discussed in the auction literature (Filiz-Ozbay and Ozbay 2007; Engelbrecht-Wiggans and Katok 2008). Furthermore, action regret has a short-term effect and evokes intense feelings, whereas inaction regret has a long-term effect and evokes wistful feelings (Gilovich et al. 1998; Gilovich and Medvec 1995; Keinan and Kivetz 2008).

Bell (1982) argues that, after making a decision under uncertainty, the decision maker may discover the relevant outcomes by learning that another alternative would have been preferable. This learning creates a sense of loss or regret that, if incorporated explicitly into the expected utility framework, better predicts individuals’ decisions. According to Loomes and Sugden (1986), the violation of conventional expected utility suggests that important influential choice factors are overlooked, perhaps because of the misspecification of the conventional theories. They propose an alternative approach, formulating a theory of expected modified utility to account for the individual’s capacity to anticipate feelings of regret and rejoicing. Such a theory rests on two fundamental assumptions: first, many people experience the sensations called regret and rejoicing; and, second, they try to anticipate and take into account those sensations in making decisions under uncertainty. Guided by this study’s observations and suggestions, many theoretical studies have incorporated regret to explain how the optimal pricing strategy might change in a setting that incorporates regret (Popescu and Wu 2007; Nasiry and Popescu 2011; Heidhues and Koszegi 2008; Su and Zhang 2009; Diecidue et al. 2012; Nasiry and Popescu 2012; Ozer and Zheng 2012).
All of the above-mentioned studies are useful in expanding the domain of knowledge about consumers’ regret and the effect of this phenomenon on consumers’ decisions. However, none of them has modeled both the rational and emotional aspects of bidders’ decision making in the context of online auctions, where bidders’ values are affiliated. In this context, bidders also learn the value of the commodities by observing others’ bids.

2.3.4. Theoretical, Experimental, and Empirical Auction Studies

I classify the papers in this section into three categories based on their modeling assumption about bidders’ valuations: independent private value, common value, and affiliated value models. The independent private value model assumes that each bidder has a different private value known only to him (Laffont et al. 1995; Guerrere et al. 2000; Haile and Tamer 2003). With respect to the common value assumption, Haile et al. (2003) propose a non-parametric test for first-price sealed-bid auctions based on the fact that the winner’s curse might exist in such auctions. Bajari and Hortacsu (2003) also assume that eBay’s auction can be approximated by a second-price sealed-bid auction in order to estimate a structural common value model and recover the winner’s curse. However, they acknowledge that a better option might be to assume that the eBay auction is an affiliated value auction. Finally, affiliated value is a form of valuation that is drawn from a joint distribution of valuations, consisting of both private and common value components (Li et al. 2002; Campo et al. 2003).

Although these studies expand the domain of knowledge about the implications of various assumptions, they do not consider bidders’ emotional responses and bounded rationality. Chakravarti et al. (2002) call for future studies of this issue by emphasizing that the learning process might alter the valuations of bidders through an “information cascade”.
They suggest that such learning and value affiliation might induce strategic emulation of preceding bidders without considering private signals. In this paper, I incorporate learning, value affiliation, and emotion in a structural model.

Structural models can consider either consumers’ learning, in the form of an adaptive Bayesian learning model, or consumers’ expectations, in the form of a forward-looking approach. Zeithammer (2006) argues that buyers can benefit from forward-looking strategies if they take into account the information provided by the announcements of upcoming auctions. He implicitly states that a forward-looking model for bidders in online auctions is intractable, so in developing such a model he uses several simplifying assumptions. Given all these simplifying assumptions, it is not clear that a forward-looking approach has much more merit than the Bayesian adaptive-learning approach. Further, it is not clear how the emotionally laden environment of an auction might foster forward-looking behavior among bidders.

Smith (1989) notes that auction contexts are often emotion-laden and suggests that the outcomes reflect communal legitimization of both price and allocation given uncertainty about value, preferences, and fairness. Chakravarti et al. (2002) suggest that the individual and social nature of the value determination processes is a fertile area for future research. Furthermore, whether bidders experience regret when bidding aggressively and winning may depend on their cognitive skills for counterfactual reasoning and their facility with motivational processes (e.g., dissonance and attribution) for managing the emotions of victory and defeat, according to Tsiros and Mittal (2000). The Filiz-Ozbay and Ozbay (2007) and Engelbrecht-Wiggans and Katok (2007) studies focus on experimentally attributing underbidding and overbidding to regret theory; Astor et al.
(2011) find that these studies’ theoretical predictions for the effect of regret hold, by employing an approach that combines an auction experiment with psychological measures that indicate emotional involvement. Furthermore, Greenleaf (2004) shows that auction sellers also anticipate regret and rejoicing when they set the reserve price, which is the lowest auction price that the seller will accept. Engelbrecht-Wiggans (1989) proposes a utility theory that depends not only on profit but also on regret over the outcomes (e.g., money left on the table). Further, Engelbrecht-Wiggans and Katok (2007) point out that in independent private value first-price sealed-bid auctions, bidders bid above the risk-neutral Nash equilibrium, which can be explained only by regret theory.

Overall, building on the aforementioned studies, I develop a general model that nests learning, experience, value affiliation, and bidders’ anticipation of winner and loser regret in a structural form that allows an auction platform to target its customers and run counterfactual policies. Without a model that controls for all these mechanisms, the revenue implications of bidders’ regret for the auction platform may not be clear.

2.4. DATA

I acquired the data set by crawling and scraping a sample of auctions from the eBay website during May 2014. It consists of 58,285 bids by 12,247 bidders on 1,647 auction items within various auction categories. eBay’s revenue is based on a complex system of fees for services, including listing product features ($0.10 to $2) and a Final Value Fee for each sale (10% of the total amount of the sale, i.e., the price of the item plus shipping charges), and it exceeded $17.90 billion in 2014. Millions of collectibles, décor items, appliances, computers, furnishings, equipment, domain names, vehicles, and other miscellaneous items are sold on eBay daily.
Generally, sellers can auction anything on the site as long as it is not illegal and does not violate eBay’s prohibited and restricted items policy.

eBay uses a bidding mechanism called proxy bidding. This mechanism asks the bidder to submit the maximum amount she is willing to pay for the item, which is called a proxy bid. Then, eBay’s software bidding agent (called the proxy engine) bids incrementally on the bidder’s behalf up to this maximum value, which remains hidden from other bidders until someone outbids it. As new proxy bids enter, the proxy engine sets the current winning bid to the second-highest bidder’s maximum value plus the minimum increment specified by eBay. The current winning bid is displayed on the auction board throughout the auction. At the end of the auction, the bidder with the highest proxy bid wins the item and pays a price equal to the second-highest bidder’s maximum bid plus the increment. This process makes eBay auctions a hybrid of the English and second-price sealed-bid auctions. Table 2.1 presents a possible path for the proxy and observed bids in an auction where the starting price is $25 and the minimum increment is $1.

Table 2.1. A sample bid sequence on an eBay auction with a $25 reservation bid and a $1 minimum increment

Bid number | Max. bid (unobservable) | Bid on the auction board (observable)
1 | $50 | $25
2 | $40 | $41
3 | $70 | $51
4 | $65 | $66

On eBay’s website, I observe both the amount that each bidder enters as her proxy bid and the bids automatically generated by eBay’s proxy engine. I filtered out the automatic bids so that I could work with the actual bids of the bidders. Note that, in eBay’s system, if someone places a bid between the displayed (automatic) bid and the highest proxy bid, this action will not reveal the highest proxy bid; the highest proxy bid is revealed only after someone outbids it.
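The proxy bidding rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not eBay's implementation: the function name `displayed_bids` is hypothetical, and capping the displayed price at the highest proxy bid is an assumption added for completeness (it is never binding in the Table 2.1 example).

```python
def displayed_bids(max_bids, start_price, increment):
    """Return the price shown on the auction board after each proxy bid.

    max_bids: proxy (maximum) bids in order of arrival.
    Assumption: the board shows the second-highest proxy bid plus the
    increment, never exceeding the highest proxy bid.
    """
    shown = []
    seen = []
    for bid in max_bids:
        seen.append(bid)
        if len(seen) == 1:
            # With a single bidder the board stays at the starting price.
            shown.append(start_price)
        else:
            top, second = sorted(seen, reverse=True)[:2]
            shown.append(min(top, second + increment))
    return shown


# Proxy bids from Table 2.1: $50, $40, $70, $65 with a $25 start and $1 increment.
board = displayed_bids([50, 40, 70, 65], start_price=25, increment=1)
print(board)  # [25, 41, 51, 66]
```

The output reproduces the observable column of Table 2.1, and it shows why the displayed bids rise monotonically even though the underlying proxy bids (50, 40, 70, 65) do not.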
Because the highest proxy bid stays hidden until it is outbid, the displayed bids always increase over time, but the proxy bids may not be in increasing order (see the example in Table 2.1). I sorted the bids before using them in the model estimation to overcome this issue. Figure 2.1 presents the evolution of observed bids across six sample items.

For this study, I randomly selected 19 auction categories. The selected categories include both luxury and widely available goods, so the sample allows me to test whether regret levels differ across these two groups. For instance, a bidder might regret losing a luxury item auction more than losing a necessity item auction. Table 2.2 shows the categories that I use in this study along with the number of auctions in each category. I classified the first nine categories as luxury good categories, and the next ten categories as widely available goods.

Table 2.2. Auction categories in the eBay data

Auction category | Number of auction items
Jewelry and Watches | 149
Collectibles | 103
Crafts | 78
Pottery and Glass | 74
Antiques | 68
Art | 70
Entertainment Memorabilia | 88
Tickets and Experiences | 91
Stamps | 72
Toys and Hobbies | 93
Books | 84
Clothing, Shoes and Accessories | 84
Gift Cards and Coupons | 85
Music | 86
Consumer Electronics | 83
DVDs and Movies | 87
Dolls and Bears | 84
Health and Beauty | 74
Video Games and Consoles | 93

Table 2.3 presents a sample of auction items. For each auction item, I know its title, category, number of bidders, number of bids, and the duration of the auction. I call this auction-specific information. In my data set, the average numbers of bidders and bids per auction are 9.52 and 49.19, with standard deviations of 4.58 and 19.40, respectively. The average duration of auctions is 4.74 days, with a standard deviation of 1.67 days.

Figure 2.1. Evolution of bids in six sample auctions

Table 2.3.
Sample auctions in the eBay data

Auction item title | Auction category | Winning bid | Number of bids | Number of bidders | Ended
Vintage Original Co-op porcelain sign | Collectibles | $1,000.00 | 92 | 12 | May 18, 2014, 2:15PM
$3/1 Pantene Product Coupons Shampoo Conditioner Styler | Gift Cards & Coupons | $17.50 | 30 | 5 | May 19, 2014, 6:30PM
Genesis Breyer P-Orridge "Naked Eye" Autographed Camera w/ Original Negatives | Entertainment Memorabilia | $900.00 | 75 | 9 | May 22, 2014, 2:00AM

Figure 2.2 presents the evolution of the number of bidders for a sample of six auctions in different auction categories. An interesting observation in these auctions is a spike in the rate of entrance in the last minutes. This behavior is known as sniping in the auction literature.

Figure 2.2. Evolution of the number of participating bidders in six sample auctions

eBay employs feedback as a reputation mechanism for its members to decrease their uncertainty about bidder and seller characteristics. While buyers can leave sellers negative, neutral, or positive feedback, sellers can leave buyers positive feedback or choose not to leave feedback. Over time, eBay members develop a feedback profile, or reputation, based on these ratings. This information appears next to the member’s name and on the member’s profile.

In my data set, each bidder attends only one of the auctions. I observe each bidder’s feedback score, number of bids on the item in question, total number of bids within the last 30 days, total number of items and categories bid on within the last 30 days, and bidding activity with the current seller. This bidder-specific information captures different types of proxies for the experience of the bidders, and, hence, I utilize it in my analyses. For example, while bidding in one auction category might show a high level of concentration, bidding in three categories might show a high level of differentiation.
Table 2.4 presents the summary statistics of average bidder characteristics within each auction category. It shows significant heterogeneity among bidders in different auction categories. Another important observation from Table 2.4 is that, on average, bidders in each auction category have bid at least three times for the same item, which suggests multiple (incremental) bidding behavior. This behavior, which is also reported in the auction literature for similar settings (see, for example, Zeithammer and Adams 2010), suggests that bidders might not enter their valuations as proxy bids, as second-price sealed-bid auction theory suggests.

Table 2.4. Summary statistics of the average bidder characteristics within each of 19 auction categories

Characteristic | Mean | SD | Min | Max
Size | 644.53 | 238.21 | 453 | 1550
Avg. feedback score | 714.53 | 260.39 | 342 | 1301
SD feedback score | 2763.37 | 1897.84 | 745 | 8033
Avg. number of bids on this item | 4.84 | 0.74 | 3 | 7
SD number of bids on this item | 8.05 | 1.93 | 4 | 13
Avg. total number of bids in 30 days | 195.16 | 100.57 | 56 | 504
SD total number of bids in 30 days | 493.74 | 260.94 | 110 | 1065
Avg. number of items bid on in 30 days | 93.63 | 57.41 | 30 | 264
SD number of items bid on in 30 days | 251.42 | 197.83 | 53 | 1001
Avg. bidding activity with current seller | 28.63 | 8.51 | 17 | 50
SD bidding activity with current seller | 31.11 | 4.05 | 24 | 40
Avg. number of categories bid on | 2.05 | 0.39 | 1 | 3
SD number of categories bid on | 1 | 0 | 1 | 1

2.5. MODEL DEVELOPMENT

I developed an agent-based structural model to predict the revenue implications of possible auction platform policies, such as the notification policy, through counterfactual analyses. I model bidders’ actions in a Bayesian adaptive-learning structure. The term adaptive learning refers to bidders’ updating of their beliefs about the value of the item, the distribution of the bids, and the number of bidders, conditional on observing noisy signals (Jap and Naik 2008).
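To make the idea of adaptive learning from noisy signals concrete, the following is a minimal sketch of a scalar Kalman filter that updates a belief about a latent bid from observed bids. It is an illustration under stated assumptions, not the estimation code of this paper: the function names, the linear first-order Markov state equation with drift, and all numeric values (prior, noise variances, and the sample bids borrowed from Table 2.1) are hypothetical choices made for the example.

```python
def kalman_step(mean, var, obs, tau, gamma, sys_var, obs_var):
    """One predict/update step of a scalar Kalman filter.

    Assumed state equation:       theta_t = gamma + tau * theta_{t-1} + omega_t
    Assumed observation equation: b_t     = theta_t + eps_t
    """
    # Predict the latent bid from the previous belief.
    pred_mean = gamma + tau * mean
    pred_var = tau ** 2 * var + sys_var
    # Update the belief with the noisy observed bid.
    gain = pred_var / (pred_var + obs_var)
    new_mean = pred_mean + gain * (obs - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


def filter_bids(bids, prior_mean, prior_var,
                tau=1.0, gamma=0.0, sys_var=1.0, obs_var=4.0):
    """Run the filter over a bid sequence and return the belief path."""
    mean, var = prior_mean, prior_var
    path = []
    for b in bids:
        mean, var = kalman_step(mean, var, b, tau, gamma, sys_var, obs_var)
        path.append((mean, var))
    return path


# Illustrative run on the observable bids of Table 2.1 (all settings hypothetical).
beliefs = filter_bids([25.0, 41.0, 51.0, 66.0], prior_mean=25.0, prior_var=100.0)
for t, (m, v) in enumerate(beliefs, start=1):
    print(f"bid {t}: belief mean {m:.2f}, variance {v:.2f}")
```

In this sketch the belief moves toward each new observed bid by a fraction (the gain) that shrinks as the belief becomes more precise, which is the sense in which learning is adaptive.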
The Bayesian learning approach is appropriate in my setting because auctions are emotionally laden settings in which users’ preferences are correlated (Chakravarti et al. 2002). In this environment, the majority of bidders are naïve, so they learn the value of the auction items by observing others’ bids (Hossain 2008; Zeithammer and Adams 2010; Ockenfels and Roth 2002). Another advantage of the Bayesian learning approach is that it is computationally tractable in the auction context (see footnote 11).

Footnote 11. Forward-looking approaches are intractable in this setting; see, for example, Zeithammer (2006), which proposes many restrictive simplifying assumptions to deal with the intractability of a forward-looking model.

2.5.1. Modeling the valuation of auction items

I identify the anticipated loser and winner regret of the bidders by first modeling the bidder’s valuation with a dynamic adaptive utility maximization approach. This approach incorporates the anticipated regret of the bidder in the utility specification. Then, using the first-order condition of the utility function, I derive the latent valuation of the bidders and embed it into another valuation specification that combines the affiliated values of bidders and learning from others in the bidding process. This method provides the required identifying equations for the anticipated loser and winner regret of each bidder.

2.5.1.1. Dynamic adaptive utility maximization approach

I first specify the utility of an emotionally rational bidder. In a second-price auction setting, the term “emotionally rational bidder” refers to a bidder who, rather than bidding her private value as auction theory suggests, acts naïvely by comparing her bid with the bids of the population. The reason for this comparison might be a lack of information about the value of the auction item and/or about the seller, or the bidder’s past bad experiences.
In this way, the bidder anticipates a possible regret for potentially winning or losing with her current bid, so ex ante the bidder compares her bid with the highest bid of the population. To quantify this phenomenon, I adopt the utility function format that Engelbrecht-Wiggans and Katok (2008) and Filiz-Ozbay and Ozbay (2007) specify for emotionally rational bidders. Formally, the utility is defined as

$$u_{it} = \underbrace{(v_{it} - b_{it})\,G_{t-1}(b_{it})}_{\text{rational gain}} \;-\; \underbrace{\alpha_i \int_{z_t \le b_{it}} (b_{it} - z_t)\,dG_{t-1}(z_t)}_{\text{winner regret}} \;-\; \underbrace{\beta_i \int_{b_{it} \le z_t \le v_{it}} (v_{it} - z_t)\,dG_{t-1}(z_t)}_{\text{loser regret}} \tag{1}$$

where $u_{it}$ denotes the utility of bidder $i = 1..I$ at the time of bidding the $t$'th bid, $t = 1..T$, in a particular auction (I suppress the auction subscripts for ease of exposition). $v_{it}$ denotes the time varying value of bidder $i$ at time $t$, and $b_{it}$ denotes the bid that bidder $i$ raises at time $t$. $z_t$ denotes the maximum bid among all other participating bidders and $G_t$ is the cdf of $z_t$. Note that $G_{t-1}(b_{it})$ is the probability of winning the auction (based on the beliefs at time $t-1$) after bidding $b_{it}$ at time $t$. $\alpha_i$ and $\beta_i$ denote the winner and loser regret parameters of bidder $i$, respectively. All the notation I use in this paper is summarized in Table 2.5.

Table 2.5. Notation

| Notation | Description |
| --- | --- |
| $u_{it}$ | The utility of bidder $i = 1..I$ at the time of bidding the $t$th bid, $t = 1..T$, in auction $j$ (the auction index is suppressed for ease of presentation) |
| $v_{it}$ | The time varying value of bidder $i$ at the time of raising the $t$th bid in auction $j$; the measure of the valuation of bidder $i$ at that time |
| $b_{it}$ | The bid that bidder $i$ raises at time $t$ |
| $G_{t-1}$ | The time varying belief of bidders about the distribution of the maximum bid $z_t$ of all other participating bidders |
| $\alpha_i$ | The winner regret parameter of bidder $i$ |
| $\beta_i$ | The loser regret parameter of bidder $i$ |
| $g_{t-1}(.)$ | The density function, i.e., the derivative of the cumulative distribution function of bids $G_{t-1}(.)$ |
| $b_{jt}$ | The $t$th bid in auction $j = 1...J$ |
| $\theta_{jt}$ | The latent $t$th bid in auction $j = 1...J$ |
| $\varepsilon_{jt}$ | The normally distributed noise of entering the bid into the system, or observation noise, which has auction specific variance $\sigma_{jv}$ |
| $\tau_j$, $\gamma_j$ | The evolution and drift factors of the latent bid in the system equation |
| $\omega_{jt}$ | The noise of evolution of the latent bids within the auction, which has auction specific variance $\sigma_{jw}$ |
| $F_t(.)$ | The time varying cumulative distribution function of bids (auction subscript $j$ suppressed) |
| $f_t(.)$ | The time varying density function of bids, assuming that the distribution is normal (auction subscript $j$ suppressed) |
| $n_t$ | The time varying number of auction participants (auction subscript $j$ suppressed) |
| $n_{jt}$ | The observed number of bidders at the time of the $t$th bid in auction $j$ |
| $\kappa_{jt}$ | The latent number of bidders at the time of the $t$th bid in auction $j$ |
| $\tau_{jt}$ | The time trend, or the count of bids that have entered so far |
| $\iota_j$ | The average rate of entrance parameter |
| $\eta_j$ | The rate of sniper entrance |
| $\xi_{1jt}$ | The normally distributed system noise, which has variance $\sigma_{\xi_1 j}$ |
| $\zeta_{1jt}$ | The normally distributed observation noise, which has variance $\sigma_{\zeta_1 j}$ |
| $b_{-it}$ | The maximum bid that others have raised until the $t$th bid |
| $\varphi_{it}$ | The affiliated value of bidder $i$ when the $t$th bid is raised |
| $\vartheta_{jt}$ | The auction specific time varying common value element of this valuation |
| $\delta_i$ | The parameter of revelation of the value |
| $\zeta_{2it}$ | The private signal error term, which has auction specific variance $\sigma_{\zeta_2 j}$ |
| $\xi_{2jt}$ | The common signals that bidders receive, with auction specific variance $\sigma_{\xi_2 j}$ |
| $d_i$ | The vector of measures for experience of bidder $i$ |
| $m_c$ | The mean of this vector across members of cluster $c$ |
| $\pi_c$ | The propensity of population membership in segment $c$ |
| $\Theta_i = (\alpha_i, \beta_i, \delta_i, \rho_i)$ | The vector of regret, valuation, and learning parameters of bidder $i$ |
| $Ind_i$ | The segment of which individual $i$ is a member |
| $f_j$ | The information vector of auction item $j$, with $n$ information items (i.e., columns) |
| $\theta'$ | The latent prior of membership of an auction in an auction cluster |
| $z_n$ | The latent cluster index of feature $j$ |
| $f_j^n$ | The $n$th observed information item of auction $j$'s information vector |
| $\alpha'$ and $\beta'$ | The parameters of the LDA model to estimate |
| $\psi_j = (\gamma_j, \tau_j, \iota_j, \eta_j)$ | The auction specific parameters of the evolution of belief about the bids and the number of bidders |
| $clus_j$ | The cluster membership index for auction $j$ |

This utility specification has three components [12]. The first component is the expected profit of the bidder from winning the auction. The second component is the anticipated winner regret for paying too much: winner regret is defined as a multiplier of the difference between the bidder's bid and the highest bid of others in case the bidder wins the auction, and it depends on the distribution of the maximum bids of other bidders. The third component is the anticipated loser regret, which occurs when the bidder loses an auction even though the winning bid is lower than her valuation. In this case, loser regret is defined as a multiplier of the difference between the bidder's valuation and the winning bid.

[12] A criticism of the proposed utility specification might be that bidders might search across different auctions. However, studies such as Haruvy and Leszcyc (2010) show that bidders have inertia and, unless there is an incentive, they do not search across auctions. Ariely and Simonson (2003) also posit that when bidders are emotionally involved with the auction, they lose their global view of all the options that are available to them (i.e., search), so they act in a boundedly rational way and focus only on selecting the bid amount.
The underlying assumption for this specification is that bidding is a noisy process, so bidders form a belief about the distribution of the latent bids, which I describe next. Consistent with the suggestion that bidders learn during the auction [13] (Hossain 2008; Zeithammer and Adams 2010; Ockenfels and Roth 2002) and that bidding is a noisy process (Jap and Naik 2008), I assume that the mean of bids follows a first order Markov process. Formally, I define

$$b_{jt} = \theta_{jt} + \varepsilon_{jt}, \quad \varepsilon_{jt} \sim N(0, \sigma_{jv}) \tag{4}$$

$$\theta_{jt} = \tau_j\,\theta_{j,t-1} + \gamma_j + \omega_{jt}, \quad \omega_{jt} \sim N(0, \sigma_{jw}) \tag{5}$$

where $b_{jt}$ denotes the $t$'th bid in auction $j = 1...J$, $\theta_{jt}$ denotes the latent bid, and $\varepsilon_{jt}$ denotes the normally distributed noise of entering the bid into the system (i.e., the trembling hand of Selten 1975) or observation noise (i.e., the bounded rationality of Simon 1972), which has auction specific variance $\sigma_{jv}$. $\tau_j$ and $\gamma_j$ denote the evolution and drift factors of the latent bid, respectively, and $\omega_{jt}$ denotes the noise of evolution (or the unobserved evolution factor) of the latent bids within the auction, which has auction specific variance $\sigma_{jw}$.

[13] As mentioned in the Introduction, bidders can learn via various processes, such as gaining additional information about the auction item or resolving some of the uncertainties about the seller or about their own needs. This type of learning is different from learning the value of the item from the bids of the other auction participants.

Let $B_i$ be the random variable denoting the latent bid of bidder $i$, let $F_t(.)$ and $f_t(.)$ be the time varying cumulative distribution and density functions of latent bids (uniform across bidders), respectively (assuming the density function exists), and let $n_t$ be the time varying number of auction participants.
The cumulative distribution and density functions for the maximum bid of the other $n_t - 1$ bidders at time $t$ are formally defined as (I suppress the auction index $j$ for ease of exposition):

$$G_t(\theta_t) = P(\max(B_1, \ldots, B_{n_t - 1}) \le \theta_t) = P(B_1 \le \theta_t, \ldots, B_{n_t - 1} \le \theta_t) = P(B_1 \le \theta_t) \cdots P(B_{n_t - 1} \le \theta_t) = F_t(\theta_t) \cdots F_t(\theta_t) = F_t(\theta_t)^{n_t - 1} \tag{6}$$

$$g_t(\theta_t) = (n_t - 1)\,F_t(\theta_t)^{n_t - 2}\,f_t(\theta_t) \tag{7}$$

Many studies, such as Park and Bradlow (2005) and Bradlow and Park (2007), propose methods to recover the latent number of bidders. In this study, for the purpose of parsimony and simplicity, I assume that bidders use the same Bayesian updating structure for the evolution of both the bids and the number of bidders. This assumption is reasonable since there is potential observation noise for the number of bidders [14].

[14] Usually bidders only skim through the bids to get a high-level understanding of the number of bidders, and since each bidder bids multiple times, double counting or missing one bidder might be completely natural for a boundedly rational bidder. In addition, bidders do not know whether any bidder has left the auction at the time of consideration, so the cumulative number is a noisy signal.

Formally, I use the following first order Markov process to specify the evolution of the actual number of bidders [15]:

$$n_{jt} = \kappa_{jt} + \zeta_{1jt}, \quad \zeta_{1jt} \sim N(0, \sigma_{\zeta_1 j}) \tag{8}$$

$$\kappa_{jt} = \kappa_{j,t-1} + \iota_j + \eta_j\,\tau_{jt} + \xi_{1jt}, \quad \xi_{1jt} \sim N(0, \sigma_{\xi_1 j}) \tag{9}$$

where $n_{jt}$ and $\kappa_{jt}$ denote the observed and latent number of bidders at the time of the $t$'th bid in auction $j$, respectively. $\iota_j$ denotes the average rate of entrance between bidding times and $\eta_j$ is the change in that entrance rate, which is multiplied by $\tau_{jt}$, the time trend in the auction.
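As a numerical illustration of equations (6) and (7), the following minimal Python sketch evaluates the distribution and density of the highest competing bid. It assumes, purely for illustration, that latent bids are normally distributed; the function names and the parameter values used with them (mean, standard deviation, number of bidders) are hypothetical and are not taken from the estimation.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """F_t(x) for a normal latent-bid distribution (an illustrative assumption)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """f_t(x), the density that matches normal_cdf."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def max_bid_cdf(x: float, mean: float, sd: float, n_bidders: int) -> float:
    """Equation (6): G_t(x) = F_t(x)^(n_t - 1)."""
    return normal_cdf(x, mean, sd) ** (n_bidders - 1)


def max_bid_pdf(x: float, mean: float, sd: float, n_bidders: int) -> float:
    """Equation (7): g_t(x) = (n_t - 1) F_t(x)^(n_t - 2) f_t(x)."""
    return ((n_bidders - 1)
            * normal_cdf(x, mean, sd) ** (n_bidders - 2)
            * normal_pdf(x, mean, sd))
```

With two participants there is a single competitor, so `max_bid_cdf` reduces to `normal_cdf`; adding participants shifts probability mass toward higher maximum bids.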
The specification in equations (8) and (9) allows me to model sniping behavior explicitly (see, for example, Roth and Ockenfels 2000 for further discussion of sniping). $\zeta_{1jt}$ is the observation noise, normally distributed with mean zero and variance $\sigma_{\zeta_1 j}$, and $\xi_{1jt}$ is the system noise in the rate of entrance and exit, also normally distributed with mean zero and variance $\sigma_{\xi_1 j}$. Therefore, bidders update their expectations about the latent number of bidders at each point in time by observing the cumulative number of distinct bidders who have bid up until that moment.

[15] Given that theory and many empirical studies suggest that bidders are boundedly rational for various reasons, it is reasonable to assume that bidders follow a simple, parsimonious approximation, such as my model, rather than a complex one.

The last step of the model development in this approach is deriving the expression for valuation. I assume that, at each time $t$, bidders optimize their utility by selecting the optimal bid, $b_{it}$, given the valuation that they decide to reveal at that time. As a result, the bid $b_{it}$ satisfies the following first order condition:

$$\frac{\partial u_{it}}{\partial b_{it}} = -G_{t-1}(b_{it}) + (v_{it} - b_{it})\,g_{t-1}(b_{it}) - \alpha_i G_{t-1}(b_{it}) - \alpha_i b_{it}\,g_{t-1}(b_{it}) + \alpha_i b_{it}\,g_{t-1}(b_{it}) + \beta_i (v_{it} - b_{it})\,g_{t-1}(b_{it}) = 0 \tag{3}$$

Inverting equation (3) gives a measure of the bidders' valuation. Hence, valuation is specified as

$$v_{it} = \frac{G_{t-1}(b_{it}) + b_{it}\,g_{t-1}(b_{it}) + \alpha_i G_{t-1}(b_{it}) + \alpha_i b_{it}\,g_{t-1}(b_{it}) - \alpha_i b_{it}\,g_{t-1}(b_{it}) + \beta_i b_{it}\,g_{t-1}(b_{it})}{g_{t-1}(b_{it}) + \beta_i\,g_{t-1}(b_{it})} \tag{10}$$

Because the two $\alpha_i b_{it}\,g_{t-1}(b_{it})$ terms cancel, equation (10) is equivalent to $v_{it} = b_{it} + (1 + \alpha_i)\,G_{t-1}(b_{it}) / [(1 + \beta_i)\,g_{t-1}(b_{it})]$. However, the bids $b_{it}$ are noisy, so a better measure of valuation is the expectation of the right hand side of equation (10) over the distribution $F_{t-1}(.)$ of the latent bids $\theta_{it}$.
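The link between the utility in equation (1), the first order condition in equation (3), and the inverted valuation in equation (10) can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the belief $G_{t-1}$ to be a normal distribution (in the model it is derived from the latent bid distribution and the number of bidders), it uses the simplified form of equation (10) in which the two $\alpha_i b_{it} g_{t-1}(b_{it})$ terms cancel, and all function names and parameter values are hypothetical.

```python
import math


def norm_cdf(x, mean, sd):
    """Assumed belief G about the highest competing bid (normal, for illustration)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def norm_pdf(x, mean, sd):
    """Density g that matches norm_cdf."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def partial_gap(c, upper, mean, sd):
    """Closed form of the integral of (c - z) dG(z) over z <= upper for normal G."""
    return (c - mean) * norm_cdf(upper, mean, sd) + sd * sd * norm_pdf(upper, mean, sd)


def utility(bid, value, alpha, beta, mean, sd):
    """Equation (1); the loser-regret term assumes value >= bid."""
    gain = (value - bid) * norm_cdf(bid, mean, sd)
    winner_regret = partial_gap(bid, bid, mean, sd)          # region z <= bid
    loser_regret = (partial_gap(value, value, mean, sd)
                    - partial_gap(value, bid, mean, sd))     # region bid <= z <= value
    return gain - alpha * winner_regret - beta * loser_regret


def implied_valuation(bid, alpha, beta, mean, sd):
    """Equation (10) after cancellation: v = b + (1 + alpha) G(b) / ((1 + beta) g(b))."""
    big_g = norm_cdf(bid, mean, sd)
    small_g = norm_pdf(bid, mean, sd)
    return bid + (1.0 + alpha) * big_g / ((1.0 + beta) * small_g)
```

If a valuation is recovered from a bid with `implied_valuation`, the slope of `utility` with respect to the bid should vanish at that bid, which is exactly what the first order condition requires.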
Taking the expectation of the right hand side of equation (10) over the latent bids, valuation takes the following form:

$$v_{it} = E_{\theta}\left[\frac{G_{t-1}(\theta_{it}) + \theta_{it}\,g_{t-1}(\theta_{it}) + \alpha_i G_{t-1}(\theta_{it}) + \alpha_i \theta_{it}\,g_{t-1}(\theta_{it}) - \alpha_i \theta_{it}\,g_{t-1}(\theta_{it}) + \beta_i \theta_{it}\,g_{t-1}(\theta_{it})}{g_{t-1}(\theta_{it}) + \beta_i\,g_{t-1}(\theta_{it})}\right] \tag{11}$$

The right hand side of equation (11) is fully specified, but in order to estimate the unknown regret parameters, another specification of $v_{it}$ is required. The affiliated valuation and learning theory provides an appropriate ground for deriving such a specification, which I analyze next.

2.5.1.2. Affiliated valuation and learning approach: In this approach, I model a bidder's valuation of an auction item as a combination of three components: a common value, a private value, and a component consisting of the bidder's learning of the value of the auction item from the bids of other participants (see Hossain 2008; Zeithammer and Adams 2010; and Ockenfels and Roth 2002 for further justification of this specification). As a result, the time varying valuation has the following specification:

$$v_{it} = \rho_i\,b_{-i(t-1)} + \varphi_{it} \tag{12}$$

where $b_{-i(t-1)}$ denotes the bid that the bidder sees on the auction board at $t-1$, and $\varphi_{it}$ denotes the affiliated value of bidder $i$ when the $t$th bid is raised. The affiliated value consists of a private signal that only the bidder receives and a common signal that all the bidders receive. To control for both types of these unobserved signals, I model the affiliated value evolution in state space format, where the private signal is the error of the observation equation and the common signal is the error of the valuation state equation. In addition, I assume that these signals affect the valuation more for higher value items and less for lower value items (i.e., the signals are heteroscedastic).
Therefore, consistent with Zeithammer and Adams (2010), I consider a log-log model of affiliated valuation evolution, which has the following form:

$$\begin{aligned} \log(\varphi_{it}) &= \log(\vartheta_{jt}) + \log(\delta_i) + \zeta_{2it}, & \zeta_{2it} &\sim N(0, \sigma_{\zeta_2 j}) \\ \log(\vartheta_{jt}) &= \log(\vartheta_{j,t-1}) + \xi_{2jt}, & \xi_{2jt} &\sim N(0, \sigma_{\xi_2 j}) \end{aligned} \tag{13}$$

where $\varphi_{it}$ denotes the affiliated value of bidder $i$ who raises the $t$'th bid, and $\vartheta_{jt}$ denotes the auction specific time varying common value element of this valuation. $\delta_i$ denotes the parameter of revelation of the value, and $\zeta_{2it}$ denotes the private signal error term, which has auction specific variance $\sigma_{\zeta_2 j}$ [16]. $\xi_{2jt}$ denotes the common signals that bidders receive between time $t-1$ and $t$, and it has auction specific variance $\sigma_{\xi_2 j}$. This specification allows the bidders to reveal their private values gradually when they bid multiple times. I incorporated this hiding process to allow the later bids to be systematically higher, consistent with Zeithammer and Adams (2010) and Ockenfels and Roth (2002).

[16] In an ideal scenario, each bidder would have a different distribution for their private values. However, this assumption significantly complicates the model and makes it impossible to estimate the bidder-specific parameters using the available data set. My data set is sparse in the sense that many bidders do not place more than three or four bids within each auction.

2.5.1.3. Identification of loser and winner regret: Assuming that the two approaches discussed above (the dynamic utility maximization approach and the affiliated valuation and learning approach) give the same valuation for a particular auction item, I can combine equations (11)-(13) in the following way:

$$\begin{aligned} \log\left(E_{\theta}\left[\frac{G_{t-1}(\theta_{it}) + \theta_{it}\,g_{t-1}(\theta_{it}) + \alpha_i G_{t-1}(\theta_{it}) + \alpha_i \theta_{it}\,g_{t-1}(\theta_{it}) - \alpha_i \theta_{it}\,g_{t-1}(\theta_{it}) + \beta_i \theta_{it}\,g_{t-1}(\theta_{it})}{g_{t-1}(\theta_{it}) + \beta_i\,g_{t-1}(\theta_{it})}\right] - \rho_i\,b_{-i(t-1)}\right) &= \log(\vartheta_{jt}) + \log(\delta_i) + \zeta_{2it}, & \zeta_{2it} &\sim N(0, \sigma_{\zeta_2 j}) \\ \log(\vartheta_{jt}) &= \log(\vartheta_{j,t-1}) + \xi_{2jt}, & \xi_{2jt} &\sim N(0, \sigma_{\xi_2 j}) \end{aligned} \tag{14}$$

For each bidding time $t$, the set of equations (4)-(9) and (14) provides the required identifying equations for the anticipated loser and winner regrets for each bidder.

2.5.2. Accounting for heterogeneity

eBay auctions and their participants' behaviors show a high level of heterogeneity, as also mentioned in the Data section. Although bidders bid multiple times (incrementally), the number of observations is not enough to identify each bidder's parameters, so I use a level of shrinkage through a Bayesian prior on the auction specific and individual specific parameters. Many studies, including Srinivasan and Wang (2010), Wilcox (2000), and Ariely et al. (2005), emphasize the influence of experience on the behaviors of the bidders. I use the bidders' information as a proxy for their experience at the hierarchical level in order to shrink the parameters of bidders with similar experience. I ran the estimation procedure in two steps: in the first step, I segment the bidders into $K$ clusters, and in the second step, I condition on the segment index of bidders while running the estimation procedure [17]. This two-step approach helps speed up the estimation procedure.
[17] It is not clear whether the model would over-fit and learn noise rather than the actual behavior of bidders if I had incorporated the unobserved parameters of bidder responses in the form of a mixture normal model embedded within the estimation procedure.

In the same manner, to account for heterogeneity in the auction specific parameters, I shrink auction parameters given the eBay-specified auction clusters. As with the bidder specific parameters, I conditioned on the cluster membership in the estimation procedure to shrink the auction specific parameters. In Section 2.7, I also tested applying a clustering technique to the auctions rather than using the eBay-specified clusters. My main model results turned out to be robust to this method, which suggests that the eBay auction clusters were indeed informative.

2.5.3. Estimation Procedure

The total number of parameters of the model is large, mainly to account for heterogeneity: there are four parameters, $\Theta_i = (\alpha_i, \beta_i, \delta_i, \rho_i)$, for each of the 12,603 bidders, and 10 parameters, $\psi_j = (\gamma_j, \tau_j, \iota_j, \eta_j)$ and the variances of the six state-space equation error terms $\Sigma_j = (\sigma_{jv}, \sigma_{jw}, \sigma_{\zeta_1 j}, \sigma_{\xi_1 j}, \sigma_{\zeta_2 j}, \sigma_{\xi_2 j})$, for each of the 1,646 auctions. This makes a total of 66,872 parameters. With a data set of only 58,285 bids, this many parameters would most likely cause an over-fitting problem. However, Bayesian shrinkage of parameters across clusters allows me to identify the model. Additionally, I put constraints on the evolution of bids and the number of bidders to identify the model more efficiently. These constraints ensure that the evolution parameters of both the bids and the number of bidders are non-negative. I also put constraints on the valuation growth and learning from others' bids, consistent with the theory, which suggests that valuation is positive and that the bidder can either learn from the highest bid to increase her valuation or not learn at all. Finally, I rounded up the latent number of bidders recovered from the state space model, as the number of bidders should be an integer. I explain my estimation algorithm in this section and provide its pseudocode in Appendix C.

2.5.3.1 Clustering estimation: In the first step of the estimation procedure, I clustered the bidders based on the observed data. To cluster bidders with similar experiences, I assume that, in each segment, the experience of members is a noisy measure of the segment's mean experience, so I use a mixture normal fuzzy clustering [18]. Formally, the likelihood of the mixture normal clustering approach has the following structure:

$$\prod_{i=1}^{I} P_{Norm}(d_i \mid m, v, \pi_i) = \prod_{i=1}^{I} \sum_{c=1}^{K} \pi_{ic}\,P_{Norm}(d_i \mid m_c, v_c) \tag{15}$$

where $d_i$ denotes the vector of measures for experience of bidder $i$, $m = (m_1, \ldots, m_K)$ and $v = (v_1, \ldots, v_K)$, where $m_c$ and $v_c$ denote the mean and variance of $d_i$ across members of cluster $c$, respectively, and $\pi_i = (\pi_{i1}, \ldots, \pi_{iK})$, where $\pi_{ic}$ is the propensity of population membership in segment $c$. I maximize this likelihood function with respect to $(m, v, \pi)$ using an Expectation Maximization (EM) algorithm, which, for each bidder, provides a probability distribution over segment memberships. Finally, I assign each bidder to the segment with the highest probability. Therefore, for the shrinkage parameters, I formally have:

$$\Theta_i \sim MVN(\,\cdot \mid Ind_i, d_i) \tag{16}$$

where $Ind_i$ denotes bidder $i$'s segment. This specification provides flexible patterns of bidders' responses. I used the mclust package in R to perform this clustering.

[18] The term "fuzzy" is used for methods that estimate a distribution for the cluster memberships, rather than assigning the bidders to certain clusters.
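The clustering step in equation (15) can be illustrated with a small EM routine. The sketch below is not the R procedure used in this study: it is a simplified, hypothetical Python stand-in that handles a single experience measure (univariate data), takes the number of segments from the supplied starting means instead of selecting it, and uses segment weights that are common to all bidders. It shows the E-step (membership probabilities), the M-step (updated weights, means, and variances), and the final hard assignment to the most probable segment.

```python
import math


def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(data, start_means, n_iter=100, min_var=1e-6):
    """EM for a univariate normal mixture; start_means fixes the number of segments."""
    k = len(start_means)
    means = list(start_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each observation belongs to each segment.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from soft memberships.
        for c in range(k):
            mass = sum(r[c] for r in resp)
            weights[c] = mass / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / mass
            spread = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / mass
            variances[c] = max(spread, min_var)  # floor avoids a collapsing segment
    # Hard assignment: each observation goes to its most probable segment.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return weights, means, variances, labels
```

For example, calling `fit_mixture` on a hypothetical experience measure with two well separated groups and starting means on either side of the data recovers the two group means and assigns each observation to its own group.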
The mclust package uses the Bayesian information criterion (BIC) for model selection. To cluster similar auctions, I utilized the eBay-specified auction clusters. Formally, I obtain the following structure:

$$\Psi_j \sim MVN(\,\cdot \mid clus_j, D_j) \tag{18}$$

where $clus_j$ denotes the cluster membership index of auction $j$. In summary, the model uses a hierarchical multivariate normal prior for both bidder specific and auction specific parameters, conditional on their segment and cluster membership. This procedure accounts for heterogeneity in these entities, and it prevents over-fitting to the data and learning noise.

2.5.3.2. Maximum A Posteriori (MAP) optimization: In the second step, given the segment and cluster membership information of each bidder and auction, I optimized the Maximum A Posteriori (MAP) objective of the model parameters over the data [19]. Considering the assumption that bidders update their belief about the distribution of the bids sequentially, I use Kalman Filter theory (Kalman and Bucy 1961). Introduced to the auction literature by Jap and Naik (2008), the Kalman Filter starts with a prior on the distribution of the latent measures, and it updates the posterior distribution of the latent measures sequentially, using the Kalman gain factor (the proportion of the variance of the signals) to weight the observed signal and the prior in a Bayesian updating process. The advantage of the Kalman Filter over other filters is its closed form, which significantly improves the estimation speed. To estimate the latent state space model with the Kalman Filter, I used a Monte Carlo Expectation Maximization (MCEM) approach suggested by De Valpine (2012). This approach embeds the Kalman Filter in the optimization method.

[19] Alternatives such as Gibbs and Metropolis-Hastings sampling methods are computationally intractable over large data sets.
Specifically, given the non-state parameters $\Phi = (\Theta, \Psi, \Sigma)$, the MCEM procedure estimates the mean and variance of the latent space, and then uses Monte Carlo simulation to estimate the full joint likelihood of the observation $y_t$ and the latent state $\Delta_t = (\theta_t, \kappa_t, \vartheta_t)$, which has the following form:

$$l(y_1, \ldots, y_T \mid \Phi) = \pi(\Delta_0 \mid \Phi) \prod_{t=1}^{T} p(y_t \mid \Delta_t, \Phi)\,\pi(\Delta_t \mid \Delta_{t-1}, \Phi) \tag{19}$$

The posterior of the model has the following form:

$$\begin{aligned} & P(\tau_j, \gamma_j, \iota_j, \eta_j, \alpha_i, \beta_i, \rho_i, \delta_i, \Sigma_j \mid b_{jt}, b_{-it}, n_{jt}, d_i, D_j) = \prod_{j=1}^{J} \prod_{i=1}^{I} \prod_{t=1}^{T_j} \left[\int P_{Norm}(b_{jt} \mid \theta_{jt}, \sigma_{jv}) \times P_{Norm}(\theta_{jt} \mid \theta_{j,t-1}, \tau_j, \gamma_j, \sigma_{jw})\,d\theta_j\right] \\ & \quad \times \left[\int P_{Norm}(n_{jt} \mid \kappa_{jt}, \sigma_{\zeta_1 j}) \times P_{Norm}(\kappa_{jt} \mid \kappa_{j,t-1}, \iota_j, \eta_j, \sigma_{\xi_1 j})\,d\kappa_j\right] \\ & \quad \times \left[\int P_{Norm}(b_{jt} \mid b_{-it}, \alpha_i, \beta_i, \rho_i, \delta_i, \sigma_{\zeta_2 j}, \vartheta_{jt}, \kappa_{jt}) \times P_{Norm}(\vartheta_{jt} \mid \vartheta_{j,t-1}, \sigma_{\xi_2 j})\,d\vartheta_j\right] \\ & \quad \times P_{Norm}(\alpha_i, \beta_i, \rho_i, \delta_i \mid d_i, \mu^{ind(i)}, \sigma^{ind(i)}) \\ & \quad \times P_{Norm}(\tau_j, \gamma_j, \iota_j, \eta_j \mid D_j, \mu^{clust(j)}, \sigma^{clust(j)}) \\ & \quad \times P(\Sigma_j) \times \hat{\delta}_{\hat{\mu}^{clust(j)}}(\mu^{clust(j)}) \times \hat{\delta}_{\hat{\sigma}^{clust(j)}}(\sigma^{clust(j)}) \times \hat{\delta}_{\hat{\mu}^{ind(i)}}(\mu^{ind(i)}) \times \hat{\delta}_{\hat{\sigma}^{ind(i)}}(\sigma^{ind(i)}) \end{aligned} \tag{20}$$

where $\theta_j = (\theta_{j1}, \ldots, \theta_{jT_j})$, $\kappa_j = (\kappa_{j1}, \ldots, \kappa_{jT_j})$, $\vartheta_j = (\vartheta_{j1}, \ldots, \vartheta_{jT_j})$, and $\hat{\delta}_{.}(.)$ denotes the Dirac delta function [20]. The first, second, and third lines denote the likelihood of the error terms of the state space equations (4) and (5), specified for the belief of bidders about the bid distribution; equations (8) and (9), specified for the belief of bidders about the distribution of the number of bidders; and equation (14), specified for the evolution of affiliated valuations, respectively. The fourth and fifth lines denote the likelihood of the error terms of equations (16) and (18), respectively, specified as a hierarchy over individual and auction specific parameters.
The sixth line specifies the prior on the variances of the three state space equations and on the mean and variance parameters of the hierarchy. I can rewrite the model parameters' posterior based directly on the error terms of the state space equations and the hierarchy over individual and auction specific parameters as follows:

$$\begin{aligned} & \prod_{j=1}^{J} \prod_{i=1}^{I} \prod_{t=1}^{T_j} \left[\int P_{Norm}(\varepsilon_{jt} \mid b_{jt}, \theta_{jt}, \sigma_{jv}) \times P_{Norm}(\omega_{jt} \mid \theta_{jt}, \theta_{j,t-1}, \tau_j, \gamma_j, \sigma_{jw})\,d\theta_j\right] \\ & \quad \times \left[\int P_{Norm}(\zeta_{1jt} \mid n_{jt}, \kappa_{jt}, \sigma_{\zeta_1 j}) \times P_{Norm}(\xi_{1jt} \mid \kappa_{jt}, \kappa_{j,t-1}, \iota_j, \eta_j, \sigma_{\xi_1 j})\,d\kappa_j\right] \\ & \quad \times \left[\int P_{Norm}(\zeta_{2it} \mid b_{jt}, b_{-it}, \alpha_i, \beta_i, \rho_i, \delta_i, \sigma_{\zeta_2 j}, \vartheta_{jt}, \kappa_{jt}) \times P_{Norm}(\xi_{2jt} \mid \vartheta_{jt}, \vartheta_{j,t-1}, \sigma_{\xi_2 j})\,d\vartheta_j\right] \\ & \quad \times P_{Norm}(\alpha_i, \beta_i, \rho_i, \delta_i \mid d_i, \mu^{ind(i)}, \sigma^{ind(i)}) \\ & \quad \times P_{Norm}(\tau_j, \gamma_j, \iota_j, \eta_j \mid D_j, \mu^{clust(j)}, \sigma^{clust(j)}) \\ & \quad \times P(\Sigma_j) \times \hat{\delta}_{\hat{\mu}^{clust(j)}}(\mu^{clust(j)}) \times \hat{\delta}_{\hat{\sigma}^{clust(j)}}(\sigma^{clust(j)}) \times \hat{\delta}_{\hat{\mu}^{ind(i)}}(\mu^{ind(i)}) \times \hat{\delta}_{\hat{\sigma}^{ind(i)}}(\sigma^{ind(i)}) \end{aligned} \tag{21}$$

I use an MCEM approach to compute the parameters that maximize the posterior of the model in (20). This iterative method starts with an initial set of parameter estimates and alternates between an Expectation (E-) step and a Maximization (M-) step [21] until convergence.

[20] The Dirac delta function is a generalized distribution that is zero everywhere except at the point that its subscript specifies. It represents a normal distribution in the limit as the variance goes to zero.

[21] I actually applied the Generalized EM algorithm where, in the M-step, rather than computing the maximizing parameters, I settled for a point that improves the objective. This algorithm has properties similar to those of the EM algorithm (see McLachlan and Krishnan 2008 for further details).
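The filtering recursion that underlies the state space likelihood for equations (4) and (5) can be sketched as follows. This is a minimal, hypothetical Python illustration rather than the R code used in this study: it filters a single auction's bid sequence for given values of $\tau_j$, $\gamma_j$, and the two variances, and it returns the filtered means and variances of the latent bid together with the Gaussian log-likelihood of the observed bids. The function name, argument names, and the prior values passed to it are assumptions made for the example.

```python
import math


def kalman_filter(bids, tau, gamma, var_obs, var_state, prior_mean, prior_var):
    """Filter theta_t in b_t = theta_t + eps_t, theta_t = tau * theta_{t-1} + gamma + omega_t.

    var_obs and var_state are the variances of eps_t and omega_t.
    Returns (filtered means, filtered variances, log-likelihood of the bids).
    """
    mean, var = prior_mean, prior_var
    means, variances, loglik = [], [], 0.0
    for b in bids:
        # Prediction step, from the state equation (5).
        pred_mean = tau * mean + gamma
        pred_var = tau * tau * var + var_state
        # One-step-ahead forecast error of the observed bid, from equation (4).
        innovation = b - pred_mean
        innovation_var = pred_var + var_obs
        loglik += -0.5 * (math.log(2.0 * math.pi * innovation_var)
                          + innovation * innovation / innovation_var)
        # Update step: the Kalman gain weights the new bid against the prior.
        gain = pred_var / innovation_var
        mean = pred_mean + gain * innovation
        var = (1.0 - gain) * pred_var
        means.append(mean)
        variances.append(var)
    return means, variances, loglik
```

With no drift, no state noise, and a very diffuse prior, the filtered mean approaches the running average of the bids, which is a convenient sanity check on the recursion.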
For the E-step of each iteration, I first perform a Weighted Least Squares (WLS) regression to project the bidder (resp., auction) specific information onto the bidder (resp., auction) specific parameters within each segment as follows:

$$\begin{aligned} \Theta_i &= \mu'_{Ind_i}\,d_i + \lambda_i, & \lambda_i &\sim N(0, \sigma_{Ind_i}) \\ \Psi_j &= \mu'_{clus_j}\,D_j + \chi_j, & \chi_j &\sim N(0, \sigma_{clus_j}) \end{aligned} \tag{22}$$

The estimated parameters of these WLS regressions, $(\hat{\mu}^{ind(i)}, \hat{\sigma}^{ind(i)})$ and $(\hat{\mu}^{clust(j)}, \hat{\sigma}^{clust(j)})$, are then used to compute the prior probabilities of the bidder-specific and auction-specific parameters (fourth and fifth lines in equation (20), respectively). I computed the likelihood contribution of the belief of bidders about the bids and the number of bidders, and of the evolution of the valuations (first, second, and third lines of equation (20)), using Kalman filtering and backward smoothing methods to derive the evolution of the state parameters at each bidding time. I used a Monte Carlo sampling method to integrate out the latent state variables. The details of these methods are explained in Appendix C. I used the dlm package in R to run the Kalman filtering and backward smoothing.

For the optimization problem in the M-step of each iteration, a closed form solution for the gradient of the maximum a posteriori objective is not available, so methods such as gradient descent, quasi-Newton, and conjugate gradient are computationally intractable. Calculating the gradient numerically would also increase the run time of the estimation algorithm cubically in the number of parameters, i.e., $O(P^3 TJ)$. Therefore, I used the simulated annealing method, which is a generic probabilistic heuristic method for global optimization. Simulated annealing uses only function values, so it is relatively slow. It starts with an initial value and, at each iteration, a new point is randomly generated.
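The generic form of simulated annealing can be sketched in a few lines. The Python example below is a hypothetical, one-dimensional illustration and not the routine used for the M-step of this study: it minimizes a function (a posterior would be maximized by minimizing its negative), and the uniform proposal, the geometric cooling schedule, and all default values are assumptions chosen for the example. It uses only function values, always accepts improving points, and accepts worsening points with a probability that falls as the temperature decreases.

```python
import math
import random


def simulated_annealing(objective, start, n_iter=5000, step=0.5,
                        temp0=1.0, cooling=0.999, seed=0):
    """Minimize `objective` over one real variable using function values only."""
    rng = random.Random(seed)
    current, current_val = start, objective(start)
    best, best_val = current, current_val
    temp = temp0
    for _ in range(n_iter):
        # Randomly generate a new point near the current one.
        candidate = current + rng.uniform(-step, step)
        candidate_val = objective(candidate)
        delta = candidate_val - current_val
        # Always accept improvements; accept worse points with probability
        # exp(-delta / temp), which shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_val = candidate, candidate_val
            if current_val < best_val:
                best, best_val = current, current_val
        temp *= cooling  # geometric cooling schedule (an assumption of this sketch)
    return best, best_val
```

For instance, `simulated_annealing(lambda x: (x - 3.0) ** 2, start=0.0)` returns the best point visited and its objective value; because the best point is tracked separately, the result can never be worse than the starting value.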
The simulated annealing algorithm accepts all new points that improve the objective, but, with a certain probability that gradually decreases, it might also accept points that worsen the objective. By accepting the latter type of points, the algorithm avoids being trapped in local optima in early iterations and is able to explore globally for better solutions (see Belisle 1992 for further discussion of this algorithm). To the best of my knowledge, I am the first to use simulated annealing in the marketing/OM fields, but it is used in other fields (for example, Crama and Schyns 2003 use this method for complex portfolio optimization, and Zhuang et al. 1994 use it for robotics calibration).

I terminated the MCEM algorithm when the Euclidean distance between the parameter estimates of two consecutive iterations became smaller than a pre-specified tolerance or after a maximum number of iterations (I used 1e-8 as the tolerance and 2,000 as the maximum number of iterations in this study).

2.6. RESULTS

I start presenting the results with the bidder segments that I estimated in the first step. The BIC criterion suggests clustering the bidders into 47 segments. Table 2.6 presents the summary statistics of average bidder characteristics within each bidder segment. As expected, there is considerable heterogeneity between segments.

The optimal MAP value is estimated to be -94,280,085. Given that this model is estimated on approximately 60,000 bids of 12,000 bidders in 1,600 auctions, this value is in the expected range.

Table 2.7 presents the summary statistics for the bidder-specific parameter estimates: columns 2-5 are across the 19 auction categories and columns 6-9 are across the 47 estimated bidder segments. The estimated parameters show significant heterogeneity in bidders' parameters across bidder segments.
On the other hand, there is not much heterogeneity across auction categories, which suggests that regret and valuation/learning characteristics are more individual specific than category specific.

Table 2.6. Summary statistics of the average bidder characteristics within each of 47 bidder segments

| Characteristic | Mean | SD | min | max |
| --- | --- | --- | --- | --- |
| Size | 260.55 | 275.02 | 3 | 992 |
| avg. feedback score | 3471.21 | 12399.80 | 48 | 84027 |
| sd feedback score | 3635.04 | 8480.39 | 5 | 45365 |
| avg. Number of bids on this item | 8.57 | 7.67 | 1 | 35 |
| sd Number of bids on this item | 6.79 | 6.77 | 0 | 26 |
| avg. total number of bids in 30 days | 680.32 | 1072.73 | 3 | 4530 |
| sd total number of bids in 30 days | 630.98 | 1066.22 | 0 | 5814 |
| avg. Number of items bid on in 30 days | 257.57 | 412.59 | 1 | 1631 |
| sd Number of items bid on in 30 days | 266.91 | 449.57 | 1 | 2099 |
| avg. Bidding Activity with current Seller | 24.47 | 20.19 | 1 | 100 |
| sd Bidding Activity with current Seller | 19.34 | 11.72 | 0 | 40 |
| avg. Number of categories bid on | 2.19 | 0.57 | 1 | 4 |
| sd Number of categories bid on | 1.13 | 0.64 | 0 | 3 |

The winner (resp. loser) regret parameter is significant in 44 (resp. 45) out of the 47 bidder segments at p<0.01, and it is significant in two (resp. one) categories at p<0.05. This significance is consistent with the findings of Bajari and Hortacsu (2003), who suggest that when there is an element of common value in the valuation, there is potential winner regret anticipation in eBay auctions. Table 2.7 also indicates that the mean of average loser regret is slightly higher (in magnitude) than that of winner regret, but the means are fairly close to each other. The significance of both winner and loser regrets and their close magnitudes, on average, are not consistent with the suggestions of Ariely and Simonson (2003), who state that the second price systems used by online auctions like eBay decrease the probability of winner regret while maximizing loser regret.
Therefore, my results indeed support the results of Zeithammer and Adams (2010), who suggest that a sealed-bid second price auction is not a good abstraction for eBay auctions. Another possible explanation for this inconsistency is bidders' naïve bidding behaviors, which do not conform to second price auction theory. The magnitudes of the regret values do not support the claim of Gilovich et al. (1998) either: they suggest that action regret (analogous to winner regret) incites a more intense feeling than inaction regret (analogous to loser regret), which incites a wistful feeling.

Table 2.7. Summary statistics for the bidder specific parameter estimations (columns 2-5: within each auction category (19); columns 6-9: within each bidder segment (47))

| Parameter | min (category) | max (category) | Mean (category) | SD (category) | min (segment) | max (segment) | Mean (segment) | SD (segment) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| avg. winner regret | -1.38 | -1.24 | -1.31 | 0.04 | -1.67 | -0.52 | -1.28 | 0.19 |
| se winner regret | 0.02 | 0.04 | 0.04 | 0.01 | 0.03 | 0.41 | 0.10 | 0.08 |
| avg. loser regret | -1.4 | -1.28 | -1.33 | 0.03 | -1.7 | -0.79 | -1.34 | 0.13 |
| se loser regret | 0.02 | 0.04 | 0.04 | 0.006 | 0.03 | 0.49 | 0.10 | 0.09 |
| avg. valuation param. | 1.17 | 1.28 | 1.23 | 0.03 | 0.79 | 1.42 | 1.22 | 0.10 |
| se valuation param. | 0.02 | 0.04 | 0.03 | 0.004 | 0.02 | 0.27 | 0.08 | 0.06 |
| avg. learning param. | 0.18 | 0.32 | 0.25 | 0.03 | 0 | 0.81 | 0.27 | 0.12 |
| se learning param. | 0.02 | 0.04 | 0.03 | 0.01 | 0.03 | 0.26 | 0.09 | 0.06 |

Looking at the estimation results from the auction category perspective shows that both winner and loser regret are significant in all the categories at p<0.0001. To test the hypothesis that luxury and widely available goods convey different levels of regret, I ran pairwise t-tests between the parameters of regret in the widely available and luxury goods categories. However, the results, which are presented in Table 2.8, did not show any significant difference between luxury and widely available good auctions in terms of regret levels.

Table 2.8. t-Test: Paired Two Sample for Means

| Statistic | Winner regret: Widely Available | Winner regret: Luxury | Loser regret: Widely Available | Loser regret: Luxury |
| --- | --- | --- | --- | --- |
| Mean | -1.334 | -1.317 | -1.319 | -1.330 |
| Variance | 0.934 | 0.982 | 0.946 | 0.957 |
| Observations | 6024 | 6024 | 6024 | 6024 |
| Pearson Correlation | 0.008 | | 0.002 | |
| Hypothesized Mean Difference | 0 | | 0 | |
| Df | 6023 | | 6023 | |
| t Stat | -1.002 | | 0.609 | |
| P(T<=t) one-tail | 0.158 | | 0.271 | |
| t Critical one-tail | 1.645 | | 1.645 | |
| P(T<=t) two-tail | 0.316 | | 0.542 | |
| t Critical two-tail | 1.960 | | 1.960 | |

Table 2.7 shows that the mean of the average valuation revelation parameters across bidder segments is 1.22, i.e., at each increment, on average, the bidders reveal 22% more than their previously revealed valuation. The mean of the average learning parameters across bidder segments is 0.27, which implies that, on average, the bidders give the highest observed bid a weight of 27% when updating their valuations by learning from it. Comparison of the estimated valuation and learning parameters suggests that bidders put more weight on their own valuation than on learning from the highest bid. A possible explanation for the low learning level is shill bidding, as Boze and Daripa (2011) suggest. In shill bidding, the seller bids on the auction herself or through one of her affiliates to cause others to bid higher. It is possible that the bidders anticipate such shill bidding, so they discount their learning from the highest bid.

Figure 2.3. Histogram of regret and valuation evolution parameters across bidder segments

Figure 2.3 shows the distribution of the regret, learning, and valuation revelation parameters across bidder segments. These distributions have long tails, and, indeed, Shapiro-Wilk normality tests reject normality for these distributions at p<0.01, so a Gaussian distribution does not represent them well.
This observation lends support to the importance of allowing flexible response patterns by clustering the data and shrinking different bidders’ parameters toward their corresponding segment parameter means.

Table 2.9. Relation between the winner regret $\alpha_i$, the loser regret $\beta_i$, the update of valuation parameters $\delta_i$ and learning parameter $\rho_i$ estimates across forty seven bidder segments (correlation matrix)

| | winner regret | loser regret | valuation revelation | learning |
|---|---|---|---|---|
| winner regret | 1 | | | |
| loser regret | 0.427 | 1 | | |
| valuation revelation | 0.662 | 0.589 | 1 | |
| learning | 0.135 | 0.474 | 0.613 | 1 |

Table 2.10. Relation between the winner regret $\alpha_i$, the loser regret $\beta_i$, the update of valuation parameters $\delta_i$ and learning parameter $\rho_i$ estimates across forty seven bidder segments (regressions)

| Regressand | Regressor | Estimate | SE | t-stat | p-value |
|---|---|---|---|---|---|
| Winner regret | Intercept | -0.43 | 0.27 | -1.61 | 0.11 |
| | loser regret | 0.63** | 0.20 | 3.16 | 0.00 |
| Winner regret | Intercept | -0.78** | 0.20 | -3.901 | 0.000 |
| | learning | 0.94** | 0.13 | 7.305 | 0.000 |
| | valuation revelation | -0.59** | 0.17 | -3.418 | 0.001 |
| Loser regret | Intercept | -1.64** | 0.16 | -10.118 | 0.000 |
| | learning | 0.33** | 0.10 | 3.149 | 0.003 |
| | valuation revelation | 0.17 | 0.14 | 1.193 | 0.239 |

** Significant at the 95% confidence level (two-tailed)

Table 2.11. Explaining the winner regret $\alpha_i$, the loser regret $\beta_i$, the update of valuation parameters $\delta_i$ and the learning parameter $\rho_i$ estimates across 47 bidder segments

| Regressand | Regressor | Estimate | SE | t-stat | p-value |
|---|---|---|---|---|---|
| Winner regret (adjusted $R^2$ = 0.64) | Intercept | -1.278* | 0.017 | -75.449 | 0.000 |
| | Segment size | 0.000 | 0.000 | 0.564 | 0.576 |
| | Bidders feedback mean | 0.001* | 0.000 | 6.344 | 0.000 |
| | Number of bids on this item | -0.004 | 0.003 | -1.570 | 0.125 |
| | Total number of bids in 30 days | 0.013* | 0.004 | 3.574 | 0.001 |
| | Number of items bid on in 30 days | 0.000 | 0.000 | -1.456 | 0.153 |
| | Bid activity with current seller | 0.003* | 0.001 | 2.430 | 0.020 |
| | Number of categories bid on (mean) | 0.044 | 0.046 | 0.964 | 0.341 |
| Loser regret (adjusted $R^2$ = 0.30) | Intercept | -1.341* | 0.016 | -83.916 | 0.000 |
| | Segment size | 0.000 | 0.000 | -0.737 | 0.466 |
| | Bidders feedback mean | 0.001* | 0.000 | 4.261 | 0.000 |
| | Number of bids on this item | -0.004 | 0.003 | -1.734 | 0.091 |
| | Total number of bids in 30 days | -0.001 | 0.003 | -0.311 | 0.758 |
| | Number of items bid on in 30 days | 0.000 | 0.000 | -1.025 | 0.312 |
| | Bid activity with current seller | 0.000 | 0.001 | -0.455 | 0.651 |
| | Number of categories bid on (mean) | -0.027 | 0.043 | -0.635 | 0.529 |
| Learning value from bids (adjusted $R^2$ = 0.64) | Intercept | 0.271* | 0.017 | 16.326 | 0.000 |
| | Segment size | 0.000 | 0.000 | 1.168 | 0.250 |
| | Bidders feedback mean | 0.001* | 0.000 | 9.030 | 0.000 |
| | Number of bids on this item | 0.003 | 0.003 | 1.216 | 0.231 |
| | Total number of bids in 30 days | 0.002 | 0.004 | 0.601 | 0.551 |
| | Number of items bid on in 30 days | 0.000 | 0.000 | -0.208 | 0.836 |
| | Bid activity with current seller | 0.001 | 0.001 | 0.590 | 0.559 |
| | Number of categories bid on (mean) | -0.023 | 0.045 | -0.507 | 0.615 |
| Valuation update (adjusted $R^2$ = 0.41) | Intercept | 1.269* | 0.016 | 79.126 | 0.000 |
| | Segment size | 0.000 | 0.000 | -0.215 | 0.831 |
| | Bidders feedback mean | 0.001* | 0.000 | 4.782 | 0.000 |
| | Number of bids on this item | 0.004 | 0.003 | 1.521 | 0.136 |
| | Total number of bids in 30 days | -0.011* | 0.003 | -3.148 | 0.003 |
| | Number of items bid on in 30 days | 0.000 | 0.000 | 1.859 | 0.071 |
| | Bid activity with current seller | 0.000 | 0.001 | -0.213 | 0.833 |
| | Number of categories bid on (mean) | -0.010 | 0.043 | -0.242 | 0.810 |

* Significant at the 95% confidence level (two-tailed)

I evaluated the correlation between bidder-specific parameters as well. Table 2.9 shows the correlation matrix for regret and learning parameters across 47 bidder segments, and Table 2.10 shows the regression analysis between these parameters. The results show that winner regret is positively correlated with loser regret. I explain this result by the type of bidders: some bidders might be emotional, so they account for both winner and loser regret emotions, while others might be less emotional, so they generally regret less.
I also find a negative relationship between learning less from others (status quo tendency) and feeling winner regret, consistent with the findings of Inman and Zeelenberg (2002). In other words, I find that bidders who update their valuations less on the basis of the new bid on the auction board anticipate more winner regret than others.

I also explain the estimated regret and learning parameters based on the observed characteristics of the bidder segments. These characteristics are good proxies for the bidders’ experience and are important factors in bidders’ behavior, as I discussed earlier. Table 2.11 presents the result of this analysis. The results show that bidders with higher feedback scores (i.e., more experience) are less regretful and learn more from the bids on the auction board. I also find that bidding in several categories correlates with more loser regret and that the valuation update correlates positively with the bidders’ feedback score. The latter result suggests that bidders with more experience reveal their value more, which is consistent with the dominant strategy of rational bidders in the auction literature.

Table 2.12 presents the summary statistics for the auction-specific parameter estimations. The average of the parameter for the belief about the growth of bids is 1.77 across all auction items, which suggests that bidders believe that bids will grow exponentially as new bids enter. This exponential growth is consistent with the form of the evolution of bids in Figure 2.1. The average drift parameter is 5.58 across all auction items, which suggests the dollar value by which bidders expect a new bid to increment the previous bid after the growth. The average rate of entrance between two bids is 1.05 across auction items, suggesting that bidders expect one new bidder to be watching the auction and ready to bid between two consecutive bids.
The last-minute rush rate is 2.09 across auction items, which suggests that bidders expect the rate of entrance to triple at the end of the auction. This is consistent with sniping behavior.

Table 2.12. Summary statistics for the auction specific parameter estimations within each auction category (19)

| Parameter | min | max | Mean | SD |
|---|---|---|---|---|
| avg. growth of bids | 1.5 | 2 | 1.77 | 0.11 |
| se growth of bids | 0.07 | 0.24 | 0.14 | 0.04 |
| avg. drift of bids | 5.36 | 5.81 | 5.58 | 0.12 |
| se drift of bids | 0.08 | 0.2 | 0.12 | 0.03 |
| avg. last minute flood | 0.9 | 1.26 | 1.05 | 0.09 |
| se last minute flood | 0.06 | 0.24 | 0.13 | 0.05 |
| avg. mean entrance rate | 1.84 | 2.26 | 2.09 | 0.11 |
| se mean entrance rate | 0.08 | 0.24 | 0.14 | 0.04 |

2.7. COUNTERFACTUAL ANALYSIS

Notification policies have been shown to be effective in influencing bidders’ feelings and anticipated regret (see, for example, Filiz-Ozbay and Ozbay 2007). Such a notification policy for an auction platform such as eBay might involve emailing users statistics on winning bids and amounts paid, or presenting such information on the website. Notification policies can be conditioned on bidder behavior to target only naïve bidders. However, to implement a policy change, the auction platform should be able to predict the revenue implications of such an action accurately. From this perspective, in addition to allowing for targeting bidders, the key advantage of modeling the bidders’ decisions structurally is the capability to study counterfactuals.

My empirical results show that the bidders experience significant winner regret on eBay auctions. Therefore, I studied a counterfactual scenario in which an auction platform shuts down the winner regret using its notification policy (I first assumed that loser regret is still in effect). To run this counterfactual scenario, I set the winner regret parameters to zero while keeping all the other parameters at their estimated values.
Given this new setting, I started from the first bid, and, at each point in time, I computed the optimal bid of a given bidder by running a Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization algorithm on the utility function of the bidder, presented in equation (1). Given this new optimal bid, I then updated the time-varying parameters of the belief about the distribution of the latent bids by running a Kalman filter, and I computed the optimal bid of the next bidder. In this way, I simulated the bids of all the bidders at each point in time and determined the winning bid and the amount paid for each auction item in the new environment with no winner regret. Figure 2.4 presents the results of this analysis on six auction samples. The results show that shutting down the winner regret can increase the winning bid two to four times in some auctions. The results approximately resemble a step function, because, as a result of this shutdown, some bidders bid so much higher than others that the other bidders’ bids became irrelevant, as they are prone to raise a bid with a lower value.

Figure 2.4. Counterfactual analysis of shutting down winner regret (blue line: the optimal bidding when regret is shut down; red line: the observed bidding)

I further studied the revenue implications at the auction platform level. Table 2.13 presents the results of this counterfactual analysis within each auction category and across all the auctions. By shutting down the bidders’ winner regret through a notification policy, the auction platform can, on average, improve its revenue from each item by 32% and its total revenue by 24%. Considering category-based improvements indicates that “Stamps” and “DVDs and Movies” have the highest improvement when I shut down the winner regret. Significant improvements from shutting down winner regret occur in widely available good categories such as DVDs and movies, music, health and beauty, books, and gift cards and coupons, on which non-expert bidders usually bid (stamps and antiques are exceptions). However, smaller improvements occur in tickets and experiences, entertainment memorabilia, and art, which most likely attract more expert bidders.

To test this hypothesis, I regressed the counterfactual revenue improvement of the auction items on the characteristics of bidders and price. The top part of Table 2.14 presents the results. They suggest that the number of bids on a specific auction item and the total number of items bid on are positively correlated with the improvement in revenue. This can be explained by incremental bidders, i.e., those who bid a lot are naïve incremental bidders, and shutting down winner regret improves the revenue more when bidders are naïve. Furthermore, a high number of auction items bid on is a signal that the bidder is not concentrating on one auction item to win, which is another proxy for less experience of the bidder.

Table 2.13. Counterfactual analysis of shutting down only winner and both winner/loser regret

| Auction category | Number of auctions | Average improvement of shutting down winner regret | Average improvement of shutting down both winner and loser regret |
|---|---|---|---|
| Jewelry and Watches | 149 | 28% | 28% |
| Collectibles | 103 | 36% | 32% |
| Clothing, Shoes and Accessories | 84 | 25% | 16% |
| Crafts | 78 | 28% | 31% |
| Pottery and Glass | 74 | 27% | 22% |
| Antiques | 68 | 40% | 49% |
| Toys and Hobbies | 93 | 29% | 30% |
| Stamps | 72 | 61% | 43% |
| Books | 84 | 28% | 30% |
| Tickets and Experiences | 91 | 18% | 5% |
| Art | 70 | 25% | 21% |
| Gift Cards and Coupons | 85 | 40% | 38% |
| Music | 86 | 44% | 27% |
| Consumer Electronics | 83 | 19% | 17% |
| DVDs and Movies | 87 | 53% | 39% |
| Dolls and Bears | 84 | 27% | 39% |
| Entertainment Memorabilia | 88 | 23% | 13% |
| Health and Beauty | 74 | 37% | 40% |
| Video Games and Consoles | 93 | 39% | 38% |
| Total improvement | | 24% | 24% |
| Average improvement across all auctions | | 32% | 29% |
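The two summary rows of Table 2.13 aggregate the same counterfactual bids in different ways, which is why they differ (32% versus 24% for winner regret). The following is a minimal sketch of one plausible reading of those rows, assuming the per-item figure is an unweighted mean of item-level percentage changes and the total figure is the change in summed revenue. The function name and the dollar amounts are hypothetical and are not taken from the data.

```python
def improvement_summaries(observed, counterfactual):
    """Return (average per-item improvement, total-revenue improvement).

    Both inputs are lists of winning bids for the same auctions, in the
    same order. This is an illustrative assumption about how the two
    summary rows could be computed, not the author's code.
    """
    if len(observed) != len(counterfactual) or not observed:
        raise ValueError("need two non-empty lists of equal length")
    # Unweighted mean of item-level relative changes: cheap items count
    # as much as expensive ones.
    per_item = [(c - o) / o for o, c in zip(observed, counterfactual)]
    avg_item = sum(per_item) / len(per_item)
    # Relative change in summed revenue: expensive items dominate.
    total = (sum(counterfactual) - sum(observed)) / sum(observed)
    return avg_item, total


# Hypothetical winning bids (in dollars) for three auctions.
observed = [10.0, 50.0, 200.0]
counterfactual = [20.0, 60.0, 210.0]
avg_item, total = improvement_summaries(observed, counterfactual)
```

In this toy case the cheap item doubles in price, so the per-item average is far larger than the total-revenue change. A gap in the same direction in Table 2.13 would be consistent with larger relative gains on lower-priced items, although the table alone does not establish that.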
As a notification policy might have the potential to remove both types of regret together, I also examined the effect of shutting down both winner and loser regret. Since winner and loser regrets affect the bids in opposite directions, intuitively, such a shutdown should yield a smaller revenue improvement than shutting down only the winner regret; indeed, on average, it improves each auction’s highest bid by 29% and the total revenue by 24%, slightly less than shutting down only the winner regret. Regressing the counterfactual revenue improvement on the characteristics of bidders and price did not yield statistically significant relations in this case (estimation results are presented in the bottom part of Table 2.14).

Table 2.14. Counterfactual revenue improvements explained by the characteristics of bidders on each auction

| Regressor | Coefficients | Standard Error | t Stat | P-value |
|---|---|---|---|---|
| Shutting down winner regret | | | | |
| Intercept | 0.1559 | 0.0921 | 1.6935 | 0.0905 |
| Feedback | 0.0000 | 0.0000 | -0.9519 | 0.3413 |
| Bids on this item | 0.0035 | 0.0020 | 1.7814 | 0.0750 |
| Total bids in 30 days | 0.0000 | 0.0001 | -0.3022 | 0.7625 |
| Number of items bid on | 0.0009 | 0.0004 | 2.5731 | 0.0102 |
| Activity with the seller | 0.0006 | 0.0011 | 0.5675 | 0.5704 |
| Number of categories bid on | 0.0231 | 0.0303 | 0.7613 | 0.4466 |
| Shutting down both types of regret | | | | |
| Intercept | 0.166 | 0.084 | 1.987 | 0.047 |
| Feedback | 0.000 | 0.000 | -0.611 | 0.542 |
| Bids on this item | 0.002 | 0.002 | 1.358 | 0.175 |
| Total bids in 30 days | 0.000 | 0.000 | 0.404 | 0.686 |
| Number of items bid on | 0.000 | 0.000 | 0.756 | 0.450 |
| Activity with the seller | 0.000 | 0.001 | 0.008 | 0.994 |
| Number of categories bid on | 0.035 | 0.027 | 1.266 | 0.206 |

2.8. ROBUSTNESS CHECKS

I first checked the robustness of my results to some of the modeling assumptions. One assumption in my model is that bidders use their own bids as a proxy for how much they are going to pay in case they win the auction.²²
Therefore, they compare their bids with their beliefs about the maximum bid of other bidders (since the maximum bid of others is not directly observable). I tested an alternative utility specification assuming that when a bidder bids $b_{it}$ at time $t$, she considers the fact that she would pay an amount between her bid and the currently displayed maximum bid in case she wins the auction. I model this situation by replacing $b_{it}$ with $\lambda_i b_{it} + (1-\lambda_i)\,\mathrm{max}_{it}$, where $\mathrm{max}_{it}$ is the maximum bid that is shown on the auction board and $\lambda_i$ is the parameter (to be estimated from data) for the expected proportion of the current bid difference that can possibly be filled by new bids. I found that this new specification does not change the inference significantly. In the new specification, winner and loser regrets are still significant at p<0.05 across all bidder segments except one, and their magnitudes, on average, do not change significantly. Other insights derived from the main model did not change either, so I concluded that my utility specification is robust to this assumption.

²² Even if bidders might be aware of the second-price nature of eBay auctions, it still makes sense to make this assumption, since the bidders do not know whether there will be new bids between the one on the auction board and their bids before the auction ends. Therefore, it is possible that they will pay a price very close to their own bids even if they win.

I further checked the robustness of my estimation algorithm to the clustering approaches. First, I tested applying a different clustering method to the auctions, rather than using the eBay-specified clusters. In particular, the descriptions in the item titles in Table 2.3 suggest that the auction categories may not be the best way to classify auctions, since the keywords in the product titles also provide useful information for classification.
For example, the words “Shampoo”, “Conditioner”, and “Styler” in the title of an auction item that is classified as “Gift Cards and Coupons” might suggest that this item is actually closely related to items in the “Health and Beauty” category. Or the word “Original” appearing in the descriptions of items in the “Collectibles” and “Entertainment Memorabilia” categories might suggest that bidders behave similarly for these two items, as a response to this signal. This observation led me to use a model that incorporates not only the auctions’ structured information but also the unstructured text description of the auction items.

To be able to incorporate text data, I used a frequency matrix of words, and after augmenting it with the auction item information, I used a Latent Dirichlet Allocation (LDA) model to cluster the auction items based on their observed characteristics. Similar to the bidder-specific parameters, I used a two-step approach,²³ i.e., I first clustered the auctions using the LDA model and then conditioned on the segment membership in the estimation procedure to shrink the auction-specific parameters. I used a topic modeling package available in R to run LDA using the Variational Bayesian Expectation Maximization (VBEM) method over a data set of keyword frequencies and auction information. To create the auction keyword frequency matrix, I used the WordNet Python interface to lemmatize the keywords after parsing them, and I kept only the keywords that are available in the dictionary. I explain the details of the LDA estimation procedure in the online companion. This procedure grouped the auctions into 50 clusters.²⁴ The average number of bidders (resp. bids) in each cluster ranges from 6 to 13 (resp. 25 to 63), and the average auction duration ranges from 4 to 6 days.²⁵ The optimal MAP in this case is estimated to be -99,276,228 (5.3% less than the estimated main model MAP).
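The keyword frequency matrix that feeds the LDA step can be illustrated with a short sketch. It is not the procedure used in the study: lemmatization with WordNet is replaced here by simple lowercasing, and the dictionary lookup by a small hand-written vocabulary, so that the example runs with the Python standard library alone. The titles, the vocabulary, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(title):
    """Lowercase a title and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", title.lower())


def keyword_matrix(titles, vocabulary):
    """Build a document-term count matrix restricted to `vocabulary`.

    Returns (terms, rows), where terms is the sorted vocabulary and
    rows[i][j] is the number of times terms[j] occurs in titles[i].
    """
    terms = sorted(vocabulary)
    rows = []
    for title in titles:
        # Keep only tokens found in the dictionary (stand-in for the
        # WordNet lookup described in the text).
        counts = Counter(t for t in tokenize(title) if t in vocabulary)
        rows.append([counts[term] for term in terms])
    return terms, rows


# Hypothetical item titles and a toy dictionary.
titles = [
    "Original Shampoo and Conditioner gift set",
    "ORIGINAL movie poster, signed",
]
vocabulary = {"original", "shampoo", "conditioner", "poster"}
terms, matrix = keyword_matrix(titles, vocabulary)
```

Each row of `matrix` could then be augmented with structured auction attributes before being passed to a topic model, in the spirit of the two-step approach described above.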
The estimation and counterfactual results can be found in the online companion. Overall, my insights derived from the main model do not change significantly.

²³ I did not use LDA in the estimation procedure, mainly because of its detrimental effect on the run time.
²⁴ If I use the k-means algorithm, I observe that the within-group sum of squares uniformly decreases as the number of clusters increases (see Figure B.2 in the Appendix). However, I used 50 clusters, since some clusters become highly sparse if I use more than 50 clusters.
²⁵ Except one (outlier) cluster that, on average, lasts one day with an average of one bid and one bidder.

Second, to check the robustness of my results to the clustering method used for the bidders, I tested using a k-means algorithm instead of mixture normal fuzzy clustering. Appendix B presents the change in the within-group sum of squares based on the number of clusters using the k-means algorithm,²⁶ and compares the summary statistics of clusters derived by both approaches. These analyses show that both clustering methods provide similar clusters for bidders. The optimal MAP in the case of the k-means algorithm is estimated to be -103,428,091, which is 9.7% less than the estimated main model MAP. Furthermore, my main insights are not affected significantly in this case either.²⁷

2.9. CONCLUSION

In this paper, I developed a structural model that accounts for bidders' learning and their anticipation of winner and loser regrets in an auction platform and proposed an empirical Bayesian estimation method to calibrate the parameters of this model. Then, using a large data set from eBay, I showed that bidders anticipate significant levels of regret in various product categories. My results also demonstrate that experience can explain the heterogeneity in the bidders' learning, updating, and regretting behavior.
I further illustrated how the estimated model can be used to analyze a counterfactual scenario where the auction platform shuts down the bidders' winner regret. This counterfactual analysis shows that, if eBay can shut down the winner regret of bidders by appropriate notification policies, it can increase its revenue by 24%. I believe that my modeling approach, proposed estimation method, and derived empirical insights in this paper can be of interest to both practitioners and scholars in academia.

²⁶ The elbow of the curve in Figure B.1 is around 50, which suggests that the range of the optimal number of clusters that the BIC criterion suggests in mixture normal clustering (47 clusters) is robust.
²⁷ Estimation results of this model are available upon request from the author.

CHAPTER 3
MEASURING GAMIFICATION ELEMENTS’ EFFECTS ON USER CONTENT GENERATION: AN EMPIRICAL STUDY OF STACKOVERFLOW’S TWO SIDED PLATFORM’S BIG DATA

Meisam Hejazi Nia
Naveen Jindal School of Management, Department of Marketing, SM32
The University of Texas at Dallas
800 W. Campbell Road
Richardson, TX, 75080-302

3.1. ABSTRACT

The cornerstone of the new marketing era consists of user generated content. This information is useful for reducing consumer uncertainty, generating ideas for new products, and managing the customer relationship. To motivate users to generate content, practitioners use video game elements such as badges, leaderboards, and reputation points for user achievements, in an approach called Gamification. To allow Gamification platforms to target their users, I profile user segments by an ensemble method over LDA, mixed-normal, and k-means clustering, and then I develop a model of state-dependent choices of content generating users.
This model captures the long-tail distribution of user heterogeneity by a Dirichlet Process, and investigates the effects of fun and social elements of Gamification, reputation points, rank in the leaderboard, and badges (i.e., gold, silver, bronze) on the users’ probabilities to contribute content. I used a big data set of approximately 11,000,000 choices made by 36,000 users across 250 days on StackOverflow to estimate the mixed binary logit model of users’ content contribution choices. I show that estimating the model on smaller random samples generates biased results. The estimation results demonstrate that users show heterogeneous significant positive and negative inertia, reciprocity, intrinsic motivation, and responses to badges, reputation points, and leaderboard ranks. I found interesting sensitivity patterns to Gamification elements for users of different nationalities, which allows the Gamification platform to create targeted messages. The counterfactual analysis suggests that the Gamification platform can increase the number of contributions by making earning badges more difficult.

Keywords: Gamification, user generated content, mixed logit with DP prior, semi-parametric Bayesian, ensemble segmentation, targeting

3.2. INTRODUCTION

An underutilized marketing resource is user generated content, a type of online content that customers generate and use. User generated content is an important tool for generating word-of-mouth buzz, collecting new product development ideas, decreasing consumers’ uncertainty about experience goods, engaging brands, and managing customer relationships. However, to use this resource effectively, marketers might need to know how to motivate users to generate more favorable and high quality content.
To motivate the users, marketing practitioners have started to use video game concepts such as badges and points for user achievements, and leaderboards for user popularity, in a method called Gamification. According to Gabe Zichermann, the author of “Game-Based Marketing”, Gamification is the use of game play mechanics for non-game applications (Zichermann and Linder 2010). In other words, Gamification is the process of using game thinking and mechanics to engage an audience and solve problems (Van Grove 2011). Studies show that game play itself stimulates the human brain (releasing dopamine), so Gamification aims to bring the proven mechanics from gaming into marketing (Bosomworth 2011). Gartner predicts that by 2016, Gamification will be a vital tool for brands’ and retailers’ customer loyalty and marketing. However, this report highlights that firms are skeptical about the longevity and the real efficiency of Gamification as a tool to motivate customers.²⁸

As a result, given the interest in and skepticism about the effects of Gamification mechanics on motivating users, this study asks the following questions: How can the choices of consumers in response to Gamification mechanics be modeled? How should emotional elements such as fun be weighed in relation to the mechanical elements, such as badges and leaderboards? How can one design a scalable and flexible targeting approach that is feasible on massive streaming Gamification platform data? Are the social aspects of public good contributions, such as reciprocity and reputation, important in motivating users to provide content in a gamified context?

Answering each of these questions helps the Gamification platform to form a different targeted policy to increase the users’ content contributions. For example, depending on whether badges are good or bad motivators, the Gamification platform might modify the thresholds of earning them.
As points sum up to build the users’ reputation, depending on whether different users respond positively or negatively to their reputation changes, the Gamification platform can send a customized list of tasks with different difficulty levels to users. In the customized list, the Gamification platform might prioritize tasks to make sure that the community replies to the requests of target users who are positively reciprocal. In addition, given that Gamification is about user empowerment, the Gamification platform might want to send positive empowering messages to failed users who have high inertia.

²⁸ Gartner's Gamification predictions for 2020. Growth Engineering website. http://www.growthengineering.co.uk/future-of-gamification-gartner/. Accessed June 7, 2015.

To respond to these questions and take into account the emotional nature of the motivation process, the current study builds its model in the light of the state-dependent utility model in the consumer choice literature. In particular, I included in the state-dependent utility model the elements that might define the observed motivation state of users. A user decides whether or not to contribute based on this utility. First, I included in the model a heterogeneous stimulation level in the form of user-specific random effects, guided by the studies in the consumer behavior literature (Mittelstaedt 1976; Joachimsthaler and Lastovicka 1984; Steenkamp and Baumgartner 1992). Second, I considered the number of badges in different categories (i.e., gold, silver, bronze) to have different effects, guided by the Gamification literature (Wei et al. 2015; Li et al. 2015; Deterding 2012; Antin and Churchill 2011; etc.). Furthermore, I allowed the users of different segments to respond differently to the same type of badges.
I considered the social aspect of users’ decisions at two levels: first, the reciprocity and the reputation points at the level of the user's utility state; and second, the reach of users at the hierarchical level, guided by the literature on behavioral aspects of decision making (Bolton et al. 2013; Bolton et al. 2004; Yoganarasimhan 2013; Lee and Bell 2013; Toubia and Stephen 2013). I also considered that the effects of badges and reputation points in motivating users might be different in the short and long term, similar to the effect of loyalty program rewards and promotions in the marketing literature (Liu 2007; Jedidi et al. 1999; Mela et al. 1997; Lewis 2004).

To estimate the model, I use a data set that I scraped from StackOverflow with my Python crawler. The data set includes approximately 11,000,000 contribution choices of 36,000 users over a course of approximately 230 days. StackOverflow is a question and answer website, where registered users can post their programming questions, and the other community members can respond. The StackOverflow business model is based on traditional job listing, curriculum vitae search, and unobtrusive advertising. It uses Gamification concepts such as reputation points, badges, and a leaderboard to motivate its users. Community members can up-vote or down-vote a question or an answer, and StackOverflow keeps track of the votes a user receives as reputation points. The platform (i.e., StackOverflow) uses these votes later as a measure to define who receives badges at gold, silver, and bronze levels in different knowledge domains. These domains are specified by tags that a user attaches to the question. In addition, these reputation points define the rank of each user on the leaderboard.
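The state-dependent contribution decision and the Gamification elements described above can be combined in a small illustration. The following is a minimal sketch of a binary logit choice probability, assuming a linear utility in the listed elements. The covariate names, coefficient values, and intercept are hypothetical placeholders chosen for illustration; they are not estimates from this study, and the sketch omits the hierarchical Dirichlet process layer.

```python
import math


def contribution_probability(features, coefficients, intercept):
    """Binary logit probability that a user contributes in a period.

    utility = intercept + sum_k coefficients[k] * features[k]
    P(contribute) = 1 / (1 + exp(-utility))
    """
    utility = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-utility))


# Hypothetical user-specific coefficients (assumed values).
coefficients = {
    "contributed_last_period": 0.8,  # inertia (state dependence)
    "answers_received": 0.3,         # reciprocity
    "gold_badges": -0.2,
    "silver_badges": 0.4,
    "bronze_badges": 0.1,
    "log_reputation": 0.05,
    "leaderboard_rank_pct": -0.5,
}
intercept = -2.0  # user-specific random effect (baseline stimulation level)

# A user with all covariates at zero, and the same user after a
# contribution in the previous period.
base = {name: 0.0 for name in coefficients}
p_idle = contribution_probability(base, coefficients, intercept)
p_after_contribution = contribution_probability(
    {**base, "contributed_last_period": 1.0}, coefficients, intercept
)
```

With a positive inertia coefficient, `p_after_contribution` exceeds `p_idle`; a user with negative inertia would show the opposite pattern, which is the kind of heterogeneity the user-specific parameters are meant to capture.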
I selected StackOverflow as the source of the data for this study because it implements successful Gamification mechanics on its question and answer platform (e.g., Antin and Churchill 2011; Wei et al. 2015; Li et al. 2015). To use this data, I wrote a web crawler, and I synthesized the data from various web pages based on the user identity.

Estimating the model over the big data set allows the Gamification platform to target its policies effectively, if the estimates capture the heavy tail of the distribution of user-heterogeneity parameters. I employed a mixed binary logit model with a hierarchical Dirichlet process, which allows the number of response parameters to increase with sample size. Allowing the number of parameters to increase with the sample size allows the estimation procedure not only to learn the tail more effectively, but also to learn more about the infinitely complex real phenomena as more data becomes available. To the best of my knowledge, no studies in marketing have estimated a choice model over such a big data set. Instead, marketing scholars resort to a linear-probability data-fitting approach to estimate consumers’ parameters (Goldfarb and Tucker 2011). An alternative approach is to sample from the data, but throwing away data might not be a relevant strategy for targeting. I showed the estimates for the model using samples with different sizes. The results showed that estimating the model over smaller sample sizes results in biased estimates. As a result, the importance of a quick, flexible, and scalable method is highlighted.

The results show that users can be segmented into competitors (20%), collaborators (21%), achievers (25%), explorers (11%), and uninterested (22%) users. The users show heterogeneous significant positive and negative responses to the badges, leaderboard ranks, and reputation points.
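The property that a Dirichlet process prior lets the number of distinct parameter values grow with the sample size can be illustrated by simulating the Chinese restaurant process, the standard clustering representation of a Dirichlet process. The sketch below is generic and is not the estimation procedure of this study; the concentration parameter `alpha`, the seed, and the checkpoint sizes are arbitrary assumptions.

```python
import random


def crp_cluster_counts(n_customers, alpha, checkpoints, seed=0):
    """Simulate a Chinese restaurant process.

    Returns a dict mapping each checkpoint n to the number of occupied
    clusters after the first n observations.
    """
    rng = random.Random(seed)
    sizes = []   # sizes[k] = number of observations in cluster k
    counts = {}
    for n in range(1, n_customers + 1):
        # Observation n opens a new cluster with probability
        # alpha / (alpha + n - 1); otherwise it joins an existing
        # cluster with probability proportional to that cluster's size.
        draw = rng.uniform(0.0, alpha + n - 1)
        if not sizes or draw < alpha:
            sizes.append(1)
        else:
            draw -= alpha
            for k, size in enumerate(sizes):
                if draw < size:
                    sizes[k] += 1
                    break
                draw -= size
            else:
                sizes[-1] += 1  # guard against floating-point edge cases
        if n in checkpoints:
            counts[n] = len(sizes)
    return counts


counts = crp_cluster_counts(
    2000, alpha=1.0, checkpoints={10, 100, 2000}, seed=1
)
```

Under this process the expected number of clusters after n observations is the sum of alpha / (alpha + i - 1) for i = 1, ..., n, which grows roughly like alpha times log(n). New clusters therefore keep appearing as more users are observed, but at a decreasing rate.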
In addition, users show heterogeneous significant positive and negative inertia, intrinsic motivation, and reciprocity. These results suggest that the Gamification platform can condition its targeted messages on the users’ responses to increase their content contributions. In particular, my results identify that certain nationalities are sensitive to certain Gamification elements. For example, American users show significant inertia and increase their contribution when earning silver badges, but decrease their contribution when their reputation is greater. However, European users increase their contribution when their reputation is greater, but they decrease their contribution when they earn gold badges.

Given the estimated parameters and the double-edged effects of the badges, I used a counterfactual analysis to study the effect of modifying the threshold of badges on the response of users. The results suggested that the Gamification platform may want to increase the thresholds of earning the badges rather than decreasing them, to make badges harder to achieve. This recommendation parallels the recommendation in studies on loyalty program effectiveness in the marketing literature that suggest increasing the reward threshold is a good choice. In the Gamification context, this decision is important because badges are once-in-a-lifetime elements, without an expiry date.

In summary, the current study contributes to the literature in marketing in the following ways: First, although a stream of literature in marketing focuses on various factors that affect the valence of user generated content and its impact in reducing customer uncertainty (e.g., Weiss et al. 2008; Moe and Schweidel 2012; Godes and Silva 2012; Mallapragada et al. 2012), the user motivation to contribute content is understudied.
The current study tries to narrow this gap by determining which Gamification elements can drive motivation of users to contribute content. Second, although many practitioners and social psychologists emphasize the role of Gamification as a motivator (Wu 2011; Deterding 2012; Conejo 2014), quantitative measures of Gamification elements such as badges and leaderboards, which would help the Gamification platform to target its policies, are understudied. Two studies in progress, by Li et al. (2015) and Wei et al. (2015), use a difference-in-differences design and a hidden Markov model, respectively, to identify such effects. However, both of these studies assume that the users select the number of contributions, rather than whether to contribute or not. Also, these studies do not account for heterogeneity in users’ responses to the Gamification elements. Therefore, in the current study I modeled the binary choice of the users while allowing for state dependence and heterogeneity. Finally, I use an ensemble method over LDA, mixture normal, and k-means clustering methods to profile user segment behaviors, and a mixed binary logit model with a hierarchical Dirichlet process prior to recover user-specific parameters. These contributions should be of interest to both practitioners and scholars.
3.3. LITERATURE REVIEW
This study draws upon several streams within the literature, including: (1) User Generated Content (UGC); (2) Gamification mechanisms and rewards in loyalty programs; (3) Optimal stimuli level and state dependent choice models; (4) Behavioral aspects of decision making (altruism, reciprocity, endowment effect, etc.). Given the breadth of these areas across multiple disciplines, what follows is only a brief review of these relevant streams. Table 3.1 presents a list of relevant studies in each literature stream.
Table 3.1.
The relevant streams of literature in five clusters
Research Area | References
User Generated Content and free rider problem | Mallapragada et al. (2012); Godes and Silva (2012); Moe and Schweidel (2012); Chevalier and Mayzlin (2006); Chaudhuri (2011); Chen (2008); Weiss et al. (2008).
Gamification elements, Mechanism, and Loyalty | Li et al. (2015); Wei and Zhu (2015); Conejo (2014); Bittner and Shipper (2014); Salcu and Acatrinei (2013); Roth and Schneckenberg (2012); Kopalle et al. (2012); Zhang and Breugelmans (2012); Wu (2011); Zichermann and Cunningham (2011); Pink (2009); Liu (2007); Shugan (2005); Kivetz and Simonson (2002); Bolton et al. (2000).
State dependent choice model, and optimal stimuli | Dubé et al. (2008); Seetharaman (2004); Seetharaman (1999); Guadagni and Little (1983); Steenkamp and Baumgartner (1992); Joachimsthaler and Lastovicka (1984); McAlister (1982); Mittelstaedt et al. (1976); Lewis (2004); Jedidi (1999).
Behavioral aspect of Decision Making, Altruism, reciprocity | Toubia and Stephen (2013); Lee and Bell (2013); Yoganarasimhan (2013); Bolton et al. (2004); Andreoni (1990); Cornes and Sandler (1994); Bolton et al. (2013); Churchill (2011); Chen et al. (2010); Raban (2009); Chiu et al. (2006); Ren and Kraut (2011); Tedjamulia et al. (2005).
Big Data Estimation Methods | McMahan et al. (2013); McMahan (2011); Genkin et al. (2007); Le Cessie and Van Houwelingen (1992); Murphy (2012).

3.3.1. User Generated Content (UGC)
User generated content in marketing refers to content that is both produced and consumed by the same consumers, for example questions and answers, blogs, Twitter, social networks, and YouTube videos (Mallapragada et al. 2012). UGC can also be considered as a form of public good, because one cannot exclude others from using it after and during usage. Marketing and economics scholars have studied UGC from two perspectives: consumption and production.
From the consumption perspective, Chevalier and Mayzlin (2006) find that UGC can have a positive effect on sales. In addition, Weiss et al. (2008) find that the consumers’ goal and the social history of the producer affect how consumers perceive the value of UGC. From the production perspective, Godes and Silva (2012), Moe and Schweidel (2012), and Mallapragada et al. (2012) suggest that the UGC creation process is subject to selection bias due to social influence and the heterogeneity in preferences of the product adopters who enter in different order and at different times. Although these studies are useful, none of them discuss how the firm can affect the UGC creation process by motivating users. The public good literature in experimental economics fills the gap by studying these incentive compatible mechanisms (Chen 2003). By relaxing strong rationality assumptions, these studies find that punishing altruistically and monetarily, grouping likeminded individuals, and passing advice across generations can motivate the users (Chaudhuri 2011). Although these studies are helpful, they neglect that users’ emotions can also be relevant. In particular, psychological studies emphasize that having fun, earning virtual rewards, setting goals, and empowering can also motivate users to contribute UGC. In the current study I focused on quantifying the effect of such psychological factors on the users’ choice to generate content, in a gamified context.
3.3.2. Gamification mechanisms and rewards in loyalty programs
To motivate users in a non-gaming environment, Gamification uses elements from video games (Bittner and Shipper 2014). Gamification elements can be classified according to three categories: dynamics, mechanics, and components (Zichermann and Cunningham 2011). Game dynamics involve personal-psychological elements of the sense of progression, emotions, relationships, and narratives.
Game mechanics involve social-psychological elements, including feedback, rewards, competition, cooperation, and transactions. Game components include achievements, levels, points, badges, leaderboards, and virtual goods. Studies relevant to Gamification can be classified into two groups: quantitative and qualitative studies of general Gamification, and the Gamification role in marketing. Only two studies quantify the effect of Gamification elements. First, Li et al. (2015) use a reduced-form difference-in-differences model to identify the aggregate level effect of badges on users’ number of content contributions. Second, Wei et al. (2015) aim to quantify these effects structurally with a Hidden Markov Model (HMM), but they consider that users plan how many contents to contribute, rather than plan at each point whether to contribute or not (i.e. binary choice). Although helpful, both of the studies fail to control for the users’ heterogeneity and effects of users’ inertia. However, the qualitative studies of Gamification emphasize the role of user inertia. These studies emphasize the psychological need of consumers to experience pleasure, fun and empowerment, based on self-determination theory (Wu 2011). Unlike monetary rewards, fun and pleasure are process-focused motivators, rather than outcome-focused motivators (Shen et al. 2015). In Gamification, outcome-focused motivators include social status and reputation. Gamification captures these outcome-focused motivators in points, leaderboard, and badges (Deterding 2012). These studies are helpful, but for marketing purposes, a Gamification platform needs a formal model to quantify the effect of Gamification elements to target users based on their behavior. Marketing scholars have studied Gamification elements qualitatively. Bittner and Shipper (2014) show in a case study that Gamification is useful for advertising.
Salcu and Acatrinei (2013) discuss a case study in which Gamification has worked in an affiliate marketing program. Roth and Schneckenberg (2012) find that Gamification is useful for innovation and creativity. Conejo (2014) posits that Gamification can revolutionize loyalty programs. In particular, Gamification differs from conventional loyalty programs given its emphasis on fun, meaningfulness, and empowerment, in addition to point rewards. The loyalty programs’ point rewards have been the subject of many marketing studies (Bolton et al. 2000; Shugan 2005; Kivetz and Simonson 2002; Kopalle et al. 2012; Zhang and Breugelmans 2012; Liu 2007; Jedidi et al. 1999; Mela et al. 1997; Lewis 2004). These studies support a model of Gamification elements that controls for heterogeneity and short- and long-term effects, but they do not quantify the effects of Gamification elements on users’ content contribution.
3.3.3. Optimal stimuli level and state dependent choice models
The inertia and state dependence that qualitative Gamification studies suggest are the subject of two groups of studies in consumer research and marketing science. Consumer research scholars emphasize that, to engage in exploratory behavior, consumers need to be emotionally motivated until they reach their heterogeneous optimal stimulation level (Steenkamp and Baumgartner 1992; Joachimsthaler and Lastovicka 1984; Mittelstaedt et al. 1976; Seetharaman et al. 1999; McAlister 1982). Marketing science scholars use fixed effects to model this optimal stimulation level, and they use lagged instant or cumulative choices to control for heterogeneous users’ state dependence (Guadagni and Little 1983; Dubé et al. 2008). These studies further emphasize that the modeler should allow for a flexible heterogeneity structure to avoid confounding state dependence with heterogeneity.
Building on the above research, this study adopted a heterogeneous agent-based state-dependent choice model, rather than the rule-based simulation approach that Ren and Kraut (2011) adopt, to run counterfactual Gamification policies.
3.3.4. Behavioral aspects of decision making
Gamification policies aim to motivate users to behave in certain ways, for example to create content. Online content can be considered as a form of public good, because upon publishing its consumption cannot be controlled. The behavioral economics literature concludes that the impure altruism model can explain public good creation (Cornes and Sandler 1992; Andreoni 1990). In particular, the impure altruism model considers that users might have both private and public incentives to contribute. In the marketing literature, Toubia and Stephen (2013) classify users’ heterogeneous factors of utility to contribute content into two groups: intrinsic and extrinsic (or image and prestige related). Tedjamulia et al. (2005) argue that, under specific circumstances, the extrinsic factors can affect intrinsic factors, either positively by internalization or negatively by over-justification. Chiu et al. (2006) argue from a social psychological perspective that content creation is more influenced by intrinsic factors such as fun and playfulness than by extrinsic factors such as reciprocity and reputation. However, many other studies emphasize the importance of social capital and reputation, as a substitute for monetary incentives, in content creation (Raban 2008; Chen et al. 2010; Toubia and Stephen 2013). Furthermore, Antin and Churchill (2011) discuss that Gamification badges and leaderboards can influence extrinsic factors such as reputation, status affirmation, and group identification, and intrinsic factors such as goal setting and instructions.
All in all, many marketing modeling studies have emphasized the role of social factors such as reputation and reciprocity on users’ choices in various contexts (Bolton et al. 2004; Bolton et al. 2013; Banks et al. 2002; Yoganarasimhan 2013; Lee and Bell 2013). However, no study has yet modeled individual users’ content creation choices in terms of Gamification policies that aim to motivate users both intrinsically and extrinsically. The current study aims to narrow this gap.
3.4. DATA
I collected the data for this study from StackOverflow, because many studies find that it provides a successful Gamification application (e.g., Antin and Churchill 2011; Wei et al. 2015; Li et al. 2015). StackOverflow is an open online platform for professional and enthusiast programmers. It was founded in 2008 by a firm which later established Stack Exchange, a network of question and answer websites focused on diverse topics (ranging from physics to writing) and modeled after StackOverflow. In 2014, StackOverflow had 4 million users, and among these users, 77% asked and 65% answered questions. During this period, they generated 11 million questions and displayed an exceptional level of heterogeneity in their content creation. For example, only 8% of the users answered more than 5 questions. StackOverflow raised $6 million of venture capital in 2010, and its business model is based on three key activities: job listing (like traditional classified advertising), Curriculum Vitae search, and unobtrusive display advertising. The platform is rigid in its focus, requesting the users to ask only questions relevant to its topic and refrain from raising questions that are opinion based or lead to open-ended chat. The moderators monitor the violation of this rule through the community members’ reports.
The community of content creator users plays a key role in managing StackOverflow’s day-to-day activities, but the community notifies moderators in exceptional cases, for example when the etiquette is not preserved. StackOverflow selects lifetime moderators through a democratic voting procedure, but moderators can resign. According to StackOverflow, moderators act as liaisons between the user community and Stack Exchange. StackOverflow asks users to register and log in before asking a question, but to answer a question, a user must either sign up for an account or post as a guest, in which case the user must register her name and email address. On StackOverflow, a user who asks a question has the option to review the answers and to accept an answer as correct. In addition, everyone can vote up or down on either each question or answer. To be able to vote, the user must first register. The sum of all the up-votes and down-votes that a user receives for contributing contents (i.e. questions and answers) is called the user’s reputation points. Web surfers can observe a user’s reputation level on her profile page and the leaderboard. According to StackOverflow, reputation is a rough measure of how much the community trusts the user’s expertise, communication skill, and content quality and relevance. In addition to the weekly reputation information, the leaderboard presents the previous week’s information about the user’s rank and rank change. The leaderboard provides an informal way of tracking reputation within the community. It acts like a leaderboard of a league, and it only tracks the users’ points above a threshold of 200 points. Contributing users can also earn badges as hallmarks of their achievements. Badges have three categories or levels: gold, silver, and bronze. In addition, badges are specific to knowledge domains.
In this study, I refer to this detail by using a domain knowledge tag because it is directly relevant to the tag that the user attaches to her question. In particular, when a user posts a question, the StackOverflow platform requires her to attach a relevant tag to make the question appear to the relevant audience, or sub-community. The total up-votes a user earns by answering the questions relevant to the tag nominate a user for tag badges. The thresholds for gold, silver, and bronze badges are 1000, 400, and 100 respectively, fixed across the tags (or knowledge domains). Table 3.2 presents a sample of these badges.

Table 3.2. Sample set of badges in different knowledge domains (tags)
Tag Badge Name | Type | Definition
vcl | Bronze | Earn at least 100 total score for at least 20 non-community wiki answers in the vcl tag.
entity-framework | Silver | Earn at least 400 total score for at least 80 non-community wiki answers in the entity-framework tag.
r | Gold | Earn at least 1000 total score for at least 200 non-community wiki answers in the r tag. These users can single-handedly mark r questions as duplicates.
ggplot2 | Silver | Earn at least 400 total score for at least 80 non-community wiki answers in the ggplot2 tag.
statistics | Bronze | Earn at least 100 total score for at least 20 non-community wiki answers in the statistics tag.
regex | Bronze | Earn at least 100 total score for at least 20 non-community wiki answers in the regex tag.
linux-kernel | Silver | Earn at least 400 total score for at least 80 non-community wiki answers in the linux-kernel tag.
sql-server-2008 | Gold | Earn at least 1000 total score for at least 200 non-community wiki answers in the sql-server-2008 tag. These users can single-handedly mark sql-server-2008 questions as duplicates.
html-helper | Silver | Earn at least 400 total score for at least 80 non-community wiki answers in the html-helper tag.
google-app-engine | Silver | Earn at least 400 total score for at least 80 non-community wiki answers in the google-app-engine tag.

I collected the data for this study by automatically scraping the StackOverflow website. The sample includes the 36,915 users who appeared in the leaderboard during the first week of June 2014, and identified themselves by an English name that can be captured by the Python crawler and scraper code. I observed users’ choices including choices to comment, review, revise, accept, and post answers. Table 3.3 presents the definition of each of the activities that I observed from users. The activity that I am interested in modeling is the aggregate number of users’ contributions, which includes commenting for clarification, answering a question, and revising an answer. To control for potential reciprocity of users, I include the total number of accepted posts, reviews, and asking activities in the reciprocity variable.

Table 3.3. Type of activity, description and inclusion in the dependent or independent variable
Activity Name | Observations in data set | Type (Proxy) | Definition
Comment | 1,995,665 | Dependent (Choice) | Includes the activity of asking for clarification, suggesting a correction, or providing meta information about the post (so as not to be confused with an answer). A comment is short (600 characters), supports only limited Markup and URLs, is disposable, has no revision history, and can be deleted by the moderator without warning to the author.
Accepted | 80,446 | Independent (Reciprocity) | Includes the activity of the questioner to review the answers and only accept the one that she finds suitable.
Post Answered | 671,772 | Dependent (Choice) | Includes the activity of answering the question that is raised on the platform.
Review | 1,017,029 | Independent (Reciprocity) | Includes the activity of the questioner to review the answers that are posted to her question.
Post Asked | 129,526 | Independent (Reciprocity) | Includes the activity of the questioner to ask a question.
Revision | 812,992 | Dependent (Choice) | Includes the activity of revising the answering post that is raised on the platform.

I collected these choices’ data for 238 days of the sample period, namely from June 2014 to January 2015. In the sample, I observed 11,276,186 users’ choices. Table 3.4 shows the frequency of users’ declaration of their website and nationality. Given the identifier of the users, I also collected the leaderboard information, namely the total reputation points per week, the weekly reputation, the rank, and the rank change, to synthesize with the main data. In addition, I collected the history of each user’s badge earnings.

Table 3.4. Sample Observations’ statistics
Observations | Frequency
Users | 36,915
Website | 13,194
USA | 9,434
UK | 2,362
Australia | 1,133
India | 2,142
Europe | 7,142
Asia | 482
South America | 659
China | 208
Middle East | 892

Figure 3.1 presents the total number of content contributions taken from a sample of four users over time. As can be seen, considerable heterogeneity exists in the users’ content contribution levels, and on some days, users do not contribute at all. The number of zero contributions of different users, and the heterogeneity in the number of the contributions, might offer evidence to explain why a simple regression and a homogeneous response model might not return unbiased estimates. In addition, a great number of non-contribution choices might suggest that the user thinks more about whether to contribute than about how much to contribute.
For each of the users, I also scraped the profile information: tenure, last seen date, the number of profile views, reputation, the number of gold, silver and bronze badges earned, the number of answers, the number of questions, the total amount of reach (i.e.
approximate total number of people who viewed the user’s posts), user’s website, and user’s country. Table 3.5 shows the basic statistics of these variables before and after the observation period. The average number of reputation points, badges, questions and answers increased between 10% and 30%. To better understand the heterogeneity in users’ behavior, I segmented users by clustering their observable cross-sectional profile information for the pre-study period. These profiles consist of binary indicators and count data. There are various methods to cluster this data. K-means partition-based clustering might be relevant because it employs a similarity index based on Euclidean distance. Mixture Normal fuzzy model-based clustering might be relevant for its assumption that observations of a given cluster are noisy measures of the cluster center. Finally, the Latent Dirichlet Allocation (LDA) model-based clustering method might be relevant for its assumption that each profile attribute might be relevant contingent on the cluster that the observation belongs to. Hubert and Arabie (1985) suggest the adjusted Rand index for comparing clustering results. This method compares the pair assignments of two clustering results and, by assuming a generalized hypergeometric distribution, creates an index that is bounded above by 1 under perfect agreement and has an expected value of 0 under random partition (Yeung and Ruzzo 2001).
Figure 3.1. Contribution of a sample of four users over time
To have computational tractability, I randomly sampled 20,000 from 36,000 users for clustering, and ran three clustering methods to segment these users. The elbow measure of the within-cluster sum of squares suggested that k equal to 30 is optimal for the k-means clustering method.
The Bayesian Information Criterion (BIC) measure suggested that five clusters are enough for the mixture normal fuzzy clustering method, and the log likelihood measure suggested that 13 clusters in the LDA method represent the data better. I used the adjusted Rand measure to compare the partition membership results of these three methods. Table 3.6 presents the result of this comparison. Mixture Normal and K-means clustering generate more similar results than the other pairs.

Table 3.5. Sample Observations’ basic statistics
Variable | Pre AVG | Pre SD | Pre Min | Pre Max | Post AVG | Post SD | Post Min | Post Max
Reputation Points | 6,213.99 | 17,650.15 | 1 | 685,463 | 8,066.61 | 21,221.38 | 1 | 773,020
Number of Gold Badges | 5.51 | 8.04 | 2 | 301 | 6.70 | 9.19 | 2 | 344
Number of Silver Badges | 20.98 | 44.88 | 2 | 4,597 | 24.83 | 50.68 | 2 | 5,233
Number of Bronze Badges | 40.86 | 66.08 | 2 | 5,951 | 46.30 | 71.74 | 2 | 6,507
Number of Answers | 188.86 | 527.00 | 0 | 29,950 | 207.56 | 574.53 | 0 | 31,537
Number of Questions | 36.22 | 73.04 | 0 | 1,688 | 40.30 | 78.05 | 0 | 1,737

Table 3.6. Adjusted Rand Index measure for clustering agreements
 | LDA | Mixture Normal | K-means
LDA | 1.0000 | |
Mixture Normal | 0.0005 | 1.0000 |
K-means | 0.0000 | 0.3420 | 1.0000

As it is not clear which method to choose, I used an ensemble method (similar to Strehl and Ghosh 2003) to create results that are more robust to the type of clustering method, rather than using each of these methods on its own. In this approach, I combine the results of cluster membership of observation pairs using a hierarchical agglomerative clustering method. For ease of exposition and interpretability, I cut the tree at the level with five clusters.
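The agreement scores in Table 3.6 follow the adjusted Rand index of Hubert and Arabie (1985). As an illustration only, the index can be computed from pair counts as in the minimal Python sketch below; the function name `adjusted_rand_index` and the toy labelings are assumptions made for this example and are not the code used in the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two partitions of the same observations."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same observations")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:  # fewer than two observations: nothing to compare
        return 1.0

    # Contingency counts: observations in cluster u of A and cluster v of B.
    joint = Counter(zip(labels_a, labels_b))
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)

    # Number of observation pairs that are placed in the same cluster.
    pairs_joint = sum(comb(c, 2) for c in joint.values())
    pairs_a = sum(comb(c, 2) for c in sizes_a.values())
    pairs_b = sum(comb(c, 2) for c in sizes_b.values())

    # Expected pair agreement under the generalized hypergeometric model.
    expected = pairs_a * pairs_b / total_pairs
    max_index = 0.5 * (pairs_a + pairs_b)
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


# Hypothetical toy labelings of four users.
same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index depends only on which pairs of observations are grouped together, so renaming the clusters leaves it unchanged, which is what makes it suitable for comparing partitions produced by different methods.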
To interpret the segments, this study adopts the terminology of Gamification to classify the users into four groups based on whether the focus is on action versus interaction, or on context versus players (Wu 2012): collaborators (focus on interaction and players), competitors (focus on action and players), explorers (focus on context and interaction), and achievers (focus on context and action). Table 3.7 presents the definition of each of the segments and the proxy variable relevant to the context.

Table 3.7. Gamification Segment Names
User Segment Name | Focus | Proxy variables | Definition
Competitors | Player & Action | High level of reputation and badges | Users will go to great lengths to achieve rewards that confer them little or no gameplay benefit simply for the prestige of having it.
Collaborators | Player & Interaction | High number of answers | Users gain the most enjoyment from a game by interacting with other users, and on some occasions, computer-controlled characters with personality.
Explorers | Context & Interaction | High number of questions | Users find great joy in discovering an unknown glitch or a hidden Easter egg.
Achievers | Context & Action | Personal site in the profile | Users thrive on competition with other users, and prefer fighting them to scripted computer-controlled opponents.

Table 3.8 presents the behavioral segmentation result based on the ensemble method. Users in the collaborator segment make up 21% of the sample and show a high level of answering activity, although they have not earned significantly more reputation points and badges. Users in the achiever segment make up 25% of the sample and declare their website more than users in other segments, and all of them are American. Users in the explorer segment make up 11% of the sample and ask significantly more questions than others.
Users in the competitor segment make up 20% of the sample and have earned more badges and reputation points than others. Finally, I identified users in the uninterested segment, who make up 22% of the sample, do not declare their nationality, and behave poorly relative to other users with respect to all measures. This information can help the Gamification platform to target its customers.

Table 3.8. Gamification Segment Names (heat map configured at row level)
Segment Name | Collaborators | Uninterested Users | Achievers | Explorers | Competitors | Whole Sample
Segment Size | 0.21 | 0.22 | 0.25 | 0.11 | 0.20 | 1.00
Website | 0.44 | 0.00 | 0.51 | 0.38 | 0.46 | 0.36
USA | 0.00 | 0.00 | 1.00 | 0.02 | 0.01 | 0.26
UK | 0.29 | 0.00 | 0.00 | 0.00 | 0.00 | 0.06
Australia | 0.14 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03
India | 0.27 | 0.00 | 0.00 | 0.00 | 0.00 | 0.06
Europe | 0.00 | 0.00 | 0.00 | 0.01 | 0.97 | 0.19
Asia | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01
South America | 0.09 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02
China | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01
Middle East | 0.11 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02
Tenure | 1,584.87 | 1,333.64 | 1,788.74 | 1,691.32 | 1,623.44 | 1,601.01
Seen since | 30.26 | 22.65 | 33.75 | 366.40 | 25.60 | 66.70
Profile Views | 1,301.65 | 172.23 | 846.11 | 886.31 | 1,095.88 | 848.84
Reputation Points | 7,638.74 | 1,906.79 | 6,664.90 | 7,084.18 | 8,237.01 | 6,183.55
Gold Badges | 3.10 | 0.66 | 2.92 | 3.82 | 3.55 | 2.69
Silver Badges | 23.78 | 7.61 | 22.85 | 25.10 | 25.94 | 20.56
Bronze Badges | 46.31 | 18.68 | 43.90 | 48.57 | 49.88 | 40.57
Answers | 237.29 | 54.21 | 196.26 | 212.68 | 251.88 | 186.61
Questions | 33.78 | 19.30 | 30.29 | 60.24 | 36.46 | 33.24
Reach | 5,690,175.70 | 220,400.60 | 4,689,612.80 | 4,582,561.20 | 5,917,558.80 | 4,149,459.60

It is important to note that users fall within the continuum of these classifications, and the noted assignments only discriminate based on the strength of each of the proxy signals (i.e. number of questions, number of answers, level of reputation, declaration of a personal webpage), for ease of interpretation.
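The ensemble step behind these segments, which combines the pairwise cluster memberships from several clusterings and then cuts an agglomerative tree, can be sketched roughly as follows. This is a hedged illustration, not the study's implementation: the text does not state the linkage rule, so average linkage is an assumption here, and the function names `co_association` and `consensus_clusters` as well as the six-user toy labelings are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Share of base clusterings that place each pair of users together."""
    n = len(labelings[0])
    share = [[0.0] * n for _ in range(n)]
    for i in range(n):
        share[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        together = sum(1 for labels in labelings if labels[i] == labels[j])
        share[i][j] = share[j][i] = together / len(labelings)
    return share


def consensus_clusters(labelings, k):
    """Agglomerate on distance 1 - co-association and stop at k clusters.

    Average linkage is an assumption made for this sketch.
    """
    share = co_association(labelings)
    clusters = [[i] for i in range(len(share))]

    def distance(a, b):
        # Average pairwise distance between the members of two clusters.
        total = sum(1.0 - share[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters (p < q by construction).
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: distance(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = [0] * len(share)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


# Hypothetical memberships of six users under three base clusterings.
kmeans_labels = [0, 0, 0, 1, 1, 1]
mixture_labels = [0, 0, 1, 1, 2, 2]
lda_labels = [1, 1, 1, 0, 0, 0]
segments = consensus_clusters([kmeans_labels, mixture_labels, lda_labels], k=2)
```

The pairwise search makes this sketch far too slow for 20,000 users; it is meant only to show how disagreement among the base clusterings is averaged out before the tree is cut.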
Figure 3.2 presents the median and the quantile variations of the total number of answers that each user reviews or accepts (i.e. contribution received), the total reputation points of the user, her weekly reputation points and rank evolution across 238 days of the study. The variation in these variables further suggests that an aggregate non-agent-based model might miss the underlying dynamics in the data.
Figure 3.2. Evolution of Explanatory Variables Over time (Median shown in black, and the interval between the 25% and 75% quantiles is colored gray)
Finally, Figure 3.3 illustrates the evolution of the total number of gold, silver and bronze badges. The data reveals peaks in the number of badges earned. I did not find an exact explanation for the peaks in terms of platform changes to the thresholds, but, as the peak is not far from the beginning of the sample, it might be related to the seasonal summer period when the programmers have more free time to contribute. Another seasonal peak takes place in September, an occurrence which again might reflect another demand shock because Google shows the same type of StackOverflow search trend peak in both July and September. There is also a periodic structure in the evolution of the badges. The programming nature of the questions explains this cyclical pattern. Generally, the users on StackOverflow are more active during workdays than during weekends.
Figure 3.3. Evolution of Total number of Gold, Silver, and Bronze Badges Granted to Users in the Sample
Before discussing the model, it might be worthwhile to note that, to hold the data of 11,000,000 contribution choices, with approximately 42,000 heterogeneity parameters, the old tabular data structure is not a viable option for commodity computing devices.
Therefore, I used a big data tool, called Apache Spark, which builds on the map-reduce model of Hadoop, for data cleaning. The map-reduce structure simply creates a data flow of map and reduce operations. Map operations create assignment outcomes for each of the data points in parallel, across multiple machines (i.e. like applying a function to each data point). The reduce operation aggregates the outcomes of map operations, based on the group flags, in parallel, across multiple machines (i.e. like aggregate functions). The data structure of map-reduce consists of pairs of key (for grouping purposes) and value (the actual data). I used the same key-value pair structure of the map-reduce model, rather than the tabular structure. In particular, I kept each of the variables in a separate file with a key for user and time indices on separate lines. Furthermore, I used a sparse matrix structure to reduce the size of the badges’ explanatory variables data. Next, I present my proposed model.
3.5. MODEL
I start this section by explaining the choices of the Gamification platform. In particular, the Gamification elements that I considered include: fun element, badges, leaderboard, and reputation points. For example, a Gamification platform might work on the positive environment of social interaction between content producers and consumers, by putting emphasis on different contents (e.g., easy vs. hard, polite vs. not polite questions), to make the engagement more fun. It can also manipulate the threshold of earning badges, to make earning badges harder or simpler. In addition, a Gamification platform can send empowering messages to users whose rank falls on the leaderboard. To find the effect of each of these policies, the Gamification platform should measure the response of the users to the Gamification incentives.
In the context of this study, the choice of users to create content can take the following forms: to post an answer, to revise, or to comment on a question or an answer. As a result, I considered the outcome of the user choice positive if the user makes any of these choices, and negative if the user selects none. I recognize that, based on their unobserved state-dependent utility, the users contribute content to the platform by answering questions and by commenting on and revising answers. This state includes information about the number of contributions that users have made and the number of Gamification assets and recognitions (e.g., badges, leaderboard rank, and reputation points) they have earned recently and cumulatively. Formally, I define the state-dependent utility of user $i$ at day $t$ in week $w$ for contributing content in the following form:

$$U_{it} = \alpha_i + \gamma_{i1}\, cont_{it-1} + \gamma_{i2}\, rcv_{it-1} + \gamma_{i3}\, crep_{iw-1} + \gamma_{i4}\, rep_{iw-1} + \gamma_{i5}\, rnk_{iw-1} + \gamma_{i6}\, \Delta rnk_{iw-1} + \gamma_{i7}\, bdg_{it-1} + \gamma_{i8}\, cbdg_{it-1} + \varepsilon_{it} \tag{1}$$

where $\alpha_i$ denotes the fixed stimulation threshold parameter of user $i$ for contributing content. $cont_{it-1}$ denotes the total number of contents that user $i$ has contributed until day $t$. $rcv_{it-1}$ denotes the total number of answers and comments that user $i$ has received until day $t$ for the questions she has raised. $crep_{iw-1}$ denotes the cumulative number of reputation points user $i$ has earned until week $w$. $rep_{iw-1}$ denotes the number of reputation points user $i$ has earned at week $w-1$. $rnk_{iw-1}$ denotes the rank of user $i$ in the leaderboard published at the end of week $w-1$. $\Delta rnk_{iw-1}$ denotes the first-order rank difference of user $i$ between week $w-1$ and week $w-2$. $bdg_{it-1}$ denotes the vector of the number of gold, silver, and bronze badges user $i$ has earned at day $t-1$.
$cbdg_{it-1}$ denotes the vector of the cumulative number of gold, silver, and bronze badges user $i$ has earned until day $t$. $\varepsilon_{it}$ denotes the idiosyncratic unobserved utility error term for user $i$ at day $t$. $\Lambda_i = (\alpha_i, \gamma_{i1}, \ldots, \gamma_{i8})$ denotes the vector of individual-specific choice parameters to estimate.

Assuming that a contributor has a random state-dependent utility, and that the distribution of the random error term is extreme value, a logit function can model the probability of observing a user contribution. As a result, the likelihood of a user's multiple contributions in a day follows a binomial distribution. Similarly, a mixed logit function can model the choices of multiple users with heterogeneous choice parameters within the population. Next, I explain the rationale behind choosing the variables that might explain the observed states of the users, in terms of the Gamification components, inertia, and reciprocity.

The proposed utility model includes the user fixed effect $\alpha_i$ to capture users' heterogeneous optimal stimulation level, representing that users require motivation to contribute content (Salcu and Actrinei 2013; Mittelstaedt 1976; Joachimsthaler and Lastovicka 1984; Steenkamp and Baumgartner 1992). The total cumulative number of contributions $cont_{it-1}$ acts as a proxy for the fun that a user experiences. As a result, the lagged cumulative number of contributions might be a state variable that captures the effect of the fun elements of the Gamification platform. Furthermore, the amount of content received (i.e., answers to the posted questions) acts as a proxy for the social utility of the user. As a result, I included the lagged total number of asked questions, answers reviewed, and answers accepted by a user, $rcv_{it-1}$, as a proxy for the users' reciprocity state. Another proxy for the social utility of users to contribute content is the level of reputation points $crep_{iw-1}$, i.e.,
the number of up-votes a user has received for comments and answers (Bolton et al. 2013; Bolton et al. 2004; Yoganarasimhan 2013; Lee and Bell 2013; Toubia and Stephen 2013). As reputation points might have both instant and long-term effects, the utility of the user incorporates both the weekly level $rep_{iw-1}$ and the cumulative level of user reputation $crep_{iw-1}$ (Wei et al. 2015; Li et al. 2015). Another Gamification element that is a proxy signal for the social status of a user is the lagged absolute leaderboard rank $rnk_{iw-1}$ and the rank change $\Delta rnk_{iw-1}$. The latter might be relevant for a potential endowment effect. In other words, an individual might be regretful for losing last week's rank, or forgone social status.

Last but not least, badges might also affect users' motivations to contribute content, for both intrinsic (empowerment effect) and extrinsic (social status function) motivations (e.g., Antin and Churchill 2011; Wei et al. 2015; Li et al. 2015). The total number of badges earned in each badge category (i.e., Gold, Silver, Bronze), $cbdg_{it-1}$, and the number of badges earned in the previous day, $bdg_{it-1}$, might both affect the choice of a user to contribute, as they show the progress of users in the Gamified environment to their social surroundings. In addition, like the effect of any marketing policy, the short-term and long-term effects of earning badges might be different (Liu 2007; Jedidi et al. 1999; Mela et al. 1997; Lewis 2004). As a result, consistent with Wei et al. (2015) and Li et al. (2015), my consumer utility model includes both the lagged cumulative number, $cbdg_{it-1}$, and the instant number, $bdg_{it-1}$, of each of the badges. Figure 3.4 shows a box-and-arrow diagram of the components of the state-dependent utility of users to contribute content. Table 3.9 summarizes the definitions of the variables and parameters.

Figure 3.4.
Box and Arrow Model of State-Dependent Utility of a User to Contribute

Last but not least, to control for the unobserved heterogeneity in users' responses to each of the Gamification elements, the model of the users' state-dependent utility allows for flexible patterns of response, through a random coefficient model of users' choices, by putting a hierarchical Bayesian Dirichlet Process (DP) prior on $\Lambda_i$. This approach allows the number of parameters to increase with the size of the sample, increasing learning as new data are observed. This approach assumes a mixture of multivariate normal distributions as the prior over the parameters, to allow for thick-tailed, skewed, and multimodal distributions. I accommodated user heterogeneity by assuming that $\Lambda_i$ is drawn from a distribution common across users, in two stages. I employed a mixture of normals as the first-stage prior, to specify an informative prior that also does not overfit. The first stage consists of a mixture of $K$ multivariate normal distributions, and the second stage consists of a prior on the parameters of the mixture of normal densities, formally:

$$p(\Lambda_i - \Delta z_i \mid \pi, \{\mu_k, \Sigma_k\}) = \sum_{k=1}^{K} \pi_k\, \phi(\Lambda_i - \Delta z_i \mid \mu_k, \Sigma_k), \qquad \pi, \{\mu_k, \Sigma_k\} \mid b \tag{2}$$

where $b$ denotes the hyper-parameter for the priors on the mixing probabilities and the parameters governing each mixture component. $K$ denotes the number of mixture components. $\{\mu_k, \Sigma_k\}$ denotes the mean and covariance matrix of the distribution of the individual-specific parameter vector $\Lambda_i$ for mixture component $k$. $\pi_k$ denotes the size of the $k$'th component of the mixture model, and $\phi$ denotes the normal density function.
$z_i$ denotes the information set about user $i$, which here can include an indicator of publishing a personal website on the profile; indicators of stated nationality (USA, UK, Australia, India, Europe, Asia, South America, China, Middle East); Tenure (the number of days since registration on the Gamification platform); Seen (the number of days since the last date that the user logged in); the number of profile views from internet browsers; the total number of reputation points and badges accumulated; the number of answered and asked questions; and the total number of internet browsers reached by contributing to the platform, all measured until the start of the sample period. $\Delta$ denotes the parameters that capture the correlation between the choice response parameters and the information set about user $i$.

To obtain a truly non-parametric estimate using the mixture of normals model, the number of mixture components $K$ must increase with the sample size. I adopted the approach proposed by Rossi (2014), called the non-parametric Bayesian approach. This approach is equivalent to the approach mentioned above when $K$ tends to infinity. In this structure, the parameters of the mixture of normals model have a Dirichlet Process (DP) prior. The Dirichlet process is the generalization of the Dirichlet distribution to an infinite number of atomic partitions. This process represents the distribution of a random measure (i.e., a probability). The Dirichlet process has two parameters. The first is the base distribution, which is the prior distribution on the parameters of the multivariate Normal-Inverse Wishart (N-IW) conjugate prior distribution for the partitions that the choice parameters are drawn from. The second is the concentration parameter.
Formally, the prior for the individual-specific choice parameters has the following structure:

$$
\begin{aligned}
\theta_k = (\mu_k, \Sigma_k) &\sim DP(G_0(\lambda), \alpha_d) \\
G_0(\lambda):\quad \mu_k \mid \Sigma_k &\sim N(0, \Sigma_k \times a^{-1}), \quad \Sigma_k \sim IW(\nu, \nu \times \upsilon \times I_d) \\
\lambda(a, \nu, \upsilon):\quad a &\sim Unif(\underline{a}, \bar{a}), \quad \nu \sim d - 1 + \exp(z), \quad z \sim Unif(d - 1 + \underline{\nu}, \bar{\nu}), \quad \upsilon \sim Unif(\underline{\upsilon}, \bar{\upsilon}) \\
\alpha_d &\sim \left(1 - \frac{\alpha_d - \underline{\alpha}}{\bar{\alpha} - \underline{\alpha}}\right)^{power}
\end{aligned}
\tag{3}
$$

where $G_0(\lambda)$ denotes the base distribution or measure (i.e., the distribution of the hyper-parameters of the prior distribution of the partitions). $\lambda$ denotes the random measure, which represents the probability distribution of $(a, \nu, \upsilon)$. $(a, \nu, \upsilon)$ denotes the hyper-parameters of the prior distribution of the partitions that the choice parameters belong to, which represent the behavior parameters of the latent segments. $d$ denotes the number of choice parameters per user (in my case $d$ is equal to 15). $\alpha_d$ denotes the concentration (also referred to as precision, tightness, or innovation) parameter. The idea is that the DP is centered over the N-IW base measure $G_0(\lambda)$ with precision parameter $\alpha_d$ (a larger value denotes a tighter distribution). $(\underline{a}, \bar{a}, \underline{\nu}, \bar{\nu}, \underline{\upsilon}, \bar{\upsilon})$ denotes the vector of hyper-parameters for the second-level prior on the hyper-parameters of the prior over the partition distribution of the choice parameters.

The Dirichlet Process underlying the Dirichlet Process Mixture (DPM) model is a distribution over probability measures defined on some sigma-algebra (collection of subsets) of a space $\aleph$, such that the distribution for any finite partition of $\aleph$ is a Dirichlet distribution (Rossi 2014). In my case, the probability measure over the partitions for the mean and variance of the random-coefficient individual choice parameters has the Normal-Inverse-Wishart conjugate distribution.
For any subset of users $U$ of $\aleph$:

$$E[G(U)] = G_0(\lambda)(U), \qquad Var(G(U)) = \frac{G_0(\lambda)(U)\,\bigl(1 - G_0(\lambda)(U)\bigr)}{\alpha_d + 1} \tag{4}$$

By De Finetti's theorem, integrating (marginalizing) out the random measure $G$ results in the joint distribution for the collection of user-specific means and covariances of the random coefficient choice parameters as follows:

$$p(\mu_{\cdot}, \Sigma_{\cdot}) = \int p(\mu_{\cdot}, \Sigma_{\cdot} \mid G)\, p(G)\, dG \tag{5}$$

This joint distribution can be represented as a sequence of conditional distributions that has the exchangeability property:

$$p\bigl((\mu_1, \Sigma_1), \ldots, (\mu_I, \Sigma_I)\bigr) = p\bigl((\mu_1, \Sigma_1)\bigr)\, p\bigl((\mu_2, \Sigma_2) \mid (\mu_1, \Sigma_1)\bigr) \cdots p\bigl((\mu_I, \Sigma_I) \mid (\mu_1, \Sigma_1), \ldots, (\mu_{I-1}, \Sigma_{I-1})\bigr) \tag{6}$$

The DP is similar in nature to the Chinese Restaurant Process (CRP) and the Polya Urn. In the CRP, there is a restaurant with an infinite number of tables (analogous to the partitions of the mean and variance of the individual choice random coefficients). A user entering the restaurant selects a table randomly: she selects an occupied table with probability proportional to the number of users that have sat at that table so far (in which case the user behaves similarly to the other users who are sitting at the selected table). If the user selects a new table, she will behave based on a parameter that she randomly draws from the restaurant-wide user behavior parameters (so not necessarily identical to the parameters of the other tables). The Polya Urn process has the same structure. In this process, the experimenter starts by drawing balls (representing the response parameter of each user) with different colors from the urn. Any time the experimenter draws a ball with a given color from the urn, he returns the drawn ball and adds an additional ball with the same color to the urn. The distribution of the number of customers sitting at each table in the CRP and the number of balls of each color in the Polya Urn follow the DP.

Table 3.9.
Utility Model Variables Definition

Variable | Description
State Dependent Utility ($U_{it}$) | State-dependent utility of user $i$ at day $t$ in week $w$
Individual Specific Fixed Effect ($\alpha_i$) | Fixed effect, or fixed optimal threshold level, of user $i$
Contribution State ($cont_{it-1}$) | Total contribution level of user $i$ up until the current contribution point in day $t$, demeaned and then normalized by one hundred
Reciprocity State ($rcv_{it-1}$) | Total number of contributions received (answers received for her questions) by user $i$ up until day $t$, divided by one hundred
Reputation State ($crep_{iw-1}$) | Total number of reputation points received by user $i$ up until week $w$
Weekly Reputation ($rep_{iw-1}$) | Total number of reputation points received by user $i$ in the previous week (i.e., week $w-1$)
Leaderboard Rank ($rnk_{iw-1}$) | Rank of user $i$ in the leaderboard in the previous week (i.e., week $w-1$)
Leaderboard Rank Change ($\Delta rnk_{iw-1}$) | First-order rank difference for user $i$ in the leaderboard from the week before last to the previous week (i.e., week $w-2$ to week $w-1$)
Instant Badge Category ($bdg_{it-1}$) | A vector of the number of gold, silver, and bronze badges user $i$ earned in the previous day (i.e., day $t-1$)
Cumulative Badge Category ($cbdg_{it-1}$) | A vector of the total cumulative number of gold, silver, and bronze badges user $i$ earned until the previous day (i.e., day $t-1$)
$\gamma_{i1}, \ldots, \gamma_{i8}$ | User $i$ specific parameters of the state-dependent utility of user $i$
$\varepsilon_{it}$ | User $i$ and day $t$ specific type-one extreme value error

An alternative is the approach proposed by Dube et al. (2010): fit models with successively larger numbers of components and gauge the adequacy of the number of components by examining the fitted density associated with the selected number of components. However, this model selection process is tedious for big data sets such as the one in this case.
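To make the link between the state-dependent utility in equation (1) and the logit choice probability concrete, the following is a minimal sketch for a single user. It assumes the two badge vectors are flattened into three covariates each, so the state has twelve entries; the function names and all numeric values are hypothetical and are not estimates from this study.

```python
import math

def utility(alpha, gammas, state):
    """Deterministic part of equation (1): alpha_i plus the weighted lagged states."""
    if len(gammas) != len(state):
        raise ValueError("gammas and state must have the same length")
    return alpha + sum(g * x for g, x in zip(gammas, state))

def contribution_probability(alpha, gammas, state):
    """Binary logit probability; the no-contribution utility is normalized to 0."""
    v = utility(alpha, gammas, state)
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def log_likelihood(alpha, gammas, states, choices):
    """Sum of Bernoulli log-likelihood terms over one user's choice occasions."""
    total = 0.0
    for state, y in zip(states, choices):
        p = contribution_probability(alpha, gammas, state)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Hypothetical values for one user. Covariate order:
# cont, rcv, crep, rep, rnk, d_rnk, bdg (gold, silver, bronze), cbdg (gold, silver, bronze)
alpha_i = -0.5
gamma_i = [0.2, -0.1, 0.1, -0.3, 0.0, 0.0, 0.1, -0.1, 0.0, 0.0, 0.0, 0.0]
state_it = [1.0, 0.5, 2.0, 0.1, 0.3, 0.0, 0.0, 1.0, 0.0, 0.0, 2.0, 5.0]
p_it = contribution_probability(alpha_i, gamma_i, state_it)
```

In the mixed logit, each user would carry her own `alpha_i` and `gamma_i`, drawn from the population distribution described above; the sketch only evaluates the choice probability for given parameter values.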
To sum up, in this section I modeled the state-dependent utility of users to contribute content to the gamified platform. I controlled for potential self-selection and unobserved heterogeneity of users by defining a Dirichlet Process prior on the mixed logit choice model parameters. I also controlled for potential reciprocity and inertia (potential fun) by including the number of contributions sent and received by the user until a given choice occasion.

3.6. ESTIMATION

In order to identify the choice model, I used a random coefficient (mixed) binary logit specification, which has a fixed scale. To set the location of the utility, I normalized the utility of the no-contribution option to zero. To minimize concerns about self-selection, I used a different fixed effect (stimulation level to contribute content) for each user. To minimize concerns about endogeneity (omitted variables), I controlled for potential correlations between the choices of various users and for unobserved heterogeneity by incorporating a multi-modal mixture of normals prior on the users' choice parameters (in the form of a DP prior). I also controlled for potential confounding effects of inertia and reciprocity by incorporating the number of sent and received contributions. In addition, through the random coefficient structure, the modeling approach also minimizes the concern about Independence from Irrelevant Alternatives (IIA), as it allows for heterogeneity in the individual-specific choice behavior parameters.

Estimation of the proposed model over a big data set consisting of approximately 11,000,000 choices of approximately 37,000 users involves various computational and statistical issues, including over-fitting and computational tractability. First, the large number of parameters may cause over-fitting of the sample, and this over-fitting may reduce the generalizability of the results.
Bayesian shrinkage with a flexible DP prior helps to identify the large set of individual-specific parameters without over-fitting. Second, optimization approaches that use a sum of gradients, like the Newton-Raphson and batch gradient-descent methods, are expensive over an enormous data set like the one used in this study. To deal with the same type of computational tractability issue in estimating a logit model, Goldfarb and Tucker (2011) resort to a linear probability model for a data set of only 2.5 million observations, with a much lower number of parameters. An alternative approach is to sample a subset of the data and estimate the parameters. However, throwing away data by taking a small sample might not be an appropriate approach for targeting users.

To avoid the sample selection issue and show the effect of sample size, I took random samples of 1K, 5K, and 10K users from approximately 37K users by stratified sampling from strata that are generated from k-means, mixture of normals fuzzy clustering, and Latent Dirichlet Allocation clustering. In addition, I separated the cross-sectional variables of the information set about users before the sample period into fixed (e.g., nationality, webpage declaration) and dynamic (number of questions, answers, badges, and reputation points) items. I estimated both the models that incorporate only the fixed information set and the models that incorporate the complete information set (both fixed and dynamic variables) at the hierarchy level.

The mixture of normals distribution is subject to the label switching problem (i.e., a permutation of the segment assignment returns the same likelihood). However, I avoided this problem by limiting my inference to the joint distribution rather than the user segment assignment. To estimate the content contribution choice model, I used the multinomial logit with DP prior on the user-specific hyper-parameters (Bayesian semi-parametric) estimation code from the bayesm package in R.
This method uses the Metropolis-Hastings Random-Walk (MH-RW) method to estimate conditional choice probabilities on cross-sectional units (i.e., users). The limitation of MH-RW is that the random walk increments must be tuned to conform as closely as possible to the curvature of the individual-specific conditional posterior, formally defined by:

$$p(\Lambda_i \mid y_i, \mu, \Sigma, z_i, \Delta) \propto p(y_i \mid \Lambda_i)\, p(\Lambda_i \mid \mu, \Sigma, z_i, \Delta) \tag{7}$$

Without prior information on highly probable values of the first-stage prior (i.e., $p(\Lambda_i \mid \cdot)$), tuning the Metropolis chains by trial is difficult, given the limited information in each cross-section (i.e., each user). Therefore, to avoid a singular Hessian, the approach I used implements the fractional likelihood approach proposed by Rossi et al. (2005). Formally, rather than using the individual-specific likelihood, the MH-RW approach forms a fractional combination of the unit-level likelihood and the pooled likelihood as follows:

$$l_i^*(\Lambda_i) = l_i(\Lambda_i)^{(1-w)} \left( \prod_{j=1}^{I} l_j(\Lambda_i \mid y_j) \right)^{w\beta}, \qquad \beta = \frac{n_i}{N}, \qquad N = \sum_{i=1}^{I} n_i \tag{8}$$

where $w$ denotes the small tuning parameter that controls the effect of the pooled likelihood $\prod_{j=1}^{I} l_j(\Lambda_i \mid y_j)$. $\beta$ denotes a parameter chosen to properly scale the pooled likelihood to the same order as the unit likelihood. $n_i$ denotes the number of observations for user $i$. Using this approach, the MH-RW generates samples conditional on the partition membership indicator for user $i$ from the proposal density $N(0, s^2 \Omega)$, so that:

$$\Omega = \left(H_i + V_\Lambda^{-1}\right)^{-1}, \qquad H_i = -\left.\frac{\partial^2 \log l_i^*}{\partial \Lambda\, \partial \Lambda'}\right|_{\Lambda = \hat{\Lambda}_i} \tag{9}$$

where $\hat{\Lambda}_i$ denotes the maximum of the modified likelihood $l_i^*(\Lambda_i)$, and $V_\Lambda$ denotes the normal covariance matrix assigned to the partition (i.e., segment) that customer $i$ belongs to. This approach considers that $\Lambda_i$ is sufficient to model the random coefficient distribution.
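The tuning idea behind equations (8) and (9) can be illustrated with a scalar sketch. It assumes an intercept-only logit with one parameter per user, reads the fractional likelihood as a device for setting the proposal scale only, and uses the unit likelihood times a N(0, 1) prior as the target; these simplifications, the grid search for the maximizer, and all names and data are hypothetical and do not reproduce the bayesm implementation.

```python
import math
import random

def unit_loglik(theta, y):
    """Intercept-only binary logit log-likelihood of one user's 0/1 choices."""
    p = 1.0 / (1.0 + math.exp(-theta))
    return sum(math.log(p) if c == 1 else math.log(1.0 - p) for c in y)

def fractional_loglik(theta, i, data, w=0.1):
    """Log of equation (8): (1 - w) log l_i + w * beta_i * log(pooled likelihood)."""
    n_total = sum(len(y) for y in data)
    beta_i = len(data[i]) / n_total
    pooled = sum(unit_loglik(theta, y) for y in data)
    return (1.0 - w) * unit_loglik(theta, data[i]) + w * beta_i * pooled

def curvature(f, x, h=1e-3):
    """Negative second derivative of f at x by central differences (scalar H_i)."""
    return -(f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def proposal_sd(theta_hat, i, data, prior_var, s=1.0, w=0.1):
    """Scalar version of equation (9): s * sqrt(Omega), Omega = 1 / (H_i + 1 / V)."""
    h_i = curvature(lambda t: fractional_loglik(t, i, data, w), theta_hat)
    return s * math.sqrt(1.0 / (h_i + 1.0 / prior_var))

def rw_metropolis_step(theta, log_target, sd, rng):
    """One random-walk Metropolis update with a N(0, sd^2) increment."""
    proposal = theta + rng.gauss(0.0, sd)
    log_ratio = log_target(proposal) - log_target(theta)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return theta

# Hypothetical data: user 0 never contributes, so her own likelihood has no
# interior maximum; the pooled term in the fractional likelihood supplies one.
data = [[0, 0, 0], [1, 0, 1, 1], [0, 1]]
grid = [g / 10.0 for g in range(-50, 51)]
theta_hat = max(grid, key=lambda t: fractional_loglik(t, 0, data))
sd = proposal_sd(theta_hat, 0, data, prior_var=1.0)

def log_target(t):
    """Unit log-likelihood of user 0 plus a N(0, 1) log-prior, up to a constant."""
    return unit_loglik(t, data[0]) - 0.5 * t * t

rng = random.Random(1)
theta, draws = theta_hat, []
for _ in range(200):
    theta = rw_metropolis_step(theta, log_target, sd, rng)
    draws.append(theta)
```

The point of the example is user 0: with only zeros observed, her unit-level likelihood is maximized at minus infinity and its curvature vanishes there, whereas the fractional likelihood has a finite maximizer and a usable curvature for scaling the random-walk increments.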
To estimate the infinite mixture of normals prior for the choice parameters, a standard data augmentation with the indicator of the normal component is required. Conditional on this indicator, I can identify a normal prior for the parameters of each customer $i$. The distribution of this indicator (denoted $z_i$ here, not to be confused with the information set above) is Multinomial, which is conjugate to the Dirichlet distribution, formally:

$$\pi \sim Dirichlet(\alpha_d), \qquad z_i \mid \pi \sim \text{Mult-Nom}(\pi) \tag{10}$$

As a result, the posterior can be defined by:

$$z_i \sim \text{Mult-Nom}\left(\frac{\alpha_1}{\sum_j \alpha_j}, \ldots, \frac{\alpha_K}{\sum_j \alpha_j}\right), \qquad \pi \mid z_i \sim Dirichlet\bigl(\alpha_1 + \delta_1(z_i), \ldots, \alpha_K + \delta_K(z_i)\bigr) \tag{11}$$

where $\delta_j(z_i)$ denotes the indicator for whether or not $z_i = j$. This result is relevant for the DP, as any finite subset of the partitions of the user choice-behavior parameters has a Dirichlet distribution, and a finite sample can only represent a finite number of partitions. The exchangeability property of the partitions allows the estimation approach to sequentially draw customer parameters, given the indicator value, as follows:

$$(\mu_i, \Sigma_i) \mid (\mu_1, \Sigma_1), \ldots, (\mu_{i-1}, \Sigma_{i-1}) \sim \frac{\alpha_d\, G_0 + \sum_{j=1}^{i-1} \delta_{(\mu_j, \Sigma_j)}}{\alpha_d + i - 1} \tag{12}$$

The next portion of this approach's specification is the definition of the size of the finite clusters over the finite sample, which is controlled by $\pi$. Rossi (2014) suggests augmenting Sethuraman's stick-breaking notion for draws of $\pi$.
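As an illustration of the sequential draws in equation (12) and of the stick-breaking construction just mentioned, the following sketch simulates component labels from the Polya urn scheme and a truncated set of stick-breaking weights. It tracks only which component each user joins, not the $(\mu, \Sigma)$ values themselves; the function names, the truncation level, and the value of the concentration parameter are hypothetical choices made for the example.

```python
import random

def polya_urn_labels(n_users, alpha_d, rng):
    """Sequential draws in the spirit of equation (12), tracking labels only.

    With i users already assigned, the next user opens a new component with
    probability alpha_d / (alpha_d + i) and otherwise joins an existing
    component with probability proportional to its current size.
    """
    labels, counts = [], []
    for i in range(n_users):
        u = rng.random() * (alpha_d + i)
        if u < alpha_d or not counts:
            counts.append(1)              # draw a fresh component from the base measure
            labels.append(len(counts) - 1)
        else:
            u -= alpha_d
            k = 0
            while k < len(counts) - 1 and u >= counts[k]:
                u -= counts[k]
                k += 1
            counts[k] += 1                # join component k
            labels.append(k)
    return labels

def stick_breaking_weights(n_components, alpha_d, rng):
    """Truncated stick breaking: pi_k = beta_k * prod_{i<k} (1 - beta_i)."""
    weights, remaining = [], 1.0
    for _ in range(n_components):
        beta_k = rng.betavariate(1.0, alpha_d)   # beta_k ~ Beta(1, alpha_d)
        weights.append(beta_k * remaining)
        remaining *= 1.0 - beta_k
    return weights, remaining                    # remaining = unbroken stick length

rng = random.Random(0)
labels = polya_urn_labels(50, alpha_d=1.0, rng=rng)
weights, leftover = stick_breaking_weights(20, alpha_d=1.0, rng=rng)
```

A larger `alpha_d` makes new components more likely in the urn scheme and spreads the stick-breaking weights over more components, which is the sense in which the concentration parameter governs the number of distinct segments.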
In this notion, a unit-length stick is iteratively broken from the tail, in proportion to draws from a beta distribution with parameters one and $\alpha_d$, and the length of the broken portion defines the $k$'th element of the probability measure vector $\pi$ (a form of multiplicative process), formally:

$$\pi_k = \beta_k \prod_{i=1}^{k-1} (1 - \beta_i), \qquad \beta_k \sim Beta(1, \alpha_d) \tag{13}$$

In this notion, $\alpha_d$ determines the probability distribution of the number of unique values for the DP mixture model, formally by:

$$\Pr(I^* = k) = S_i^{(k)}\, \alpha_d^{k}\, \frac{\Gamma(\alpha_d)}{\Gamma(i + \alpha_d)}, \qquad S_i^{(k)} = \frac{\Gamma(i)}{\Gamma(k)} \bigl(\gamma + \ln(i)\bigr)^{k-1} \tag{14}$$

where $I^*$ denotes the number of unique values of $(\mu, \Sigma)$ in a sequence of $i$ draws from the DP prior. $S_i^{(k)}$ denotes the Stirling number of the first kind, and $\gamma$ denotes Euler's constant. Furthermore, to facilitate assessment, this approach suggests the following distribution for $\alpha_d$, rather than a Gamma distribution:

$$\alpha_d \propto \left(1 - \frac{\alpha_d - \underline{\alpha}}{\bar{\alpha} - \underline{\alpha}}\right)^{\phi} \tag{15}$$

where $\bar{\alpha}$ and $\underline{\alpha}$ can be assessed by inspecting the mode of $I^* \mid \alpha_d$. $\phi$ denotes the tunable power parameter that spreads the prior mass appropriately. An alternative to the Gibbs sampler employed by this approach might be a collapsed Gibbs sampler that integrates out the indicator variable for the partition (segment) membership of each user, but Rossi (2014) argues that such an approach does not improve the estimation procedure. Appendix 3.A presents the series of conditional distributions that this approach employs in its Gibbs sampling to recover individual-specific choice parameters. In summary, I used MCMC sampling to estimate the mixture of normals multinomial logit model of the content contribution choices of samples of 1K, 5K, and 10K users over two hundred thirty-seven days, in a Gamification environment.

3.7. RESULTS AND MANAGERIAL IMPLICATIONS

I begin this section by discussing the importance of big data. Table 3.10 presents the log likelihood of the twelve models I have tested.
Among samples with 1K users, the stratified random sample from Latent Dirichlet Allocation (LDA) clustering has a better fit. In addition, the models that use both the fixed and the dynamic information set of users at the hierarchical level explain users' choices better. However, estimating the model over the sample with 5K users suggests that the LDA stratified sample might not have represented the population, because the log likelihood does not increase proportionally. In addition, the estimate of the model that uses the whole information set about users at the hierarchical level over the sample with 10K users returns a relatively better likelihood than the same model estimated over the sample with 5K users. I designate this model as dominant, because it uses more information and returns a better likelihood relative to the model estimated over its adjacent sample size (i.e., the sample with 5K users). Table 3.11 presents the distributions of the parameter estimates for the individual content contribution choice model that explains the choice parameters with the whole information set of users, estimated over a sample of 10K random users. These distributions are visualized in Figure 3.5. Although I used a flexible mixture of normals model, the response parameters have a normal bell shape.

Table 3.10. Model Comparison

Model | Description | Number of obs. | Log Lik.
1 | Uniform Sample, Choice explained by all variables | 237,000 | -65,379.08
2 | LDA stratified Sample, Choice explained by all variables | 237,000 | -61,868.84
3 | K-means stratified Sample, Choice explained by all variables | 237,000 | -62,554.44
4 | Mixture Normal stratified Sample, Choice explained by all variables | 237,000 | -65,164.39
5 | Mixture Normal stratified Sample, Choice explained by static HB variables | 237,000 | -66,374.15
6 | Uniform Sample, Choice explained by static HB variables | 237,000 | -65,943.30
7 | LDA stratified Sample, Choice explained by static HB variables | 237,000 | -65,028.38
8 | K-means stratified Sample, Choice explained by static HB variables | 237,000 | -63,548.60
9 | Sample of 5K explained by static HB variables | 1,185,000 | -327,701.60
10 | Sample of 5K explained by all variables | 1,185,000 | -327,765.30
11 | Sample of 10K explained by static HB variables | 2,370,000 | -656,838.80
12 | Sample of 10K explained by all variables* | 2,370,000 | -653,301.00
* Dominant model

The distributions of the parameter estimates for this and the other models over samples with sizes of 1K, 5K, and 10K are presented in Appendix 3.B. Comparison of these estimates suggests that the model that uses only the fixed information set of users at the hierarchical level overestimates the fixed effect (or stimulation level), and it underestimates the effect of the leaderboard and badge elements (the long-term effect of silver and bronze badges). In addition, estimating the same model over a sample of 5K random users results in underestimation of the fixed effect, inertia, rank, and badges (except gold), and in overestimation of the effect of reputation points and reciprocity. These results highlight the importance of employing a bigger data set to get a better estimate of the effects of the Gamification elements.

Figure 3.5.
Histogram of Parameter Estimates: Individual Choice Parameters

As the hierarchical Bayesian method allows recovering individual-specific parameters, I can use the individual-specific parameters to recover the significance of the parameters. In fact, this significance information is useful for the Gamification platform to target its users. Table 3.12 presents the statistics of the significance of the parameters across the population. It is interesting to note that the individual-level fixed effect, which I interpret as the stimulus level required to contribute content, is positive significant for 2% and negative significant for 2% of all the users. This finding suggests that, when users contribute content, either an intrinsic (for 2% of users) or an extrinsic motivation (for 98% of users) exists.

Table 3.11. Parameter Estimates: Individual Content Contribution Choice Explained by the Whole Information Set (Sample of 10K)

Parameter | Estimate | Std. Dev. | 2.5th | 97.5th
Fixed Effect | -0.039 | 0.301 | -0.589 | 0.599
States:
Previous contribution | -0.004 | 0.070 | -0.099 | 0.105
Reciprocity (contribution received) | -0.091 | 0.411 | -0.528 | 0.229
Leader Board:
Cum Reputation | 0.108 | 0.184 | -0.320 | 0.435
Reputation | -0.500 | 0.689 | -1.636 | 1.051
Rank | -0.012 | 0.347 | -0.795 | 0.726
Rank Change | 0.000 | 0.001 | -0.002 | 0.002
Badges:
Gold Badge | 0.079 | 0.488 | -0.780 | 0.965
Silver Badge | -0.105 | 0.324 | -0.678 | 0.414
Bronze Badge | 0.011 | 0.405 | -0.764 | 0.738
Cum Gold Badge | -0.014 | 0.228 | -0.399 | 0.359
Cum Silver Badge | -0.006 | 0.188 | -0.295 | 0.297
Cum Bronze Badge | -0.004 | 0.134 | -0.224 | 0.237

The number of contents contributed has significant positive effects for 8% of users and significant negative effects for 9% of users on their probability of contributing. This result is also relevant for the Gamification platform's targeting.
The Gamification platform can investigate the journey of customers who show inertia (a positive effect of previous contribution) and try to generate a similar journey for those who show resistance (a negative effect of previous contribution), through its messaging policy. In addition, the Gamification platform can also send customized positive messages to the users with inertia, to keep them as loyal customers. It can also send promotional incentivizing messages to those who show resistance, to motivate them to contribute more (similar to the promotions that are sent to churned customers). In fact, given the domain knowledge of Gamification practitioners, the message for the users with resistance should emphasize the fun aspect of answering other users' questions (Brittner and Shipper 2014; Wu 2012; Deterding 2012).

Table 3.12. Parameter Estimates: Individual Choice Effect Significance

Parameter | Positive Significant | Negative Significant | % Positive | % Negative
Fixed Effect | 171 | 239 | 2% | 2%
States:
Previous contribution | 819 | 949 | 8% | 9%
Reciprocity (contribution received) | 450 | 1228 | 5% | 12%
Leader Board:
Cum Reputation | 1041 | 564 | 10% | 6%
Reputation | 273 | 673 | 3% | 7%
Rank | 619 | 716 | 6% | 7%
Rank Change | 1349 | 1444 | 13% | 14%
Badges:
Gold Badge | 258 | 147 | 3% | 1%
Silver Badge | 76 | 127 | 1% | 1%
Bronze Badge | 263 | 184 | 3% | 2%
Cum Gold Badge | 296 | 320 | 3% | 3%
Cum Silver Badge | 842 | 930 | 8% | 9%
Cum Bronze Badge | 1037 | 906 | 10% | 9%

The effect of contribution received (or reciprocity) is positive significant for 5% of users and negative significant for 12% of users. In other words, when the users' questions are answered more often than others, 5% of users are more likely and 12% of them are less likely to answer the community members' questions.
The reciprocity result for 5% of users is consistent with the results of studies in information systems research on the knowledge market and in the economics of impure altruism that emphasize the importance of reciprocity in users' decisions (Ruben 2009; Chen et al. 2010; Chiu and Wang 2009; Bolton et al. 2013; Andreoni 1990; Cornes and Sandler 1994). However, the negative response to receiving answers by 12% of users might be explained by users' shift of focus to their daily life, as opposed to participating on the Gamification platform. This finding suggests that the Gamification platform owner can employ a prioritizing strategy. Such a strategy can put higher priority on the questions of users who contribute more when the community answers their questions.

The effect of weekly (instant) reputation is positive significant for 3% and negative significant for 7% of users, and the effect of cumulative reputation is positive significant for 10% of users and negative significant for 6% of users. This result is relevant for targeting, and I explain the relevance later in this section, but first I explain why the effect can be positive or negative for different users. The negative effect may be explained by moral licensing (Wei et al. 2015) or by reversion to the mean. Moral licensing refers to the process by which a user reduces her pro-social activity after being recognized as pro-social. Moral licensing might be the more relevant explanation here, because none of the studies by practitioners and academics has yet documented mean reversion of users in the Gamification context. The positive effect of reputation might be explained by the empowerment effect of Gamification that social psychologists emphasize (Wu 2012). In other words, the reputation points might act as a signal to the user to recognize the potential of helping others.
The effect of rank is positive and significant for 6% and negative and significant for 7% of users, and the effect of rank change, a second-order lagged effect, is positive and significant for 13% of users and negative and significant for 14% of users. Recognizing these two different effects is useful for targeting in the context of the Gamification platform. The negative second-order lagged effect resembles mean reversion behavior. This behavior might be related to the anchoring effect of the rank on the leaderboard. The positive second-order lagged effect resembles inertia. In other words, when falling in the leaderboard the user gives up, and when rising in the leaderboard the user works harder. Although the Gamification platform may not have control over the mean reversion of the user, it may be able to affect the users' negative inertia (i.e., giving up when falling in the leaderboard) through positive, empowering messages.

The results for the effects of the gold, silver, and bronze badge categories might also be of interest to the Gamification platform, because it can modify the badges' requirements (the threshold of points to earn badges) to motivate users. Instant effects of earning Gold badges are positive and significant for 3% and negative and significant for 1% of users. In addition, the long-term effects of earning Gold badges are positive and significant for 3% of users and negative and significant for 3% of users. Instant effects of earning Silver badges are positive and significant for 1% and negative and significant for 1% of users. In addition, the long-term effects of earning Silver badges are positive and significant for 8% of users and negative and significant for 9% of users. Instant effects of earning Bronze badges are positive and significant for 3% and negative and significant for 2% of users. In addition, the long-term effects of earning Bronze badges are positive and significant for 10% of users and negative and significant for 9% of users.
Again, these results are useful for targeting, as I explain later in this section. Potential explanations for observing both positive and negative effects of badges across segments are similar to the explanations for observing both positive and negative effects of the number of reputation points (i.e., moral licensing vs. empowerment). However, the mean of the long-term effects of badges across the population is negative. These negative long-term effects can be explained by the goal-setting aspect of badges in a Gamification setting. In other words, users might have set the goal of winning badges as a hallmark of Gamification, so as they earn these badges, they reduce their content contribution. Given these negative effects, the fact that badges have a once-in-a-lifetime effect might suggest that the Gamification platform set a higher point threshold for granting badges. The counterfactual section quantifies the effect of such a policy at the aggregate level.

To discuss the targeting aspect of these results, Table 3.13 presents the hierarchical parameters' estimates. These results suggest that certain nationalities are more likely to be sensitive to certain aspects of the Gamification platform. For the sake of brevity, I only review the more interesting patterns. First, American users show more inertia in contribution. They reduce their contribution level if they have more reputation, but increase it if they earn silver badges. Second, European users increase their effort if they have more reputation, but English users decrease their contribution if they earn gold badges. Third, South Americans are more motivated to contribute to StackOverflow. Fourth, Asian users are more reciprocal. They increase their contribution when they earn Silver badges, but decrease it when they earn Gold badges. Fifth, Middle Eastern users are also more reciprocal.
These patterns might be relevant for targeting because, if users of a certain nationality respond positively, for example, to gold badges in the long term, it might be worthwhile to guide users of this nationality to earn gold badges more easily. However, if users of a certain nationality decrease their content contribution when they earn gold badges, then it might be worthwhile to send these users messages that confuse them, making earning gold badges difficult. Similar approaches can be designed conditioning on the response to silver and bronze badges and to reputation points. In addition, if the platform finds users of a certain nationality more reciprocal, it can make answering the questions of these users a top priority for other users. All these findings can better guide the Gamification platform toward increasing its users' content contributions.

Table 3.13. PARAMETER ESTIMATES: Individual Choice Hierarchical Model
                        Estimate Std. Dev. 2.5th    97.5th
Fixed Effect
  website               -0.043    0.072   -0.140    0.137
  USA                   -0.002    0.003   -0.008    0.003
  UK                    -0.009    0.010   -0.027    0.010
  Australia             -0.014    0.011   -0.036    0.008
  India                 -0.083    0.063   -0.201    0.048
  Europe                -0.005    0.021   -0.044    0.035
  Asia                   0.000    0.000   -0.001    0.001
  South America          0.186*   0.051    0.082    0.300
  China                  0.134    0.107   -0.043    0.336
  Middle East           -0.044    0.085   -0.200    0.101
  Tenure                -0.020    0.021   -0.060    0.021
  Seen                   0.011    0.008   -0.003    0.027
  Profile Views         -0.002    0.006   -0.013    0.009
  Reputation             0.351*   0.056    0.249    0.460
  Gold Badges            0.004    0.004   -0.003    0.012
  Silver Badges         -0.015    0.012   -0.038    0.009
  Bronze Badges          0.016    0.013   -0.009    0.042
  Number of Answers     -0.016    0.084   -0.167    0.140
  Number of Questions   -0.025    0.026   -0.074    0.030
  Reach                  0.000    0.001   -0.001    0.001
States: Previous contribution
  website                0.003    0.134   -0.203    0.260
  USA                    0.167*   0.093    0.007    0.365
  UK                    -0.079    0.088   -0.245    0.104
  Australia             -0.024    0.025   -0.071    0.021
  India                 -0.010    0.008   -0.026    0.008
  Europe                 0.000    0.007   -0.013    0.013
  Asia                  -0.106    0.079   -0.284    0.016
  South America         -0.005    0.006   -0.017    0.005
  China                  0.011    0.019   -0.024    0.052
  Middle East            0.003    0.018   -0.032    0.037
  Tenure                 0.039    0.119   -0.188    0.291
  Seen                   0.011    0.042   -0.071    0.093
  Profile Views          0.000    0.001   -0.002    0.002
  Reputation            -0.003    0.150   -0.304    0.243
  Gold Badges            0.091    0.197   -0.337    0.500
  Silver Badges         -0.090    0.148   -0.390    0.193
  Bronze Badges          0.048    0.039   -0.036    0.123
  Number of Answers     -0.009    0.013   -0.034    0.016
  Number of Questions   -0.003    0.010   -0.022    0.017
  Reach                  0.397*   0.183    0.029    0.708
Reciprocity (contribution received)
  website               -0.016*   0.007   -0.030   -0.001
  USA                    0.004    0.026   -0.046    0.057
  UK                     0.009    0.026   -0.045    0.059
  Australia             -0.221    0.199   -0.648    0.104
  India                 -0.048    0.062   -0.166    0.072
  Europe                 0.000    0.001   -0.002    0.002
  Asia                   0.554*   0.200    0.248    0.961
  South America         -0.118    0.228   -0.499    0.368
  China                  0.046    0.195   -0.283    0.447
  Middle East            0.107*   0.054    0.005    0.210
  Tenure                -0.009    0.016   -0.041    0.024
  Seen                   0.014    0.014   -0.012    0.039
  Profile Views          0.149    0.132   -0.156    0.375
  Reputation            -0.004    0.006   -0.015    0.007
  Gold Badges            0.029    0.020   -0.007    0.067
  Silver Badges         -0.004    0.019   -0.040    0.036
  Bronze Badges          0.272*   0.121    0.029    0.497
  Number of Answers     -0.033    0.044   -0.112    0.054
  Number of Questions    0.000    0.001   -0.002    0.002
  Reach                 -0.337*   0.107   -0.512   -0.082
Leader Board: Cum Reputation
  website               -0.148    0.171   -0.455    0.149
  USA                   -0.263*   0.117   -0.500   -0.055
  UK                    -0.031    0.041   -0.109    0.049
  Australia              0.012    0.013   -0.014    0.038
  India                 -0.001    0.009   -0.021    0.018
  Europe                 0.210*   0.058    0.101    0.307
  Asia                  -0.002    0.004   -0.009    0.005
  South America          0.010    0.012   -0.013    0.034
  China                  0.011    0.012   -0.011    0.035
  Middle East            0.059    0.070   -0.077    0.197
  Tenure                -0.020    0.026   -0.069    0.032
  Seen                   0.000    0.001   -0.001    0.001
  Profile Views          0.146    0.118   -0.062    0.331
  Reputation            -0.063    0.098   -0.237    0.117
  Gold Badges           -0.279*   0.077   -0.435   -0.127
  Silver Badges          0.026    0.030   -0.029    0.092
  Bronze Badges          0.000    0.008   -0.016    0.017
  Number of Answers      0.001    0.007   -0.014    0.014
  Number of Questions    0.153    0.199   -0.262    0.588
  Reach                 -0.012    0.011   -0.033    0.012
Reputation
  website               -0.013    0.039   -0.091    0.062
  USA                   -0.049    0.033   -0.113    0.014
  UK                     0.400    0.219   -0.052    0.776
  Australia             -0.103    0.082   -0.271    0.052
  India                  0.000    0.002   -0.003    0.004
  Europe                -0.399    0.284   -0.920    0.090
  Asia                  -0.401    0.247   -0.925    0.068
  South America         -0.219    0.230   -0.621    0.225
  China                  0.208*   0.079    0.069    0.376
  Middle East            0.032    0.025   -0.017    0.080
  Tenure                -0.007    0.019   -0.048    0.028
  Seen                   0.484*   0.231    0.118    0.848
  Profile Views         -0.003    0.009   -0.021    0.016
  Reputation            -0.005    0.038   -0.072    0.073
  Gold Badges            0.075    0.039   -0.004    0.150
  Silver Badges          0.415*   0.196    0.023    0.764
  Bronze Badges          0.029    0.075   -0.121    0.180
  Number of Answers      0.000    0.002   -0.003    0.003
  Number of Questions   -1.080*   0.274   -1.500   -0.596
  Reach                 -0.184    0.248   -0.643    0.321
Rank
  website               -0.255    0.211   -0.658    0.132
  USA                   -0.013    0.066   -0.144    0.111
  UK                    -0.005    0.023   -0.050    0.041
  Australia              0.008    0.018   -0.027    0.043
  India                  0.470    0.457   -0.141    1.520
  Europe                 0.008    0.017   -0.024    0.041
  Asia                  -0.017    0.056   -0.124    0.094
  South America         -0.044    0.058   -0.158    0.066
  China                  1.017*   0.283    0.487    1.578
  Middle East           -0.059    0.128   -0.301    0.179
  Tenure                 0.000    0.003   -0.005    0.006
  Seen                  -0.691    0.711   -1.877    0.524
  Profile Views         -1.450*   0.498   -2.429   -0.675
  Reputation            -0.125    0.371   -0.773    0.603
  Gold Badges           -0.162    0.149   -0.405    0.131
  Silver Badges         -0.057    0.040   -0.133    0.018
  Bronze Badges          0.005    0.029   -0.050    0.061
  Number of Answers      0.859*   0.189    0.381    1.097
  Number of Questions    0.000    0.008   -0.015    0.014
  Reach                  0.038    0.029   -0.018    0.090
Rank Change
  website                0.018    0.028   -0.037    0.070
  USA                    0.064    0.139   -0.187    0.349
  UK                    -0.054    0.061   -0.168    0.074
  Australia              0.000    0.001   -0.003    0.003
  India                 -0.520*   0.272   -0.963   -0.004
  Europe                 0.242    0.284   -0.449    0.649
  Asia                   0.029    0.193   -0.361    0.397
  South America          0.058    0.069   -0.094    0.162
  China                  0.009    0.018   -0.024    0.048
  Middle East           -0.009    0.015   -0.038    0.018
  Tenure                 0.000    0.000    0.000    0.000
  Seen                   0.000    0.000    0.000    0.000
  Profile Views          0.000    0.000    0.000    0.000
  Reputation             0.000    0.000    0.000    0.000
  Gold Badges            0.000    0.000    0.000    0.000
  Silver Badges          0.000    0.000    0.000    0.000
  Bronze Badges          0.000    0.000    0.000    0.000
  Number of Answers      0.000    0.000    0.000    0.000
  Number of Questions    0.000    0.000    0.000    0.000
  Reach                  0.000    0.000    0.000    0.000
Badges: Gold Badge
  website                0.00000    0.00002   -0.00004    0.00005
  USA                    0.00001*   0.00001    0.00000    0.00003
  UK                     0.00000    0.00000   -0.00001    0.00001
  Australia             -0.00028    0.00007   -0.00042   -0.00016
  India                  0.00000    0.00001   -0.00001    0.00001
  Europe                 0.00001    0.00002   -0.00003    0.00004
  Asia                   0.00003    0.00002   -0.00001    0.00006
  South America         -0.0004*    0.00012   -0.00063   -0.00015
  China                  0.00000    0.00004   -0.00009    0.00008
  Middle East            0.00000    0.00000    0.00000    0.00000
  Tenure                 0.0008*    0.00018    0.00048    0.00112
  Seen                   0.00022    0.00015   -0.00010    0.00047
  Profile Views          0.00008    0.00011   -0.00012    0.00030
  Reputation            -0.00001    0.00005   -0.00009    0.00009
  Gold Badges            0.00002    0.00001   -0.00001    0.00005
  Silver Badges          0.00001    0.00001   -0.00001    0.00002
  Bronze Badges          0.00000    0.00001   -0.00001    0.00001
  Number of Answers      0.00000    0.00000    0.00000    0.00000
  Number of Questions    0.00000    0.00000    0.00000    0.00000
  Reach                  0.00000    0.00000    0.00000    0.00000
Silver Badge
  website                0.00000    0.00001   -0.00001    0.00002
  USA                    0.00000    0.00000   -0.00001    0.00000
  UK                     0.00000    0.00000    0.00000    0.00000
  Australia             -0.00004*   0.00001   -0.00005   -0.00002
  India                  0.00001    0.00001   -0.00001    0.00002
  Europe                -0.00002    0.00002   -0.00005    0.00001
  Asia                   0.00002*   0.00001    0.00001    0.00003
  South America          0.00000    0.00000    0.00000    0.00000
  China                  0.00000    0.00000    0.00000    0.00000
  Middle East           -0.00001*   0.00000   -0.00001    0.00000
  Tenure                 0.00000    0.00000    0.00000    0.00000
  Seen                   0.00000    0.00000    0.00000    0.00000
  Profile Views          0.00000    0.00000    0.00000    0.00000
  Reputation             0.00000    0.00000   -0.00001    0.00000
  Gold Badges            0.00000    0.00000    0.00000    0.00000
  Silver Badges          0.00000    0.00000    0.00000    0.00000
  Bronze Badges          0.00000    0.00000    0.00000    0.00001
  Number of Answers      0.00000    0.00000   -0.00001    0.00000
  Number of Questions    0.00000    0.00000    0.00000    0.00001
  Reach                  0.00000    0.00000    0.00000    0.00000
Bronze Badge
  website                0.00000    0.00000    0.00000    0.00000
  USA                    0.00000    0.00000    0.00000    0.00000
  UK                    -0.01716*   0.00725   -0.03297   -0.00624
  Australia              0.00050    0.00043   -0.00038    0.00127
  India                 -0.00228    0.00139   -0.00501    0.00036
  Europe                 0.00058    0.00132   -0.00209    0.00303
  Asia                   0.00847    0.00809   -0.01185    0.02379
  South America          0.00511    0.00327   -0.00076    0.01180
  China                  0.00000    0.00007   -0.00013    0.00013
  Middle East            0.01705    0.01298   -0.01084    0.03890
  Tenure                -0.03556*   0.00920   -0.05325   -0.01680
  Seen                  -0.02181*   0.01065   -0.04569   -0.00412
  Profile Views          0.00002    0.00352   -0.00689    0.00666
  Reputation            -0.00215*   0.00105   -0.00430   -0.00027
  Gold Badges            0.00013    0.00072   -0.00129    0.00150
  Silver Badges          0.00747    0.00362    0.00147    0.01536
  Bronze Badges          0.00011    0.00013   -0.00014    0.00035
  Number of Answers     -0.00019    0.00044   -0.00105    0.00065
  Number of Questions   -0.00109*   0.00048   -0.00207   -0.00009
  Reach                  0.00532*   0.00233    0.00127    0.01005
Cum Gold Badge
  website               -0.00036    0.00097   -0.00239    0.00152
  USA                    0.00000    0.00002   -0.00005    0.00005
  UK                    -0.01033*   0.00231   -0.01565   -0.00638
  Australia              0.00124    0.00241   -0.00301    0.00601
  India                 -0.00685*   0.00322   -0.01287   -0.00047
  Europe                 0.00146    0.00110   -0.00076    0.00348
  Asia                   0.00011    0.00033   -0.00052    0.00077
  South America          0.00001    0.00024   -0.00046    0.00050
  China                 -0.00640*   0.00402   -0.01528   -0.00074
  Middle East           -0.00011    0.00010   -0.00030    0.00010
  Tenure                 0.00034    0.00034   -0.00034    0.00100
  Seen                   0.00066    0.00034   -0.00007    0.00133
  Profile Views         -0.00568*   0.00216   -0.00947   -0.00058
  Reputation             0.00064    0.00070   -0.00073    0.00201
  Gold Badges            0.00000    0.00002   -0.00003    0.00004
  Silver Badges          0.00758*   0.00139    0.00492    0.01065
  Bronze Badges          0.00253    0.00222   -0.00187    0.00703
  Number of Answers      0.00520    0.00313   -0.00062    0.01142
  Number of Questions   -0.00074    0.00072   -0.00207    0.00082
  Reach                  0.00011    0.00024   -0.00036    0.00055
Cum Silver Badge
  website                0.00001    0.00019   -0.00037    0.00038
  USA                    0.00055*   0.00018    0.00015    0.00076
  UK                     0.00001    0.00000    0.00000    0.00001
  Australia             -0.00002    0.00002   -0.00006    0.00001
  India                 -0.00003    0.00002   -0.00006    0.00000
  Europe                 0.00001    0.00010   -0.00015    0.00022
  Asia                  -0.00002    0.00003   -0.00008    0.00004
  South America          0.00000    0.00000    0.00000    0.00000
  China                  0.00040*   0.00010    0.00019    0.00054
  Middle East           -0.00007    0.00012   -0.00025    0.00019
  Tenure                -0.00007    0.00011   -0.00028    0.00014
  Seen                  -0.00010*   0.00004   -0.00018   -0.00001
  Profile Views          0.00001    0.00001   -0.00001    0.00003
  Reputation            -0.00002    0.00001   -0.00003    0.00000
  Gold Badges            0.00229*   0.00125    0.00004    0.00403
  Silver Badges          0.00000    0.00003   -0.00006    0.00006
  Bronze Badges          0.00004    0.00008   -0.00012    0.00020
  Number of Answers     -0.00002    0.00010   -0.00021    0.00016
  Number of Questions    0.00100    0.00060   -0.00023    0.00211
  Reach                 -0.00040    0.00023   -0.00085    0.00003
Cum Bronze Badge
  website                0.00000    0.00000   -0.00001    0.00001
  USA                   -0.00252*   0.00074   -0.00377   -0.00117
  UK                     0.00010    0.00128   -0.00171    0.00337
  Australia             -0.00007    0.00091   -0.00179    0.00139
  India                  0.00033    0.00024   -0.00010    0.00083
  Europe                 0.00000    0.00006   -0.00012    0.00011
  Asia                  -0.00002    0.00005   -0.00010    0.00008
  South America          0.00000    0.00000    0.00000    0.00000
  China                  0.00000    0.00000    0.00000    0.00000
  Middle East            0.00000    0.00000    0.00000    0.00000
  Tenure                 0.00000    0.00000    0.00000    0.00000
  Seen                   0.00000    0.00000    0.00000    0.00000
  Profile Views          0.00000    0.00000    0.00000    0.00000
  Reputation             0.00000    0.00000    0.00000    0.00000
  Gold Badges           -0.000000*  0.00000    0.00000    0.00000
  Silver Badges          0.00000    0.00000    0.00000    0.00000
  Bronze Badges          0.00000    0.00000    0.00000    0.00000
  Number of Answers      0.00000    0.00000    0.00000    0.00000
  Number of Questions    0.00000    0.00000    0.00000    0.00000
  Reach                  0.00000    0.00000    0.00000    0.00000

3.8.
COUNTERFACTUAL ANALYSIS AND ITS MANAGERIAL IMPLICATIONS
An advantage of modeling consumers' choices from the utility primitive is the capability to run counterfactuals. One of the choices available to a Gamification platform is to modify the threshold for earning badges. Given the heterogeneous short-term and long-term effects of different badges, a priori it might not be clear how changing the thresholds will affect the expected number of content contributions at the aggregate level. Therefore, given the estimated parameters, I simulated the users' response to a perturbation in the number of badges that they receive. As a measure, I used the expected total number of contributions, which, as the sum of the predicted probabilities of the users' choices, is analogous to integrating the probability of choices across the population. In summary, to find the effect of each counterfactual scenario of modifying the badges, I modified the related badge variable and, holding all other variables and the parameters fixed, summed up the predicted choice probability of each user.

Table 3.14 summarizes the result of the counterfactual analysis of nine scenarios: shutting down, or a five percent increase or decrease of, either silver and bronze badges, gold badges, or all the badges. First, shutting down all the badges increases the level of contribution by about 3% for the duration of the experiment. This result suggests that the long-term effect of distributing badges without expiry can negatively affect the contribution level of users. In addition, while increasing the number of silver and bronze badges by reducing the thresholds has a negative impact on the expected number of contributions, increasing the number of gold badges by decreasing the threshold has a positive impact on the expected number of contributions.
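The counterfactual measure described above (modify the badge variable, hold everything else fixed, and sum the predicted choice probabilities) can be sketched as follows. This is only an illustration under stated assumptions: it uses a binary logit choice probability, collapses all non-badge terms of the utility into a single `base` value, and uses made-up users and coefficients; none of these names or numbers come from the estimated model.

```python
import math


def choice_probability(utility):
    """Binary logit probability of contributing (assumed functional form)."""
    return 1.0 / (1.0 + math.exp(-utility))


def expected_contributions(users, badge_scale=1.0):
    """Sum predicted choice probabilities after rescaling the badge variable."""
    total = 0.0
    for user in users:
        # Only the badge term is perturbed; 'base' holds every other term fixed.
        utility = user["base"] + user["beta_badge"] * user["badges"] * badge_scale
        total += choice_probability(utility)
    return total


# Hypothetical users with heterogeneous badge coefficients.
users = [
    {"base": 0.2, "beta_badge": -0.10, "badges": 3.0},
    {"base": -0.5, "beta_badge": 0.05, "badges": 1.0},
    {"base": 0.0, "beta_badge": -0.02, "badges": 10.0},
]

baseline = expected_contributions(users)                     # observed badges
shutdown = expected_contributions(users, badge_scale=0.0)    # badges shut down
increase = expected_contributions(users, badge_scale=1.05)   # 5% more badges
improvement_ratio = (shutdown - baseline) / baseline
```

In this toy population the negative badge coefficients dominate, so shutting badges down raises the expected number of contributions and a 5% increase lowers it; with other coefficients the direction could reverse, which is why the aggregate effect is not clear a priori.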
Therefore, the platform might be better off increasing the threshold for earning silver and bronze badges, but decreasing the threshold for earning gold badges. If the platform wants to either increase or decrease the threshold across all the badges, then the counterfactual analysis suggests increasing the threshold, so that the platform extracts as many of the users' contributions as possible before granting badges.

Table 3.14. Counterfactual Analysis Result
Cases                                                    Expected Number    Absolute    Improvement
                                                         of Contributions   Change      Ratio
Real Case                                                944,283            -           -
Counterfactual 5% Increase in Silver and Bronze Badges   943,075            -1,208      -0.13%
Counterfactual 5% Decrease in Silver and Bronze Badges   945,557            1,274       0.13%
Counterfactual 5% Increase in Gold Badges                945,226            942         0.10%
Counterfactual 5% Decrease in Gold Badges                943,507            -777        -0.08%
Counterfactual 5% Increase in All Badges                 944,027            -256        -0.03%
Counterfactual 5% Decrease in All Badges                 944,819            536         0.06%
Shut down silver and bronze badges                       979,687            35,404      3.75%
Shut down gold badges                                    944,775            492         0.05%
Shut down all the badges                                 975,344            31,061      3.29%

3.9. CONCLUSION
In this paper, I developed a structural model that accounts for the motivational effects of Gamification elements such as badges and the leaderboard on users' choice to contribute content. To allow Gamification platforms to target their customers, I highlight the importance of controlling for user heterogeneity in the model using a Hierarchical Dirichlet Process. First, using a large data set from Stack Overflow, I segmented users' profiles with a method that ensembles the clustering assignments of LDA, mixture normal, and k-means methods. I showed heterogeneity in users' behavior by segmenting users into competitors, collaborators, achievers, explorers, and uninterested users. Then, by estimating the model over a sample of this data set, I showed that users' responses to various Gamification elements are heterogeneous.
I showed that a small sample size can return biased parameter estimates. My results demonstrate that users of certain nationalities are sensitive to certain Gamification elements. I further illustrated how the estimated model can be used to analyze a counterfactual scenario for a Gamification platform's badge threshold modifications. This counterfactual analysis shows that, if the Gamification platform increases the threshold for earning silver and bronze badges but decreases the threshold for earning gold badges, it can increase users' contribution. I believe that my modeling approach, proposed estimation method, and derived empirical insights in this paper can be of interest to both practitioners and scholars in academia.

APPENDIX

APPENDIX 1.A: DIRECTED ACYCLIC GRAPH OF CONDITIONAL DISTRIBUTIONS
Probabilistic graphical approaches are popular in computer science, as they not only provide a visual tool to recognize conditional independence, but also help save space in representing probability distributions and facilitate probabilistic queries. The following is the probabilistic graphical representation of the model I studied in this paper. Shaded circles represent the observed variables and un-shaded ones represent latent variables or parameters. The rectangles, called plates, represent the replication of variables, with the number of replications specified at their bottom right.

Figure 1.A.1. Probabilistic graphical model of customer mobile app choices under social influence (plates: mobile app category j = 1:J; t = 1:T; i = 1:I)

APPENDIX 1.B: UNSCENTED KALMAN FILTER
A recursive algorithm to update the latent state variable with the Unscented Kalman Filter has the following steps. I refer the interested reader to Wan and van der Merwe (2001).
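As a complement to the formal statement of the filter, the unscented transform at its core can be illustrated in one dimension: sigma points are drawn around the current state estimate, pushed through the nonlinear function, and recombined with fixed weights. The sketch below is a scalar (L = 1) illustration only; the function name `unscented_transform` and the tuning values alpha = 1, beta = 2, kappa = 2 are assumptions chosen for readability, not the settings used in the estimation.

```python
import math


def unscented_transform(x_hat, p, f, alpha=1.0, beta=2.0, kappa=2.0):
    """Propagate a scalar mean and variance through f using sigma points (L = 1)."""
    L = 1
    lam = alpha ** 2 * (L + kappa) - L
    spread = math.sqrt((L + lam) * p)
    # Sigma points: the mean, and the mean plus/minus the scaled standard deviation.
    sigma_points = [x_hat, x_hat + spread, x_hat - spread]
    side_weight = 1.0 / (2.0 * (L + lam))
    w_mean = [lam / (L + lam), side_weight, side_weight]
    w_cov = [lam / (L + lam) + (1.0 - alpha ** 2 + beta), side_weight, side_weight]
    transformed = [f(point) for point in sigma_points]
    mean = sum(w * y for w, y in zip(w_mean, transformed))
    var = sum(w * (y - mean) ** 2 for w, y in zip(w_cov, transformed))
    return mean, var


# Sanity check with a linear map, for which the transform is exact.
mean, var = unscented_transform(1.0, 4.0, lambda x: 2.0 * x + 1.0)
```

For a linear function the propagated mean and variance match the exact values (here 3 and 16); for nonlinear H and F the same recipe gives the approximation that the time-update and measurement-update steps rely on, with the noise covariance added afterwards.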
The model has the following form, in which the first equation is the observation equation and the second is the state equation, with nonlinear functions H and F:

\[
\begin{aligned}
y_k &= H(x_k) + \eta_k, & \eta_k &\sim MVN(0, \mathrm{I}) \\
x_k &= F(x_{k-1}) + \upsilon_k, & \upsilon_k &\sim MVN(0, \Upsilon)
\end{aligned}
\tag{B1}
\]

The estimation algorithm is initialized with:

\[
\begin{aligned}
\hat{x}_0 &= E[x_0], \qquad P_0 = E[(x_0 - \hat{x}_0)(x_0 - \hat{x}_0)'], \qquad k \in \{1, \ldots, \infty\} \\
\lambda &= \alpha^2 (L + \kappa) - L \\
W_0^{(m)} &= \frac{\lambda}{L + \lambda}, \qquad W_0^{(c)} = \frac{\lambda}{L + \lambda} + (1 - \alpha^2 + \beta) \\
W_i^{(m)} &= W_i^{(c)} = \frac{1}{2(L + \lambda)}, \qquad i = 1, \ldots, 2L, \qquad \beta = 2
\end{aligned}
\tag{B2}
\]

Drawing sigma points:

\[
\aleph_{k-1} = \left[ \hat{x}_{k-1}, \;\; \hat{x}_{k-1} \pm \sqrt{(L + \lambda) P_{k-1}} \right]
\tag{B3}
\]

Time update:

\[
\begin{aligned}
\aleph_{k|k-1} &= F[\aleph_{k-1}] \\
\hat{x}_k^- &= \sum_{i=0}^{2L} W_i^{(m)} \aleph_{i,k|k-1} \\
P_k^- &= \sum_{i=0}^{2L} W_i^{(c)} [\aleph_{i,k|k-1} - \hat{x}_k^-][\aleph_{i,k|k-1} - \hat{x}_k^-]^T + \Upsilon \\
\Im_{k|k-1} &= H[\aleph_{k|k-1}] \\
\hat{y}_k^- &= \sum_{i=0}^{2L} W_i^{(m)} \Im_{i,k|k-1}
\end{aligned}
\tag{B4}
\]

Measurement update:

\[
\begin{aligned}
P_{\tilde{y}_k \tilde{y}_k} &= \sum_{i=0}^{2L} W_i^{(c)} [\Im_{i,k|k-1} - \hat{y}_k^-][\Im_{i,k|k-1} - \hat{y}_k^-]^T + \mathrm{I} \\
P_{x_k y_k} &= \sum_{i=0}^{2L} W_i^{(c)} [\aleph_{i,k|k-1} - \hat{x}_k^-][\Im_{i,k|k-1} - \hat{y}_k^-]^T \\
K &= P_{x_k y_k} P_{\tilde{y}_k \tilde{y}_k}^{-1} \\
\hat{x}_k &= \hat{x}_k^- + K (y_k - \hat{y}_k^-) \\
P_k &= P_k^- - K P_{\tilde{y}_k \tilde{y}_k} K^T
\end{aligned}
\tag{B5}
\]

APPENDIX 1.C: CONDITIONAL DISTRIBUTIONS FOR ESTIMATION OF THE MICRO CHOICE MODEL
Conditional distributions of the choice variable include the following:

\[
A_i \mid y^{j}_{it}, s^{j}_{it}, F_{jt}, c^{imm}_{jt}, (\mu_i, \Sigma_i), \qquad i = 1, \ldots, I
\tag{C1}
\]

where this conditional distribution can be estimated by random walk Metropolis-Hastings on the weighted likelihood. The priors for the normal mixture distribution of the individual and the category specific parameters used are:

\[
\begin{aligned}
&\{(\mu_i, \Sigma_i)\} \mid A_i, z_i, \Delta, \alpha_d, a, v, \vartheta \\
&\alpha_d \mid I^* \\
&a \mid \{(\mu_i, \Sigma_i)^*\} \\
&v \mid \{(\mu_i, \Sigma_i)^*\}, \vartheta \\
&\vartheta \mid \{(\mu_i, \Sigma_i)^*\}, v
\end{aligned}
\tag{C2}
\]

where the first conditional is the standard posterior Polya urn representation for the mean and variance of the individual specific random coefficient choice model parameters. $\{(\mu_i, \Sigma_i)^*\}$ denotes the set of unique $(\mu_i, \Sigma_i)$, which is all that the DP process hyper-parameters depend on (a posteriori).
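The random walk Metropolis-Hastings step mentioned for the conditional in (C1) can be sketched generically as follows. This is a minimal scalar illustration, not the estimation code: `log_target` stands in for the (weighted) log-likelihood plus log-prior of an individual's coefficients, and the standard normal target, step size, and seed used below are assumptions chosen only so the sketch is self-contained.

```python
import math
import random


def random_walk_metropolis(log_target, start, step, n_draws, seed=0):
    """Draw from a scalar target density using Gaussian random-walk proposals."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    draws = []
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


# Toy target: standard normal log-density up to a constant (an assumption).
draws = random_walk_metropolis(lambda a: -0.5 * a * a, 0.0, 1.0, 20000)
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

In a hierarchical setting, a step of this kind would be applied to each individual's coefficient vector in turn, with the proposal scale tuned to reach a reasonable acceptance rate.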
Given the set $\{(\mu_i, \Sigma_i)^*\}$, $\alpha_d$ and the base measure parameters (i.e., $a, v, \vartheta$) are independent a posteriori. The conditional posterior of the $G_0$ hyper-parameters (i.e., $a, v, \vartheta$) factors into two parts, as $a$ is independent of $v, \vartheta$ given $\{(\mu_i, \Sigma_i)^*\}$. The form of this conditional posterior is:

\[
p(a, v, \vartheta \mid \{(\mu_i, \Sigma_i)^*\}) \propto \prod_{j=1}^{I^*} \phi(\mu_j^* \mid 0, a^{-1} \Sigma_j^*) \, IW(\Sigma_j^* \mid v, V = v \vartheta I_d) \, p(a, v, \vartheta)
\tag{C3}
\]

where $\phi(\cdot \mid \cdot, \cdot)$ denotes the multivariate normal density and $IW(\cdot \mid \cdot, \cdot)$ denotes the Inverted-Wishart distribution. Finally, for the implementation of the Polya urn representation the following conditional distribution is used:

\[
(\mu_{i+1}, \Sigma_{i+1}) \mid \{(\mu_1, \Sigma_1), \ldots, (\mu_i, \Sigma_i)\} \sim
\begin{cases}
G_0(a, v, \vartheta) & \text{with prob. } \dfrac{\alpha_d}{\alpha_d + i} \\
\delta_{(\mu_j, \Sigma_j)} & \text{with prob. } \dfrac{1}{\alpha_d + i}
\end{cases}
\tag{C4}
\]

I assessed the prior hyperparameters to provide proper but diffuse distributions, defined formally by:

80 , 0.000 01 , 600 , 0.00001 , 50 , 0.0 0001 = = = = = = v v a a s r w v s r ϑ ϑ (C5)

Finally, to complete the exposition, the posterior for the partition (segment) parameters has the following form:

( ) ( ) 0 , , ~ ) , ~ ( ~ , , , , , | )) ~ )( ~ ( ' ~ ' ' ~ , ( ~ , , , , | = = + + = + Σ ∆ Σ − − + ∆ − − ∆ − − + × × + ∆ Σ ∑ ∈ µ α α µ α µ µ µ α µ µ µ µ µ µ α µ α ϑ α k k i i k k k k k k k k k k k k k k k k k k k k k k k k k n a n a n a n N z a a z z I v n v IW V v z

APPENDIX 1.D: CHOICE PARAMETER ESTIMATES FOR ALTERNATIVE MODELS

Table 1.D.1. PARAMETER ESTIMATES: Individual Choice effect (Local imitators)
                                     Param.  Estimate   Std. Dev.  2.5th      97.5th
Category specific preference:
  Device Tools                       α_1     -9.19*     6.57       -34.490    -3.216
  eBooks                             α_2     -8.89*     2.84       -13.253    -3.337
  Games                              α_3     -25.82*    8.31       -38.609    -9.519
  Health/Diet/Fitness                α_4     1.17       1.83       -6.588     2.295
  Humor/Jokes                        α_5     -0.41      7.11       -31.222    1.509
  Internet/WAP                       α_6     -12.11*    3.94       -18.256    -4.378
  Logic/Puzzle/Trivia                α_7     -26.26*    10.30      -51.248    -9.384
  Reference/Dictionaries             α_8     -17.95*    5.82       -27.128    -6.624
  Social Networks                    α_9     -3.23      1.33       -5.043     0.184
  University                         α_10    -6.86*     10.07      -49.254    -1.866
States:
  Individual download history State  α_11    -17.79*    18.37      -93.057    -5.689
  Latent imitation level             α_12    0.02*      0.01       0.005      0.032
App category characteristics (factors):
  Popularity of app category         α_13    0.39       0.73       -2.766     0.789
  Investment apps category           α_14    -10.73*    3.45       -15.955    -3.700
  Hedonic apps category              α_15    12.67      7.57       -15.563    20.683
* p < 0.05

Table 1.D.2. PARAMETER ESTIMATES: Individual Choice effect (Global imitators)
                                     Param.  Estimate   Std. Dev.  2.5th      97.5th
Category specific preference:
  Device Tools                       α_1     -2.13*     0.26       -2.79      -1.89
  eBooks                             α_2     -0.6       0.84       -0.68      0.52
  Games                              α_3     0.22*      0.37       0.17       1.18
  Health/Diet/Fitness                α_4     1.06       0.86       -1.46      1.53
  Humor/Jokes                        α_5     -2.68*     0.34       -3.20      -1.63
  Internet/WAP                       α_6     -0.53*     0.28       -1.76      -0.44
  Logic/Puzzle/Trivia                α_7     -1.72      0.75       -2.07      1.08
  Reference/Dictionaries             α_8     -1.45      0.46       -1.76      0.64
  Social Networks                    α_9     -1.64*     0.36       -1.97      -0.44
  University                         α_10    -2.05*     0.37       -2.44      -1.61
States:
  Individual download history State  α_11    -3.3*      0.44       -4.00      -2.15
  Latent imitation level             α_12    0.01*      0.00       0.00       0.01
App category characteristics (factors):
  Popularity of app category         α_13    -0.08*     0.15       -0.38      -0.06
  Investment apps category           α_14    -0.42*     1.09       -1.24      -0.40
  Hedonic apps category              α_15    -0.31*     0.49       -2.29      -0.22

Table 1.D.3. PARAMETER ESTIMATES: Individual Choice effect (Global Adopters)
                                     Param.  Estimate   Std. Dev.  2.5th      97.5th
Category specific preference:
  Device Tools                       α_1     -24.94*    16.81      -14.327    -2.669
  eBooks                             α_2     -18.31*    12.36      -15.290    -6.381
  Games                              α_3     -18.66*    11.97      -11.222    -2.296
  Health/Diet/Fitness                α_4     -13.37     8.16       -5.939     2.982
  Humor/Jokes                        α_5     -11.26*    6.43       -22.097    -9.715
  Internet/WAP                       α_6     -6.92*     2.54       -8.021     -3.021
  Logic/Puzzle/Trivia                α_7     -0.13*     1.06       -18.122    -8.332
  Reference/Dictionaries             α_8     -17.74*    12.56      -11.092    -4.547
  Social Networks                    α_9     -26.77     17.29      -15.530    0.076
  University                         α_10    -9.2*      7.16       -7.791     -2.916
States:
  Individual download history State  α_11    -35.93*    22.88      -34.350    -13.821
  Latent imitation level             α_12    0.01*      0.01       0.011      0.035
App category characteristics (factors):
  Popularity of app category         α_13    -1.05      0.73       -0.830     1.767
  Investment apps category           α_14    19.1       13.08      -0.922     7.230
  Hedonic apps category              α_15    -25.79     17.78      -6.606     10.330

Table 1.D.4. PARAMETER ESTIMATES: Individual Choice effect (No social influence)
                                     Param.  Estimate   Std. Dev.  2.5th      97.5th
Category specific preference:
  Device Tools                       α_1     -20.51*    6.14       -28.162    -2.250
  eBooks                             α_2     -2.43*     0.39       -2.647     -1.372
  Games                              α_3     -11.89     5.26       -19.040    0.482
  Health/Diet/Fitness                α_4     -10.4*     3.55       -15.011    -0.619
  Humor/Jokes                        α_5     -8.9       3.35       -13.168    0.716
  Internet/WAP                       α_6     -21.99*    7.19       -31.258    -2.083
  Logic/Puzzle/Trivia                α_7     -16.36*    4.60       -21.876    -1.765
  Reference/Dictionaries             α_8     -7.59      1.80       -8.841     0.232
  Social Networks                    α_9     -10.77*    3.39       -14.949    -0.369
  University                         α_10    -2.44*     0.46       -2.661     -0.457
States:
  Individual download history State  α_11    -15.84*    4.79       -22.241    -4.298
  Latent imitation level             α_12    -          -          -          -
App category characteristics (factors):
  Popularity of app category         α_13    -1.77*     0.68       -2.708     -0.254
  Investment apps category           α_14    -7.51*     2.81       -11.425    -1.362
  Hedonic apps category              α_15    -9.29*     3.67       -14.263    -0.893

Table 1.D.5. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (Local imitators): Tenure explanation of the effects
Parameter explained by Tenure        Param.  Estimate   Std. Dev.  2.5th      97.5th
Category specific preference:
  Device Tools                       α_1     -0.0032*   9.66E-05   -0.0034    -0.0030
  eBooks                             α_2     -0.0012*   1.42E-04   -0.0015    -0.0010
  Games                              α_3     -0.0005*   1.31E-04   -0.0008    -0.0002
  Health/Diet/Fitness                α_4     -0.0023*   1.37E-04   -0.0026    -0.0021
  Humor/Jokes                        α_5     0.0006*    7.44E-05   0.0004     0.0007
  Internet/WAP                       α_6     0.0022*    1.29E-04   0.0019     0.0023
  Logic/Puzzle/Trivia                α_7     0.0028*    1.60E-04   0.0025     0.0031
  Reference/Dictionaries             α_8     0.0004*    9.12E-05   0.0002     0.0006
  Social Networks                    α_9     0.0034*    1.46E-04   0.0031     0.0036
  University                         α_10    0.0007*    4.04E-05   0.0006     0.0007
States:
  Individual download history State  α_11    -0.005*    8.06E-05   -0.0051    -0.0048
  Latent imitation level             α_12    0.0001*    5.88E-06   0.0000     0.0001
App category characteristics (factors):
  Popularity of app category         α_13    0.0001*    1.13E-05   0.0001     0.0001
  Investment apps category           α_14    0.0025*    6.35E-05   0.0024     0.0026
  Hedonic apps category              α_15    -0.0012*   1.09E-04   -0.0014    -0.0010
* p < 0.05

Table 1.D.6. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (Global imitators): Tenure explanation of the effects
Parameter explained by Tenure        Param.  Estimate   Std. Dev.  2.5th      97.5th
Category specific preference:
  Device Tools                       α_1     -0.0038*   1.61E-04   -0.0041    -0.0035
  eBooks                             α_2     -0.0014*   1.78E-04   -0.0017    -0.0011
  Games                              α_3     0.0009*    8.48E-05   0.0007     0.0010
  Health/Diet/Fitness                α_4     0.0046*    4.85E-04   0.0040     0.0054
  Humor/Jokes                        α_5     -0.0061*   3.34E-04   -0.0065    -0.0056
  Internet/WAP                       α_6     -0.0005*   7.52E-05   -0.0006    -0.0004
  Logic/Puzzle/Trivia                α_7     -0.0035*   1.65E-04   -0.0038    -0.0032
  Reference/Dictionaries             α_8     -0.0033*   3.75E-04   -0.0039    -0.0028
  Social Networks                    α_9     -0.0034*   2.16E-04   -0.0037    -0.0030
  University                         α_10    -0.0047*   2.57E-04   -0.0051    -0.0043
States:
  Individual download history State  α_11    -0.0086*   5.63E-04   -0.0095    -0.0077
  Latent imitation level             α_12    -0.0001*   1.31E-05   -0.0001    -0.0001
App category characteristics (factors):
  Popularity of app category         α_13    -0.0002*   1.72E-05   -0.0002    -0.0002
  Investment apps category           α_14    -0.0016*   9.04E-05   -0.0018    -0.0014
  Hedonic apps category              α_15    -0.0005*   8.27E-05   -0.0006    -0.0004
* p < 0.05

Table 1.D.7. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (Global Adopters): Tenure explanation of the effects
Parameter explained by Tenure        Param.  Estimate   Std. Dev.  2.5th      97.5th
Category specific preference:
  Device Tools                       α_1     0.0003*    1.00E-04   1.36E-04   0.00046
  eBooks                             α_2     0.00034*   8.80E-05   1.46E-04   4.74E-04
  Games                              α_3     -0.00016   1.00E-04   -3.15E-04  6.43E-05
  Health/Diet/Fitness                α_4     0.00028*   7.29E-05   1.38E-04   0.000424
  Humor/Jokes                        α_5     0.00027*   9.25E-05   1.04E-04   0.000454
  Internet/WAP                       α_6     0.00157*   1.09E-04   1.35E-03   1.76E-03
  Logic/Puzzle/Trivia                α_7     0.00072*   9.03E-05   5.92E-04   0.00091
  Reference/Dictionaries             α_8     0.00047*   6.80E-05   3.06E-04   5.76E-04
  Social Networks                    α_9     -0.00006   1.00E-04   -2.49E-04  9.45E-05
  University                         α_10    0.00071*   9.72E-05   5.37E-04   8.64E-04
States:
  Individual download history State  α_11    0.00119*   1.98E-04   8.84E-04   0.001637
  Latent imitation level             α_12    -0.00001   4.69E-06   -1.43E-05  3.22E-06
App category characteristics (factors):
  Popularity of app category         α_13    -0.00007*  1.65E-05   -9.67E-05  -3.90E-05
  Investment apps category           α_14    -0.00095*  1.17E-04   -1.14E-03  -7.95E-04
  Hedonic apps category              α_15    0.0002*    8.21E-05   7.29E-05   3.66E-04
* p < 0.05

Table 1.D.8. PARAMETER ESTIMATES: Individual Choice Hierarchical Model (No social influence): Tenure explanation of the effects
Parameter explained by Tenure        Param.  Estimate   Std. Dev.  2.5th      97.5th
Category specific preference:
  Device Tools                       α_1     0.00192*   1.34E-04   1.63E-03   0.002171
  eBooks                             α_2     0.00162*   1.21E-04   1.33E-03   1.83E-03
  Games                              α_3     0.00024    4.04E-04   -4.20E-04  8.69E-04
  Health/Diet/Fitness                α_4     -0.00004   1.86E-04   -3.63E-04  0.000259
  Humor/Jokes                        α_5     0.00019    1.53E-04   -6.07E-05  0.000484
  Internet/WAP                       α_6     0.00164*   2.65E-04   1.15E-03   2.04E-03
  Logic/Puzzle/Trivia                α_7     0.00207*   1.44E-04   1.87E-03   0.002432
  Reference/Dictionaries             α_8     0.00292*   1.36E-04   2.58E-03   3.13E-03
  Social Networks                    α_9     0.00128*   1.39E-04   1.01E-03   0.001511
  University                         α_10    0.00066*   8.36E-05   4.77E-04   7.96E-04
States:
  Individual download history State  α_11    0.00084*   1.19E-04   6.21E-04   1.08E-03
  Latent imitation level             α_12    -          -          -          -
App category characteristics (factors):
  Popularity of app category         α_13    0.000005   3.77E-05   -6.62E-05  6.02E-05
  Investment apps category           α_14    0.00023    1.65E-04   -1.11E-04  4.58E-04
  Hedonic apps category              α_15    -0.00014   2.00E-04   -4.80E-04  1.75E-04
* p < 0.05

Table 1.D.9. PARAMETER ESTIMATES: Individual Choice effect (Local imitators)
Total number of users: 1258          Param.  Positive Significant  Negative Significant
Category specific preference:
  Device Tools                       α_1     0                     1258
  eBooks                             α_2     0                     1258
  Games                              α_3     0                     1258
  Health/Diet/Fitness                α_4     1197                  61
  Humor/Jokes                        α_5     1197                  61
  Internet/WAP                       α_6     0                     1258
  Logic/Puzzle/Trivia                α_7     0                     1258
  Reference/Dictionaries             α_8     0                     1258
  Social Networks                    α_9     58                    1197
  University                         α_10    0                     1258
States:
  Individual download history State  α_11    0                     1258
  Latent imitation level             α_12    1217                  0
App category characteristics (factors):
  Popularity of app category         α_13    1197                  61
  Investment apps category           α_14    0                     1258
  Hedonic apps category              α_15    1197                  61

Table 1.D.10. PARAMETER ESTIMATES: Individual Choice effect (Global imitators)
Total number of users: 1258          Param.  Positive Significant  Negative Significant
Category specific preference:
  Device Tools                       α_1     0                     1258
  eBooks                             α_2     42                    1216
  Games                              α_3     1250                  8
  Health/Diet/Fitness                α_4     1208                  50
  Humor/Jokes                        α_5     0                     1257
  Internet/WAP                       α_6     0                     1258
  Logic/Puzzle/Trivia                α_7     42                    1216
  Reference/Dictionaries             α_8     42                    1216
  Social Networks                    α_9     8                     1250
  University                         α_10    8                     1250
States:
  Individual download history State  α_11    0                     1258
  Latent imitation level             α_12    1051                  4
App category characteristics (factors):
  Popularity of app category         α_13    8                     1250
  Investment apps category           α_14    8                     1250
  Hedonic apps category              α_15    8                     1250

Table 1.D.11. PARAMETER ESTIMATES: Individual Choice effect (Global Adopters)
Total number of users: 1258          Param.  Positive Significant  Negative Significant
Category specific preference:
  Device Tools                       α_1     0                     1258
  eBooks                             α_2     0                     1258
  Games                              α_3     0                     1258
  Health/Diet/Fitness                α_4     0                     1258
  Humor/Jokes                        α_5     55                    1203
  Internet/WAP                       α_6     0                     1258
  Logic/Puzzle/Trivia                α_7     438                   545
  Reference/Dictionaries             α_8     0                     1258
  Social Networks                    α_9     0                     1258
  University                         α_10    0                     1249
States:
  Individual download history State  α_11    0                     1258
  Latent imitation level             α_12    607                   257
App category characteristics (factors):
  Popularity of app category         α_13    55                    1203
  Investment apps category           α_14    1203                  55
  Hedonic apps category              α_15    55                    1203

Table 1.D.12. PARAMETER ESTIMATES: Individual Choice effect (No social influence)
Total number of users: 1258          Param.  Positive Significant  Negative Significant
Category specific preference:
  Device Tools                       α_1     0                     1258
  eBooks                             α_2     0                     1258
  Games                              α_3     54                    1103
  Health/Diet/Fitness                α_4     0                     1258
  Humor/Jokes                        α_5     54                    1204
  Internet/WAP                       α_6     0                     1258
  Logic/Puzzle/Trivia                α_7     0                     1258
  Reference/Dictionaries             α_8     54                    1204
  Social Networks                    α_9     0                     1258
  University                         α_10    0                     1258
States:
  Individual download history State  α_11    2                     1256
  Latent imitation level             α_12    -                     -
App category characteristics (factors):
  Popularity of app category         α_13    2                     1241
  Investment apps category           α_14    0                     1243
  Hedonic apps category              α_15    2                     1241

Figure 1.D.1. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (Local Imitators)
Panels: Device tools cat.; ebook cat.; Logic/puzzle/Trivia cat.; Reference/Dictionaries; Games cat.; Health/Diet/Fitness app cat.; Social NW cat.; University cat.; Humor/Jokes cat.; Internet/WAP cat.; Individual State; Imitators density; Popular apps; Investment apps; Freemium apps

Figure 1.D.2. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (Global Imitators)
Panels: Device tools cat.; ebook cat.; Logic/puzzle/Trivia cat.; Reference/Dictionaries; Games cat.; Health/Diet/Fitness app cat.; Social NW cat.; University cat.; Humor/Jokes cat.; Internet/WAP cat.; Individual State; Imitators density; Popular apps; Investment apps; Freemium apps

Figure 1.D.3. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (Global Adopters)
Panels: Device tools cat.; ebook cat.; Logic/puzzle/Trivia cat.; Reference/Dictionaries; Games cat.; Health/Diet/Fitness app cat.; Social NW cat.; University cat.; Humor/Jokes cat.; Internet/WAP cat.; Individual State; Imitators density; Popular apps; Investment apps; Freemium apps

Figure 1.D.4. PARAMETER DISTRIBUTION: Heterogeneity in Individual Choice (No social influence)
Panels: Device tools cat.; ebook cat.; Logic/puzzle/Trivia cat.; Reference/Dictionaries; Games cat.
APPENDIX 2.A: LATENT DIRICHLET ALLOCATION

LDA is a three-level hierarchical Bayesian model in which each item of a collection is modeled as a finite mixture over an underlying set of topics (Blei et al. 2003). LDA is a generative approach: it uses a naïve conditional independence assumption, and it neglects the order of features by assuming exchangeability and using a bag-of-words representation. These assumptions bring two main benefits: simplicity and computational efficiency. Formally, the LDA model assumes the following generative process for each item i in a collection C consisting of elements (features) e:

1. Choose $N \sim \mathrm{Poisson}(\xi)$, where N is the number of elements e.
2. Choose $\theta \sim \mathrm{Dir}(\alpha)$, where $\theta$ gives the probabilities that a given document draws on each primitive topic.
3. For each of the N features $i_n$:
   a. Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$.
   b. Choose a feature $i_n$ from $p(i_n \mid z_n, \beta)$, a multinomial probability conditioned on the topic $z_n$.

A k-dimensional Dirichlet random variable $\theta$ can take values in the (k-1)-simplex (a k-vector $\theta$ lies in the (k-1)-simplex if $\theta_i \ge 0$ and $\sum_{i=1}^{k} \theta_i = 1$), and has the following probability density on this simplex:

$$p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}$$

I represent the probabilistic graphical model (PGM) of LDA in Figure 2.A.1. As the figure depicts, there are three levels to the LDA representation. The parameters $\alpha$ and $\beta$ are collection-level parameters, and they are sampled once. The variable $\theta_d$ has a Dirichlet distribution and is a document-level variable, so it is sampled once per document. This variable simply defines the weight distribution of topics within the document.
Finally, the variables $z_{dn}$ and $w_{dn}$ are feature-level variables, and they are sampled once for each feature within each document. The variable $z_{dn}$ defines the topic of the n-th word within document d, and the variable $w_{dn}$ defines the feature instance that appears at location n within document d. As can be seen, an LDA model is a type of conditionally independent hierarchical model, and it is often referred to as a parametric empirical Bayes model. One of the advantages of an LDA model is that it is parsimonious, so unlike the probabilistic Latent Semantic Indexing (pLSI) model, it does not suffer from overfitting.

Figure 2.A.1. Graphical model representation of LDA

To estimate the LDA model, I defined the likelihood of the model as follows:

$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d$$

The key inferential problem to solve for LDA is computing the posterior distribution of the hidden topic variables $\theta_d$ and $z_d$, the first with a Dirichlet distribution and the second with a multinomial distribution. To normalize the distribution of words given $\alpha$ and $\beta$, I marginalized over the hidden variables as follows:

$$p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \frac{\Gamma\left(\sum_{i} \alpha_i\right)}{\prod_{i} \Gamma(\alpha_i)} \int \left( \prod_{i=1}^{k} \theta_{di}^{\alpha_i - 1} \right) \left( \prod_{n=1}^{N_d} \sum_{i=1}^{k} \prod_{j=1}^{V} (\theta_{di} \beta_{ij})^{w_{dn}^{j}} \right) d\theta_d$$

Due to the coupling between $\theta$ and $\beta$ in the summation over latent topics, this likelihood function is intractable. Therefore, to estimate it, Blei et al. (2003) suggest using a variational inference method. Variational inference, or variational Bayes, refers to a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. This family of methods is an alternative to sampling methods, and it is used to approximate analytically the posterior probability of the unobservable variables, in order to do statistical inference over these variables.
These methods also give a lower bound to the marginal log likelihood. This family of lower bounds is indexed by a set of variational parameters. To obtain the tightest lower bound, I used an optimization procedure to select the variational parameters. A simple way to obtain a tractable family of lower bounds is to consider simple modifications of the original graphical model, removing dependencies and introducing new variational parameters instead. In the LDA model, I used the following variational distribution to approximate the posterior distribution of the unobserved variables given the observed data:

$$q(\theta, z \mid \gamma, \phi) = q_1(\theta \mid \gamma) \prod_{n=1}^{N} q_2(z_n \mid \phi_n)$$

where $q_1(\cdot)$ is a Dirichlet distribution with parameters $\gamma$ and $q_2(\cdot)$ is a multinomial distribution with parameters $\phi_n$. The variational parameters are the result of solving the following optimization problem:

$$(\gamma^{*}, \phi^{*}) = \arg\min_{(\gamma, \phi)} D_{KL}\big( q(\theta, z \mid \gamma, \phi) \,\|\, p(\theta, z \mid w, \alpha, \beta) \big)$$

where $D_{KL}$ represents the Kullback-Leibler (KL) divergence between the variational distribution and the true joint posterior of the latent parameters $p(\theta, z \mid w, \alpha, \beta)$. Formally, $D_{KL}$ is defined as follows (the sum over $\theta$ is understood as an integral):

$$D_{KL}\big( q(\theta, z \mid \gamma, \phi) \,\|\, p(\theta, z \mid w, \alpha, \beta) \big) = \sum_{(\theta, z)} q(\theta, z \mid \gamma, \phi) \log\left( \frac{q(\theta, z \mid \gamma, \phi)}{p(\theta, z \mid w, \alpha, \beta)} \right)$$

As a result, I can decompose the marginal log likelihood in the following format:

$$\log p(w \mid \alpha, \beta) = L(\gamma, \phi; \alpha, \beta) + D_{KL}\big( q(\theta, z \mid \gamma, \phi) \,\|\, p(\theta, z \mid w, \alpha, \beta) \big)$$

where

$$L(\gamma, \phi; \alpha, \beta) = E_q[\log p(\theta, z, w \mid \alpha, \beta)] - E_q[\log q(\theta, z)]$$

Because the left-hand side does not depend on $\gamma$ and $\phi$, this relation shows that maximizing the lower bound $L(\gamma, \phi; \alpha, \beta)$ with respect to $\gamma$ and $\phi$ is equivalent to minimizing the KL divergence between the variational posterior probability and the true posterior probability.
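The decomposition above can be checked numerically on a small example. The following is a minimal sketch, not the estimation code used in this dissertation: it assumes a single discrete latent variable (rather than the Dirichlet and multinomial variables of LDA), and the function name `evidence_elbo_kl` and the toy probabilities are hypothetical.

```python
import math

def evidence_elbo_kl(q, joint):
    """Return (log evidence, lower bound, KL) for one discrete latent variable.

    q     : variational probabilities q(z), positive and summing to 1
    joint : joint probabilities p(z, w) for the observed w, one per value of z
    """
    evidence = sum(joint)                      # p(w) = sum over z of p(z, w)
    posterior = [p / evidence for p in joint]  # true posterior p(z | w)
    # Lower bound: E_q[log p(z, w)] - E_q[log q(z)]
    lower_bound = sum(qz * (math.log(pz) - math.log(qz))
                      for qz, pz in zip(q, joint))
    # KL divergence between q(z) and the true posterior
    kl = sum(qz * math.log(qz / post) for qz, post in zip(q, posterior))
    return math.log(evidence), lower_bound, kl

# Hypothetical toy numbers: three latent states, one observed outcome.
joint = [0.10, 0.18, 0.02]
q = [0.2, 0.5, 0.3]
log_evidence, lower_bound, kl = evidence_elbo_kl(q, joint)
```

In this toy setting the log evidence equals the lower bound plus the KL term, and choosing q equal to the true posterior drives the KL term to zero, so the bound becomes tight.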
Expanding $L(\gamma, \phi; \alpha, \beta)$ using the factorization of p and q gives the following:

$$\begin{aligned}
L(\gamma, \phi; \alpha, \beta) ={}& E_q[\log p(\theta \mid \alpha)] + E_q[\log p(z \mid \theta)] + E_q[\log p(w \mid z, \beta)] - E_q[\log q(\theta)] - E_q[\log q(z)] \\
={}& \log \Gamma\Big(\sum_{j=1}^{k} \alpha_j\Big) - \sum_{i=1}^{k} \log \Gamma(\alpha_i) + \sum_{i=1}^{k} (\alpha_i - 1)\Big(\Psi(\gamma_i) - \Psi\Big(\sum_{j=1}^{k} \gamma_j\Big)\Big) \\
&+ \sum_{n=1}^{N} \sum_{i=1}^{k} \phi_{ni} \Big(\Psi(\gamma_i) - \Psi\Big(\sum_{j=1}^{k} \gamma_j\Big)\Big) \\
&+ \sum_{n=1}^{N} \sum_{i=1}^{k} \sum_{j=1}^{V} \phi_{ni}\, w_n^{j} \log \beta_{ij} \\
&- \log \Gamma\Big(\sum_{j=1}^{k} \gamma_j\Big) + \sum_{i=1}^{k} \log \Gamma(\gamma_i) - \sum_{i=1}^{k} (\gamma_i - 1)\Big(\Psi(\gamma_i) - \Psi\Big(\sum_{j=1}^{k} \gamma_j\Big)\Big) \\
&- \sum_{n=1}^{N} \sum_{i=1}^{k} \phi_{ni} \log \phi_{ni}
\end{aligned}$$

where $\Gamma(\cdot)$ is the gamma function and $\Psi(\cdot)$ is the digamma function, the first derivative of $\log \Gamma$. The key to this derivation is the following equation:

$$E[\log \theta_i \mid \alpha] = \Psi(\alpha_i) - \Psi\Big(\sum_{j=1}^{k} \alpha_j\Big)$$

which follows directly from the general fact that the derivative of the log normalization factor with respect to the natural parameter of an exponential family distribution is equal to the expectation of the sufficient statistic. Collecting the terms of $L(\gamma, \phi; \alpha, \beta)$ that involve each of the variational parameters $\gamma$ and $\phi_{ni}$, and taking the respective derivatives, gives an algorithm to solve the above optimization problem and find the variational parameters. In particular, I can use a simple iterative fixed-point method and update the two variational parameters with the following equations until convergence:

$$\phi_{ni} \propto \beta_{i w_n} \exp\{ E_q[\log(\theta_i) \mid \gamma] \}$$

$$\gamma_i = \alpha_i + \sum_{n=1}^{N} \phi_{ni}$$

This optimization is document specific, so I viewed the Dirichlet parameter $\gamma^{*}(w)$ as providing a representation of a document in the topic simplex. In summary, I had the following variational inference algorithm for LDA (Blei et al. 2003):

(1) Initialize $\phi_{ni}^{0} := 1/k$ for all i and n
(2) Initialize $\gamma_i := \alpha_i + N/k$ for all i
(3) Repeat
    a. For n = 1 to N
        i. For i = 1 to k
            1. $\phi_{ni}^{t+1} := \beta_{i w_n} \exp(\Psi(\gamma_i^{t}))$
        ii. Normalize $\phi_{n}^{t+1}$ to sum to 1
    b. $\gamma^{t+1} := \alpha + \sum_{n=1}^{N} \phi_{n}^{t+1}$
(4) until convergence

This algorithm has the order of $O(N^2 k)$.
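The fixed-point updates above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the code used for the estimates in this dissertation: the names `digamma` and `variational_e_step`, the convergence tolerance, and the toy inputs are hypothetical; `beta` is assumed strictly positive; and the digamma function is approximated with a recurrence plus an asymptotic series because the standard library does not provide it.

```python
import math

def digamma(x):
    """Approximate the digamma function for x > 0 (recurrence + asymptotic series)."""
    result = 0.0
    while x < 6.0:              # shift x upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series

def variational_e_step(words, alpha, beta, max_iter=200, tol=1e-8):
    """Fixed-point updates of (gamma, phi) for one document.

    words : list of vocabulary indices w_n
    alpha : list of k Dirichlet parameters
    beta  : k x V matrix, beta[i][j] = probability of term j under topic i
    """
    k, n_words = len(alpha), len(words)
    phi = [[1.0 / k] * k for _ in words]            # step (1)
    gamma = [a + n_words / k for a in alpha]        # step (2)
    for _ in range(max_iter):                       # step (3)
        exp_psi = [math.exp(digamma(g)) for g in gamma]
        for n, w in enumerate(words):
            row = [beta[i][w] * exp_psi[i] for i in range(k)]
            total = sum(row)
            phi[n] = [v / total for v in row]       # normalize phi_n to sum to 1
        new_gamma = [alpha[i] + sum(phi[n][i] for n in range(n_words))
                     for i in range(k)]
        change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if change < tol:                            # step (4)
            break
    return gamma, phi

# Hypothetical toy document: two topics, four vocabulary terms.
toy_beta = [[0.45, 0.45, 0.05, 0.05],
            [0.05, 0.05, 0.45, 0.45]]
toy_gamma, toy_phi = variational_e_step([0, 1, 0, 2], [0.1, 0.1], toy_beta)
```

The constant factor $\exp(-\Psi(\sum_j \gamma_j))$ is dropped from the $\phi$ update because it cancels in the normalization, which matches the update as listed in the algorithm.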
Given the variational Bayesian method, I had a tractable lower bound on the log likelihood, a bound which I can maximize with respect to $\alpha$ and $\beta$. I can thus find approximate empirical Bayes estimates for the LDA model via an alternating variational EM (VEM) procedure that maximizes a lower bound with respect to the variational parameters $\gamma$ and $\phi$, and then, for fixed values of the variational parameters, maximizes the lower bound with respect to the model parameters $\alpha$ and $\beta$. The VEM algorithm is defined as follows:

1. (E-step) For each document, find the optimizing values of the variational parameters $\{\gamma_d^{*}, \phi_d^{*} : d \in D\}$. This is done as described in the above variational inference algorithm.
2. (M-step) Maximize the resulting lower bound on the log likelihood with respect to the model parameters $\alpha$ and $\beta$. This corresponds to finding the maximum likelihood estimates with expected sufficient statistics for each document under the approximate posterior computed in the E-step.

The update for the conditional multinomial parameter $\beta$ can be written out analytically as:

$$\beta_{ij} \propto \sum_{d=1}^{M} \sum_{n=1}^{N_d} \phi_{dni}^{*}\, w_{dn}^{j}$$

The last concern about LDA is to make sure that sparsity does not make the likelihood zero. To address it, I used an extended graphical model with a prior on $\beta$, where $\beta$ is a k×V random matrix (k is the number of topics and V the number of features, with one row for each component) whose rows are assumed to be independently and identically Dirichlet distributed with parameter $\eta$.
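The analytic M-step update for $\beta$ given above amounts to accumulating the $\phi^{*}$ weights per topic and vocabulary term and then normalizing each row. The sketch below is illustrative only: the function name `m_step_beta` and the toy inputs are hypothetical, and every topic is assumed to receive some positive weight so that the normalization is well defined.

```python
def m_step_beta(docs, phis, k, vocab_size):
    """Row-normalized expected counts: beta[i][j] is proportional to
    the sum over documents d and positions n of phi[d][n][i] * 1{w_dn = j}.

    docs : list of documents, each a list of vocabulary indices
    phis : phis[d][n][i] = variational probability that word n of
           document d belongs to topic i (from the E-step)
    """
    counts = [[0.0] * vocab_size for _ in range(k)]
    for words, phi in zip(docs, phis):
        for n, w in enumerate(words):
            for i in range(k):
                counts[i][w] += phi[n][i]
    beta = []
    for row in counts:
        total = sum(row)          # assumed positive for every topic
        beta.append([c / total for c in row])
    return beta

# Hypothetical toy inputs: two documents, two topics, four vocabulary terms.
toy_docs = [[0, 1, 0], [2, 2, 1]]
toy_phis = [[[0.9, 0.1], [0.8, 0.2], [0.9, 0.1]],
            [[0.2, 0.8], [0.2, 0.8], [0.5, 0.5]]]
toy_beta_hat = m_step_beta(toy_docs, toy_phis, k=2, vocab_size=4)
```

In this toy example the fourth vocabulary term never occurs, so it receives probability zero under both topics. This is the sparsity problem that motivates placing a Dirichlet prior on $\beta$.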
Now $\beta_i$ can be treated as a random variable and included among the hidden variables of the posterior distribution, which gives the following variational distribution under an independence assumption:

$$q(\beta_{1:k}, z_{1:M}, \theta_{1:M} \mid \lambda, \phi, \gamma) = \prod_{i=1}^{k} \mathrm{Dir}(\beta_i \mid \lambda_i) \prod_{d=1}^{M} q_d(\theta_d, z_d \mid \phi_d, \gamma_d)$$

To account for this modification, I only needed to change the variational inference algorithm by adding the following update of the variational parameter $\lambda$:

$$\lambda_{ij} = \eta + \sum_{d=1}^{M} \sum_{n=1}^{N_d} \phi_{dni}^{*}\, w_{dn}^{j}$$

This update completes the VEM algorithm for estimating an LDA model. There is an alternative approach, proposed by Phan et al. (2008), that uses Gibbs sampling to estimate an LDA model. This approach draws from the posterior distribution p(z | w) by sampling as follows:

$$p(z_i = K \mid w, z_{-i}) \propto \frac{n_{-i,K}^{(j)} + \delta}{n_{-i,K}^{(\cdot)} + V\delta} \cdot \frac{n_{-i,K}^{(d_i)} + \alpha}{n_{-i,\cdot}^{(d_i)} + k\alpha}$$

where $z_{-i}$ is the vector of current topic memberships of all words without the i-th word $w_i$. The index j indicates that $w_i$ is equal to the j-th term in the vocabulary. $n_{-i,K}^{(j)}$ gives how often the j-th term of the vocabulary is currently assigned to topic K without the i-th word, and the dot implies summation over all relevant index instances. $d_i$ indicates the document in the collection to which the word $w_i$ belongs. In this Bayesian formulation, $\delta$ and $\alpha$ are the prior parameters for the term distribution of topics $\beta$ and the topic distribution of documents $\theta$, respectively. The predictive distributions of the parameters $\theta$ and $\beta$ given w and z, with counts taken over all words, are given by:

$$\hat{\beta}_{K}^{(j)} = \frac{n_{K}^{(j)} + \delta}{n_{K}^{(\cdot)} + V\delta}, \qquad \hat{\theta}_{K}^{(d)} = \frac{n_{K}^{(d)} + \alpha}{n_{\cdot}^{(d)} + k\alpha}$$

The log likelihood for the Gibbs sampling also has the following form:

$$\log(p(w \mid z)) = k \log\left( \frac{\Gamma(V\delta)}{\Gamma(\delta)^{V}} \right) + \sum_{K=1}^{k} \left\{ \left[ \sum_{j=1}^{V} \log\big(\Gamma(n_{K}^{(j)} + \delta)\big) \right] - \log\big(\Gamma(n_{K}^{(\cdot)} + V\delta)\big) \right\}$$

APPENDIX 2.B: K-MEANS CLUSTERING

Figure 2.B.1. Within-groups sum of squares based on number of clusters in K-Means algorithm for bidders

Figure 2.B.2. Within-groups sum of squares based on number of clusters in K-Means algorithm for auctions

Table 2.B.1. Cluster center comparison between k-means and mixture normal fuzzy clustering

| Approach | Statistic | Segment size | Bidder's feedback mean | STD (bidder's feedback) | Number of bids on this item | STD (NBTI) | Total number of bids in 30 days | STD (TNB30D) | Number of items bid on in 30 days | STD (NIB30D) | Bid activity with current seller | STD (BACS) | Number of categories bid on | STD (NCBO) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-means approach | mean | 261 | 5913 | 1531 | 9 | 3 | 867 | 307 | 456 | 134 | 25 | 11 | 2 | 1 |
| k-means approach | STD | 285 | 19587 | 2330 | 14 | 3 | 1569 | 408 | 989 | 169 | 28 | 8 | 2 | 1 |
| Mixture normal fuzzy clustering | mean | 261 | 3471 | 3635 | 9 | 7 | 680 | 631 | 258 | 267 | 24 | 19 | 2 | 1 |
| Mixture normal fuzzy clustering | STD | 275 | 12400 | 8480 | 8 | 7 | 1073 | 1066 | 413 | 450 | 20 | 12 | 1 | 1 |

APPENDIX 2.C: ESTIMATION PROCEDURE

I summarize the Monte Carlo (Generalized) Expectation Maximization algorithm using pseudocode.

[Outline of the algorithm]

Parameters to estimate:
$\Psi_j = (\gamma_j, \tau_j, \iota_j, \eta_j)$: auction specific
$\Theta_i = (\alpha_i, \beta_i, \delta_i, \rho_i)$: bidder specific
$\Sigma_j = (\sigma_{vj}, \sigma_{wj}, \sigma_{\zeta_1 j}, \sigma_{\xi_1 j}, \sigma_{\zeta_2 j}, \sigma_{\xi_2 j})$: variance of state space

Clustering step: eBay-specified auction cluster indices are denoted by $clus_j$. Bidders are clustered using mixture normal fuzzy clustering to extract $ind_i$, the index of membership of bidder i in a bidder segment.

Generalized E-M algorithm:

Step 0: Initialize all parameters to estimate: $\Psi$, $\Theta$, and $\Sigma$.

E-Step:
• Compute weighted least squares to estimate $(\hat{b}^{bidder}, \Sigma_k^{bidder}, \hat{b}^{auction}, \Sigma_k^{auction})$.
• Compute the prior over bidder- and auction-specific parameters.
• Compute the expected likelihood function using Kalman forward filtering and backward smoothing to estimate the distribution of the state parameters. Then use Monte Carlo sampling to integrate over the latent state.
M-Step: Improve the expected likelihood function w.r.t. $(\Psi, \Theta, \Sigma)$ using the simulated annealing method, and return to the E-Step.

[Details of the algorithm]

[Input data]: a sequence of bids $b_{it}$ of individual $i = 1, \ldots, I$ on the t-th bid, $t = 1, \ldots, T$, within each auction $j = 1, \ldots, J$; a vector of cross-sectional information $d_i$ about each bidder; and a vector of cross-sectional information $d_j$ about each auction.

[Preprocessing]:
1. eBay-specified auction cluster indices are denoted by $clus_j$.
2. Identify the segment of each bidder by estimating the mixture normal fuzzy clustering specified in equation (15), with the following EM algorithm:

[E-step]: Compute the "expected" segment of all bidders for each segment by evaluating the Gaussian density of bidder i's data for each segment:

$$P(ind_i \mid \mu, \Sigma, \pi) = \frac{\pi_{ind_i} P(d_i \mid \mu_{ind_i}, \Sigma_{ind_i})}{\sum_{k=1}^{K} \pi_k P(d_i \mid \mu_k, \Sigma_k)}$$

[M-step]: Compute the maximum likelihood estimates of the model given the class membership distribution of the data:

$$\mu_k = \frac{\sum_{i=1}^{I} P(ind_i = k \mid \mu, \Sigma, \pi)\, d_i}{\sum_{i=1}^{I} P(ind_i = k \mid \mu, \Sigma, \pi)}$$

$$\Sigma_k = \frac{\sum_{i=1}^{I} P(ind_i = k \mid \mu, \Sigma, \pi)\, [d_i - \mu_k][d_i - \mu_k]^{T}}{\sum_{i=1}^{I} P(ind_i = k \mid \mu, \Sigma, \pi)}$$

$$\pi_k = \frac{\sum_{i=1}^{I} P(ind_i = k \mid \mu, \Sigma, \pi)}{I}$$

The output of this algorithm after convergence is $ind_i$, which is the segment of bidder i.

3. Set initial values for the following parameter vectors:

[Auction specific parameters]
$\tau = (\tau_1, \ldots, \tau_J)$, $\gamma = (\gamma_1, \ldots, \gamma_J)$, $\iota = (\iota_1, \ldots, \iota_J)$, $\eta = (\eta_1, \ldots, \eta_J)$
Stacked in $\Psi_j = (\gamma_j, \tau_j, \iota_j, \eta_j)$, so $\Psi = (\Psi_1, \ldots, \Psi_J)$

[Bidder specific parameters]
$\alpha = (\alpha_1, \ldots, \alpha_I)$, $\beta = (\beta_1, \ldots, \beta_I)$, $\delta = (\delta_1, \ldots, \delta_I)$, $\rho = (\rho_1, \ldots, \rho_I)$
Stacked in $\Theta_i = (\alpha_i, \beta_i, \delta_i, \rho_i)$, so $\Theta = (\Theta_1, \ldots, \Theta_I)$

[Variance of the state space equations]
$\sigma_v = (\sigma_{1v}, \ldots, \sigma_{Jv})$, $\sigma_w = (\sigma_{1w}, \ldots, \sigma_{Jw})$
$\sigma_{\zeta_1} = (\sigma_{1\zeta_1}, \ldots, \sigma_{J\zeta_1})$, $\sigma_{\xi_1} = (\sigma_{1\xi_1}, \ldots, \sigma_{J\xi_1})$
$\sigma_{\zeta_2} = (\sigma_{1\zeta_2}, \ldots, \sigma_{J\zeta_2})$, $\sigma_{\xi_2} = (\sigma_{1\xi_2}, \ldots, \sigma_{J\xi_2})$
Stacked in $\Sigma_j = (\sigma_{vj}, \sigma_{wj}, \sigma_{\zeta_1 j}, \sigma_{\xi_1 j}, \sigma_{\zeta_2 j}, \sigma_{\xi_2 j})$, so $\Sigma = (\Sigma_1, \ldots, \Sigma_J)$

[Main procedure to maximize a posteriori]

1. Compute the prior on the auction specific parameters:

$$\hat{b}^{auction} = \left( \sum_{k=1}^{K} d_{clust(j)=k}^{T}\, \Sigma_k^{auction}\, d_{clust(j)=k} \right)^{-1} \left( \sum_{k=1}^{K} d_{clust(j)=k}^{T}\, \Sigma_k^{auction}\, \Psi_{clust(j)=k} \right)$$

Then the prior is defined as follows, with $k = clust(j)$:

$$P(\Psi_j) = \mathrm{Norm}(\Psi_j \mid \hat{b}_k^{auction}, d_j, \Sigma_k^{auction})$$

2. Compute the prior on the bidder specific parameters:

$$\hat{b}^{bidders} = \left( \sum_{k=1}^{K} d_{ind(i)=k}^{T}\, \Sigma_k^{bidder}\, d_{ind(i)=k} \right)^{-1} \left( \sum_{k=1}^{K} d_{ind(i)=k}^{T}\, \Sigma_k^{bidder}\, \Theta_{ind(i)=k} \right)$$

Then the prior is defined as follows, with $k = ind(i)$:

$$P(\Theta_i) = \mathrm{Norm}(\Theta_i \mid \hat{b}_k^{bidders}, d_i, \Sigma_k^{bidder})$$

3. Compute the likelihood contribution of the belief of the bidder about the bids:

For j := 1, …, J do:
    [Kalman filter on the evolution of bids in equations (4) and (5)]
    For t := 1, …, T do:
        [Time updating (Prediction)]
        Project the state one step ahead: $\theta_{jt}^{-} = \tau_j \theta_{jt-1} + \gamma_j$
        Project the error covariance ahead: $V_t(\theta_{jt})^{-} = \tau_j V_{t-1}(\theta_{jt-1}) \tau_j' + \sigma_{jw}$
        [Measurement update (Correction)]
        Compute the Kalman gain: $K_{jt} = V_t(\theta_{jt})^{-} \big( V_t(\theta_{jt})^{-} + \sigma_{jv} \big)^{-1}$
        Compute the estimate with the measurement: $\theta_{jt} = \theta_{jt}^{-} + K_{jt} (b_{jt} - \theta_{jt}^{-})$
        Update the error covariance: $V_t(\theta_{jt}) = (I - K_{jt}) V_t(\theta_{jt})^{-}$
    EndFor
    For t := T-1, …, 0 do:
        [Backward Smoothing]
        Correction factor: $C_{jt} = V_t(\theta_{jt})\, \tau_j\, \big( V_{t+1}(\theta_{jt+1})^{-} \big)^{-1}$
        Correct the estimate with the step-ahead state prediction: $\theta_{jt} \leftarrow \theta_{jt} + C_{jt} (\theta_{jt+1} - \theta_{jt+1}^{-})$
        Update the error covariance: $V_t(\theta_{jt}) \leftarrow V_t(\theta_{jt}) + C_{jt} \big( V_{t+1}(\theta_{jt+1}) - V_{t+1}(\theta_{jt+1})^{-} \big) C_{jt}^{T}$
    EndFor
EndFor

[Monte Carlo E-Step]
From the time-varying distribution of states, draw S sample points.
Compute the following likelihood contribution of the belief about bids based on the draws, by integrating out the latent state:

For j := 1, …, J do:

$$\int \prod_{t=1}^{T} P_{Norm}(b_{jt} \mid \theta_{jt}, \sigma_{jv}) \times P_{Norm}(\theta_{jt} \mid \theta_{jt-1}, \sigma_{jw}, \tau_j, \gamma_j)\; d\theta_{j1}, \ldots, d\theta_{jT}$$

EndFor

4. Compute the likelihood contribution of the belief of the bidder about the number of bidders:
Apply the Kalman filter and backward smoothing to the state evolution in equations (8) and (9). Then apply the Monte Carlo E-Step (the pseudocode is similar to part (3), so I skip it here).

5.
Compute the likelihood contribution of the evolution of valuation:

[Invert the latent bids to recover a measure of valuation]
For j := 1, …, J do:

+ + + − + + = − − − − − − − ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( 1 1 1 1 1 1 1 it t i it t it t it i i i t t it i it t i it t it it t it g g g g G g G E v θ β θ θ θ β α θ θ α θ α θ θ θ θ

An adaptive quadrature algorithm can be used to run the following integration:

it it t it t i it t it t it i i it t it i it t i it t it it t it d f g g g g G g G v θ θ θ β θ θ θ β α θ θ α θ α θ θ θ ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( 1 1 1 1 1 1 1 1 − − − − − − − − ∫ + + + − + + =

EndFor

Apply the Kalman filter and backward smoothing to the state evolution in equation (14). Then apply the Monte Carlo E-Step (the pseudocode is similar to part (3), so I skip it here).

[Generalized M-Step]
Evaluate the posterior of the parameters, given the log of the priors on the auction and bidder specific parameters and the log of the likelihood contributions of the beliefs about the bids and the number of bidders in each auction, by summing them up, and optimize over the parameter vectors $\Psi$, $\Theta$, and $\Sigma$. Due to the high number of parameters and multi-modality, I use simulated annealing with adaptive cooling for this step.

APPENDIX 2.D: EXTRA TABLES FOR THE MAIN AND ALTERNATIVE MODEL (ONLINE COMPANION)

Table 2.D.1. Bidder's characteristics within each auction category

| Auction category | Segment size | Bidder feedback score (AVG) | SD (FDBK score) | Number of bids on this item (AVG) | SD (Num. bids) | Total number of bids in 30 days (AVG) | SD (total number of bids in 30 days) | Number of items bid on (30 days) | SD (num. items bid on) | Average num. bidding activity with current seller | SD (NBCS) | N categories bid on | SD (N cat. bid on) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jewelry and Watches | 1550 | 524 | 3077 | 5 | 10 | 275 | 710 | 127 | 341 | 21 | 28 | 2 | 1 |
| Collectibles | 859 | 863 | 4916 | 7 | 13 | 243 | 471 | 90 | 201 | 25 | 29 | 2 | 1 |
| Clothing, Shoes and Accessories | 453 | 342 | 1178 | 5 | 8 | 163 | 505 | 95 | 379 | 28 | 33 | 2 | 1 |
| Crafts | 558 | 536 | 1185 | 4 | 6 | 175 | 589 | 74 | 178 | 33 | 34 | 2 | 1 |
| Pottery and Glass | 607 | 967 | 4721 | 5 | 8 | 195 | 447 | 90 | 235 | 26 | 28 | 3 | 1 |
| Antiques | 546 | 643 | 1089 | 5 | 9 | 213 | 477 | 109 | 241 | 24 | 30 | 3 | 1 |
| Toys and Hobbies | 744 | 920 | 5589 | 5 | 9 | 159 | 357 | 64 | 170 | 26 | 30 | 2 | 1 |
| Stamps | 651 | 1188 | 1899 | 5 | 8 | 504 | 940 | 227 | 346 | 20 | 25 | 1 | 1 |
| Books | 482 | 651 | 1612 | 5 | 6 | 171 | 803 | 74 | 355 | 32 | 33 | 2 | 1 |
| Tickets and Experiences | 489 | 574 | 1224 | 3 | 4 | 56 | 119 | 33 | 85 | 40 | 36 | 2 | 1 |
| Art | 456 | 469 | 745 | 5 | 7 | 69 | 110 | 30 | 53 | 50 | 40 | 2 | 1 |
| Gift Cards and Coupons | 522 | 818 | 1784 | 4 | 6 | 373 | 1065 | 264 | 1001 | 17 | 27 | 2 | 1 |
| Music | 585 | 1047 | 3583 | 5 | 8 | 172 | 334 | 91 | 196 | 31 | 31 | 2 | 1 |
| Consumer Electronics | 734 | 425 | 2562 | 5 | 10 | 142 | 347 | 64 | 187 | 29 | 32 | 2 | 1 |
| DVDs and Movies | 602 | 635 | 1688 | 4 | 6 | 180 | 831 | 80 | 222 | 24 | 29 | 2 | 1 |
| Dolls and Bears | 679 | 1301 | 8033 | 5 | 9 | 219 | 370 | 90 | 155 | 17 | 24 | 2 | 1 |
| Entertainment Memorabilia | 506 | 626 | 1257 | 5 | 9 | 140 | 323 | 62 | 168 | 43 | 38 | 2 | 1 |
| Health and Beauty | 541 | 480 | 2098 | 5 | 9 | 100 | 191 | 44 | 94 | 35 | 34 | 2 | 1 |
| Video Games and Consoles | 682 | 567 | 4264 | 5 | 8 | 159 | 392 | 71 | 170 | 23 | 30 | 2 | 1 |

Table 2.D.2. Maximum A Posteriori of the model

| Element of the maximum a posteriori model selection criteria | Log Likelihood |
|---|---|
| Number of bidders evolution state space model | -788,079 |
| Bid evolution within each auction state space model | -92,739,612 |
| Valuation evolution state space model | -627,982 |
| Prior on the auctions parameters | -16,854 |
| Prior on the bidders parameters | -107,558 |

Table 2.D.3.
Bidder's segment profile after mixture normal clustering

| Bidders' segment index | Segment size | Bidder's feedback mean | STD (bidder's feedback) | Number of bids on this item | STD (NBTI) | Total number of bids in 30 days | STD (TNB30D) | Number of items bid on in 30 days | STD (NIB30D) | Bid activity with current seller | STD (BACS) | Number of categories bid on (mean) | STD (NCBO) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 215 | 70 | 68 | 15 | 17 | 211 | 151 | 48 | 38 | 19 | 24 | 2 | 1 |
| 2 | 23 | 630 | 606 | 7 | 8 | 3266 | 2604 | 856 | 715 | 9 | 20 | 2 | 1 |
| 3 | 89 | 335 | 379 | 21 | 19 | 940 | 679 | 163 | 138 | 12 | 17 | 2 | 1 |
| 4 | 963 | 588 | 1148 | 1 | 0 | 233 | 692 | 148 | 382 | 20 | 30 | 2 | 1 |
| 5 | 153 | 671 | 5 | 5 | 0 | 192 | 0 | 90 | 1 | 26 | 1 | 2 | 0 |
| 6 | 466 | 221 | 331 | 5 | 6 | 27 | 26 | 10 | 10 | 45 | 34 | 2 | 1 |
| 7 | 90 | 1864 | 3782 | 3 | 1 | 252 | 215 | 160 | 154 | 12 | 22 | 2 | 1 |
| 8 | 992 | 907 | 4109 | 1 | 0 | 214 | 545 | 136 | 357 | 19 | 29 | 2 | 1 |
| 9 | 535 | 452 | 444 | 4 | 2 | 101 | 88 | 48 | 46 | 21 | 17 | 2 | 1 |
| 10 | 284 | 75 | 77 | 15 | 14 | 107 | 182 | 14 | 19 | 56 | 33 | 2 | 1 |
| 11 | 42 | 1783 | 2171 | 26 | 18 | 186 | 121 | 52 | 33 | 22 | 18 | 3 | 2 |
| 12 | 83 | 978 | 1289 | 5 | 4 | 462 | 335 | 217 | 213 | 1 | 1 | 2 | 1 |
| 13 | 12 | 19165 | 29755 | 9 | 9 | 3314 | 2474 | 1146 | 842 | 3 | 3 | 4 | 3 |
| 14 | 52 | 449 | 598 | 21 | 14 | 1243 | 740 | 209 | 114 | 12 | 19 | 2 | 1 |
| 15 | 113 | 5046 | 18209 | 23 | 26 | 1778 | 1551 | 718 | 692 | 26 | 33 | 2 | 2 |
| 16 | 466 | 435 | 398 | 4 | 2 | 87 | 80 | 40 | 38 | 36 | 26 | 2 | 1 |
| 17 | 522 | 1146 | 1751 | 9 | 12 | 404 | 348 | 159 | 174 | 23 | 27 | 2 | 1 |
| 18 | 589 | 663 | 683 | 1 | 0 | 97 | 96 | 66 | 73 | 6 | 6 | 2 | 1 |
| 19 | 310 | 173 | 190 | 2 | 2 | 7 | 7 | 4 | 4 | 59 | 36 | 1 | 0 |
| 20 | 395 | 1426 | 1867 | 9 | 12 | 286 | 237 | 139 | 135 | 19 | 26 | 3 | 2 |
| 21 | 403 | 196 | 226 | 6 | 8 | 12 | 11 | 4 | 2 | 62 | 27 | 1 | 0 |
| 22 | 530 | 608 | 646 | 4 | 2 | 102 | 72 | 47 | 36 | 32 | 30 | 2 | 1 |
| 23 | 49 | 5504 | 11521 | 10 | 18 | 2369 | 2317 | 1283 | 1631 | 18 | 32 | 2 | 2 |
| 24 | 481 | 427 | 410 | 1 | 0 | 36 | 32 | 25 | 25 | 11 | 10 | 3 | 1 |
| 25 | 142 | 59 | 48 | 3 | 3 | 3 | 3 | 1 | 1 | 100 | 0 | 1 | 0 |
| 26 | 871 | 856 | 3857 | 1 | 0 | 81 | 162 | 56 | 124 | 19 | 29 | 2 | 1 |
| 27 | 242 | 518 | 322 | 5 | 2 | 201 | 153 | 83 | 52 | 20 | 11 | 2 | 1 |
| 28 | 569 | 122 | 156 | 7 | 10 | 29 | 34 | 9 | 9 | 46 | 33 | 2 | 1 |
| 29 | 62 | 567 | 1003 | 19 | 12 | 1123 | 911 | 225 | 201 | 4 | 4 | 2 | 1 |
| 30 | 83 | 1544 | 1884 | 13 | 15 | 162 | 170 | 40 | 29 | 14 | 13 | 3 | 1 |
| 31 | 102 | 340 | 348 | 3 | 1 | 420 | 546 | 194 | 265 | 2 | 2 | 3 | 2 |
| 32 | 7 | 14621 | 22217 | 2 | 0 | 4530 | 5814 | 1081 | 997 | 4 | 9 | 3 | 2 |
| 33 | 64 | 929 | 961 | 1 | 0 | 1939 | 2030 | 1631 | 2099 | 7 | 18 | 2 | 2 |
| 34 | 163 | 655 | 85 | 5 | 2 | 209 | 93 | 93 | 26 | 26 | 6 | 2 | 0 |
| 35 | 183 | 273 | 599 | 3 | 0 | 80 | 142 | 30 | 61 | 35 | 35 | 2 | 1 |
| 36 | 198 | 91 | 105 | 14 | 12 | 26 | 24 | 5 | 5 | 76 | 24 | 2 | 1 |
| 37 | 346 | 560 | 1422 | 2 | 0 | 108 | 256 | 57 | 161 | 31 | 36 | 2 | 1 |
| 38 | 971 | 714 | 1790 | 1 | 0 | 132 | 277 | 81 | 179 | 23 | 32 | 2 | 1 |
| 39 | 65 | 527 | 1202 | 5 | 1 | 29 | 18 | 12 | 8 | 32 | 20 | 2 | 1 |
| 40 | 14 | 2268 | 2423 | 10 | 5 | 301 | 147 | 115 | 58 | 17 | 19 | 3 | 1 |
| 41 | 73 | 175 | 174 | 16 | 14 | 808 | 612 | 159 | 84 | 3 | 3 | 2 | 1 |
| 42 | 73 | 350 | 449 | 7 | 4 | 143 | 80 | 45 | 23 | 7 | 4 | 3 | 1 |
| 43 | 93 | 48 | 45 | 12 | 8 | 134 | 118 | 30 | 29 | 32 | 24 | 2 | 1 |
| 44 | 41 | 198 | 264 | 35 | 17 | 177 | 97 | 35 | 35 | 27 | 11 | 2 | 1 |
| 45 | 19 | 1780 | 1464 | 18 | 10 | 822 | 577 | 212 | 133 | 15 | 13 | 2 | 1 |
| 46 | 5 | 84027 | 45365 | 3 | 1 | 831 | 893 | 644 | 703 | 2 | 2 | 3 | 3 |
| 47 | 3 | 8113 | 4001 | 11 | 8 | 3761 | 2896 | 1486 | 1391 | 39 | 40 | 3 | 1 |

Table 2.D.4. The winner regret $\alpha_i$ estimates across bidder's segments

| Bidders segment | Segment size | Estimate | STE | t-stat | p-value |
|---|---|---|---|---|---|
| Segment 1 | 215 | -1.36*** | 0.06 | -22.75 | <0.0001 |
| Segment 2 | 23 | -1.19*** | 0.19 | -6.32 | <0.0001 |
| Segment 3 | 89 | -1.31*** | 0.09 | -14.02 | <0.0001 |
| Segment 4 | 963 | -1.35*** | 0.03 | -49.76 | <0.0001 |
| Segment 5 | 153 | -1.33*** | 0.07 | -18.72 | <0.0001 |
| Segment 6 | 466 | -1.26*** | 0.04 | -33.82 | <0.0001 |
| Segment 7 | 90 | -1.19*** | 0.09 | -12.56 | <0.0001 |
| Segment 8 | 992 | -1.27*** | 0.03 | -45.81 | <0.0001 |
| Segment 9 | 535 | -1.33*** | 0.04 | -35.85 | <0.0001 |
| Segment 10 | 284 | -1.38*** | 0.05 | -25.37 | <0.0001 |
| Segment 11 | 42 | -1.67*** | 0.13 | -13.34 | <0.0001 |
| Segment 12 | 83 | -1.35*** | 0.09 | -14.82 | <0.0001 |
| Segment 13 | 12 | -1.13*** | 0.23 | -4.81 | <0.001 |
| Segment 14 | 52 | -1.16*** | 0.11 | -10.74 | <0.0001 |
| Segment 15 | 113 | -1.21*** | 0.08 | -15.86 | <0.0001 |
| Segment 16 | 466 | -1.35*** | 0.04 | -32.58 | <0.0001 |
| Segment 17 | 522 | -1.35*** | 0.04 | -38.33 | <0.0001 |
| Segment 18 | 589 | -1.32*** | 0.04 | -37.02 | <0.0001 |
| Segment 19 | 310 | -1.38*** | 0.05 | -27.72 | <0.0001 |
| Segment 20 | 395 | -1.28*** | 0.04 | -30.05 | <0.0001 |
| Segment 21 | 403 | -1.22*** | 0.04 | -29.13 | <0.0001 |
| Segment 22 | 530 | -1.30*** | 0.04 | -32.94 | <0.0001 |
| Segment 23 | 49 | -1.42*** | 0.12 | -12.18 | <0.0001 |
| Segment 24 | 481 | -1.36*** | 0.04 | -34.85 | <0.0001 |
| Segment 25 | 142 | -1.32*** | 0.07 | -17.62 | <0.0001 |
| Segment 26 | 871 | -1.28*** | 0.03 | -45.48 | <0.0001 |
| Segment 27 | 242 | -1.25*** | 0.06 | -22.10 | <0.0001 |
| Segment 28 | 569 | -1.30*** | 0.04 | -35.25 | <0.0001 |
| Segment 29 | 62 | -1.48*** | 0.11 | -12.93 | <0.0001 |
| Segment 30 | 83 | -1.46*** | 0.09 | -16.57 | <0.0001 |
| Segment 31 | 102 | -1.36*** | 0.09 | -14.97 | <0.0001 |
| Segment 32 | 7 | -0.87* | 0.38 | -2.30 | 0.027437 |
| Segment 33 | 64 | -1.43*** | 0.13 | -11.08 | <0.0001 |
| Segment 34 | 163 | -1.30*** | 0.07 | -19.08 | <0.0001 |
| Segment 35 | 183 | -1.28*** | 0.06 | -21.78 | <0.0001 |
| Segment 36 | 198 | -1.27*** | 0.06 | -20.19 | <0.0001 |
| Segment 37 | 346 | -1.31*** | 0.05 | -28.41 | <0.0001 |
| Segment 38 | 971 | -1.32*** | 0.03 | -47.30 | <0.0001 |
| Segment 39 | 65 | -1.34*** | 0.10 | -13.31 | <0.0001 |
| Segment 40 | 14 | -1.34*** | 0.19 | -7.05 | <0.0001 |
| Segment 41 | 73 | -1.33*** | 0.10 | -13.87 | <0.0001 |
| Segment 42 | 73 | -1.29*** | 0.08 | -15.56 | <0.0001 |
| Segment 43 | 93 | -1.28*** | 0.09 | -13.94 | <0.0001 |
| Segment 44 | 41 | -1.41*** | 0.13 | -10.50 | <0.0001 |
| Segment 45 | 19 | -1.33*** | 0.15 | -8.87 | <0.0001 |
| Segment 46 | 5 | -0.52* | 0.23 | -2.26 | 0.036761 |
| Segment 47 | 3 | -0.52 | 0.41 | -1.28 | 0.145362 |

* p<0.1, ** p<0.05, *** p<0.001

Table 2.D.5. The loser regret $\beta_i$ estimates across bidder's segments

| Bidders segment | Segment size | Estimate | STE | t-stat | p-value |
|---|---|---|---|---|---|
| Segment 1 | 215 | -1.34*** | 0.06 | -22.14 | <0.0001 |
| Segment 2 | 23 | -1.11*** | 0.19 | -5.69 | <0.0001 |
| Segment 3 | 89 | -1.39*** | 0.09 | -15.53 | <0.0001 |
| Segment 4 | 963 | -1.33*** | 0.03 | -47.22 | <0.0001 |
| Segment 5 | 153 | -1.37*** | 0.07 | -20.33 | <0.0001 |
| Segment 6 | 466 | -1.33*** | 0.04 | -32.76 | <0.0001 |
| Segment 7 | 90 | -1.15*** | 0.09 | -12.22 | <0.0001 |
| Segment 8 | 992 | -1.35*** | 0.03 | -48.69 | <0.0001 |
| Segment 9 | 535 | -1.31*** | 0.04 | -35.55 | <0.0001 |
| Segment 10 | 284 | -1.48*** | 0.05 | -29.18 | <0.0001 |
| Segment 11 | 42 | -1.55*** | 0.13 | -12.03 | <0.0001 |
| Segment 12 | 83 | -1.45*** | 0.09 | -15.86 | <0.0001 |
| Segment 13 | 12 | -1.46*** | 0.31 | -4.73 | <0.001 |
| Segment 14 | 52 | -1.29*** | 0.11 | -11.27 | <0.0001 |
| Segment 15 | 113 | -1.51*** | 0.08 | -18.92 | <0.0001 |
| Segment 16 | 466 | -1.26*** | 0.04 | -34.61 | <0.0001 |
| Segment 17 | 522 | -1.33*** | 0.04 | -36.52 | <0.0001 |
| Segment 18 | 589 | -1.28*** | 0.04 | -34.91 | <0.0001 |
| Segment 19 | 310 | -1.32*** | 0.05 | -26.63 | <0.0001 |
| Segment 20 | 395 | -1.36*** | 0.05 | -29.05 | <0.0001 |
| Segment 21 | 403 | -1.34*** | 0.05 | -29.24 | <0.0001 |
| Segment 22 | 530 | -1.24*** | 0.04 | -33.68 | <0.0001 |
| Segment 23 | 49 | -1.47*** | 0.14 | -10.88 | <0.0001 |
| Segment 24 | 481 | -1.36*** | 0.04 | -34.65 | <0.0001 |
| Segment 25 | 142 | -1.34*** | 0.07 | -19.36 | <0.0001 |
| Segment 26 | 871 | -1.36*** | 0.03 | -45.34 | <0.0001 |
| Segment 27 | 242 | -1.41*** | 0.05 | -26.47 | <0.0001 |
| Segment 28 | 569 | -1.29*** | 0.04 | -36.43 | <0.0001 |
| Segment 29 | 62 | -1.30*** | 0.11 | -11.96 | <0.0001 |
| Segment 30 | 83 | -1.31*** | 0.11 | -11.71 | <0.0001 |
| Segment 31 | 102 | -1.33*** | 0.08 | -16.93 | <0.0001 |
| Segment 32 | 7 | -1.70*** | 0.27 | -6.20 | <0.001 |
| Segment 33 | 64 | -1.51*** | 0.10 | -14.77 | <0.0001 |
| Segment 34 | 163 | -1.38*** | 0.07 | -18.55 | <0.0001 |
| Segment 35 | 183 | -1.21*** | 0.06 | -19.29 | <0.0001 |
| Segment 36 | 198 | -1.36*** | 0.06 | -22.33 | <0.0001 |
| Segment 37 | 346 | -1.33*** | 0.04 | -30.24 | <0.0001 |
| Segment 38 | 971 | -1.35*** | 0.03 | -47.39 | <0.0001 |
| Segment 39 | 65 | -1.32*** | 0.11 | -12.53 | <0.0001 |
| Segment 40 | 14 | -1.27*** | 0.26 | -4.92 | <0.001 |
| Segment 41 | 73 | -1.30*** | 0.10 | -13.57 | <0.0001 |
| Segment 42 | 73 | -1.40*** | 0.10 | -14.16 | <0.0001 |
| Segment 43 | 93 | -1.32*** | 0.10 | -13.39 | <0.0001 |
| Segment 44 | 41 | -1.51*** | 0.11 | -13.92 | <0.0001 |
| Segment 45 | 19 | -1.30*** | 0.18 | -7.41 | <0.0001 |
| Segment 46 | 5 | -0.79 | 0.49 | -1.63 | 0.082383 |
| Segment 47 | 3 | -1.23** | 0.32 | -3.85 | <0.05 |

* p<0.1, ** p<0.05, *** p<0.001

Table 2.D.6. The update of valuation parameter $\delta_i$ and learning parameter $\rho_i$ estimates across bidder's segments

| Bidders segment | Segment size | Valuation revelation $\delta_i$ | STE ($\delta_i$) | Learning $\rho_i$ | STE ($\rho_i$) |
|---|---|---|---|---|---|
| Segment 1 | 215 | 1.21*** | 0.05 | 0.15** | 0.06 |
| Segment 2 | 23 | 0.79*** | 0.09 | 0.21 | 0.15 |
| Segment 3 | 89 | 1.23*** | 0.08 | 0.25*** | 0.10 |
| Segment 4 | 963 | 1.22*** | 0.03 | 0.26*** | 0.03 |
| Segment 5 | 153 | 1.30*** | 0.07 | 0.16** | 0.07 |
| Segment 6 | 466 | 1.20*** | 0.04 | 0.25*** | 0.04 |
| Segment 7 | 90 | 1.25*** | 0.08 | 0.28*** | 0.09 |
| Segment 8 | 992 | 1.25*** | 0.02 | 0.27*** | 0.03 |
| Segment 9 | 535 | 1.25*** | 0.03 | 0.18*** | 0.04 |
| Segment 10 | 284 | 1.22*** | 0.05 | 0.28*** | 0.05 |
| Segment 11 | 42 | 1.33*** | 0.11 | 0.27** | 0.13 |
| Segment 12 | 83 | 1.33*** | 0.08 | 0.07 | 0.09 |
| Segment 13 | 12 | 1.37*** | 0.23 | 0.48* | 0.26 |
| Segment 14 | 52 | 1.15*** | 0.10 | 0.39*** | 0.12 |
| Segment 15 | 113 | 1.22*** | 0.07 | 0.32*** | 0.08 |
| Segment 16 | 466 | 1.22*** | 0.04 | 0.24*** | 0.04 |
| Segment 17 | 522 | 1.20*** | 0.03 | 0.28*** | 0.04 |
| Segment 18 | 589 | 1.26*** | 0.03 | 0.27*** | 0.04 |
| Segment 19 | 310 | 1.22*** | 0.04 | 0.23*** | 0.05 |
| Segment 20 | 395 | 1.20*** | 0.04 | 0.24*** | 0.04 |
| Segment 21 | 403 | 1.24*** | 0.04 | 0.26*** | 0.04 |
| Segment 22 | 530 | 1.20*** | 0.03 | 0.16*** | 0.04 |
| Segment 23 | 49 | 1.26*** | 0.12 | 0.21* | 0.12 |
| Segment 24 | 481 | 1.21*** | 0.03 | 0.23*** | 0.04 |
| Segment 25 | 142 | 1.29*** | 0.06 | 0.12* | 0.06 |
| Segment 26 | 871 | 1.27*** | 0.03 | 0.24*** | 0.03 |
| Segment 27 | 242 | 1.18*** | 0.04 | 0.40*** | 0.06 |
| Segment 28 | 569 | 1.23*** | 0.03 | 0.29*** | 0.04 |
| Segment 29 | 62 | 1.33*** | 0.09 | 0.05 | 0.14 |
| Segment 30 | 83 | 1.40*** | 0.09 | 0.29*** | 0.10 |
| Segment 31 | 102 | 1.13*** | 0.08 | 0.39*** | 0.09 |
| Segment 32 | 7 | 1.04*** | 0.09 | 0.19 | 0.20 |
| Segment 33 | 64 | 1.25*** | 0.10 | 0.25** | 0.10 |
| Segment 34 | 163 | 1.21*** | 0.06 | 0.33*** | 0.07 |
| Segment 35 | 183 | 1.27*** | 0.06 | 0.26*** | 0.07 |
| Segment 36 | 198 | 1.24*** | 0.06 | 0.14** | 0.06 |
| Segment 37 | 346 | 1.22*** | 0.04 | 0.25*** | 0.05 |
| Segment 38 | 971 | 1.19*** | 0.02 | 0.25*** | 0.03 |
| Segment 39 | 65 | 1.13*** | 0.09 | 0.28*** | 0.09 |
| Segment 40 | 14 | 1.28*** | 0.24 | 0.50* | 0.25 |
| Segment 41 | 73 | 1.25*** | 0.08 | 0.30*** | 0.10 |
| Segment 42 | 73 | 1.17*** | 0.09 | 0.39*** | 0.10 |
| Segment 43 | 93 | 1.20*** | 0.07 | 0.25*** | 0.09 |
| Segment 44 | 41 | 1.10*** | 0.12 | 0.33** | 0.14 |
| Segment 45 | 19 | 1.42*** | 0.17 | 0.28* | 0.15 |
| Segment 46 | 5 | 1.07*** | 0.24 | 0.81*** | 0.18 |
| Segment 47 | 3 | 1.17** | 0.27 | -0.08 | 0.26 |

* p<0.1, ** p<0.05, *** p<0.001

Table 2.D.7. The winner regret $\alpha_i$ and the loser regret $\beta_i$ estimates across auction categories

| Auction category | Number of bidders | Winner regret | STE (winner regret) | t-stat (winner regret) | p-value (winner regret) | Loser regret | STE (loser regret) | t-stat (loser regret) | p-value (loser regret) |
|---|---|---|---|---|---|---|---|---|---|
| Jewelry and Watches | 1550 | -1.32*** | 0.02 | -60.59 | <0.0001 | -1.32*** | 0.02 | -60.33 | <0.0001 |
| Collectibles | 859 | -1.35*** | 0.03 | -46.96 | <0.0001 | -1.38*** | 0.03 | -47.07 | <0.0001 |
| Clothing, Shoes and Accessories | 453 | -1.24*** | 0.04 | -29.94 | <0.0001 | -1.40*** | 0.04 | -35.06 | <0.0001 |
| Crafts | 558 | -1.33*** | 0.04 | -37.27 | <0.0001 | -1.33*** | 0.04 | -34.27 | <0.0001 |
| Pottery and Glass | 607 | -1.38*** | 0.04 | -38.98 | <0.0001 | -1.35*** | 0.04 | -38.37 | <0.0001 |
| Antiques | 546 | -1.28*** | 0.04 | -33.61 | <0.0001 | -1.30*** | 0.04 | -33.74 | <0.0001 |
| Toys and Hobbies | 744 | -1.27*** | 0.03 | -40.54 | <0.0001 | -1.35*** | 0.03 | -41.96 | <0.0001 |
| Stamps | 651 | -1.38*** | 0.03 | -41.66 | <0.0001 | -1.32*** | 0.03 | -41.22 | <0.0001 |
| Books | 482 | -1.37*** | 0.04 | -35.47 | <0.0001 | -1.31*** | 0.04 | -33.14 | <0.0001 |
| Tickets and Experiences | 489 | -1.26*** | 0.04 | -32.47 | <0.0001 | -1.28*** | 0.04 | -34.57 | <0.0001 |
| Art | 456 | -1.25*** | 0.04 | -31.10 | <0.0001 | -1.28*** | 0.04 | -31.00 | <0.0001 |
| Gift Cards and Coupons | 522 | -1.32*** | 0.04 | -32.59 | <0.0001 | -1.30*** | 0.04 | -31.89 | <0.0001 |
| Music | 585 | -1.32*** | 0.04 | -36.80 | <0.0001 | -1.34*** | 0.04 | -35.76 | <0.0001 |
| Consumer Electronics | 734 | -1.30*** | 0.03 | -42.21 | <0.0001 | -1.36*** | 0.03 | -42.92 | <0.0001 |
| DVDs and Movies | 602 | -1.28*** | 0.04 | -35.48 | <0.0001 | -1.37*** | 0.03 | -39.65 | <0.0001 |
| Dolls and Bears | 679 | -1.29*** | 0.03 | -40.39 | <0.0001 | -1.37*** | 0.03 | -41.79 | <0.0001 |
| Entertainment Memorabilia | 506 | -1.37*** | 0.04 | -36.04 | <0.0001 | -1.31*** | 0.04 | -34.57 | <0.0001 |
| Health and Beauty | 541 | -1.31*** | 0.04 | -35.79 | <0.0001 | -1.36*** | 0.04 | -35.91 | <0.0001 |
| Video Games and Consoles | 682 | -1.29*** | 0.03 | -38.58 | <0.0001 | -1.32*** | 0.03 | -39.19 | <0.0001 |

* p<0.1, ** p<0.05, *** p<0.01

Table 2.D.8.
The update of valuation parameters δ_i and learning parameter ρ_i estimates across bidder's segments
Bidders segment  number of bidders  Learning value from bid parameter (ρ_i)  STE (ρ_i)  Valuation revelation parameter (δ_i)  STE (δ_i)
Jewelry and Watches  1550  0.25***  0.02  1.22***  0.02
Collectibles  859  0.26***  0.03  1.25***  0.03
Clothing, Shoes and Accessories  453  0.20***  0.04  1.23***  0.04
Crafts  558  0.25***  0.04  1.21***  0.03
Pottery and Glass  607  0.26***  0.03  1.26***  0.03
Antiques  546  0.24***  0.04  1.20***  0.03
Toys and Hobbies  744  0.25***  0.03  1.24***  0.03
Stamps  651  0.25***  0.03  1.24***  0.03
Books  482  0.22***  0.04  1.25***  0.03
Tickets and Experiences  489  0.22***  0.04  1.22***  0.03
Art  456  0.18***  0.04  1.19***  0.04
Gift Cards and Coupons  522  0.29***  0.04  1.26***  0.03
Music  585  0.23***  0.03  1.19***  0.03
Consumer Electronics  734  0.25***  0.03  1.23***  0.03
DVDs and Movies  602  0.29***  0.04  1.23***  0.03
Dolls and Bears  679  0.32***  0.03  1.28***  0.03
Entertainment Memorabilia  506  0.24***  0.04  1.17***  0.03
Health and Beauty  541  0.24***  0.04  1.24***  0.03
Video Games and Consoles  682  0.24***  0.03  1.19***  0.03
* p<0.1, ** p<0.05, *** p<0.01

Table 2.D.9. The growth of bids and their drift parameters, τ_i and γ_i, and the rush of bidders at the end of auction rate and average entrance rate in each period, η_j and ι_j, estimates across auction segments
Auction Cluster  Auction Cluster Size  growth of bids (τ_i)  STE (τ_i)  Drift of bids (γ_i)  STE (γ_i)  Last minute flood (η_j)  STE (η_j)  Mean entrance rate (ι_j)  STE (ι_j)
Jewelry and Watches  150  1.65***  0.07  5.44***  0.08  1.01***  0.06  1.84***  0.08
Collectibles  104  1.63***  0.09  5.63***  0.10  0.93***  0.06  1.90***  0.09
Clothing, Shoes and Accessories  85  1.86***  0.09  5.56***  0.10  0.93***  0.07  2.11***  0.08
Crafts  79  1.81***  0.11  5.48***  0.10  0.97***  0.08  2.03***  0.11
Pottery and Glass  75  1.50***  0.10  5.47***  0.10  1.10***  0.10  2.16***  0.11
Antiques  69  1.68***  0.12  5.60***  0.11  0.95***  0.11  1.89***  0.13
Toys and Hobbies  94  1.73***  0.10  5.61***  0.09  0.90***  0.09  2.21***  0.10
Stamps  73  2.00***  0.12  5.64***  0.12  1.04***  0.12  2.08***  0.13
Books  85  1.83***  0.12  5.50***  0.11  1.13***  0.12  2.12***  0.12
Tickets and Experiences  92  1.74***  0.13  5.58***  0.12  0.95***  0.12  2.10***  0.13
Art  71  1.79***  0.16  5.52***  0.12  1.05***  0.16  2.04***  0.17
Gift Cards and Coupons  86  1.77***  0.15  5.36***  0.12  1.06***  0.15  2.14***  0.15
Music  87  1.86***  0.16  5.65***  0.12  1.15***  0.16  2.03***  0.16
Consumer Electronics  84  1.76***  0.17  5.37***  0.15  1.13***  0.17  2.13***  0.17
DVDs and Movies  88  1.82***  0.17  5.75***  0.14  1.11***  0.17  2.26***  0.18
Dolls and Bears  85  1.66***  0.19  5.81***  0.16  1.26***  0.19  2.08***  0.19
Entertainment Memorabilia  89  1.86***  0.19  5.63***  0.15  1.12***  0.19  2.06***  0.20
Health and Beauty  75  1.84***  0.24  5.77***  0.20  1.15***  0.24  2.26***  0.24
Video Games and Consoles  94  1.86***  0.20  5.65***  0.17  1.04***  0.20  2.21***  0.20
* p<0.1, ** p<0.05, *** p<0.01

Figure 2.D.1. The probabilistic graphical plate model of the main model

2.D.2.1. ESTIMATION RESULTS OF THE MODEL WITH LDA-ESTIMATED AUCTION CLUSTERS

Table 2.D.10. Summary statistics for the bidder specific parameter estimations within each auction category (19) and within each bidder segment (47)
Parameter  min (category)  max (category)  Mean (category)  SD (category)  min (segment)  max (segment)  Mean (segment)  SD (segment)
avg. winner regret  -1.37  -1.27  -1.33  0.03  -1.9  -1.09  -1.35  0.13
se winner regret  0.02  0.05  0.04  0.01  0.03  0.49  0.11  0.10
avg. loser regret  -1.38  -1.25  -1.32  0.03  -1.98  -1.08  -1.35  0.17
se loser regret  0.03  0.05  0.04  0.005  0.03  0.5  0.11  0.10
avg. valuation param.  1.2  1.3  1.26  0.03  0.85  1.89  1.27  0.14
se valuation param.  0.02  0.04  0.04  0.005  0.03  0.54  0.10  0.10
avg. learning param.  0.16  0.33  0.25  0.04  0  1.33  0.27  0.19
se learning param.  0.02  0.04  0.03  0.01  0.03  0.78  0.12  0.14

Table 2.D.11. Relation between the winner regret α_i, the loser regret β_i, the update of valuation parameters δ_i and learning parameter ρ_i estimates across forty seven bidder segments
(rows and columns in the same order; lower triangle shown)
                      winner regret  loser regret  valuation revelation  learning
winner regret         1
loser regret          0.25  1
valuation revelation  -0.14  0.45  1
learning              -0.38  0.03  0.61  1

Table 2.D.12. Relation between the winner regret α_i, the loser regret β_i, the update of valuation parameters δ_i and learning parameter ρ_i estimates across forty seven bidder segments
Regressand  Regressor  Estimate  SE  Lower 95%  Upper 95%
Winner regret  Intercept  -1.10**  0.15  -1.40  -0.80
Winner regret  loser regret  0.18*  0.11  -0.03  -0.41
Winner Regret  Intercept  -1.28**  0.03  -1.35  -1.22
Winner Regret  Learning  -0.25**  0.09  -0.45  -0.07
Loser Regret  Intercept  -2.03**  0.20  -2.44  1.62
Loser Regret  Valuation revelation  0.53**  0.16  0.21  0.86
** Two tail 0.95% confidence interval significance
* One tail 0.95% confidence interval significance

Table 2.D.13. Explaining winner regret α_i, the loser regret β_i, the update of valuation parameters δ_i and the learning parameter ρ_i estimates across 47 bidder segments
Regressand  Regressor  Estimate  SE  Lower 95%  Upper 95%
Winner Regret (Adjusted R² = 0.26)
Intercept  -1.3535*  0.0163  -1.3864  -1.3206
Segment Size  0.0000  0.0001  -0.0002  0.0001
Bidders Feedback mean  -0.0001  0.0001  -0.0004  0.0002
Number of Bids on This item  -0.0005  0.0026  -0.0057  0.0046
total number of bids in 30 days  -0.0075*  0.0035  -0.0145  -0.0004
Number of items bid on in 30 days  0.0000  0.0001  -0.0002  0.0002
Bid activity with current Seller  -0.0016  0.0011  -0.0038  0.0006
Number of categories Bid on Mean  -0.0396  0.0439  -0.1284  0.0491
Loser Regret (Adjusted R² = 0.26)
Intercept  -1.3509*  0.0221  -1.3956  -1.3061
Segment Size  0.0000  0.0001  -0.0002  0.0002
Bidders Feedback mean  0.0000  0.0002  -0.0004  0.0004
Number of Bids on This item  0.0037  0.0035  -0.0033  0.0107
total number of bids in 30 days  -0.0122*  0.0047  -0.0218  -0.0026
Number of items bid on in 30 days  0.0002  0.0001  -0.0001  0.0004
Bid activity with current Seller  -0.0018  0.0015  -0.0048  0.0011
Number of categories Bid on Mean  -0.1336*  0.0597  -0.2544  -0.0129
Learning value from bids (Adjusted R² = 0.30)
Intercept  1.2688*  0.0160  1.2364  1.3013
Segment Size  0.0000  0.0001  -0.0002  0.0001
Bidders Feedback mean  0.0007*  0.0001  0.0004  0.0010
Number of Bids on This item  0.0038  0.0025  -0.0013  0.0089
total number of bids in 30 days  -0.0108*  0.0034  -0.0178  -0.0039
Number of items bid on in 30 days  0.0002  0.0001  -0.0000  0.0003
Bid activity with current Seller  -0.0002  0.0011  -0.0024  0.0019
Number of categories Bid on Mean  -0.0105  0.0433  -0.0980  0.0770
Valuation update (Adjusted R² = 0.76)
Intercept  0.2707*  0.0166  0.0000  0.2372
Segment Size  0.0001  0.0001  0.2501  -0.0001
Bidders Feedback mean  0.0013*  0.0001  0.0000  0.0010
Number of Bids on This item  0.0032  0.0026  0.2314  -0.0021
total number of bids in 30 days  0.0021  0.0036  0.5510  -0.0051
Number of items bid on in 30 days  0.0000  0.0001  0.8363  -0.0002
Bid activity with current Seller  0.0006  0.0011  0.5586  -0.0016
Number of categories Bid on Mean  -0.0227  0.0447  0.6151  -0.1132
* Two tail 0.95% confidence interval significance

Table 2.D.14. Summary statistics for the auction specific parameter estimations within each auction category (19) and within each auction cluster (50)
Parameter  min (category)  max (category)  Mean (category)  SD (category)  min (cluster)  max (cluster)  Mean (cluster)  SD (cluster)
avg. growth of bids  1.61  2.03  1.79  0.12  1.67  9.87  3.20  1.53
se growth of bids  0.07  0.24  0.15  0.05  0.04  7.33  1.46  1.38
avg. drift of bids  5.24  5.82  5.58  0.15  5.15  12.72  6.79  1.42
se drift of bids  0.08  0.2  0.13  0.03  0.05  6.81  1.28  1.28
avg. last minute flood  0.93  1.33  1.10  0.10  0.92  9.33  2.52  1.56
se last minute flood  0.06  0.24  0.14  0.05  0.04  7.43  1.49  1.40
avg. mean entrance rate  1.92  2.31  2.08  0.13  1.39  10.42  3.43  1.58
se mean entrance rate  0.08  0.25  0.15  0.04  0.05  7.23  1.45  1.37

Table 2.D.15. Counterfactual analysis of shutting down only winner and both winner/loser regret
Auction Category  Number of Auctions  Average improvement of shutting down winner regret  Average improvement of shutting down both winner and loser
Jewelry and Watches  149  32%  29%
Collectibles  103  26%  23%
Clothing, Shoes and Accessories  84  21%  39%
Crafts  78  50%  42%
Pottery and Glass  74  18%  28%
Antiques  68  45%  49%
Toys and Hobbies  93  31%  30%
Stamps  72  52%  28%
Books  84  50%  42%
Tickets and Experiences  91  22%  6%
Art  70  31%  28%
Gift Cards and Coupons  85  40%  21%
Music  86  32%  39%
Consumer Electronics  83  34%  35%
DVDs and Movies  87  47%  47%
Dolls and Bears  84  29%  29%
Entertainment Memorabilia  88  25%  6%
Health and Beauty  74  38%  19%
Video Games and Consoles  93  21%  21%
Total improvement  26%  23%
Average improvement across all auctions  34%  29%

Table 2.D.16.
Auction's Cluster profile
Auction Cluster index  Cluster size  number of bidders mean  STD (number of bidders)  number of bids  STD (number of bids)  mean duration (Days)  STD (duration in Days)  Dominant Auction Categories
1  14  10  4  44  11  4  2  Jewelry, collectible
2  12  9  5  44  15  5  2  Clothing, antique
3  46  9  5  42  14  5  2  Jewelry, collectible, pottery
4  13  10  4  45  16  5  2  Stamps and books
5  14  10  5  44  18  5  2  Jewelry
6  9  6  4  25  19  5  1  Jewelry
7  454  9  4  63  19  5  2  Toys and hobbies
8  19  8  3  33  9  5  2  Collectible
9  10  9  5  41  19  5  1  Stamps
10  27  8  3  44  18  5  2  DVD and entertainment
11  5  1  0  1  0  1  0  Pottery
12  214  13  5  44  15  4  2  Video Games and Consoles
13  32  10  5  47  16  5  2  Pottery
14  14  8  3  48  20  5  2  Dolls and Bears
15  39  8  3  44  17  5  2  Health and beauty
16  11  6  4  50  11  6  1  Jewelry
17  48  12  6  50  19  5  2  Music and DVDs
18  15  8  4  38  16  5  2  Video Games and Consoles
19  37  9  4  48  15  5  1  Clothing
20  22  8  5  47  21  5  2  Stamps, entertainment, jewelry
21  29  10  4  50  18  5  1  Entertainment and music
22  24  8  4  40  19  5  2  Clothing and jewelry
23  29  10  4  55  18  4  2  DVD, consumer electronics
24  12  7  3  36  9  5  2  Toy, music, gift
25  19  10  3  47  13  5  1  Clothing and gift
26  24  10  4  44  15  5  2  Crafts
27  38  10  5  51  20  5  2  Art
28  19  9  4  41  13  5  1  Dolls and Bears
29  30  9  4  43  18  5  2  Pottery
30  23  8  4  38  14  5  2  Craft and book
31  16  7  3  40  14  5  1  Antique
32  8  8  4  43  17  5  2  Craft, toys, book
33  13  7  4  32  13  5  2  Health and beauty
34  38  9  4  43  14  5  1  Art
35  20  7  3  34  12  5  2  Clothing and art
36  23  8  3  42  10  5  2  Stamps
37  8  10  5  36  13  4  2  Antique, collectible
38  30  8  3  44  20  5  1  Gift Cards and Coupons
39  10  10  3  38  12  4  2  Video game, dolls
40  7  11  6  44  12  5  2  Video game, gift card, pottery
41  17  7  2  50  16  5  2  Clothing
42  16  10  4  45  15  5  2  Pottery, dolls
43  16  8  3  43  16  5  2  Stamps, video games
44  20  9  5  42  19  4  2  Stamps, toys
45  17  8  4  48  18  5  2  Books, entertainment
46  10  11  6  43  12  5  2  Gift cards
47  17  9  5  44  12  5  2  Books
48  42  9  5  42  19  5  2  Consumer electronics
49  11  11  4  45  16  4  2  Ticket experience
50  5  8  1  48  7  6  0  Books, music, DVD

Table 2.D.17. Maximum A Posteriori of the model
Element of the maximum a posteriori model selection criteria  Log Likelihood
Number of bidders evolution state space model  -2,055,997
Bid evolution with each auction state space model  -96,099,682
Valuation evolution state space model  -991,098
Prior on the auctions parameters  -17,261
Prior on the bidders parameters  -112,190

Table 2.D.18. The winner regret α_i estimates across bidders' segments
Bidders Segment  Segment Size  Estimate  STE  t-stat  p-value
Segment 1  215  -1.30***  0.07  -19.28  <0.0001
Segment 2  23  -1.68***  0.18  -9.19  <0.0001
Segment 3  89  -1.28***  0.10  -13.02  <0.0001
Segment 4  963  -1.29***  0.03  -42.10  <0.0001
Segment 5  153  -1.36***  0.09  -15.53  <0.0001
Segment 6  466  -1.31***  0.05  -26.69  <0.0001
Segment 7  90  -1.21***  0.11  -11.38  <0.0001
Segment 8  992  -1.36***  0.03  -45.07  <0.0001
Segment 9  535  -1.24***  0.04  -30.68  <0.0001
Segment 10  284  -1.23***  0.06  -21.24  <0.0001
Segment 11  42  -1.11***  0.14  -7.87  <0.0001
Segment 12  83  -1.33***  0.10  -13.70  <0.0001
Segment 13  12  -1.55***  0.31  -4.99  <0.001
Segment 14  52  -1.41***  0.14  -10.04  <0.0001
Segment 15  113  -1.53***  0.09  -16.17  <0.0001
Segment 16  466  -1.43***  0.05  -31.14  <0.0001
Segment 17  522  -1.21***  0.05  -25.80  <0.0001
Segment 18  589  -1.28***  0.04  -32.03  <0.0001
Segment 19  310  -1.32***  0.06  -23.60  <0.0001
Segment 20  395  -1.32***  0.05  -26.16  <0.0001
Segment 21  403  -1.34***  0.05  -27.61  <0.0001
Segment 22  530  -1.36***  0.04  -33.75  <0.0001
Segment 23  49  -1.32***  0.11  -11.76  <0.0001
Segment 24  481  -1.33***  0.04  -30.30  <0.0001
Segment 25  142  -1.35***  0.09  -15.08  <0.0001
Segment 26  871  -1.35***  0.03  -40.01  <0.0001
Segment 27  242  -1.32***  0.06  -20.62  <0.0001
Segment 28  569  -1.36***  0.04  -33.70  <0.0001
Segment 29  62  -1.33***  0.14  -9.79  <0.0001
Segment 30  83  -1.54***  0.10  -15.96  <0.0001
Segment 31  102  -1.33***  0.10  -12.96  <0.0001
Segment 32  7  -1.37**  0.40  -3.41  <0.05
Segment 33  64  -1.29***  0.11  -12.25  <0.0001
Segment 34  163  -1.36***  0.07  -18.81  <0.0001
Segment 35  183  -1.37***  0.08  -17.58  <0.0001
Segment 36  198  -1.34***  0.06  -21.06  <0.0001
Segment 37  346  -1.33***  0.05  -26.05  <0.0001
Segment 38  971  -1.33***  0.03  -42.88  <0.0001
Segment 39  65  -1.09***  0.12  -8.97  <0.0001
Segment 40  14  -1.30***  0.28  -4.64  <0.001
Segment 41  73  -1.41***  0.12  -11.29  <0.0001
Segment 42  73  -1.44***  0.11  -12.63  <0.0001
Segment 43  93  -1.32***  0.10  -13.58  <0.0001
Segment 44  41  -1.36***  0.13  -10.59  <0.0001
Segment 45  19  -1.24***  0.18  -6.97  <0.0001
Segment 46  5  -1.45**  0.40  -3.60  <0.05
Segment 47  3  -1.90**  0.49  -3.88  <0.05
* p<0.1, ** p<0.05, *** p<0.01

Table 2.D.19. The loser regret β_i estimates across bidders' segments
Bidders Segment  Segment Size  Estimate  STE  t-stat  p-value
Segment 1  215  -1.29***  0.07  -18.31  <0.0001
Segment 2  23  -1.15***  0.28  -4.12  <0.001
Segment 3  89  -1.21***  0.09  -13.11  <0.0001
Segment 4  963  -1.36***  0.03  -43.41  <0.0001
Segment 5  153  -1.32***  0.08  -16.87  <0.0001
Segment 6  466  -1.33***  0.05  -29.32  <0.0001
Segment 7  90  -1.27***  0.10  -12.46  <0.0001
Segment 8  992  -1.34***  0.03  -43.52  <0.0001
Segment 9  535  -1.26***  0.04  -29.25  <0.0001
Segment 10  284  -1.29***  0.06  -21.81  <0.0001
Segment 11  42  -1.14***  0.14  -8.35  <0.0001
Segment 12  83  -1.38***  0.11  -12.98  <0.0001
Segment 13  12  -1.37***  0.32  -4.30  <0.01
Segment 14  52  -1.50***  0.13  -11.52  <0.0001
Segment 15  113  -1.40***  0.10  -14.72  <0.0001
Segment 16  466  -1.29***  0.05  -28.33  <0.0001
Segment 17  522  -1.39***  0.04  -31.98  <0.0001
Segment 18  589  -1.39***  0.04  -34.08  <0.0001
Segment 19  310  -1.25***  0.06  -22.04  <0.0001
Segment 20  395  -1.38***  0.05  -30.21  <0.0001
Segment 21  403  -1.25***  0.05  -27.67  <0.0001
Segment 22  530  -1.27***  0.04  -30.60  <0.0001
Segment 23  49  -1.19***  0.15  -8.13  <0.0001
Segment 24  481  -1.34***  0.04  -33.17  <0.0001
Segment 25  142  -1.34***  0.09  -15.08  <0.0001
Segment 26  871  -1.34***  0.03  -39.38  <0.0001
Segment 27  242  -1.25***  0.06  -20.25  <0.0001
Segment 28  569  -1.29***  0.04  -30.67  <0.0001
Segment 29  62  -1.55***  0.11  -14.03  <0.0001
Segment 30  83  -1.40***  0.12  -11.81  <0.0001
Segment 31  102  -1.32***  0.09  -14.93  <0.0001
Segment 32  7  -1.98***  0.50  -3.93  <0.01
Segment 33  64  -1.29***  0.13  -10.21  <0.0001
Segment 34  163  -1.27***  0.08  -16.42  <0.0001
Segment 35  183  -1.32***  0.07  -17.79  <0.0001
Segment 36  198  -1.35***  0.07  -18.79  <0.0001
Segment 37  346  -1.30***  0.05  -23.93  <0.0001
Segment 38  971  -1.35***  0.03  -44.24  <0.0001
Segment 39  65  -1.18***  0.11  -10.54  <0.0001
Segment 40  14  -1.98***  0.24  -8.34  <0.0001
Segment 41  73  -1.29***  0.13  -10.03  <0.0001
Segment 42  73  -1.30***  0.09  -14.57  <0.0001
Segment 43  93  -1.50***  0.10  -15.09  <0.0001
Segment 44  41  -1.08***  0.14  -7.86  <0.0001
Segment 45  19  -1.42***  0.18  -7.77  <0.0001
Segment 46  5  -1.33***  0.17  -7.59  <0.001
Segment 47  3  -1.73**  0.50  -3.46  <0.05
* p<0.1, ** p<0.05, *** p<0.01

Table 2.D.20.
The update of valuation parameters δ_i and learning parameter ρ_i estimates across bidders' segments
Bidders Segment  Segment Size  Valuation revelation δ_i  STE (δ_i)  Learning ρ_i  STE (ρ_i)
Segment 1  215  1.27***  0.06  0.28***  0.07
Segment 2  23  1.53***  0.19  0.53**  0.23
Segment 3  89  1.22***  0.07  0.22**  0.11
Segment 4  963  1.22***  0.03  0.22***  0.03
Segment 5  153  1.28***  0.06  0.11  0.08
Segment 6  466  1.26***  0.04  0.26***  0.05
Segment 7  90  1.16***  0.09  0.20*  0.11
Segment 8  992  1.29***  0.03  0.25***  0.03
Segment 9  535  1.16***  0.03  0.20***  0.04
Segment 10  284  1.29***  0.05  0.26***  0.06
Segment 11  42  1.38***  0.16  0.27*  0.15
Segment 12  83  1.35***  0.08  0.15  0.10
Segment 13  12  1.21***  0.22  0.38*  0.21
Segment 14  52  1.04***  0.11  0.49***  0.14
Segment 15  113  1.23***  0.07  0.23**  0.09
Segment 16  466  1.24***  0.04  0.33***  0.05
Segment 17  522  1.29***  0.04  0.25***  0.04
Segment 18  589  1.22***  0.04  0.26***  0.04
Segment 19  310  1.26***  0.05  0.19***  0.05
Segment 20  395  1.30***  0.04  0.29***  0.05
Segment 21  403  1.23***  0.04  0.32***  0.05
Segment 22  530  1.22***  0.03  0.29***  0.04
Segment 23  49  1.31***  0.12  0.26*  0.14
Segment 24  481  1.27***  0.04  0.19***  0.04
Segment 25  142  1.22***  0.06  0.33***  0.08
Segment 26  871  1.28***  0.03  0.26***  0.03
Segment 27  242  1.23***  0.06  0.23***  0.06
Segment 28  569  1.28***  0.03  0.29***  0.04
Segment 29  62  1.07***  0.09  0.00  0.12
Segment 30  83  1.40***  0.09  0.27**  0.11
Segment 31  102  1.28***  0.07  0.22**  0.09
Segment 32  7  0.85**  0.27  0.20  0.47
Segment 33  64  1.25***  0.10  0.14  0.13
Segment 34  163  1.15***  0.07  0.22***  0.07
Segment 35  183  1.18***  0.06  0.11  0.07
Segment 36  198  1.33***  0.06  0.05  0.07
Segment 37  346  1.35***  0.04  0.27***  0.05
Segment 38  971  1.24***  0.03  0.25***  0.03
Segment 39  65  1.29***  0.09  0.25**  0.12
Segment 40  14  1.24***  0.17  0.32  0.30
Segment 41  73  1.27***  0.10  0.27**  0.12
Segment 42  73  1.42***  0.09  0.15  0.12
Segment 43  93  1.34***  0.08  0.09  0.09
Segment 44  41  1.51***  0.14  0.42**  0.17
Segment 45  19  1.13***  0.19  0.05  0.18
Segment 46  5  1.89**  0.54  1.33**  0.49
Segment 47  3  1.20*  0.40  0.55  0.78
* p<0.1, ** p<0.05, *** p<0.01

Table 2.D.21. The growth of bids and their drift parameters, τ_i and γ_i, and the rush of bidders at the end of auction rate and average entrance rate in each period, η_j and ι_j, estimates across auction segments
Auction Cluster  Auction Cluster Size  growth of bids (τ_i)  STE (τ_i)  Drift of bids (γ_i)  STE (γ_i)  Last minute flood (η_j)  STE (η_j)  Mean entrance rate (ι_j)  STE (ι_j)
1  15  2.28***  0.20  5.47***  0.36  1.27***  0.28  1.88***  0.25
2  13  1.93***  0.23  5.15***  0.33  1.16***  0.20  1.93***  0.32
3  47  1.73***  0.15  5.32***  0.14  1.22***  0.12  2.07***  0.16
4  14  1.84***  0.28  5.43***  0.27  0.96***  0.27  1.39***  0.28
5  15  1.89***  0.37  5.46***  0.21  0.92***  0.30  2.01***  0.30
6  10  1.74***  0.49  5.50***  0.21  1.13*  0.53  2.75***  0.37
7  455  1.67***  0.04  5.54***  0.05  0.96***  0.04  2.03***  0.05
8  20  1.89***  0.40  5.29***  0.28  1.23***  0.37  2.13***  0.37
9  11  2.25***  0.67  5.78***  0.43  1.35*  0.74  2.45***  0.72
10  28  2.15***  0.33  5.82***  0.29  1.29***  0.34  2.53***  0.32
11  6  3.82**  1.34  6.34***  0.91  2.48  1.57  3.82**  1.34
12  215  1.68***  0.08  5.59***  0.08  1.10***  0.07  2.16***  0.09
13  33  2.08***  0.37  5.39***  0.29  1.76***  0.38  2.43***  0.35
14  15  2.39**  0.84  6.49***  0.57  2.22**  0.84  2.66***  0.81
15  40  2.05***  0.36  5.91***  0.27  1.53***  0.36  1.95***  0.37
16  12  2.89**  1.16  6.33***  0.90  2.08  1.22  3.18**  1.15
17  49  1.84***  0.33  5.68***  0.28  1.40***  0.34  2.29***  0.34
18  16  3.10***  1.00  7.17***  0.74  2.43**  1.04  2.87**  1.00
19  38  2.15***  0.47  5.73***  0.38  1.53***  0.48  2.28***  0.47
20  23  2.34***  0.81  6.12***  0.64  1.76**  0.83  2.64***  0.80
21  30  2.43***  0.66  6.05***  0.54  1.62**  0.67  2.77***  0.66
22  25  2.49***  0.82  6.29***  0.66  1.73**  0.84  2.82***  0.80
23  30  2.36***  0.72  6.09***  0.61  1.51**  0.74  2.64***  0.71
24  13  3.31*  1.67  6.70***  1.40  2.78  1.71  3.48*  1.65
25  20  2.72**  1.15  6.31***  0.98  2.43*  1.17  2.72**  1.16
26  25  2.77***  0.96  6.20***  0.83  2.03*  0.99  2.81***  0.97
27  39  2.49***  0.65  5.87***  0.58  1.57**  0.67  2.55***  0.65
28  20  3.06**  1.29  6.66***  1.11  2.08  1.33  3.57**  1.28
29  31  2.49***  0.89  6.24***  0.76  2.06**  0.89  2.99***  0.87
30  24  2.78**  1.17  6.60***  1.01  2.23*  1.19  2.90**  1.18
31  17  3.54**  1.67  7.20***  1.46  2.94  1.71  3.72**  1.66
32  9  5.32  3.16  8.36**  2.81  4.31  3.27  5.30  3.16
33  14  3.66  2.19  7.64***  1.89  3.18  2.21  3.68  2.19
34  39  2.43***  0.83  6.13***  0.74  1.76**  0.85  2.82***  0.82
35  21  3.29**  1.56  7.06***  1.38  2.55  1.59  3.50**  1.54
36  24  2.97**  1.42  6.93***  1.25  2.19  1.44  3.23**  1.41
37  9  5.70  3.70  8.99**  3.31  5.46  3.72  6.31  3.63
38  31  3.06**  1.16  6.85***  1.04  2.28*  1.18  2.98**  1.16
39  11  5.00  3.25  8.59**  2.91  4.54  3.30  5.02  3.25
40  8  6.52  4.48  9.71**  4.06  5.84  4.57  6.91  4.43
41  18  3.85*  2.13  7.66***  1.92  3.17  2.17  4.05*  2.12
42  17  4.05*  2.32  7.94***  2.07  3.51  2.34  4.17*  2.30
43  17  3.97  2.38  7.41***  2.17  3.35  2.41  4.47*  2.35
44  21  3.80*  1.97  7.38***  1.80  3.03  2.00  4.39**  1.94
45  18  4.16*  2.34  7.79***  2.13  3.56  2.38  4.30*  2.34
46  11  5.75  3.84  9.18**  3.52  5.05  3.91  6.19  3.81
47  18  4.04  2.46  8.12***  2.23  3.31  2.50  4.68*  2.43
48  43  2.87**  1.07  6.28***  0.99  1.90*  1.09  3.13***  1.07
49  12  5.69  3.78  8.96**  3.49  5.13  3.83  5.59  3.79
50  6  9.87  7.33  12.72  6.81  9.33  7.43  10.42  7.23
* p<0.1, ** p<0.05, *** p<0.01

Table 2.D.22.
The winner regret α_i and the loser regret β_i estimates across auction categories
Auction Category  number of bidders  winner regret  STE (winner regret)  t-stat (winner regret)  p-value (winner regret)  Loser regret  STE (Loser regret)  t-stat (Loser regret)  p-value (Loser regret)
Jewelry and Watches  1550  -1.30***  0.02  -52.40  <0.0001  -1.31***  0.03  -51.92  <0.0001
Collectibles  859  -1.27***  0.04  -36.14  <0.0001  -1.33***  0.03  -39.61  <0.0001
Clothing, Shoes and Accessories  453  -1.28***  0.04  -30.07  <0.0001  -1.31***  0.05  -28.91  <0.0001
Crafts  558  -1.31***  0.04  -29.89  <0.0001  -1.38***  0.04  -34.35  <0.0001
Pottery and Glass  607  -1.29***  0.04  -32.21  <0.0001  -1.27***  0.04  -32.27  <0.0001
Antiques  546  -1.37***  0.04  -32.08  <0.0001  -1.34***  0.04  -33.99  <0.0001
Toys and Hobbies  744  -1.31***  0.03  -38.17  <0.0001  -1.32***  0.04  -36.28  <0.0001
Stamps  651  -1.36***  0.04  -36.26  <0.0001  -1.32***  0.04  -34.17  <0.0001
Books  482  -1.32***  0.04  -29.72  <0.0001  -1.25***  0.04  -29.83  <0.0001
Tickets and Experiences  489  -1.37***  0.04  -31.65  <0.0001  -1.37***  0.04  -30.73  <0.0001
Art  456  -1.29***  0.05  -26.94  <0.0001  -1.32***  0.05  -28.36  <0.0001
Gift Cards and Coupons  522  -1.34***  0.04  -29.96  <0.0001  -1.34***  0.04  -30.39  <0.0001
Music  585  -1.36***  0.04  -34.28  <0.0001  -1.35***  0.04  -36.23  <0.0001
Consumer Electronics  734  -1.35***  0.03  -39.15  <0.0001  -1.35***  0.04  -35.97  <0.0001
DVDs and Movies  602  -1.36***  0.04  -33.58  <0.0001  -1.32***  0.04  -34.18  <0.0001
Dolls and Bears  679  -1.30***  0.04  -32.33  <0.0001  -1.31***  0.04  -35.22  <0.0001
Entertainment Memorabilia  506  -1.34***  0.04  -32.42  <0.0001  -1.36***  0.04  -31.30  <0.0001
Health and Beauty  541  -1.37***  0.04  -34.49  <0.0001  -1.30***  0.04  -30.04  <0.0001
Video Games and Consoles  682  -1.35***  0.04  -37.11  <0.0001  -1.32***  0.04  -35.68  <0.0001
* p<0.1, ** p<0.05, *** p<0.01

Table 2.D.23.
The update of valuation parameters δ_i and learning parameter ρ_i estimates across bidder's segments
Bidders segment  number of bidders  Learning value from bid parameter (ρ_i)  STE (ρ_i)  Valuation revelation parameter (δ_i)  STE (δ_i)
Jewelry and Watches  1550  1.24***  0.02  0.25***  0.02
Collectibles  859  1.20***  0.03  0.23***  0.03
Clothing, Shoes and Accessories  453  1.28***  0.04  0.23***  0.04
Crafts  558  1.21***  0.03  0.21***  0.04
Pottery and Glass  607  1.28***  0.03  0.24***  0.04
Antiques  546  1.26***  0.04  0.25***  0.04
Toys and Hobbies  744  1.29***  0.03  0.28***  0.04
Stamps  651  1.27***  0.03  0.29***  0.04
Books  482  1.20***  0.04  0.33***  0.04
Tickets and Experiences  489  1.25***  0.04  0.20***  0.04
Art  456  1.28***  0.04  0.24***  0.04
Gift Cards and Coupons  522  1.24***  0.04  0.26***  0.04
Music  585  1.27***  0.03  0.26***  0.04
Consumer Electronics  734  1.28***  0.03  0.27***  0.04
DVDs and Movies  602  1.27***  0.03  0.29***  0.04
Dolls and Bears  679  1.29***  0.03  0.16***  0.04
Entertainment Memorabilia  506  1.21***  0.04  0.25***  0.04
Health and Beauty  541  1.30***  0.04  0.26***  0.04
Video Games and Consoles  682  1.26***  0.03  0.21***  0.04
* p<0.1, ** p<0.05, *** p<0.01

Table 2.D.24.
The growth of bids and their drift parameters, τ_i and γ_i, and the rush of bidders at the end of auction rate and average entrance rate in each period, η_j and ι_j, estimates across auction segments
Auction Cluster  Auction Cluster Size  growth of bids (τ_i)  STE (τ_i)  Drift of bids (γ_i)  STE (γ_i)  Last minute flood (η_j)  STE (η_j)  Mean entrance rate (ι_j)  STE (ι_j)
Jewelry and Watches  150  1.62***  0.07  5.47***  0.08  0.93***  0.06  1.99***  0.08
Collectibles  104  1.79***  0.09  5.24***  0.10  1.03***  0.08  1.99***  0.10
Clothing, Shoes and Accessories  85  1.62***  0.10  5.46***  0.12  1.07***  0.09  1.94***  0.11
Crafts  79  1.84***  0.10  5.58***  0.12  0.99***  0.08  2.10***  0.13
Pottery and Glass  75  1.78***  0.12  5.40***  0.10  1.01***  0.10  2.09***  0.12
Antiques  69  1.77***  0.11  5.40***  0.12  1.05***  0.11  1.92***  0.12
Toys and Hobbies  94  1.70***  0.11  5.55***  0.10  1.05***  0.09  1.95***  0.14
Stamps  73  1.77***  0.14  5.63***  0.11  1.07***  0.12  2.13***  0.14
Books  85  1.61***  0.12  5.75***  0.12  1.02***  0.12  2.08***  0.13
Tickets and Experiences  92  1.64***  0.13  5.60***  0.12  1.17***  0.12  1.92***  0.15
Art  71  1.87***  0.17  5.54***  0.15  1.05***  0.16  2.05***  0.17
Gift Cards and Coupons  86  2.03***  0.15  5.63***  0.13  1.21***  0.15  2.00***  0.16
Music  87  1.65***  0.16  5.53***  0.14  1.07***  0.16  2.20***  0.16
Consumer Electronics  84  1.84***  0.17  5.50***  0.15  1.04***  0.17  2.27***  0.18
DVDs and Movies  88  1.80***  0.18  5.68***  0.14  1.24***  0.18  2.31***  0.18
Dolls and Bears  85  1.92***  0.20  5.65***  0.17  1.23***  0.19  2.27***  0.20
Entertainment Memorabilia  89  1.87***  0.20  5.82***  0.17  1.24***  0.19  2.15***  0.20
Health and Beauty  75  2.00***  0.24  5.73***  0.20  1.33***  0.24  1.93***  0.25
Video Games and Consoles  94  1.83***  0.21  5.81***  0.17  1.12***  0.21  2.24***  0.21
* p<0.1, ** p<0.05, *** p<0.01

Figure 2.D.2. Histogram of regret and valuation evolution parameters across bidder segments

Figure 2.D.3.
Counterfactual analysis of shutting down winner regret (blue line: the optimal bidding when regret is shut down; red line: the observed bidding)

Figure 2.D.4. Histogram of winner regret parameter distribution across item categories

Figure 2.D.5. Histogram of loser regret parameter distribution across item categories

Figure 2.D.6. Histogram of learning parameter distribution across item categories

Figure 2.D.7. Histogram of valuation revelation parameter distribution across item categories

2.D.3. K-MEANS BIDDER CLUSTERS

Table 2.D.25. Bidder's segment profile (based on k-means approach)
Abbreviations: NBTI = number of bids on this item; TNB30D = total number of bids in 30 days; NIB30D = number of items bid on in 30 days; BACS = bid activity with current seller; NCBO = number of categories bid on.
Bidders' segment Index  Segment Size  Bidders Feedback mean  STD (Bidder's feedback)  NBTI  STD (NBTI)  TNB30D  STD (TNB30D)  NIB30D  STD (NIB30D)  BACS  STD (BACS)  NCBO Mean  STD (NCBO)
1  252  319  695  4  2  33  75  18  54  70  7  1  0
2  940  654  242  5  1  189  55  88  25  26  3  2  0
3  185  233  389  14  4  215  174  48  42  13  9  2  0
4  27  728  623  15  14  3660  696  876  325  18  28  3  1
5  51  1539  1578  1  1  2218  543  1643  287  5  14  2  1
6  542  573  678  2  2  93  93  50  51  6  6  4  0
7  384  402  709  3  2  43  94  20  44  65  13  2  0
8  128  101  157  23  4  97  191  11  26  86  14  1  1
9  70  539  951  2  3  122  127  57  57  70  20  5  1
10  77  2944  2913  2  2  1232  300  918  198  4  8  1  1
11  89  660  1178  18  5  326  277  73  69  14  13  5  1
12  170  346  746  11  3  152  153  41  43  15  9  1  0
13  156  281  611  28  5  295  249  56  54  22  13  1  0
14  994  533  662  2  1  83  79  44  41  6  5  3  0
15  389  258  432  3  2  27  54  14  28  34  5  1  0
16  102  203  480  42  7  143  184  17  39  72  22  1  1
17  428  849  987  2  2  145  168  83  98  6  8  5  1
18  569  274  378  2  1  30  48  16  24  19  4  1  0
19  66  862  994  2  2  1341  455  638  237  6  13  4  1
20  3  63923  4161  2  0  2033  492  1631  332  0  0  8  1
21  283  749  930  2  2  345  132  189  76  3  4  2  0
22  1024  207  462  2  1  21  85  8  25  99  2  1  0
23  18  4219  3906  1  1  6547  796  5777  810  0  0  0  1
24  24  1610  1138  12  12  7469  2529  1533  583  9  21  2  1
25  245  334  372  3  3  71  116  37  69  34  9  3  0
26  102  384  449  19  7  1483  455  240  111  9  13  2  1
27  354  305  500  2  2  29  68  17  43  51  4  1  0
28  534  331  475  2  2  25  27  13  13  15  4  2  0
29  112  6040  1989  2  2  135  137  91  88  9  13  1  0
30  337  342  545  2  2  24  41  13  20  33  7  2  0
31  170  748  811  2  2  539  217  296  121  3  5  3  0
32  91  357  762  48  6  781  584  105  125  16  12  2  1
33  959  345  495  1  1  49  41  26  21  5  3  1  0
34  910  441  594  2  1  71  55  38  29  4  3  2  0
35  217  579  741  2  2  496  161  268  78  3  6  1  0
36  8  118039  13206  2  1  1973  810  1517  624  1  2  7  3
37  145  185  417  16  4  65  86  13  19  49  12  2  1
38  358  931  1016  2  1  220  80  132  43  3  5  1  0
39  7  35175  8251  3  1  103  76  74  55  20  32  2  1
40  29  16632  2988  2  2  240  316  164  191  17  32  2  1
41  28  477  786  2  2  1330  580  805  278  82  18  1  0
42  21  5739  6559  2  2  3898  631  3013  368  1  2  3  3
43  91  606  1181  35  7  357  244  58  51  25  18  4  1
44  205  1030  1122  2  2  848  254  447  142  7  12  1  0
45  73  5572  2184  1  1  180  176  128  130  7  11  3  1
46  238  133  349  9  3  62  165  13  37  94  8  1  0
47  41  161  182  71  10  908  1084  79  86  36  30  2  1

APPENDIX 3.A: CONDITIONAL DISTRIBUTIONS FOR ESTIMATION OF THE GAMIFICATION CHOICE MODEL

Conditional distributions of the choice variable include the following:

\mu_i \mid y_{it}^{j}, cont_{it-1}, rcv_{it-1}, crep_{iw-1}, rep_{iw-1}, rnk_{iw-1}, rnk_{iw-1}, bdg_{it-1}, cbdg_{it-1}, \Lambda, \Delta, \Sigma, \quad i = 1, \ldots, I \qquad (A1)

where this conditional distribution can be estimated by random walk Metropolis-Hastings on the weighted likelihood. The priors for the normal mixture distribution of the individual and the category specific parameters used are:

\{(\mu_i, \Sigma_i)\} \mid z_i, \Lambda, \Delta, \alpha^d, a, v, \vartheta
\alpha^d \mid I^*
a \mid \{(\mu_i, \Sigma_i)^*\}
v \mid \{(\mu_i, \Sigma_i)^*\}, \vartheta
\vartheta \mid \{(\mu_i, \Sigma_i)^*\}, v \qquad (A2)

where the first conditional is the standard posterior Polya urn representation for the mean and variance of the individual specific random coefficient choice model parameters. \{(\mu_i, \Sigma_i)^*\} denotes the set of unique (\mu_i, \Sigma_i), which is the only quantity the DP process hyper-parameters depend on (a posteriori).
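The Polya urn representation referred to above can be illustrated with a minimal sketch. It is not the estimation code used in this dissertation: the function names `polya_urn_draw` and `polya_urn_sequence` are hypothetical, the draws are scalars rather than (mean, covariance) pairs, and a standard normal stands in for the base measure. The only feature it is meant to show is that each new draw is either a fresh value from the base measure or a copy of an earlier draw, which is what produces a small set of unique values.

```python
import random


def polya_urn_draw(existing, alpha, base_draw, rng):
    """Draw the next value given the earlier draws under a DP(alpha, G0) prior.

    With probability alpha / (alpha + n) a new value is drawn from the base
    measure; otherwise one of the n earlier draws is copied uniformly at
    random (each with probability 1 / (alpha + n)).
    """
    n = len(existing)
    if rng.random() < alpha / (alpha + n):
        return base_draw(rng)
    return rng.choice(existing)


def polya_urn_sequence(n, alpha, base_draw, seed=0):
    """Generate n draws sequentially from the Polya urn scheme."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        draws.append(polya_urn_draw(draws, alpha, base_draw, rng))
    return draws


def standard_normal(rng):
    """Stand-in base measure (an assumption made for illustration only)."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    draws = polya_urn_sequence(200, alpha=1.0, base_draw=standard_normal)
    print(len(draws), "draws,", len(set(draws)), "unique values")
```

A small concentration parameter yields few unique values (few segments), while a very large one makes almost every draw new.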
Given the $\{(\mu_i, \Sigma_i)^{*}\}$ set, $\alpha^{d}$ and the base measure parameters (i.e., $a, \vartheta, v$) are independent a posteriori. The conditional posterior of the $G_0$ hyper-parameters (i.e., $a, \vartheta, v$) factors into two parts, as $a$ is independent of $\vartheta, v$ given $\{(\mu_i, \Sigma_i)^{*}\}$. The form of this conditional posterior is:

\[ p(a, \vartheta, v \mid \{(\mu_i, \Sigma_i)^{*}\}) \propto \prod_{j=1}^{I^{*}} \phi(\mu_j^{*} \mid 0, a^{-1}\Sigma_j^{*})\, IW(\Sigma_j^{*} \mid \vartheta, V = \vartheta v I_d)\, p(a, \vartheta, v) \tag{A3} \]

where $\phi(\cdot \mid \cdot, \cdot)$ denotes the multivariate normal density and $IW(\cdot \mid \cdot, \cdot)$ denotes the inverted-Wishart distribution. Finally, for the Polya urn implementation, the following conditional distribution is used:

\[ (\mu_{i+1}, \Sigma_{i+1}) \mid \{(\mu_1, \Sigma_1), \ldots, (\mu_i, \Sigma_i)\} \sim \begin{cases} G_0(a, \vartheta, v) & \text{with prob. } \dfrac{\alpha^{d}}{\alpha^{d} + i} \\[2ex] \delta_{(\mu_j, \Sigma_j)} & \text{with prob. } \dfrac{1}{\alpha^{d} + i} \end{cases} \tag{A4} \]

I assessed the prior hyperparameters to provide proper but diffuse distributions, defined formally by:

\[ r_a = 0.01, \quad s_a = 2, \quad v_\vartheta = 0.1, \quad w_\vartheta = 3, \quad r_v = 0.1, \quad s_v = 4 \tag{A5} \]

Finally, to complete the exposition, the posterior for the partition (segment) parameters has the following form:

\[
\begin{aligned}
\Sigma_k \mid \alpha, z, \vartheta, v, \Delta &\sim IW\Big(\vartheta + n_k,\; \vartheta \times v \times I_d + \sum_{i \in k} (\alpha_i - \Delta' z_i - \tilde{\mu}_k)(\alpha_i - \Delta' z_i - \tilde{\mu}_k)' + a(\tilde{\mu}_k - \bar{\mu})(\tilde{\mu}_k - \bar{\mu})'\Big) \\
\mu_k \mid \alpha, z, \Sigma_k, a, \Delta &\sim N\big(\tilde{\mu}_k,\; (n_k + a)^{-1}\Sigma_k\big) \\
\tilde{\mu}_k &= \frac{n_k \bar{\alpha}_k + a\bar{\mu}}{n_k + a}, \qquad \bar{\mu} = 0
\end{aligned}
\]

where $n_k$ is the number of individuals in segment $k$ and $\bar{\alpha}_k$ is the segment mean of $\alpha_i - \Delta' z_i$.

APPENDIX 3.B: EXTRA TABLES FOR THE ALTERNATIVE MODEL

Table 3.B.1. PARAMETER ESTIMATES: Individual Choice effect (10K sample size with model that explains parameters with fixed variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           50.881     160.168    50.144     53.493
States:
  Previous contribution                -0.004     0.042      -0.097     0.095
  Reciprocity (contribution received)  -0.065     0.201      -0.489     0.256
Leader Board:
  Cum Reputation                       0.101      0.190      -0.335     0.455
  Reputation                           -6.340     18.781     -7.643     -4.691
  Rank                                 -0.053     0.419      -0.926     0.797
  Rank Change                          0.000      0.001      -0.001     0.001
Badges:
  Gold Badge                           -0.198     0.826      -1.074     0.350
  Silver Badge                         -0.084     0.443      -0.973     0.526
  Bronze Badge                         -0.002     0.370      -0.711     0.679
  Cum Gold Badge                       0.014      0.197      -0.387     0.344
  Cum Silver Badge                     -0.015     0.157      -0.317     0.284
  Cum Bronze Badge                     -0.010     0.125      -0.252     0.235

Table 3.B.2. PARAMETER ESTIMATES: Individual Choice effect (10K sample size with model that explains parameters with fixed variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           9711                  60                    97%         1%
States:
  Previous contribution                864                   902                   9%          9%
  Reciprocity (contribution received)  450                   1250                  5%          13%
Leader Board:
  Cum Reputation                       1124                  565                   11%         6%
  Reputation                           60                    9706                  1%          97%
  Rank                                 552                   860                   6%          9%
  Rank Change                          1316                  1407                  13%         14%
Badges:
  Gold Badge                           33                    318                   0%          3%
  Silver Badge                         69                    91                    1%          1%
  Bronze Badge                         208                   201                   2%          2%
  Cum Gold Badge                       300                   298                   3%          3%
  Cum Silver Badge                     837                   945                   8%          9%
  Cum Bronze Badge                     968                   986                   10%         10%

Table 3.B.3. PARAMETER ESTIMATES: Individual Choice effect (5K sample size with model that explains parameters with all variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           -0.405     0.938      -2.826     0.830
States:
  Previous contribution                -0.008     0.065      -0.100     0.101
  Reciprocity (contribution received)  -0.085     0.370      -0.429     0.217
Leader Board:
  Cum Reputation                       0.111      0.198      -0.318     0.455
  Reputation                           -0.441     0.874      -1.706     1.014
  Rank                                 -0.032     0.374      -0.805     0.750
  Rank Change                          0.000      0.001      -0.002     0.002
Badges:
  Gold Badge                           0.078      1.237      -1.291     1.955
  Silver Badge                         -0.155     0.519      -0.818     0.566
  Bronze Badge                         -0.011     0.615      -0.862     1.174
  Cum Gold Badge                       0.040      0.269      -0.388     0.460
  Cum Silver Badge                     -0.011     0.187      -0.290     0.277
  Cum Bronze Badge                     -0.010     0.152      -0.237     0.217

Table 3.B.4.
PARAMETER ESTIMATES: Individual Choice effect (5K sample size with model that explains parameters with all variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           94                    725                   2%          15%
States:
  Previous contribution                428                   498                   9%          10%
  Reciprocity (contribution received)  205                   624                   4%          12%
Leader Board:
  Cum Reputation                       541                   261                   11%         5%
  Reputation                           167                   362                   3%          7%
  Rank                                 308                   360                   6%          7%
  Rank Change                          665                   716                   13%         14%
Badges:
  Gold Badge                           643                   569                   13%         11%
  Silver Badge                         53                    58                    1%          1%
  Bronze Badge                         147                   92                    3%          2%
  Cum Gold Badge                       158                   163                   3%          3%
  Cum Silver Badge                     429                   468                   9%          9%
  Cum Bronze Badge                     491                   476                   10%         10%

Table 3.B.5. PARAMETER ESTIMATES: Individual Choice effect (5K sample size with model that explains parameters with full variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           -0.094     0.818      -1.414     2.455
States:
  Previous contribution                -0.010     0.079      -0.107     0.110
  Reciprocity (contribution received)  -0.089     0.449      -0.481     0.232
Leader Board:
  Cum Reputation                       0.111      0.193      -0.312     0.444
  Reputation                           -0.447     0.622      -1.478     0.940
  Rank                                 -0.023     0.362      -0.803     0.735
  Rank Change                          0.000      0.001      -0.002     0.002
Badges:
  Gold Badge                           -0.089     0.641      -1.730     0.844
  Silver Badge                         -0.045     0.312      -0.592     0.446
  Bronze Badge                         -0.049     0.390      -0.833     1.005
  Cum Gold Badge                       0.022      0.230      -0.418     0.451
  Cum Silver Badge                     -0.012     0.189      -0.294     0.279
  Cum Bronze Badge                     -0.011     0.152      -0.249     0.222

Table 3.B.6.
PARAMETER ESTIMATES: Individual Choice effect (5K sample size with model that explains parameters with fixed variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           356                   536                   7%          11%
States:
  Previous contribution                427                   513                   9%          10%
  Reciprocity (contribution received)  208                   622                   4%          12%
Leader Board:
  Cum Reputation                       532                   251                   11%         5%
  Reputation                           147                   390                   3%          8%
  Rank                                 301                   363                   6%          7%
  Rank Change                          684                   707                   14%         14%
Badges:
  Gold Badge                           116                   310                   2%          6%
  Silver Badge                         49                    54                    1%          1%
  Bronze Badge                         144                   92                    3%          2%
  Cum Gold Badge                       154                   164                   3%          3%
  Cum Silver Badge                     412                   459                   8%          9%
  Cum Bronze Badge                     490                   478                   10%         10%

Table 3.B.7. PARAMETER ESTIMATES: Individual Choice effect (1K size for k-means stratified sample with model that explains parameters with fixed variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           -0.003     2.045      -3.356     2.772
States:
  Previous contribution                -0.034     0.089      -0.236     0.165
  Reciprocity (contribution received)  -0.031     0.213      -0.557     0.403
Leader Board:
  Cum Reputation                       0.095      0.233      -0.384     0.540
  Reputation                           -0.271     0.869      -1.937     1.391
  Rank                                 0.004      0.386      -0.818     0.834
  Rank Change                          0.000      0.009      -0.003     0.004
Badges:
  Gold Badge                           -0.134     1.138      -2.132     2.822
  Silver Badge                         -0.448     1.019      -1.756     1.265
  Bronze Badge                         -0.193     2.120      -5.708     0.877
  Cum Gold Badge                       0.021      0.494      -0.584     0.432
  Cum Silver Badge                     -0.020     0.170      -0.394     0.330
  Cum Bronze Badge                     -0.011     0.188      -0.288     0.330

Table 3.B.8.
PARAMETER ESTIMATES: Individual Choice effect (1K size for k-means stratified sample with model that explains parameters with fixed variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           215                   284                   4%          6%
States:
  Previous contribution                97                    134                   10%         13%
  Reciprocity (contribution received)  62                    121                   6%          12%
Leader Board:
  Cum Reputation                       122                   73                    12%         7%
  Reputation                           78                    106                   8%          11%
  Rank                                 66                    68                    7%          7%
  Rank Change                          122                   123                   12%         12%
Badges:
  Gold Badge                           128                   213                   13%         21%
  Silver Badge                         26                    98                    3%          10%
  Bronze Badge                         28                    57                    3%          6%
  Cum Gold Badge                       39                    34                    4%          3%
  Cum Silver Badge                     77                    98                    8%          10%
  Cum Bronze Badge                     126                   96                    13%         10%

Table 3.B.9. PARAMETER ESTIMATES: Individual Choice effect (1K size for LDA stratified sample with model that explains parameters with fixed variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           14.635     2.287      12.119     19.938
States:
  Previous contribution                -0.015     0.079      -0.174     0.174
  Reciprocity (contribution received)  -0.110     0.208      -0.613     0.272
Leader Board:
  Cum Reputation                       0.069      0.183      -0.318     0.435
  Reputation                           -1.989     0.847      -3.334     -0.197
  Rank                                 -0.092     0.357      -0.853     0.654
  Rank Change                          0.000      0.003      -0.005     0.003
Badges:
  Gold Badge                           -0.314     1.366      -2.758     2.276
  Silver Badge                         -0.136     1.076      -2.265     1.370
  Bronze Badge                         0.047      0.535      -1.181     1.053
  Cum Gold Badge                       0.044      0.262      -0.472     0.526
  Cum Silver Badge                     -0.034     0.178      -0.413     0.329
  Cum Bronze Badge                     -0.017     0.137      -0.266     0.236

Table 3.B.10.
PARAMETER ESTIMATES: Individual Choice effect (1K size for LDA stratified sample with model that explains parameters with fixed variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           1000                  0                     20%         0%
States:
  Previous contribution                114                   116                   11%         12%
  Reciprocity (contribution received)  42                    156                   4%          16%
Leader Board:
  Cum Reputation                       99                    49                    10%         5%
  Reputation                           5                     936                   1%          94%
  Rank                                 60                    81                    6%          8%
  Rank Change                          128                   145                   13%         15%
Badges:
  Gold Badge                           162                   320                   16%         32%
  Silver Badge                         46                    69                    5%          7%
  Bronze Badge                         25                    19                    3%          2%
  Cum Gold Badge                       38                    32                    4%          3%
  Cum Silver Badge                     84                    114                   8%          11%
  Cum Bronze Badge                     94                    93                    9%          9%

Table 3.B.11. PARAMETER ESTIMATES: Individual Choice effect (1K size for Uniform stratified sample with model that explains parameters with fixed variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           -0.260     1.280      -2.227     3.432
States:
  Previous contribution                -0.027     0.080      -0.177     0.153
  Reciprocity (contribution received)  -0.109     0.272      -0.690     0.318
Leader Board:
  Cum Reputation                       0.091      0.182      -0.319     0.424
  Reputation                           -0.083     1.060      -1.772     1.937
  Rank                                 -0.073     0.415      -0.995     0.863
  Rank Change                          0.000      0.005      -0.004     0.004
Badges:
  Gold Badge                           -0.351     1.971      -3.725     2.775
  Silver Badge                         -0.533     1.056      -2.180     1.218
  Bronze Badge                         0.060      0.892      -1.307     1.414
  Cum Gold Badge                       -0.015     0.275      -0.503     0.580
  Cum Silver Badge                     -0.027     0.163      -0.313     0.271
  Cum Bronze Badge                     -0.003     0.150      -0.288     0.305

Table 3.B.12.
PARAMETER ESTIMATES: Individual Choice effect (1K size for Uniform stratified sample with model that explains parameters with fixed variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           82                    89                    2%          2%
States:
  Previous contribution                93                    118                   9%          12%
  Reciprocity (contribution received)  52                    152                   5%          15%
Leader Board:
  Cum Reputation                       100                   52                    10%         5%
  Reputation                           126                   103                   13%         10%
  Rank                                 69                    92                    7%          9%
  Rank Change                          137                   139                   14%         14%
Badges:
  Gold Badge                           158                   323                   16%         32%
  Silver Badge                         14                    145                   1%          15%
  Bronze Badge                         21                    23                    2%          2%
  Cum Gold Badge                       49                    45                    5%          5%
  Cum Silver Badge                     82                    98                    8%          10%
  Cum Bronze Badge                     133                   111                   13%         11%

Table 3.B.13. PARAMETER ESTIMATES: Individual Choice effect (1K size for mixed-normal stratified sample with model that explains parameters with fixed variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           0.065      1.353      -2.403     3.649
States:
  Previous contribution                -0.023     0.076      -0.153     0.103
  Reciprocity (contribution received)  -0.055     0.279      -0.437     0.237
Leader Board:
  Cum Reputation                       0.119      0.206      -0.294     0.506
  Reputation                           -0.456     0.621      -1.603     0.814
  Rank                                 -0.048     0.282      -0.676     0.494
  Rank Change                          0.000      0.005      -0.006     0.004
Badges:
  Gold Badge                           -0.085     0.933      -1.858     2.451
  Silver Badge                         -0.108     0.895      -2.220     1.019
  Bronze Badge                         -0.029     0.607      -0.800     0.798
  Cum Gold Badge                       0.019      0.319      -0.445     0.397
  Cum Silver Badge                     -0.007     0.196      -0.321     0.339
  Cum Bronze Badge                     -0.023     0.149      -0.312     0.259

Table 3.B.14.
PARAMETER ESTIMATES: Individual Choice effect (1K size for mixed-normal stratified sample with model that explains parameters with fixed variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           195                   137                   4%          3%
States:
  Previous contribution                88                    105                   9%          11%
  Reciprocity (contribution received)  41                    124                   4%          12%
Leader Board:
  Cum Reputation                       114                   44                    11%         4%
  Reputation                           31                    152                   3%          15%
  Rank                                 44                    58                    4%          6%
  Rank Change                          121                   125                   12%         13%
Badges:
  Gold Badge                           81                    108                   8%          11%
  Silver Badge                         25                    42                    3%          4%
  Bronze Badge                         25                    21                    3%          2%
  Cum Gold Badge                       40                    38                    4%          4%
  Cum Silver Badge                     97                    83                    10%         8%
  Cum Bronze Badge                     108                   109                   11%         11%

Table 3.B.15. PARAMETER ESTIMATES: Individual Choice effect (1K size for mixed-normal stratified sample with model that explains parameters with full variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           -0.294     1.505      -3.123     2.165
States:
  Previous contribution                -0.037     0.103      -0.195     0.143
  Reciprocity (contribution received)  -0.119     0.498      -0.927     0.356
Leader Board:
  Cum Reputation                       0.134      0.273      -0.373     0.533
  Reputation                           -0.579     1.416      -2.484     2.422
  Rank                                 0.031      0.583      -1.019     1.080
  Rank Change                          -0.001     0.026      -0.005     0.005
Badges:
  Gold Badge                           0.046      1.667      -5.272     1.733
  Silver Badge                         -0.094     1.547      -2.989     3.156
  Bronze Badge                         0.115      1.429      -1.215     1.436
  Cum Gold Badge                       -0.025     0.176      -0.398     0.300
  Cum Silver Badge                     -0.019     0.182      -0.399     0.324
  Cum Bronze Badge                     -0.006     0.160      -0.320     0.329

Table 3.B.16.
PARAMETER ESTIMATES: Individual Choice effect (1K size for mixed-normal stratified sample with model that explains parameters with full variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           155                   288                   3%          6%
States:
  Previous contribution                92                    145                   9%          15%
  Reciprocity (contribution received)  57                    135                   6%          14%
Leader Board:
  Cum Reputation                       131                   57                    13%         6%
  Reputation                           78                    220                   8%          22%
  Rank                                 85                    88                    9%          9%
  Rank Change                          134                   157                   13%         16%
Badges:
  Gold Badge                           135                   89                    14%         9%
  Silver Badge                         122                   129                   12%         13%
  Bronze Badge                         32                    27                    3%          3%
  Cum Gold Badge                       24                    20                    2%          2%
  Cum Silver Badge                     73                    115                   7%          12%
  Cum Bronze Badge                     129                   80                    13%         8%

Table 3.B.17. PARAMETER ESTIMATES: Individual Choice effect (1K size for k-means stratified sample with model that explains parameters with full variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           -0.419     1.661      -4.120     2.602
States:
  Previous contribution                -0.024     0.150      -0.199     0.147
  Reciprocity (contribution received)  -0.168     0.397      -0.734     0.286
Leader Board:
  Cum Reputation                       0.118      0.199      -0.330     0.472
  Reputation                           -0.419     0.893      -2.209     1.093
  Rank                                 -0.049     0.376      -0.777     0.742
  Rank Change                          0.000      0.008      -0.006     0.004
Badges:
  Gold Badge                           -0.131     1.950      -3.361     3.930
  Silver Badge                         0.052      1.412      -2.208     1.787
  Bronze Badge                         0.068      0.847      -1.282     1.994
  Cum Gold Badge                       -0.021     0.727      -0.440     0.546
  Cum Silver Badge                     -0.021     0.216      -0.364     0.317
  Cum Bronze Badge                     -0.006     0.167      -0.266     0.274

Table 3.B.18.
PARAMETER ESTIMATES: Individual Choice effect (1K size for k-means stratified sample with model that explains parameters with full variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           202                   380                   4%          8%
States:
  Previous contribution                99                    109                   10%         11%
  Reciprocity (contribution received)  48                    135                   5%          14%
Leader Board:
  Cum Reputation                       114                   53                    11%         5%
  Reputation                           56                    218                   6%          22%
  Rank                                 70                    72                    7%          7%
  Rank Change                          141                   140                   14%         14%
Badges:
  Gold Badge                           222                   292                   22%         29%
  Silver Badge                         131                   75                    13%         8%
  Bronze Badge                         52                    32                    5%          3%
  Cum Gold Badge                       47                    36                    5%          4%
  Cum Silver Badge                     97                    82                    10%         8%
  Cum Bronze Badge                     103                   84                    10%         8%

Table 3.B.19. PARAMETER ESTIMATES: Individual Choice effect (1K size for LDA stratified sample with model that explains parameters with full variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           13.886     2.334      9.135      17.403
States:
  Previous contribution                -0.034     0.082      -0.197     0.134
  Reciprocity (contribution received)  -0.104     0.249      -0.695     0.410
Leader Board:
  Cum Reputation                       0.130      0.203      -0.287     0.522
  Reputation                           -1.850     0.930      -3.612     -0.218
  Rank                                 -0.093     0.448      -1.088     0.860
  Rank Change                          0.000      0.002      -0.004     0.005
Badges:
  Gold Badge                           -0.874     1.402      -3.808     2.347
  Silver Badge                         0.034      1.194      -2.064     3.101
  Bronze Badge                         0.130      0.548      -0.800     1.268
  Cum Gold Badge                       0.000      0.289      -0.493     0.600
  Cum Silver Badge                     -0.035     0.176      -0.403     0.364
  Cum Bronze Badge                     -0.008     0.140      -0.289     0.280

Table 3.B.20. PARAMETER ESTIMATES: Individual Choice effect (1K size for LDA stratified sample with model that explains parameters with full variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           997                   1                     20%         0%
States:
  Previous contribution                82                    107                   8%          11%
  Reciprocity (contribution received)  43                    131                   4%          13%
Leader Board:
  Cum Reputation                       115                   39                    12%         4%
  Reputation                           6                     850                   1%          85%
  Rank                                 58                    70                    6%          7%
  Rank Change                          131                   140                   13%         14%
Badges:
  Gold Badge                           70                    555                   7%          56%
  Silver Badge                         163                   144                   16%         14%
  Bronze Badge                         39                    16                    4%          2%
  Cum Gold Badge                       33                    30                    3%          3%
  Cum Silver Badge                     89                    94                    9%          9%
  Cum Bronze Badge                     106                   89                    11%         9%

Table 3.B.21. PARAMETER ESTIMATES: Individual Choice effect (1K size for Uniform stratified sample with model that explains parameters with full variables at Hierarchy)

                                       Estimate   Std. Dev.  2.5th      97.5th
Fixed Effect                           0.459      2.628      -4.001     4.872
States:
  Previous contribution                -0.021     0.080      -0.190     0.156
  Reciprocity (contribution received)  -0.104     0.244      -0.662     0.294
Leader Board:
  Cum Reputation                       0.124      0.203      -0.327     0.489
  Reputation                           -0.651     1.382      -2.805     1.752
  Rank                                 0.062      0.534      -0.922     0.892
  Rank Change                          0.000      0.002      -0.004     0.004
Badges:
  Gold Badge                           0.012      2.723      -4.257     5.609
  Silver Badge                         -0.167     1.287      -4.334     3.800
  Bronze Badge                         -0.014     0.966      -2.003     1.228
  Cum Gold Badge                       0.051      0.211      -0.385     0.426
  Cum Silver Badge                     -0.023     0.149      -0.325     0.269
  Cum Bronze Badge                     0.004      0.120      -0.230     0.274

Table 3.B.22.
PARAMETER ESTIMATES: Individual Choice effect (1K size for Uniform stratified sample with model that explains parameters with full variables at Hierarchy)

                                       Positive Significant  Negative Significant  % Positive  % Negative
Fixed Effect                           358                   155                   7%          3%
States:
  Previous contribution                85                    119                   9%          12%
  Reciprocity (contribution received)  44                    132                   4%          13%
Leader Board:
  Cum Reputation                       120                   48                    12%         5%
  Reputation                           76                    243                   8%          24%
  Rank                                 72                    70                    7%          7%
  Rank Change                          135                   140                   14%         14%
Badges:
  Gold Badge                           257                   332                   26%         33%
  Silver Badge                         53                    91                    5%          9%
  Bronze Badge                         47                    57                    5%          6%
  Cum Gold Badge                       34                    34                    3%          3%
  Cum Silver Badge                     69                    95                    7%          10%
  Cum Bronze Badge                     112                   84                    11%         8%
T he l ong-term impact o f pro motion and advertisin g on consumer brand choice. Journal of Marketing research , 34 (5), 248-261. Mittelstaedt, R. A., S. L. Grossbart, W . W. Curtis , S. P. Devere. 1976. O ptimal stimulation level and the adoption decision process. Journal of Con sumer Research , 3(2), 84-94. Moe, W. W ., D. A. S chweidel. 2012. Online product opinion s: Inci dence, evaluati on, and evolution. Marketing Science , 31 (3), 372-386. 264 Moon, S., G. J. Russell. 2008. Predicting produ ct purchase from inferre d customer similarity: An autologistic model approach. Management Science, 54 (1), 71-82. Murphy, K. P. 2012. Machine learning: a probabilis tic perspective . MIT press. Nair, H. S., P. Manchanda, T. Bhatia. 2010. Asymmetric social interactions in ph ysician prescription behavior: Th e role of opinion leaders. Jou rnal of Marketing Research, 47 (5), 883-895. Narayan, V., V . R. Rao, C. Saunders. 20 11. How peer influence affects attribute preferences: A Bayesian updating mechanism. Marketing Science, 30 (2), 368-384. Nasiry, J., I. Popescu. 2012. Advance selling when consumers regret. Management Science , 58 (6), 1160-1177. Ockenfels, A., A. E. Roth. 2002. The ti ming o f bids in in ternet auctions: Market design, bidder behavior, and artificial agents . AI magazine, 23 (3), 79. Oestreicher-Singer, G. , A. Sundararajan. 2012. T he visib le h and? Demand effects of recommendation networks in electronic markets . M a nagement Scien ce, 58 (11), 19 63- 1981. Pai, P. F., W. C. Hong. 2006. So ftware reliability forecastin g b y support vector machines with simulated annealing algorithms. Journal of Systems and Soft ware , 79 (6), 747-755. Park, Y. H., E. T. Bradlow. 20 05. An integrated model for bidding b ehavior i n Internet auctions: Whether, who, when, and how much. Journal of Marketing Research , 42 (4 ), 470-482. Peluso, A. M. 2011. Con sumer satisfacti on: Advanceme nts in theory , modelin g, and empirical findings. Peter Lang . 
Bern, Switzerland Penny, W . 200 1. KL-Divergence of No rmal, Gamma, D irichlet and Wishart d ensities. http://www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps Penny, W. 2002. An EM algorithm for Gaussian Markov Ran dom Fields. http://www.fil.ion.ucl.ac.uk/~wpenny/publications/gmrfs.pdf Peres, R., E. Muller, V. Mahajan. 2010. Innovation diffusion and new pro duct growth models: A critical review and research directions. International Journal of Research in Marketing, 27 (2), 91-106. Peter E. Ros si, G. M. All enby, R. E. McCulloch. 2005. Bayesian statisti cs and marketing. J ohn. Wiley & Sons . 265 Popescu, I., Y. Wu. 2007. D ynamic pricing strategies with r eference effects. Op erations Research , 55 (3), 413-429. Putsis, W . P. Jr., V. Sriniv asan. 20 00. E stimation techniques for macro diff usion models, In: Mahajan, V., Muller, E. and W i nd, Y. (Eds.), New Product Diffusion M odels. Bost on: Kluwer Academic Publishers , 263–291 Putsis Jr, W. P., S. B alasu bramanian, E . W. Kaplan, S. K. Sen. 1997. Mi xing b ehavior in cross- country diffusion. Marketing Science, 16 (4), 354-369. Raban, D. R . 2009. Self ‐ presentation and the valu e of infor mation in Q&A websites. Journa l of the American society for information science and technology , 60( 12), 2465-2473. Ren, Y., R. E. Kraut. 2014. Agent Based Modeli ng to Inform the Design of Multiuser Systems. In Ways of Knowing in HCI , 395-419. Springer New York. Roese, N. J. 1994. The functional basis of coun terfactual thinking. Journal of personality a nd Social Psychology , 66 (5), 805. Rogers, E. M. 1983. Diffusion of innovations. New Yo rk: Free Press , 18 (20), 271. Rossi, P. 2014. Ba y esian non-and semi-para metric methods and a pplications. Princeton University Press . Roth, A. E., A. Ockenfels. 2000. Last minute b iddin g and the rules for endin g second-price auctions: Theory and evidence from a natural experiment on the In ternet (No. w7729). National bureau of economic research. 
Roth, S., D.Sc hneckenb erg. 2 012. The Gamification of innovation. Creativity and i nnovation Management , 21 (4), 460-461. Sahgal, A. 2011. Gamification is serious business for marketers. CMO website. http://www.cmo.com/articles/2011/9/14/ g amification-is-serious-business-for- marketers.html. September 14, 2011. Accessed June 7, 2015. Salant, Y. 2011. Procedu ral anal ysis of choic e rules with applications to b ounded rationalit y. The American Economic Review , 101 (2), 724-748. Salcu, A. V ., C. Acatrinei. 2013. Gamification applied in affiliate marketing. Case stud y of 2Parale. Management and Marketing , 8 (4). 767-790. Schwartz, B. 2004. The Paradox of Choice: Why More Is Less, New York: Harper Collins . 266 Seetharaman, P. B. 2004. Mo deling multiple s ources of stat e dependence in r andom utility models: A distributed lag approach. Marketing Science , 23 (2), 263-271. Seetharaman, P. B., A . Ainslie, P. K. C hintagunta. 1999. Investigatin g household state dependence effects across categories. Journal of Marketing Research , 36 (4), 488-500. Selten, R. 1975. Reexamination of the per fectness concept for eq uilibrium points in extensive games. International journal of game theory , 4 (1), 25-55. Shen, L. , A. Fishbach, C . K. Hsee. 2014. The Motivatin g-Uncertainty Effect: Uncertainty Increases Resource Investment in the Process of Reward Pursuit. Journa l of Consumer Research, 41 (), 1301-1315. Shugan, S. M. 2005. Br and lo yalty pr ograms: ar e th ey sha ms?. Marketing S cience , 24 (2), 185- 193. Simon, H. A. 1972. Theories of bounded rationality. Decision and organization, 1 (), 161-176. Simonson, I. 1992. The influence o f anticipatin g r egret and responsibilit y on purchase decisions. Journal of Consumer Research , 19 (1), 105-118. Smith, C. W. (1990). Auctions: The social construction of value. Univ of C alifornia Press . Song, I., P. K. Chintagunta. 2003. 
A micromodel of new product adoption with heterogeneous and forward-loo king consu mers: Application to the d igital camera category. Quant itative Marketing and Economics, 1 (4), 371-407. Srinivasan, K., X. Wang. 2 010. Commentary-B idders' Experience and Le arnin g in Online Auctions: Issues and Implications. Marketing Science , 29 (6), 988-993. Steenkamp, J. B. E., H. Baumgartner. 1992. The role of optimum sti mulation level in ex ploratory consumer behavior. Journal of Consumer Research , 19 (3), 434-448. Stephen, A. T., O. Toubia. 2 010. Derivin g value fr om social commer ce n etworks. Journal o f marketing research, 47 (2), 215-228. Strehl, A., J. Ghosh 2 003. Cl uster ensembles---a knowl edge reuse framework for co mbining multiple partitions. The Journal of Machine Learning Research , 3 (1), 583-617. Su, X., F . Zhang. 2 009. On the v alue o f commit ment and availabilit y guarantees when selling to strategic consumers. Management Science , 55 (5), 713-726. Talukdar, D., K. Sudhir, A. Ainslie. 2002. In vestigating new product diffusion across products and countries. Marketing Science, 21 (1), 97-114. 267 Tedjamulia, S. J., D. L. Dean, D. R. Olsen, C. C. Albrecht. 2005. Motivating content contributions to onli ne communities: Toward a m ore comprehensive the ory. In System Sciences, 2 005. HICSS'05. Proceedings of the 38th Annual Hawaii International Conference on , 193b-193b. Tifferet, S., R. He rstein. 2012. Gender differen ces in brand commitment, impulse bu ying, and hedonic consumption. Journal of Product & Brand Management , 21 (3), 176-182. Toubia, O., A. T. Stephen. 2013. Intrinsic vs. image-related utility in social media: Why do people contribute content to twitter?. Marketing Science , 32 (3), 368-392. Trusov, M., W. Rand, Y. V. Joshi. 2013. Improving Prelau nch Diffusi on Forecasts: Using Synthetic Networks as Simulated Priors. Journal of Marketing Research, 50 (6), 675-690. Tsiros, M ., V. Mittal. 2000. 
Regret: A model of its antecedents and consequences i n consu mer decision making. Journal of Consumer Research , 26 (4), 401-417. Tversky, A., D. Kahneman. 1992. Adv ances in prospect theor y: Cumulative representation of uncertainty. Journal of Risk and uncertainty , 5 (4), 297-323. V. Bi ttner, J., J. Shi pper. 2014. Mo tivational effects and age differences of ga mification in product advertising. Journal of Consumer Marketing , 31 (5), 391-400. Van den Bu lte, C., Y. V. Joshi. 2 007. N ew product d iffusion with influentials and imitators. Marketing Science , 26 (3), 400-421. Van Dijk, E ., M. Zeelen berg. 2 005. On the ps ychology of ‘if onl y’: Re gret and the comparison between factual and counterfactual outco mes. Organizat ional Behavior an d Human Decision Processes , 97 (2), 152-160. Van Grove, J. 2011. Gamification: How competition is reinventing business, marketing, and everyday li fe. Mash able website. http://mashable.com/2011/07/28/gamification /. July 28, 2011. Accessed June 7, 2015. Venkatesan, R., T. V. Krishnan, V. Kumar. 2004. Evolutionary estim ation o f macro-level diffusion models using genetic algorithms: An al ternative to no nlinear least squares. Marketing Science , 23 (3), 451-464. Wan, E. A., R. Van Der Merwe. 2001. The unscented Kalman filter. Kalm an filtering and neural networks, Wiley Publishing , Eds. S. Haykin, 221-280. 268 Wei, C., X. A. W ei, K. Z hu. 2 015. engaging the w isdom o f c rowds: structural anal y sis o f dynamic us er contributions in onli ne communities, under review at Infor mation Systems Research . Weiss, A. M., N. H. Lurie, D. J. MacInnis. 20 08. Listening to strangers: W hose response s are valuable, how valuable a re the y, and why?. Journal of Marketing Research , 45 (4), 425- 436. Wilcox, R. T. 2000. Ex perts and am ateurs: T he role of experience in Inter net auctions. Marketing Letters , 11 (4), 363-374. Wilson, D. R., T. R. Martinez. 2003. 
The general inefficiency of batch training for gradient descent learning. Neural Networks , 16 (10), 1429-1451. Wu, M. 2011. My chapte r on Gamification: From Behavior Model to Busin ess Strategy. Lithium science of so cial blog website. https://community.lithium.com/t5/Science-of-Social- blog/My-Chapter-on-Gamification-From-Behavio r-Model-to-Business/ba-p/33995. November 15, 2011. Accessed June 2, 2015. Xie, J., X. M. Son g, M. Sirbu, Q. W ang. 1 997. Kalman filter estimation of new product diffusion models. Journal of Marketing Research , 34 (), 378-393. Yang, B., A. T. C hing. 2 013. D y namics of consu mer adoption of financia l innovation: The case of ATM cards. Management Science, 60 (4), 903-922. Yao, S., C. F. Mela. 2008. Online auction demand. Marketing Science, 27 (5 ), 861-885. Yeung, K. Y., W. L. Ru zzo 2 001. Details o f the adjusted Rand index and cl ustering algorithms, supplement to the paper “An empirical stud y on principal component an alysis for clustering gene expression data”. Bioinformatics , 17 (9), 763-774. Yoganarasimhan, H. 2013. The value of rep utation in an online freelance marketplace. Marketing Science , 32 (6), 860-891. Young, H. P. 2009. I nnovation diffusion in heterogeneous populations: Conta gion, social influence, and social learnin g. The American economic review , 99 (5), 1899 -1924. Young, H. P., G. M. All enby. 2003. Modeling in terdependent consumer preferences. Journal of Marketing Research, 40 (3), 282-294. Zeelenberg, M., R. Pieters, 2004. B ey ond valence i n cu stomer d issatisfaction: a review and new findings on behavioral re sponses to regret and dis appointment in fail ed services. Journal of business Research , 57 (4), 445-455. 269 Zeithammer, R . 2006. Forward-looking biddin g in online auctions. Journal of Marketing Research , 43 (3), 462-476. Zeithammer, R., C. Adams. 2 010. The sealed-bid abstraction in on line auction s. Marketing Science , 29 (6), 964-987. Zhang, J., E. Breugelmans. 2012 . 
The im pact of an i tem-based loyalt y p rogram on con sumer purchase behavior. Journal of Marketing research , 49 (1), 50-65. Zhuang, H., K. W ang, Z . S. Roth. 1994. Optimal selection of measurement configura tions for robot calibration u sing simulated annealin g. I Robotics and Aut omation, 1994. Proceedings., 1994 IEEE International Conference on , 393-398. Zichermann, G., C. Cunningham. 2 011. Gamification by design: Implementing game mechanics in web and mobile apps. O'Reilly Media, Inc. Zichermann, G., J. Linder. 2010. Game-based marketing. Inspire customer lo yalty through rewards, challenges, and contests, John Wiley and Sons , New York. VITA Meisam Hejazi Ni a was born in T ehran, Iran. Meisam attended Alla me Helli (ex ceptional talents) hi gh school, durin g 1999-2003. He received the Bronze medal for n ational mathematics Olympiad contents in 20 02. He completed his undergraduate coursework as top 1% at Amir kabir University of Tehran (Pol ytechnic Univ ersity o f Tehran) double majoring in computer engineering and informa tion technology en gineering. He e arned hi s m aster degree i n software system engineering from th e same institute, an d a n MBA from Sharif U niversity of T echnology. He p ublished his research on various information s ystems (e.g., commercial distributed processing) and marketin g ( mobile marketing) is sues i n IJMM, J ET WI, a nd JAC S journals. He has 3 years of experience as information s ystem specialist and manager in an international t extile company, and 2 years of ex perience as product s pecialist and manager in marketing department of an international t ele com op erator (MTN) in Tehran. He entered the Ph.D. pro gram in Management Science (Marketing) at the Naveen Jindal School of Mana gemen t of The Unive rsity of Texas at Dallas i n August 2012. 
During the course of his doctoral studies, Meisam presented his work at the Marketing Science, Marketing Dynamics, International Industrial Organization, NYU Digital Big Data, AMA, and POMS conferences. Prior to the completion of his studies, he also held a “Senior E-commerce Data Scientist” position at Sabre Airline Solutions in Southlake, Texas.
