Recommending Investors for Crowdfunding Projects
Jisun An* (Qatar Computing Research Institute, jan@qf.org.qa), Daniele Quercia (Yahoo Labs, Barcelona, dquercia@acm.org), Jon Crowcroft (University of Cambridge, Jon.Crowcroft@cl.cam.ac.uk)

ABSTRACT

To bring their innovative ideas to market, those embarking on new ventures have to raise money, and, to do so, they have often resorted to banks and venture capitalists. Nowadays, they have an additional option: crowdfunding. The name refers to the idea that funds come from a network of people on the Internet who are passionate about supporting others' projects. One of the most popular crowdfunding sites is Kickstarter. On it, creators post descriptions of their projects and advertise them on social media sites (mainly Twitter), while investors look for projects to support. The most common reason for project failure is the inability of founders to connect with a sufficient number of investors, mainly because hitherto there has been no automatic way of matching creators and investors. We thus set out to propose different ways of recommending investors found on Twitter for specific Kickstarter projects. We do so by conducting hypothesis-driven analyses of pledging behavior and translating the corresponding findings into different recommendation strategies. The best strategy achieves, on average, 84% accuracy in predicting a list of potential investors' Twitter accounts for any given project. Our findings also produced key insights about the whys and wherefores of investors deciding to support innovative efforts.

Categories and Subject Descriptors: J.4 [Computer Applications]: Social and behavioral sciences

Keywords: Kickstarter, Twitter, Crowdfunding, Recommender systems

1. INTRODUCTION

Kickstarter is a crowdfunding website where a founder proposes a project (e.g., a smart watch, a documentary, a video game) and asks the Internet crowd for money.
Its use has been growing exponentially: "The amount pledged on Kickstarter alone grew from $28m in 2010 to $320m in 2012" [7]. However, not all projects are successfully financed. One of the most common reasons for failure is that project founders fail to build a community around them and attract investors. A recent study has indeed found that "the majority of failed project creators cited the inability to successfully leverage an online audience as a main reason for failing" [14]. That is why we set out to propose automatic ways of matching Kickstarter founders with Twitter investors. In so doing, we make the following main contributions:

* This work was conducted when the author was at the University of Cambridge.

Figure 1: Aspects hypothesized to affect pledging behavior. We have three actors: (frequent vs. occasional) investors, a founder (with specific project management skills), and the project itself. The hypothesized aspects are: frequent vs. occasional investors (number of supported projects); management skills (updates, comments, fine-grained rewards, web sites); pledging goal (monetary goal); local vs. global (geographic dispersion); growth (growth rate per hour); and matching interests.

• We derive a set of well-grounded hypotheses related to pledging behavior (Section 3).

• We crawl data from Kickstarter, including detailed project descriptions and lists of investors, for a period of 3 months (Section 4). Also, since we need to recommend investors from Twitter, we gather all the tweets related to the projects we previously crawled.

• Upon those two datasets, we test the set of hypotheses (Section 5). We find that investors behave differently depending on whether they are frequent or occasional supporters on the site.
As opposed to occasional investors (51% of the investor base supported fewer than 4 projects), frequent ones (11% supported more than 32 projects) tend to fund efforts that are well-managed and match their own interests. By contrast, occasional investors pay less attention to any of those aspects and thus act as donors, mainly on art-related projects.

• Upon the quantitative analysis, we build a statistical model to recommend potential investors from Twitter (Section 6). Our model achieves 84% accuracy in predicting an unordered list of investors, and an average percentile-ranking of 0.32 (i.e., a 36% gain over the random baseline) in predicting an ordered list. Also, in situations of investor cold start (no previous information about an investor is available), we are still able to predict who funds what with an accuracy of 69% from Twitter-derived activity features, and an average percentile-ranking of 0.40 (a 20% gain over the random baseline).

We conclude by discussing the theoretical and practical implications of our findings (Section 7).

2. RELATED WORK

The first crowdfunding site started in 2001, and now there are more than 450 such sites. Together, in 2012 alone, they raised $2.8 billion and successfully funded more than 1 million projects [17]. Kickstarter is the largest such site in the USA. On it, a founder proposes a project by posting information about the project's purposes, monetary goals, time left to reach those goals, the way funds will be used, and potential rewards (e.g., the founder might offer a signed CD in exchange for a donation). To increase his/her likelihood of success, the founder usually posts videos and pictures to visually explain the project. (S)he also connects the project's page to dedicated social-networking accounts [18]. As crowdfunding sites have emerged, small entrepreneurs without access to traditional venture-capital funding have benefited from this new source of cash flow.
Crowdfunding has recently attracted the attention of researchers in various disciplines, from business and economics to computer science. Economists have investigated pledging behavior and found, for example, that crowdfunding eliminates distance-related economic frictions, yet initial fundings tend to come from family, friends, and acquaintances [1]. Most of the work by computer scientists has focused, instead, on predicting whether projects will be successfully funded or not. Mollick found that variables in two categories, preparedness (e.g., existence of a video, spelling check, number of updates) and social capital (e.g., number of the founder's Facebook friends), are strongly related to the success of a project [18]. Greenberg et al. found that an SVM could predict, at the time of launch, whether a project will fail or succeed with roughly 68% accuracy [12]. More recently, based only on the use of language in project descriptions, Gilbert et al. were able to predict failure or success: they found that there are specific phrases that are powerful predictors of success [11]. These phrases are mainly related to six general persuasion principles: 1) reciprocity, 2) scarcity, 3) social proof, 4) social identity, 5) liking, and 6) authority. After launch, one could also track features that change as the project evolves. In this vein, using the time series of money pledged and tweets, Etter et al. were able to predict success/failure with an accuracy of 85% at early stages, that is, just after 15% of the entire duration of a campaign had passed [9]. Hui et al. conducted a thorough qualitative analysis based on 45 interviews and found that the work behind setting up a crowdfunding project unfolds in five main steps: 1) prepare; 2) test; 3) publicize; 4) follow through; and 5) contribute. They then went on to recommend which tools computer scientists could build to support each of the steps [14].
The most difficult step was identified to be the third one: founders repeatedly failed to build a community and attract potential investors: "The majority of failed project creators cited the inability to successfully leverage an online audience as a main reason for failing" [14].

Based on this literature review, we might conclude that an automatic way of matching projects with investors is needed. To propose such a way, we carry out an analysis that unfolds in three steps: 1) derive a few hypotheses concerning pledging behavior; 2) collect data from Kickstarter and Twitter to test those hypotheses; and 3) based on the findings, propose and evaluate models that match projects with potential investors.

3. INVESTORS VS. DONORS

Pledging behavior might well differ from one investor to another. It has been found that 20-40% of initial fundings in Kickstarter come from family and friends [8]. These individuals tend to be newcomers or occasional investors who support projects because of their personal relationships with the founders. By contrast, users who are very active in Kickstarter are passionate about the community and fund a project for different reasons. To test the extent to which this distinction impacts pledging behavior, we differentiate investors depending on whether they are occasional (they have supported, say, fewer than 4 projects) or frequent (they have supported more than 32 projects), and formulate a set of hypotheses, which Table 1 collates for convenience. We expect that the more active a supporter has been, the more (s)he will behave as an investor, and the less as a friend donating money. More specifically, as opposed to occasional investors, frequent ones are expected to:

Pay Attention to Founder Skills.
Successful venture founders are good managers as well: "Many entrepreneurs make the mistake of thinking that venture capitalists are looking for good ideas when, in fact, they are looking for good managers in particular industry segments" [24]. We expect that frequent investors will pay attention to the way a project is managed. Since management of a Kickstarter project translates into frequent updates after launch and audience interactions, our first hypothesis is: [H1] A project is likely to be financed by frequent investors if its founder: [H1.1] frequently updates the project after launching it (i.e., (s)he spends extra effort to make it happen [19]); [H1.2] answers the potential investors' requests (i.e., (s)he interacts with the audience); [H1.3] allows for fine-grained funding levels; and [H1.4] sets up a dedicated web site.

Invest in "High Capital" Projects. Since Kickstarter follows an "all-or-nothing" model (i.e., projects that do not reach their pledging goals do not receive a penny), founders tend to set realistic goals for the amount to be raised. It is reasonable to assume that projects with high goals are ambitious (e.g., bringing a new video game to market) and thus tend to be preferentially financed by frequent and experienced investors [21]: [H2] A project with a high goal is likely to be financed by frequent investors.

Invest in Geographically Global Projects. Since friends and acquaintances tend to be geographically close, we expect that those who support (geographically) local projects are occasional investors, while frequent ones would also support global projects [1]. We define the geographic dispersion of project p as:

G_p = \frac{1}{N_p} \sum_{b \in B_p} D(l_f, l_b)    (1)

which measures the mean distance over all pledges of a project (the distance between project p's founder and each investor b). In Kickstarter, founders and investors tend to reveal their location at the level of city.
We thus convert city names into the geographic coordinates of the corresponding centroids (latitude and longitude) and measure the Haversine distance D between the founder's location (l_f) and each investor's location (l_b). N_p is the number of all investors for project p. With this definition, low geographic dispersion is associated with projects whose investors live close to the founder, while high dispersion is associated with investors who live far away. The corresponding hypothesis then reads: [H3] A local project is likely to be supported by occasional investors.

Pay Attention to Fast-Growing Projects. As opposed to occasional investors, frequent ones are familiar with the site and are thus expected to be able to quickly spot fast-growing opportunities [21, 22]: [H4] A fast-growing project is likely to be financed by frequent investors.

Table 1: List of hypotheses in this study.
[H1] A project is likely to be financed by frequent investors if its founder:
  [H1.1] frequently updates the project after launching it.
  [H1.2] answers the potential investors' requests.
  [H1.3] allows for fine-grained funding levels.
  [H1.4] sets up a dedicated web site.
[H2] A project with a high goal is likely to be financed by frequent investors.
[H3] A local project is likely to be supported by occasional investors.
[H4] A fast-growing project is likely to be financed by frequent investors.
[H5] Active investors tend to fund projects that match their own interests.

Figure 2: Frequency distribution of projects for each category (comics, design, fashion, theater, publishing, dance, photography, film, food, games, music, art, technology).

Invest in Project Categories of Interest. When deciding whether to fund a project or not, frequent investors might well be looking for projects that match their own interests, while occasional investors are less concerned with that as they, for example, tend to support a friend.
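As a concrete sketch, the geographic dispersion of Equation (1) can be computed from centroid coordinates with the Haversine distance. The function names and the mean Earth radius below are our own illustrative choices, not from the paper, which also leaves the city-to-centroid geocoding step to an external tool:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed; the paper does not state one)

def haversine(lat1, lon1, lat2, lon2):
    """Haversine distance D between two (latitude, longitude) points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def geographic_dispersion(founder_loc, investor_locs):
    """G_p: mean distance D(l_f, l_b) over all investors b of a project."""
    if not investor_locs:
        return 0.0
    total = sum(haversine(*founder_loc, *loc) for loc in investor_locs)
    return total / len(investor_locs)
```

A project whose investors all share the founder's city centroid gets G_p = 0, while geographically global projects get values in the hundreds or thousands of kilometers.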
[H5] Active investors tend to fund projects that match their own interests. To capture each investor's interests, we keep track of: 1) which project categories (s)he supports, and 2) the topics (s)he mentions on Twitter, classified using LDA topic modeling [4, 13, 23].

4. DATASET

Since Kickstarter is not accessible through an API, we built a crawler running on University servers, on which we run our data analysis as well. We gather all the projects featured on the Recently Launched[1] Kickstarter page between July 2013 and October 2013. Once a new campaign is identified, we crawl its category (e.g., Film, Dance, Art, Design, Technology), funding goal, and deadline. We then regularly check each project's page for any change in the amount of pledged money and the total number of investors. To have a comparable dataset of projects, we eliminated 345 Kickstarter projects that happened to be outside the USA. In so doing, we collected information about 1,149 projects that were funded by 78,460 investors with a total number of 177,882 pledges. These projects are classified into 13 categories, and the most popular ones are Film, Music, Publishing, and Art (Figure 2).

During the same period of time, we collected all tweets[2] containing the keyword "kickstarter" from the publicly available Twitter search API.

[1] http://www.kickstarter.com/discover/recently-launched
[2] A server located at the Computer Laboratory, University of Cambridge was used to collect Twitter data.

Table 2: Statistics for the Kickstarter dataset.
              Successful    Failed      Total
Projects      520           629         1,149
Proportion    45.3%         54.7%       100%
Investors     -             -           78,460
Pledges       148,257       29,625      177,882
Pledged ($)   10,517,919    1,872,741   12,390,660
Tweets        49,943        21,372      71,315

If a tweet matches one of our Kickstarter projects (i.e., if it contains the project title or a shortened URL directing to the project
page), we match the tweet's content with the project, resulting in a total of 71,315 tweets.

Table 2 reports general statistics of our Kickstarter dataset, and Table 3 reports statistics specifically about the projects. The numbers in Table 3 are averages over all projects. Across our 1,149 projects, $12.3M was pledged and 520 projects (45.3%) were successfully funded (i.e., they met their pledging goals). This success rate is similar to the general one published by Kickstarter itself[3]: 43.85%. On average, as opposed to unsuccessful projects, the ones that are successfully funded tend to have lower financial goals ($11,033 vs. $30,716), have more investors (285 vs. 47), raise more funds than their goals would require (169% vs. 19%), and generate more tweets (101 vs. 44) (Table 3). In our dataset, 85% of donations go into successful projects (this is 86% in Kickstarter). The average duration of a successful campaign is 28.9 days; however, it takes just a few days (13 days) to be fully financed. On the other hand, unsuccessful campaigns, which are 54.7% of the total, raise just 19.5% of the required investment (this was 20% in Kickstarter in 2012 [6]). Since the previous statistics in our dataset match those in the larger sample, we conclude that our dataset is fairly representative.

Table 3: Statistics for the Kickstarter projects. All reported numbers are averages over the Kickstarter projects in each group.
                      Successful   Failed      Total
Goal ($)              11,033.90    30,716.86   20,875.38
Duration (days)       28.56        29.25       28.91
Number of investors   285.11       47.09       166.10
Pledge ($)            79.71        60.13       68.99
Final amount          168.93%      19.51%      94.22%
Number of tweets      101.93       44.43       73.18

People who back a project for the first time often go on to back other projects. Among the 78,460 people who have backed one of the projects in our dataset, 22K (28%) have backed two or more projects (Kickstarter has reported 29% of all backers as repeat backers).
On average, investors in our dataset supported three projects.

[3] http://www.kickstarter.com/help/stats. All statistics reported were retrieved on 3rd November 2013.

Figure 3: Distribution of investor activity levels. These levels are quantified using the number of projects supported by each investor.

Table 4: Our predictive features: minimum, maximum, and mean. (The original table also shows the frequency distribution of each feature, with the x-axis reporting the values of the feature, log-transformed if skewed, and the y-axis the number of users.)
                        Min   Max    Mean
#updates                0     42     3.5
#comments               0     7298   22
Reward level            1     52     10
Web site                -     -      -
Goal ($)                47    3M     22K
Geographic dispersion   0     76     12
Growth rate             0     1.7    0.4

We segment investors into two groups: occasional investors, who funded fewer than 4 projects (51%), and frequent ones, who funded more than 32 projects (11%). Figure 3 displays the frequency distribution of their activity levels. We also display the distribution of each project feature in Table 4. Since the distributions of the features are skewed, we show their log-transformed distributions where necessary.

5. PLEDGING BEHAVIOR

Having this dataset at hand, we are now able to quantitatively analyze investors' pledging behavior. To do so, we will often resort to the probability that an investor of type B (e.g., occasional) will fund a project of type P (e.g., smart-watch projects):

p(B | P) = \frac{p(B \cap P)}{p(P)}    (2)

We compute this probability by counting the fraction of investors of type B who funded projects of type P (e.g., occasional investors who funded smart-watch projects) out of all investors who backed projects of type P (e.g., investors of any type who funded smart-watch projects).
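This counting procedure can be sketched in a few lines, assuming pledges are stored as (investor, project) pairs and investor types are derived from activity counts; the data layout and names below are our own, for illustration only:

```python
from collections import Counter

def pledge_probability(pledges, is_type_b, project_filter):
    """Estimate p(B | P): the fraction of investors of type B among all
    investors who backed a project satisfying `project_filter`.

    `pledges` is an iterable of (investor, project) pairs;
    `is_type_b` and `project_filter` are predicates.
    """
    backers = {inv for inv, proj in pledges if project_filter(proj)}
    if not backers:
        return 0.0
    return sum(1 for inv in backers if is_type_b(inv)) / len(backers)

# Toy example: p(occasional | art project), with "occasional" meaning fewer
# than 2 projects backed in this tiny dataset (the paper's threshold is 4).
pledges = [("ann", "game1"), ("bob", "game1"), ("ann", "art1"), ("cat", "art1")]
activity = Counter(inv for inv, _ in pledges)
occasional = lambda inv: activity[inv] < 2
p = pledge_probability(pledges, occasional, lambda proj: proj.startswith("art"))
```

Here the art project is backed by ann (2 pledges) and cat (1 pledge), so the estimated p(occasional | art) is 0.5.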
When testing our hypotheses, we will compute the probability of funding a project P for different investor types, where type is defined depending on the level of pledging activity: from occasional investors who supported fewer than four projects, through investors who supported fewer than 8, up to frequent investors who supported more than 32. One of the probabilities p(B | P) could then be p(occasional | P), the probability that an occasional investor funds project P. Having clarified our notation, we are now ready to test each of the five main hypotheses one by one.

[H1] A project is likely to be financed by frequent investors if the project founder: [H1.1] frequently updates the project after launching it; [H1.2] answers the potential investors' requests; [H1.3] allows for fine-grained funding levels; [H1.4] sets up a dedicated web site.

We find that frequent investors are more likely to pledge to projects with frequent updates (Figure 4(a)) and a higher level of founder engagement (number of comments) (Figure 4(b)). As the number of comments increases by a factor of two, the pledging probability increases by 10%. Funding levels (Figure 4(c)) and dedicated web sites do matter, but to a lesser extent than the previous two features. We also compute the Pearson correlation coefficients between each management strategy and investors' activity levels. These coefficients range from -1 (strongest negative correlation) to 1 (strongest positive correlation), and are 0 when there is no correlation. Frequent investors seem to decide whether to fund a project depending on the number of updates made by the founder (r = 0.26, p < 0.05) and on the number of comments the project has received (r = 0.19, p < 0.05). The presence of different reward levels (r = 0.05, p < 0.05) and of a dedicated web site (r = 0.10, p < 0.05) are considered by occasional and frequent investors alike.
Overall, the first hypothesis is supported.

[H2] A project with a high pledging goal is likely to be financed by frequent investors.

By dividing projects into 5 categories depending on their pledging goals, we find that the higher a project's goal, the less likely occasional investors are to support it. By contrast, frequent investors are more likely to fund high-goal projects (Figure 4(d)). The correlation between investor activity and the pledging goals of supported projects is indeed positive: r = 0.21, p < 0.05. Hence the second hypothesis is also confirmed. From the perspective of a recommender system that matches investors with projects, this result suggests that high-goal projects should be preferentially matched with frequent investors.

[H3] A local project is likely to be financed by occasional investors.

As mentioned in Section 3, we compute the average geographic span of a project's investors to measure the extent to which a project attracts local vs. global fundings. We then plot the pledging probability as a function of geographic span (Figure 4(e)) and find that occasional investors largely fund projects with low geographic span (i.e., local projects), while frequent investors fund projects with high span. The correlation coefficient between investor activity and geographic span is r = 0.32, p < 0.05, which supports the third hypothesis. For a recommender system, this result means that local projects should be matched with local Kickstarter users, who tend to be occasional investors.

[H4] A fast-growing project is likely to be financed by frequent investors.
Figure 4: Probability of investor B funding project P, as a function of (a) the number of updates made by the project's founder ([H1.1]); (b) the number of comments received by the project ([H1.2]); (c) the reward levels made available to investors ([H1.3]); (d) the project's pledging goal ([H2]); (e) the investors' geographic span ([H3]); and (f) the project's growth rate ([H4]). Each panel contrasts occasional and frequent investors.

We confirm this hypothesis as well, since we find that frequent investors do indeed tend to support high-growth projects (Figure 4(f)). By contrast, occasional investors do not select the projects to support depending on growth rate: they just happen to support the majority of projects, which are characterized by limited growth. The correlation between investor activity and project growth is positive: r = 0.17, p < 0.05.

[H5] Frequent investors tend to support projects that match their own interests.

To test this hypothesis, we consider investors who are on Twitter and crawl 200 tweets (at most) for each of them using the Twitter Public API.
To compute the topical similarity between a project's description and an investor's tweets, we run the topic model Latent Dirichlet Allocation (LDA) on the tweets and the project descriptions. As a result, each project is represented by a topic vector, and each investor's Twitter account is represented by another topic vector. To assess whether a project's description matches an investor's interests, we simply compute the cosine similarity between the project's topic vector and the investor's. We do so for all project-investor pairs and find that frequent investors do indeed support projects that match their own interests, while occasional investors' topical interests do not really match the projects they supported. We also find that frequent investors fund projects in a variety of categories, while occasional ones tend to stick with the same category, if they happen to fund more than one project. The correlation between investor activity and project-investor cosine similarity is positive: r = 0.20, p < 0.05. In practice, this suggests that topical matching between projects and investors tends to work better for frequent investors than for occasional ones.

5.1 Summary

We find that frequent investors are likely to fund projects that are well-managed; have high pledging goals; are global; grow quickly; and match their interests. Occasional investors, instead, do not seem to base their decisions on those aspects. We might thus infer that those who have supported a considerable number of projects act in ways similar to how investors would, while occasional supporters appear to behave as charitable donors. As hinted by a news article [8], we suspect that occasional investors are lured into Kickstarter by their own friends and family members who might happen to be on Facebook.
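Once LDA topic vectors are in hand (e.g., from a topic-modeling library; the inference itself is not reproduced here), the topical-matching step of Section 5 reduces to a cosine similarity between a project's vector and an investor's. A dependency-free sketch, with toy vectors of our own invention:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length topic vectors
    (defined as 0 when either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical 3-topic vectors: a games-heavy project compared against a
# games-interested investor and an art-interested investor.
project = [0.8, 0.1, 0.1]
gamer = [0.7, 0.2, 0.1]
artist = [0.1, 0.1, 0.8]
```

On these toy vectors, the games-interested investor scores far higher than the art-interested one, which is exactly the signal the topical-matching feature feeds into the recommender.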
We thus expect that, to be successful, projects funded by occasional investors should be characterized by a considerable number of Facebook friends. To see whether this is the case, we plot the probability that an investor supports a project as a function of the number of the project founder's Facebook friends (Figure 5), and indeed find that projects whose founders have many Facebook friends tend to attract occasional investors, while founders with moderate numbers of Facebook friends attract frequent investors, partly confirming our expectation.

Figure 5: Probability that investor B funds project P as a function of the number of the project founder's Facebook friends.

6. RECOMMENDING INVESTORS

Based on the previous results, we are now able to recommend potential investors for a specific project. We do so by using logistic regression (LR) and Support Vector Machines (SVM). We use three different SVM kernels: linear, polynomial, and RBF (Radial Basis Function). The latter is more flexible and general than the first two, as it copes with situations in which the relationships between features are non-linear. To recommend potential investors who are on Twitter, we first need to link Kickstarter users to their Twitter accounts. We do so by matching the names of Kickstarter users interested in a project with Twitter users mentioning the project. In so doing, we end up with 7,429 investors who are on Twitter, and with 891 projects they had funded. To preliminarily test the accuracy of such matching, we randomly select 200 matches and manually inspect them: the resulting accuracy is 92%.

6.1 Experimental Setup

We initially formulate the task of predicting who funds what as a binary classification problem.
For each project-investor pair, we predict whether the investor supports the project (prediction is 1) or not (prediction is 0). That translates into having, for each project, an unordered list of Twitter users who are likely to fund it.

We run those predictions on input features that are both static and dynamic. Static features are permanently set at the start of the campaign and include a project's pledging goal, reward levels, and category. They also include an investor's past supported project categories and his/her interests expressed on Twitter. Dynamic features, instead, change as the campaign unfolds and include pledge growth rate, number of project updates, geographic dispersion of investors, and the number of comments exchanged between the founder and the community.

Table 5: Pearson correlation coefficients between each pair of features. Coefficients greater than ±0.5 with statistical significance level < 0.05 are marked with a *.
                 #Updates  #Comments  Reward level  Goal   Growth rate  Geo-D
#Comments        0.67*
Reward level     0.12      0.03
Goal             0.60*     0.85*      0.19
Growth rate      0.34      0.12       0.33          0.11
Geo-D            0.12      0.21       0.16          0.23   0.13
Activity level   0.26      0.19       0.05          0.21   0.32         0.17

Table 6: Prediction results with the balanced test dataset (50/50 split).
Model       Features  ACC   P     R     F1    AUC
LR          Static    0.57  0.57  0.55  0.56  0.57
            Dynamic   0.57  0.58  0.55  0.56  0.57
SVM-linear  Static    0.58  0.60  0.51  0.55  0.58
            Dynamic   0.58  0.60  0.50  0.55  0.58
SVM-poly    Static    0.80  0.81  0.75  0.79  0.80
            Dynamic   0.68  0.76  0.54  0.63  0.68
SVM-RBF     Static    0.82  0.79  0.83  0.82  0.81
            Dynamic   0.73  0.75  0.68  0.71  0.73

Since our data only includes positive cases (the set of pledges that actually happened), we need to augment our dataset with negative cases: we do so by adding an equal number of randomly sampled project-investor pairs that did not result in a pledge.
By construction, the resulting sample is balanced (the response variable is split 50/50), and the accuracy of a random prediction model would thus be 50%.

To evaluate the performance of the logistic regression and SVM models without running into the problem of over-fitting, we perform 5-fold cross-validation. That is, we randomly split the projects into two subsets: the first contains 80% of the projects and is used for training; the second contains 20% of the projects and is used for testing. We repeat this split for 5 rounds and average the performance results across those rounds. As evaluation metrics, we resort to those widely used in classification problems: Accuracy (ACC), Precision, Recall, F-score, and Area Under the receiver-operating characteristic Curve (AUC).

6.2 Experimental Results

Before training any of the models, we compute the (Pearson) correlation coefficient between each pair of project features (Table 5). We find that a few features are correlated with each other (i.e., there are high positive correlations, where r > 0.50, between the pledging goal, the number of updates, and the number of comments). Since it is not useful to simultaneously use all the features in the classification task, the input for the LR will include only the features that are not strongly correlated with each other (i.e., among those three features, we only include the number of comments).

Figure 6: Prediction accuracy of the model trained with different subsets of features (C, CR, CRS, CRSG, CRSGE, ALL).

Prediction with the balanced dataset. Table 6 shows the results of our prediction models on the balanced dataset, on input of both static and dynamic features. We find that SVM with polynomial and RBF kernels works best, suggesting that our data points are not linearly separable in the feature space.
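The evaluation protocol (balancing by random negative sampling, then scoring predictions with ACC, Precision, Recall, and F1) can be sketched without any learning library; the classifier itself (e.g., an RBF-kernel SVM) would be plugged in between sampling and scoring, and the function names here are our own:

```python
import random

def negative_sample(pledges, investors, projects, seed=0):
    """Augment observed (investor, project) pledges with an equal number of
    random non-pledged pairs, yielding a balanced 50/50 labelled set."""
    rng = random.Random(seed)
    positives = set(pledges)
    seen = set(positives)
    negatives = []
    # The size guard stops the loop if the candidate space is exhausted.
    while len(negatives) < len(positives) and len(seen) < len(investors) * len(projects):
        pair = (rng.choice(investors), rng.choice(projects))
        if pair not in seen:
            seen.add(pair)
            negatives.append(pair)
    return [(p, 1) for p in positives] + [(p, 0) for p in negatives]

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, precision, recall, f1
```

On a set balanced this way, a classifier that guesses at random scores an accuracy of about 0.5, which is the baseline the text compares against.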
Interestingly, on input of static features, the best classifier achieves 82% accuracy (ACC); on input of dynamic features, the accuracy slightly degrades (73%); while, as one expects, on input of both types of feature, the accuracy slightly increases to 84%.

These predictions are made upon the complete set of uncorrelated features. However, to know which feature individually matters and which does not, we re-run our classifications on input of different combinations of the following features: number of comments (C), reward levels (R), geographic span (S), growth rate (G), category matching (E), and topic similarity (TS). Again, we exclude two features, pledging goal and number of updates, as they are both strongly correlated with the number of comments. Figure 6 shows the corresponding prediction accuracies: all features individually help to predict pledging behavior, but adding category matching and topical similarity results in considerable performance improvements. We can confirm that by visual inspection of Figure 7, which plots the probability that an investor with a given level of activity funds a project in a given category. We see that occasional investors (red bar segments) support music projects, while active investors (purple bar segments) support projects with high pledging goals in, say, the gaming industry.

Prediction with imbalanced dataset. Our evaluation has so far assumed a 50-50 split between positive and negative cases. As this might not always be the case, we create an alternative test set with a 20/80 split (positive/negative): we still train our models on the balanced set but test them on the newly created imbalanced test set. We find that the results are similar to those obtained before (Table 7), yet with a minor degradation in precision. However, the accuracy (ACC) still remains as high as 82%.

6.3 Ranking investors

To go beyond binary classifications, we now set out to rank investors.
So the problem is now, given a project, to return a ranked list of investors. As evaluation metrics, we resort to two measures widely used in ranking problems: MeanRR (Mean Reciprocal Rank) and MaxRR (Maximum Reciprocal Rank). We denote by rank_{i,P} the percentile-ranking of investor i within the ordered list of investors predicted for project P: rank_{i,P} = 0% if investor i is predicted to be the most desirable for project P.

  Model       Features  ACC   AUC
  LR          Static    0.56  0.57
              Dynamic   0.57  0.57
  SVM-linear  Static    0.60  0.58
              Dynamic   0.61  0.59
  SVM-poly    Static    0.81  0.80
              Dynamic   0.77  0.70
  SVM-RBF     Static    0.82  0.81
              Dynamic   0.74  0.73

Table 7: Prediction results with the imbalanced test set (20/80 split).

  Model     Features  MeanRR  MaxRR
  Random    -         0.50    0.87
  SVM-RBF   Static    0.34    0.39
            Dynamic   0.37    0.40
            All       0.32    0.38

Table 8: Ranking results.

Starting from this definition of rank, we can then formulate the total average percentile-ranking as:

    \overline{rank} = \frac{\sum_{i,P} funded_{i,P} \cdot rank_{i,P}}{\sum_{i,P} funded_{i,P}}    (3)

where funded_{i,P} is a flag that reflects whether investor i has supported project P: it is 0 if i did not support it; otherwise, it is 1. The lower a list's \overline{rank}, the better the list's quality. For random predictions, the expected value of \overline{rank} is 0.5; therefore, \overline{rank} < 0.5 indicates an algorithm better than random. Given a ranked list, MeanRR returns the average rank score of the "correct" investors (i.e., investors who actually supported the project), while MaxRR returns the score of the highest-ranked correct investor (i.e., the highest ranked among the investors who actually supported the project). The lower these two metrics, the better the ranking.

To rank investors, we opt for the model that previously showed the best performance: SVM with RBF kernel. This model returns the probability of the outcome being "1" (i.e., the probability that investor B will support project P).
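The percentile-ranking metrics defined above reduce to a few lines of code. This sketch assigns rank 0.0 to the top recommendation and 1.0 to the bottom one, and aggregates over the investors who actually funded the project; the toy data is hypothetical.

```python
def percentile_ranks(predicted_list):
    """Map each investor to a percentile rank in [0, 1]:
    0.0 for the top recommendation, 1.0 for the bottom one."""
    n = len(predicted_list)
    return {inv: pos / (n - 1) for pos, inv in enumerate(predicted_list)}

def mean_rr(predicted_list, funders):
    """Average percentile rank of the investors who actually pledged."""
    ranks = percentile_ranks(predicted_list)
    scores = [ranks[i] for i in funders]
    return sum(scores) / len(scores)

def max_rr(predicted_list, funders):
    """Percentile rank of the highest-ranked (lowest-percentile) funder."""
    ranks = percentile_ranks(predicted_list)
    return min(ranks[i] for i in funders)

# Toy example: five ranked investors, two of whom actually pledged.
ranked = ["u3", "u1", "u5", "u2", "u4"]
funders = {"u1", "u2"}
```

With these definitions, a uniformly random ranking yields an expected MeanRR of 0.5, matching the random baseline in the text.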
Upon our test set, for each project, we sort all users (the union of the training and test sets, 7,429 Twitter users in total) by this probability and recommend those who score highest. In this way, the approach is similar to content-based recommendation. Table 8 compares the SVM's ranking accuracy with the random model's. We can see that, based only on static features, we achieve a percentile ranking of 0.34, which is lower (i.e., better) than the random model's 0.50.

7. DISCUSSION

We now discuss the theoretical and practical implications of our work.

7.1 Theoretical Implications

There is a debate about the motives of crowdfunding investors. Initially, they were seen as donors [15, 20]: "Some crowdfunding efforts, such as art or humanitarian projects, view their funders as patrons or philanthropists, who expect nothing in return." [3]. In the same vein, Gerber et al. listed the following motivations for investors: seeking (non-financial) rewards, supporting creators and causes, and engaging with and contributing to a trusting and creative community [10].

Figure 7: Probability p(B, P) that investor B (with a given level of activity) funds project P, across different project categories (comics, design, fashion, theater, publishing, dance, photography, film & video, food, games, music, art, technology). Investor activity is binned as <4, [4-8), [8-16), [16-32), and 32+ supported projects.

More recently, however, crowdfunding sites have been increasingly attracting a variety of founders: from small entrepreneurs who traditionally relied on the 3Fs (friends, family, and fools [2]) to big companies that now use those sites as marketing tools. In line with these changes over the years, we have found that investors also tend to be of different types: the pledging behavior of frequent investors is very different from that of occasional ones. The former act as proper investors, while the latter act as donors.
They generally support different projects: art projects (e.g., music, dance) are largely funded by occasional investors, while projects on technology, games, and comics are funded by frequent ones. This suggests that pledging campaigns need to identify the right target investors to be successful. Artistic projects should rely on the traditional 3Fs (friends, family, and fools), perhaps employing social media sites to efficiently reach them. By contrast, technology projects should broaden their search and look for active and frequent investors.

7.2 Practical Implications

The good news is that we have shown that it is possible to identify and recommend frequent investors. However, it might not be sustainable to recommend investors drawn only from Kickstarter users: such an investor pool would be limited, and we could consequently end up recommending the same investors over and over again. To see whether we could expand the investor pool, it might be beneficial to study whether we could match unknown investors (who are not on Kickstarter but only on Twitter) with projects to be funded. To this end, we combine both static and dynamic project features (Section 6.1) with Twitter-derived features to test whether we can predict potential investors for each project. The Twitter-derived features are widely used to measure activity, status [16], and influence [5], and they are three: 1) the logarithm of the total number of tweets (activity); 2) the logarithm of the number of followers divided by the number of followees (status); and 3) the sum of the average numbers of retweets, favorites, and mentions of the account's tweets (influence). Using cross validation on our data in a way similar to what we have already done in Section 6.1, we train the SVM-RBF model, which previously showed the best performance, solely on project and Twitter-derived features.
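The three Twitter-derived features can be computed directly from an account's profile counts and per-tweet averages. A sketch follows; the field names of the `account` record are our assumption, not an API the paper specifies.

```python
from math import log

def twitter_features(account):
    """Activity, status, and influence scores as described in Section 7.2.
    `account` is a dict of tweet/follower counts and per-tweet averages
    (hypothetical field names)."""
    # Activity: log of the total number of tweets.
    activity = log(account["n_tweets"])
    # Status: log of the follower-to-followee ratio.
    status = log(account["n_followers"] / account["n_followees"])
    # Influence: avg retweets + avg favorites + avg mentions per tweet.
    influence = (account["avg_retweets"] + account["avg_favorites"]
                 + account["avg_mentions"])
    return {"activity": activity, "status": status, "influence": influence}

acct = {"n_tweets": 1000, "n_followers": 5000, "n_followees": 500,
        "avg_retweets": 1.2, "avg_favorites": 0.8, "avg_mentions": 0.5}
feats = twitter_features(acct)
```

The logarithms damp the heavy-tailed distributions of tweet and follower counts, so a celebrity account does not dominate the feature scale.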
We learn that this model achieves 68% accuracy (ACC in Table 9) and an average percentile ranking of 0.4 (Table 10), making it partly possible to recommend investors in cold-start situations and, as such, considerably expanding the investor pool.

  Model    Features  ACC   P     R     F1    AUC
  SVM-RBF  Static    0.68  0.71  0.61  0.66  0.68
           Dynamic   0.67  0.72  0.58  0.64  0.67

Table 9: Prediction results for Twitter+Project features.

  Model    Features  MeanRR  MaxRR
  Random   -         0.50    0.87
  SVM-RBF  Static    0.44    0.47
           Dynamic   0.44    0.46
           All       0.40    0.41

Table 10: Ranking results for Twitter+Project features.

8. CONCLUSION

Every day there are, on average, 39 new projects on Kickstarter: not only artists and entrepreneurs are profiting from this new way of raising funds; city councils and political organizations have also joined the fray. This is the first study to characterize the pledging behavior of micro-funders. We have established that investors behave quite differently depending on whether they are very active in the community or not. Frequent investors are attracted by ambitious projects, yet they carefully diversify their investment portfolios. By contrast, occasional investors act as donors, mainly in art-related projects. We have also shown that it is possible to match new projects with willing investors, and that is extremely important, not least because the most common reason for failure on Kickstarter is the inability of founders to reach out to the right investors.

We are currently working on a website that, on input of a Kickstarter project's URL, will recommend a list of potential investors' Twitter accounts. We will then test the extent to which Kickstarter founders find this application useful. As for additional analysis, we are planning to look at exogenous factors, as this study has focused mainly on endogenous ones.

9. ACKNOWLEDGMENT

Jisun An is supported in part by the Google European Doctoral Fellowship in Social Computing.
We thank Nicola Barbieri, Martin Saveski, Alan Said, and Haewoon Kwak for their valuable comments on earlier versions of the draft. This work was done during an internship at Yahoo Labs in Barcelona.

10. REFERENCES

[1] A. K. Agrawal, C. Catalini, and A. Goldfarb. The geography of crowdfunding. NBER Working Papers 16820, National Bureau of Economic Research, Inc, 2011.
[2] P. Belleflamme, T. Lambert, and A. Schwienbacher. Crowdfunding: An industrial organization perspective. In Proceedings of the Workshop "Digital Business Models: Understanding Strategies", 2010.
[3] P. Belleflamme, T. Lambert, and A. Schwienbacher. Crowdfunding: Tapping the right crowd. CORE Discussion Papers 2011032, 2011.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
[5] M. Cha, H. Haddadi, F. Benevenuto, and K. Gummadi. Measuring user influence in Twitter: The million follower fallacy. In Proceedings of ICWSM, 2010.
[6] Economist. Crowdfunding: Micro no more. 2012. http://www.economist.com/blogs/babbage/2012/01/crowdfunding.
[7] Economist. Is it unfair for famous people to use Kickstarter? 2012. http://www.economist.com/blogs/economist-explains/2013/05/economist-explains-unfair-fair-famous-people-kickstarter.
[8] Economist. The new thundering herd. 2012. http://www.economist.com/node/21556973.
[9] V. Etter, M. Grossglauser, and P. Thiran. Launch hard or go home! Predicting the success of Kickstarter campaigns. In Proceedings of COSN, 2013.
[10] E. Gerber, J. Hui, and P.-Y. Kuo. Crowdfunding: Why people are motivated to post and fund projects on crowdfunding platforms. In Proceedings of the Workshop on Design, Influence, and Social Technologies: Techniques, Impacts and Ethics, 2012.
[11] E. Gilbert. The language that gets people to give: Phrases that predict success on Kickstarter. In Proceedings of CSCW, 2014.
[12] M. D. Greenberg, B. Pardo, K. Hariharan, and E. Gerber. Crowdfunding support tools: Predicting success & failure. In Proceedings of CHI (Extended Abstracts), 2013.
[13] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101, 2004.
[14] J. Hui, M. Greenberg, and E. Gerber. Understanding the role of community in crowdfunding work. In Proceedings of CSCW, 2014.
[15] O. M. Lehner. Crowdfunding social ventures: A model and research agenda. Venture Capital, 15(4):289-311, 2013.
[16] J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In Proceedings of WWW, 2010.
[17] Massolution. 2013CF Crowdfunding Industry Report. 2013. http://research.crowdsourcing.org/2013cf-crowdfunding-industry-report.
[18] E. R. Mollick. The dynamics of crowdfunding: Determinants of success and failure. Social Science Research Network Working Paper Series, 2012.
[19] J. W. Mullins. Good money after bad? Harvard Business Review, Mar. 2007.
[20] A. Ordanini, L. Miceli, M. Pizzetti, and A. Parasuraman. Crowd-funding: Transforming customers into investors through innovative service platforms. Journal of Service Management, 22(4):443-470, 2011.
[21] W. A. Sahlman. How to write a great business plan. Harvard Business Review, July 1997.
[22] J. M. Stancill. Realistic criteria for judging new ventures. Harvard Business Review, Nov. 1981.
[23] M. Steyvers and T. L. Griffiths. Probabilistic topic models. Handbook of Latent Semantic Analysis, 427(7):424-440, 2007.
[24] B. Zider. How venture capital works. Harvard Business Review, Nov. 1998.