Decision Support for Increasing the Efficiency of Crowdsourced Software Development
📝 Abstract
Crowdsourced software development (CSD) offers a series of specified tasks to a large crowd of trustworthy software workers. Topcoder is a leading platform to manage the whole process of CSD. While increasingly accepted as a realistic option for software development, preliminary analysis on Topcoder’s software crowd worker behaviors reveals an alarming task-quitting rate of 82.9%. In addition, a substantial number of tasks do not receive any successful submission. In this paper, we report on a methodology to improve the efficiency of CSD. We apply massive data analytics and machine learning to (i) compare alternative techniques for predicting the likelihood of winners and quitters for each task, (ii) significantly reduce the amount of non-succeeding development effort in registered but inappropriate tasks, (iii) identify and rank the most qualified registered workers for each task, and (iv) provide reliable prediction of tasks at risk of receiving no successful submission. Our results and analysis show that the Random Forest (RF) based predictive technique performs best among the alternative techniques studied. Applying RF, the tasks recommended to workers can reduce the amount of non-succeeding development effort to a great extent. On average, over a period of 30 days, the savings are 3.5 and 4.6 person-days per registered task for experienced and inexperienced workers, respectively. For the task-related recommendations of workers, we can accurately recommend at least 1 actual winner among the top-ranked workers, in particular 94.07% of the time among the top-2 recommended workers for each task. Finally, we can predict, with more than 80% F-measure, the tasks likely to receive no submission, thus triggering timely corrective actions from CSD platforms or task requesters.
📄 Content
Decision Support for Increasing the Efficiency of
Crowdsourced Software Development
Muhammad Rezaul Karim
University of Calgary
2500 University Drive NW
Calgary, Alberta T2N 1N4
+1 (403) 220 7692
mrkarim@ucalgary.ca
David Messinger
Topcoder
425 Market Street
San Francisco, 94105, CA
USA
+1 (978) 590-3344
dmessinger@topcoder.com
Ye Yang
Stevens Inst. of Technology
1 Castle Point Ter
Hoboken, NJ 07030, USA
+1(201)216-8560
ye.yang@stevens.edu
Guenther Ruhe
University of Calgary
2500 University Drive NW
Calgary, Alberta T2N 1N4
+1 (403) 220 7692
ruhe@ucalgary.ca
ABSTRACT
Crowdsourced software development (CSD) offers a series of
specified tasks to a large crowd of trustworthy software workers.
Topcoder is a leading platform to manage the whole process of
CSD. While increasingly accepted as a realistic option for software
development, preliminary analysis on Topcoder’s software crowd
worker behaviors reveals an alarming task-quitting rate of 82.9%.
In addition, a substantial number of tasks do not receive any
successful submission.
In this paper, we report on a methodology to improve the
efficiency of CSD. We apply massive data analytics and machine
learning to (i) compare alternative techniques for predicting the
likelihood of winners and quitters for each task, (ii) significantly
reduce the amount of non-succeeding development effort in
registered but inappropriate tasks, (iii) identify and rank the most
qualified registered workers for each task, and (iv) provide reliable
prediction of tasks at risk of receiving no successful submission.
Our results and analysis show that the Random Forest (RF) based
predictive technique performs best among the alternative
techniques studied. Applying RF, the tasks recommended to
workers can reduce the amount of non-succeeding development
effort to a great extent. On average, over a period of 30 days, the
savings are 3.5 and 4.6 person-days per registered task for
experienced and inexperienced workers, respectively. For the
task-related recommendations of workers, we can accurately
recommend at least 1 actual winner among the top-ranked workers,
in particular 94.07% of the time among the top-2 recommended
workers for each task. Finally, we can predict, with more than 80%
F-measure, the tasks likely to receive no submission, thus triggering
timely corrective actions from CSD platforms or task requesters.
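The two evaluation figures quoted in the abstract (the share of tasks with an actual winner among the top-2 recommended workers, and the F-measure for predicting tasks without submissions) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the dictionary layout, and the toy data are hypothetical, the per-worker scores stand in for whatever a trained classifier would output, and the F-measure is assumed to be the balanced F1 form, which the text does not state explicitly.

```python
from typing import Dict, List, Set


def top_k_hit_rate(
    scores: Dict[str, Dict[str, float]],
    winners: Dict[str, Set[str]],
    k: int = 2,
) -> float:
    """Fraction of tasks with at least one actual winner among the
    k highest-scored registered workers (assumed reading of 'top-k')."""
    hits = 0
    for task_id, worker_scores in scores.items():
        # Rank registered workers by predicted winning score, best first.
        ranked = sorted(worker_scores, key=worker_scores.get, reverse=True)
        if winners.get(task_id, set()) & set(ranked[:k]):
            hits += 1
    return hits / len(scores) if scores else 0.0


def f_measure(predicted: List[bool], actual: List[bool]) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two tasks, three registered workers each.
example_scores = {
    "task-1": {"alice": 0.9, "bob": 0.4, "carol": 0.1},
    "task-2": {"alice": 0.2, "bob": 0.3, "carol": 0.8},
}
example_winners = {"task-1": {"bob"}, "task-2": {"alice"}}

hit_rate = top_k_hit_rate(example_scores, example_winners, k=2)

# True means "task is predicted (or observed) to receive no submission".
f1 = f_measure([True, True, False, False], [True, False, True, False])
```

In this toy data, the winner of task-1 is among its top-2 ranked workers while the winner of task-2 is not, so only one of the two tasks counts as a hit.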
CCS Concepts
• Software and its engineering → Software development
process management • Software and its engineering →
Programming teams • Information systems → Data analytics
Keywords Crowdsourced software development; predictive analytics; industrial case study; machine learning; random forest; Topcoder.
1. INTRODUCTION AND BACKGROUND
The most expensive part of software development is people.
Moreover, the most valuable asset of a company is its people.
Treating them accordingly and organizing their work in an efficient
manner is critical for project success. Crowdsourced software
development (CSD) aims at higher efficiency by leveraging a large
crowd of trustworthy software workers who register for and submit
to tasks that interest them in exchange for financial gain [3]. A
general CSD process starts with task-requesting companies
distributing tasks with prizes online. Crowd software workers then
browse and register to work on selected tasks, and submit their
work products upon completion. Crowd submissions are evaluated
by experts and experienced developers, through a peer review
process, to check the code quality and/or document quality [1, 2].
The number of submissions and their evaluated scores reflect the
level of success in task satisfaction or completion [3].
As one of the most successful CSD platforms, Topcoder has over 1
million registered workers from over 190 countries, an average of
80K logins every 90 days, 7K challenges hosted per year, and $80M
in challenge payouts. Its crowd comprises almost 5 times as many
engineers as Microsoft, Facebook, and Twitter combined.
However, utilizing unknown, external developers raises new issues
related to worker identification and trust management. For example,
an analysis of Topcoder data from 2014-2015 shows a worker
quitting rate of 82.9%, an average of 55.8% of submissions not
passing review, and a task cancellation rate of 15.7% [3]. In his
keynote at the 3rd Workshop on Crowdsourcing in Software
Engineering, Messinger recognized “trust and transparency” as one
of three key elements of good CSD [4]. Accurate and timely
analytics to support trust and transparency are critical for
measuring and predicting worker reliability, process stability, and
product quality in the CSD context.
To that end, existing studies have focused on decision support for the software crowdsourcing market. Among them, most have focused on supporting decision making from the perspectives of task requeste