Decision Support for Increasing the Efficiency of Crowdsourced Software Development

📝 Abstract

Crowdsourced software development (CSD) offers a series of specified tasks to a large crowd of trustworthy software workers. Topcoder is a leading platform for managing the whole CSD process. Although CSD is increasingly accepted as a realistic option for software development, a preliminary analysis of the behavior of Topcoder’s crowd workers reveals an alarming task-quitting rate of 82.9%. In addition, a substantial number of tasks do not receive any successful submission. In this paper, we report on a methodology to improve the efficiency of CSD. We apply massive data analytics and machine learning to (i) perform a comparative analysis of alternative techniques for predicting the likelihood of winners and quitters for each task, (ii) significantly reduce the amount of non-succeeding development effort spent on registered but inappropriate tasks, (iii) identify and rank the most qualified registered workers for each task, and (iv) reliably predict tasks at risk of receiving no successful submission. Our results and analysis show that the Random Forest (RF) based predictive technique performs best among the alternative techniques studied. Applying RF, the tasks recommended to workers can reduce the amount of non-succeeding development effort to a great extent. On average, over a period of 30 days, the savings are 3.5 and 4.6 person-days per registered task for experienced and inexperienced workers, respectively. For the task-related recommendations of workers, we can accurately recommend at least one actual winner among the top-ranked workers; in particular, 94.07% of the time among the top-2 recommended workers for each task. Finally, we can predict, with more than 80% F-measure, the tasks likely to receive no submission, thus triggering timely corrective actions from CSD platforms or task requesters.
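
The two evaluation measures quoted in the abstract, the top-k hit rate for worker recommendation and the F-measure for predicting tasks without submissions, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy worker and task identifiers, and the probability values are all hypothetical, and the predicted win probabilities are assumed to come from some already trained classifier (the paper reports Random Forest as the best performer).

```python
from collections import defaultdict


def top_k_hit_rate(scores, winners, k=2):
    """Fraction of tasks with at least one actual winner among the
    k registrants that received the highest predicted win probability.

    scores:  iterable of (task_id, worker_id, predicted_win_probability)
    winners: dict mapping task_id -> set of worker_ids who actually won
    """
    by_task = defaultdict(list)
    for task_id, worker_id, prob in scores:
        by_task[task_id].append((prob, worker_id))

    hits = 0
    for task_id, ranked in by_task.items():
        # Rank registrants of this task by predicted probability, best first.
        ranked.sort(key=lambda pair: pair[0], reverse=True)
        top_k = {worker_id for _, worker_id in ranked[:k]}
        if top_k & winners.get(task_id, set()):
            hits += 1
    return hits / len(by_task) if by_task else 0.0


def f_measure(actual, predicted):
    """F1 score for the positive class (True = task gets no submission)."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two tasks, three registrants each.
scores = [
    ("t1", "alice", 0.70), ("t1", "bob", 0.20), ("t1", "carol", 0.10),
    ("t2", "alice", 0.30), ("t2", "dave", 0.25), ("t2", "erin", 0.45),
]
winners = {"t1": {"bob"}, "t2": {"dave"}}

hit_rate_top2 = top_k_hit_rate(scores, winners, k=2)
print(f"top-2 hit rate: {hit_rate_top2:.2f}")
```

In the toy data, the winner of task t1 is ranked second and counts as a top-2 hit, while the winner of task t2 is ranked third and does not. The abstract's 94.07% figure is the same kind of ratio computed over the authors' real Topcoder tasks.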

📄 Content

Decision Support for Increasing the Efficiency of Crowdsourced Software Development

Muhammad Rezaul Karim, University of Calgary, 2500 University Drive NW, Calgary, Alberta T2N 1N4, +1 (403) 220 7692, mrkarim@ucalgary.ca
David Messinger, Topcoder, 425 Market Street, San Francisco, CA 94105, USA, +1 (978) 590-3344, dmessinger@topcoder.com
Ye Yang, Stevens Inst. of Technology, 1 Castle Point Ter, Hoboken, NJ 07030, USA, +1 (201) 216-8560, ye.yang@stevens.edu
Guenther Ruhe, University of Calgary, 2500 University Drive NW, Calgary, Alberta T2N 1N4, +1 (403) 220 7692, ruhe@ucalgary.ca

CCS Concepts • Software and its engineering → Software development process management • Software and its engineering → Programming teams • Information systems → Data analytics

Keywords Crowdsourced software development; predictive analytics; industrial case study; machine learning; random forest; Topcoder.

1. INTRODUCTION AND BACKGROUND

The most expensive part of software development is people. Moreover, the most valuable asset of a company is its human resources. Treating people accordingly and organizing their work efficiently is critical for project success. Crowdsourced software development (CSD) aims at higher efficiency by leveraging a large crowd of trustworthy software workers who register for, and submit to, the tasks that interest them in exchange for financial gain [3]. A general CSD process starts with task-requesting companies distributing tasks with prizes online. Crowd software workers then browse and register for selected tasks and submit their work products upon completion. Crowd submissions are evaluated by experts and experienced developers through a peer review process that checks code quality and/or document quality [1, 2]. The number of submissions and their evaluation scores reflect the level of success in task satisfaction or completion [3].

As one of the most successful CSD platforms, Topcoder has over 1 million registered workers from over 190 countries, on average 80K logins every 90 days, 7K challenges hosted per year, and $80M in challenge payouts. Its crowd comprises almost 5 times as many engineers as Microsoft, Facebook, and Twitter combined. However, utilizing unknown, external developers raises new issues related to worker identification and trust management. For example, an analysis of Topcoder data from 2014-2015 shows a worker quitting rate of 82.9%, an average of 55.8% of submissions not passing review, and a task cancellation rate of 15.7% [3]. In his keynote at the 3rd Workshop on Crowdsourcing in Software Engineering, Messinger recognized “trust and transparency” as one of three key elements of good CSD [4]. Accurate and timely analytics to support trust and transparency are critical for measuring and predicting worker reliability, process stability, and product quality in the CSD context.

To that end, existing studies have focused on decision support for the software crowdsourcing market. Among them, most focused on supporting decision making from the perspectives of task requeste

This content is AI-processed based on ArXiv data.
