An Empirical Study of C++ Vulnerabilities in Crowd-Sourced Code Examples


📝 Original Info

  • Title: An Empirical Study of C++ Vulnerabilities in Crowd-Sourced Code Examples
  • ArXiv ID: 1910.01321
  • Date: 2021-01-21
  • Authors: Morteza Verdi, Ashkan Sami, Jafar Akhondali, Foutse Khomh, Gias Uddin, Alireza Karami Motlagh

📝 Abstract

Software developers share programming solutions on Q&A sites like Stack Overflow. The reuse of crowd-sourced code snippets can facilitate rapid prototyping. However, recent research shows that the shared code snippets may be of low quality and can even contain vulnerabilities. This paper aims to understand the nature and the prevalence of security vulnerabilities in crowd-sourced code examples. To achieve this goal, we investigate security vulnerabilities in the C++ code snippets shared on Stack Overflow over a period of 10 years. In collaborative sessions involving multiple human coders, we manually assessed each code snippet for security vulnerabilities following CWE (Common Weakness Enumeration) guidelines. From the 72,483 reviewed code snippets used in at least one project hosted on GitHub, we found a total of 69 vulnerable code snippets categorized into 29 types. Many of the investigated code snippets are still not corrected on Stack Overflow. The 69 vulnerable code snippets found in Stack Overflow were reused in a total of 2859 GitHub projects. To help improve the quality of code snippets shared on Stack Overflow, we developed a browser extension that allows Stack Overflow users to check for vulnerabilities in code snippets when they upload them on the platform.

📄 Full Content

An Empirical Study of C++ Vulnerabilities in Crowd-Sourced Code Examples

Morteza Verdi, Ashkan Sami, Jafar Akhondali, Foutse Khomh, Gias Uddin, and Alireza Karami Motlagh

Abstract—Software developers share programming solutions on Q&A sites like Stack Overflow, Stack Exchange, Android forum, and so on. The reuse of crowd-sourced code snippets can facilitate rapid prototyping. However, recent research shows that the shared code snippets may be of low quality and can even contain vulnerabilities. This paper aims to understand the nature and the prevalence of security vulnerabilities in crowd-sourced code examples. To achieve this goal, we investigate security vulnerabilities in the C++ code snippets shared on Stack Overflow over a period of 10 years. In collaborative sessions involving multiple human coders, we manually assessed each code snippet for security vulnerabilities following CWE (Common Weakness Enumeration) guidelines. From the 72,483 reviewed code snippets used in at least one project hosted on GitHub, we found a total of 99 vulnerable code snippets categorized into 31 types. Many of the investigated code snippets are still not corrected on Stack Overflow. The 99 vulnerable code snippets found in Stack Overflow were reused in a total of 2859 GitHub projects. To help improve the quality of code snippets shared on Stack Overflow, we developed a browser extension that notifies Stack Overflow users of vulnerabilities in code snippets when they see them on the platform.

Index Terms—Stack Overflow, Software Security, C++, SOTorrent, Vulnerability Migration, GitHub, Vulnerability Evolution

1 INTRODUCTION

A major goal of software development is to deliver high-quality software in a timely and cost-efficient manner. Code reuse is an accepted practice and an essential approach to achieving this goal [1]. The reused code snippets come from many different sources and in different forms, e.g., third-party libraries [2], open source software [3], and Question and Answer (Q&A) websites such as Stack Overflow [4], [5]. Sharing code snippets and code examples is also a common learning practice [6]. Novices and even more senior developers leverage code examples and explanations shared on platforms like Stack Overflow to learn how to perform new programming tasks or use certain APIs [1], [7], [8], [9]. Multiple studies [10], [11], [12] have investigated knowledge flow and knowledge sharing from Stack Overflow answers to repositories of open source software hosted on GitHub. They report that code snippets found on Stack Overflow can be toxic, i.e., of poor quality, and can potentially lead to license violations [12], [13]. An important aspect of quality that has not been investigated in detail by the research community is security. If vulnerable code snippets migrate from Stack Overflow to applications, these applications will be prone to attacks.

M. Verdi is with Shiraz University, Iran. E-mail: m.verdi@shirazu.ac.ir

A. Sami (Corresponding Author) is with Shiraz University, Iran. E-mail: sami@shirazu.ac.ir

J. Akhondali is with Shiraz University, Iran. E-mail: jafar.akhondali@yahoo.com

F. Khomh is with Polytechnique Montreal, Quebec, Canada. E-mail: foutse.khomh@polymtl.ca

G. Uddin is with University of Calgary, Alberta, Canada. E-mail: gias.uddin@ucalgary.ca

A. Karami Motlagh is with Shiraz University, Iran. E-mail: alireza.karami.m@gmail.com

The danger of copy-pasting insecure code from Stack Overflow was recently raised by Fischer et al. [8], who found that vulnerable Android code snippets from Stack Overflow are reused in popular Android apps. We are, however, aware of no previous study that specifically focused on the vulnerability of C++ code snippets shared on Stack Overflow and on whether and how such vulnerable code snippets may have migrated to open source software repositories on GitHub. This insight is important because such vulnerable software repositories can then be reused by other software repositories, which, given the popularity of GitHub, is entirely possible. C++ is the fourth most popular programming language [14]. It is the language of choice for embedded, resource-constrained programs, and it is extensively used in large and distributed systems. Vulnerabilities in C++ code snippets are therefore likely to have a major impact. However, to the best of our knowledge, no study has examined the security aspects of C++ Stack Overflow code snippets and their impact on open source software projects. This paper aims to fill this gap in the literature. More specifically, we aim to understand the nature and the prevalence of security vulnerabilities in code examples shared on Stack Overflow. To achieve this goal, we empirically study C++ vulnerabilities in code examples shared on Stack Overflow along the following two dimensions.

Prevalence: We revi

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
