Software developers share programming solutions on Q&A sites like Stack Overflow. The reuse of crowd-sourced code snippets can facilitate rapid prototyping. However, recent research shows that the shared code snippets may be of low quality and can even contain vulnerabilities. This paper aims to understand the nature and the prevalence of security vulnerabilities in crowd-sourced code examples. To achieve this goal, we investigate security vulnerabilities in the C++ code snippets shared on Stack Overflow over a period of 10 years. In collaborative sessions involving multiple human coders, we manually assessed each code snippet for security vulnerabilities following CWE (Common Weakness Enumeration) guidelines. From the 72,483 reviewed code snippets used in at least one project hosted on GitHub, we found a total of 69 vulnerable code snippets categorized into 29 types. Many of the investigated code snippets are still not corrected on Stack Overflow. The 69 vulnerable code snippets found on Stack Overflow were reused in a total of 2859 GitHub projects. To help improve the quality of code snippets shared on Stack Overflow, we developed a browser extension that allows Stack Overflow users to check for vulnerabilities in code snippets when they upload them to the platform.
An Empirical Study of C++ Vulnerabilities in
Crowd-Sourced Code Examples
Morteza Verdi, Ashkan Sami, Jafar Akhondali, Foutse Khomh, Gias Uddin, and Alireza Karami Motlagh
Abstract—Software developers share programming solutions in Q&A sites like Stack Overflow, Stack Exchange, Android forum, and
so on. The reuse of crowd-sourced code snippets can facilitate rapid prototyping. However, recent research shows that the shared
code snippets may be of low quality and can even contain vulnerabilities. This paper aims to understand the nature and the
prevalence of security vulnerabilities in crowd-sourced code examples. To achieve this goal, we investigate security vulnerabilities in
the C++ code snippets shared on Stack Overflow over a period of 10 years. In collaborative sessions involving multiple human coders,
we manually assessed each code snippet for security vulnerabilities following CWE (Common Weakness Enumeration) guidelines.
From the 72,483 reviewed code snippets used in at least one project hosted on GitHub, we found a total of 99 vulnerable code
snippets categorized into 31 types. Many of the investigated code snippets are still not corrected on Stack Overflow. The 99
vulnerable code snippets found in Stack Overflow were reused in a total of 2859 GitHub projects. To help improve the quality of code
snippets shared on Stack Overflow, we developed a browser extension that allows Stack Overflow users to be notified of
vulnerabilities in code snippets when they see them on the platform.
Index Terms—Stack Overflow, Software Security, C++, SOTorrent, Vulnerability Migration, GitHub, Vulnerability Evolution
1 INTRODUCTION
A major goal of software development is to deliver high-quality
software in a timely and cost-efficient manner. Code reuse is an
accepted practice and an essential approach to achieving this
goal [1]. The reused code snippets come from many different
sources and in different forms, e.g., third-party libraries [2],
open source software [3], and Question and Answer (Q&A) websites
such as Stack Overflow [4], [5]. Sharing code snippets and code
examples is also a common learning practice [6]. Novices and even
more senior developers leverage code examples and explanations
shared on platforms like Stack Overflow to learn how to perform
new programming tasks or use certain APIs [1], [7], [8], [9].
Multiple studies [10], [11], [12] have investigated knowledge flow
and knowledge sharing from Stack Overflow answers to repositories
of open source software hosted on GitHub. They report that code
snippets found on Stack Overflow can be toxic, i.e., of poor
quality, and can potentially lead to license violations [12],
[13]. An important aspect of quality that has not been
investigated in detail by the research community is security. If
vulnerable code snippets are migrated from Stack Overflow to
applications, these applications will be prone to attacks.
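To make the notion of a vulnerable snippet concrete, the following minimal C++ sketch contrasts an unchecked element access with a bounds-checked variant. It is an illustration only: it is not drawn from the snippets studied in this paper, the function names `unchecked_char_at` and `checked_char_at` are hypothetical, and the weakness shown (an out-of-bounds read, CWE-125) is simply one well-known CWE category chosen as an assumption for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical snippet in the style of a short forum answer.
// It performs no bounds check, so an index >= data.size() reads
// past the end of the buffer (undefined behavior, CWE-125).
inline char unchecked_char_at(const std::vector<char>& data, std::size_t index) {
    return data[index];  // vulnerable when index is out of range
}

// Hardened variant (also hypothetical): validate the index first and
// report failure to the caller instead of reading out of bounds.
inline bool checked_char_at(const std::vector<char>& data, std::size_t index, char& out) {
    if (index >= data.size()) {
        return false;  // reject out-of-range access; out is left unchanged
    }
    out = data[index];
    return true;
}
```

A project that copies the first function verbatim inherits the weakness whenever the index comes from untrusted input; the second function shows one possible repair, not the only one.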
M. Verdi is with Shiraz University, Iran. E-mail: m.verdi@shirazu.ac.ir
A. Sami (Corresponding Author) is with Shiraz University, Iran. E-mail: sami@shirazu.ac.ir
J. Akhondali is with Shiraz University, Iran. E-mail: jafar.akhondali@yahoo.com
F. Khomh is with Polytechnique Montreal, Quebec, Canada. E-mail: foutse.khomh@polymtl.ca
G. Uddin is with University of Calgary, Alberta, Canada. E-mail: gias.uddin@ucalgary.ca
A. Karami Motlagh is with Shiraz University, Iran. E-mail: alireza.karami.m@gmail.com
The danger of copy-pasting insecure code from Stack Overflow was
recently raised by Fischer et al. [8], who found that vulnerable
Android code snippets from Stack Overflow are reused in popular
Android apps. We are, however, aware of no previous study that
specifically focused on the vulnerability of C++ code snippets
shared on Stack Overflow and on whether and how such vulnerable
code snippets may have migrated to open source software
repositories on GitHub. This insight is important because such
vulnerable software repositories can then be reused by other
software repositories, which, given the popularity of GitHub, is
entirely possible. C++ is the fourth most popular programming
language [14]. C++ is the language of choice for embedded,
resource-constrained programs. It is extensively used in large
and distributed systems. Vulnerabilities in C++ code snippets are
therefore likely to have a major impact. However, to the best of
our knowledge, no study has examined the security aspects of C++
Stack Overflow code snippets and their impact on open source
software projects. This paper aims to fill this gap in the
literature. More specifically, we aim to understand the nature
and the prevalence of security vulnerabilities in code examples
shared on Stack Overflow. To achieve this goal, we empirically
study C++ vulnerabilities in code examples shared on Stack
Overflow along the following two dimensions.
Prevalence:] We revi
…(Full text truncated)…