ROSF: Leveraging Information Retrieval and Supervised Learning for Recommending Code Snippets
📝 Abstract
When implementing unfamiliar programming tasks, developers commonly search code examples and learn usage patterns of APIs from the code examples or reuse them by copy-pasting and modifying. For providing high-quality code examples, previous studies present several methods to recommend code snippets mainly based on information retrieval. In this paper, to provide better recommendation results, we propose ROSF, Recommending cOde Snippets with multi-aspect Features, a novel method combining both information retrieval and supervised learning. In our method, we recommend Top-K code snippets for a given free-form query based on two stages, i.e., coarse-grained searching and fine-grained re-ranking. First, we generate a code snippet candidate set by searching a code snippet corpus using an information retrieval method. Second, we predict probability values of the code snippets for different relevance scores in the candidate set by the learned prediction model from a training set, re-rank these candidate code snippets according to the probability values, and recommend the final results to developers. We conduct several experiments to evaluate our method in a large-scale corpus containing 921,713 real-world code snippets. The results show that ROSF is an effective method for code snippets recommendation and outperforms the state-of-the-art methods by 20%-41% in Precision and 13%-33% in NDCG.
📄 Content
ROSF: Leveraging Information Retrieval and
Supervised Learning for Recommending
Code Snippets
He Jiang*, Liming Nie, Zeyi Sun, Zhilei Ren, Weiqiang Kong, Tao Zhang, and Xiapu Luo
Abstract—When implementing unfamiliar programming tasks, developers commonly search code examples and learn usage
patterns of APIs from the code examples or reuse them by copy-pasting and modifying. For providing high-quality code
examples, previous studies present several methods to recommend code snippets mainly based on information retrieval. In this
paper, to provide better recommendation results, we propose ROSF, Recommending cOde Snippets with multi-aspect Features,
a novel method combining both information retrieval and supervised learning. In our method, we recommend Top-K code
snippets for a given free-form query based on two stages, i.e., coarse-grained searching and fine-grained re-ranking. First, we
generate a code snippet candidate set by searching a code snippet corpus using an information retrieval method. Second, we
predict probability values of the code snippets for different relevance scores in the candidate set by the learned prediction model
from a training set, re-rank these candidate code snippets according to the probability values, and recommend the final results
to developers. We conduct several experiments to evaluate our method in a large-scale corpus containing 921,713 real-world
code snippets. The results show that ROSF is an effective method for code snippets recommendation and outperforms the
state-of-the-art methods by 20%-41% in Precision and 13%-33% in NDCG.
Index Terms—Code snippets recommendation, information retrieval, supervised learning, topic model, feature.
1 INTRODUCTION
INTERNETWARE is a software paradigm consisting
of self-contained, autonomous entities in the Internet
computing environment [30]. As mentioned in
previous work, both desktop software and mobile
applications (apps) are possible entities in
Internetware systems [30], [22]. In the development
process for such software, developers often have to
implement unfamiliar programming tasks. They either
reuse code examples by copy-pasting and modifying
[23], or learn the correct ways to employ an unfamiliar
Application Programming Interface (API) by relying on
code examples [54]. As one of the most common forms
of reuse, code reuse can save time and resources and
reduce redundancy [32].
A code snippet refers to a piece of code that can
accomplish one or more specific programming tasks
[17]. Typically, a programming task, for example
“record sound audio”, is a short text that describes the
requirements on the program to be constructed. To
find high-quality code examples for programming
tasks, developers may search the publicly available
code repositories on the Internet or locally available
projects [28]. Some Internet-scale code search engines,
such as Open Hub [4], can provide code examples for a
given task. However, the dominant measure used by
these engines is textual similarity [11]. Previous studies
show that these results are usually complicated and
not sufficient [17].
In recent years, researchers have proposed several
methods to recommend code snippets for free-form
queries [7], [17], [29]. These methods rank the code
snippets in a corpus and return the Top-K related code
snippets to developers. An earlier study [23] shows
that the performance of these methods has room for
improvement. Possible reasons include that a
single feature is used for ranking and that the weights of
features cannot be adjusted automatically. The features
employed in these methods include textual similarity
between a query and code snippets [29], code metrics
such as the lines of code [36], etc. To achieve better
performance, it is necessary to employ multiple
features and assign different weights to these features
automatically [7], [36]. Supervised learning, the machine
learning task of inferring a model from a labeled training
set, can handle this scenario. Using
the learned prediction model, one can determine the
class labels for unseen instances in a test set for a new
query [31], [50], and further recommend relevant code
snippets.
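The two-stage idea of coarse-grained searching followed by fine-grained re-ranking with automatically learned feature weights can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions and not the ROSF implementation: the function names (`coarse_search`, `features`, `train`, `rerank`), the two features (term-count cosine similarity and an inverse lines-of-code metric), and the binary relevant/irrelevant logistic-regression model are all hypothetical simplifications, whereas the method described here predicts probabilities for several relevance scores using multi-aspect features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def features(query, snippet):
    """Two assumed features: textual similarity and an inverse size metric."""
    sim = cosine(tokenize(query), tokenize(snippet))
    loc = len(snippet.strip().splitlines())  # lines of code
    return [sim, 1.0 / (1.0 + loc)]


def coarse_search(query, corpus, k):
    """Stage 1: keep the k snippets most textually similar to the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda s: cosine(q, tokenize(s)), reverse=True)
    return ranked[:k]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights (bias first) by stochastic gradient descent."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            xb = [1.0] + x
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, xb)]
    return w


def rerank(query, candidates, w):
    """Stage 2: order candidates by predicted probability of relevance."""
    def prob(snippet):
        xb = [1.0] + features(query, snippet)
        return sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
    return sorted(candidates, key=prob, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy corpus and hand-labeled feature rows (1 = relevant).
    corpus = [
        "record sound audio with recorder",
        "parse json file",
        "play audio clip\nstop audio",
    ]
    weights = train([[0.9, 0.5], [0.1, 0.5]], [1, 0])
    query = "record sound audio"
    for snippet in rerank(query, coarse_search(query, corpus, 2), weights):
        print(repr(snippet))
```

The point of the sketch is the division of labor: the first stage is cheap and only narrows the corpus to a candidate set, while the second stage can afford to compute several features per candidate because their weights are learned from labeled examples rather than set by hand.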
In this paper, we propose Recommending cOde
————————————————
H. Jiang is with the School of Software, Dalian University of Technology,
Dalian, China, and the Key Laboratory for Ubiquitous Network and
Service Software of Liaoning Province, Dalian, China, and is also with the
State Key Laboratory of Software Engineering, Wuhan University,
Wuhan, China. E-mail: jianghe@dlut.edu.cn.
L. Nie, Z. Sun, Z. Ren, and W. Kong are with the School of Software,
Dalian University of Technology, Dalian, China and the Key Laboratory
for Ubiquitous Network and Service Software of Liaoning Province,
Dalian, China. E-mail: limingnie@mail.dlut.edu.cn;
sunzeyidlut@gmail.com; {zren, wqkong}@dlut.edu.cn.
T.