ROSF: Leveraging Information Retrieval and Supervised Learning for Recommending Code Snippets


📝 Abstract

When implementing unfamiliar programming tasks, developers commonly search for code examples and either learn API usage patterns from them or reuse them by copy-pasting and modifying. To provide high-quality code examples, previous studies present several methods that recommend code snippets mainly based on information retrieval. In this paper, to provide better recommendation results, we propose ROSF, Recommending cOde Snippets with multi-aspect Features, a novel method combining both information retrieval and supervised learning. In our method, we recommend Top-K code snippets for a given free-form query in two stages, i.e., coarse-grained searching and fine-grained re-ranking. First, we generate a code snippet candidate set by searching a code snippet corpus using an information retrieval method. Second, using a prediction model learned from a training set, we predict probability values of the code snippets in the candidate set for different relevance scores, re-rank these candidate code snippets according to the probability values, and recommend the final results to developers. We conduct several experiments to evaluate our method on a large-scale corpus containing 921,713 real-world code snippets. The results show that ROSF is an effective method for code snippet recommendation and outperforms the state-of-the-art methods by 20%-41% in Precision and 13%-33% in NDCG.
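The abstract reports results in Precision and NDCG but this excerpt does not give the exact definitions used. The sketch below shows one common formulation of Precision@K and NDCG@K for graded relevance labels. The function names, the linear gain, the log2 discount, the relevance threshold, and the choice to compute the ideal ordering from the returned list only are all assumptions made for illustration, not the paper's definitions.

```python
import math


def precision_at_k(relevances, k, threshold=1):
    """Fraction of the top-k results whose relevance label is >= threshold.

    `threshold` is an assumed cut-off for counting a snippet as relevant.
    """
    top = relevances[:k]
    return sum(1 for r in top if r >= threshold) / k


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and log2 position discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the same labels sorted in ideal order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` is the list of relevance labels of the recommended snippets in ranked order, for example `[2, 0, 1, 0]` for a Top-4 list. A ranking that already places the most relevant snippets first has an NDCG of 1.0.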

📄 Content

ROSF: Leveraging Information Retrieval and Supervised Learning for Recommending Code Snippets
He Jiang*, Liming Nie, Zeyi Sun, Zhilei Ren, Weiqiang Kong, Tao Zhang, and Xiapu Luo

Abstract: When implementing unfamiliar programming tasks, developers commonly search for code examples and either learn API usage patterns from them or reuse them by copy-pasting and modifying. To provide high-quality code examples, previous studies present several methods that recommend code snippets mainly based on information retrieval. In this paper, to provide better recommendation results, we propose ROSF, Recommending cOde Snippets with multi-aspect Features, a novel method combining both information retrieval and supervised learning. In our method, we recommend Top-K code snippets for a given free-form query in two stages, i.e., coarse-grained searching and fine-grained re-ranking. First, we generate a code snippet candidate set by searching a code snippet corpus using an information retrieval method. Second, using a prediction model learned from a training set, we predict probability values of the code snippets in the candidate set for different relevance scores, re-rank these candidate code snippets according to the probability values, and recommend the final results to developers. We conduct several experiments to evaluate our method on a large-scale corpus containing 921,713 real-world code snippets. The results show that ROSF is an effective method for code snippet recommendation and outperforms the state-of-the-art methods by 20%-41% in Precision and 13%-33% in NDCG.

Index Terms: Code snippets recommendation, information retrieval, supervised learning, topic model, feature.
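The two stages described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the TF-IDF overlap score, the two features, the hand-set `WEIGHTS` matrix standing in for a learned model, and the use of expected relevance as the ranking key are all hypothetical choices. The excerpt only says that candidates are retrieved with an information retrieval method and then re-ranked by predicted probabilities over relevance scores.

```python
import math
from collections import Counter


def tokenize(text):
    for ch in "().":
        text = text.replace(ch, " ")
    return text.lower().split()


def coarse_search(query, corpus, candidate_size):
    """Stage 1 (assumed IR method): rank snippets by TF-IDF-weighted term overlap."""
    docs = [Counter(tokenize(s)) for s in corpus]
    n = len(corpus)

    def idf(term):
        df = sum(1 for d in docs if term in d)
        return math.log((n + 1) / (df + 1)) + 1.0

    q_terms = set(tokenize(query))
    scored = []
    for idx, d in enumerate(docs):
        score = sum(d[t] * idf(t) for t in q_terms)
        if score > 0:
            scored.append((score, idx))
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [idx for _, idx in scored[:candidate_size]]


def extract_features(query, snippet):
    """Two hypothetical features: query-term coverage and a brevity score."""
    q = set(tokenize(query))
    overlap = len(q & set(tokenize(snippet))) / len(q) if q else 0.0
    brevity = 1.0 / (1.0 + len(snippet.splitlines()))
    return [overlap, brevity]


# Hand-set stand-in for a learned model: one weight row per relevance
# score (0 = irrelevant, 1 = partly relevant, 2 = relevant). Assumption only.
WEIGHTS = [[-2.0, 0.0], [0.0, 0.5], [2.0, 1.0]]


def predict_proba(features):
    """Softmax over relevance scores; returns one probability per score."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in WEIGHTS]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fine_rerank(query, corpus, candidates, k):
    """Stage 2: re-rank candidates by expected relevance (assumed ranking key)."""
    def expected_relevance(idx):
        probs = predict_proba(extract_features(query, corpus[idx]))
        return sum(level * p for level, p in enumerate(probs))

    return sorted(candidates, key=lambda i: (-expected_relevance(i), i))[:k]


def recommend(query, corpus, candidate_size=3, k=2):
    candidates = coarse_search(query, corpus, candidate_size)
    return [corpus[i] for i in fine_rerank(query, corpus, candidates, k)]


CORPUS = [
    "record sound audio with MediaRecorder start",
    "play audio file with MediaPlayer",
    "read text file line by line",
    "record audio\nopen mic\nwrite sound buffer\nclose mic",
]
QUERY = "record sound audio"
print(recommend(QUERY, CORPUS))
```

In this toy corpus the first stage discards the snippet that shares no terms with the query, and the second stage orders the remaining candidates using both features. In a real setting the weights would be fitted on labeled query-snippet pairs rather than set by hand.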
1 INTRODUCTION

Internetware is a software paradigm consisting of self-contained, autonomous entities in the Internet computing environment [30]. As mentioned in previous work, both desktop software and mobile applications (apps) are possible entities in Internetware systems [30], [22]. When developing such software, developers often have to implement unfamiliar programming tasks. They either reuse code examples by copy-pasting and modifying [23], or rely on code examples to learn the correct way to use an unfamiliar Application Programming Interface (API) [54]. As one of the most common forms of reuse, code reuse can save time and resources and reduce redundancy [32]. A code snippet refers to a piece of code that can accomplish one or more specific programming tasks [17]. Typically, a programming task, for example “record sound audio”, is a short text that describes the requirements on the program to be constructed. To find high-quality code examples for programming tasks, developers may search the publicly available code repositories on the Internet or locally available projects [28]. Some Internet-scale code search engines, such as Open Hub [4], can provide code examples for a given task. However, the dominant measure used by these engines is textual similarity [11]. Previous studies show that these results are usually complicated and not sufficient [17].
In recent years, researchers have proposed several methods to recommend code snippets for free-form queries [7], [17], [29]. These methods rank the code snippets in a corpus and return the Top-K related code snippets to developers. An earlier study [23] shows that the performance of these methods has room for improvement. Possible reasons are that only a single feature is used for ranking and that the weights of features cannot be adjusted automatically. The features employed in these methods include textual similarity between a query and code snippets [29], code metrics such as the lines of code [36], etc. To achieve better performance, it is necessary to employ multiple features and to assign different weights to these features automatically [7], [36]. Supervised learning, the machine learning task of inferring a model from a labeled training set, can handle this scenario. Using the learned prediction model, one can determine the class labels for unseen instances in a test set for a new query [31], [50], and further recommend relevant code snippets. In this paper, we propose Recommending cOde

H. Jiang is with the School of Software, Dalian University of Technology, Dalian, China and the Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian, China, and is also with the State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China. E-mail: jianghe@dlut.edu.cn.
L. Nie, Z. Sun, Z. Ren, and W. Kong are with the School of Software, Dalian University of Technology, Dalian, China and the Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian, China. E-mail: limingnie@mail.dlut.edu.cn; sunzeyidlut@gmail.com; {zren, wqkong}@dlut.edu.cn.
T.

This content is AI-processed based on ArXiv data.
