Organ transplantation, which reuses the code directly related to a specific functionality to complete one's own program, is more convenient for developers than traditional component reuse. However, current techniques are limited by a lack of organs available for transplantation. Hence, we conduct an empirical study on extracting organs from GitHub repositories to explore transplantation on a large-scale dataset. We analyze statistics from 12 representative GitHub projects and conclude that 1) abundant practical organs exist in commits whose comments contain "add" as a keyword; 2) organs in the repository mainly contain four kinds of content; 3) approximately 70% of the organs are easy-to-transplant. Applying our transplantation strategy to the different kinds of organs, we manually extract 30 organs in three programming languages, namely Java, Python, and C, and write unit tests for them using four testing tools (two for Java, one for Python, and one for C). Finally, we transplant three Java organs into a specific platform to check whether they work well in the new system. All 30 organs extracted by our strategy perform well in unit tests, with the highest passing rate reaching 97% and the lowest still reaching 80%, and the three Java organs work well in the new system, providing three new functionalities for the host. These results indicate the feasibility of organ transplantation based on open-source repositories, offering a new idea for code reuse.
With the increasing amount of software being developed, several engineers have proposed extending the functionality of their own software by reusing others' code. This method is called code reuse, which has been extensively studied [11]. For example, a software component, which refers to a set of classes, is considered the basic unit of reuse [23].
In 2015, Harman et al. [13] proposed a new concept, the organ, which refers to all code associated with a feature of interest, bringing a new opportunity for software reuse. Unlike components, which emphasize strong relationships among several classes, organs emphasize functional integrity. An organ does not have to consist of several classes. It can be several lines of code, a function, or one class, as long as it implements a specific functionality independently. If one function alone can fulfill the needs of a piece of software but a whole set of classes is added to it instead, removing the extra code is highly time-consuming, not to mention the defects that the redundant code might introduce. This redundancy problem can be resolved by retrieving only the functional code and transplanting it into the target software, so that the overhead of transplanting excess code, and its negative impact, are avoided. That is why the organ is a more flexible unit of reuse and is much more convenient than traditional component-based reuse. However, the practice [14,15] in this area is restricted to small-scale, specific experimental contexts. General exploration of organ extraction and transplantation on large-scale datasets has not been well studied.
The open-source movement has become popular recently [26]. Many developers are joining collaborative development communities to develop projects iteratively. These developers continually commit their code to the repository, so the repository keeps track of a project's progress. The code that contributors add to the repository may contain functionalities that are valuable to other developers. Whether practical organs for transplantation can be obtained through a complete analysis of the repository remains unknown. If it is feasible, it will bring new ideas and directions to researchers.
On this basis, we put forward a strategy for transplanting organs from repositories and present an empirical study on extracting practical organs from GitHub repositories, aiming to remedy the lack of organs. The main contributions of this paper are:
• We divide commits in GitHub into eight categories based on the keywords in their comments and conclude that abundant practical organs exist in the adding commits.
• We find that there are four common kinds of content in the organs and calculate their percentages.
• We define our criteria for dividing organs into two categories, easy-to-transplant and difficult-to-transplant, and present statistics showing the percentage of each kind.
• We put forward a strategy for extracting and transplanting organs from open-source repositories for both types of organs (easy-to-transplant and difficult-to-transplant).
• We conduct an empirical study on extracting and transplanting organs using our methodology, and the results demonstrate the feasibility of our approach.
The remainder of this paper is organized as follows. Section Ⅱ introduces the design of our study. Section Ⅲ answers each research question in our study and provides detailed analyses. Section Ⅳ discusses the threats to the validity of our study. Section Ⅴ reviews projects and studies related to this topic. Section Ⅵ concludes the study and outlines our future work.
The study design describes our research questions and datasets, with the aim of ensuring that the design is appropriate for the objectives of the study.
In particular, we address the following research questions in our study:
RQ1: Is there any evident entry point for organ extraction?
The sheer volume of information in the open-source repository makes it difficult to locate practical organs accurately and efficiently. To answer this question, we analyze 12 projects from the GitHub repository to determine whether evident signals for extraction exist. The answer will provide a reference for locating practical organs, thereby laying the groundwork for automatic organ extraction in the future.
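As a rough illustration of what an "evident signal" could look like in practice, the sketch below classifies commit messages by keyword and keeps those matching "add", the keyword the study associates with practical organs. Only "add" comes from the study; the other categories, the regular expressions, and the function names (`categorize_commit`, `category_counts`, `candidate_organ_commits`) are hypothetical placeholders, not the paper's eight categories.

```python
import re
from collections import Counter

# Hypothetical keyword table. Only "add" comes from the study; the other
# entries are illustrative placeholders, not the paper's eight categories.
# Dict order sets priority when a message matches several keywords.
CATEGORY_PATTERNS = {
    "add": re.compile(r"\badd(s|ed|ing)?\b", re.IGNORECASE),
    "fix": re.compile(r"\bfix(es|ed|ing)?\b", re.IGNORECASE),
    "remove": re.compile(r"\b(remove[sd]?|removing|delete[sd]?|deleting)\b", re.IGNORECASE),
    "update": re.compile(r"\bupdat(e|es|ed|ing)\b", re.IGNORECASE),
}


def categorize_commit(message: str) -> str:
    """Return the first category (in table order) whose keyword occurs in the message."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(message):
            return category
    return "other"


def category_counts(messages):
    """Count how many commit messages fall into each category."""
    return Counter(categorize_commit(m) for m in messages)


def candidate_organ_commits(messages):
    """Keep only the 'add' commits, where the study locates practical organs."""
    return [m for m in messages if categorize_commit(m) == "add"]


if __name__ == "__main__":
    log = [
        "Add CSV export for reports",
        "Fix null pointer in parser",
        "Update README",
        "Address review comments",
    ]
    print(category_counts(log))
    print(candidate_organ_commits(log))
```

The word boundaries keep messages such as "Address review comments" out of the "add" bucket. Assigning a message that mentions several keywords to the first matching category is a simplifying assumption; a real study might instead weigh the keywords or label such commits manually.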
To answer this question, we divide organs into different types based on their content. The answer will reveal the proportions of organs with different kinds of content and address questions about organ abundance.
To answer this question, we define the concepts of easy-to-transplant and difficult-to-transplant organs based on the degree of relevance between an organ and the rest of its source code, which provides a theoretical basis for the extraction method.
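The exact criteria are not spelled out at this point. As one hypothetical way to make "degree of relevance" measurable, the sketch below treats a Python organ as easy-to-transplant when it reads no names that it neither defines, imports, nor obtains from builtins, since each such name points at code elsewhere in the source project that would have to be carried along. `external_dependencies`, `classify_organ`, and the `max_external=0` threshold are assumptions for illustration, not the paper's method.

```python
import ast
import builtins


def external_dependencies(organ_source: str) -> set:
    """Names the organ reads but never defines, imports, or gets from builtins.

    Hypothetical proxy for "degree of relevance": each such name points at code
    elsewhere in the source project that would have to be transplanted too.
    The analysis ignores scopes and control flow, so it is only an approximation.
    """
    tree = ast.parse(organ_source)
    defined, loaded = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)  # function and lambda parameters
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import os.path" binds "os"; "import numpy as np" binds "np"
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                defined.add(node.id)  # Store/Del contexts count as local bindings
    return loaded - defined - set(dir(builtins))


def classify_organ(organ_source: str, max_external: int = 0) -> str:
    """Label an organ by how many unresolved external names it depends on."""
    if len(external_dependencies(organ_source)) <= max_external:
        return "easy-to-transplant"
    return "difficult-to-transplant"


if __name__ == "__main__":
    organ = (
        "def export_report(rows):\n"
        "    writer = ReportWriter(CONFIG)\n"
        "    for row in rows:\n"
        "        writer.write(format_row(row))\n"
        "    return writer.close()\n"
    )
    print(sorted(external_dependencies(organ)))
    print(classify_organ(organ))
```

Here the organ depends on `ReportWriter`, `CONFIG`, and `format_row` from its original project, so it is labeled difficult-to-transplant. Raising `max_external` loosens the hypothetical threshold; a fuller criterion would probably also consider what kind of code each dependency resolves to.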
RQ4: How to transplant these organs?