Automatic derivation of domain terms and concept location based on the analysis of the identifiers


📝 Original Info

  • Title: Automatic derivation of domain terms and concept location based on the analysis of the identifiers
  • ArXiv ID: 1003.1399
  • Date: 2010-03-13
  • Authors: Peter Václavík, Jaroslav Porubän, Marek Mezei

📝 Abstract

Developers express the meaning of domain ideas in specifically selected identifiers and comments that form the implemented code. Software maintenance requires knowledge and understanding of the encoded ideas. This paper presents a way to create a domain vocabulary automatically. Knowledge of the domain vocabulary supports the comprehension of a specific domain for later code maintenance or evolution. We present experiments conducted in two selected domains: application servers and web frameworks. Knowledge of domain terms makes it easy to locate the chunks of code that belong to a certain term. We consider these chunks of code to be "concepts" and their placement in the code to be the "concept location". Application developers may also benefit from the obtained domain terms. These terms are parts of speech that characterize a certain concept. Concepts are encoded in "classes" (OO paradigm), and the obtained vocabulary of terms supports the selection and comprehension of the class's appropriate identifiers. We measured the following software products with our tool: JBoss, JOnAS, GlassFish, Tapestry, Google Web Toolkit and Echo2.


📄 Full Content

Acta Univ. Sapientiae, Informatica, 2, 1 (2010) 40–50
arXiv:1003.1399v1 [cs.CL] 6 Mar 2010

Automatic derivation of domain terms and concept location based on the analysis of the identifiers

Peter Václavík
Technical University of Košice, Faculty of Electrical Engineering and Informatics, Department of Computers and Informatics
email: Peter.Vaclavik@tuke.sk

Jaroslav Porubän
Technical University of Košice, Faculty of Electrical Engineering and Informatics, Department of Computers and Informatics
email: Jaroslav.Poruban@tuke.sk

Marek Mezei
Technical University of Košice, Faculty of Electrical Engineering and Informatics, Department of Computers and Informatics
email: marekmezei@gmail.com

Abstract. Developers express the meaning of domain ideas in specifically selected identifiers and comments that form the implemented code. Software maintenance requires knowledge and understanding of the encoded ideas. This paper presents a way to create a domain vocabulary automatically. Knowledge of the domain vocabulary supports the comprehension of a specific domain for later code maintenance or evolution. We present experiments conducted in two selected domains: application servers and web frameworks. Knowledge of domain terms makes it easy to locate the chunks of code that belong to a certain term. We consider these chunks of code to be “concepts” and their placement in the code to be the “concept location”. Application developers may also benefit from the obtained domain terms. These terms are parts of speech that characterize a certain concept. Concepts are encoded in “classes” (OO paradigm), and the obtained vocabulary of terms supports the selection and comprehension of the class's appropriate identifiers. We measured the following software products with our tool: JBoss, JOnAS, GlassFish, Tapestry, Google Web Toolkit and Echo2.

Computing Classification System 1998: D.2.8
Mathematics Subject Classification 2010: 68N99
Key words and phrases: program comprehension, domain knowledge, program quality, software measurement

1 Introduction

Program comprehension is an essential part of software evolution and software maintenance: software that is not comprehended cannot be changed [5, 6, 7, 8]. Among the earliest results are the two classic theories of program comprehension, called the top-down and bottom-up theories [9].

According to the bottom-up theory, a program is understood by reading the source code and then mentally chunking or grouping the statements or control structures into a higher level of abstraction, i.e. from the bottom up. Such information is further aggregated until a high-level abstraction of the program is obtained. Chunks are described as code fragments in programs. The available literature shows that chunks are used during the bottom-up approach to software comprehension. Chunks vary in size, and several chunks can be combined into larger chunks [1].

The top-down approach, on the other hand, starts the comprehension process with a hypothesis concerning a high-level abstraction, which is then further refined, leading to a hierarchical comprehension structure. The understanding of the program is developed from the confirmation or refutation of hypotheses.

An important task in program comprehension is to understand where and how the relevant concepts are located in the code. Concept location is the starting point for the desired program change. Concept location is a process in which we assume that the programmer understands the concept of the program domain but does not know where it is located within the code. All domain concepts should map onto one or more fragments of the code. In other words, concept location is the process of finding that code fragment [5].
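
To make the idea of concept location concrete, here is a minimal sketch of a naive keyword-based search over identifiers. It is not the authors' tool: the toy `SOURCES` mapping, the `IDENTIFIER` pattern and the `locate_concept` function are all hypothetical names introduced for illustration, and plain case-insensitive substring matching is an assumed simplification.

```python
import re

# Hypothetical toy "code base": class name -> source text (not from the paper).
SOURCES = {
    "SessionManager": "class SessionManager { void openSession() {} void closeSession() {} }",
    "HttpRequestParser": "class HttpRequestParser { String parseHeader(String rawHeader) { return rawHeader; } }",
    "UserSessionCache": "class UserSessionCache { int cacheSize; void evictSession() {} }",
}

# Rough approximation of an identifier token; it also matches keywords.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def locate_concept(term, sources):
    """Map each class name to the sorted identifiers that contain the term."""
    needle = term.lower()
    hits = {}
    for class_name, text in sources.items():
        matched = sorted(
            {tok for tok in IDENTIFIER.findall(text) if needle in tok.lower()}
        )
        if matched:  # keep only classes where the concept shows up
            hits[class_name] = matched
    return hits


if __name__ == "__main__":
    for class_name, identifiers in locate_concept("session", SOURCES).items():
        print(class_name, identifiers)
```

A search like this only succeeds when the query term matches the words the original developers chose for their identifiers, so its usefulness depends on knowing the domain vocabulary in advance.
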
Developers who are new to a project know little about the identifiers or comments in the source code, but they are likely to have some knowledge of the problem domain of the software. In this paper, we present a new approach to program comprehension that is based on the naming of identifiers. When trying to understand the source code of a software system, developers usually start by locating familiar concepts in the source code. Keyword search is one of the most popular methods for this kind of task, but its success is strictly tied to the quality of the user queries and to the words used to construct the identifiers and comments.

We present a way to create a domain vocabulary automatically as a result of source code analysis. We classify the parts of speech and measure their occurrence in the source code.

2 Motivation

Domain-level knowledge is important when programmers attempt to understand a program. The programmer inspects the source code structure, guided by the identifiers. The quality and the “orthogonality” of the identifiers in the source code affect the time needed for program comprehension. The following kinds of quality could be measured: 1. percentage o

…(Full text truncated)…
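
The vocabulary derivation outlined in the introduction (extract identifiers, break them into terms, measure how often each term occurs) might look roughly like the sketch below. This is an assumption-laden illustration rather than the authors' implementation: `STOP`, `WORD`, `split_identifier` and `domain_vocabulary` are hypothetical names, the stop list is deliberately incomplete, and the part-of-speech classification mentioned in the paper is left out.

```python
import re
from collections import Counter

# Java keywords and common type names to skip; an illustrative, incomplete list.
STOP = {"class", "void", "int", "return", "public", "private", "static", "new", "string"}

# Rough approximation of an identifier token.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Splits camelCase, PascalCase, acronyms and digit runs,
# e.g. "HTTPRequestParser" -> HTTP, Request, Parser.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(identifier):
    """Break one identifier into lower-cased terms."""
    return [word.lower() for word in WORD.findall(identifier)]


def domain_vocabulary(source_text):
    """Count how often each term occurs across all identifiers in the text."""
    counts = Counter()
    for identifier in IDENTIFIER.findall(source_text):
        for term in split_identifier(identifier):
            if term not in STOP:
                counts[term] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "class SessionManager { void openSession() {} "
        "void closeSession() {} int sessionCount; }"
    )
    for term, count in domain_vocabulary(sample).most_common():
        print(term, count)
```

Terms that dominate the resulting counts are candidates for the domain vocabulary; a fuller pipeline would also need to handle comments, abbreviations and part-of-speech tagging, none of which is attempted here.
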

Reference

This content is AI-processed based on ArXiv data.
