📝 Original Info
- Title: Automatic derivation of domain terms and concept location based on the analysis of the identifiers
- ArXiv ID: 1003.1399
- Date: 2010-03-13
- Authors: Researchers from original ArXiv paper
📝 Abstract
Developers express the meaning of domain ideas in specifically selected identifiers and comments that form the implemented code. Software maintenance requires knowledge and understanding of the encoded ideas. This paper presents a way to create a domain vocabulary automatically. Knowledge of the domain vocabulary supports the comprehension of a specific domain for later code maintenance or evolution. We present experiments conducted in two selected domains: application servers and web frameworks. Knowledge of domain terms enables easy localization of the chunks of code that belong to a certain term. We consider these chunks of code as "concepts" and their placement in the code as "concept location". Application developers may also benefit from the obtained domain terms. These terms are parts of speech that characterize a certain concept. Concepts are encoded in "classes" (OO paradigm), and the obtained vocabulary of terms supports the selection and the comprehension of the appropriate identifiers of a class. We measured the following software products with our tool: JBoss, JOnAS, GlassFish, Tapestry, Google Web Toolkit and Echo2.
📄 Full Content
Acta Univ. Sapientiae, Informatica, 2, 1 (2010) 40–50
Automatic derivation of domain terms and concept location based on the analysis of the identifiers
Peter Václavík
Technical University of Košice
Faculty of Electrical Engineering and Informatics
Department of Computers and Informatics
email: Peter.Vaclavik@tuke.sk
Jaroslav Porubän
Technical University of Košice
Faculty of Electrical Engineering and Informatics
Department of Computers and Informatics
email: Jaroslav.Poruban@tuke.sk
Marek Mezei
Technical University of Košice
Faculty of Electrical Engineering and Informatics
Department of Computers and Informatics
email: marekmezei@gmail.com
Abstract. Developers express the meaning of domain ideas in specifically
selected identifiers and comments that form the implemented code. Software
maintenance requires knowledge and understanding of the encoded ideas. This
paper presents a way to create a domain vocabulary automatically. Knowledge
of the domain vocabulary supports the comprehension of a specific domain for
later code maintenance or evolution. We present experiments conducted in two
selected domains: application servers and web frameworks. Knowledge of domain
terms enables easy localization of the chunks of code that belong to a certain
term. We consider these chunks of code as “concepts” and their placement in
the code as “concept location”. Application developers may also benefit from
the obtained domain terms. These terms are parts of speech that characterize a
certain concept. Concepts are encoded in “classes” (OO paradigm), and the
obtained vocabulary of terms supports the selection and the comprehension of
the appropriate identifiers of a class. We measured the following software
products with our tool: JBoss, JOnAS, GlassFish, Tapestry, Google Web Toolkit
and Echo2.
Computing Classification System 1998: D.2.8
Mathematics Subject Classification 2010: 68N99
Key words and phrases: program comprehension, domain knowledge, program quality,
software measurement
arXiv:1003.1399v1 [cs.CL] 6 Mar 2010
1 Introduction
Program comprehension is an essential part of software evolution and software
maintenance: software that is not comprehended cannot be changed [5, 6, 7, 8].
Among the earliest results are the two classic theories of program
comprehension, the top-down and the bottom-up theory [9]. According to the
bottom-up theory, understanding of a program is obtained by reading the source
code and then mentally chunking, or grouping, the statements or control
structures into higher-level abstractions, i.e. from the bottom up. Such
information is further aggregated until a high-level abstraction of the
program is obtained. Chunks are code fragments in programs, and the literature
describes their use in the bottom-up approach to software comprehension.
Chunks vary in size, and several chunks can be combined into larger chunks [1].
The top-down approach, on the other hand, starts the comprehension process
with a hypothesis concerning a high-level abstraction, which is then further
refined, leading to a hierarchical comprehension structure. The understanding
of the program is developed from the confirmation or refutation of hypotheses.
An important task in program comprehension is to understand where and how the
relevant concepts are located in the code. Concept location is the starting
point for the desired program change. It is a process in which we assume that
the programmer understands the concepts of the program domain but does not
know where they are located within the code. All domain concepts should map
onto one or more fragments of the code. In other words, concept location is
the process of finding those code fragments [5].
Developers who are new to a project know little about the identifiers or
comments in the source code, but they are likely to have some knowledge of the
problem domain of the software. In this paper, we present a new approach to
program comprehension that is based on the naming of identifiers. When trying
to understand the source code of a software system, developers usually start
by locating familiar concepts in the source code. Keyword search is one of the
most popular methods for this kind of task, but its success is strictly tied
to the quality of the user queries and of the words used to construct the
identifiers and comments.
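As an illustration of keyword-based concept location, the following minimal sketch ranks source files by how many query words occur among the words of their identifiers. It is not the tool described in this paper: the function names `words_of` and `locate_concept`, the input format (a mapping from file name to a list of identifiers) and the ranking rule are all assumptions made for the example.

```python
import re

# Assumed splitting rule: camelCase, PascalCase, acronyms, digits, snake_case.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def words_of(identifier):
    """Return the set of lowercase words that make up one identifier."""
    return {w.lower() for w in _WORD.findall(identifier)}


def locate_concept(query, identifiers_by_file):
    """Rank files by the number of distinct query words found in their identifiers.

    `identifiers_by_file` is a hypothetical input format: file name -> identifiers.
    Files without any matching word are left out of the result.
    """
    query_words = {w.lower() for w in query.split()}
    hits = []
    for path, identifiers in identifiers_by_file.items():
        found = set()
        for ident in identifiers:
            found |= words_of(ident) & query_words
        if found:
            hits.append((path, len(found)))
    # Most matching words first; ties are broken alphabetically by file name.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


if __name__ == "__main__":
    index = {
        "SessionManager.java": ["createSession", "sessionTimeout"],
        "Mailer.java": ["sendMail"],
        "UserSession.java": ["userSession", "loginUser"],
    }
    print(locate_concept("user session", index))
```

The sketch makes the dependence noted above visible: a query word that the developers never used in an identifier (for example a synonym) produces no hit at all.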
We present a way to create a domain vocabulary automatically as a result of
source code analysis. We classify the parts of speech and measure their
occurrence in the source code.
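The general idea can be sketched as follows: split each identifier into words, count how often each word occurs, and tally the words by part of speech. This is only a sketch under stated assumptions, not the authors' tool. The splitting rule, the function names and the tiny `LEXICON` are hypothetical; a real analysis would need a proper dictionary or tagger to assign parts of speech.

```python
import re
from collections import Counter

# Assumed splitting rule: camelCase, PascalCase, acronyms, digits, snake_case.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")

# Hypothetical toy lexicon, used here only to illustrate the classification step.
LEXICON = {
    "get": "verb", "create": "verb", "send": "verb",
    "user": "noun", "session": "noun", "name": "noun", "request": "noun",
}


def split_identifier(identifier):
    """Split an identifier into its lowercase words, in order."""
    return [w.lower() for w in _WORD.findall(identifier)]


def term_frequencies(identifiers):
    """Count how often each word occurs across all identifiers."""
    counts = Counter()
    for ident in identifiers:
        counts.update(split_identifier(ident))
    return counts


def part_of_speech_counts(frequencies, lexicon=LEXICON):
    """Sum word occurrences per part of speech; unlisted words count as 'unknown'."""
    totals = Counter()
    for word, n in frequencies.items():
        totals[lexicon.get(word, "unknown")] += n
    return totals


if __name__ == "__main__":
    idents = ["getUserName", "userSession", "HTTPServerRequest", "session_id"]
    freq = term_frequencies(idents)
    print(freq.most_common(3))
    print(part_of_speech_counts(freq))
```

The most frequent words are candidates for the domain vocabulary. Words that end up as "unknown" show where the assumed lexicon is too small.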
2 Motivation
Domain-level knowledge is important when programmers attempt to understand a
program. A programmer inspects the source code structure, guided by the
identifiers. The quality and the “orthogonality” of the identifiers in the
source code affect the time needed for program comprehension. The following
kinds of quality could be measured:
1. percentage o
…(Full text truncated)…