Automatic derivation of domain terms and concept location based on the analysis of the identifiers

February 23, 2026

Reading time: 6 minutes

📝 Original Info

Title: Automatic derivation of domain terms and concept location based on the analysis of the identifiers
ArXiv ID: 1003.1399
Date: 2010-03-13
Authors: Peter Václavík, Jaroslav Porubän, Marek Mezei

📝 Abstract

Developers express the meaning of domain ideas in specifically selected identifiers and comments that form the implemented code. Software maintenance requires knowledge and understanding of the encoded ideas. This paper presents a way to create a domain vocabulary automatically. Knowledge of the domain vocabulary supports the comprehension of a specific domain for later code maintenance or evolution. We present experiments conducted in two selected domains: application servers and web frameworks. Knowledge of domain terms enables easy localization of the chunks of code that belong to a certain term. We consider these chunks of code as "concepts" and their placement in the code as "concept location". Application developers may also benefit from the obtained domain terms. These terms are parts of speech that characterize a certain concept. Concepts are encoded in "classes" (OO paradigm), and the obtained vocabulary of terms supports the selection and the comprehension of the class's appropriate identifiers. We measured the following software products with our tool: JBoss, JOnAS, GlassFish, Tapestry, Google Web Toolkit and Echo2.
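
To make the idea of a vocabulary derived from identifiers concrete, here is a minimal sketch in Python. It is not the authors' tool: the function names `split_identifier` and `build_vocabulary`, the splitting rule, and the sample identifiers are all assumptions made for illustration, and the part-of-speech classification mentioned in the abstract is left out because it would need a separate language-processing step.

```python
import re
from collections import Counter

# Assumed splitting rule: an acronym run (e.g. "HTTP"), or an optionally
# capitalised lowercase run (e.g. "Server", "session"). Underscores and
# digits are skipped.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(identifier):
    """Split a camelCase, PascalCase or snake_case identifier into lowercase words."""
    return [word.lower() for word in _WORD.findall(identifier)]


def build_vocabulary(identifiers, min_length=3):
    """Count how often each word occurs across all identifiers.

    Words shorter than `min_length` are dropped as likely noise
    (an assumption for this sketch, not a rule from the paper).
    """
    counts = Counter()
    for identifier in identifiers:
        for word in split_identifier(identifier):
            if len(word) >= min_length:
                counts[word] += 1
    return counts


# Hypothetical identifiers, as they might appear in an application server.
sample = [
    "SessionManager",
    "createSession",
    "session_timeout",
    "HTTPServerRequest",
    "RequestHandler",
]
vocabulary = build_vocabulary(sample)
print(vocabulary.most_common(2))  # [('session', 3), ('request', 2)]
```

The most frequent words act as candidate domain terms. A real measurement would extract identifiers with a parser for the target language instead of taking a hand-written list.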

📄 Full Content

Acta Univ. Sapientiae, Informatica, 2, 1 (2010) 40–50

Automatic derivation of domain terms and concept location based on the analysis of the identifiers

Peter Václavík, Technical University of Košice, Faculty of Electrical Engineering and Informatics, Department of Computers and Informatics, email: Peter.Vaclavik@tuke.sk

Jaroslav Porubän, Technical University of Košice, Faculty of Electrical Engineering and Informatics, Department of Computers and Informatics, email: Jaroslav.Poruban@tuke.sk

Marek Mezei, Technical University of Košice, Faculty of Electrical Engineering and Informatics, Department of Computers and Informatics, email: marekmezei@gmail.com

Abstract. Developers express the meaning of domain ideas in specifically selected identifiers and comments that form the implemented code. Software maintenance requires knowledge and understanding of the encoded ideas. This paper presents a way to create a domain vocabulary automatically. Knowledge of the domain vocabulary supports the comprehension of a specific domain for later code maintenance or evolution. We present experiments conducted in two selected domains: application servers and web frameworks. Knowledge of domain terms enables easy localization of the chunks of code that belong to a certain term. We consider these chunks of code as “concepts” and their placement in the code as “concept location”. Application developers may also benefit from the obtained domain terms. These terms are parts of speech that characterize a certain concept. Concepts are encoded in “classes” (OO paradigm), and the obtained vocabulary of terms supports the selection and the comprehension of the class's appropriate identifiers. We measured the following software products with our tool: JBoss, JOnAS, GlassFish, Tapestry, Google Web Toolkit and Echo2.

Computing Classification System 1998: D.2.8
Mathematics Subject Classification 2010: 68N99
Key words and phrases: program comprehension, domain knowledge, program quality, software measurement

arXiv:1003.1399v1 [cs.CL] 6 Mar 2010

1 Introduction

Program comprehension is an essential part of software evolution and software maintenance: software that is not comprehended cannot be changed [5, 6, 7, 8]. Among the earliest results are the two classic theories of program comprehension, called the top-down and bottom-up theories [9].

The bottom-up theory holds that understanding of a program is obtained by reading the source code and then mentally chunking or grouping the statements or control structures into higher levels of abstraction, i.e. from the bottom up. Such information is further aggregated until a high-level abstraction of the program is obtained. Chunks are described as code fragments in programs. The available literature shows that chunks are used during the bottom-up approach to software comprehension. Chunks vary in size, and several chunks can be combined into larger chunks [1].

The top-down approach, on the other hand, starts the comprehension process with a hypothesis concerning a high-level abstraction, which is then further refined, leading to a hierarchical comprehension structure. The understanding of the program is developed from the confirmation or refutation of hypotheses.

An important task in program comprehension is to understand where and how the relevant concepts are located in the code. Concept location is the starting point for the desired program change. Concept location is a process in which we assume that the programmer understands the concept of the program domain but does not know where it is located within the code. All domain concepts should map onto one or more fragments of the code. In other words, concept location is the process of finding that code fragment [5]. Developers who are new to a project know little about the identifiers or comments in the source code, but it is likely that they have some knowledge about the problem domain of the software.
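
The mapping from a domain concept to code fragments described above can be illustrated with a small Python sketch. It is only an assumed, simplified form of concept location: the function `locate_concept`, the word-splitting rule, and the class and member names in the example are hypothetical and do not come from the paper or from any of the measured products.

```python
import re

# Assumed splitting rule for camelCase, PascalCase and snake_case identifiers.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def identifier_words(identifier):
    """Return the lowercase words that make up one identifier."""
    return [word.lower() for word in _WORD.findall(identifier)]


def locate_concept(term, classes):
    """Rank classes by how often `term` occurs among their identifier words.

    `classes` maps a class name to the identifiers declared in that class.
    Classes that never mention the term are left out of the result.
    """
    term = term.lower()
    hits = {}
    for class_name, identifiers in classes.items():
        words = identifier_words(class_name)
        for identifier in identifiers:
            words.extend(identifier_words(identifier))
        count = words.count(term)
        if count:
            hits[class_name] = count
    # Highest count first; ties are broken alphabetically by class name.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical classes and members.
classes = {
    "SessionManager": ["createSession", "sessionTimeout", "expire"],
    "RequestHandler": ["handleRequest", "currentSession"],
    "ConnectionPool": ["acquire", "release"],
}
print(locate_concept("session", classes))
# [('SessionManager', 3), ('RequestHandler', 1)]
```

A developer who knows the domain term "session" but not the code base would start reading at the top-ranked class. The ranking is only as good as the identifiers themselves, which is why identifier quality matters for this kind of search.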

In this paper, we present a new way of program comprehension that is based on the naming of identifiers. When trying to understand the source code of a software system, developers usually start by locating familiar concepts in the source code. Keyword search is one of the most popular methods for this kind of task, but its success is strictly tied to the quality of the user queries and of the words used to construct the identifiers and comments.

We present a way to create a domain vocabulary automatically as a result of source code analysis. We classify the parts of speech and measure their occurrence in the source code.

2 Motivation

Domain-level knowledge is important when programmers attempt to understand a program. A programmer inspects the source code structure, which is directed by identifiers. The quality and the “orthogonality” of the identifiers in the source code affect the time needed for program comprehension. The following kinds of quality could be measured: 1. percentage o

…(Full text truncated)…

📄 Read Full PDF on ArXiv

Reference

This content is AI-processed based on ArXiv data.

