A Case Study on Quality Attribute Measurement using MARF and GIPSY

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

This work presents a comparative analysis of the Modular Audio Recognition Framework (MARF) and the General Intentional Programming System (GIPSY) using a range of software metrics. We first examine the general principles, architecture, and operation of MARF and GIPSY by studying their frameworks and running them in the Eclipse environment. We then review a set of important metrics, including several state-of-the-art ones, and rank them by their usefulness and their influence on different software quality attributes. The quality attributes are viewed and computed with the Logiscope and McCabe IQ tools, which perform a comprehensive analysis of the case studies and generate quality reports at the factor, criteria, and metric levels. In the next step, we identify the worst code at each of these levels, extract it, and provide recommendations to improve its quality. We implement and test some of the highest-ranked metrics against a set of test cases in JDeodorant. Finally, we analyze both MARF and GIPSY by performing a fuzzy code scan with MARFCAT to identify weak and vulnerable classes.


💡 Research Summary

The paper presents a comparative case study of two open‑source systems—Modular Audio Recognition Framework (MARF) and General Intentional Programming System (GIPSY)—to demonstrate how a suite of static analysis tools and software metrics can be combined to assess and improve quality attributes. The authors first set up both projects in Eclipse, compile them, and explore their architectures: MARF as a modular pipeline for audio signal processing and GIPSY as a distributed platform for intentional programming.

Using Logiscope and McCabe IQ, the study measures classic code‑level metrics such as cyclomatic complexity, Halstead volume, coupling between objects (CBO), depth of inheritance tree (DIT), lack of cohesion of methods (LCOM), and class‑level method counts. Results are presented at three hierarchical levels—factor, criteria, and metric—allowing the identification of “worst code” at each granularity. For MARF, the audio preprocessing module exhibits an average cyclomatic complexity of 32, far above the project mean of 18. In GIPSY, the intention‑interpretation engine shows high CBO (15) and low LCOM (0.42), indicating tight coupling and poor cohesion.
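To make the complexity figures concrete, the sketch below shows one common way cyclomatic complexity can be approximated: one plus the number of decision points in a method. This is an illustrative simplification, not Logiscope's or McCabe IQ's actual algorithm, and the class name and token list are our own.

```java
import java.util.List;

// Minimal illustration (not Logiscope's algorithm): cyclomatic
// complexity approximated as 1 + number of decision points.
public class CyclomaticSketch {
    // Tokens that open a new branch in the control-flow graph.
    private static final List<String> DECISION_TOKENS =
        List.of("if", "for", "while", "case", "catch", "&&", "||", "?");

    public static int complexity(String source) {
        int count = 1; // one linearly independent path to begin with
        for (String token : DECISION_TOKENS) {
            int idx = 0;
            while ((idx = source.indexOf(token, idx)) >= 0) {
                count++;
                idx += token.length();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String method = "if (x > 0) { while (x-- > 0) { y++; } } else { y = 0; }";
        System.out.println(complexity(method)); // 1 + "if" + "while" = 3
    }
}
```

By this reading, a module averaging a complexity of 32, as reported for MARF's preprocessing code, contains roughly 31 branching constructs per method on average, which is what makes it a refactoring target.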

The authors then rank the collected metrics according to their perceived usefulness for predicting defects and maintenance effort. Core risk metrics (high cyclomatic complexity, large method counts per class, deep inheritance, high CBO) are prioritized, while secondary metrics (comment density, line‑of‑code variance) are treated as supporting indicators. This ranking guides the selection of metrics that will be implemented in the subsequent refactoring experiments.
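A prioritization like this can be modeled as a weighted combination of normalized metric values. The sketch below is purely illustrative: the weights are invented for demonstration and are not the paper's actual ranking values.

```java
import java.util.Map;

// Hypothetical illustration of metric ranking as a weighted risk
// score; the weights are invented, not taken from the paper.
public class MetricRanker {
    // Core risk metrics weighted higher than supporting indicators.
    private static final Map<String, Double> WEIGHTS = Map.of(
        "cyclomaticComplexity", 0.35,
        "methodsPerClass", 0.25,
        "depthOfInheritance", 0.20,
        "couplingBetweenObjects", 0.15,
        "commentDensity", 0.05);

    // Metric values normalized into [0, 1] combine into one score.
    public static double riskScore(Map<String, Double> normalized) {
        return WEIGHTS.entrySet().stream()
            .mapToDouble(e -> e.getValue() * normalized.getOrDefault(e.getKey(), 0.0))
            .sum();
    }

    public static void main(String[] args) {
        double score = riskScore(Map.of(
            "cyclomaticComplexity", 1.0,   // very complex class
            "couplingBetweenObjects", 0.5));
        System.out.println(score); // combined risk score
    }
}
```

Classes with the highest scores would be the first candidates for the refactoring experiments that follow.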

Refactoring is performed with JDeodorant, an Eclipse plug‑in that automatically suggests “Extract Method”, “Move Method”, and “Extract Class” transformations. Ten of the most complex methods are refactored using “Extract Method”, resulting in an average 27 % reduction in cyclomatic complexity and a modest 5 % increase in test coverage. High‑CBO classes are subjected to “Move Method” operations, lowering coupling by an average of 3.2 points. These empirical results validate the hypothesis that targeted metric‑driven refactoring can measurably improve maintainability.
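The effect of an "Extract Method" transformation can be shown schematically. The before/after code below is our own illustration of the pattern JDeodorant suggests, not actual code from MARF or GIPSY: per-item logic is pulled out of a loop, so each method ends up with fewer decision points and a single responsibility.

```java
// Schematic before/after of an "Extract Method" refactoring, in the
// spirit of JDeodorant's suggestions; the code is illustrative only.
public class ExtractMethodDemo {
    // Before: one method mixes normalization, clamping, and summing,
    // inflating its cyclomatic complexity.
    static int processBefore(int[] samples) {
        int sum = 0;
        for (int s : samples) {
            if (s < 0) s = -s;       // normalization inline
            if (s > 100) s = 100;    // clamping inline
            sum += s;
        }
        return sum;
    }

    // After: the per-sample logic is extracted; each method now has a
    // lower complexity and a single responsibility.
    static int processAfter(int[] samples) {
        int sum = 0;
        for (int s : samples) sum += normalize(s);
        return sum;
    }

    static int normalize(int s) {
        s = Math.abs(s);
        return Math.min(s, 100);
    }

    public static void main(String[] args) {
        int[] data = {-5, 250, 30};
        System.out.println(processBefore(data)); // 5 + 100 + 30 = 135
        System.out.println(processAfter(data));  // same result: 135
    }
}
```

The behavior is unchanged, but the branching logic now lives in a small, separately testable method, which is the mechanism behind the complexity reductions reported above.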

Finally, the study applies MARFCAT, a machine‑learning based fuzzy code scanner, to detect security‑related code smells and vulnerabilities. MARF reveals three potential buffer‑management memory leaks, while GIPSY exposes two modules lacking proper input validation in the intention parser. The authors compile a set of concrete recommendations: split overly complex methods, introduce interfaces or dependency‑injection to reduce coupling, extract cohesive responsibilities into new classes, and harden vulnerable code with explicit resource cleanup and validation checks.
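The last two recommendations, explicit resource cleanup and input validation, translate into standard Java idioms. The sketch below is hypothetical hardening code of our own, not code from MARF or GIPSY: try-with-resources guarantees cleanup on every path, and inputs are validated before parsing.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Illustrates the recommended hardening patterns (explicit resource
// cleanup and input validation); hypothetical code, not from the paper.
public class HardeningDemo {
    // try-with-resources closes the reader even when an exception is
    // thrown, avoiding the kind of resource leak a scanner would flag.
    static String readFirstLine(String path) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        }
    }

    // Explicit validation before parsing, instead of trusting raw input.
    static int parsePort(String input) {
        if (input == null || !input.matches("\\d{1,5}")) {
            throw new IllegalArgumentException("invalid port: " + input);
        }
        int port = Integer.parseInt(input);
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println(parsePort("8080")); // 8080
        try {
            parsePort("99999");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Patterns like these address both findings at once: the cleanup idiom removes leak-prone paths, and the validation guard rejects malformed input before it reaches parsing logic.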

Overall, the paper demonstrates a reproducible workflow that integrates multiple static analysis tools, ranks metrics by impact, and uses automated refactoring to address the most critical quality deficiencies. It underscores that metric selection must be context‑aware and that future work should explore hybrid approaches combining static and dynamic analysis, continuous integration pipelines, and developer feedback loops to create a more comprehensive quality assurance ecosystem.

