Towards Refactoring of DMARF and GIPSY Case Studies -- A Team 5 SOEN6471-S14 Project Report

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

This paper presents an analysis of the architectural design of two distributed open-source systems (OSS) developed in Java: the Distributed Modular Audio Recognition Framework (DMARF) and the General Intensional Programming System (GIPSY). The research starts with a background study of these frameworks to determine their overall architectures. Afterwards, we identify the actors and stakeholders and draft a domain model for each framework. Next, we evaluate and propose a fused DMARF over GIPSY Run-time Architecture (DoGRTA) as a domain concept. We then extract and study the actual class diagrams, determine the classes of interest, and identify the design patterns present in the code of each framework. Finally, we detect code smells in the source code using popular tools, refactor a selected number of them using established techniques, and implement those refactorings in the final source code. Tests were written and run before and after the refactoring to check for any behavioral changes.


💡 Research Summary

The paper presents a comprehensive study of two Java‑based open‑source distributed systems, the Distributed Modular Audio Recognition Framework (DMARF) and the General Intensional Programming System (GIPSY), with the goal of evaluating their architectures, identifying design patterns, detecting code smells, and applying systematic refactoring. The authors begin with a background overview of each framework. DMARF implements a modular audio‑recognition pipeline in which preprocessing, feature extraction, and classification are encapsulated as independent services, enabling easy extension and replacement of components. GIPSY, on the other hand, provides a runtime for intensional programming languages based on a demand‑driven execution model; tasks (demands) are generated on the fly and dispatched to workers across a tiered, distributed infrastructure. Both systems use Java middleware such as RMI, CORBA, and JMS for remote communication, yet they differ fundamentally: DMARF is data‑flow oriented, while GIPSY is control‑flow oriented.
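
To make the modular-pipeline idea concrete, here is a minimal Java sketch in which preprocessing, feature extraction, and classification sit behind separate interfaces, so any stage can be replaced without touching the others. The interface names and the toy algorithms (mean removal, an energy/zero-crossing feature pair, a nearest-mean classifier) are hypothetical illustrations chosen for brevity; they are not DMARF's actual classes or algorithms.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stage interfaces; DMARF's real API differs.
interface Preprocessor { double[] preprocess(double[] samples); }
interface FeatureExtractor { double[] extract(double[] samples); }
interface Classifier { String classify(double[] features); }

// Removes the DC offset by subtracting the mean amplitude.
class MeanRemoval implements Preprocessor {
    public double[] preprocess(double[] s) {
        double mean = Arrays.stream(s).average().orElse(0.0);
        return Arrays.stream(s).map(x -> x - mean).toArray();
    }
}

// Toy features: average energy and number of sign changes (zero crossings).
class EnergyZcr implements FeatureExtractor {
    public double[] extract(double[] s) {
        double energy = 0;
        int crossings = 0;
        for (int i = 0; i < s.length; i++) {
            energy += s[i] * s[i];
            if (i > 0 && (s[i - 1] < 0) != (s[i] < 0)) crossings++;
        }
        return new double[] { energy / Math.max(1, s.length), crossings };
    }
}

// Picks the label whose reference feature vector is closest (squared Euclidean).
class NearestMean implements Classifier {
    private final Map<String, double[]> means = new HashMap<>();
    void train(String label, double[] meanFeatures) { means.put(label, meanFeatures); }
    public String classify(double[] f) {
        String best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : means.entrySet()) {
            double d = 0;
            for (int i = 0; i < f.length; i++) {
                double diff = f[i] - e.getValue()[i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = e.getKey(); }
        }
        return best;
    }
}

// The pipeline depends only on the interfaces, so each stage is swappable.
class RecognitionPipeline {
    private final Preprocessor pre;
    private final FeatureExtractor fe;
    private final Classifier cl;
    RecognitionPipeline(Preprocessor pre, FeatureExtractor fe, Classifier cl) {
        this.pre = pre; this.fe = fe; this.cl = cl;
    }
    String recognize(double[] samples) {
        return cl.classify(fe.extract(pre.preprocess(samples)));
    }
}
```

Replacing `MeanRemoval` with another `Preprocessor` requires no change to `RecognitionPipeline`, which is the extensibility property the summary attributes to DMARF's modular design.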

The authors then identify actors and stakeholders for each project and construct domain models using UML class and sequence diagrams. In DMARF, core domain entities include MARF, Sample, FeatureExtractor, Classifier, and Storage. In GIPSY, the central concepts are GIPSYContext, Demand, Tier, Node, and the associated managers. By mapping relationships between these entities, the team uncovers a natural integration point: GIPSY’s demand scheduler can be used to drive DMARF’s pipeline stages. This insight leads to the proposal of a fused architecture named “DMARF over GIPSY Run‑time Architecture” (DoGRTA). In DoGRTA, a Sample object is wrapped as a Demand, which is then routed by GIPSY’s TierManager to appropriate FeatureExtraction or Classification workers. The demand‑driven model provides dynamic load balancing, fault tolerance, and the ability to scale the audio‑recognition workload across heterogeneous nodes.
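
The DoGRTA idea of wrapping a sample as a demand and letting a tier manager route it to the right kind of worker can be sketched in plain Java with one worker pool per demand type. Everything below (`Demand`, `DemandType`, `SimpleTierManager`, `DoGrtaSketch`) is a hypothetical, single-process stand-in; GIPSY's real demand generators and tier managers operate across distributed nodes and are not reproduced here.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

enum DemandType { FEATURE_EXTRACTION, CLASSIFICATION }

// A demand wraps a payload (e.g. an audio sample) plus the kind of work it needs.
final class Demand {
    final DemandType type;
    final Object payload;
    Demand(DemandType type, Object payload) { this.type = type; this.payload = payload; }
}

// Hypothetical tier manager: one worker function and one thread pool per demand type.
class SimpleTierManager implements AutoCloseable {
    private final Map<DemandType, ExecutorService> pools = new EnumMap<>(DemandType.class);
    private final Map<DemandType, Function<Object, Object>> workers = new EnumMap<>(DemandType.class);

    void register(DemandType type, int poolSize, Function<Object, Object> worker) {
        pools.put(type, Executors.newFixedThreadPool(poolSize));
        workers.put(type, worker);
    }

    // Routes the demand to the pool registered for its type; the result arrives asynchronously.
    CompletableFuture<Object> dispatch(Demand d) {
        ExecutorService pool = pools.get(d.type);
        if (pool == null) throw new IllegalArgumentException("no worker for " + d.type);
        Function<Object, Object> worker = workers.get(d.type);
        return CompletableFuture.supplyAsync(() -> worker.apply(d.payload), pool);
    }

    @Override public void close() { pools.values().forEach(ExecutorService::shutdown); }
}

class DoGrtaSketch {
    // A sample becomes an extraction demand; its result becomes a classification demand.
    static String recognize(SimpleTierManager tm, double[] sample) {
        return tm.dispatch(new Demand(DemandType.FEATURE_EXTRACTION, sample))
                 .thenCompose(features -> tm.dispatch(new Demand(DemandType.CLASSIFICATION, features)))
                 .thenApply(label -> (String) label)
                 .join();
    }
}
```

Because each demand type has its own pool, the extraction stage can be given more threads than classification (or, in a real deployment, more nodes), which is one way to picture the load balancing the summary attributes to DoGRTA.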

To ground the analysis in concrete code, the authors extract the actual class diagrams from both projects and single out key classes: MARF, Preprocessing, FeatureExtraction, Classification, Storage for DMARF; and GIPSYContext, DemandGenerator, DemandWorker, TierManager, GIPSYNode for GIPSY. They then search the code for design patterns, finding that DMARF heavily employs Factory (for creating specific FeatureExtractors), Strategy (to select extraction algorithms), and Observer (for result notification). GIPSY makes extensive use of Singleton (for the global context), Proxy (for remote demand handling), and Composite (for tier hierarchies). While these patterns support modularity and reuse, the authors note instances of over‑abstraction and unnecessary indirection that increase cognitive load.
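
The three DMARF-side patterns named above often work together: a factory chooses a concrete strategy, and observers are notified when a result is ready. The Java sketch below shows that combination using hypothetical names (`ExtractionStrategy`, `ExtractorFactory`, `ResultListener`, `ExtractionService`); it is a generic illustration of the patterns, not code taken from DMARF.

```java
import java.util.ArrayList;
import java.util.List;

// Strategy: interchangeable feature-extraction algorithms.
interface ExtractionStrategy { double[] extract(double[] samples); }

class MinMaxStrategy implements ExtractionStrategy {
    public double[] extract(double[] s) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double x : s) { min = Math.min(min, x); max = Math.max(max, x); }
        return new double[] { min, max };
    }
}

class MeanStrategy implements ExtractionStrategy {
    public double[] extract(double[] s) {
        double sum = 0;
        for (double x : s) sum += x;
        return new double[] { s.length == 0 ? 0 : sum / s.length };
    }
}

// Factory: callers request an algorithm by name instead of calling constructors.
final class ExtractorFactory {
    private ExtractorFactory() {}
    static ExtractionStrategy create(String name) {
        switch (name) {
            case "minmax": return new MinMaxStrategy();
            case "mean":   return new MeanStrategy();
            default: throw new IllegalArgumentException("unknown extractor: " + name);
        }
    }
}

// Observer: listeners are notified whenever features are produced.
interface ResultListener { void onFeatures(double[] features); }

class ExtractionService {
    private final List<ResultListener> listeners = new ArrayList<>();
    private final ExtractionStrategy strategy;
    ExtractionService(String algorithm) { this.strategy = ExtractorFactory.create(algorithm); }
    void addListener(ResultListener l) { listeners.add(l); }
    double[] run(double[] samples) {
        double[] features = strategy.extract(samples);
        for (ResultListener l : listeners) l.onFeatures(features);
        return features;
    }
}
```

Each layer adds one level of indirection; when a factory only ever returns one type, or a listener list never has more than one subscriber, that indirection becomes the kind of over-abstraction the authors caution against.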

The next phase involves automated static analysis using SonarQube, PMD, and FindBugs. The tools flag several high‑severity code smells: “God Class” in MARF (excessive responsibilities), “Long Method” in FeatureExtraction.process() and DemandGenerator.generateDemand(), “Feature Envy” where Classification frequently accesses MARF internals, and duplicated initialization code across multiple Tier implementations.
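
Duplicated initialization across sibling classes, such as the Tier code flagged above, is conventionally removed with Pull Up Method into a shared abstract base, often shaped as a Template Method. The summary does not say how the authors handled this particular smell, so the sketch below is only a generic illustration; `AbstractTier`, its subclasses, and the `tier.maxWorkers` property are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Initialization that was previously copy-pasted into every tier now lives here once.
abstract class AbstractTier {
    private final String id;
    private final int maxWorkers;
    private final List<String> log = new ArrayList<>();

    protected AbstractTier(String id, Properties config) {
        this.id = id;
        // Common parsing with a default, formerly duplicated in each subclass.
        this.maxWorkers = Integer.parseInt(config.getProperty("tier.maxWorkers", "4"));
    }

    // Template method: a fixed startup sequence around a subclass-specific hook.
    public final void start() {
        log.add(id + ": starting with " + maxWorkers + " workers");
        startTierSpecific();
        log.add(id + ": started");
    }

    protected abstract void startTierSpecific();

    protected void note(String msg) { log.add(id + ": " + msg); }
    public List<String> log() { return Collections.unmodifiableList(log); }
}

class DemandGeneratorTier extends AbstractTier {
    DemandGeneratorTier(Properties c) { super("generator", c); }
    @Override protected void startTierSpecific() { note("demand queue ready"); }
}

class DemandWorkerTier extends AbstractTier {
    DemandWorkerTier(Properties c) { super("worker", c); }
    @Override protected void startTierSpecific() { note("worker threads ready"); }
}
```

Each subclass now contains only what is genuinely specific to it, so a change to the shared configuration logic is made in one place instead of in every tier.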

Armed with this diagnosis, the team applies a disciplined refactoring plan. They extract a ConfigManager class from MARF to isolate logging and configuration concerns, split long methods into smaller, well‑named helpers, move data‑access logic from Classification into a DataProvider interface, and replace conditional branches for algorithm selection with a Strategy pattern implementation. Each change is accompanied by a suite of JUnit regression tests executed before and after refactoring to guarantee behavioral equivalence. Quantitative results show a 27 % reduction in average methods per class, an 18 % drop in cyclomatic complexity, and a 30 % decrease in SonarQube’s technical debt metric; test coverage improves from 85 % to 92 %.
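
Two of the steps above, replacing an algorithm-selecting conditional with a Strategy and confirming that behavior is unchanged, can be sketched together. The authors used JUnit regression tests; to keep this example self-contained, it instead compares the legacy and refactored versions directly over a set of inputs using plain Java checks. `LegacyDistance`, `Distances`, and the three distance metrics are hypothetical examples, not code from either project.

```java
import java.util.Map;

// Before: algorithm selection by conditional branching (the smell being removed).
class LegacyDistance {
    static double distance(String metric, double[] a, double[] b) {
        double acc = 0;
        if (metric.equals("euclidean")) {
            for (int i = 0; i < a.length; i++) acc += (a[i] - b[i]) * (a[i] - b[i]);
            return Math.sqrt(acc);
        } else if (metric.equals("manhattan")) {
            for (int i = 0; i < a.length; i++) acc += Math.abs(a[i] - b[i]);
            return acc;
        } else if (metric.equals("chebyshev")) {
            for (int i = 0; i < a.length; i++) acc = Math.max(acc, Math.abs(a[i] - b[i]));
            return acc;
        }
        throw new IllegalArgumentException("unknown metric: " + metric);
    }
}

// After: each branch becomes a strategy object looked up by name.
interface DistanceStrategy { double between(double[] a, double[] b); }

final class Distances {
    private Distances() {}
    static final DistanceStrategy EUCLIDEAN = (a, b) -> {
        double acc = 0;
        for (int i = 0; i < a.length; i++) acc += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(acc);
    };
    static final DistanceStrategy MANHATTAN = (a, b) -> {
        double acc = 0;
        for (int i = 0; i < a.length; i++) acc += Math.abs(a[i] - b[i]);
        return acc;
    };
    static final DistanceStrategy CHEBYSHEV = (a, b) -> {
        double acc = 0;
        for (int i = 0; i < a.length; i++) acc = Math.max(acc, Math.abs(a[i] - b[i]));
        return acc;
    };
    private static final Map<String, DistanceStrategy> BY_NAME =
        Map.of("euclidean", EUCLIDEAN, "manhattan", MANHATTAN, "chebyshev", CHEBYSHEV);

    static DistanceStrategy forName(String metric) {
        DistanceStrategy s = BY_NAME.get(metric);
        if (s == null) throw new IllegalArgumentException("unknown metric: " + metric);
        return s;
    }
}

// Characterization check: the refactored code must reproduce legacy results exactly.
class EquivalenceCheck {
    static boolean sameBehavior(double[][] inputs) {
        for (String m : new String[] { "euclidean", "manhattan", "chebyshev" }) {
            for (double[] a : inputs) {
                for (double[] b : inputs) {
                    double before = LegacyDistance.distance(m, a, b);
                    double after = Distances.forName(m).between(a, b);
                    if (Double.compare(before, after) != 0) return false;
                }
            }
        }
        return true;
    }
}
```

Because both versions run on the same inputs, any divergence introduced by the refactoring surfaces as a failed check, which is the role the summary assigns to the before-and-after regression suite. Adding a new metric now means registering one more strategy instead of editing a growing `if` chain.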

In conclusion, the paper demonstrates that a thorough architectural comparison can reveal synergistic integration opportunities, as exemplified by DoGRTA, which leverages GIPSY’s demand‑driven runtime to enhance DMARF’s scalability. The systematic identification of design patterns and code smells, followed by targeted refactoring, yields measurable improvements in code quality without altering external behavior. The authors argue that their methodology, which combines domain modeling, pattern analysis, static‑analysis tooling, and regression testing, offers a practical roadmap for developers seeking to evolve and maintain complex open‑source distributed systems. Future work may explore performance benchmarking of DoGRTA, deeper semantic integration of intensional programming constructs into audio‑recognition pipelines, and the extension of the refactoring framework to other OSS projects.

