Dataspace architecture and manage its components class projection
Big Data technology is described. Big data is a popular term for the exponential growth and availability of both structured and unstructured data. A dataspace architecture is constructed; the dataspace is focused on providing expertise in business intelligence and data warehousing strategy and implementation. Dataspaces are an abstraction in data management that aims to overcome some of the problems encountered in data integration systems; in our case, heterogeneous data are represented as a block vector. Traditionally, data integration and data exchange systems have aimed to offer many of the purported services of dataspace systems. Dataspaces can be viewed as the next step in the evolution of data integration architectures, but they differ from current data integration systems in the following way: data integration systems require semantic integration before any services can be provided. Hence, although there is no single schema to which all the data conform, and the data reside on a multitude of host systems, the data integration system knows the precise relationships between the terms used in each schema. As a result, significant up-front effort is required to set up a data integration system. Data integration from the different sources is realized with SQL Server Integration Services (SSIS). The portal is developed using the Model-View-Controller (MVC) architectural pattern. Specifics of debugging the dataspace as a complex system are discussed, and the query translator is given in Backus-Naur Form (BNF).
💡 Research Summary
The paper titled “Dataspace architecture and manage its components class projection” presents a conceptual framework for integrating heterogeneous data sources in the era of big data. The authors argue that traditional data integration systems require extensive upfront semantic mapping and a unified schema, which is costly and inflexible. In contrast, their proposed “dataspace” architecture embraces the existence of multiple, disparate schemas and focuses on incremental integration through layered services.
The architecture is organized into three logical layers: the Data Level, the Manage Level, and the Metadescribe Level. The Data Level stores structured databases, semi‑structured XML or spreadsheets, and unstructured text as a “block vector” of information products. The Manage Level provides modules for user permission determination, metadata handling, data cleansing, uncertainty elimination, quality assessment, and query transformation. The Metadescribe Level holds meta‑descriptions of data sources, including schema definitions and access protocols, and defines generic operations such as selection, grouping, and keyword search for different data types. Figures in the paper illustrate the hierarchical organization and the interaction between these modules.
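The three-layer organization described above can be sketched in code. This is a minimal, hypothetical illustration: the paper does not give class definitions for its layers, so all names (DataBlock, MetaDescription, ManageLevel) and the modeling of the "block vector" as a typed list are assumptions made for clarity.

```java
// Hypothetical sketch of the three-layer dataspace organization:
// Data Level (blocks), Metadescribe Level (source descriptions),
// Manage Level (coordination and generic operations).
import java.util.ArrayList;
import java.util.List;

public class DataspaceSketch {
    // Data Level: the "block vector" modeled here as a list of typed blocks.
    enum BlockKind { STRUCTURED, SEMI_STRUCTURED, UNSTRUCTURED }

    record DataBlock(BlockKind kind, String sourceName) {}

    // Metadescribe Level: a meta-description of one source, including its
    // schema definition and access protocol.
    record MetaDescription(String sourceName, String schema, String accessProtocol) {}

    // Manage Level: registers sources and exposes generic operations.
    static class ManageLevel {
        private final List<DataBlock> blockVector = new ArrayList<>();
        private final List<MetaDescription> metadata = new ArrayList<>();

        void register(DataBlock block, MetaDescription meta) {
            blockVector.add(block);
            metadata.add(meta);
        }

        // Keyword search is one of the generic operations the paper defines
        // for all data types; here it simply matches source names.
        List<String> keywordSearch(String keyword) {
            List<String> hits = new ArrayList<>();
            for (MetaDescription m : metadata) {
                if (m.sourceName().contains(keyword)) hits.add(m.sourceName());
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        ManageLevel manage = new ManageLevel();
        manage.register(new DataBlock(BlockKind.STRUCTURED, "sales_db"),
                        new MetaDescription("sales_db", "relational", "ODBC"));
        manage.register(new DataBlock(BlockKind.UNSTRUCTURED, "reports_txt"),
                        new MetaDescription("reports_txt", "free text", "file"));
        System.out.println(manage.keywordSearch("sales")); // [sales_db]
    }
}
```

The point of the sketch is the separation of concerns: the Manage Level never touches blocks directly but works through the meta-descriptions, which is what lets new heterogeneous sources be added incrementally.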
Implementation is based on Microsoft SQL Server Integration Services (SSIS) for ETL processing and on the Model‑View‑Controller (MVC) pattern for the user portal. The authors claim that SSIS’s pipeline architecture allows data to flow without intermediate storage, using buffering and partial caching to improve performance on large tables. They also describe a query translator expressed in Backus‑Naur Form (BNF) that converts user‑level meta‑language queries into executable SQL or NoSQL statements, although concrete translation rules are not provided.
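Since the paper presents the translator's grammar in BNF but omits the concrete translation rules, the following is an invented sketch of what such a translator could look like for a tiny assumed meta-language. The grammar fragment and the `get ... from ... where ...` syntax are assumptions, not the paper's actual grammar.

```java
// A minimal, hypothetical translator in the spirit of a BNF-defined
// meta-language that is rewritten into SQL.
//
// Assumed grammar fragment (BNF):
//   <query>  ::= "get" <fields> "from" <source> [ "where" <cond> ]
//   <fields> ::= <ident> { "," <ident> }
public final class MetaQueryTranslator {
    public static String toSql(String metaQuery) {
        String q = metaQuery.trim();
        if (!q.startsWith("get ")) {
            throw new IllegalArgumentException("query must start with 'get'");
        }
        int fromIdx = q.indexOf(" from ");
        if (fromIdx < 0) throw new IllegalArgumentException("missing 'from'");
        String fields = q.substring(4, fromIdx).trim();
        String rest = q.substring(fromIdx + 6).trim();
        int whereIdx = rest.indexOf(" where ");
        if (whereIdx < 0) {
            // No condition: emit a plain projection over the source.
            return "SELECT " + fields + " FROM " + rest;
        }
        String source = rest.substring(0, whereIdx).trim();
        String cond = rest.substring(whereIdx + 7).trim();
        return "SELECT " + fields + " FROM " + source + " WHERE " + cond;
    }

    public static void main(String[] args) {
        System.out.println(toSql("get name, age from customers where age > 30"));
        // SELECT name, age FROM customers WHERE age > 30
    }
}
```

A production translator would build a parse tree from the BNF grammar rather than splitting on keywords, and would emit either SQL or a NoSQL query depending on the target source's meta-description; the string-based version above only shows the overall shape of the rewriting step.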
A substantial portion of the paper is devoted to object‑oriented class specifications. Core classes include Model, Entity, Relation, and Attribute, each with properties and methods for loading, modifying, and persisting metadata. The Model class encapsulates connections to data sources, maintains collections of entities and relations, and offers CRUD operations on the catalog. Entity objects contain attributes, constraints, and methods for retrieving inbound and outbound relationships. Relation objects define start and end entities, cardinality constraints, and persistence functions. The design also incorporates Enterprise JavaBeans (EJB) lifecycle methods (ejbCreate, ejbRemove, etc.) and a connection‑pool interface, suggesting an intention to deploy the system in an enterprise environment.
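The Model/Entity/Relation/Attribute hierarchy described above can be sketched as plain Java classes. The field names, the "1:N" cardinality string, and the placement of the outbound-relation lookup on Model are illustrative assumptions; the paper's exact signatures, EJB lifecycle methods, and persistence functions are not reproduced here.

```java
// Illustrative sketch of the paper's metadata classes: a Model holds
// Entities and Relations; Entities hold Attributes; Relations connect
// a start entity to an end entity with a cardinality constraint.
import java.util.ArrayList;
import java.util.List;

public class MetadataModel {
    static class Attribute {
        final String name;
        final String type;
        Attribute(String name, String type) { this.name = name; this.type = type; }
    }

    static class Entity {
        final String name;
        final List<Attribute> attributes = new ArrayList<>();
        Entity(String name) { this.name = name; }
        void addAttribute(Attribute a) { attributes.add(a); }
    }

    static class Relation {
        final Entity start;
        final Entity end;
        final String cardinality; // e.g. "1:N"
        Relation(Entity start, Entity end, String cardinality) {
            this.start = start; this.end = end; this.cardinality = cardinality;
        }
    }

    static class Model {
        final List<Entity> entities = new ArrayList<>();
        final List<Relation> relations = new ArrayList<>();

        void addEntity(Entity e) { entities.add(e); }
        void addRelation(Relation r) { relations.add(r); }

        // The paper gives Entity methods for inbound/outbound relationships;
        // for brevity this sketch computes outbound relations on the Model.
        List<Relation> outbound(Entity e) {
            List<Relation> out = new ArrayList<>();
            for (Relation r : relations) if (r.start == e) out.add(r);
            return out;
        }
    }

    public static void main(String[] args) {
        Model m = new Model();
        Entity customer = new Entity("Customer");
        customer.addAttribute(new Attribute("id", "int"));
        Entity order = new Entity("Order");
        m.addEntity(customer);
        m.addEntity(order);
        m.addRelation(new Relation(customer, order, "1:N"));
        System.out.println(m.outbound(customer).size()); // prints 1
    }
}
```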
Despite its breadth, the paper suffers from several critical shortcomings. The notion of a “block vector” is introduced without a formal definition or implementation details, leaving readers uncertain how heterogeneous data are physically represented. No performance benchmarks, scalability tests, or comparative analysis with existing solutions such as data lakes, federated databases, or Hadoop‑based platforms are presented. The class diagrams exhibit inconsistent naming conventions, and the mixing of EJB concepts with SSIS components blurs architectural boundaries. Moreover, the English language quality is poor, with numerous grammatical errors that impede comprehension.
In summary, the work contributes a high‑level vision of a dataspace‑centric integration platform and demonstrates a prototype built with SSIS and MVC. However, the lack of concrete algorithms, empirical evaluation, and clear differentiation from established technologies limits its scholarly impact. Future research should focus on formalizing the block‑vector representation, implementing robust query translation mechanisms, conducting rigorous performance evaluations, and clarifying the integration of enterprise middleware components.