Interpolating scattered CFD datasets onto a uniform Cartesian grid can distort the true geometry, producing a convex-hull-type envelope and activating nonphysical regions. This work presents a reconstruction framework that recovers physically consistent masks before exporting CNN-ready fields. It introduces two novel strategies, distance-based masking and an adaptive alpha-shape formulation that normalizes alpha using local data resolution, and evaluates them against classical alpha-shape boundary recovery. A quantitative, topology-aware metric suite is introduced to assess retention, suppression of unsupported regions, overlap consistency, and connectivity. The novel distance-based method is robust across the geometries considered under the same threshold rule, with tau set to the minimum CFD grid spacing, and achieves speedups of 500-800 times over classical alpha-shapes. The adaptive alpha-shape remains stable when its control parameter is set to 1 and is 1.7-2.6 times faster than the classical variant, which requires geometry-specific alpha tuning. A lightweight boundary inflation post-process using a minimal dilation further improves retention by up to 2.96% with negligible unsupported activation (less than 0.08%). Overall, the distance-based method is recommended as the default due to its accuracy, stability, minimal tuning, and low cost, while the adaptive alpha-shape is a strong alternative when grid-spacing information for threshold selection is unavailable. A companion web application operationalizes the workflow end to end, enabling 2D ASCII dataset upload, parameter tuning, mask and boundary generation, and export of CNN-ready outputs.
Convolutional Neural Networks (CNNs) have emerged as one of the most powerful architectures for extracting hierarchical and spatially correlated features from structured data [1][2][3]. Originally developed for computer vision [4,5], they have proven highly effective in learning multi-scale representations from complex scientific and engineering datasets [6][7][8]. Their strength lies in the convolutional operation, which employs local receptive fields and weight sharing to capture spatial dependencies while ensuring translational invariance and computational efficiency. By integrating convolutional, pooling, normalization, and activation layers, CNNs progressively learn both low- and high-level abstractions of physical fields and patterns. When trained on properly structured and normalized datasets, CNNs can act as fast, data-driven surrogates for computationally expensive simulations and as versatile tools for feature extraction, dimensionality reduction, and pattern recognition [9][10][11][12]. Their ability to approximate nonlinear mappings between boundary conditions and physical responses enables near-real-time prediction and optimization across many domains. In fluid dynamics and heat transfer, CNNs have been used to reconstruct velocity [13,14], pressure [15,16], and temperature [17,18] fields, predict turbulent structures [19,20], and model thermo-fluid phenomena such as convection, boundary-layer development, and flow separation [21][22][23]. They accelerate high-fidelity simulations and support the design of efficient thermal systems, including heat exchangers, energy storage units, and phase-change processes [24][25][26]. Beyond thermo-fluids, CNNs play a major role in materials science for microstructural analysis and property prediction [27,28], medical imaging for segmentation and diagnosis [29,30], and geosciences for weather forecasting, seismic interpretation, and groundwater modeling [31][32][33].
By efficiently capturing spatial dependencies and preserving geometric fidelity, CNNs bridge data-driven inference with physical interpretability, motivating the development of preprocessing and domain-reconstruction frameworks for generating CNN-ready structured datasets.
The performance of CNNs fundamentally depends on the availability of structured and grid-aligned datasets, as convolution operations assume spatial regularity with uniformly distributed neighboring points. Such representations enable consistent kernel application, efficient computation, and progressive feature extraction across multiple scales. In contrast, most scientific and engineering datasets are unstructured or scattered, obtained from irregular meshes, experiments, or point clouds with missing regions [34,35]. To address this, point-based neural networks such as PointNet [36], PointNet++ [37], PointConv [38], and Kernel Point Convolution (KPConv) [39], along with graph-based architectures such as Graph Convolutional Network (GCN) [40], Graph SAmple and aggreGatE (GraphSAGE) [41], ChebNet [42], Graph Attention Network (GAT) [43], and MeshCNN [44], have been developed to process irregular or non-Euclidean data directly. These methods effectively preserve geometric and topological relationships without requiring data regularization. However, they often entail high computational cost, complex neighborhood construction, and limited efficiency in capturing local spatial correlations for large-scale multidimensional fields [45,46]. In comparison, CNNs leverage structured grids with well-defined receptive fields, offering efficient convolution operations, spatial translation invariance, and straightforward parallelization on modern Graphics Processing Units (GPUs). Their scalability, computational efficiency, and training stability make them particularly suitable for large scientific and engineering datasets. Consequently, transforming unstructured or scattered data into structured, grid-aligned representations through data reconstruction and domain regularization remains a practical and powerful approach [5,47].
Yet, accurately recovering domain boundaries, especially near irregular or disconnected regions, remains a central challenge for maintaining geometric fidelity and physical consistency. This difficulty arises because uniform-grid interpolation inherently distorts the geometry, often creating a convex envelope that includes spurious regions. This necessitates reconstruction methods capable of resolving the true concave physical boundaries from these distorted representations [48].
Distance-based boundary recovery methods, often implemented through thresholding or level-set formulations, provide a simple yet highly effective strategy for reconstructing geometric domains from scattered scientific or engineering datasets. The core idea is to compute a distance field between each point on a structured grid and the nearest scattered sample, typically using efficient Nearest-Neighbor (NN) searches such as K-Dimensional (KD) tr
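As a rough illustration of the distance-field idea described above, the following Python sketch marks a grid node as active when its distance to the nearest scattered sample does not exceed a threshold tau. It is not the authors' implementation: the function name distance_mask, the L-shaped test geometry, the inclusive comparison (d <= tau), and the brute-force nearest-neighbor search (used here in place of a KD-tree to keep the sketch dependency-free) are all assumptions made for illustration. Following the abstract, tau is set to the sample spacing.

```python
import math


def distance_mask(samples, xs, ys, tau):
    """Return mask[j][i] = True where grid node (xs[i], ys[j]) lies within
    tau of its nearest sample.

    Brute-force nearest-neighbor search, O(grid nodes * samples); an
    assumed stand-in for a KD-tree query, chosen to avoid dependencies.
    """
    mask = []
    for y in ys:
        row = []
        for x in xs:
            # Distance from this grid node to the nearest scattered sample.
            d = min(math.hypot(x - sx, y - sy) for sx, sy in samples)
            # Assumption: inclusive threshold; the source does not state
            # whether the comparison is strict.
            row.append(d <= tau)
        mask.append(row)
    return mask


# Hypothetical test geometry: an L-shaped domain on [0, 1]^2 whose
# upper-right quadrant (x > 0.5 and y > 0.5) contains no samples.
h = 0.125                      # sample spacing (exactly representable)
coords = [i * h for i in range(9)]
samples = [(x, y) for x in coords for y in coords
           if not (x > 0.5 and y > 0.5)]

# Target uniform Cartesian grid covering the bounding box of the samples.
xs, ys = coords, coords

# Threshold set to the minimum sample spacing, as stated in the abstract.
tau = h
mask = distance_mask(samples, xs, ys, tau)

# Print the mask with y increasing upward: '#' active, '.' masked out.
for row in reversed(mask):
    print("".join("#" if active else "." for active in row))
```

In this toy case the nodes deep inside the empty quadrant stay inactive, whereas a convex-hull envelope would have activated all of them. Notch nodes lying exactly one spacing from a sample are still activated under the inclusive comparison. For realistic dataset sizes the inner loop would be replaced by a KD-tree query.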