Multi-Modal Data Fusion for Moisture Content Prediction in Apple Drying

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Fruit drying is widely used in food manufacturing to reduce product moisture, ensure product safety, and extend shelf life. Accurately predicting final moisture content (MC) is critical for quality control of drying processes. State-of-the-art methods can build deterministic relationships between process parameters and MC, but cannot adequately account for the inherent process variabilities that are ubiquitous in fruit drying. To address this gap, this paper presents a novel multi-modal data fusion framework that fuses two data modalities — tabular data (process parameters) and high-dimensional image data (images of dried apple slices) — to enable accurate MC prediction. The proposed modeling architecture permits flexible adjustment of the relative information contribution from the tabular and image modalities. Experimental validation shows that the multi-modal approach improves predictive accuracy substantially compared to state-of-the-art methods, reducing root-mean-squared error by 19.3%, 24.2%, and 15.2% over tabular-only, image-only, and standard tabular-image fusion models, respectively. Furthermore, the method is shown to be robust across varied tabular-image ratios and capable of effectively capturing inherent small-scale process variabilities. The proposed framework is extensible to a variety of other drying technologies.


💡 Research Summary

The paper introduces a novel multimodal data-fusion framework for predicting the final moisture content (MC) of apple slices undergoing hot-air drying. Recognizing that traditional models rely solely on structured process parameters (temperature, air velocity, drying time) and thus overlook inherent sample-specific variations such as color, thickness, and weight, the authors propose to combine these tabular inputs with high-dimensional in-situ images of the dried slices. The experimental campaign uses three temperature levels (60 °C, 70 °C, 80 °C) and two air-velocity settings (1.5 m/s, 2.5 m/s), generating 84 data points, each comprising three tabular variables and a calibrated RGB image captured under controlled lighting.

In the modeling pipeline, images are first color-balanced and segmented with the Segment Anything Model (SAM) to isolate the fruit, then fed into a ResNet-18 backbone that outputs a 512-dimensional visual embedding. In parallel, the tabular data pass through a three-layer fully connected (FC) network that also produces a 512-dimensional embedding. The two embeddings are concatenated with an adjustable weighting ratio and processed by an additional FC layer (1024 units) to predict MC.

Baseline comparisons involve a tabular-only regression model, an image-only ResNet-18, and a standard multimodal model that simply concatenates raw features. Results show that the proposed architecture reduces root-mean-squared error (RMSE) by 19.3% relative to the tabular-only model, 24.2% relative to the image-only model, and 15.2% relative to the standard multimodal baseline. Ablation studies confirm that the adjustable fusion ratio allows the network to capture subtle sample-level variability that is invisible to process parameters alone. The authors also emphasize a data-splitting strategy that prevents overlap between training and evaluation sets, improving robustness on the limited dataset.
Limitations include the modest sample size and sensitivity to lighting conditions, suggesting the need for further calibration in real‑world settings. Future work is outlined to extend the framework to other drying technologies (e.g., freeze‑drying), incorporate additional sensor modalities (thermal imaging, acoustic emissions), and explore transformer‑based cross‑modal attention mechanisms for even richer feature integration. Overall, the study demonstrates that a carefully designed multimodal fusion pipeline can substantially enhance predictive accuracy and robustness in food‑processing quality control.
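The reported percentage improvements are relative RMSE reductions. The computation is standard, and the sketch below shows it with hypothetical numbers (the paper's absolute RMSE values are not given in this summary, so the 0.10 / 0.0807 pair is invented purely to reproduce the 19.3% figure):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between targets and predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_reduction(baseline_rmse, proposed_rmse):
    """Relative RMSE reduction in percent, as quoted in the paper."""
    return 100.0 * (baseline_rmse - proposed_rmse) / baseline_rmse

# Hypothetical illustration (not the paper's data): a tabular-only
# baseline with RMSE 0.10 and a fused model with RMSE 0.0807 gives
# the 19.3% reduction quoted for the tabular-only comparison.
print(round(rmse_reduction(0.10, 0.0807), 1))  # -> 19.3
```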

