UniCrop: A Universal, Multi-Source Data Engineering Pipeline for Scalable Crop Yield Prediction


📝 Original Info

  • Title: UniCrop: A Universal, Multi-Source Data Engineering Pipeline for Scalable Crop Yield Prediction
  • ArXiv ID: 2601.01655
  • Date: 2026-01-04
  • Authors: Emiliya Khidirova, Oktay Karakuş

📝 Abstract

Accurate crop yield prediction increasingly relies on diverse data streams, including satellite observations, meteorological reanalysis, soil composition, and topographic information. However, despite rapid advances in machine learning, most existing approaches remain crop- or region-specific and require substantial bespoke data engineering efforts. This limits scalability, reproducibility, and operational deployment. This study introduces UniCrop, a universal and reusable data pipeline designed to automate the acquisition, cleaning, harmonisation, and feature engineering of multi-source environmental data for crop yield prediction. For any given location, crop type, and temporal window, UniCrop automatically retrieves, harmonises, and engineers over 200 environmental variables from heterogeneous satellite, climate, soil, and topographic sources (Sentinel-1/2, MODIS, ERA5-Land, NASA POWER, SoilGrids, and SRTM), reducing them to a compact, analysis-ready feature set utilising a structured feature reduction workflow with minimum redundancy maximum relevance (mRMR). To validate the pipeline, UniCrop was applied to a rice yield dataset comprising 557 field observations. Using only the selected 15 features, four baseline machine-l...
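To make the mRMR step named in the abstract more concrete, here is a minimal sketch of greedy minimum redundancy maximum relevance selection. It is an illustration under stated assumptions, not the UniCrop implementation: the excerpt does not say which relevance or redundancy measure the authors use, so absolute Pearson correlation is swapped in for both, and the function name `mrmr_select` and the synthetic data are hypothetical.

```python
import numpy as np


def mrmr_select(X, y, k):
    """Greedy mRMR sketch (hypothetical helper, not from the paper).

    Assumption: absolute Pearson correlation stands in for both the
    relevance and redundancy measures. Columns must not be constant,
    otherwise the correlation is undefined.
    """
    n_features = X.shape[1]
    # Relevance: |corr(feature, target)| for every column.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    # Pairwise |corr| between features, used to measure redundancy.
    feature_corr = np.abs(np.corrcoef(X, rowvar=False))

    # Start with the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < min(k, n_features):
        remaining = [j for j in range(n_features) if j not in selected]
        # Redundancy: mean |corr| with the features already chosen.
        redundancy = feature_corr[np.ix_(remaining, selected)].mean(axis=1)
        # Difference form of the criterion: relevance minus redundancy.
        scores = relevance[remaining] - redundancy
        selected.append(remaining[int(np.argmax(scores))])
    return selected


# Synthetic demonstration data (not the rice yield dataset).
rng = np.random.default_rng(0)
signal_a = rng.normal(size=500)
near_copy = signal_a + 0.01 * rng.normal(size=500)  # almost duplicates signal_a
signal_b = rng.normal(size=500)
X_demo = np.column_stack([signal_a, near_copy, signal_b])
y_demo = signal_a + 0.5 * signal_b + 0.1 * rng.normal(size=500)

print(mrmr_select(X_demo, y_demo, k=2))
```

In this toy setup the near-duplicate column is penalised for redundancy, so the second pick is the independent signal rather than the copy. A mutual-information or F-statistic score could replace the correlation terms without changing the greedy loop.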

📄 Full Content

...(The body text is long and has been omitted. Please see the full text on the site.)
