Title: PointRAFT: 3D deep learning for high-throughput prediction of potato tuber weight from partial point clouds
ArXiv ID: 2512.24193
Date: 2025-12-30
Authors: Pieter M. Blok (Graduate School of Agricultural and Life Sciences, The University of Tokyo), Haozhou Wang (The University of Tokyo), Hyun Kwon Suh (Department of Integrative Biological Sciences and Industry, Sejong University), Peicheng Wang (The University of Tokyo), James Burridge (The University of Tokyo), Wei Guo (The University of Tokyo)
📝 Abstract
Potato yield is a key indicator for optimizing cultivation practices in precision agriculture. Potato yield can be estimated directly on a harvester using RGB-D cameras, which capture three-dimensional (3D) information of individual tubers moving along the conveyor belt. A major challenge, however, is that the 3D point clouds reconstructed from RGB-D images are incomplete due to self-occlusion, leading to systematic underestimation of tuber weight. To overcome this limitation, we introduce PointRAFT, a high-throughput point cloud regression network that directly predicts continuous 3D shape properties, such as tuber weight, from partial point clouds. Rather than reconstructing complete 3D geometry, PointRAFT infers target values directly from raw 3D data. Its key architectural novelty is an object height embedding that incorporates tuber height as an additional geometric cue, improving regression performance under practical harvesting conditions. PointRAFT was trained and evaluated on a dataset of 26,688 partial point clouds collected from 859 potato tubers across four cultivars and three growing seasons on an operational harvester in Japan. On a test set of 5,254 point clouds representing 172 unique tubers, PointRAFT achieved a mean absolute error (MAE) of 12.0 g and a root mean squared error (RMSE) of 17.2 g, substantially outperforming a linear regression baseline with an MAE of 23.0 g and an RMSE of 31.8 g. The proposed height embedding reduced RMSE by 30% compared to a standard PointNet++ regression network. With an average analysis time of 6.3 ms per point cloud, PointRAFT enables processing rates of up to 150 tubers per second, meeting the high-throughput requirements of commercial potato harvesters. Beyond potato weight estimation, PointRAFT provides a versatile regression network applicable to a wide range of 3D phenotyping and robotic perception tasks. 
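The error metrics and throughput figure quoted above follow from standard definitions. The sketch below illustrates them with invented numbers (the weight arrays are for demonstration only and are not from the paper's test set):

```python
import numpy as np

# Illustrative ground-truth tuber weights (g) and predictions (g);
# these values are invented for demonstration only.
y_true = np.array([95.0, 120.0, 80.0, 150.0])
y_pred = np.array([90.0, 131.0, 86.0, 142.0])

mae = np.mean(np.abs(y_pred - y_true))           # mean absolute error (g)
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))  # root mean squared error (g)

# At 6.3 ms per point cloud, the theoretical ceiling is ~158 clouds/s,
# consistent with the paper's conservative "up to 150 tubers per second".
throughput = 1.0 / 0.0063
```

Note that RMSE penalizes large individual errors more heavily than MAE, which is why the two are reported together.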
The code, network weights, and a subset of the dataset are publicly available at https://github.com/pieterblok/pointraft.git.
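The paper's key architectural idea, injecting the tuber's height as an extra geometric cue alongside learned point features, can be illustrated with a minimal sketch. Everything below (layer sizes, the `embed_height` helper, the random untrained weights, and the concatenation point) is hypothetical and not taken from the PointRAFT implementation; it only shows the general pattern of fusing a scalar cue with a pooled point-cloud feature before a regression head.

```python
import numpy as np

rng = np.random.default_rng(0)

def pooled_point_feature(points: np.ndarray, dim: int = 32) -> np.ndarray:
    """Stand-in for a PointNet++-style encoder: a random linear projection
    of each 3D point, a ReLU, and max-pooling over the point dimension."""
    w = rng.standard_normal((points.shape[1], dim))
    return np.maximum(points @ w, 0.0).max(axis=0)  # shape: (dim,)

def embed_height(height_m: float, dim: int = 8) -> np.ndarray:
    """Hypothetical height embedding: a sinusoidal encoding of the scalar
    object height, a cue the partial (self-occluded) cloud lacks."""
    freqs = 2.0 ** np.arange(dim // 2)
    return np.concatenate([np.sin(freqs * height_m), np.cos(freqs * height_m)])

def predict_weight(points: np.ndarray, height_m: float) -> float:
    """Toy regression head: concatenate the pooled point feature with the
    height embedding and apply a single (untrained) linear layer."""
    feat = np.concatenate([pooled_point_feature(points), embed_height(height_m)])
    w_out = rng.standard_normal(feat.shape[0])
    return float(feat @ w_out)

# A partial point cloud of 512 (x, y, z) points and a measured height of 45 mm.
cloud = rng.standard_normal((512, 3))
weight = predict_weight(cloud, height_m=0.045)
```

The design point is that the height scalar bypasses the point encoder entirely, so the regressor can compensate for geometry lost to self-occlusion; the real network learns its weights from the 26,688 training clouds.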
📄 Full Content
PointRAFT: 3D deep learning for high-throughput prediction of potato tuber weight from partial point clouds⋆
Pieter M. Bloka,∗, Haozhou Wanga, Hyun Kwon Suhb, Peicheng Wanga, James Burridgea and Wei Guoa
aGraduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Midori-cho, Nishitokyo-city, 188-0002, Tokyo, Japan
bDepartment of Integrative Biological Sciences and Industry, Sejong University, 209 Neungdong-ro, 05006, Seoul, Republic of Korea
ARTICLE INFO
Keywords:
Potato
3D Deep Learning
RGB-D
Point Cloud
Regression
1. Introduction
Potatoes (Solanum tuberosum) are an important component of the human diet, as they provide high-energy carbohydrates, vitamin C, and dietary fibers (Camire, Kubow and Donnelly, 2009). To safeguard the role of potatoes in human nutrition, further optimization of potato production is needed (Zhang, Xu, Wu, Hu and Dai, 2017). A major step toward this improvement is through precision agriculture. Precision agriculture enables site-specific application of fertilizers and crop protection products, which leads to higher yields, lower costs, and reduced environmental pressure (Bullock, Lowenberg-DeBoer and Swinton, 2002; Van Evert, Gaitán-Cremaschi, Fountas and Kempenaar, 2017). To steer precision agriculture practices, detailed information on potato yield is required. In current practice, potato yield mapping can be performed using load cells attached to the harvester's conveyor belt to measure the mass of harvested produce in real time (Zamani, Gholamiparashkohi, Faghavi and Ghezavati, 2014; Kabir, Myat Swe, Kim, Chung, Jeong and Lee, 2018). Although load-cell systems are easy to use and maintain, they suffer from a major limitation: they measure gross mass, including tare such as soil clods, stones, and plant residue. The inclusion of tare can lead to overestimation of tuber yield, particularly in areas where large amounts of soil or crop residue are harvested together with the potato tubers.

⋆This study is funded by the Sarabetsu Village "Endowed Chair for Field Phenomics" project in Hokkaido, Japan.
∗Corresponding author: pieter.blok@fieldphenomics.com (P.M. Blok).
ORCID(s): 0000-0001-9535-5354 (P.M. Blok); 0000-0001-6135-402X (H. Wang); 0000-0003-4771-9365 (H.K. Suh); 0000-0002-2194-9894 (J. Burridge); 0000-0002-3017-5464 (W. Guo)

A more accurate alternative is the use of camera-based yield monitoring systems, which can visually distinguish potato tubers from tare. Such systems have been explored in the scientific literature since the early 2000s (Noordam, Otten, Timmermans and van Zwol, 2000; Hofstee and Molema, 2003; ElMasry, Cubero, Moltó and Blasco, 2012; Razmjooy, Mousavi and Soleymani, 2012; Lee, Kim, Lee and Shin, 2018; Long, Wang, Zhai, Wu, Li, Sun and Su, 2018; Si, Sankaran, Knowles and Pavek, 2018; Su, Kondo, Li, Sun, Al Riza and Habaragamuwa, 2018; Pandey, Kumar and Pandey, 2019; Cai, Jin, Xu and Yang, 2020; Lee and Shin, 2020; Dolata,