Title: PointRAFT: 3D deep learning for high-throughput prediction of potato tuber weight from partial point clouds
ArXiv ID: 2512.24193
Date: 2025-12-30
Authors: Pieter M. Blok (Graduate School of Agricultural and Life Sciences, The University of Tokyo), Haozhou Wang (The University of Tokyo), Hyun Kwon Suh (Department of Integrative Biological Sciences and Industry, Sejong University), Peicheng Wang (The University of Tokyo), James Burridge (The University of Tokyo), Wei Guo (The University of Tokyo)
📝 Abstract
Potato yield is a key indicator for optimizing cultivation practices in precision agriculture. Potato yield can be estimated directly on a harvester using RGB-D cameras, which capture three-dimensional (3D) information of individual tubers moving along the conveyor belt. A major challenge, however, is that the 3D point clouds reconstructed from RGB-D images are incomplete due to self-occlusion, leading to systematic underestimation of tuber weight. To overcome this limitation, we introduce PointRAFT, a high-throughput point cloud regression network that directly predicts continuous 3D shape properties, such as tuber weight, from partial point clouds. Rather than reconstructing complete 3D geometry, PointRAFT infers target values directly from raw 3D data. Its key architectural novelty is an object height embedding that incorporates tuber height as an additional geometric cue, improving regression performance under practical harvesting conditions. PointRAFT was trained and evaluated on a dataset of 26,688 partial point clouds collected from 859 potato tubers across four cultivars and three growing seasons on an operational harvester in Japan. On a test set of 5,254 point clouds representing 172 unique tubers, PointRAFT achieved a mean absolute error (MAE) of 12.0 g and a root mean squared error (RMSE) of 17.2 g, substantially outperforming a linear regression baseline with an MAE of 23.0 g and an RMSE of 31.8 g. The proposed height embedding reduced RMSE by 30% compared to a standard PointNet++ regression network. With an average analysis time of 6.3 ms per point cloud, PointRAFT enables processing rates of up to 150 tubers per second, meeting the high-throughput requirements of commercial potato harvesters. Beyond potato weight estimation, PointRAFT provides a versatile regression network applicable to a wide range of 3D phenotyping and robotic perception tasks. 
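The error metrics and throughput figure quoted above follow from standard definitions. The sketch below illustrates them with invented numbers (the weight arrays are for demonstration only and are not from the paper's test set):

```python
import numpy as np

# Illustrative ground-truth tuber weights (g) and predictions (g);
# these values are invented for demonstration only.
y_true = np.array([95.0, 120.0, 80.0, 150.0])
y_pred = np.array([90.0, 131.0, 86.0, 142.0])

mae = np.mean(np.abs(y_pred - y_true))           # mean absolute error (g)
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))  # root mean squared error (g)

# At 6.3 ms per point cloud, the theoretical ceiling is ~158 clouds/s,
# consistent with the paper's conservative "up to 150 tubers per second".
throughput = 1.0 / 0.0063
```

Note that RMSE penalizes large individual errors more heavily than MAE, which is why the two are reported together.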
The code, network weights, and a subset of the dataset are publicly available at https://github.com/pieterblok/pointraft.git.
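The paper's key architectural idea, injecting the tuber's height as an extra geometric cue alongside learned point features, can be illustrated with a minimal sketch. Everything below (layer sizes, the `embed_height` helper, the random untrained weights, and the concatenation point) is hypothetical and not taken from the PointRAFT implementation; it only shows the general pattern of fusing a scalar cue with a pooled point-cloud feature before a regression head.

```python
import numpy as np

rng = np.random.default_rng(0)

def pooled_point_feature(points: np.ndarray, dim: int = 32) -> np.ndarray:
    """Stand-in for a PointNet++-style encoder: a random linear projection
    of each 3D point, a ReLU, and max-pooling over the point dimension."""
    w = rng.standard_normal((points.shape[1], dim))
    return np.maximum(points @ w, 0.0).max(axis=0)  # shape: (dim,)

def embed_height(height_m: float, dim: int = 8) -> np.ndarray:
    """Hypothetical height embedding: a sinusoidal encoding of the scalar
    object height, a cue the partial (self-occluded) cloud lacks."""
    freqs = 2.0 ** np.arange(dim // 2)
    return np.concatenate([np.sin(freqs * height_m), np.cos(freqs * height_m)])

def predict_weight(points: np.ndarray, height_m: float) -> float:
    """Toy regression head: concatenate the pooled point feature with the
    height embedding and apply a single (untrained) linear layer."""
    feat = np.concatenate([pooled_point_feature(points), embed_height(height_m)])
    w_out = rng.standard_normal(feat.shape[0])
    return float(feat @ w_out)

# A partial point cloud of 512 (x, y, z) points and a measured height of 45 mm.
cloud = rng.standard_normal((512, 3))
weight = predict_weight(cloud, height_m=0.045)
```

The design point is that the height scalar bypasses the point encoder entirely, so the regressor can compensate for geometry lost to self-occlusion; the real network learns its weights from the 26,688 training clouds.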
📄 Full Content
PointRAFT: 3D deep learning for high-throughput prediction of potato tuber weight from partial point clouds⋆
Pieter M. Bloka,∗, Haozhou Wanga, Hyun Kwon Suhb, Peicheng Wanga, James Burridgea and Wei Guoa
aGraduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Midori-cho, Nishitokyo-city, 188-0002, Tokyo, Japan
bDepartment of Integrative Biological Sciences and Industry, Sejong University, 209 Neungdong-ro, 05006, Seoul, Republic of Korea
ARTICLE INFO
Keywords:
Potato
3D Deep Learning
RGB-D
Point Cloud
Regression
1. Introduction
Potatoes (Solanum tuberosum) are an important component of the human diet, as they provide high-energy carbohydrates, vitamin C, and dietary fibers (Camire, Kubow and Donnelly, 2009). To safeguard the role of potatoes in human nutrition, further optimization of potato production is needed (Zhang, Xu, Wu, Hu and Dai, 2017). A major step toward this improvement is through precision agriculture. Precision agriculture enables site-specific application of fertilizers and crop protection products, which leads to higher yields, lower costs, and reduced environmental pressure (Bullock, Lowenberg-DeBoer and Swinton, 2002; Van Evert, Gaitán-Cremaschi, Fountas and Kempenaar, 2017). To steer precision agriculture practices, detailed information on potato yield is required. In current practice, potato yield mapping can be performed using load cells attached to the harvester's conveyor belt to measure the mass of harvested produce in real time (Zamani, Gholamiparashkohi, Faghavi and Ghezavati, 2014; Kabir, Myat Swe, Kim, Chung, Jeong and Lee, 2018). Although load-cell systems are easy to use and maintain, they suffer from a major limitation: they measure gross mass, including tare such as soil clods, stones, and plant residue. The inclusion of tare can lead to overestimation of tuber yield, particularly in areas where large amounts of soil or crop residue are harvested together with the potato tubers.

⋆This study is funded by the Sarabetsu Village "Endowed Chair for Field Phenomics" project in Hokkaido, Japan.
∗Corresponding author: pieter.blok@fieldphenomics.com (P.M. Blok).
ORCID(s): 0000-0001-9535-5354 (P.M. Blok); 0000-0001-6135-402X (H. Wang); 0000-0003-4771-9365 (H.K. Suh); 0000-0002-2194-9894 (J. Burridge); 0000-0002-3017-5464 (W. Guo)

A more accurate alternative is the use of camera-based yield monitoring systems, which can visually distinguish potato tubers from tare. Such systems have been explored in the scientific literature since the early 2000s (Noordam, Otten, Timmermans and van Zwol, 2000; Hofstee and Molema, 2003; ElMasry, Cubero, Moltó and Blasco, 2012; Razmjooy, Mousavi and Soleymani, 2012; Lee, Kim, Lee and Shin, 2018; Long, Wang, Zhai, Wu, Li, Sun and Su, 2018; Si, Sankaran, Knowles and Pavek, 2018; Su, Kondo, Li, Sun, Al Riza and Habaragamuwa, 2018; Pandey, Kumar and Pandey, 2019; Cai, Jin, Xu and Yang, 2020; Lee and Shin, 2020; Dolata,