Convolutional neural network (CNN) models have made major advances in computer vision tasks in the last five years. Given the challenge of collecting real-world datasets, most studies report performance metrics based on available research datasets. In scenarios where CNNs are deployed on images or videos from mobile devices, models face new challenges due to lighting, angle, and camera specifications that are not accounted for in research datasets. If such models are to be reliably integrated with products and services in society, it is essential that assessment also be conducted on real-world datasets. Plant disease datasets can be used to test CNNs in real time and gain insight into real-world performance. We train a CNN object detection model to identify foliar symptoms of diseases (or lack thereof) in cassava (Manihot esculenta Crantz). We then deploy the model in a mobile app and test its performance on mobile images and video of 720 diseased leaflets in an agricultural field in Tanzania. Within each disease category we test two levels of symptom severity, mild and pronounced, to assess model performance for early detection of symptoms. For both severities we see a decrease in the F-1 score on real-world images and video. The F-1 score dropped by 32% for pronounced symptoms in real-world images (the data closest to the training data) due to a drop in model recall. If the potential of smartphone CNNs is to be realized, our data suggest it is crucial to consider tuning the precision-recall balance in order to achieve the desired performance in real-world settings. In addition, the varied performance across input data types (image or video) is an important consideration for the design of CNNs in real-world applications.
A landmark in computer vision occurred in 2012, when a deep convolutional neural network (CNN) won the ImageNet competition to classify over 1 million images from 1,000 categories, almost halving the error rates of its competition 1. This success brought about a revolution in computer vision, with CNNs dominating the approach for a variety of classification and detection tasks. Large tech companies and startups have capitalized on these advances to design real-time computer vision products and services, while companies such as NVIDIA, Intel, Qualcomm, and Samsung are developing CNN chips to enable real-time vision applications in smartphones, cameras, robots, and self-driving cars 1.
As CNNs become the standard computer vision model to be deployed in real-time vision applications, assessing and reporting whether their performance translates from research datasets to real-time scenarios is crucial. Results for different CNN architectures are usually reported on standard large-scale computer vision datasets of a million or more static images 2,3,4,5. Domain-specific datasets such as medical imagery or plant diseases, to which transfer learning is often applied, are smaller because expert-labeled images are more challenging to acquire 6,7. In a recent assessment of a skin lesion classification task, researchers reported that the performance of the deep learning model matched that of at least 21 dermatologists tested across three critical diagnostic tasks 8. This study was done on a labeled dataset of 129,450 clinical images, and the researchers concluded that the technology is deployable on a mobile device but that further evaluation in real-world settings is needed. Similar conclusions have been drawn by other researchers 7,9,10,11,12. Deploying on mobile devices would also help democratize access to algorithms while maintaining user privacy, since inference can run offline.
Despite the ubiquity of smartphones, there are few examples of CNNs deployed on phones to categorize visual scenes in the real world, where performance is affected by input data type and compounded by the wide extremes in lighting that are normal in outdoor settings. Clear examples of computer vision in real-world settings, such as autonomous vehicles (cars and drones), leverage multiple sensors in both the visible and non-visible spectrum 13,14. If smartphone CNNs are to achieve their promise, it is important to recognize the constraint of a single sensor (i.e., the camera) and to test the performance of CNNs on mobile devices in the conditions in which they are intended to be used.
Here, we investigate plant disease diagnostics on a mobile device. We deploy and test the performance of a CNN object detection model in a mobile app for real-time plant disease diagnosis in an agricultural field. Plant disease diagnosis provides a demanding case study because it takes place in an outdoor setting, with lighting conditions that could affect computer vision performance.
We use the TensorFlow platform to deploy a smartphone CNN object detection model designed to identify foliar symptoms of three diseases, two types of pest damage, and nutrient deficiency (or lack thereof) in cassava (Manihot esculenta Crantz). We utilize the Single Shot MultiBox Detector (SSD) model with a MobileNet feature extractor, pre-trained on the COCO dataset (Common Objects in Context), which contains 1.5 million labeled object instances across 80 object categories. For simplicity, we refer to the CNN object detector model as the CNN model. We employ transfer learning to fine-tune the model parameters on our dataset, which comprised 2,415 cassava leaf images with pronounced symptoms of each class. The cassava leaf dataset was built from images taken in experimental fields of the International Institute of Tropical Agriculture (IITA), in Bagamoyo District, Tanzania. Complete details of this dataset were previously reported in Ramcharan et al. (2017). In addition to the 6 image classes implemented in Ramcharan et al. (2017), an additional nutrient deficiency class of 336 images was included in this work, and examples of all image classes are shown in Figure 1.
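To illustrate the transfer-learning step, the sketch below fine-tunes a MobileNet backbone pre-trained on ImageNet for a leaf-image classification head. It is a minimal, hypothetical example of the general approach only: the class count, image size, and training data pipeline are assumptions for illustration, and it does not reproduce the full SSD MobileNet object detection pipeline used in this study.

```python
import tensorflow as tf

NUM_CLASSES = 7          # assumed: 3 diseases, 2 pest damages, nutrient deficiency, healthy
IMAGE_SIZE = (224, 224)  # MobileNet's default input resolution

# Load a MobileNet backbone with pre-trained weights and no classification head.
backbone = tf.keras.applications.MobileNet(
    input_shape=IMAGE_SIZE + (3,), include_top=False, weights="imagenet")
backbone.trainable = False  # freeze pre-trained features during initial fine-tuning

# Attach a small classification head for the cassava classes.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# `train_ds` would be a tf.data.Dataset of (image, label) pairs built from the
# annotated cassava leaf images; the data-loading code is omitted here.
# model.fit(train_ds, epochs=...)
```

In the detection setting used here, the same pre-trained backbone is instead paired with the SSD detection head and fine-tuned on bounding-box annotations rather than image-level labels.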
For this study, three cassava disease experts reviewed images and agreed on classifications. Images were then annotated at Penn State University. Three annotation styles were initially tested to identify class objects: (1) whole leaflet - object bounding boxes are drawn around leaflets with visible symptoms and include the leaf edges; (2) within leaflet - object bounding boxes are drawn around visible symptoms inside the leaflet only and do not include the leaf edges; and (3) combined whole and within leaflet - styles (1) and (2) are combined, with the same class labels for whole-leaflet and within-leaflet bounding boxes. Based on training results after 500 epochs on two 16 GB NVIDIA V100 GPUs, reported in Table S1, the whole leaflet annotation style recorded the lowest overall loss and was selected for testing on a mobile device in the field.
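To make the three annotation styles concrete, the sketch below builds bounding-box records for one image in a simple dictionary format. The file name, coordinates, and the class label ("cmd") are hypothetical and for illustration only; the actual annotations were produced with standard labeling tools and converted to the format required for training.

```python
# Hypothetical bounding-box annotations for one cassava leaf image.
# Coordinates are (xmin, ymin, xmax, ymax) in pixels; all values are illustrative.

def box(label, xmin, ymin, xmax, ymax):
    """Return one bounding-box record with a class label."""
    return {"label": label, "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax}

# Style 1 - whole leaflet: the box encloses the entire symptomatic leaflet,
# including the leaf edges.
whole_leaflet = [box("cmd", 40, 60, 420, 510)]

# Style 2 - within leaflet: boxes enclose only the visible symptoms inside the
# leaflet and exclude the leaf edges.
within_leaflet = [box("cmd", 120, 150, 230, 260),
                  box("cmd", 270, 300, 360, 400)]

# Style 3 - combined: both kinds of boxes, sharing the same class label.
combined = whole_leaflet + within_leaflet

annotation = {"filename": "cassava_leaf_0001.jpg",  # hypothetical file name
              "width": 640, "height": 640,
              "objects": combined}
print(annotation)
```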
We select