Predicting Global Variations in Outdoor PM2.5 Concentrations using Satellite Images and Deep Convolutional Neural Networks

Kris Y. Hong, McGill University, Montreal, QC, Canada
Pedro O. Pinheiro, Element AI, Montreal, QC, Canada
Scott Weichenthal, McGill University, Montreal, QC, Canada (scott.weichenthal@mcgill.ca)

Abstract

Here we present a new method of estimating global variations in outdoor PM2.5 concentrations using satellite images combined with ground-level measurements and deep convolutional neural networks. Specifically, new deep learning models were trained over the global PM2.5 concentration range (<1-436 µg/m³) using a large database of satellite images paired with ground-level PM2.5 measurements available from the World Health Organization. Final model selection was based on a systematic evaluation of well-known architectures for the convolutional base including InceptionV3, Xception, and VGG16. The Xception architecture performed best and the final global model had a root mean square error (RMSE) value of 13.01 µg/m³ (R²=0.75) in the disjoint test set. The predictive performance of our new global model (called IMAGE-PM2.5) is similar to the current state-of-the-art model used in the Global Burden of Disease study but relies only on satellite images as input. As a result, the IMAGE-PM2.5 model offers a fast, cost-effective means of estimating global variations in long-term average PM2.5 concentrations and may be particularly useful for regions without ground monitoring data or detailed emissions inventories. The IMAGE-PM2.5 model can be used as a stand-alone method of global exposure estimation or incorporated into more complex hierarchical model structures.

1. Introduction

Environmental pollution is a global health concern with economic impacts measured in billions of dollars each year [12]. In particular, ambient fine particulate air pollution (PM2.5) kills millions of people around the world annually and is consistently ranked among the leading global burden of disease risk factors [23].

In recent years, incredible progress has been made in estimating global variations in outdoor PM2.5 concentrations through the combined use of multiple complex data streams including remote sensing estimates of aerosol optical depth, chemical transport models, and ground-level geographic information [3, 27, 20].

Figure 1. Locations of global monitoring sites for PM2.5. [Map; legend gives site counts in bins of 1-20, 21-40, 41-60, 61-80, 81-100, and >100.]

Other approaches to air pollution exposure assessment include the use of statistical models (e.g. land use regression models) that combine geographic information system (GIS) data with ground monitoring data to predict exposures in locations without measurements. While this approach generally works well [29, 18], detailed GIS data are often available on a limited spatial scale and land use regression models are not generalizable across cities [17]. Alternatively, information on traffic, land use, the built environment, and other potential sources of exposure can also be captured in digital images both locally and through satellite imagery. As such, large databases of paired pollutant-image samples may provide an alternative, cost-effective means of training deep convolutional neural networks [13] for the purpose of estimating environmental exposures across broad geographic areas including regional variations in ambient PM2.5 concentrations [28].

While deep learning image analysis is increasingly used for computer vision applications in medicine [9, 10, 7, 2], little work has focused on combining digital images with deep convolutional neural networks for the purpose of estimating environmental exposures [14]. Nevertheless, recent applications of deep learning in environmental health research have provided encouraging results including reliable predictions of spatial differences in obesity prevalence based on built environment characteristics [14].

In this study, our goal was to explore the use of deep convolutional neural networks in estimating global variations in annual average outdoor PM2.5 concentrations using only satellite images. Specifically, we examined the performance of a series of deep convolutional neural networks in estimating outdoor PM2.5 concentrations across the global exposure range as well as over the more limited exposure range of North America. While the global PM2.5 database covered the entire North American exposure range, a separate North American model was developed to examine the applicability of this method across a narrow concentration gradient.

2. Method

2.1. Long-Term Average Outdoor PM2.5 Data

A global database of annual average ground-level PM2.5 measurements and corresponding latitude-longitude coordinates was compiled from the World Health Organization [16]. These data were collected primarily between 2010 and 2016 (89 samples were collected in 2017 and 142 samples were collected between 2000 and 2009) and included approximately 20,000 measurements from approximately 6,000 unique monitoring sites in 98 countries. We did not include PM2.5 measurements estimated from PM10 values in training our models. The locations of global monitoring sites are shown in Figure 1.

In North America, a database of long-term average outdoor PM2.5 concentrations (2010-2012) was obtained at a 0.01 decimal degree grid resolution (approximately 1 km apart) from the Atmospheric Composition Analysis Group at Dalhousie University, Canada [1].
These data were estimated by combining aerosol optical depth information with the GEOS-Chem chemical transport model, with subsequent calibration to ground-based observations using geographically weighted regression [27, 1]. In total, the North American database included between approximately 87,000 and 623,000 ground-level PM2.5 estimates depending on the zoom level used for satellite images (described below).

2.2. Satellite Images

Satellite images centred on each latitude-longitude pair for ground-level PM2.5 data were downloaded from Google Static Maps using the ggmap package in the R statistical computing environment [25, 11]. Four satellite images were downloaded for each monitoring site in the global database, differing by integer zoom levels ranging from 13 (covering approximately 10 × 10 km) to 16 (approximately 1.5 × 1.5 km). All images were saved at a resolution of 256 × 256 × 3 to maintain a reasonable training time; zoom level 16 was excluded from the North American database owing to the excessive training times required. All satellite images were dated between September and December 2018.

2.3. Data Processing

All latitude-longitude coordinates for PM2.5-image pairs in the global database were first geohashed to a precision of three [15]. The geohashing process maps each latitude-longitude pair onto a global grid of rectangular cells, where each cell is defined by a unique geohash code. The resolution of the global grid depends on the precision level: the selected precision level of three corresponds to cells no larger than approximately 156 × 156 km, with widths decreasing from the equator to the poles. The global database was then randomly split into training (80%), validation (10%), and test (10%) sets such that the three sets were disjoint by geohash codes.
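
The geohash-disjoint split described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the record layout (dicts with `lat` and `lon` keys), the function names, and the choice to apply the 80/10/10 fractions to geohash cells rather than to individual samples are all assumptions made for the example. The encoder follows the standard geohash scheme (interleaved longitude and latitude bits, base-32 alphabet).

```python
import random
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=3):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, n_bits = 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def split_by_geohash(records, fractions=(0.8, 0.1, 0.1), precision=3, seed=0):
    """Split records into train/validation/test lists that share no geohash cell.

    `records` is a list of dicts with 'lat' and 'lon' keys (hypothetical layout).
    The fractions are applied to cells, so sample proportions are only approximate.
    """
    cells = defaultdict(list)
    for rec in records:
        cells[geohash_encode(rec["lat"], rec["lon"], precision)].append(rec)
    codes = sorted(cells)
    random.Random(seed).shuffle(codes)
    n_train = int(round(fractions[0] * len(codes)))
    n_val = int(round(fractions[1] * len(codes)))
    groups = (codes[:n_train],
              codes[n_train:n_train + n_val],
              codes[n_train + n_val:])
    return [[rec for code in group for rec in cells[code]] for group in groups]
```

Because whole cells are assigned to a single set, two monitoring sites that are close enough to share a precision-three cell can never end up on opposite sides of the train/test boundary.
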
Splitting by geohash code ensured that satellite images were disjoint between sets, allowing us to evaluate the generalizability of model estimates to data not encountered during the training process.

During the model development phase, the training set was used to fit model weights, and the validation set was used for hyperparameter tuning (i.e. choosing the optimal convolutional base, optimizer, and learning rate; and also adjusting the learning rate between epochs and deciding when to cease training using callbacks as described below). As both the training and validation sets were used to build and select the final model, an independent test set (which played no role in model training or selection) was used to evaluate final model performance.

Multiple ground-level measurements were available for annual average PM2.5 concentrations for some sites in the global database. This meant that multiple exposure values (i.e. year-to-year changes in annual average PM2.5 concentration at the same location over time) could be assigned to the same satellite image. We approached this issue in two ways: 1) models were developed by averaging all available exposure data for each latitude-longitude pair; 2) models were developed without averaging, allowing individual images to have different exposure values based on changes in annual average PM2.5 concentrations over time. Preliminary models favoured the second approach (i.e. allowing the same image to have different PM2.5 concentrations over different years) and therefore only the second approach was explored in detail. As a result, global model evaluation was also based on single-year annual average ground-level measurements.

For the North American database, we extracted PM2.5-image pairs from the complete dataset at decimal degree resolutions of 0.15, 0.10, and 0.05, for zoom levels 13, 14, and 15, respectively.
These resolutions were selected such that satellite images did not overlap in area within each zoom level. These data were then randomly split into training, validation, and test sets.

2.4. Model Training and Evaluation

Models were developed to predict spatial variations in outdoor PM2.5 concentrations on a continuous scale using linear activations as well as across deciles of exposure (ten ordinal categories of exposure split by deciles) using softmax [4] activations. All models included a convolutional base for feature extraction with an input size of 256 × 256 × 3 (i.e. width × height × color channels). Dropout layers with rates of 0.5 were included after the convolutional base and after the densely connected network to minimize overfitting. ImageNet weights were used for model initialization, and all models were trained using a batch size of 64 images (16 images per GPU) for up to 100 epochs. During model training, callback functions were used to: 1) decrease the learning rate by a factor of 0.1 if the validation accuracy did not improve for 10 epochs; and 2) stop model training if the validation accuracy did not improve for 20 epochs. For each of the four tasks of predicting continuous/categorical PM2.5 on the global/North American scales, the model with the highest validation classification accuracy (for decile predictions) or the lowest validation root mean square error (RMSE) (for continuous predictions) was retained. For categorical models, we also report the "one-off accuracy", which reflects the proportion of the time the model predicts the correct class or one category away from the correct class (e.g. predicting decile 9 when the true value is decile 10).

Final model selection was based on a systematic evaluation of several well-known architectures for the convolutional base including InceptionV3 [24], Xception [6], and VGG16 [22].
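
The interaction between the two training callbacks described in this section (learning-rate reduction after 10 stagnant epochs, early stopping after 20) can be illustrated with a small plain-Python simulation. This is a simplified sketch, not the Keras implementation the authors used: the function name and arguments are hypothetical, the monitored score is assumed to be "higher is better" (a loss or RMSE would be passed negated), and options that the Keras `ReduceLROnPlateau` and `EarlyStopping` callbacks also offer, such as `min_delta` and `cooldown`, are ignored.

```python
def simulate_training_schedule(val_scores, initial_lr=1e-4, factor=0.1,
                               lr_patience=10, stop_patience=20):
    """Replay validation scores and report the learning rate used at each epoch.

    Returns (learning_rates, epochs_run). Training stops early once the score
    has failed to improve for `stop_patience` consecutive epochs.
    """
    lr = initial_lr
    best = float("-inf")
    since_best = 0        # stagnant epochs counted for early stopping
    since_lr_change = 0   # stagnant epochs counted since the last LR reduction
    learning_rates = []
    for score in val_scores:
        learning_rates.append(lr)  # rate in effect while this epoch was trained
        if score > best:
            best = score
            since_best = 0
            since_lr_change = 0
            continue
        since_best += 1
        since_lr_change += 1
        if since_best >= stop_patience:
            break  # early stopping
        if since_lr_change >= lr_patience:
            lr *= factor  # takes effect from the next epoch
            since_lr_change = 0
    return learning_rates, len(learning_rates)


# A run that improves once and then plateaus: the learning rate drops after
# 10 stagnant epochs and training halts after 20.
rates, epochs_run = simulate_training_schedule([0.30] + [0.29] * 99)
```

With these patience values a plateaued run receives exactly one learning-rate reduction before it is stopped, which is consistent with the 100-epoch ceiling rarely being the binding limit.
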
Several optimizers were also tested, including RMSProp [26] and Nadam [8], with learning rates of 0.001 and 0.0001. A detailed leaderboard was maintained, tracking the performance of different combinations of model architectures and hyperparameters; the model that performed best on the validation dataset was selected as the final model.

Generally, the InceptionV3 and Xception architectures combined with the Nadam optimizer at a learning rate of 0.0001 performed best on the data, and these results are described in detail. For the final models, gradient-weighted class activation maps [19] and filter visualizations were used to examine specific portions of images used to make predictions and to evaluate which features were learned at various layers of the model. All analyses were conducted using the Keras package [5] in R and Python with two Lambda Quad Workstations (Lambda Labs, San Francisco, CA) containing 4 GPUs each (NVIDIA Titan Xp or 1080 Ti). On average, global model training took 2 minutes per epoch, whereas the North American models took 10, 20, or 60 minutes per epoch for zoom levels 13, 14, and 15, respectively.

As an additional model evaluation step, we compared continuous PM2.5 estimates from our final global model (called IMAGE-PM2.5) to those of the Data Integration Model for Air Quality (DIMAQ) used by the Global Burden of Disease study [20, 21]. This comparison was conducted for approximately 9000 locations (113 countries) between 2010 and 2016 (approximately 4000-6000 measurements per year) with 34,794 annual average measurements ranging from <1 µg/m³ to 332 µg/m³ (mean=20.04 µg/m³, SD=18.76 µg/m³). In addition, we compared our global model estimates to mean DIMAQ estimates averaged over the entire 2010-2016 period. Finally, we calculated site-specific differences between our IMAGE-PM2.5 estimates and mean DIMAQ estimates (2010-2016) to evaluate potential geographic patterns in the magnitude of disagreement between the two models.

2.5. Data Availability

All PM2.5 data, code, image files, and final deep learning models are freely available upon request.

3. Results

The global database contained approximately 19,650 pollution-image pairs with annual mean PM2.5 concentrations ranging from less than 1 µg/m³ to 436 µg/m³ with a mean value of 23.2 µg/m³ (SD=22.9 µg/m³) (Table 1). Estimated PM2.5 concentrations were much lower for North America, ranging from less than 1 µg/m³ to 16.4 µg/m³ with a mean value of 4.36 µg/m³ (SD=2.30 µg/m³).

In models classifying PM2.5 concentrations across deciles, the Xception model architecture performed best in both the global and North American databases (Table 2). Specifically, the final global categorical model (using the Xception base and zoom level-13 satellite images) had a validation accuracy of 35.33% across deciles (10% accuracy would be expected by random chance) (Table 2).

The confusion matrix presented in Figure 2A illustrates model performance on the test set and indicates that categorical predictions were best at the lower and upper deciles, with decreasing performance towards the inner classes. Overall, the global categorical model achieved a test accuracy of 33.69% and a one-off test accuracy of 65.71%. The final categorical model for North America (using the Xception base and zoom level-15 satellite images) achieved a validation accuracy of 50.95% (Table 2). As with the global categorical model, the North American model performed better at the extremes (Figure 3A) with poorer accuracy for the central classes. The test accuracy of the model was 47.07% and its one-off test accuracy was 78.41%.
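
The exact and one-off accuracies reported above follow directly from the predicted and true decile labels. The helper below is a minimal sketch of that calculation; the function name and the use of integer class labels 1-10 are assumptions made for illustration.

```python
def decile_accuracies(true_classes, predicted_classes):
    """Return (exact accuracy, one-off accuracy) for ordinal class labels.

    One-off accuracy counts a prediction as correct when it equals the true
    class or is exactly one category away from it.
    """
    if len(true_classes) != len(predicted_classes) or not true_classes:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(true_classes, predicted_classes))
    exact = sum(t == p for t, p in pairs)
    one_off = sum(abs(t - p) <= 1 for t, p in pairs)
    return exact / len(pairs), one_off / len(pairs)


# Four toy predictions: two exact hits, one neighbouring decile, one miss.
exact_acc, one_off_acc = decile_accuracies([1, 5, 10, 3], [1, 6, 8, 3])
# exact_acc == 0.5, one_off_acc == 0.75
```

Because decile classes are ordinal, the one-off figure credits predictions that land in a neighbouring decile, which exact accuracy treats the same as a prediction nine categories away.
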

Table 1. Descriptive statistics for the PM2.5 (µg/m³) data in the Global and North American databases. Columns D1-D9 are the decile cut points.

Database    Zoom   n        Mean   SD     Min   D1    D2    D3    D4     D5     D6     D7     D8     D9     Max
Global      13-16  19,657   23.24  22.94  0.50  7.00  8.49  9.79  11.51  14.03  17.64  24.05  35.78  54.81  436.44
N. America  13     87,104   3.98   2.23   0.00  1.90  2.30  2.70  3.00   3.40   3.80   4.40   5.30   7.80   14.00
N. America  14     194,739  3.98   2.23   0.00  1.90  2.30  2.70  3.00   3.40   3.80   4.40   5.30   7.80   15.50
N. America  15     623,759  4.36   2.30   0.00  2.10  2.60  3.00  3.30   3.70   4.20   4.80   6.10   8.30   16.40

Table 2. Model performance on the validation set across different model architectures and zoom levels. The standard deviation of PM2.5 values in the validation set is shown as a baseline for evaluating RMSE values.

Model           Architecture  Zoom  Decile class. accuracy (%)  SD (PM2.5)  RMSE
Global          InceptionV3   13    32.38                       23.70       13.87
Global          InceptionV3   14    30.16                       23.70       13.86
Global          InceptionV3   15    30.32                       23.70       14.03
Global          InceptionV3   16    30.00                       23.70       14.22
Global          Xception      13    35.33                       23.70       13.63
Global          Xception      14    33.06                       23.70       14.18
Global          Xception      15    31.61                       23.70       13.64
Global          Xception      16    31.61                       23.70       14.31
North American  InceptionV3   13    42.28                       2.21        0.85
North American  InceptionV3   14    43.81                       2.24        0.83
North American  InceptionV3   15    48.88                       2.31        0.77
North American  Xception      13    44.66                       2.21        0.77
North American  Xception      14    45.95                       2.24        0.74
North American  Xception      15    50.95                       2.31        0.72
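
One way to read the SD baseline in Table 2 is that a model which always predicted the validation-set mean would have an RMSE roughly equal to the SD, so the ratio RMSE/SD indicates how much of the spread the model removes. The sketch below works this through; the function names are hypothetical, and the derived fraction is a rough back-of-envelope check rather than a statistic reported by the authors (it equals R² only when the SD is the population SD of the same set the RMSE was computed on).

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def variance_explained_from_rmse(rmse_value, sd_value):
    """Approximate fraction of variance explained, 1 - (RMSE / SD)^2."""
    return 1.0 - (rmse_value / sd_value) ** 2


# Validation figures taken from Table 2 (Xception base).
global_zoom13 = variance_explained_from_rmse(13.63, 23.70)   # about 0.67
n_america_zoom15 = variance_explained_from_rmse(0.72, 2.31)  # about 0.90
```

Read this way, the much smaller absolute RMSE of the North American model is not simply an artefact of its narrower concentration range: the error is also small relative to the spread of the data.
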

Figure 2. Measured versus predicted global PM2.5 concentrations in the test set for 10-category classification (A) and regression (B). The final model uses the Xception base with zoom level-13 satellite images. [Panel A: confusion matrix of predicted versus actual decile classes C1-C10. Panel B: scatter plot of predicted versus actual concentrations, coloured by region (Canada/USA, Europe, China, India, Other).]

The Xception model architecture also performed best for continuous models in both the global and North American databases. For the global IMAGE-PM2.5 model (using the Xception base model and zoom level-13 satellite images), the lowest validation RMSE value was 13.63 µg/m³ (Table 2). On the test dataset, the global model achieved an RMSE value of 13.01 µg/m³ with an R² value of 0.75 (Figure 2B); however, model predictions tended to underestimate measured values at higher concentrations, as indicated by the dashed fit line in Figure 2B. In North America, the best continuous model (using the Xception base and zoom level-15 satellite images) had a validation RMSE of 0.72 µg/m³ (Table 2). This model achieved an RMSE of 0.74 µg/m³ on the test set with an R² value of 0.89. A plot of measured versus predicted values in the test set is shown for North America in Figure 3B, with the predictions generally following the 1:1 line.

Figure 3. Measured versus predicted North American PM2.5 concentrations in the test set for 10-category classification (A) and regression (B). The final model uses the Xception base with zoom level-15 satellite images. [Panel A: confusion matrix of predicted versus actual decile classes C1-C10. Panel B: density plot of predicted versus actual concentrations, shaded by count.]

Gradient-weighted class activation maps and filter visualizations were used to identify specific portions of images used for predictions and to examine patterns learned by models in convolution layers, respectively. Class activation maps are presented in Figure 4 for five locations that were correctly classified across deciles of long-term PM2.5 concentrations. From this figure it is clear that localized portions of each satellite image are generally being used to make predictions; however, the specific ground-level features that are playing the most important role remain unclear.

Figure 4. Gradient-weighted class activation maps (Grad-CAMs) for images correctly classified by the final global categorical model (using the Xception base and zoom level-13 satellite images). The first column is the original input image. The second through sixth columns are the Grad-CAMs for classes 2, 4, 6, 8, and 10, respectively. Numerical values on the top-right indicate the predicted probability that the image belongs to the respective class. The cities are Minneapolis, US (C2); Kansas City, US (C4); Amsterdam, NL (C6); Tel Aviv, IL (C8); and Beijing, CN (C10).

Continuous estimates of annual average PM2.5 concentrations from our global IMAGE-PM2.5 model were highly correlated (R²=0.79; slope=1.019, 95% CI: 1.014, 1.025) with those predicted by the Data Integration Model for Air Quality (DIMAQ) used by the Global Burden of Disease (GBD) study (Figure 5). Agreement between the two models improved slightly when we compared our global IMAGE-PM2.5 predictions to DIMAQ model estimates averaged over the entire seven-year period tested (2010-2016): R²=0.81; slope=1.022 (95% CI: 1.012, 1.025). Figure 6 shows the global distribution of differences between long-term estimates of mean PM2.5 concentrations (2010-2016) at the 9000 sites compared in this analysis. Agreement was best in North America, Europe, and China. The largest differences were observed in regions where ground-level PM2.5 values (used in DIMAQ) were based predominantly (>70% of values) on PM10 data, including India, Turkey, Romania, and Lithuania.

Figure 5. Relationship between annual average PM2.5 concentrations predicted by DIMAQ and IMAGE-PM2.5. [Density plot of IMAGE-PM2.5 (µg/m³) against DIMAQ PM2.5 (µg/m³), shaded by count.]

4. Discussion

In this study we explored the use of deep convolutional neural networks as an alternative, cost-effective means of estimating global variations in long-term average outdoor PM2.5 concentrations. In particular, we examined this approach across the global concentration range using ground monitoring data available from the WHO as well as across the more limited concentration range in North America using PM2.5 predictions based on remote sensing [27]. To our knowledge, this is the first study to explore the use of deep convolutional neural networks in estimating global variations in annual average outdoor PM2.5 concentrations, and we noted several interesting findings.

First, the predictive performance of the global IMAGE-PM2.5 model presented in this study was similar to that of current state-of-the-art Bayesian hierarchical models employing combinations of remote sensing, chemical transport models, land use, and other information [20, 21]. This is somewhat surprising given the wealth of source/emissions information included in state-of-the-art models. Specifically, Shaddick et al. [20, 21] reported a population-weighted RMSE value of 12.10 µg/m³ (R²=0.91) for the DIMAQ model used in the Global Burden of Disease Study, whereas the IMAGE-PM2.5 model in our investigation achieved an RMSE value of 13.01 µg/m³ (R²=0.75) over a similar concentration range. In addition, our direct comparison of DIMAQ and IMAGE-PM2.5 predictions indicated a strong correlation between model estimates with a slope close to 1. Interestingly, the largest discrepancies between the two models occurred in regions where ground-level PM2.5 data were derived from PM10 measurements. As the DIMAQ model incorporated PM2.5 data derived from PM10 measurements and the IMAGE-PM2.5 model did not, this difference may explain the larger discrepancies in these areas.

The North American model presented here is not directly comparable to values reported by Shaddick et al. [20, 21] because it covers a narrow exposure range and is in fact a model of modelled values [27, 1]. Specifically, the total error in our North American model compared to ground measurements would be the sum of errors in remote sensing estimates (compared to measured ground-level PM2.5 concentrations) plus the additional error contributed by our model. Nevertheless, our findings from North America are important in that they suggest that deep convolutional neural networks may be used to estimate spatial variations in long-term average PM2.5 concentrations over a narrow range of concentrations. Moreover, our findings indicate that deep learning model estimates based on satellite images may offer an additional source of information for state-of-the-art Bayesian hierarchical models such as DIMAQ [20, 21], which integrate multiple complex data streams. In particular, our IMAGE-PM2.5 model may offer useful prior information for Bayesian models when ground-level measurements or emissions data are not available.

Figure 6. Differences in predicted long-term average PM2.5 concentrations (2010-2016) using the IMAGE-PM2.5 model and the DIMAQ model [20, 21]. [Map; legend gives DIMAQ minus IMAGE-PM2.5 (µg/m³) in bins of >30, (10, 30], (-10, 10], (-30, -10], and ≤-30.]

One of the clear disadvantages of deep learning models is the lack of transparency in how model predictions are generated. Deep convolutional neural networks are somewhat less opaque in that class activation maps and filter visualizations can be used to investigate image characteristics/patterns used to make predictions. Our results suggest that model predictions of ground-level PM2.5 concentrations were based on localized portions of satellite images and that both color and combinations of colors and geometric features (i.e. lines/edges) were used in making predictions. However, it was not possible to identify specific aspects of the built environment that played an important role in generating model estimates. Interestingly, the zoom level of satellite images had an important impact on model performance, and future studies should explore other image characteristics that could be optimized to reduce model errors. Likewise, as deep convolutional neural networks can have multiple inputs, it may be possible to incorporate additional ground-level information (e.g. sources, businesses, population density, etc.) within each image to capture more detailed data on local sources of PM2.5 and thus improve model performance.

A second limitation of our analysis was that the timing of satellite images did not overlap exactly with the timing of PM2.5 measurements/estimates. This may have contributed error to our predictions if major infrastructure changes were made between the time of PM2.5 measurements and satellite imaging.
Moreover, our IMAGE-PM2.5 model is also limited in that it does not contain a temporal component: predictions only change if the image changes. Therefore, the IMAGE-PM2.5 model cannot be used to estimate short-term (i.e. year-to-year) changes in outdoor PM2.5 concentrations, and this limitation will be addressed in our ongoing work.

In summary, we developed a new method of estimating global variations in long-term average outdoor PM2.5 concentrations using deep convolutional neural networks trained with large databases of satellite images and ground-level measurements. Our new global IMAGE-PM2.5 model relies on a single input (a satellite image) and can provide fast, cost-effective estimates of PM2.5 concentrations with predictive performance comparable to modern Bayesian hierarchical models currently used by the Global Burden of Disease Project [20, 21]. These findings represent an important advance in our current understanding of how global variations in long-term average PM2.5 concentrations can be modelled for global health applications. The IMAGE-PM2.5 model can be used as a stand-alone method of global exposure estimation or incorporated into more complex hierarchical model structures.

References

[1] Atmospheric Composition Analysis Group. Satellite-derived PM2.5 with GWR, North American, 2010-2012, at 35% RH [ug/m3]. http://fizz.phys.dal.ca/~atmos/martin/?page_id=140. Accessed: 2019-03-24.
[2] Christof Angermueller, Tanel Pärnamaa, Leopold Parts, and Oliver Stegle. Deep learning for computational biology. Molecular Systems Biology, 2016.
[3] Michael Brauer, Markus Amann, Rick T Burnett, Aaron Cohen, Frank Dentener, Majid Ezzati, Sarah B Henderson, Michal Krzyzanowski, et al. Exposure assessment for estimation of the global burden of disease attributable to outdoor air pollution. Environmental Science & Technology, 2012.
[4] John Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. Neurocomputing: Algorithms, Architectures and Applications, 1990.
[5] François Chollet. Keras. https://keras.io. Accessed: 2019-03-24.
[6] François Chollet. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[7] Angel Cruz-Roa, Hannah Gilmore, Ajay Basavanhally, Michael Feldman, Shridar Ganesan, Natalie NC Shih, John Tomaszewski, Fabio A González, and Anant Madabhushi. Accurate and reproducible invasive breast cancer detection in whole-slide images: A deep learning approach for quantifying tumor extent. Scientific Reports, 2017.
[8] Timothy Dozat. Incorporating Nesterov momentum into Adam. 2016.
[9] Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan M Swetter, Helen M Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017.
[10] Varun Gulshan, Lily Peng, Marc Coram, Martin C Stumpe, Derek Wu, Arunachalam Narayanaswamy, Subhashini Venugopalan, Kasumi Widner, Tom Madams, Jorge Cuadros, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 2016.
[11] David Kahle and Hadley Wickham. ggmap: Spatial visualization with ggplot2. The R Journal, 2013.
[12] Philip J Landrigan, Richard Fuller, Nereus JR Acosta, Olusoji Adeyi, et al. The Lancet Commission on pollution and health. The Lancet, 2018.
[13] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.
[14] Adyasha Maharana and Elaine Okanyene Nsoesie. Use of deep learning to examine the association of the built environment with prevalence of neighborhood adult obesity. JAMA, 2018.
[15] G. Niemeyer. Geohash. http://geohash.org. Accessed: 2019-03-24.
[16] World Health Organization. WHO global urban ambient air pollution database (update 2016). https://whoairquality.shinyapps.io/AmbientAirQualityDatabase/. Accessed: 2019-03-24.
[17] Allison P Patton, Wig Zamore, Elena N Naumova, Jonathan I Levy, Doug Brugge, and John L Durant. Transferability and generalizability of regression models of ultrafine particles in urban neighborhoods in the Boston area. Environmental Science & Technology, 2015.
[18] Patrick H Ryan and Grace K LeMasters. A review of land-use regression models for characterizing intraurban air pollution exposure. Inhalation Toxicology, 2007.
[19] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In CVPR, 2017.
[20] Gavin Shaddick, Matthew L Thomas, Heresh Amini, David Broday, Aaron Cohen, Joseph Frostad, Amelia Green, Sophie Gumy, Yang Liu, Randall V Martin, et al. Data integration for the assessment of population exposure to ambient air pollution for global burden of disease assessment. Environmental Science & Technology, 2018.
[21] Gavin Shaddick, Matthew L Thomas, Amelia Green, Michael Brauer, Aaron Donkelaar, Rick Burnett, Howard H Chang, Aaron Cohen, Rita Van Dingenen, Carlos Dora, et al. Data integration model for air quality: a hierarchical approach to the global estimation of exposures to ambient air pollution. Journal of the Royal Statistical Society: Series C (Applied Statistics), 2018.
[22] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR, 2015.
[23] Jeffrey D Stanaway, Ashkan Afshin, Emmanuela Gakidou, Stephen S Lim, Degu Abate, et al. Global, regional, and national comparative risk assessment of 84 behavioural, environmental and occupational, and metabolic risks or clusters of risks for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet, 2018.
[24] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
[25] R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, 2010.
[26] T. Tieleman and G. Hinton. Lecture 6.5, RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
[27] Aaron Van Donkelaar, Randall V Martin, Michael Brauer, N Christina Hsu, Ralph A Kahn, Robert C Levy, Alexei Lyapustin, Andrew M Sayer, and David M Winker. Global estimates of fine particulate matter using a combined geophysical-statistical method with information from satellites, models, and monitors. Environmental Science & Technology, 2016.
[28] Scott Weichenthal, Marianne Hatzopoulou, and Michael Brauer. A picture tells a thousand exposures: Opportunities and challenges of deep learning image analyses in exposure science and environmental epidemiology. Environment International, 2019.
[29] Scott Weichenthal, Keith Van Ryswyk, Alon Goldstein, Maryam Shekarrizfard, and Marianne Hatzopoulou. Characterizing the spatial distribution of ambient ultrafine particles in Toronto, Canada: A land use regression model. Environmental Pollution, 2016.
