Satellite aerosol optical depth prediction using data mining of climate parameters

Document Type: Full length article


1 PhD student, Department of Remote Sensing and GIS, Faculty of Geography, University of Tehran, Tehran, Iran

2 Assistant Professor, Department of Remote Sensing and GIS, Faculty of Geography, University of Tehran, Tehran, Iran

3 MSc student, Department of Remote Sensing and GIS, Faculty of Humanities, Tarbiat Modares University, Tehran, Iran


Introduction
Tropospheric aerosol particles play an important role in the Earth's radiative energy balance, both directly by scattering and absorbing solar radiation and indirectly by modulating the microphysical and radiative properties of clouds. Aerosol optical depth (AOD) derived from satellite remote sensing data is a quantitative estimate of the amount of aerosol in the atmosphere and can be used as an indicator of aerosol particle concentration. Previous studies point to the importance of remote sensing aerosol products for modeling the spatial-temporal patterns of dust storms and, in particular, for identifying dust sources. One advantage of satellite AOD for identifying dust events is that it gives satisfactory results in arid areas with relatively little cloud cover. Cloud cover, however, severely limits both ground-based and satellite AOD measurements, so AOD datasets sometimes contain gaps. Because aerosols can rarely be monitored and measured under cloudy conditions, proxy datasets offer a way to fill these gaps. Several studies based on satellite data have emphasized the association between climatic parameters and dust events (specifically AOD) in different regions. Given this relationship, climatic parameters can serve as a proxy dataset for estimating AOD in areas without data or under cloud cover. Moreover, future AOD values can be predicted from predicted values of climatic parameters. Reliable AOD prediction requires a generalizable approach that can model complex relationships in large datasets. For this purpose, an efficient data mining algorithm called M5P was adopted to analyze and extract the relationships between climatic parameters and AOD and to obtain predictive models. The M5P algorithm combines tree and regression models and offers high prediction accuracy and easily interpretable results.
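The idea of combining a tree with regression models can be illustrated with a minimal Python sketch. It is not the WEKA M5P implementation used in the study: it grows a single split chosen by standard deviation reduction and fits one least-squares linear model per leaf, and it omits the pruning and smoothing steps of the full algorithm. The function names and the synthetic two-regime data are assumptions made only for illustration.

```python
import numpy as np

def sdr(y, mask):
    """Standard deviation reduction obtained by splitting y with a boolean mask."""
    left, right = y[mask], y[~mask]
    weighted = (len(left) * left.std() + len(right) * right.std()) / len(y)
    return y.std() - weighted

def best_split(X, y, min_leaf=5):
    """Search every feature and candidate threshold for the largest SDR."""
    best = (None, None, 0.0)  # (feature index, threshold, SDR)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            mask = X[:, j] <= t
            if mask.sum() < min_leaf or (~mask).sum() < min_leaf:
                continue
            gain = sdr(y, mask)
            if gain > best[2]:
                best = (j, t, gain)
    return best

def fit_linear(X, y):
    """Least-squares linear model; returns [intercept, coefficients...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def fit_one_level_model_tree(X, y):
    """One split plus a linear model in each leaf (simplified model tree)."""
    j, t, _ = best_split(X, y)
    if j is None:
        return {"leaf": fit_linear(X, y)}
    mask = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": fit_linear(X[mask], y[mask]),
            "right": fit_linear(X[~mask], y[~mask])}

def predict(tree, x):
    """Route x through the if-then split, then apply the leaf's linear model."""
    x = np.asarray(x, dtype=float)
    if "leaf" in tree:
        coef = tree["leaf"]
    elif x[tree["feature"]] <= tree["threshold"]:
        coef = tree["left"]
    else:
        coef = tree["right"]
    return float(coef[0] + coef[1:] @ x)

# Synthetic data (assumption): two regimes separated at x0 = 0.5.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.where(X[:, 0] <= 0.5, 1.0 + 0.5 * X[:, 1], 5.0 - 0.5 * X[:, 1])
tree = fit_one_level_model_tree(X, y)
print(tree["feature"], predict(tree, [0.2, 0.5]), predict(tree, [0.8, 0.5]))
```

On this synthetic data the split falls on the first feature, and each leaf recovers the linear relation of its own regime, which is what makes the resulting rules easy to read.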
Materials and methods
In this study, the M5P data mining algorithm, which is based on a tree structure and multivariate linear regression analysis, was used to derive AOD predictive models from climatic parameters. To this end, a spatial database of remote sensing time series was built, with four climatic parameters as independent variables (surface air temperature (SAT), precipitation (P), surface relative humidity (SRH), and wind speed (WS)) and AOD as the dependent variable. WEKA[1] was used to implement the M5P model. After the relationships between the independent and dependent variables were analyzed through the tree structure and multivariate linear regression, AOD predictive rules were extracted. The Pearson correlation coefficient, mean absolute error (MAE), and root mean square error (RMSE) were used to validate the linear predictive models.
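The three validation statistics can be sketched in a few lines of Python. This assumes they are computed between observed and predicted AOD values; the function names and the sample numbers are illustrative and do not come from the study.

```python
import numpy as np

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    do, dp = obs - obs.mean(), pred - pred.mean()
    return float((do * dp).sum() / np.sqrt((do ** 2).sum() * (dp ** 2).sum()))

def mae(obs, pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.abs(obs - pred).mean())

def rmse(obs, pred):
    """Root mean square error: penalizes large errors more than MAE does."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(((obs - pred) ** 2).mean()))

# Made-up AOD values, for illustration only.
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.3, 0.3, 0.7, 0.7]
print(pearson_r(observed, predicted), mae(observed, predicted), rmse(observed, predicted))
```

MAE and RMSE coincide in this toy case because every error has the same magnitude; RMSE exceeds MAE as soon as the errors vary in size.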
Results and discussion
After the time series of climatic parameters and AOD were pre-processed as the training dataset, the independent and dependent input variables of M5P were defined. The M5P algorithm was then run in WEKA in three steps: partitioning the independent input data into homogeneous classes by building decision trees from a series of "if-then" rules, multivariate linear regression analysis within each homogeneous class, and validation of the model results. In total, four linear models (LM), or predictive rules, were extracted for estimating AOD from the values of the climatic parameters. AOD can then be estimated by selecting the linear model whose M5P-defined thresholds match the climatic conditions and substituting the climatic parameter values into it. The linear models can therefore predict AOD under different climatic conditions. The M5P results were validated through correlation analysis between the input variables and through the prediction errors measured by MAE and RMSE, which indicated acceptable performance and accuracy of the linear models in AOD prediction. Because aerosol particles (especially dust) are dynamic and can be carried by the wind very far from their emission source, the AOD measured by a satellite sensor for a given pixel may not have originated at that location on the ground. Part of the prediction error of the models may therefore be due to aerosol transport: despite the strong correlation between AOD and climatic parameters, a dust storm arising from a distant source may be unrelated to the values of the climatic parameters at its destination.
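How a set of "if-then" rules with attached linear models yields an AOD estimate can be sketched as follows. Every threshold and coefficient in this Python example is a placeholder chosen for illustration; they are not the four linear models obtained in the study, and the rule structure itself is an assumption.

```python
# All thresholds and coefficients below are PLACEHOLDERS for illustration only;
# they are NOT the linear models reported in the study.
RULES = [
    # (name, condition on the inputs, intercept, coefficient per variable)
    ("LM1", lambda x: x["SRH"] <= 30 and x["WS"] <= 4, 0.30, {"SAT": 0.010, "SRH": -0.002}),
    ("LM2", lambda x: x["SRH"] <= 30 and x["WS"] > 4, 0.25, {"WS": 0.050}),
    ("LM3", lambda x: x["SRH"] > 30 and x["P"] <= 1, 0.40, {"SRH": -0.003, "WS": 0.020}),
    ("LM4", lambda x: x["SRH"] > 30 and x["P"] > 1, 0.20, {"P": -0.010}),
]

def predict_aod(x):
    """Pick the first rule whose condition holds and evaluate its linear model."""
    for name, condition, intercept, coefs in RULES:
        if condition(x):
            return name, intercept + sum(c * x[var] for var, c in coefs.items())
    raise ValueError("no rule matched the input")

# Hypothetical dry, windy day: SAT in degrees C, P in mm, SRH in %, WS in m/s.
sample = {"SAT": 40.0, "P": 0.0, "SRH": 20.0, "WS": 6.0}
print(predict_aod(sample))  # selects LM2 with these placeholder rules
```

The conditions partition the input space, so exactly one linear model applies to any combination of the four climatic parameters.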
Conclusion
Aerosol optical depth (AOD), as an indicator of the state of atmospheric aerosols, is of great importance for dust storm studies. In some parts of the world and in some seasons, AOD data are unavailable because of limitations such as cloud cover. At the same time, knowledge of future spatial-temporal patterns of dust storms is needed to adopt crisis management measures.
This study evaluated the capability of the M5P data mining algorithm to predict AOD from climatic parameters. Four linear predictive models were extracted through inductive learning as a set of "if-then" rules, and they were built and validated using a remote sensing time series dataset for Ahvaz, Iran. With these linear models, an acceptable estimate of AOD can be made in areas where access to AOD data is restricted. Furthermore, future spatial-temporal patterns of AOD can be estimated from predicted values of climatic parameters.
Dust storms generally occur as a function of a wide range of environmental conditions, including atmospheric properties and surface parameters such as vegetation, soil moisture, and soil texture. Considering only atmospheric conditions and their effects on the spatial-temporal patterns of AOD may therefore fail to produce the desired results. Future AOD modeling studies are recommended to use land surface parameters in addition to climatic parameters, which mostly reflect atmospheric conditions. This may increase the accuracy of the linear models for predicting AOD.
[1] Waikato Environment for Knowledge Analysis


Volume 53, Issue 3
December 2021
Pages 319-333
  • Receive Date: 06 February 2021
  • Revise Date: 04 July 2021
  • Accept Date: 17 July 2021
  • First Publish Date: 24 July 2021