Predicting uncertainty of machine learning models for modelling nitrate pollution of groundwater using quantile regression and UNEEC methods

Rahmati O; Choubin B; Fathabadi A; Coulon F; Soltani E; Shahabi H; Mollaefar E; Tiefenbacher J; Cipullo S; Ahmad BB; Tien Bui D

doi:10.1016/j.scitotenv.2019.06.320

Rahmati O ¹, Choubin B ², Fathabadi A ³, Coulon F ⁴, Soltani E ⁵, Shahabi H ⁶, Mollaefar E ⁷, Tiefenbacher J ⁸, Cipullo S ⁴, Ahmad BB ⁹, Tien Bui D ¹⁰

Affiliations

¹ Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam; Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam. Electronic address: [email protected]
² Faculty of Natural Resources, University of Tehran, Karaj, Iran
³ Department of Range and Watershed Management, Gonbad Kavous University, Gonbad Kavous, Golestan Province, Iran
⁴ Cranfield University, School of Water, Energy and Environment, Cranfield MK43 0AL, UK
⁵ Department of Natural Resources and Environmental Engineering, College of Agriculture, Shiraz University, Shiraz, Iran
⁶ Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran
⁷ Department of Natural Resources and Watershed Management of Golestan Province, Iran
⁸ Department of Geography, Texas State University, San Marcos, TX 78666, USA
⁹ Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia (UTM), 81310 Johor Bahru, Malaysia
¹⁰ Institute of Research and Development, Duy Tan University, Da Nang 550000, Viet Nam. Electronic address: [email protected]

Sci Total Environ, 2019 Oct 20;688:855-866.

PMID: 31255823 DOI: 10.1016/j.scitotenv.2019.06.320

Abstract

Although estimating the uncertainty of models used for modelling nitrate contamination of groundwater is essential in groundwater management, it has generally been ignored. This study therefore explores the predictive uncertainty of machine-learning (ML) models in this field using two residual-based uncertainty methods: quantile regression (QR) and uncertainty estimation based on local errors and clustering (UNEEC). Prediction-interval coverage probability (PICP), the most important of the statistical measures of uncertainty, was used to evaluate uncertainty. Three state-of-the-art ML models, namely support vector machine (SVM), random forest (RF), and k-nearest neighbor (kNN), were selected to spatially model groundwater nitrate concentrations. The models were calibrated with nitrate concentrations from 80 wells (70% of the data) and then validated with nitrate concentrations from 34 wells (30% of the data). Both uncertainty and predictive performance criteria should be considered when comparing and selecting the best model. Results highlight that kNN is the best model: it had the lowest uncertainty based on the PICP statistic in both the QR (0.94) and the UNEEC (0.85-0.91 across all clusters) methods, and its predictive performance (RMSE = 10.63, R² = 0.71) was similar to that of RF (RMSE = 10.41, R² = 0.72) and better than that of SVM (RMSE = 13.28, R² = 0.58). Determining the uncertainty of ML models used for spatially modelling groundwater-nitrate pollution enables managers to make better risk-based decisions and consequently increases the reliability and credibility of groundwater-nitrate predictions.
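
For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how PICP and RMSE are commonly computed. It assumes the usual definitions (PICP as the fraction of observations that fall inside their prediction interval, RMSE as the root of the mean squared error). The function names and the sample values are illustrative assumptions, not taken from the study.

```python
import numpy as np


def picp(y_true, lower, upper):
    """Fraction of observations inside [lower, upper] (bounds inclusive)."""
    y_true = np.asarray(y_true, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside.mean())


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up nitrate values (mg/L) for four hypothetical validation wells.
observed = [12.0, 30.0, 45.0, 8.0]
predicted = [15.0, 28.0, 40.0, 10.0]
interval_low = [5.0, 20.0, 46.0, 2.0]    # hypothetical lower bounds
interval_high = [25.0, 40.0, 60.0, 18.0]  # hypothetical upper bounds

coverage = picp(observed, interval_low, interval_high)
error = rmse(observed, predicted)
```

A PICP close to the nominal confidence level of the interval (for example 0.95 for a 95% interval) indicates that the estimated bounds capture the observations about as often as intended.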
