METHODS: This study proposed an end-to-end air quality predictive model for smart city applications, utilizing four machine learning techniques and two deep learning techniques. These include Ada Boost, SVR, RF, KNN, MLP regressor and LSTM. The study was conducted in four different urban cities in Selangor, Malaysia, including Petaling Jaya, Banting, Klang, and Shah Alam. The model considered the air quality data of various pollution markers such as PM2.5, PM10, O3, and CO. Additionally, meteorological data including wind speed and wind direction were also considered, and their interactions with the pollutant markers were quantified. The study aimed to determine the correlation variance of the dependent variable in predicting air pollution and proposed a feature optimization process to reduce dimensionality and remove irrelevant features to enhance the prediction of PM2.5, improving the existing LSTM model. The study estimates the concentration of pollutants in the air based on training and highlights the contribution of feature optimization in air quality predictions through feature dimension reductions.
RESULTS: In this section, the results of predicting the concentration of pollutants (PM2.5, PM10, O3, and CO) in the air are presented in R2 and RMSE. In predicting the PM10 and PM2.5concentration, LSTM performed the best overall high R2values in the four study areas with the R2 values of 0.998, 0.995, 0.918, and 0.993 in Banting, Petaling, Klang and Shah Alam stations, respectively. The study indicated that among the studied pollution markers, PM2.5,PM10, NO2, wind speed and humidity are the most important elements to monitor. By reducing the number of features used in the model the proposed feature optimization process can make the model more interpretable and provide insights into the most critical factor affecting air quality. Findings from this study can aid policymakers in understanding the underlying causes of air pollution and develop more effective smart strategies for reducing pollution levels.