Evaluation of Single Missing Value Imputation Techniques for Incomplete Air Particulates Matter (PM10) Data in Malaysia

Zuraira Libasin, Wan Suhailah Wan Mohamed Fauzi, Ahmad Zia ul-Saufie, Nur Azimah Idris and Noor Azizah Mazeni

Pertanika Journal of Science & Technology, Volume 29, Issue 4, October 2021

DOI: https://doi.org/10.47836/pjst.29.4.46

Keywords: Air pollution, imputation, linear interpolation, missing data, performance indicator

Published on: 29 October 2021

Missing values in a dataset have always been a critical obstacle to accurate prediction, and they can lead to a misleading understanding of the air pollution scenario. Only a small proportion of values (5% to 10%) may be missing, but the characteristics of the missing data may vary. This research focuses mainly on long gaps of missing data. Single missing value imputation means replacing the blank entries in the monitoring dataset from a chosen Department of Environment (DoE) monitoring station with values calculated by the best technique for long gap hours. The variable monitored is PM10, and the approach studied is the single imputation technique. The techniques were tested on the Tanjung Malim monitoring station dataset and assessed with five performance indicators. The results were compared with a previous study to determine whether the technique is the best one for long-gap hourly data. The research followed four stages: data acquisition; characteristic analysis of the missing values; the single imputation approach; and verification of the approach together with suggestion of the best technique. Four existing imputation techniques were used: series mean (SM), mean of nearby points (MNP), linear trend (LT), and linear interpolation (LIN). The results show that linear interpolation is the best technique for replacing missing particulate matter data, with the lowest mean absolute error and better performance accuracy.
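The four techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `span` parameter of the mean-of-nearby-points rule, the handling of gaps at the ends of a series, and the toy hourly PM10 values are all assumptions made for illustration. Only the mean absolute error is computed, because the abstract does not name the other four performance indicators.

```python
from statistics import mean


def series_mean(xs):
    """SM: replace each missing value with the mean of all observed values."""
    m = mean(v for v in xs if v is not None)
    return [m if v is None else v for v in xs]


def mean_nearby_points(xs, span=2):
    """MNP: mean of the nearest `span` observed values on each side (assumed rule)."""
    out = list(xs)
    for i, v in enumerate(xs):
        if v is None:
            before = [u for u in reversed(xs[:i]) if u is not None][:span]
            after = [u for u in xs[i + 1:] if u is not None][:span]
            out[i] = mean(before + after)
    return out


def linear_trend(xs):
    """LT: fit a least-squares line to the observed points and read off the gaps."""
    pts = [(t, v) for t, v in enumerate(xs) if v is not None]
    t_bar = mean(t for t, _ in pts)
    y_bar = mean(v for _, v in pts)
    sxx = sum((t - t_bar) ** 2 for t, _ in pts)
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in pts)
    slope = sxy / sxx
    return [y_bar + slope * (t - t_bar) if v is None else v
            for t, v in enumerate(xs)]


def linear_interpolation(xs):
    """LIN: straight line between the last value before and first value after a gap."""
    known = [i for i, v in enumerate(xs) if v is not None]
    out = list(xs)
    for i, v in enumerate(xs):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None or right is None:
            # Assumption: a gap at either end copies the nearest observed value.
            out[i] = xs[right if left is None else left]
        else:
            w = (i - left) / (right - left)
            out[i] = xs[left] + w * (xs[right] - xs[left])
    return out


def mae(truth, imputed, gap):
    """Mean absolute error over the artificially removed positions."""
    return mean(abs(imputed[i] - truth[i]) for i in gap)


# Hypothetical hourly PM10 readings; positions 3 and 4 are hidden to simulate a gap.
truth = [40, 42, 47, 55, 60, 58, 50, 45]
gap = [3, 4]
masked = [None if i in gap else v for i, v in enumerate(truth)]

methods = {
    "SM": series_mean,
    "MNP": mean_nearby_points,
    "LT": linear_trend,
    "LIN": linear_interpolation,
}
scores = {name: mae(truth, fn(masked), gap) for name, fn in methods.items()}
print(scores)
```

On this toy series the errors are roughly 10.5 (SM), 8.25 (MNP), 10.5 (LT) and 5.0 (LIN). That ordering reflects only the made-up numbers above and says nothing about the Tanjung Malim data.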


ISSN 0128-7680

e-ISSN 2231-8526

Article ID: JST-2776-2021
