Imputation, based on the multivariate Normal distribution, of missing records of fine particulate matter in air
DOI:
https://doi.org/10.22517/23447214.24734
Keywords:
Air Pollution, Little's Test, Mardia's Test, Missing Data, PM2.5, R2, RMSE, Simulation
Abstract
We propose and evaluate two imputation methods for missing records of fine particulate matter (PM2.5) in air. We assume a 24-variate normal distribution for the hourly records of a day, with one distribution per weekday. Using the properties of this distribution, both methods impute the missing hours from their conditional distribution given the hours with available records. We estimate the variance-covariance matrix of each weekday by two methods: maximum likelihood (denoted Σ) and shrinkage (denoted Σ*). We then check the missing completely at random (MCAR) assumption with Little's test and multivariate normality with Mardia's test. Finally, we evaluate the proposed methods in a simulation study that generates scenarios suited to this kind of problem. We use two evaluation criteria: the coefficient of determination (R2) and the root mean square error (RMSE). We illustrate the proposed methods with a 2018 data set from Cali, Colombia. We reach R2 values of around 0.70 and 0.49, and RMSE values of around 5.7 and 8.5, for the methods based on Σ and Σ*, respectively.
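The conditional-distribution idea in the abstract can be illustrated with a minimal sketch. It assumes the standard result for a multivariate normal vector split into missing (m) and observed (o) hours: E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o), where S is the covariance matrix. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the simulated AR(1)-like covariance, the 20% MCAR deletion rate, and the fixed shrinkage intensity `lam`. The shrinkage shown is a simplified stand-in that pulls the matrix toward its diagonal, not the estimator the authors used, and the R2 definition (one minus the ratio of residual to total sum of squares) may differ from theirs.

```python
import numpy as np

def conditional_mean_impute(x, mu, sigma):
    """Fill NaNs in one 24-hour vector with E[x_mis | x_obs] under N(mu, sigma)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mis = np.isnan(x)
    obs = ~mis
    if not mis.any() or not obs.any():
        # Nothing to impute, or nothing to condition on: use the mean.
        return np.where(mis, mu, x)
    s_oo = sigma[np.ix_(obs, obs)]   # covariance among observed hours
    s_mo = sigma[np.ix_(mis, obs)]   # covariance of missing with observed hours
    filled = x.copy()
    filled[mis] = mu[mis] + s_mo @ np.linalg.solve(s_oo, x[obs] - mu[obs])
    return filled

def shrink_to_diagonal(sigma, lam):
    """Hypothetical fixed-intensity shrinkage: blend sigma with its diagonal."""
    return (1.0 - lam) * sigma + lam * np.diag(np.diag(sigma))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Simulated stand-in for one weekday: 200 days x 24 hours (not real PM2.5 data).
rng = np.random.default_rng(0)
hours = np.arange(24)
true_sigma = 25.0 * 0.8 ** np.abs(hours[:, None] - hours[None, :])
days = rng.multivariate_normal(np.full(24, 20.0), true_sigma, size=200)
train, test = days[:150], days[150:]

# Maximum likelihood estimates from the complete training days.
mu_hat = train.mean(axis=0)
sigma_ml = np.cov(train, rowvar=False, bias=True)
sigma_shrunk = shrink_to_diagonal(sigma_ml, lam=0.2)

# Delete about 20% of the test hours completely at random, then impute.
mask = rng.random(test.shape) < 0.2
holed = np.where(mask, np.nan, test)
imputed = np.vstack([conditional_mean_impute(d, mu_hat, sigma_ml) for d in holed])
imputed_s = np.vstack([conditional_mean_impute(d, mu_hat, sigma_shrunk) for d in holed])

print("ML:       ", rmse(test[mask], imputed[mask]), r2(test[mask], imputed[mask]))
print("Shrinkage:", rmse(test[mask], imputed_s[mask]), r2(test[mask], imputed_s[mask]))
```

Both criteria are computed only on the hours that were deleted, so they measure how well the imputed values recover the held-out records. Figures from this toy simulation are not comparable with the values reported in the abstract.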
Copyright (c) 2023 Scientia et Technica
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.