Volume 4, Issue 1, February 2018, Page: 24-31
Assessment and Selection of Competing Models for Count Data: An Application to Early Childhood Caries
Agnes Njambi Wanjau, Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Samuel Musili Mwalili, Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Oscar Ngesa, Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
Received: Feb. 19, 2018; Accepted: Mar. 19, 2018; Published: Mar. 23, 2018
DOI: 10.11648/j.ijdsa.20180401.15
Abstract
Count data arise in a wide range of disciplines. Poisson, negative binomial (NB), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression are among the models proposed for count responses. All four are plausible candidates for a given dataset, but there is no established rule for choosing the one that will perform best. This study assessed these four models at various degrees of zero inflation. Datasets were simulated from a ZIP distribution at eight levels of zero inflation (0%, 2%, 5%, 10%, 15%, 20%, 30% and 40%). Poisson and NB estimated the regression coefficients well when the proportion of zeros was below 15%. The two zero-inflated models (ZIM) performed well at higher degrees of zero inflation: beyond 15% for ZIP and beyond 20% for ZINB. Exploratory examination of the caries data revealed zero inflation of 3.23%, below the 15% threshold. Early childhood caries (ECC) data for 3-6 year old children who visited Lady Northey Dental Clinic were then analyzed with Poisson and NB. The Akaike information criterion (AIC) was used to compare the competing models both under simulation and with the real data. Poisson yielded lower AIC values than the other three models at the lower zero-inflation rates. ZIP had the lowest AIC value at the 10%, 15%, 20%, 30% and 40% levels of zero inflation. The NB model had the lowest AIC value when the real data were analyzed. Father's education level (primary school completed), chewing gum several times a week, and the feeding habits of jam several times a day, juice every day, soda every day and sweets several times a week were found to be significant factors associated with ECC.
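The simulation design described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes intercept-only models (no covariates), a Poisson mean of 3, a sample size of 2000 and an arbitrary seed, and it compares only Poisson and ZIP. The function names (simulate_zip, fit_poisson, fit_zip, aic) are hypothetical, and the ZIP fit uses a simple EM iteration rather than any particular package.

```python
import math
import random


def simulate_zip(n, lam, pi, rng):
    """Draw n counts: a structural zero with probability pi, else Poisson(lam)."""
    def poisson():
        # Knuth's multiplication method; adequate for a small mean.
        limit, k, p = math.exp(-lam), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        return k
    return [0 if rng.random() < pi else poisson() for _ in range(n)]


def log_pois(k, lam):
    """Log of the Poisson probability mass at k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def fit_poisson(y):
    """Intercept-only Poisson: the MLE of the mean is the sample mean."""
    lam = sum(y) / len(y)
    return lam, sum(log_pois(k, lam) for k in y)


def fit_zip(y, iters=500):
    """Intercept-only ZIP fitted by EM; returns (lam, pi, loglik)."""
    n, n0, total = len(y), y.count(0), sum(y)
    lam, pi = total / max(n - n0, 1), 0.5 * n0 / n  # crude starting values
    for _ in range(iters):
        # E-step: expected number of structural zeros among observed zeros.
        z = n0 * pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson mean.
        pi, lam = z / n, total / (n - z)
    loglik = sum(
        math.log(pi + (1 - pi) * math.exp(-lam)) if k == 0
        else math.log(1 - pi) + log_pois(k, lam)
        for k in y
    )
    return lam, pi, loglik


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * loglik


rng = random.Random(2018)  # arbitrary seed (assumption)
# Zero-inflation levels taken from the abstract; lam=3.0 and n=2000 are assumptions.
for pi_true in (0.0, 0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40):
    y = simulate_zip(2000, 3.0, pi_true, rng)
    _, ll_pois = fit_poisson(y)
    _, _, ll_zip = fit_zip(y)
    print(f"{pi_true:.2f}  Poisson AIC={aic(ll_pois, 1):.1f}  ZIP AIC={aic(ll_zip, 2):.1f}")
```

Because ZIP nests Poisson, its maximized log-likelihood can never be lower; the AIC penalty of 2 per extra parameter is what allows the simpler Poisson model to be preferred when zero inflation is small. The study itself fits regression models with covariates and also includes NB and ZINB, which this sketch omits.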
Keywords
Simulation, RMSE, Competing Models
To cite this article
Agnes Njambi Wanjau, Samuel Musili Mwalili, Oscar Ngesa, Assessment and Selection of Competing Models for Count Data: An Application to Early Childhood Caries, International Journal of Data Science and Analysis. Vol. 4, No. 1, 2018, pp. 24-31. doi: 10.11648/j.ijdsa.20180401.15
Copyright
Copyright © 2018 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.