Prediction of the effective reproduction number of COVID-19 in Greece. A machine learning approach using Google mobility data

Authors

  • Athanasios Arvanitis, Environmental Informatics Research Group, School of Mechanical Engineering, Aristotle University of Thessaloniki, Greece
  • Irini Furxhi, Department of Accounting and Finance, Kemmy Business School, University of Limerick, Ireland
  • Thomas Tasioulis, Environmental Informatics Research Group, School of Mechanical Engineering, Aristotle University of Thessaloniki, Greece
  • Konstantinos Karatzas, Environmental Informatics Research Group, School of Mechanical Engineering, Aristotle University of Thessaloniki, Greece

DOI:

https://doi.org/10.31181/jdaic1001202201f

Keywords:

COVID-19, mobility reports, effective reproduction number, machine learning

Abstract

This paper demonstrates short-term prediction of the effective reproduction number (Rt) of COVID-19 in regions of Greece from online mobility data. Several machine learning methods are applied to predict Rt, and an attribute importance analysis identifies the variables that most affect prediction accuracy. The Work and Park categories are the most important mobility features, with importance values of 0.25 and 0.24, respectively. The results are based on an ensemble of diverse Rt methodologies, so that the predictions are neither overly cautious nor overly lenient. The Random Forest algorithm outperformed the other models on all metrics, achieving the highest R² (approximately 0.8) and Pearson's and Spearman's correlation values close to 0.9. The model gives robust results, and the overall methodology is a promising approach to COVID-19 outbreak prediction. This paper can help health authorities when deciding on non-nosocomial interventions to prevent the spread of COVID-19.
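
The three evaluation metrics named in the abstract (R², Pearson's and Spearman's correlation) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the sample Rt values below are hypothetical, chosen only to show how each metric is computed from a list of reference values and a list of model predictions.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the paper.
    rt_reference = [0.82, 0.95, 1.10, 1.31, 1.24, 1.05, 0.90, 0.78]
    rt_predicted = [0.88, 0.97, 1.02, 1.22, 1.27, 1.11, 0.93, 0.85]
    print("R2:      ", round(r_squared(rt_reference, rt_predicted), 3))
    print("Pearson: ", round(pearson(rt_reference, rt_predicted), 3))
    print("Spearman:", round(spearman(rt_reference, rt_predicted), 3))
```

R² measures how much of the variance in the reference Rt the predictions explain, Pearson's coefficient measures linear agreement, and Spearman's coefficient measures only whether the two series rise and fall in the same order, which is why a model can score differently on each.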

Published

18.12.2021

How to Cite

Arvanitis, A., Furxhi, I., Tasioulis, T., & Karatzas, K. (2021). Prediction of the effective reproduction number of COVID-19 in Greece. A machine learning approach using Google mobility data. Journal of Decision Analytics and Intelligent Computing, 1(1), 1–21. https://doi.org/10.31181/jdaic1001202201f