02/05/2020

Using Kalman filter to estimate effective reproduction number from public data (R script)


COVID-19 and data analysis

Starting from beginning of this year, almost all people in the world are affected their own life by COVID-19. Daily, headline news report new infection of hundreds ~ thousands of people with nicely analyzed data, and I knew that they put so much importance in effective reproduction number. Now, as a stupid scientist, my focus turned into the this special sounding parameter "Reproduction number" and wanted to estimate this number by myself from existing data base.

Effective reproduction number

Here, I quickly searched what I am trying to estimate. (by the way, I am not a professional of this field as you may sense already, if you want to know accurate & precise information, please find appropriate article.)
" The basic reproduction number of an infection can be thought of as the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection."
What I wanted to know was effective reproduction, but fine, there is a hint, so how many people do you infect when you are infected, this is the effective reproduction number. But, how we can estimate this number from the very limited database we can reach. To be fare, I could not find a good material which can describe how to calculate this number from public data. (that's why I made this article).

Learning from SIR model

Going back to basics, I also learnt that there is a model so called SIR, which is well used to estimate the trend of infection over time. The equation is below.
\begin{align} S^{'}=-\beta SI \end{align}\begin{align} I^{'}=\left( \beta S-\gamma \right) I \end{align}\begin{align} R^{'}=\gamma I \end{align}\begin{align} S+I+R=const \end{align}
Simply, this equation describes the relationships in between population from each domain: susceptible, infections and recovered.
Reproduction number also can be described as
\begin{align} R_{t}=\dfrac {\beta S}{\gamma }  \end{align}

Over the news, what we hear "newly infection ($N_{new}$)" can be$\beta SI$. And total infection is $I$. so, $R_{t}$ can be written as
\begin{align} R_{t}=\dfrac {N_{new}}{\gamma I }  \end{align}

Now calculation seems possible, this is just a ratio between daily infected number and total number of infection (and divided by $\gamma$; recovery rate).
In this study, we just use recovery rate to be 1/14, which is simply say, patients recover from COVID-19 over 2 weeks after their infection in average. I checked some materials but this number seems almost consistent and unchanged overtime (unless there is good medicine), but reproduction number changes drastically depending on the prevention measure : social distancing.

Data process

This time, I get data of Tokyo (my original place, I live in UK now) where they publish details of infection, date and ages for each patient. This time, I modified an existing R script to get the data, and summarize number of newly infections, and total infection number. Next, we just calculate the $\dfrac {N_{new}}{\gamma I }$.


The figure shows daily new case, and reproduction number.


Recently, Japanese TV show announced reproduction number went down to 1, so it seems the result is matching with the public announcement.

Kalman filter smoother

Since the plot is bit spiky and not looking fancy, lets use Kalman filter, while there is no specific known model for time transient of effective reproduction (I believe...), so lets define the system model to be just a random walk as below
\begin{align} x_{t} = x_{t-1}+w_{t} \end{align}\begin{align} y_{t} = x_{t}+v_{t} \end{align}
here, $w_{t} = N(0,W)$ and $v_{t} = N(0,V)$. This is the simplest form of Kalman filter, and we dont have no more than this because we have no idea about behavior model of reproduction. We can solve the problem by just writing the code for the Kalman filter loop (which I usually do for my daily work for firmware engineers), but this time I will use R package for my ease. By filtering the noisy signal, process result is shown as below. Now it looks like smoothed, and little sophisticated (I hope you agree...).
Details of the coding, and parameter used are uploaded to Github. (< it is not a beautiful code, don't get angry with me!)
(https://github.com/makiton5/R_public/blob/master/Covid_analysis_example.R)

Conclusion

Again, this is just my hobby work, and not a professional analysis for sure. This is just for us to ease to understand what is the meaning of those parameters, and how we can estimate them with little bit of data analysis technique (Kalman filter this time). But, the analysis shows consistent result I can find from news, which was good, and now I think I understand what are specialists saying about on TV.
Like this way, I want to summarize what I analyzed as a part of my hobby work.

No comments:

Post a Comment