Tuesday, April 07, 2020

Statistical methods for Epidemiologists

I wrote yesterday that I am reading a book about predictions called 'The Signal and the Noise' by Nate Silver. I have now reached the chapter about predicting epidemics, a very suitable topic for these days. It discusses avian flu and swine flu, both of which were predicted to infect many more people than they actually did; that was just as well.

Silver talks about the simplistic SIR model, "formulated in 1927, that posits that there are three “compartments” in which any given person might reside at any given time: S stands for being susceptible to a disease, I for being infected by it, and R for being recovered from it. For simple diseases like the flu, the movement from compartment to compartment is entirely in one direction: from S to I to R". Another important variable is R0, the basic reproduction number: the average number of susceptible people to whom a single infected person passes the disease.
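To see how the three compartments interact, here is a minimal sketch of the classic SIR model in Python. It is my own illustration, not code from Silver's book. It assumes the population is normalised to 1, that the recovery rate gamma is 0.1 per day (roughly ten days of being infectious) and that 0.01% of people start out infected; all of these are hypothetical values chosen only for the example. The transmission rate is derived from R0 = beta / gamma, and the equations are stepped forward with a simple Euler method.

```python
def simulate_sir(r0, gamma=0.1, i0=1e-4, days=365, dt=0.1):
    """Forward-Euler integration of the basic SIR model.

    S, I and R are fractions of the population, so S + I + R stays equal to 1.
    gamma (recovery rate per day) and i0 (initial infected fraction) are
    hypothetical illustration values, not fitted to any real epidemic.
    Returns one (day, S, I, R) tuple per whole day.
    """
    beta = r0 * gamma                      # transmission rate implied by R0
    s, i, r = 1.0 - i0, i0, 0.0
    steps_per_day = int(round(1 / dt))
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt   # movement S -> I
            new_recoveries = gamma * i * dt      # movement I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


if __name__ == "__main__":
    for r0 in (3.0, 1.5):
        history = simulate_sir(r0)
        peak_day, _, peak_i, _ = max(history, key=lambda row: row[2])
        print(f"R0 = {r0}: peak of {peak_i:.1%} infected at once, on day {peak_day}")
```

Running it with a smaller R0 produces a lower and later peak, since people only ever move one way, from S to I to R.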

For 'normal' flu, R0 is about 1.2, but scientists initially said that for Covid-19 it was about 3, which makes it very infectious. One thing that I haven't yet seen in the book is the major constraint in handling this disease: the number of ventilators available. The rationale behind handling this epidemic is to keep the number of infected people low enough that they don't overwhelm the hospitals. Eventually everyone will probably catch Covid-19, but if these cases are spread out widely enough over time, the health authorities will be able to deal with all of them.
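The link between R0 and the load on hospitals can be made concrete with a standard result of the basic SIR model: if almost everyone starts out susceptible, the largest fraction of the population infected at the same moment is I_max = 1 - (1 + ln R0) / R0. The short Python sketch below simply evaluates that formula; it is an idealised model and says nothing directly about ventilators, which would need further assumptions (for instance, what fraction of infected people need one).

```python
import math


def sir_peak_fraction(r0: float) -> float:
    """Largest fraction of the population infected simultaneously in the
    basic SIR model, assuming S0 is close to 1 and I0 close to 0.

    The peak occurs when S falls to 1/R0; since I + S - ln(S)/R0 is conserved,
    I_max = 1 - (1 + ln R0) / R0. With R0 <= 1 there is no outbreak.
    """
    if r0 <= 1:
        return 0.0
    return 1 - (1 + math.log(r0)) / r0


for r0 in (3.0, 2.0, 1.5, 1.2):
    print(f"R0 = {r0}: peak of about {sir_peak_fraction(r0):.1%} infected at once")
```

Under these idealised assumptions, R0 = 3 gives a peak of roughly 30% of the population infected at once, while bringing R0 down to 1.5 cuts the peak to about 6%.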

Another important statistic, which I haven't yet seen in Silver's book but which is connected to the above constraint, is the doubling time: how long does it take for the number of infected people to double? A week ago, the figure in Israel was 3 days: if there were 1000 infected people on Sunday, then by Wednesday there were 2000. Now they are saying that the doubling time is 11 days, which is a huge improvement, thanks to social distancing and people staying at home. To use an analogy: if the number of infected people is the velocity of a car, then the doubling time is a measure of its acceleration, with a longer doubling time meaning a gentler acceleration. The health authorities would like zero acceleration, which would mean no increase in velocity (infected people), and even better, deceleration, in which the number of infected people decreases.
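The difference between a 3-day and an 11-day doubling time is easier to appreciate with a little arithmetic. The Python sketch below assumes pure exponential growth at a constant doubling time T_d, so that N(t) = N0 * 2^(t / T_d); real epidemics do not keep a constant doubling time, and the 33-day horizon is arbitrary (chosen only because it divides evenly by both 3 and 11).

```python
import math


def cases_after(initial_cases: float, days: float, doubling_time: float) -> float:
    """Exponential growth expressed via the doubling time: N(t) = N0 * 2**(t / T_d)."""
    return initial_cases * 2 ** (days / doubling_time)


def daily_growth_rate(doubling_time: float) -> float:
    """Daily fractional increase implied by a doubling time: 2**(1 / T_d) - 1."""
    return 2 ** (1 / doubling_time) - 1


def doubling_time_from_counts(n_start: float, n_end: float, days: float) -> float:
    """Estimate the doubling time from two case counts taken `days` apart."""
    return days * math.log(2) / math.log(n_end / n_start)


# 1000 cases on Sunday and 2000 by Wednesday gives a doubling time of 3 days.
print(doubling_time_from_counts(1000, 2000, 3))

for td in (3, 11):
    print(f"T_d = {td} days: {daily_growth_rate(td):.1%} growth per day, "
          f"{cases_after(1000, 33, td):,.0f} cases after 33 days")
```

Starting from 1000 cases, 33 days of doubling every 3 days means 11 doublings and about two million cases, whereas doubling every 11 days means only 3 doublings and 8000 cases.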

The epidemic endgame depends on how long the virus stays active after it has infected people. If all the non-infected people are kept off the streets and only people who have recovered from the virus are allowed out, what is the chance of non-infected people catching the virus from those who have recovered? No one knows as yet. What is the influence of temperature and UV light on the virus (think of an Israeli summer and its effect on skin cancer)? Again, no one knows.

Another important datum is the length of time that the virus can lie dormant in a carrier. Let's say that everybody aged 20-29 cannot become ill but can become a carrier. If a 20-year-old is infected on the first of April, will the virus that she carries be able to infect a 60-year-old on the fifth of April? On the tenth of April? At the end of April? Again, this datum will influence the endgame, which is when people can return to a normal life.

As Silver notes, the important data about an epidemic become known only several years after it has passed. While an epidemic is active no one knows these data, so the health authorities have to make predictions in order to handle it. This book is about making predictions: how to make models as accurate as possible and how to mitigate biases (something which I have written about recently in my thesis). So this is very much a book for our times.
