I Want to Go Out! But... Are There Going to be Infectious People There?
"Well, I have had enough of the lockdown, and I want to go out with some buddies. Are there going to be any infectious people there?"
Good question.
You will probably want to find out the answer to this question before going out to join a group of friends in this new Covid-19 world. Or to a restaurant. Or a party.
Especially now that we know at least 40% of the people carrying the virus are asymptomatic (Reference 1) (Reference 2).
Finding a definite answer to this question is, of course, impossible. But we can make a good estimate with the help of a little bit of logic and statistics.
Estimating Prevalence
First we have to estimate the percentage of infectious people in our state. The details are in the appendix but as of today, June 17, 2020, in the state of Maryland, my best estimate is:
- 1 person out of 114 is infectious (0.88%)
OK, that is good to know.
So if I stay in crowds smaller than 114 people does that mean I'm safe?
Well, no.
That is not the way it works.
Given how the laws of chance operate, you won't all of a sudden have one infectious person once the crowd size reaches 114 people.
So I cannot say that there will be only non-infectious people in a crowd of fewer than 114 people.
But what I can do is give you a pretty good guess on what the probability is of your crowd containing one or more infectious people.
And then you get to decide on whether to take that chance or not.
Note that if it turns out there is someone infectious in the crowd, it doesn't mean you will definitely get infected. The statistics of actually getting infected will be the subject of a future blog post - and of course it will depend on things like:
- Are people wearing masks?
- Is it outside?
- Do you know the "Covid hygiene" of all the people that are going to be there or are they mostly strangers?
Estimating Probabilities
You don't need to know the details of the statistical calculations (they are in the appendix if you are interested).
You only need to know how to use the graph I've created, which is attached below.
The graph is easy to use. You just select the size of the crowd you expect on the "Crowd Size" (X) axis, follow one of the vertical lines until it intersects the blue line (assuming you live in Maryland) and see what the corresponding value for the probability is on the "Probability" (Y) axis.
For example, refer to the graph below. If you know that 80 people will be at an event, following the red arrows, you can see the corresponding probability is about 50%.
Are you willing to take a 50% chance that someone in the crowd is infectious with the virus? That is up to you!
Once again, this does NOT mean there is a 50% chance you will get the virus. It is much less than that and depends on how close you get to the person, whether there are masks involved, how long you are near him or her and several other factors.
But at least you are now informed as to how likely it is that the crowd contains an infectious person.
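If you would rather start from the risk you are willing to accept and work back to a crowd size, the formula from the appendix can be turned around. The snippet below is only a rough sketch: the function name `crowd_size_for` is made up for illustration, and it assumes the Maryland proportion of 0.88% used in this post.

```python
import math

H = 0.0088  # proportion of infectious people in Maryland, from this post

def crowd_size_for(target_prob, proportion=H):
    """Crowd size at which P(at least one infectious person) reaches target_prob.

    Inverts Prob = 1 - e^(-C x H), giving C = -ln(1 - Prob) / H.
    """
    if not 0.0 <= target_prob < 1.0:
        raise ValueError("target_prob must be in [0, 1)")
    return -math.log(1.0 - target_prob) / proportion

for p in (0.10, 0.25, 0.50, 0.90):
    print(f"{p:.0%} chance at a crowd of about {crowd_size_for(p):.0f} people")
```

For a 50% chance this gives a crowd of about 79 people, which lines up with the "about 50% at 80 people" reading from the graph.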
Large Events
There has been lots of speculation recently about people attending Trump rallies (on the right) and Defund the Police protests (on the left).
What is the chance of someone that is infectious being in THAT crowd?
Some of these events will draw 10,000 people or more. This number is too big to be on the graph, but I have calculated it for you (for the state of MD):
- The chances are 100.00000000% (to as many decimal places as my Excel spreadsheet will allow) that there will be at least one person shedding viruses in that crowd.
- The expected value for a crowd of 10,000 in MD is 88 people - meaning that there are most likely around 88 people sharing their Coronavirus RNA.
Might not be a good time to visit a rally or a protest, especially if people aren't wearing masks.
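For anyone who wants to check the large-event numbers without a spreadsheet, here is a small sketch in Python. It assumes the same 0.88% proportion and the Poisson formula from the appendix; the variable names are mine.

```python
import math

H = 0.0088        # proportion of infectious people in Maryland, from this post
crowd = 10_000    # size of the rally or protest

expected = crowd * H               # expected number of infectious people
p_none = math.exp(-expected)       # Poisson probability of zero infectious people
p_at_least_one = 1.0 - p_none      # probability of one or more

print(f"Expected infectious people: {expected:.0f}")
print(f"Chance of none: {p_none:.2e}")
print(f"Chance of at least one: {p_at_least_one:.8%}")
```

The chance of nobody being infectious is so tiny that ordinary floating-point arithmetic rounds the final answer to exactly 100%, which is the same thing the spreadsheet shows.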
___________________________________________
Appendix
Part 1: Estimating Prevalence of Infectious People
I have not yet found a web resource that gives the prevalence of the virus at the state or local level. (Note that the prevalence of infected people does not equal the prevalence of infectious people - it is quite possible to be infected but not infectious.) So we will have to estimate it using available data.
The best sources of up-to-date Covid-19 data (such as the Johns Hopkins Covid-19 Dashboard - Reference 5 and the New York Times Coronavirus Reporting Page - Reference 4) give primarily cumulative numbers for Coronavirus cases and deaths - these are not much help for the current, up-to-the-minute information that we need to calculate prevalence.
The New York Times page as well as the state's Maryland Covid-19 Dashboard (Reference 6) supply newly reported cases and hospitalizations each day. Since hospitalizations happen about 1-2 weeks after initial infection, it will be better to use case counts - which are the daily results of tests. Note that not every Coronavirus infection is captured by testing, and we will have to account for this later.
I will use the New York Times coronavirus reporting section for Maryland (Reference 4) as the data source.
Here is the trend for newly reported cases in Maryland:
Estimating the Number of Currently Infectious People from Daily Test Results
You can see that as of June 16, for example, there were 351 new cases. To estimate the number of people that have been tested and are currently infectious, we also need to know how long people are infectious with the virus (after they contracted it).
As doctors and medical professionals gain experience with the SARS-CoV-2 virus, they are learning that people are infectious pre-symptomatically and continue to shed viruses well after they are on the road to recovery. The best thinking to date is that people are infectious from 3 days to 14 days after exposure to the virus (Reference 3). That is a 12 day period (inclusive of both the beginning and the end days).
Since the graph above shows a decreasing trend with a fairly constant slope, we will just take a linear average for the prior 12 days. We can easily obtain this average by reading off the number of new cases at the halfway point, June 10, which is 6 days prior to June 16. This number is 571 new cases per day.
N = # of new cases per day (cases/day), averaged over a 12 day period = ~571 cases/day
L = Length of the period of infectiousness (days) = 12 days
To Be Calculated:
T = # of tested cases in the population who are (still) infectious (cases)
T = L x N = 12 x 571 = 6852 cases
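The same arithmetic as a few lines of Python, in case you want to plug in numbers for a different date or state. This is just a sketch; the variable names are mine, and the values are the ones quoted above.

```python
# Values taken from the text above.
new_cases_per_day = 571   # N: average new reported cases per day (June 10 midpoint)
first_day, last_day = 3, 14   # infectious from day 3 to day 14 after exposure

# L: length of the infectious period, counting both end days
infectious_days = last_day - first_day + 1

# T: tested cases that are assumed to still be infectious
tested_infectious = infectious_days * new_cases_per_day

print(f"L = {infectious_days} days, T = {tested_infectious} cases")
```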
Accounting for infectious people that haven't been tested
This is by far the most difficult one to estimate, and we will be super conservative here.
There are several reasons why the new cases per day reported by the New York Times and other web sources significantly understate the actual prevalence of the infection in the general public:
- Presymptomatic cases - these folks will eventually come down with Covid-19, but don't know it yet. This period can range from 3-7 days.
- People that get sick and don't get tested - There is not a current shortage of testing (that I am aware of), but it is still possible that quite a few people will not go in for a test, especially if the case is mild.
- Asymptomatic cases - it is well documented that asymptomatic transmission of the virus is commonplace (Reference 7) (Reference 8). This might account for the bulk of the cases.
These are all potentially significant sources of people that are infectious and we must account for them. However there is no direct way of counting them so we will have to use an approximation.
New York did a very extensive antibody study about three weeks ago where they went out to supermarkets and randomly tested people for antibodies against SARS-CoV-2. If someone tests positive for antibodies, it means they have previously had the infection.
The surprising conclusion is that they found ~20% of the people in NYC tested had antibodies (Reference 9), but, consulting the New York Times Covid tracking for NYC, only about 2.6% of the population have been reported to have the virus. This is a factor of 7.7.
Note: Antibody assays tend to have false positives (overstating the numerator) and Covid infection testing using techniques such as PCR tend to have false negatives (understating the denominator) - so this factor of 7.7 is therefore probably on the high side, and will overstate the number of cases, making the final estimate more conservative.
We can use this information to estimate the actual prevalence of infectious people. The big assumption, of course, is the same sort of dynamics hold for Maryland as for New York. This is probably the biggest source of uncertainty in the calculation. We can update the numbers when the Maryland antibody test results are made public.
T = # of tested cases in the population who are (still) infectious (cases) = from above = 6852
A = Prevalence of people with antibodies in NYC (% of people tested) = 20%
R = Reported test case positive result in NYC (% of population) = 2.6%
To Be Calculated:
V = prevalence of infectious people (cases)
V = T x A / R = 6852 x 20% / 2.6% = 6852 x 7.7 = 52,760
There are probably about 52,760 people in Maryland that are currently infectious!
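Here is the scaling step as a short Python sketch (variable names are mine). One thing it makes visible: the 52,760 figure comes from rounding the ratio to 7.7 before multiplying. Carrying the unrounded ratio through gives roughly 52,700 instead, a difference that is far smaller than the uncertainty in the assumptions.

```python
tested_infectious = 6852     # T, tested cases still infectious
antibody_rate_nyc = 0.20     # A, share of NYC people tested who had antibodies
reported_rate_nyc = 0.026    # R, share of NYC population reported positive

# How many actual infections there were per reported case in NYC
undercount_factor = antibody_rate_nyc / reported_rate_nyc
rounded_factor = round(undercount_factor, 1)   # 7.7, as used in the text

infectious_rounded = tested_infectious * rounded_factor
infectious_unrounded = tested_infectious * undercount_factor

print(f"Undercount factor: {undercount_factor:.2f}")
print(f"V with factor rounded to 7.7: {infectious_rounded:,.0f}")
print(f"V with unrounded factor:      {infectious_unrounded:,.0f}")
```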
Likelihood
We want to know how likely it is that we will run into one of the 52,760 people.
Maryland has a population of about 6 million.
M = Population of MD = 6,000,000
V = Prevalence (from above) = 52,760
To be Calculated:
L = Likelihood
H = proportion of population
L = M / V = 6,000,000 / 52,760 = 1 in 114 people
H = V / M = 52,760 / 6,000,000 = about 0.88% of the population.
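The two ratios above, sketched in Python with the same inputs (variable names are mine):

```python
population_md = 6_000_000   # M, population of Maryland (rounded)
infectious = 52_760         # V, estimated number of infectious people

one_in = population_md / infectious      # "1 in N" form
proportion = infectious / population_md  # H, as a fraction of the population

print(f"About 1 in {one_in:.0f} people")
print(f"About {proportion:.2%} of the population")
```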
Part 2: Estimating Probabilities
Now that we know the likelihood, or the % of the population that is infectious, we will next estimate the probability (Prob) that a crowd (of size C) contains at least one infectious person, given the proportion H of infectious people in the state.
First we will calculate the expected number of infectious people in the crowd:
C = Crowd size (independent variable)
H = proportion of infectious people (0.88%)
To be Calculated:
X = Expected number of infectious people in crowd = C x H = C x 0.88%
So, for example, for a crowd size C of 1000 people, the expected number X is 8.8 infectious people.
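A quick sketch of the same calculation for a few crowd sizes. The function name `expected_infectious` is just for illustration, and the 0.88% proportion is the Maryland estimate from Part 1.

```python
H = 0.0088  # proportion of infectious people (0.88%)

def expected_infectious(crowd_size, proportion=H):
    """X = C x H: expected number of infectious people in a crowd."""
    return crowd_size * proportion

for c in (80, 114, 1000, 10_000):
    print(f"Crowd of {c:>6,}: expect about {expected_infectious(c):.1f} infectious people")
```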
Given the randomness inherent in a group of a given number of people, there is a significant chance that a crowd will not contain the expected number of infectious people. To account for this properly, we will assume a Poisson Distribution (thanks to Larry Marschall for suggesting the use of Poisson to solve the problem). (Reference 10)
Poisson distributions are usually used to model how many events occur in a fixed interval. A good example is predicting the number of meteorites arriving over some time period, given the average number per hour. Predicting the number of infectious people in a crowd, given the crowd size and the expected number of infectious people, is analogous.
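If you are wondering how much the Poisson assumption matters, one way to check is to compare it with a plain binomial model, where each of the C people is independently infectious with probability H. That binomial comparison is my own addition for illustration, not part of the calculation in this post, and the function names are made up.

```python
import math

H = 0.0088  # proportion of infectious people (0.88%)

def p_at_least_one_poisson(crowd_size, proportion=H):
    # 1 - e^(-C x H); expm1 computes e^x - 1 accurately for small x
    return -math.expm1(-crowd_size * proportion)

def p_at_least_one_binomial(crowd_size, proportion=H):
    # 1 - (chance that every one of the C people is not infectious)
    return 1.0 - (1.0 - proportion) ** crowd_size

for c in (10, 80, 114, 500):
    poisson = p_at_least_one_poisson(c)
    binomial = p_at_least_one_binomial(c)
    print(f"C={c:>3}: Poisson {poisson:.4f}, binomial {binomial:.4f}, "
          f"difference {binomial - poisson:.4f}")
```

With a proportion this small the two models agree to within a fraction of a percentage point, so the simpler Poisson formula is a reasonable choice here.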
C = Crowd size (will be an independent variable)
X = Expected number of infectious people in the crowd = C x 0.88%
e = Euler's number = 2.718 (we use this in the Poisson distribution calculation)
k = actual number of infectious people in the crowd (another independent variable)
To be Calculated:
Prob = probability of observing k infectious people in the crowd
Poisson distribution:
Prob (observing k infectious people in crowd size C) = e^(-X) x X^k / k!
We can considerably simplify this by setting k = 0 - i.e. looking for the probability of observing NO infectious people in the crowd (0 people). We can then subtract this from 1 to get the cumulative probability of the crowd having one or more infectious people.
Since 0! = 1 and any number (X) raised to the power of 0 is also 1 (yes, this is real math!), the X^k / k! term drops out and the formula for the Poisson distribution becomes:
Prob (observing 0 infectious people in crowd size C) = e^(-X)
Since X = C x 0.88% = C x 0.0088:
Prob (observing 0 infectious people in crowd size C) = e^(-C x 0.0088)
To get the probability of having one or more infectious people we subtract that formula from 1:
Prob (observing at least one infectious person) = 1 - e^(-C x 0.0088)
This is the formula we plotted in the graph above (as a function of crowd size C in increments of 10 people).
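To reproduce the points on the curve without the graph, the final formula can be sketched in a few lines of Python. The function name `prob_at_least_one` is mine, and the default proportion is the Maryland estimate of 0.88% from Part 1, so swap in your own state's number if you have one.

```python
import math

H = 0.0088  # proportion of infectious people (0.88%)

def prob_at_least_one(crowd_size, proportion=H):
    """Poisson probability that a crowd contains one or more infectious people."""
    expected = crowd_size * proportion   # X = C x H
    return 1.0 - math.exp(-expected)     # 1 - P(k = 0)

# Same spacing as the graph: crowd sizes in increments of 10 people
for c in range(10, 101, 10):
    print(f"Crowd of {c:>3}: {prob_at_least_one(c):.1%}")
```

A crowd of 80 comes out at roughly 50%, matching the example read off the graph in the main post.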