Data Collection/Analysis and Covid-19: TL;DR Edition

Not to encourage people to skip the longer version, but here's a key point.

Steven L. Taylor · Saturday, April 11, 2020 · 8 comments

Let me emphasize a key point that I really wanted to make in my previous, much longer post. It comes from a FiveThirtyEight piece (Why It’s So Freaking Hard To Make A Good COVID-19 Model) I linked therein:

Numbers aren’t facts. They’re the result of a lot of subjective choices that have to be documented transparently and in detail before you can even begin to consider treating the output as fact. How data is gathered — and whether it is gathered the same way each time — matters.
There’s also the issue of uncollected or inaccurate data. To determine the fatality rate, you have to divide the number of people who have died from the disease by the number of people infected with the disease. In this case, we don’t really have a reliable count for the number of people infected — so, to put it mathematically, we don’t know the denominator. (If we’re being honest, we probably don’t know exactly what the first number — the numerator — is, either, but we’re assuming it’s closer to correct.)

In other words, not only is there a lot we don’t know, but even the number of deaths as currently reported is an artifact of uncertainty.

The real death tally is the reported tally +/- some level of error (which, of course, is true about the annual flu-related death rate, or anything else that requires judgment calls and/or has to account for human error).
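To make the numerator/denominator point concrete, here is a minimal sketch of how uncertainty in both counts widens the fatality rate. All of the numbers are hypothetical and chosen only to keep the arithmetic easy to follow, and the function names are mine, not from any model discussed here.

```python
def fatality_rate(deaths: float, infections: float) -> float:
    """Deaths divided by infections, returned as a fraction."""
    if infections <= 0:
        raise ValueError("infections must be positive")
    return deaths / infections


def fatality_rate_bounds(deaths_low, deaths_high, infections_low, infections_high):
    """Lowest and highest rate consistent with the given ranges."""
    # The rate is smallest with the fewest deaths over the most infections,
    # and largest with the most deaths over the fewest infections.
    lowest = fatality_rate(deaths_low, infections_high)
    highest = fatality_rate(deaths_high, infections_low)
    return lowest, highest


# Hypothetical counts, for illustration only.
reported_deaths = 1_000
confirmed_cases = 20_000

naive = fatality_rate(reported_deaths, confirmed_cases)
low, high = fatality_rate_bounds(
    deaths_low=1_000, deaths_high=1_500,             # deaths may be under-counted
    infections_low=20_000, infections_high=200_000,  # infections may far exceed confirmed cases
)
print(f"naive: {naive:.1%}, plausible range: {low:.1%} to {high:.1%}")
```

With these made-up inputs the naive rate is 5.0%, while the range consistent with the assumed uncertainty runs from 0.5% to 7.5%. Most of that spread comes from the denominator.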

The question at the moment is: what is more likely in these conditions? An over-count or an under-count?

I would argue that an under-count is more probable. First and foremost, there is the lack of adequate testing. Second, this is a new phenomenon (unlike the flu) and there is, therefore, no experience with making decisions about how to classify morbidity (which also raises consistency problems in terms of coding deaths). Third, we are placing a lot of stock in instant counts, but the reality is that in the middle of a crisis we should expect some communication errors and lags.

On that last point, let me note: a lag in communication cannot lead to an over-count; it can only lead to an under-count.
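The lag point can be sketched in a few lines. This assumes, for simplicity, that every death is reported exactly once and that each report arrives a fixed number of days late. The daily figures and the `reported_by_day` helper are hypothetical.

```python
def reported_by_day(deaths_per_day, delay_days):
    """Cumulative deaths known on each day when every report arrives delay_days late."""
    reported = []
    for today in range(len(deaths_per_day)):
        # Only deaths from days up to (today - delay_days) have been reported so far.
        last_reported_day = today - delay_days
        reported.append(sum(deaths_per_day[: max(last_reported_day + 1, 0)]))
    return reported


# Hypothetical daily deaths, for illustration only.
daily = [1, 2, 4, 8, 16]
true_cumulative = reported_by_day(daily, delay_days=0)
lagged_cumulative = reported_by_day(daily, delay_days=2)

# On every day the lagged tally is at or below the true tally.
never_over = all(l <= t for l, t in zip(lagged_cumulative, true_cumulative))
print(true_cumulative, lagged_cumulative, never_over)
```

Under those assumptions the lagged tally trails the true tally on every day and never exceeds it. Other kinds of error, such as double reporting or misclassification, are a separate matter and are not modeled here.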

Another problem with getting a good sense of the available data is that there are various time-horizons in operation here. California is on one clock, NYC is on its own clock, and Louisiana is on yet another. It is difficult to really assess the effects of various policy choices right now. For example, Florida’s stay-at-home order is only just over a week old as I write this. (And their tests are lagging, as I noted in a link in my previous post.)

Fundamentally, I would argue that the criticisms of the estimates of the death toll are asserting far too much certainty prematurely because they aren’t thinking through either the quality of the data at the moment or the incompleteness thereof.

Comments

senyordave says:

Saturday, 11 April 2020 at 15:29

Vaccine available in September? I can’t find much of anything but I just saw something saying that researchers at Oxford are 80% sure they have a vaccine and it would be ready sometime in September.

senyordave says:

Saturday, 11 April 2020 at 15:35

Here is the link, seems real:
https://www.bloomberg.com/news/articles/2020-04-11/coronavirus-vaccine-could-be-ready-in-six-months-times

CSK says:

Saturday, 11 April 2020 at 16:33

@senyordave:
Fingers crossed.

Stormy Dragon says:

Saturday, 11 April 2020 at 16:49

@senyordave:

Related to that:

Bill Gates is funding new factories for 7 potential coronavirus vaccines, even though it will waste billions of dollars

Bill Gates is building factories to mass produce seven vaccine candidates, so that as soon as testing concludes the best one can be made immediately available, and he is just accepting the other six as a loss.

charon says:

Saturday, 11 April 2020 at 17:16

Fundamentally, I would argue that the criticisms of the estimates of the death toll are asserting far too much certainty prematurely because they aren’t thinking through either the quality of the data at the moment or the incompleteness thereof.

Getting overly focused on the lack of accuracy of models is a failure to appreciate their purpose – which is guiding policy, which they can do without being accurate.

senyordave says:

Saturday, 11 April 2020 at 17:32

My last job consisted primarily of building financial models. One of the people I built models for swore by my work, and he used to say that all my models were wrong, but much less wrong than anybody else’s models.

DrDaveT says:

Saturday, 11 April 2020 at 20:50

@Stormy Dragon:

Bill Gates is building factories to mass produce seven vaccine candidates, so that as soon as testing concludes the best one can be made immediately available, and he is just accepting the other six as a loss.

Good on him.

In truth, that’s what for-profit pharmaceutical companies do, too — they fund many drug developments, in hopes that the one success in the bunch will pay for all of the losers. Fewer than 10% of drugs entering Phase 1 trials eventually get approved — and quite a few drugs never make it even to Phase 2.

SKI says:

Sunday, 12 April 2020 at 08:30

@senyordave: from that link

Manufacturing the millions of vaccine doses necessary could take months. Gilbert said she’s in discussions with the British government about funding, and starting production before the final results are in, allowing the public to access the vaccine immediately if it proves to work. She said success by the autumn was “just about possible if everything goes perfectly.”

So, if they go straight to manufacture of an untested vaccine, and nothing of any sort goes wrong while moving to mass production, and they get lots of luck, then maybe in the fall if her 80% self-confidence is actually true.