Data Collection/Analysis and Covid-19

Critics of the modeling and of the data analysis are being too simplistic.

Steven L. Taylor · Saturday, April 11, 2020 · 36 comments

Let me start this post by violating a cardinal rule of the internet these days and stating that I am not an expert on epidemiology, nor am I qualified to offer a detailed critique of the models being used to project the effects of Covid-19. I note this to be upfront, but also to underscore that a lot of people out there seem quite confident in their expertise. And, not surprisingly, their political preferences seem to shape the findings of their newfound ability to assess data related to the epidemic. Although, in fairness, like Peter Navarro, I am a social scientist with a Ph.D.*

A focus of these conversations is the University of Washington’s Institute for Health Metrics and Evaluation (IHME) model. This has been exemplified by Alex Berenson (see, via FNC: Meet the former NYT reporter who is challenging the coronavirus narrative). A related discussion concerns that model’s downward revision of the projected death toll of the disease (from the 100k-240k range to 60k).

Ultimately, it is about the debate over whether shutting down the economy to try and mitigate the health effects of the virus was worth the economic costs. This is a fair question. A real answer is going to take time. We do not yet know the actual toll of this disease nor do we know what the actual economic damage will be. Further, it is all but impossible to know which choice was optimal, as there are clear downsides to both. At a minimum, I think it is worth noting that there was going to be an economic impact regardless because even without stay-at-home orders, the pandemic was going to create economic damage (and it is even possible that health-related economic damage has been mitigated since higher death tolls and greater stress to the healthcare system would have had effects as well).

I will confess that my bias leans toward the pro-mitigation, stay-at-home side because death is harder to recover from than unemployment is. I say that fully understanding that unemployment is a real hardship.

The part of this that I find frustrating is the fact that people are attempting to draw very firm conclusions (and influence public opinion) based on incomplete data.

This conversation is inherently partisan insofar as Republicans are seeking in many cases to lessen the amount of blame President Trump is receiving with regard to his response to the pandemic. After all, if the effects are not as bad as some said they would be, then either a) he did a great job responding (which is what he is going to say no matter what the death toll is) or b) it was never all that bad to begin with. I will note that even if the death toll is “only” 60,000 (or if, by a miracle, it stops at 30,000), that is a far cry from the claims Trump was making just over a month ago (i.e., talking like it was all going to go away and that the federal government was doing a “great job”).

Moreover, since Trump was initially more concerned about the stock market than about the virus, and has been consistently arguing to “reopen the country,” the public health v. economic health issue has also become partisan. Governors in the heavily Republican Southeast have underscored this (see, e.g., Alabama, Georgia, Florida, and Mississippi).

And, without a doubt, media consumption affects one’s views of these matters.

While I am not qualified to tear down the models, I can comment on the basic problems of data collection and the coding of said data. There is also the fundamental problem of trying to reach firm conclusions about a complex situation while the event is still unfolding. We have to remind ourselves that we are not anywhere near done with this crisis, so the notion that we can make definitive statements about policy choices is just plain foolish, and any such statements are almost certainly going to be heavily influenced by motivated reasoning.

For example, any critique of the models (or praise thereof) has to deal with a fundamental problem that we have had from the beginning: the lack of widespread testing. This is true of both the living and the deceased. Without that information, we cannot really know how well the models have predicted things like spread or morbidity. It also makes it harder to know either whether social distancing is needed or how effective it is.

There is also the problem of coding deaths (i.e., how are deaths identified in terms of building datasets of who died from what?). Some (as noted in the comments to James Joyner’s earlier post) think that we are underestimating deaths due to lack of testing (and just lack of data/confusion over the issue across the country).

Others think we are over-counting. One person who has been touting skepticism has been Brit Hume, both on his Twitter feed and on Fox News Channel.** For example, the following from Tucker Carlson’s show:

But there’s another side to this. Dr. Birx said tonight during the briefing at the White House that all deaths from anyone who died with coronavirus is counted as if the person died from coronavirus.
Now, we all know that isn’t true. I remember my own doctor telling me at one point when I was discussing prostate issues, he said about prostate cancer — I didn’t have it, as it happened, but he said, “You know, a lot more people die with it than die from it.”
That’s a real possibility, that people who have this disease, particularly they’re — lots of people are asymptomatic, who may have other terrible diseases. And if everybody is being automatically classified, if they’re found to have covid-19, as a covid-19 death, we’re going to get a very large number of deaths that way and we’re probably not going to have an accurate count of what the — of what the real death total is.

The problem is, this doesn’t make any sense. For one thing, a person who dies with no Covid-linked symptoms isn’t going to be tested. For example, a person who is infected with Covid-19 but asymptomatic, and who dies of a brain aneurysm, is not going to be tested, nor is that death going to be categorized as a Covid-19 death.

Indeed, it is almost certainly the case that deaths coded as being from Covid-19 are going to be those of people who had respiratory symptoms. One would expect the process to be much like how the CDC assesses influenza-related deaths:

Seasonal influenza-related deaths are deaths that occur in people for whom influenza infection was likely a contributor to the cause of death, but not necessarily the primary cause of death.

Since skeptics seem to want to liken all of this to the seasonal flu, perhaps they should think through how those deaths are accounted for.

In all honesty, especially given the lack of testing capacity, it seems far more likely that we are underestimating deaths rather than overestimating them.

Fundamentally, it is this lack of information that has us in our current conundrum. That is: we have curtailed social interaction to the degree we have because we simply don’t know how prevalent the virus is in a given community. Further, we do not know how many people are asymptomatic, nor how many have had the disease and recovered. We don’t even know for sure whether the recovered are now immune.

Further, we simply don’t know whether we have been more successful than predicted with social distancing (which would explain the models being off) or whether we didn’t need as much social distancing as we have deployed (which would also explain the models being off). There may, in fact, be other explanations for the apparent problems with the models.

And the reality is: modeling this kind of thing is hard.

It seems quite reasonable to say that we don’t know for sure whether we have done the right thing in terms of the stay-at-home orders. It may well be that we have overreacted. It is also true that we do not have the testing necessary to determine exact numbers at this time.

(It’s as if preparing for testing on a mass scale several months ago would have been a really good idea.)

I understand that there is a huge economic cost to our current approach, but my frustration with the critics stems from the fact that they are far too confident that the data they have are enough to draw conclusions about the models. They are assuming that the current infection and death toll figures are accurate (or even that the death tolls are inflated). Since we lack testing capacity (and since there is a lag in results for the tests we have), it is problematic, to put it mildly, to say that we really know what the situation is. It is not even clear that we are getting reliable numbers in general (see this story about nursing home deaths).

Ultimately it is easy to sit at home from a safe distance and be skeptical about the situation. And, it can be easy to see the economic impact and be rightly concerned. But I am skeptical of persons who are neither epidemiologists nor directly involved in data collection and analysis proclaiming that they know for sure what the best course of action is, especially when they are suggesting the riskier scenario in terms of public health.

One thing I know for sure: it will take years to fully understand what the effects of this disease were as well as to fully evaluate the human costs both in terms of health and the economy. Another thing I am certain of is that the policy choice at the moment is far more complex than “open up the country” or not.

*If my snark is too subtle here: I thought that Navarro was being an arrogant ass by suggesting that his social science skills meant that he should be taken seriously as an expert source on whether hydroxychloroquine is an efficacious treatment for Covid-19. I would further note that his training should have made him skeptical of a study with such a small N (although medical studies often have Ns that make social scientists very uncomfortable) and a nonrandom sample, and especially of the fact that several of the participants who took the drug were excluded from the final study because they had to go to the ICU, one of whom died.

**It is worth noting that Hume largely endorsed Texas Lt. Gov. Dan Patrick’s call to reopen the economy knowing it would put the elderly at risk, so his thinking on this topic is suspect, IMHO.

Comments

Slugger says:

Saturday, 11 April 2020 at 13:40

Thanks, good analysis! Kudos for a temperate discussion.
I do think that some things were done well, some so-so, and some badly. It would be great if we could undertake a careful analysis in order to learn in preparation for the next bad thing to come along, which it surely will, but our partisan divide will make this impossible.

Michael Reynolds says:

Saturday, 11 April 2020 at 13:57

The only data I consider are deaths and deaths per million. The rest is test-dependent and given that we have – still – no national testing program, it’s useless. Of course the Trumpies now have to suggest the death data is b.s. because: money and #Cult45. But deaths and deaths per million remain the best data we have, the hardest data we have.

In terms of total deaths, we are #1. USA! USA!

In terms of deaths per million, if you set aside mini-states like Andorra, set aside China and Iran as likely to be propaganda, and count only ‘real’ countries, it looks like this:

Spain: 350
Italy: 322
Belgium: 289
France: 212
Netherlands: 154
UK: 145
Switzerland: 120
Sweden: 88
USA: 61
Ireland: 58
Portugal: 46
Denmark: 45
Austria: 37
Germany: 33
.
.
Canada: 16
.
.
South Korea: 4

Note that it’s a lot easier to get high per capita numbers if your population is small. None of those countries is anything like our size; the closest in population is Germany, with a death rate per million about half of ours.

Then there’s South Korea, which learned of Covid on the same day we did, and yet we have a per capita death rate 15 times theirs. So far. That number keeps getting worse for us.

We should have and could have done much better. The countries north of us on the deaths per million ranking are all far, far more geographically compact, have nothing like our great emptinesses, and are all more dependent on mass transit than we are.

The Trump apologists will want to pretend we couldn’t possibly have done as well as South Korea because: reasons, reasons, Asians, reasons. But what’s the explanation for the fact that we have a death rate almost four times that of Canada? Is it hockey? Donuts?
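For anyone who wants to check the ratios in this comment, here is a minimal Python sketch using only the per-million figures listed above. The dictionary and helper names are illustrative, and the 20,000 deaths and 330 million people used in the consistency check are the round figures James Joyner gives further down the thread.

```python
# Deaths per million residents, as listed in the comment above (April 2020 snapshot).
DEATHS_PER_MILLION = {
    "USA": 61,
    "Germany": 33,
    "Canada": 16,
    "South Korea": 4,
}


def deaths_per_million(deaths: float, population: float) -> float:
    """Convert a raw death count into deaths per million residents."""
    return deaths / population * 1_000_000


def rate_ratio(a: str, b: str) -> float:
    """How many times higher country a's per-capita death rate is than country b's."""
    return DEATHS_PER_MILLION[a] / DEATHS_PER_MILLION[b]


if __name__ == "__main__":
    for other in ("South Korea", "Canada", "Germany"):
        print(f"USA vs {other}: {rate_ratio('USA', other):.2f}x")
    # Rough consistency check: ~20,000 US deaths in a country of ~330 million.
    print(f"US deaths per million: {deaths_per_million(20_000, 330_000_000):.1f}")
```

Those come out at roughly 15, 3.8 and 1.8, in line with the “15 times,” “almost four times” and “half” in the comment, and 20,000 deaths over 330 million people is about 61 per million. Because the inputs are rounded, ratios with small denominators are sensitive: a South Korean figure anywhere from 3.5 to 4.5 would put the US ratio between roughly 13.5 and 17.5.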

Kurtz says:

Saturday, 11 April 2020 at 14:24

The part of this that I find frustrating is the fact that people are attempting to draw very firm conclusions (and influence public opinion) based on incomplete data.

Oh come on now, nobody does this, especially not Alex Berenson.

Stormy Dragon says:

Saturday, 11 April 2020 at 14:38

@Michael Reynolds:

I consider are deaths and deaths per million.

Based on what I’ve read from epidemiological experts recently:

While infections per capita is important in terms of projecting where the spread will ultimately peter out, it’s irrelevant in terms of determining how quickly it spreads. If cases are going up 6% per day in country A and 3% per day in country B, country A is worse off even if a smaller percentage of its total population is infected.

Insisting that everything be expressed per capita just lets bigger countries (like the US) make themselves look better without actually reducing the transmission rate.
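Stormy Dragon’s point can be made concrete with a small simulation. In this sketch the starting shares (0.05% and 0.10% of the population infected) are hypothetical, and the 6% and 3% daily growth rates mirror the example in the comment; the function names are illustrative.

```python
import math
from typing import Optional


def days_until_overtake(share_a: float, growth_a: float,
                        share_b: float, growth_b: float,
                        max_days: int = 365) -> Optional[int]:
    """First day on which country A's infected share reaches country B's.

    Both shares grow at a constant daily rate (e.g. 0.06 for 6% per day).
    Returns None if A has not caught up within max_days.
    """
    day = 0
    while share_a < share_b:
        if day >= max_days:
            return None
        share_a *= 1 + growth_a
        share_b *= 1 + growth_b
        day += 1
    return day


def overtake_time_closed_form(share_a: float, growth_a: float,
                              share_b: float, growth_b: float) -> float:
    """Same crossing point solved directly: t = ln(b/a) / ln((1+ga)/(1+gb))."""
    return math.log(share_b / share_a) / math.log((1 + growth_a) / (1 + growth_b))


if __name__ == "__main__":
    # Hypothetical: A starts with half of B's infected share but grows twice as fast.
    print(days_until_overtake(0.0005, 0.06, 0.0010, 0.03))
    print(round(overtake_time_closed_form(0.0005, 0.06, 0.0010, 0.03), 1))
```

With these inputs A overtakes B on day 25 (the closed form gives about 24.1 days), even though it started with half of B’s share. The growth rate, not the current per capita level, decides which country is headed for trouble.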

Steven L. Taylor says:

Saturday, 11 April 2020 at 14:40

@Kurtz: Especially not, indeed.

Pete S says:

Saturday, 11 April 2020 at 14:41

@Michael Reynolds:

Speaking from Ontario, our right wing populist premier has been calm and competent throughout this crisis to the surprise of many of us. Our federal government has been professional and helpful. We seemed to move to shelter in place ahead of the US even though our infections started after yours. We now test way less than you do which may mean this trend does not continue.

I think the biggest difference is the US focused on stopping travel from China and seems to have done a piss poor job of that. We fairly early on required travelers from everywhere in the world including the US to self isolate for 2 weeks when entering the country.

2
Modulo Myself says:

Saturday, 11 April 2020 at 14:43

In the face of a broadening consensus on both the left and the libertarian right that sees marijuana as mostly healthy and even a positive in some circumstances, Berenson argued that the evidence instead shows a link between the drug and serious mental illness and an epidemic of violence.

Now he’s turned to challenging the narratives on the response to the coronavirus. What Berenson is promoting isn’t coronavirus denialism, or conspiracy theories about plots to curb liberties. Instead what Berenson is claiming is simple: the models guiding the response were wrong and that it is becoming clearer by the day.

“In February I was worried about the virus. By mid-March I was more scared about the economy. But now I’m starting to get genuinely nervous,” he tweeted this week. “This isn’t complicated. The models don’t work. The hospitals are empty. WHY ARE WE STILL TALKING ABOUT INDEFINITE LOCKDOWNS?”

It’s just a clinical history of being dumb for dumb people’s money here. Every public health issue from cigarettes to health care to climate change to guns has had its baseline assumptions and science challenged by paid hacks, so it’s not surprising that there’s an instant industry ready to go.

Modulo Myself says:

Saturday, 11 April 2020 at 14:52

@Pete S:

Well, in America our psychopathic narcissist made Corona about him, and that’s considered totally normal. Most people have enough object-permanence in them to grasp that Corona exists in the real world, but lots of Americans are convinced reality is a script you workshop your way through to spin it in a different way. Does it work? Well, it keeps the spectacle going.

2
Mu Yixiao says:

Saturday, 11 April 2020 at 14:57

I was going to comment on the changes in the IHME model this morning–not to say “Aha! They were wrong!” but to suggest that what we’ve been doing seems to be working.

Just from my gut: I’m going to fall back on the 80/20 cliche. I’ll say that 80% of the drastic measures are justified, and 20% are some degree of over-reaction. That degree runs the gamut from “just a titch” to “yeah, that might be a bit paranoid”.

I am quite confident that “stay at home” and “social distancing” have had an effect. That seems pretty obvious from the various graphs showing a decline in deaths at a predictable interval after those practices were put in place.

With regards to “cause of death”… that’s a tricky one. I haven’t looked into any of the details, but I’d be curious to know how deaths are being recorded. I did a temp job with a company that writes software for maintaining medical records (which only increased my aversion to “Medicare for All” based on how Medicare pays). One of the things that I learned about was “co-morbidity”–and the limits on how it’s recorded.

Co-morbidity is basically “This is the big problem, these are the smaller problems”. From what I could gather, the “smaller problems” are often left off of official reports (they’re on the patient records, but not listed on billing forms and other documents required by the government). So (as you said) if a person with COVID dies of an aneurysm, what’s listed as the cause of death? And do we have any clue whether COVID contributed to the aneurysm? Did the person not go to the hospital because people are told to stay at home unless it’s an emergency? Did they not go because they were afraid of catching COVID in the hospital–not knowing they already had it?

There are just too many variables.

My “perfect scenario” for dealing with this situation fails–because it requires that people be reasonable, informed, and caring about their community–when too many people are willfully-ignorant, selfish idiots and too many politicians are using this crisis to push partisan agendas and power-grabs.

Models change as more information becomes available. I’m encouraged that the IHME model has been corrected downwards. To me that says that–at least on one front–we’re doing the right thing. What the cost will be on other fronts is going to be… “interesting”.

3
charon says:

Saturday, 11 April 2020 at 14:59

Models are useful for various purposes, mainly gaming out policy possibilities.

Fair to say most people at high levels of the federal government (i.e., Trump and his enablers and toadies) have little grasp of what models are, how they work, or what they are good for.

Some useful discussion here:

https://www.balloon-juice.com/2020/03/25/martin-guest-post-questions-on-data-modeling-in-the-epidemic-part-1/

https://www.balloon-juice.com/2020/03/26/martin-guest-post-questions-on-data-modeling-in-the-epidemic-part-2/

Lots of charts and useful info, especially in part 2.

There are later posts at BJ on this topic also, by Martin or Cheryl Rofer.

charon says:

Saturday, 11 April 2020 at 15:02

me:

There are later posts at BJ on this topic also, by Martin or Cheryl Rofer.

For example:

https://www.balloon-juice.com/2020/04/07/the-ihme-epidemiological-model/

https://www.balloon-juice.com/2020/04/10/the-two-numbers-i-am-watching-before-things-open-up/

charon says:

Saturday, 11 April 2020 at 15:08

Models are useful for various purposes, mainly gaming out policy possibilities.

There is some thinking out of Germany, for example (which may or may not be good), that grocery shopping and contact with contaminated surfaces are not significant vectors for the disease:

https://today.rtl.lu/news/science-and-environment/a/1498185.html

Maybe, maybe not.

1
charon says:

Saturday, 11 April 2020 at 15:12

@Mu Yixiao:

I’m encouraged that the IHME model has been corrected downwards.

Whatever this model is useful for, it is also being used as a vehicle for partisan politics. Hoocudaknowed.

1
Mu Yixiao says:

Saturday, 11 April 2020 at 15:23

@charon:

Whatever this model is useful for, it is also being used as a vehicle for partisan politics.

I know. And I hate it.

2
Michael Reynolds says:

Saturday, 11 April 2020 at 16:30

@Stormy Dragon:
I’ve found doubling rates for cases, but again, those are sketchy as test regimes vary widely. Even hospital admission comparisons are problematic, because, again, different strokes.

I’d be interested in the doubling rate for deaths. Corpses are easy to count.
Stormy Dragon says:

Saturday, 11 April 2020 at 16:46

@Michael Reynolds:

Just counting corpses has issues too, since they generally aren’t tested post-mortem, so if they die before getting diagnosed with COVID19, they get left out of the death count.

And even if you stick to deaths, the rate of absolute increase is still a better measure than deaths per capita.

1
de stijl says:

Saturday, 11 April 2020 at 17:09

Brian N. did data collection. I did analysis.

That was how we split the loaf.

He had better programming skills, I had better SQL.

Of course, every afternoon I would give him and the dba specs of a new db structure to create and what to extract and load.

Me and my Adidas made a pretty good team.

Velshi as dbm created the space, and instituted the db.

I learned later that Velshi was a victim of one of those H1 visa scams that render you an indentured servant until you pay them off.

He and his whole family now live in Cherry Hill debt free.

My Adidas.
de stijl says:

Saturday, 11 April 2020 at 17:23

@Modulo Myself:

The Hack / My Corona

Mmmm my corona.

Doo doo doo doot/
My corona

Imagine Winona Ryder spazz dancing in a 7-Eleven with Janeane Garofalo.
Sleeping Dog says:

Saturday, 11 April 2020 at 17:58

@Michael Reynolds:
@Stormy Dragon:

The benefit of looking at the death rate is that we have a historic baseline. If you assume that everything above that baseline was caused by C-19, you are going to be about as accurate as you can be under the circumstances (poor testing, etc.).
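The excess-death approach in this comment is easy to sketch. Below, the expected deaths for each week are the average of the same week in prior years, and anything above that is counted as excess. All of the numbers are made up for illustration; real work would use official weekly mortality statistics and would have to account for trends and reporting delays.

```python
from statistics import mean
from typing import List, Sequence


def excess_deaths(observed: Sequence[float],
                  prior_years: Sequence[Sequence[float]]) -> List[float]:
    """Weekly observed deaths minus the average of the same weeks in prior years.

    observed:    weekly death counts for the current year.
    prior_years: one list per earlier year, covering the same weeks.
    """
    # zip(*prior_years) regroups the data week by week across years.
    baseline = [mean(week) for week in zip(*prior_years)]
    return [obs - base for obs, base in zip(observed, baseline)]


if __name__ == "__main__":
    # Hypothetical weekly counts for one region (illustrative only).
    prior = [[980, 1010, 990, 1000],
             [1020, 990, 1010, 1000]]
    this_year = [1000, 1100, 1400, 1900]
    weekly = excess_deaths(this_year, prior)
    print(weekly, sum(weekly))
```

Attributing all of the excess to Covid-19 is the simplifying assumption the comment describes. It sweeps in indirect deaths (people avoiding hospitals, for instance) and misses any offsetting drop in other causes, so it is best read as a rough estimate rather than a precise count.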

4
Monala says:

Saturday, 11 April 2020 at 18:45

@Mu Yixiao: my husband died two years ago. His death certificate lists both his primary cause of death (heart failure) and his secondary causes (diabetes, kidney disease), and named them as such.
95 South says:

Saturday, 11 April 2020 at 20:17

I hope people will accept these things from Steven. They didn’t when I said them. (I admit to some sour grapes about it.)
Kurtz says:

Saturday, 11 April 2020 at 21:43

@95 South:

I missed when you said these things. Which thread?
Hal_10000 says:

Saturday, 11 April 2020 at 22:24

@Michael Reynolds:

The rest is test-dependent and given that we have – still – no national testing program

Only Iceland has anything approaching this. The number of deaths in Spain and Italy suggests they have many times as many cases as they have positive tests. Hospitalizations and deaths may be the best measure, although those are still weak because many people are dying at home or in nursing homes and not being counted.

2
Kurtz says:

Saturday, 11 April 2020 at 23:07

@95 South:

Well, I found it, or at least part of it.

You may think you said the same thing that Steven is, but you didn’t.

You have no ground to criticize others in the realm of data analysis given the last time you deployed a bad data argument around here.

If you want to get a serious response, make a decent argument without the 4chan attitude.

6
James Joyner says:

Sunday, 12 April 2020 at 08:25

@Michael Reynolds:

The countries north of us on the deaths per million ranking are all far, far more geographically compact, have nothing like our great emptinesses, and are all more dependent on mass transit than we are.

Yes, but there’s an important caveat: almost all of our 20,000+ deaths at this point are concentrated in three states: New York with 8627, New Jersey with 2183, and Michigan with 1392. (Those are Worldometer data, which lag the state-reported numbers considerably.) New York City, which is densely populated and relies on mass transit, accounts for a shocking number of our deaths (5820 as of Friday, though again by the city’s own count rather than Worldometer’s). Yes, some 8.3 million people live in NYC. But they account for more than a quarter of the death toll for the entire country of 330 million.
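The comparison in this comment is two shares set side by side. Here is a short sketch of that arithmetic using the figures given (5,820 NYC deaths, 8.3 million residents, a national total of 20,000+ deaths, and 330 million people). It mixes the city’s own count with a lagging national total, as the comment acknowledges, and using 20,000 for a “20,000+” total makes the NYC death share an upper bound.

```python
def share(part: float, whole: float) -> float:
    """Fraction of the whole accounted for by part."""
    return part / whole


# Figures quoted in the comment; 20,000 is the floor of "20,000+".
NYC_DEATHS, US_DEATHS = 5_820, 20_000
NYC_POP, US_POP = 8.3e6, 330e6

death_share = share(NYC_DEATHS, US_DEATHS)
pop_share = share(NYC_POP, US_POP)

if __name__ == "__main__":
    print(f"NYC share of US deaths:     {death_share:.1%}")
    print(f"NYC share of US population: {pop_share:.1%}")
    print(f"Deaths relative to population share: {death_share / pop_share:.1f}x")
```

That works out to about 29% of the deaths from about 2.5% of the population, or roughly 11 to 12 times NYC’s population share.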
SKI says:

Sunday, 12 April 2020 at 08:43

@James Joyner: but how much of that is because they are different in timing vs different in characteristics? We don’t know that yet.
DrDaveT says:

Sunday, 12 April 2020 at 10:43

@James Joyner:

Yes but there’s an important caveat: almost all of our 20,000+ deaths at this point are concentrated in three states: New York with 8627, New Jersey with 2183, and Michigan with 1392.

Conversely, all of the “flattening of the curve” is concentrated in those states, too. If you plot cumulative deaths for the rest of the country over time, they are still growing essentially exponentially, doubling in less than 4 days. That’s not as fast as the initial growth in NYC, but it’s also not showing any significant signs of slowing yet.

1
Jay L Gischer says:

Sunday, 12 April 2020 at 12:03

You know, coroners have been doing their job for a long time, and they are probably pretty good at it. They are likely very conversant with issues like co-morbidity.

The notion that deaths are being overcounted, as fronted by Tucker Carlson, requires that all of them be working together to undermine President Trump. It’s the classic hallmark of a conspiracy theory – it requires the coordination of a huge number of people, many of whom aren’t very interested in coordinating or cooperating – because they are people.

2
Barry says:

Sunday, 12 April 2020 at 12:17

@James Joyner: “Yes but there’s an important caveat: almost all of our 20,000+ deaths at this point are concentrated in three states: New York with 8627, New Jersey with 2183, and Michigan with 1392. ”

James, note that rates for other states are starting to increase sharply.
Barry says:

Sunday, 12 April 2020 at 12:19

@Jay L Gischer: “The notion that deaths are being overcounted, as fronted by Tucker Carlson, requires that all of them be working together to undermine President Trump. ”

It also requires a massive surge in deaths in several areas, for no good reason.
charon says:

Sunday, 12 April 2020 at 13:18

@DrDaveT:

@Barry:

You people must be looking at different data than I am. I’m seeing the daily exponent down to 1.07 to 1.09 for most states (and trending down), which corresponds to 8 to 10 days to double.

Only a few states show increasing daily exponents.
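The conversion used here, from a daily growth factor to a doubling time, and its inverse are one-liners. This sketch (function names are illustrative) also puts DrDaveT’s “doubling in less than 4 days” on the same scale as the daily factors quoted above.

```python
import math


def doubling_time(daily_factor: float) -> float:
    """Days to double if a quantity is multiplied by daily_factor each day."""
    if daily_factor <= 1:
        raise ValueError("a quantity only doubles if the daily factor exceeds 1")
    return math.log(2) / math.log(daily_factor)


def daily_factor(doubling_days: float) -> float:
    """Inverse: the constant daily multiplier that doubles in doubling_days."""
    return 2 ** (1 / doubling_days)


if __name__ == "__main__":
    for factor in (1.07, 1.08, 1.09):
        print(f"daily factor {factor}: doubles in {doubling_time(factor):.1f} days")
    print(f"doubling every 4 days: daily factor {daily_factor(4):.3f}")
```

Factors of 1.07 to 1.09 give doubling times of about 10.2 down to 8.0 days, matching the comment, while doubling every 4 days needs a daily factor of about 1.19. Part of that gap may simply come from measuring different things (individual states versus several states lumped together).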
DrDaveT says:

Sunday, 12 April 2020 at 16:24

@charon:

Only a few states show increasing daily exponents.

I didn’t say increasing exponent; I said “essentially exponentially”. The exponent has been roughly constant or declining slightly, depending on the state.

I’m using the JHU CSSE time series data for cumulative death totals. I note that something weird happened to their data about a week ago, where (e.g.) NY had a negative number of deaths one day. If you have an alternative source for state-by-state time series data, please post the link — I’ve been looking for one.
DrDaveT says:

Sunday, 12 April 2020 at 16:55

@charon:

You people must be looking at different data than I am seeing, daily exponent down into 1.07 to 1.09 for most states (and trending down), corresponds to 8 to 10 days to double.

I just re-fit to the latest data from JHU. If you combine all US except NY, NJ, and MI and look at daily deaths, there is very little bending down. Doubling has been roughly every 3.7 days since the day when there were 10 total deaths. Over the past 15 days, things have been slightly better, doubling every 4.6 days — but the growth is still much faster than linear.

If I look just at the daily % increase, yesterday’s 10% was the lowest value in a month. We’ll know in a few days whether that was a new trend or an anomaly.
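A re-fit like the one described here can be approximated by ordinary least squares on the logarithm of cumulative deaths: the slope is the daily exponential growth rate, and ln 2 divided by the slope is the doubling time. This is a minimal sketch under that assumption, not necessarily the method DrDaveT used; the sample series is synthetic, and real JHU CSSE data would first need cleaning for glitches like the negative daily count mentioned above.

```python
import math
from typing import List, Sequence


def since_threshold(cumulative: Sequence[float], threshold: float = 10) -> List[float]:
    """Drop the early days before cumulative deaths first reach the threshold."""
    for i, value in enumerate(cumulative):
        if value >= threshold:
            return list(cumulative[i:])
    return []


def fitted_doubling_time(cumulative: Sequence[float]) -> float:
    """Fit log(cumulative) = a + b * day by least squares and return ln(2) / b."""
    points = [(day, math.log(c)) for day, c in enumerate(cumulative) if c > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two positive observations")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # daily growth rate on the log scale
    if slope <= 0:
        raise ValueError("series is not growing, so it has no doubling time")
    return math.log(2) / slope


if __name__ == "__main__":
    # Synthetic series that doubles exactly every 4 days, starting below 10.
    series = [5 * 2 ** (day / 4) for day in range(30)]
    print(round(fitted_doubling_time(since_threshold(series)), 2))
```

Fitting over a shorter recent window (say, the last 15 days) is just a matter of slicing the list first, which is how one would see a difference like the 3.7 versus 4.6 days reported in the comment.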

1
charon says:

Sunday, 12 April 2020 at 19:56

If you have an alternative source for state-by-state time series data, please post the link — I’ve been looking for one.

It takes a while for new cases to die, so new death curves lag new cases curves by roughly two weeks.

Here is a site that lets you pick individual states or countries to highlight; you can also choose to look at active cases, confirmed cases, new cases, deaths, or new deaths.

You can hover your cursor at individual data points to get daily exponent, exponent averaged for the past week, or several other factoids.

http://91-divoc.com/pages/covid-visualization/

This is another site I like; it has links to similarly detailed data for the U.S.

https://www.nytimes.com/interactive/2020/world/coronavirus-maps.html?action=click&module=Top%20Stories&pgtype=Homepage&action=click&module=Spotlight&pgtype=Homepage
DrDaveT says:

Sunday, 12 April 2020 at 20:13

@charon:

It takes a while for new cases to die, so new death curves lag new cases curves by roughly two weeks.

Right. On the other hand, “deaths” is probably low by a factor of less than two, while “confirmed cases” underestimates infections by an unknown factor that is somewhere between 5 and 50, and depends strongly on the local testing criteria. I don’t know of any good way around that problem.
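The undercount factors in this comment translate directly into a range for the fatality rate among all infections. A minimal sketch, taking those ranges as inputs (deaths undercounted by a factor of 1 to 2, infections running 5 to 50 times confirmed cases); the 1,000 deaths and 20,000 confirmed cases are hypothetical.

```python
from typing import Tuple


def fatality_ratio_range(deaths: float, confirmed: float,
                         death_undercount: Tuple[float, float] = (1.0, 2.0),
                         case_undercount: Tuple[float, float] = (5.0, 50.0)
                         ) -> Tuple[float, float]:
    """Lowest and highest implied deaths-per-infection ratio.

    The low end pairs the smallest death correction with the largest
    infection correction; the high end does the reverse.
    """
    low = deaths * death_undercount[0] / (confirmed * case_undercount[1])
    high = deaths * death_undercount[1] / (confirmed * case_undercount[0])
    return low, high


if __name__ == "__main__":
    # Hypothetical counts giving a naive 5% deaths-to-confirmed-cases ratio.
    low, high = fatality_ratio_range(1_000, 20_000)
    print(f"{low:.1%} to {high:.1%}")
```

The same naive 5% ratio becomes anything from 0.1% to 2%, a twentyfold spread, and ten of those twenty come from the uncertainty about infections alone. Per the point about lag above, the deaths would ideally be compared with cases confirmed about two weeks earlier.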
DrDaveT says:

Tuesday, 14 April 2020 at 23:17

@DrDaveT:

I just re-fit to the latest data from JHU. If you combine all US except NY, NJ, and MI and look at daily deaths, there is very little bending down. Doubling has been roughly every 3.7 days since the day when there were 10 total deaths. Over the past 15 days, things have been slightly better, doubling every 4.6 days — but the growth is still much faster than linear.

The latest fit for “not NY NJ or MI” over the past week looks better — deaths per day roughly constant, about the same rate as NY. Cross your fingers…