“The Data Science of COVID-19 Spread: Some Troubling Current and Future Trends”

Rex_Douglass_UCSD · August 18, 2020, 12:00am

We have a draft research note that I hope folks here will find relevant: “The Data Science of COVID-19 Spread: Some Troubling Current and Future Trends”

We enumerate some pathologies in measurement, inference, and interpretation in recent COVID-19 observational work.

Any feedback would be much appreciated. Link and Twitter discussion here: (https://twitter.com/RexDouglass/status/1295391498477826053)

Direct link: (The Data Science of COVID-19 Spread: Some Troubling Current and Future Trends)

todd_hendricks · August 19, 2020, 6:24pm

This is excellent, Rex. Thanks for sharing. The noise of the current moment should heighten our scientific skepticism. Disciplined, fundamentally sound science is critical.

todd_hendricks · August 19, 2020, 6:27pm

I can appreciate abstaining from the question of whether data science has been a net positive for the pandemic response…but if you’re inclined I’d be interested in your thoughts.

Rex_Douglass_UCSD · August 19, 2020, 6:50pm

Thank you for the kind words and for the question. My TLDR is that domain expertise is neither necessary nor sufficient for doing good work.

In practical terms, every single field has at least one person or group making positive contributions to this knowledge base. What people want to know is if there are any shortcuts, like can we gate-keep to just epidemiologists and be able to screen most of the bad and keep most of the good work, and the answer is unfortunately no.

Every single field in science has bad work, and more of it than good work, because bad work is easier to do. Depending on the field, something like 1/4 to 3/4 of research isn’t even replicable (not good, just replicable). So even as a social scientist, I’ve been churning through COVID papers and being just horrified by what slips through.

My advice is try to find people who are genuinely curious and methodologically careful. There are groups that really really care about measurement and try to do it well, there are others who really really care about causal inference and try to do it well. We need to applaud and try to elevate them, because we’re never going to successfully screen everything that ought to be screened.

I wrote up a more substantive guide when this all started here: “How to be Curious Instead of Contrarian About COVID-19: Eight Data Science Lessons From ‘Coronavirus Perspective’ (Epstein 2020)”
https://rexdouglass.github.io/TIGR/Douglass_2020_How_To_Be_Curious_Instead_of_Contrarian_About_Covid19.nb.html

Thomas_Young_Econometric_Studios_Utah_Legislature · August 19, 2020, 10:39pm

I’m guessing you’re not a fan of Nichols (https://www.amazon.com/Death-Expertise-Campaign-Established-Knowledge/dp/0190469412)? Thanks for the post.

Thomas_Young_Econometric_Studios_Utah_Legislature · August 19, 2020, 10:42pm

@Rex_Douglass_UCSD I didn’t actually see any suggested optimal methods?

Rex_Douglass_UCSD · August 19, 2020, 10:57pm

I haven’t read that book specifically but I’m pretty versed in that literature. It always strikes me as both atheoretical and ahistorical. Expertise is itself a technology and an institution, many forms of which appeared formally in the 20th century. Like similar institutions that came before, e.g. the Church, it would be weird if they held on to power indefinitely. Science the institution is just a guild, with people that show up to work each day and produce higher or lower quality work that over decades might move the state of the art monotonically closer to the truth. It would be weirder if they were worshiped or had disproportionate power over society.

What I’m much more interested in is curiosity as an ethical value, an epistemic responsibility to hold true beliefs. Having those ethics will mechanically lead people to seek out and listen to whoever is doing the best work right now. And it’ll lead them to listen to someone else if they stop doing good work. If knowing the actual truth isn’t someone’s top priority, there’s nothing anyone anywhere can do to force them to spend the time and effort seeking it out. There’s no teacher in real life that can force people to do their homework.

Thomas_Young_Econometric_Studios_Utah_Legislature · August 20, 2020, 2:42am

@Rex_Douglass_UCSD Well argued. I can tell you know about the issues. What are you going to do with the paper? Are you thinking of adding some empirics to it?

Rex_Douglass_UCSD · August 20, 2020, 2:46am

Thank you and thanks for the interest. This note is painfully brief because it was invited for a special issue and we had a hard word cap. Given how perfectionist we tend to be in my lab, it was probably good to have someone force us to put out something while it was still relevant. In the background we have several empirical projects focused on each of the three issues: measurement, inference, and interpretation.

Thomas_Young_Econometric_Studios_Utah_Legislature · August 20, 2020, 2:48am

Ajitesh_Srivastava_University_of_Southern_California · August 23, 2020, 6:58pm

@Rex_Douglass_UCSD: Interesting read! One reason for the difficulty in forecasting the peak (or making accurate long-term forecasts in general) is the estimation of the “effective population”, which is reduced from the actual population due to unreported (asymptomatic, mild symptoms, untested) cases and immune/isolated cases (if any). While some researchers are modeling this factor, it can be shown that it is learnable only in certain rare scenarios. In other words, you can always fit a model to the past data, but the learned values do not reflect the truth. Here is my paper from KDD Health Day track on the learnability: https://arxiv.org/pdf/2006.02127.pdf
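To make the learnability point concrete, here is a minimal sketch. It is not the model from the linked paper; it assumes a plain discrete-time SIR model, and the function name `simulate_sir` and every parameter value are hypothetical choices for illustration. Two runs differ only in the effective population, yet their early daily case counts are nearly indistinguishable, while their eventual peaks differ by roughly the ratio of the populations.

```python
def simulate_sir(n_eff, beta=0.3, gamma=0.1, i0=10.0, days=200):
    """Daily new infections from a simple discrete-time SIR model.

    n_eff is the assumed effective population; beta, gamma and i0 are
    illustrative values, not estimates from any real data.
    """
    s, i = n_eff - i0, i0
    new_cases = []
    for _ in range(days):
        infections = beta * s * i / n_eff  # new infections today
        recoveries = gamma * i             # removals today
        s -= infections
        i += infections - recoveries
        new_cases.append(infections)
    return new_cases


small = simulate_sir(n_eff=1e6)   # one candidate effective population
large = simulate_sir(n_eff=1e7)   # ten times larger

# Largest relative gap between the two curves over the first 30 days.
early_gap = max(abs(a - b) / b for a, b in zip(small[:30], large[:30]))
# Ratio of the eventual peaks in daily new infections.
peak_ratio = max(large) / max(small)

print(f"max relative gap, first 30 days: {early_gap:.4f}")
print(f"ratio of peak daily infections:  {peak_ratio:.2f}")
```

Under these assumptions, a fit to the first month of data cannot tell the two populations apart, which is why a good in-sample fit says little about where the peak will land.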

Regarding “hold models accountable for their long run out-of-sample performance”: A challenge here may be that the trends change with changing policies. So it is not clear what the ideal length of forecast should be for the evaluation.

Rex_Douglass_UCSD · August 23, 2020, 7:08pm

Totally agree.

One quick thought on how we should do long-term evaluation of forecasts: if a forecast can’t make long-run predictions, then it should report confidence intervals out to the point where they explode, and explicitly show where it claims to have knowledge and where it doesn’t.
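A rough sketch of what that reporting could look like, under a deliberately simple assumption: log daily cases follow a random walk with drift, so the forecast standard deviation grows with the square root of the horizon. The function names, the 10x width threshold, and the case counts below are all hypothetical, and the intervals ignore parameter uncertainty, so they are if anything too narrow.

```python
import math
import statistics


def forecast_intervals(cases, max_horizon=180, z=1.96):
    """Forecast intervals assuming a random walk with drift on log cases.

    Returns (horizon, lower, center, upper) tuples. Only shock variance
    is included; uncertainty in the estimated drift and volatility is not.
    """
    logs = [math.log(c) for c in cases]
    diffs = [b - a for a, b in zip(logs, logs[1:])]
    mu = statistics.mean(diffs)       # estimated daily drift
    sigma = statistics.stdev(diffs)   # estimated daily volatility
    out = []
    for h in range(1, max_horizon + 1):
        center = logs[-1] + h * mu
        half_width = z * sigma * math.sqrt(h)
        out.append((h, math.exp(center - half_width),
                    math.exp(center), math.exp(center + half_width)))
    return out


def knowledge_horizon(intervals, max_ratio=10.0):
    """Last horizon at which upper/lower stays within max_ratio."""
    for h, lower, _center, upper in intervals:
        if upper / lower > max_ratio:
            return h - 1
    return intervals[-1][0]


# Made-up daily case counts, for illustration only.
cases = [100, 108, 103, 115, 121, 118, 130, 142, 137, 150]
intervals = forecast_intervals(cases)
print("claims knowledge out to day:", knowledge_horizon(intervals))
```

Past the reported horizon the interval spans more than an order of magnitude, so a forecast there is a statement that nearly anything could happen rather than a prediction.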

I’ve seen some odd things recently with models claiming to predict a second hump in the U.S. when all their model did was say all values were possible, including zero.

While it’s true that forecasting policy is its own nightmare, we can have measures that are functions of policy, like mobility. If someone claims a good fit using mobility now, it had better perform well in 6 months too.