Hi everyone, we have a new working paper out on how air pollution exacerbates Covid-19:
One major finding using SafeGraph data is that rate-based measures of Covid-19 intensity (e.g. deaths per 100,000) can be significantly biased when researchers use ACS population in the denominator, because this fails to account for how people moved around during the pandemic. We find that adjusting death rates based on changes in device counts during the first wave would double death rates in some wealthy neighborhoods.
Using time downwind of a highway as an instrumental variable is very clever!
Reading through the paper I had three questions, one about instrumental variables and two about your use of SafeGraph data. Sorry for the wall of text; would love any thoughts you and your team can share!
re: the instrumental variable. From the SM: “[We assume] wind direction only effects COVID-19 outcomes through its effect on air quality”.
I understand such assumptions are necessary; I’m wondering how to test or confirm the strength of this assumption. For example, I could imagine avg time downwind of highway → avg air quality → avg real-estate / rental prices → socio-economics of population → risk factors for Covid. In the SM you wrote “This is a reasonable assumption because the pollutants of interest are generally not detectable via sight or smell at concentrations in NYC, and differences of the magnitudes our coefficients report would clearly not be detectable.” Air pollution is not something I know much about, but I find your argument very compelling. Still, I’m curious whether your instrumental variable is correlated with the socio-economic variables you considered, such as income, and whether such a correlation is a problem for the assumptions of your model. Thanks for your thoughts!
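To make the check I have in mind concrete, here is a minimal sketch with simulated, entirely hypothetical data (not the paper's variables or results): regress the candidate instrument (share of time a tract spends downwind of a highway) on a socio-economic covariate such as income and look at the slope.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# Simulated tract-level data; by construction the instrument here is
# independent of income, so the regression slope should be near zero.
income = rng.normal(60_000, 15_000, size=n)
downwind_share = rng.uniform(0, 1, size=n)

# Regress the instrument on standardized income (with a constant).
X = np.column_stack([np.ones(n), (income - income.mean()) / income.std()])
coef, *_ = np.linalg.lstsq(X, downwind_share, rcond=None)

# coef[1] is the association between income and the instrument;
# a large value would flag a possible balance problem.
```

A significant slope would not by itself invalidate the instrument, but it would be a prompt to think about channels like the real-estate one I describe above.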
Second and third, re: using SafeGraph data to estimate changes in the true population. This is a two-part question:
Do I understand correctly that SafeGraph data was used to help explore/control for the problems with rate-based outcome measures, but that ultimately your main model simply used counts of deaths and hospitalizations (log-transformed) and did not incorporate any actual “population correction” based on SafeGraph data or otherwise? In other words, SafeGraph data helped you decide not to use rate-based measures, but it was not used in the final form of the model. Is that correct?
Nitty-gritty SafeGraph data nerd question: how exactly did you use SG data to produce the “adjusted for device-count change” rates shown in S3? I know that using SG data to estimate population changes during the pandemic is a topic many have discussed, and one that has various challenges, so I would love any additional detail on your thought process. I assume SafeGraph Patterns home_panel_summary.csv gives you the number_devices_residing for Week 10 and Week 20 for every Census Block Group, but how did you turn these numbers into the “death rate adjusted for device-count change”?
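For concreteness, here is the computation I am assuming as the first step, with made-up numbers (the column names follow SafeGraph's home_panel_summary schema, but the values and CBG codes are hypothetical): join the two weekly files on census_block_group and take the relative change in number_devices_residing.

```python
import pandas as pd

# Hypothetical frames standing in for two weekly home_panel_summary.csv files.
week10 = pd.DataFrame({"census_block_group": ["360610001001", "360610001002"],
                       "number_devices_residing": [200, 150]})
week20 = pd.DataFrame({"census_block_group": ["360610001001", "360610001002"],
                       "number_devices_residing": [140, 150]})

# Join on CBG and compute the fractional change in devices residing.
panel = week10.merge(week20, on="census_block_group", suffixes=("_w10", "_w20"))
panel["device_pct_change"] = (panel["number_devices_residing_w20"]
                              / panel["number_devices_residing_w10"] - 1)
```

What I am unsure about is the step after this: how the per-CBG change feeds into the adjusted death rate.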
Hey Ryan, great questions! Let me see if I can answer:
We did some tests of exactly the nature you describe in our supplementary material. Mostly we find that our instruments are not significantly correlated with other socio-economic characteristics. However, even if they were, such a test would not be definitive. For example, imagine that poor air quality causes lower income (many papers have shown effects like this). Then your instrument would be correlated with income differences, but this would not necessarily be a bad thing: if the causal chain is AQ -> income -> covid outcomes, then we are counting the portion of the effect of income that is caused by air quality in our estimates. Another way of saying this is that we are capturing all the direct AND indirect effects of air quality on covid outcomes.

Since we have multiple instrumental variables, another test we can run is the Sargan test of overidentification. Basically, this test asks whether your coefficient estimates would change significantly if you dropped any one of your instruments; it is sometimes interpreted as an indirect test of the exogeneity assumption. These test statistics are also reported in the tables in the supplement.
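For readers unfamiliar with it, the Sargan logic can be sketched in a few lines of simulated code (this is an illustration, not our actual code or data): fit 2SLS with two instruments and one endogenous regressor, then compute n * R^2 from regressing the 2SLS residuals on the full instrument set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data: two instruments (z1, z2), one endogenous regressor (aq),
# and an unobserved confounder that would bias plain OLS.
z1, z2 = rng.normal(size=n), rng.normal(size=n)
confound = rng.normal(size=n)
aq = 0.8 * z1 + 0.5 * z2 + confound + rng.normal(size=n)
y = 1.0 + 2.0 * aq + confound + rng.normal(size=n)   # true coefficient is 2.0

Z = np.column_stack([np.ones(n), z1, z2])   # instrument set (incl. constant)
X = np.column_stack([np.ones(n), aq])       # regressors (incl. constant)

# Two-stage least squares.
Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # projection onto instrument space
Xhat = Pz @ X
beta = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)

# Sargan statistic: n * R^2 from regressing the 2SLS residuals on all
# instruments; asymptotically chi2 with (#instruments - #endogenous) = 1 df.
u = y - X @ beta
g, *_ = np.linalg.lstsq(Z, u, rcond=None)
e = u - Z @ g
r2 = 1.0 - (e @ e) / ((u - u.mean()) @ (u - u.mean()))
sargan = n * r2   # compare to the chi2(1) 5% critical value of 3.84
```

If the instruments are valid, the residuals should be nearly uncorrelated with them, so the statistic stays small; a large value flags that the instruments disagree about the coefficient.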
2 and 3. You’re correct that we ultimately did not use SafeGraph data in our final specifications, although there are some tests in the appendix that incorporate it into rate-based measures. Basically, we attempted a population correction factor in a very simple way: if the fraction of devices decreased by x, we multiplied the population by (1 - x) and used that adjusted population to create new death and hospitalization rates. In general this increased our point estimates of the effect of air quality, but they were not significant. We believe that is because our ad hoc correction factor is too noisy; in particular, we imagine the elderly people who are most susceptible to the effects of covid x air quality are probably underrepresented in the mobility data. No doubt there is more we could do on that front, and if you have any suggestions, we’d love to hear them! Our “count-based” models basically rely on the assumption that population dynamics during the epidemic did not differ significantly depending on whether a tract is upwind or downwind, which we thought was reasonable.
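In pseudo-data form, the ad hoc correction described above looks roughly like this (all numbers hypothetical; tract A loses 30% of its devices, tract B is unchanged):

```python
import pandas as pd

df = pd.DataFrame({
    "tract": ["A", "B"],
    "acs_population": [10_000, 8_000],
    "devices_week10": [500, 400],   # pre-pandemic baseline week
    "devices_week20": [350, 400],   # first-wave week
    "deaths": [12, 10],
})

# If devices fell by a fraction x, scale ACS population by (1 - x),
# then recompute the death rate per 100,000 with the adjusted denominator.
pct_change = (df["devices_week20"] - df["devices_week10"]) / df["devices_week10"]
df["adj_population"] = df["acs_population"] * (1 + pct_change)
df["raw_rate"] = df["deaths"] / df["acs_population"] * 100_000
df["adj_rate"] = df["deaths"] / df["adj_population"] * 100_000
```

With these toy numbers, tract A's adjusted rate is mechanically larger than its raw rate because its effective population shrank, which is the pattern we describe for some wealthy neighborhoods during the first wave.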
@Matthew_Gordon_Yale_University Interesting study. Have your results held up through June 2021? I’m guessing the results have weakened, but would like to see a time-series view of it. Thanks for the paper.