There seem to be a lot fewer devices counted in there than usual (~2.7 million). Is this the result of a methodology change?

The decline in the total number of devices in (what is now) the backfill is what caused this entire thread.

The total devices went from 180k to 155k or so in one week.

This week has files in XML format, not csv.gz.

All the weeks from April 9, 2018 forward are not properly formatted. :disappointed:

Ryan: I regenerated the stores this morning, and so far there are no problems with XML. I just wish I could push one button and download all the data. Generating stores for 153 weeks is pretty tedious, and on top of that, all the files have the same name.

The devices issue relates to the revision. Why has the total number of devices seen gone down in the revised data?

@Bruce_Mizrach_Rutgers_University thanks for clarifying. We are investigating and will get back to you next week.

Cc: @Lauren_Spiegel_SafeGraph and @Chris_Tramel_SafeGraph, who will be the ones to provide an answer next week.

Here are total US devices from the new backfill (through the week of 2020-11-23). I'm still seeing a decline even in this data.
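For anyone who wants to reproduce a weekly device total like the one described above, here is a minimal sketch. It assumes the home panel summary rows carry `date_range_start`, `iso_country_code`, and `number_devices_residing` columns (an assumption about the file layout, so check it against your own download), and the sample rows are made up for illustration only.

```python
import csv
import io
from collections import defaultdict


def total_devices_by_week(csv_file):
    """Sum number_devices_residing over all US rows, keyed by week start date."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        # Assumed column names; adjust if your file differs.
        if row["iso_country_code"] != "US":
            continue
        week = row["date_range_start"][:10]  # keep only YYYY-MM-DD
        totals[week] += int(row["number_devices_residing"])
    return dict(totals)


# Made-up sample rows, not real SafeGraph data.
SAMPLE = """\
date_range_start,iso_country_code,census_block_group,number_devices_residing
2020-11-16T00:00:00-05:00,US,340230001001,120
2020-11-16T00:00:00-05:00,US,340230001002,80
2020-11-16T00:00:00-05:00,CA,35200001,50
2020-11-23T00:00:00-05:00,US,340230001001,110
2020-11-23T00:00:00-05:00,US,340230001002,70
"""

if __name__ == "__main__":
    for week, total in sorted(total_devices_by_week(io.StringIO(SAMPLE)).items()):
        print(week, total)
```

Comparing these totals week over week is one way to spot a drop like the one reported in this thread.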

Hi @Bruce_Mizrach_Rutgers_University, we fixed an issue with the home_panel_summary in this backfill that explains the lower counts versus before the backfill:

Home Panel Summary (“home_panel_summary.csv”) in Weekly Patterns and Monthly Patterns now only includes those devices whose homes are eligible to be counted in the visitor_home_cbgs column. See here for methodology on calculating homes. In the past, the counts in the Home Panel Summary included any device that had at least one visit during the given time period and for which we had identified a primary nighttime geohash with any degree of confidence. Meanwhile, visitor_home_cbgs only included devices for which we had identified a primary nighttime geohash with a high degree of confidence. With this change, we are aligning the requirements for the Home Panel Summary with those for the visitor_home_cbgs column. This means the counts in the Home Panel Summary will be lower than they had been in the past, but the Home Panel Summary and visitor_home_cbgs methodologies are now consistent, which had been the original intent, to help with normalization.
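To make the difference between the two counting rules concrete, here is a toy sketch. The `Device` class, the `"high"`/`"low"` confidence labels, and the sample devices are all hypothetical and are not SafeGraph's actual pipeline. The sketch also assumes the "at least one visit" requirement still applies under the new rule, which the description above does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    device_id: str
    visits: int                      # visits during the time period
    home_confidence: Optional[str]   # "high", "low", or None if no home found


def count_old_rule(devices: List[Device]) -> int:
    """Old rule: at least one visit and a home with any degree of confidence."""
    return sum(1 for d in devices
               if d.visits >= 1 and d.home_confidence is not None)


def count_new_rule(devices: List[Device]) -> int:
    """New rule: only high-confidence homes, matching visitor_home_cbgs.

    Assumption: the at-least-one-visit requirement is kept.
    """
    return sum(1 for d in devices
               if d.visits >= 1 and d.home_confidence == "high")


# Hypothetical panel for illustration only.
PANEL = [
    Device("a", 5, "high"),
    Device("b", 2, "low"),
    Device("c", 1, None),
    Device("d", 0, "high"),
    Device("e", 3, "high"),
]

if __name__ == "__main__":
    print("old rule:", count_old_rule(PANEL))
    print("new rule:", count_new_rule(PANEL))
```

Under these assumptions the new count can never exceed the old one, because every device that passes the new rule also passes the old one.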

We saw generally lower ping density due to social distancing starting in March 2020. There is a divergence there in your plot as well. With less moving around, we get fewer high-confidence homes. I realize this makes it harder to normalize the data. Would it be helpful to have both the high-confidence and low-confidence numbers going forward?

Thanks @Lauren_Spiegel_SafeGraph. I was using this variable as a way to normalize visits at the sub-state level, which gave me greater flexibility in how I aggregate geographies. But if you all are going to be implementing revisions to this variable at such a scale, then obviously that’s not a good thing to do.
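For readers following along, the kind of sub-state normalization described here might look roughly like the sketch below: visits and resident devices are summed over a custom grouping of census block groups, and then visits are divided by devices. The function name, the input dictionaries, and the numbers are hypothetical, and this is not necessarily how the indicators in this thread were built.

```python
from collections import defaultdict
from typing import Dict


def visits_per_device(visits_by_cbg: Dict[str, int],
                      devices_by_cbg: Dict[str, int],
                      cbg_to_region: Dict[str, str]) -> Dict[str, float]:
    """Aggregate visits and devices to custom regions, then normalize."""
    visits = defaultdict(int)
    devices = defaultdict(int)
    for cbg, region in cbg_to_region.items():
        visits[region] += visits_by_cbg.get(cbg, 0)
        devices[region] += devices_by_cbg.get(cbg, 0)
    # Skip regions with no panel devices to avoid dividing by zero.
    return {r: visits[r] / devices[r] for r in devices if devices[r] > 0}


# Made-up numbers for illustration only.
VISITS = {"cbg1": 200, "cbg2": 100, "cbg3": 90}
DEVICES = {"cbg1": 100, "cbg2": 50, "cbg3": 60}   # e.g. number_devices_residing
REGIONS = {"cbg1": "north", "cbg2": "north", "cbg3": "south"}

if __name__ == "__main__":
    print(visits_per_device(VISITS, DEVICES, REGIONS))
```

Because the device count is the denominator, any revision to how devices are counted shifts every normalized series built this way, which is the concern raised above.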

I’ll just have to rely on the variables in the normalization datasets, which are only at the state level. It will lead to a fairly sizable revision for all of my indicators relying on visits, but I guess that’s unavoidable.

The change was intended as an improvement, but I’m sorry for the inconvenience caused. We try to only make major changes twice per year along with a backfill so that users have a consistent view over time.