A few questions I wanted to confirm. Thank you for any assistance you can offer:
Does SafeGraph have a device-level database we could use to track movements from home CBG -> POI -> back to home CBG? (We could subset to 2020-21 and prison POIs only.)
If not, is there a way for us to extract the traffic moving out of the prison POIs to other CBGs, like an inverted version of the Places Patterns dataset? [I understand you pointed us to the Neighborhood Patterns dataset, but we don’t want to take mobility out of prison CBGs, as that would be an overestimate relative to the prison POIs themselves.]
The home-device location algorithm was changed in May 2020, when it began being updated on a daily basis. Would it be possible to get a version of the pre-May 2020 data with this new algorithm applied to those earlier dates as well?
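For reference, the "inverted" view asked about in the second question can be roughly approximated from Patterns itself. This is only a sketch, assuming each Patterns row carries a `visitor_home_cbgs` column holding a JSON string that maps home CBG to visitor count. The rows, placekeys, and the helper name `invert_visitor_home_cbgs` are made up for illustration. Note that this yields the home CBGs of visitors to a POI, not where devices travel after leaving it.

```python
import json
from collections import defaultdict

# Hypothetical rows shaped like a Patterns extract; placekeys and counts are invented.
rows = [
    {"placekey": "poi-a", "visitor_home_cbgs": '{"010010201001": 12, "010010202002": 5}'},
    {"placekey": "poi-b", "visitor_home_cbgs": '{"010010201001": 7}'},
    {"placekey": "poi-c", "visitor_home_cbgs": ""},  # no home CBGs reported
]

def invert_visitor_home_cbgs(rows):
    """Return {home_cbg: {placekey: visitor_count}} from per-POI JSON strings."""
    flows = defaultdict(dict)
    for row in rows:
        # Treat a missing or empty value as an empty mapping.
        raw = row.get("visitor_home_cbgs") or "{}"
        for cbg, count in json.loads(raw).items():
            flows[cbg][row["placekey"]] = count
    return dict(flows)

flows = invert_visitor_home_cbgs(rows)
for cbg, by_poi in sorted(flows.items()):
    print(cbg, by_poi)
```

Filtering `rows` to prison POIs before inverting would keep the result tied to the POIs rather than to the surrounding prison CBGs.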
This topic was automatically generated from Slack. You can find the original thread here.
TL;DR: in May 2020 we switched to calculating home locations every day on a rolling basis, rather than once per month. The algorithm itself was identical but the cadence was different. Yes, the algorithm would change device_home_areas insofar as the algorithm would be less confident about homes later in the month (so visits/stops from those homes could be underrepresented).
There was a separate change in the Dec 2020 vs July 2021 backfills that is related but ultimately separate. In the Dec 2020 backfill, we allowed the home loc algorithm to “look forward” to gain confidence of home locations, but we received feedback that this made the Dec 2020 backfilled data incompatible with pre-May 2020 data delivered at that time (i.e., data delivered then could not use future data), and reverted to the non-forward-looking Home Algo v1 for the July 2021 backfill. That is likely why you see a change between these two backfills. This is described in the “Why does SafeGraph not use Home Algo v2…” section above.
Thanks so much for this info. It’s now clear to me that the Home Loc Algo v2 has never been backfilled to pre-May 2020 data, so I don’t need to worry about that particular change.
I’m sorry to trouble you with a few more questions:
Is this change noted in the Dec 2020 release notes separate from the change you described (“allowed the home loc algorithm to “look forward” to gain confidence of home locations”) between the Dec 2020 and July 2021 backfills:
• “Home Panel Summary (“home_panel_summary.csv”) in Weekly Patterns and Monthly patterns now only includes those devices whose homes are eligible to be counted in the visitor_home_cbgs column.”
If this quoted change is separate from the change you mention, is this new method of determining Home Panel Summary still used in July 2021?
Next, on the page you linked, I read:
“For historical backfills of data before May 2020, a hybrid algorithm is used, rather than back-computing Home Algo v2.
The Hybrid Home Algo for Historical Backfills is applied to the following backfills:
• Backfills of Monthly Patterns from Jan 1 2018 through May 2020”
Which seems to suggest that the forward looking home loc algorithm you refer to (Hybrid Home Algo, as it’s called on the webpage) is the current (July 2021 release) source of home location data for Jan 1 2018 to May 2020.
However, I also read:
“In the Dec 2020 backfill, we worked around this limitation by allowing back-propagated Home Algo v1 to use data “from the future” (e.g., 30 days following the first of the month); however, this ended up being incomparable to previous backfilled data and so we have reverted to using the “standard” Home Algo v1 (no forward-looking) in the July 2021 backfill.”
Which would suggest that the original home loc algorithm is the current (July 2021 release) source of home location data for Jan 1 2018 to May 2020, and the forward looking home loc algorithm (Hybrid Home Algo, as it’s called on the webpage) was only used for the Dec 2020 backfill.
Am I misreading, or do these two quotes seem to contradict each other?
No, you’re right to point this out. It is confusing, and I’ll clarify the docs, as they aren’t the easiest to follow. To be clear, the forward-looking home loc algorithm was NOT used in the July 2021 backfill, and so was only used for the Dec 2020 backfill.
> “Home Panel Summary (“home_panel_summary.csv”) in Weekly Patterns and Monthly patterns now only includes those devices whose homes are eligible to be counted in the visitor_home_cbgs column.”
This applies to data from July 2021 as well.
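As a rough illustration of why this matters: the home panel is commonly used as the denominator when normalizing `visitor_home_cbgs` counts, so it helps that both are built from the same set of devices. The sketch below assumes `home_panel_summary.csv` has `census_block_group` and `number_devices_residing` columns. The sample values and the helper name `visit_rates` are made up.

```python
import csv
import io
import json

# Hypothetical excerpt of home_panel_summary.csv; the numbers are invented.
home_panel_csv = """census_block_group,number_devices_residing
010010201001,120
010010202002,50
"""

# Hypothetical visitor_home_cbgs value for a single POI.
visitor_home_cbgs = '{"010010201001": 12, "010010202002": 5, "010010203003": 4}'

def visit_rates(visitor_home_cbgs, home_panel_csv):
    """Return visitors per resident device for each CBG found in the home panel."""
    panel = {
        row["census_block_group"]: int(row["number_devices_residing"])
        for row in csv.DictReader(io.StringIO(home_panel_csv))
    }
    rates = {}
    for cbg, visitors in json.loads(visitor_home_cbgs).items():
        devices = panel.get(cbg)
        if devices:  # skip CBGs absent from the panel or with zero devices
            rates[cbg] = visitors / devices
    return rates

rates = visit_rates(visitor_home_cbgs, home_panel_csv)
print(rates)
```

CBGs that appear in `visitor_home_cbgs` but not in the panel are skipped here rather than guessed at.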
Thanks - this is much clearer now, and the updated docs make more sense to me!
WRT the modified home algo for the 2020 release,
> we received feedback that this made the Dec 2020 backfilled data incompatible with pre-May 2020 data delivered at that time (i.e., data delivered then could not use future data)
I don’t quite understand the compatibility issue here. Is the problem that data from different points within the Dec 2020 release are incompatible (e.g., there is some sort of discontinuity in May 2020), or is there a compatibility issue between the Dec 2020 release and other releases of the data?
FWIW my analysis is contained entirely within 2018, so as long as observations within year 2018 for the Dec 2020 backfill are mutually compatible I should be fine to use this version?
It’s the latter: compatibility between the Dec 2020 release and other releases. That is, in the Dec 2020 backfill we basically added in many more homes for Jan 2020 than the Jan 2020 data had at that time, and this was undesirable for some folks.
So for your 2018 analysis, as long as you use a single backfill throughout (each backfill is always internally consistent, as it uses the same logic/polygons), you should be fine.
Hi everyone - thanks for the great discussion! If there are no further questions at the moment, I’m going to go ahead and close this thread out. If you have any more questions or follow-ups, we’re always here to help! Just be sure to make a new post to safegraphdata, as we aren’t monitoring old threads at this time. Thanks!