Quick questions about the construction of the Neighborhood Patterns dataset:
- Is it fair to say that the algorithm described in the attribution whitepaper for Monthly/Weekly Patterns is the same one used for the Neighborhood Patterns dataset, except that Neighborhood Patterns matches clusters of pings to CBGs instead of POIs, and reports stops (any dwell time greater than one minute) instead of visits?
- If so, does the construction of Neighborhood Patterns remove “any sequence of pings that are too linear over a long enough period of time, and those that appear to be travelling too quickly” (Attribution Whitepaper, p 10)? That is, does Neighborhood Patterns include driving or does it filter it out?
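For context on what the quoted whitepaper criterion could mean in practice, here is a minimal sketch of a "too fast / too linear" filter over a sequence of pings. It is not SafeGraph's implementation: the thresholds (`MAX_SPEED_MPS`, `MIN_DURATION_S`, `MAX_STRAIGHTNESS`), the planar meter coordinates, and the straightness ratio (end-to-end displacement divided by path length) are all hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Ping:
    t: float  # seconds since start (hypothetical time field)
    x: float  # meters, assumed planar/projected easting
    y: float  # meters, assumed planar/projected northing


# Hypothetical thresholds; the real values are not stated in this thread.
MAX_SPEED_MPS = 3.0      # average speed above this counts as "too quickly"
MIN_DURATION_S = 120.0   # a "long enough period of time"
MAX_STRAIGHTNESS = 0.9   # displacement / path length above this is "too linear"


def path_length(pings):
    """Total distance traveled along consecutive pings."""
    return sum(math.dist((a.x, a.y), (b.x, b.y)) for a, b in zip(pings, pings[1:]))


def is_in_transit(pings):
    """Return True if the sequence looks like travel rather than a stop."""
    if len(pings) < 2:
        return False
    duration = pings[-1].t - pings[0].t
    if duration <= 0:
        return False
    length = path_length(pings)
    if length / duration > MAX_SPEED_MPS:
        return True  # moving too quickly on average
    displacement = math.dist((pings[0].x, pings[0].y), (pings[-1].x, pings[-1].y))
    straightness = displacement / length if length > 0 else 0.0
    # Too linear, sustained over a long enough period.
    return duration >= MIN_DURATION_S and straightness > MAX_STRAIGHTNESS


drive = [Ping(t=10 * i, x=150.0 * i, y=0.0) for i in range(6)]       # ~15 m/s
jitter = [Ping(0, 0, 0), Ping(100, 5, 0), Ping(200, 0, 5), Ping(300, 5, 5)]
print(is_in_transit(drive))   # True
print(is_in_transit(jitter))  # False
```

A straight, steady walk at about 1 m/s would also be flagged here by the linearity rule rather than the speed rule, which is one way a single filter could cover both driving and walking.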
Hi @Kamen_Velichkvo_The_Wharton_School, thanks for the questions. We’ll get back to you soon!
Hi @Kamen_Velichkvo_The_Wharton_School, just wanted to follow up and let you know we’ll have an answer for you soon. Sorry for the delay!
Hi @Ryan_Kruse_MN_State, thanks for the replies! Apologies for the follow-up, but I was just wondering whether you, @Spencer_Vail_SafeGraph, or anyone else on the SafeGraph team had a chance to chime in about the coverage and construction of Neighborhood Patterns (i.e. the two questions above)?
Hi @Kamen_Velichkvo_The_Wharton_School, sorry again for the delay. I’m waiting on a final confirmation from the product team, but the (tentative) answers to your questions are yes and yes.
- The underlying algorithms are the same, we just assign “stops” according to the CBG’s boundaries instead of “visits” to a POI’s boundaries.
- Similarly, any driving behavior is filtered out. I believe that we also attempt to filter out any walking behavior, but I can’t confirm that until I hear back from the product team. In the recently deprecated Social Distancing Metrics product, we attempted to filter out walking behavior. As touched on in this post, it can be difficult because of the short 1 minute cluster threshold, so there would be some false positives associated with walking or even driving behavior.
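To make "assigning stops according to the CBG's boundaries" concrete, here is a hypothetical sketch: a greedy stay-point rule that keeps ping clusters whose dwell time exceeds one minute, followed by a ray-casting point-in-polygon test that counts each stop in the CBG containing its centroid. The 50 m radius, planar coordinates, CBG ids, and square polygons are made up for illustration, the driving/walking filter is omitted, and SafeGraph's actual clustering may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Ping:
    t: float  # seconds since start (hypothetical)
    x: float  # meters, assumed planar coordinates
    y: float


MIN_DWELL_S = 60.0       # stops require dwell time greater than one minute
CLUSTER_RADIUS_M = 50.0  # hypothetical clustering radius


def find_stops(pings):
    """Greedy stay-point detection: grow a cluster while pings stay within
    CLUSTER_RADIUS_M of its first ping; keep it if dwell > MIN_DWELL_S.
    Returns a list of (centroid_x, centroid_y, dwell_seconds)."""
    stops, i, n = [], 0, len(pings)
    while i < n:
        j = i + 1
        while j < n and math.dist((pings[i].x, pings[i].y),
                                  (pings[j].x, pings[j].y)) <= CLUSTER_RADIUS_M:
            j += 1
        cluster = pings[i:j]
        dwell = cluster[-1].t - cluster[0].t
        if dwell > MIN_DWELL_S:
            cx = sum(p.x for p in cluster) / len(cluster)
            cy = sum(p.y for p in cluster) / len(cluster)
            stops.append((cx, cy, dwell))
            i = j  # continue after the accepted cluster
        else:
            i += 1
    return stops


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_stops_to_cbgs(stops, cbgs):
    """Count stops per CBG; each stop goes to the first polygon containing it."""
    counts = {}
    for cx, cy, _ in stops:
        for cbg_id, polygon in cbgs.items():
            if point_in_polygon(cx, cy, polygon):
                counts[cbg_id] = counts.get(cbg_id, 0) + 1
                break
    return counts


pings = [Ping(0, 10, 10), Ping(40, 12, 11), Ping(90, 11, 9),  # ~90 s near (11, 10)
         Ping(100, 400, 10), Ping(110, 800, 10)]               # then moving away
cbgs = {"CBG_A": [(0, 0), (100, 0), (100, 100), (0, 100)],     # hypothetical ids
        "CBG_B": [(100, 0), (1000, 0), (1000, 100), (100, 100)]}
stops = find_stops(pings)
print(assign_stops_to_cbgs(stops, cbgs))  # {'CBG_A': 1}
```

Under a rule like this, someone strolling at under about 0.8 m/s, or waiting at a crosswalk, stays within 50 m for more than a minute and produces a "stop", which illustrates how the false positives mentioned above can arise.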
Thanks for the response; this is really helpful. I'm interested to hear the final verdict on the second point (walking).
I was also curious about the sample size of Neighborhood Patterns. Is it the same 45 million devices used in the other SafeGraph products?
@Kamen_Velichkvo_The_Wharton_School Okay, I have some resolution. The same algorithm used to filter out driving is also used to filter out walking. As I mentioned, it is prone to false positives caused by the clusters of pings formed while walking, so of course it's not exact.
The products are based on the same sample of devices. The logic used to process and aggregate the pings into each product is different.
Is this info helpful?
Hi @Kamen_Velichkvo_The_Wharton_School, just following up to make sure you saw the message above.