I was looking to see the changes over time in the number of devices which SafeGraph is collecting data from, and it looked like the monthly versions of home_panel_summary.csv were the correct files to use. I wanted to try to scale the neighbourhood patterns data to reflect changes over time in the number of devices data is being gathered from. However, there is a large drop in the number of devices after March 2020. Does the fact people weren’t leaving their house regularly stop the devices from being registered as active at all? Is there a better way I could do this scaling? I think some scaling is necessary because, for example, it looks like in early 2018 there are far fewer devices and thus footfall is mechanically lower.
I just realised that “Neighbourhood Patterns” has different home panel summary files to the “Monthly Places Patterns” home panel summary files and the numbers of devices seem to be different. Do the different data sets use different pools of devices?
Hi @Priyanka_Goonetilleke_University_of_Pennsylvania, getting some information together to send over on these great questions! Sorry for the delay in response.
Hi @Priyanka_Goonetilleke_University_of_Pennsylvania, hopefully I can clear some of this up.
They do have different values, but they are not from different sources/pools.
Here is a similar thread that recently came up
You are correct, scaling is most certainly required:
This post from December might be of use in your scaling quest - Workspace Deleted | Slack
This link to SafeGraph’s normalization will help with comparison - SafeGraph | Blog
For actually scaling in terms of predictions you will need the CBG data. Here is a thread that may help there - Workspace Deleted | Slack
To check your scaling, you can use large events with registered attendance, like sporting events - Workspace Deleted | Slack
I know I am throwing a lot at you here - if you still have questions after this, please let me know and we can dive more in depth into your use case! Also please let me know if I missed anything!
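As a rough illustration of the kind of scaling discussed above, here is a minimal sketch that scales raw panel counts by the ratio of population to devices in the panel. The function name, the monthly figures, and the population value are all hypothetical and only meant to show the arithmetic; this is not necessarily the normalization SafeGraph recommends.

```python
def scale_visits(raw_visits, devices_in_panel, population):
    """Scale a raw panel count up to a population-level estimate."""
    if devices_in_panel <= 0:
        raise ValueError("panel size must be positive")
    return raw_visits * population / devices_in_panel


# Hypothetical monthly figures for one area (not real SafeGraph data).
months = [
    {"month": "2018-01", "raw_visits": 1000, "devices": 50_000},
    {"month": "2019-01", "raw_visits": 2000, "devices": 100_000},
]
population = 1_000_000  # assumed census population for the same area

scaled = {
    m["month"]: scale_visits(m["raw_visits"], m["devices"], population)
    for m in months
}
```

In this made-up example the raw count doubles only because the panel doubles, so the two scaled values come out equal, which is the "mechanically lower footfall" effect the scaling is meant to remove.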
Thanks so much for the detailed response. That was all very helpful. I do have a question from reading those threads though. Abigail mentioned a v2.0 vs v2.1 issue with baselines. Is this something I need to worry about when using the Neighbourhood Patterns data? Or is it specific to another data set?
“(3) Due to the data product change to v2.0 during Jan1-May 9, 2020, should we assume that we cannot get indices of change for data during this time period, if defining the baseline as coming from a similar day in 2019? That is, is it not possible to create indices of change that compare data coming from v2.0 and from v2.1 time periods?”
@Priyanka_Goonetilleke_University_of_Pennsylvania that is in reference to the SDM (Social Distancing Metrics) data, which I don’t believe you are using.
Awesome, yes I am only looking at Neighborhood Patterns at the moment. Thanks for clarifying!
No problem! Please don’t hesitate to ask follow up questions!
Actually, one last follow-up. I am looking at the home_panel_summary which comes with the monthly patterns. It doesn’t show as much fluctuation as the Monthly Places Patterns one did. However, the sample size still fluctuates quite a bit. Is that expected? I was anticipating a gradual increase in sample size over time. I am aggregating all the CBGs in Philadelphia. The sample size goes from 12.6% to 9.4% of the Philadelphia population between Jan 2020 and Feb 2020, which seems incorrect, since scaling the neighbourhood patterns data using this leads to a huge jump in “stops” in Feb 2020.
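To make the size of that effect concrete, here is a quick check of how much a scaling multiplier moves when the sampling rate drops from 12.6% to 9.4%. It assumes the multiplier is simply the inverse of the sampling rate, which is an assumption about the scaling method rather than something stated in this thread.

```python
# Sampling rates quoted above (share of the Philadelphia population in the panel).
rate_jan = 0.126
rate_feb = 0.094

# Assumed scaling: multiply raw counts by 1 / sampling rate.
multiplier_jan = 1 / rate_jan
multiplier_feb = 1 / rate_feb

# Relative jump in scaled counts if the raw counts stayed exactly the same.
jump = multiplier_feb / multiplier_jan - 1
print(f"Scaled counts rise by {jump:.1%} with unchanged raw counts")
```

Under that assumption the scaled series rises by about a third between the two months unless the raw counts fall in step with the panel.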
I am not sure what the monthly places patterns is that you are referring to, could you elaborate?
Also, the sample size does not increase gradually. It is somewhat random, since SafeGraph picks up and drops partners fairly often. Does that make sense?
Ah okay, that makes sense. In that case, can I ask whether the number of devices (number_devices_residing) in the home_panel_summary.csv is the number of devices which are observed at least once in the given month? I.e., if you drop a partner on the second day of the month, is the effect only really seen in the next month (since many of the devices which use the partner should have been registered as active on the first day of the month)?
When I referenced monthly places patterns I had meant the home_panel_summary.csv which is located with the Monthly Places Patterns (aka “Patterns”) data. If I’m using the wrong name feel free to ignore that reference, it was tangential as I’m now specifically focussed on neighborhood patterns.
I am sure you have probably already seen this, but this will offer a bit more insight into neighborhood patterns - SafeGraph | Blog
To your question, according to the documentation -
> Number of distinct devices observed with a primary nighttime location in the specified census block group.
The question then becomes how nighttime locations are determined,
which you can read more about HERE
Does this help?
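For the Philadelphia aggregation mentioned earlier in the thread, a minimal sketch of summing number_devices_residing over a county's block groups might look like the following. The `census_block_group` column name, the sample rows, and the `devices_in_county` helper are assumptions for illustration; check them against the actual home_panel_summary.csv header. Block group IDs are kept as strings so the county FIPS prefix (42101 for Philadelphia County) can be matched directly.

```python
import csv
import io

# Hypothetical sample rows; the real file has more columns and many more rows.
SAMPLE = """census_block_group,number_devices_residing
421010001001,120
421010002002,80
420030101001,200
"""


def devices_in_county(csv_file, county_fips):
    """Sum number_devices_residing over block groups in one county."""
    total = 0
    for row in csv.DictReader(csv_file):
        # A block group ID starts with the 5-digit state + county FIPS code.
        if row["census_block_group"].startswith(county_fips):
            total += int(row["number_devices_residing"])
    return total


philly_devices = devices_in_county(io.StringIO(SAMPLE), "42101")
```

With a real file, the `io.StringIO(SAMPLE)` argument would be replaced by an opened file handle, and dividing the result by the county population gives the sampling rate for that month.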