Is there something I am missing in terms of the normalisation?

Priyanka_Goonetilleke_University_of_Pennsylvania · September 1, 2021, 1:53pm

Hi, I am using the Neighbourhood Patterns dataset and am trying to generate normalised mobility for Cook County. I am using the Home Panel and aggregating number_devices_residing across the relevant CBGs to get the total number of devices in the county for each month. I then take (population of Cook County)/(monthly devices residing) to get the scaling factor for each month. I aggregate the number of stops by day across the relevant CBGs from the Neighbourhood Patterns dataset and generate the normalised stops as: normalised_stops = stops_per_day * scaling_factor_relevant_month. However, when I look at the normalised stops, it looks like mobility in Cook County in June-July 2020 was actually higher than in June-July 2019, which cannot be correct. It seems to be driven by the fact that the number of devices is so much lower in 2020 that my scaling factor goes from ~7 in June/July 2019 to ~10 in June/July 2020. Is there something I am missing in terms of the normalisation?
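For anyone following along, the procedure described in this post can be sketched roughly as below. This is only a minimal illustration in plain Python: the tuple layouts, the function names, and all the numbers are made up for the example, and only number_devices_residing is an actual field name mentioned in the thread.

```python
from collections import defaultdict


def monthly_scaling_factors(home_panel_rows, population):
    """Scaling factor per month = population / total devices residing.

    home_panel_rows: iterable of (month 'YYYY-MM', cbg, number_devices_residing).
    The tuple layout is an assumption made for this sketch.
    """
    devices = defaultdict(int)
    for month, _cbg, n_devices in home_panel_rows:
        devices[month] += n_devices  # sum over the relevant CBGs
    return {month: population / total for month, total in devices.items()}


def normalise_daily_stops(stop_rows, factors):
    """Normalised stops per day = (stops summed over CBGs) * that month's factor.

    stop_rows: iterable of (date 'YYYY-MM-DD', cbg, stops); hypothetical layout.
    """
    daily = defaultdict(int)
    for date, _cbg, stops in stop_rows:
        daily[date] += stops
    # date[:7] is the 'YYYY-MM' prefix, used to look up the monthly factor
    return {date: total * factors[date[:7]] for date, total in sorted(daily.items())}


# Toy numbers, purely illustrative (not real SafeGraph or census figures).
home_panel = [
    ("2019-06", "cbg_a", 60), ("2019-06", "cbg_b", 40),
    ("2020-06", "cbg_a", 30), ("2020-06", "cbg_b", 20),
]
stops = [
    ("2019-06-01", "cbg_a", 300), ("2019-06-01", "cbg_b", 200),
    ("2020-06-01", "cbg_a", 200), ("2020-06-01", "cbg_b", 100),
]

factors = monthly_scaling_factors(home_panel, population=1000)
normalised = normalise_daily_stops(stops, factors)
```

In this toy data the raw stops fall from 500 to 300, but the device count halves, so the normalised value still goes up. That is the same pattern described in the question.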

This topic was automatically generated from Slack. You can find the original thread here.

Niki_Kaz · September 1, 2021, 1:53pm

Hi! Thanks for reaching out. We are looking into it and will get back to you once we have an answer.

Niki_Kaz · September 1, 2021, 1:53pm

Hey - just an update. I’m looping in a member of the Product team to look further into the issue. We’ll keep you posted as we find out more. Thanks!

Priyanka_Goonetilleke_University_of_Pennsylvania · September 1, 2021, 1:53pm

Thanks

Jeff_Ho_SafeGraph · September 1, 2021, 1:53pm

Hi Priyanka! The scaling factor increasing during 2020 is expected behavior, since number_devices_residing tended to decrease during the pandemic. See for example a similar state-multiplier plot from our Normalization notebook (image below).

What that implies is that the proportion of devices in Cook County with stops was indeed higher during the pandemic than before. This makes sense once you consider that “stops” in Neighborhood Patterns are different from POI visits in regular Patterns.

“Stops” includes stops at home (whereas POI visits does not). From our docs:

Number of stops by devices in our panel to this area during the date range. A stop must have a minimum duration of 1 minute to be included. The count includes stops by devices whose home area is the same as this area.
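To make the arithmetic concrete, here is a small sketch using the approximate scaling factors from the question (~7 in 2019, ~10 in 2020). The function names are hypothetical and the raw-stop ratios are invented for illustration. Since normalised stops are raw stops times the scaling factor, the normalised series rises whenever raw stops fall by less than the break-even fraction computed here.

```python
def breakeven_raw_drop(factor_before, factor_after):
    """Fractional drop in raw stops at which normalised stops stay flat."""
    return 1 - factor_before / factor_after


def normalised_ratio(raw_ratio, factor_before, factor_after):
    """Ratio of normalised stops (after / before), given the raw-stop ratio."""
    return raw_ratio * factor_after / factor_before


# With factors of roughly 7 and 10, the break-even drop is about 30%.
threshold = breakeven_raw_drop(7, 10)

# Hypothetical case: raw stops fell 20%, so normalised stops still rise.
rises = normalised_ratio(0.8, 7, 10)

# Hypothetical case: raw stops fell 40%, so normalised stops fall.
falls = normalised_ratio(0.6, 7, 10)
```

So a year-over-year increase in normalised stops does not require raw stops to have increased, only that they fell by less than the panel shrank.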

If you still think that this could be erroneous based on your knowledge of the specific CBG, then I would recommend trying a different normalization approach (see the notebook) and comparing to any knowledge/heuristics you may have about your application.

Hope that’s helpful!

Niki_Kaz · September 1, 2021, 1:53pm

Hey! Just confirming that we answered your question. I’m going to go ahead and close this thread out. If you have any more questions or follow-ups, we’re always here to help! Just be sure to make a new post to safegraphdata, as we aren’t monitoring old threads at this time. Thanks!