Questions about normalized_visits_by_state_scaling

Hi, I would like to compare overall POI visit counts among US counties. The Patterns dataset includes normalized_visits_by_state_scaling, which is described as scaled using the mobile device sampling rate for the state in which the POI is located. Is this calculated as raw visits divided by the state's sampling rate? For comparing county-level data, I think the normalization should account for county population as well as the sampling rate, so the complete normalization would be raw_visits/(population*sampling rate). Is this sound logic based on your data? Also, is there any way I can access or compute normalized visits by county scaling myself?

Yes, that’s how that column is computed. See this blog post or Approach 1 here.

You are free to try to use the county sampling rates as well. Follow the colab notebook above but replace the state-level aggregation with the county-level aggregation (both are contained within the poi_cbg column). If you do this, I’d be interested in seeing a comparison of the upscaled data after using state- vs county-level data!
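
As a rough illustration of why both levels can come from poi_cbg: a census block group GEOID is 12 digits, where the first 2 are the state FIPS code and the first 5 are the state plus county FIPS code. The helper below is a minimal sketch and is not taken from the colab notebook; the function name is hypothetical, and it assumes poi_cbg may have lost a leading zero if it was read as a number.

```python
def cbg_to_state_and_county(poi_cbg):
    """Split a census block group GEOID into (state FIPS, county FIPS).

    Hypothetical helper for illustration; not part of the SafeGraph notebook.
    """
    # GEOIDs are 12 digits. Restore a leading zero that is lost when the
    # column is parsed as an integer (e.g. states with FIPS 01 to 09).
    geoid = str(poi_cbg).strip().zfill(12)
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"not a 12-digit CBG GEOID: {poi_cbg!r}")
    state_fips = geoid[:2]   # first 2 digits: state
    county_fips = geoid[:5]  # first 5 digits: state + county
    return state_fips, county_fips


print(cbg_to_state_and_county("120010002001"))  # ('12', '12001')
print(cbg_to_state_and_county(10010201001))     # ('01', '01001')
```

Switching from state- to county-level aggregation then amounts to grouping on the 5-digit code instead of the 2-digit one.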

Hi Jeff, thanks for your reply. Do you mean the second link, “Approach 1 here”, is the colab notebook? I did not find the concrete data for county/state sampling rates.

I mean the heading “Approach 1” inside the colab notebook.

It shows you how to calculate the device sampling rates per geography, with state as the geography used in the example.

Hi Jeff, yes, can you please also direct me to the files about the county/state sampling rates used by the SafeGraph Pattern dataset?

We don’t provide explicit files that have those sampling rates - you must calculate those manually using the notebook provided.

Okay thank you.

Hi Jeff, I have a follow-up question, just to make sure I am handling the data correctly. I would like to normalize the raw visit counts from the Monthly Patterns dataset by county. I aggregated the number of devices residing in each CBG up to the county level. I suppose I can get the results as follows:

sampling rate = county-level number of devices/county population
normalized_visits_by_county_scaling = raw visit counts/sampling rate.
Then, I can still divide the normalized_visits_by_county_scaling data by county population to get visit counts per person.

Does that sound correct?

Hi @KANGLIN_CHEN_University_of_Florida, what you have described looks correct to me.
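
The steps described in the question can be sketched roughly as follows. This is a minimal illustration on made-up numbers, not SafeGraph's implementation: the function name, the input dictionaries, and the toy values are all assumptions. Note that, algebraically, the per-person figure reduces to raw visits divided by the county's device count, since the population terms cancel.

```python
from collections import defaultdict


def county_scaling(cbg_devices, county_population, county_raw_visits):
    """Hypothetical sketch of county-level scaling.

    cbg_devices:        {CBG GEOID: number of residing devices}
    county_population:  {5-digit county FIPS: population}
    county_raw_visits:  {5-digit county FIPS: raw visit count}
    """
    # Step 1: sum CBG-level residing devices up to counties
    # (the county is the first 5 digits of the 12-digit CBG GEOID).
    county_devices = defaultdict(int)
    for cbg, devices in cbg_devices.items():
        county_devices[str(cbg).zfill(12)[:5]] += devices

    results = {}
    for county, raw_visits in county_raw_visits.items():
        devices = county_devices.get(county, 0)
        population = county_population.get(county, 0)
        if devices == 0 or population == 0:
            continue  # no sampling rate can be computed for this county

        # Step 2: sampling rate = devices / population.
        sampling_rate = devices / population
        # Step 3: scale raw visits up by the sampling rate.
        normalized = raw_visits / sampling_rate
        results[county] = {
            "sampling_rate": sampling_rate,
            "normalized_visits_by_county_scaling": normalized,
            # Step 4: per-person visits (equals raw_visits / devices).
            "visits_per_person": normalized / population,
        }
    return results


# Toy inputs, invented purely for illustration.
cbg_devices = {"120010002001": 300, "120010002002": 200, "120030401001": 100}
county_population = {"12001": 10000, "12003": 2000}
county_raw_visits = {"12001": 1500, "12003": 400}

for county, row in county_scaling(
    cbg_devices, county_population, county_raw_visits
).items():
    print(county, row)
```

Counties with no devices or no population figure are skipped here rather than given an infinite or undefined rate; whether to drop or flag them is a choice to make for the actual analysis.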

For more context on the evolution of normalization techniques used with SafeGraph data, this notebook may be interesting. Please let me know if you have any other questions!

Thank you Ryan!