Want to get monthly park visit data from block groups for all of 2019. I noticed that the Monthly Patterns data doesn’t have a naics code column in it

Michael_Esposito_UMich · April 7, 2021, 12:00am

Hey all, one other (hopefully!) quick question. I want to get monthly park visit data from block groups for all of 2019. I noticed that the Monthly Patterns data doesn’t have a NAICS code column in it. Does that mean that I need to: (a) download the Core Places US (pre-2020) file; (b) join this to the Monthly Patterns; and (c) filter down to nature sites? Or is there a way to accomplish this just using what’s available in the Patterns set?

Jude_Bayham_Colorado_State_U · April 7, 2021, 10:41pm

I believe you need to join patterns with core.

Jack_Lindsay_Kraken1 · April 8, 2021, 11:13pm

@Jude_Bayham_Colorado_State_U is correct. You need to:

1. get the POI you want from the Patterns data
2. filter Core down to those SafeGraph Place IDs (SGPIDs) or Placekeys
3. merge the two on either SGPID or Placekey
4. drop the columns you don’t need

Alternatively, you can just filter Patterns to the POI you want and inner join to Core on the aforementioned columns.
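
For readers who want to see the join spelled out, here is a minimal pandas sketch of the steps above, not an official SafeGraph recipe. The tiny in-memory tables stand in for the real Core and Monthly Patterns files. The column names (`placekey`, `naics_code`, `poi_cbg`, `date_range_start`, `raw_visit_counts`) are assumed to match your download, and treating NAICS 712190 (Nature Parks and Other Similar Institutions) as "parks" is an assumption to adjust for your own definition.

```python
import pandas as pd

# Toy stand-ins for the real files. Column names are assumed to follow the
# SafeGraph schema; check them against the files you actually downloaded.
core = pd.DataFrame({
    "placekey": ["aaa@111", "bbb@222", "ccc@333"],
    "location_name": ["Riverside Park", "Corner Cafe", "Pine Nature Preserve"],
    "naics_code": [712190, 722515, 712190],
})
patterns = pd.DataFrame({
    "placekey": ["aaa@111", "bbb@222", "ccc@333", "aaa@111"],
    "date_range_start": ["2019-01-01", "2019-01-01", "2019-01-01", "2019-02-01"],
    "poi_cbg": ["260010001001", "260010001001", "260010002002", "260010001001"],
    "raw_visit_counts": [120, 300, 45, 150],
})

# Assumption: 712190 is the NAICS code used here to mean "park".
PARK_NAICS = 712190

# Keep only park POI from Core, and only the columns needed for the join.
parks = core.loc[core["naics_code"] == PARK_NAICS, ["placekey", "naics_code"]]

# Inner join: Patterns rows survive only if their Placekey is a park in Core.
park_visits = patterns.merge(parks, on="placekey", how="inner")

# Total monthly visits per block group in which the park is located.
monthly_by_cbg = (
    park_visits
    .groupby(["poi_cbg", "date_range_start"], as_index=False)["raw_visit_counts"]
    .sum()
)
print(monthly_by_cbg)
```

Note that this groups by `poi_cbg`, the block group the park sits in. If you need visits broken out by the visitors' home block groups instead, that would require different columns and extra processing not shown here.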

Let me know if you need any help with these. If you are using Python or R, we have libraries designed to make the process a bit easier!

Michael_Esposito_UMich · April 12, 2021, 6:40pm

Thanks folks! @Jack_Lindsay_Kraken1 I’m using R; I’d love to see what you all have available. In the meantime, to double check my understanding, I would need to use the “Core Places US, Pre-Nov 2020” file to ID parks in 2019, yeah? (Or, alternatively, is the “Core Places US, Nov 2020” file a cumulative list, such that: (1) every park that ever appeared in the data is listed in this frame; and (2) there’s no variation in NAICS categorization, i.e., all places coded as parks and/or rec centers in the pre-Nov 2020 set are also coded as parks and/or rec centers in the post-Nov 2020 set?)

Jack_Lindsay_Kraken1 · April 13, 2021, 4:40am

@Michael_Esposito_UMich here is the link to SafeGraphR (GitHub: SafeGraphInc/SafeGraphR), which is R code for common, repeatable data wrangling and analysis of SafeGraph data.

To your question about which Core to use: we always recommend using the most recent version of Core. From month to month, the Core data is updated to make the POI more accurate, so the Core file works as an ever-growing master list of POIs. However, if you monitor the change log, you will notice that POI are added and dropped all the time, whether because of SGPID churn, data bugs, or something else. If there is specific traffic in something like the 2019 Monthly Patterns (not the backfilled version) that you really want, and you don’t mind that the POI may have been removed for a good reason, you can use the Core file that corresponds to that Patterns file (i.e., the one released at the same time). That will result in a 100% match rate when you merge.

In my opinion, unless you are missing one or two specific POI that are imperative to your research, I would always use the most recent Core data in the interest of using the cleanest and most bug-free data, even if it means you may be missing a few POI.

To add to that, if you use the backfill data, the loss should be minimal.
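
If you want to see how many POI you would actually lose with a given Core file, one option is to measure the match rate directly. Below is a rough pandas sketch using toy data; `placekey` and the other column names are assumed to match your files, and the `ddd@444` row is a made-up example of a POI that appears in Patterns but has since been dropped from Core.

```python
import pandas as pd

# Toy data: three POI in Patterns, only two of them still present in Core.
core = pd.DataFrame({
    "placekey": ["aaa@111", "ccc@333"],
    "naics_code": [712190, 712190],
})
patterns = pd.DataFrame({
    "placekey": ["aaa@111", "ccc@333", "ddd@444"],
    "raw_visit_counts": [120, 45, 10],
})

# A left join with indicator=True adds a "_merge" column that records whether
# each Patterns row found a partner in Core ("both") or not ("left_only").
checked = patterns.merge(core, on="placekey", how="left", indicator=True)

# Share of Patterns rows that matched a Core row.
match_rate = (checked["_merge"] == "both").mean()

# Placekeys that exist in Patterns but not in this Core file.
unmatched = checked.loc[checked["_merge"] == "left_only", "placekey"].tolist()

print(f"match rate: {match_rate:.1%}")
print("unmatched placekeys:", unmatched)
```

Running the same check against the most recent Core and against the Core released alongside your Patterns file would show how much the choice matters for your particular set of POI.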

I know this is a lot of information delivered in a seemingly unorganized manner, so please feel free to ask follow-up questions and let me know about anything you are not sure about.