I have been working with SafeGraph data for NAICS code 712190 (Nature Parks and Other Similar Institutions) across 10 US Metropolitan Statistical Areas (MSAs) over 18 weeks in 2021. My results surprised me, and I am wondering whether others have seen something similar. After expanding the visitor_home_cbgs column into one row per home census block group (CBG), I have over 5 million rows, each giving a park's weekly visit count from one home CBG.
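A minimal sketch of this kind of expansion, using only the standard library. It assumes each weekly row is a dict holding the Weekly Patterns columns placekey, poi_cbg, date_range_start and visitor_home_cbgs (a JSON object string mapping home CBG to count); the function name and the output fields home_cbg and visitor_count are hypothetical.

```python
import json
from typing import Iterable, Iterator


def expand_visitor_home_cbgs(rows: Iterable[dict]) -> Iterator[dict]:
    """Explode each POI-week row into one row per home CBG (hypothetical helper)."""
    for row in rows:
        # visitor_home_cbgs is assumed to be a JSON string like {"<cbg>": count, ...};
        # an empty value is treated as "no home CBGs reported".
        home_counts = json.loads(row["visitor_home_cbgs"] or "{}")
        for home_cbg, count in home_counts.items():
            yield {
                "placekey": row["placekey"],
                "poi_cbg": row["poi_cbg"],
                "date_range_start": row["date_range_start"],
                "home_cbg": home_cbg,  # JSON keys stay strings, so leading zeros survive
                "visitor_count": count,
            }
```

For example, a row whose visitor_home_cbgs is '{"060371011101": 12, "060372062001": 4}' expands into two rows, one per home CBG.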
However, when I keep only the rows where the first 11 characters of poi_cbg (i.e., the census tract portion of the GEOID) match the first 11 characters of the home CBG from visitor_home_cbgs, only 474 rows remain, covering just 371 of the more than 18,000 parks in the original dataset. These are visits to parks from CBGs within the same census tract, but the drop is similarly extreme when I match at the county or state level. I have also calculated the distance between each home CBG centroid and the park centroid in several different ways, with various filters applied, and the distances still look unrealistically large.
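If the tract comparison is done on string prefixes, one thing worth ruling out is that poi_cbg was read as a number (pandas.read_csv infers a numeric dtype for an all-digit column by default). That drops the leading zero for states whose FIPS code starts with 0, such as California (06), and a float parse also appends ".0", so the prefixes no longer line up with the zero-padded string keys in visitor_home_cbgs. The sketch below restores a 12-character GEOID before comparing the state (2), county (5) and tract (11) prefixes, and computes a great-circle (haversine) distance. The function names are hypothetical, and the coordinates are assumed to come from elsewhere (e.g. Census block group centroids for homes and the POI latitude/longitude for parks).

```python
import math

# Prefix lengths within a 12-digit CBG GEOID:
# state (2) + county (3) + tract (6) + block group (1).
PREFIX_LENGTHS = {"state": 2, "county": 5, "tract": 11, "cbg": 12}


def normalize_cbg(value) -> str:
    """Return a 12-character CBG GEOID string, restoring any dropped leading zero."""
    text = str(value).strip()
    if text.endswith(".0"):  # value was parsed as a float, e.g. 60371011101.0
        text = text[:-2]
    return text.zfill(12)


def same_area(poi_cbg, home_cbg, level: str) -> bool:
    """True if both CBGs fall in the same state, county, tract or block group."""
    n = PREFIX_LENGTHS[level]
    return normalize_cbg(poi_cbg)[:n] == normalize_cbg(home_cbg)[:n]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument just above 1
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))
```

Counting how many rows pass same_area at each level, per MSA, could show whether the drop-off is uniform or concentrated in states with a leading-zero FIPS code.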
Is this simply a limitation of the SafeGraph sample, i.e., the data does not capture typical park visitation patterns? Even though the home CBGs in my original dataset span entire metropolitan statistical areas, it is hard to believe that the vast majority of visits come from outside not only the tract but even the state. This holds even after filtering out park-CBG weekly visit counts under 5 and normalizing visit counts with the panel data.
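For comparison, one common way to combine a minimum-count filter with panel normalization is to scale each count by the home CBG's population divided by the number of panel devices residing there. This is a hedged sketch, not necessarily the normalization used here: devices_residing is assumed to be built from the number_devices_residing column of SafeGraph's home panel summary, population from Census estimates, and the default threshold of 5 mirrors the filter described above. The function name is hypothetical.

```python
def normalize_by_panel(rows, devices_residing, population, min_count=5):
    """Filter small counts, then scale each count up to the home CBG's population.

    rows: dicts with "home_cbg" and "visitor_count" (e.g. expanded Patterns rows).
    devices_residing: {cbg: number of panel devices residing there} (assumed input).
    population: {cbg: census population} (assumed input).
    """
    scaled = []
    for row in rows:
        count = row["visitor_count"]
        if count < min_count:
            continue  # drop park-CBG-week counts below the threshold
        home = row["home_cbg"]
        devices = devices_residing.get(home, 0)
        pop = population.get(home)
        if devices <= 0 or pop is None:
            continue  # no panel or census figure to scale with
        scaled.append({**row, "scaled_visitors": count * pop / devices})
    return scaled
```

The scaling step only changes magnitudes, not which home CBGs appear, so by itself it would not explain the missing within-tract rows; the threshold and the dropped CBGs without panel figures can.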
Does anyone have recommendations? I want to understand who is visiting parks, but these results differ sharply from almost every other survey or study of travel and park use, and I would be uneasy publishing them as they stand.