Distance between home CBGs and parks unrealistically high

Hello

I have been working with SafeGraph data for NAICS code 712190 (Nature Parks and Similar Institutions) across 10 US Metropolitan Statistical Areas (MSAs) over 18 weeks in 2021. I am quite surprised by my results and wondering whether others have seen the same thing. After expanding the visitor_home_cbgs column to show visit counts from each home census block group (CBG), I have over 5 million rows of data showing park visits per week, per CBG.

But if I subset to visits where the first 11 characters of poi_cbg (that is, the census tract) match the first 11 characters of the visitor's home CBG, only 474 rows remain. Those rows cover only 371 parks out of more than 18,000 in the original dataset. These are visits to parks from CBGs within the same census tract, and the reduction is similarly extreme when I match at the county or state level instead. I have also calculated the travel distance between home CBGs and parks in many different iterations, filtering in many different ways. I still find that the distance between the home CBG centroid and the park centroid is unrealistically high.
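For anyone reproducing the check above, here is a minimal Python sketch (not the poster's R code) of the two operations described. The first matches a POI's CBG to a home CBG at the state, county, or tract level by comparing GEOID prefixes. The second computes the great-circle (haversine) distance between two centroids. The sketch assumes CBGs are 12-digit GEOIDs (2-digit state + 3-digit county + 6-digit tract + 1-digit block group). The helper names are hypothetical, and the example IDs are made up. The zero-padding step is there because a GEOID read in as an integer loses its leading zero for states with FIPS codes below 10, which would silently break prefix matching.

```python
import math

# A 12-digit CBG GEOID is: state (2) + county (3) + tract (6) + block group (1),
# so each geographic level is a fixed-length prefix of the GEOID.
PREFIX_LEN = {"state": 2, "county": 5, "tract": 11, "cbg": 12}
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def normalize_cbg(cbg) -> str:
    """Return a 12-character GEOID string (hypothetical helper).

    Assumes the input is a string or an integer; an integer GEOID loses its
    leading zero for states with FIPS codes below 10, so pad it back.
    """
    return str(cbg).strip().zfill(12)


def same_area(poi_cbg, home_cbg, level="tract") -> bool:
    """True if both CBGs fall in the same state, county, tract, or CBG."""
    n = PREFIX_LEN[level]
    return normalize_cbg(poi_cbg)[:n] == normalize_cbg(home_cbg)[:n]


def haversine_km(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Made-up GEOIDs for illustration: same tract, different block groups.
print(same_area("060375301021", "060375301022", "tract"))  # True
print(same_area("060375301021", "060375301022", "cbg"))    # False
print(same_area(60375301021, "060375301021", "cbg"))       # True after zero-padding
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))          # 111.2
```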

Is this simply a limitation of the SafeGraph sample, where the data is not capturing typical park visitation patterns? My original dataset contains CBGs from across several metropolitan statistical areas. Even so, it is hard to believe that the vast majority of visitors come not only from outside the park's tract, but even from outside its state. This holds even after filtering out park-CBG weekly visit counts under 5 and after normalizing visit counts with the panel data.
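The filtering and normalization steps mentioned above can be sketched as follows. This sketch assumes one common normalization approach and is not necessarily the one used in the post. It scales each home CBG's visits by the ratio of the CBG's census population to the number of panel devices residing there. SafeGraph's home panel summary reports that device count, for example in a number_devices_residing column. The function name and input shapes are hypothetical, and the threshold of 5 simply mirrors the post.

```python
def filter_and_scale(rows, cbg_population, devices_residing, min_visits=5):
    """Drop small POI-by-home-CBG counts, then scale the rest to population.

    rows: iterable of (poi_id, home_cbg, visits) tuples.
    cbg_population: {home_cbg: census population}   (assumed input)
    devices_residing: {home_cbg: panel devices}     (assumed input)
    """
    scaled = []
    for poi_id, home_cbg, visits in rows:
        if visits < min_visits:
            continue  # the post's threshold: drop weekly counts under 5
        devices = devices_residing.get(home_cbg)
        population = cbg_population.get(home_cbg)
        if not devices or population is None:
            continue  # cannot scale without both panel and census figures
        # Each panel device is taken to represent population / devices residents.
        scaled.append((poi_id, home_cbg, visits * population / devices))
    return scaled


rows = [("p1", "010010201001", 10), ("p1", "010010201002", 3)]
print(filter_and_scale(rows, {"010010201001": 1000}, {"010010201001": 50}))
# → [('p1', '010010201001', 200.0)]
```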

Does anyone have recommendations? I am interested in knowing who is visiting parks. However, these results differ vastly from almost every other survey or study of travel and park usage, and it feels rather questionable to publish these findings.

Thank you!

Hello all,

I am posting to let readers know that after a week of pulling my hair out trying to figure out why these results were so unrealistic, I found the error in my code. I am sharing it here in case others run into the same issue.

The problem occurred when I expanded the visitor_home_cbgs column with the expand_cat_json() function from the SafeGraphR package. I did not realize that I had the setting “na.rm=TRUE”. With that setting, rows containing only empty curly brackets ({}) were dropped, so the expanded rows were joined to the wrong rows of the original dataframe. I realize this is a rookie mistake that I should have caught earlier, but so be it with coding. Hopefully my mistake will help others in the same boat.
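The failure mode described above is easy to reproduce outside R. The Python sketch below is a hypothetical illustration, not SafeGraphR's implementation. It contrasts two approaches. The first drops empty {} rows and then re-attaches POI identifiers by row position, which shifts every later row onto the wrong POI. The second carries the identifier on every expanded row, so empty rows simply contribute nothing. The identifier is assumed here to be a placekey column.

```python
import json


def buggy_positional_join(rows, col="visitor_home_cbgs"):
    """WRONG: drops empty JSON rows, then re-attaches keys by position."""
    parsed = [json.loads(r[col]) for r in rows if r.get(col) not in (None, "", "{}")]
    # Once a row has been dropped, index i no longer points at its source row.
    return [(rows[i]["placekey"], d) for i, d in enumerate(parsed)]


def expand_visitor_home_cbgs(rows, key="placekey", col="visitor_home_cbgs"):
    """Expand the JSON column to long format, carrying the POI key on every row."""
    long_rows = []
    for row in rows:
        raw = row.get(col) or "{}"  # treat missing or empty values as an empty object
        for home_cbg, visits in json.loads(raw).items():
            long_rows.append({key: row[key], "home_cbg": home_cbg, "visits": visits})
    return long_rows


rows = [
    {"placekey": "A", "visitor_home_cbgs": "{}"},
    {"placekey": "B", "visitor_home_cbgs": '{"010010201001": 4}'},
]
print(buggy_positional_join(rows))     # → [('A', {'010010201001': 4})]
print(expand_visitor_home_cbgs(rows))  # → [{'placekey': 'B', 'home_cbg': '010010201001', 'visits': 4}]
```

With these two sample rows, the positional version attributes B's visitors to A. The keyed version returns a single row for B and nothing for A. That is the behavior you want before joining back to the original dataframe.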

After making sure the correct rows were joined after expanding the JSON, my data looks much closer to what I expected. A good number of park visits now come from within the same block group and tract. Thank you to those who reached out to help me with this.

Keep in mind that what you observed initially might not be anomalous. As you widen the circle around a park, the area it covers (and hence the number of potential visitors) grows quadratically with the radius. If you restrict to a small area around each park, the counts will naturally be small.
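The quadratic growth can be made concrete with a back-of-the-envelope calculation. Assume, purely for illustration, that potential visitors are spread at a uniform density ρ around a park. Then the number within radius r, and their share of everyone within a larger catchment radius R, are:

```latex
N(r) = \rho \, \pi r^{2},
\qquad
\frac{N(r)}{N(R)} = \left(\frac{r}{R}\right)^{2}
```

For example, with r = 1 km and R = 20 km, only (1/20)^2 = 1/400, or 0.25%, of the potential visitors live within 1 km. Actual visits typically decline with distance, which pushes the nearby share above this uniform baseline. So treat the figure as intuition, not a prediction.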
