Hello all, I’m working on an investigation into park visitation (2018-2020). We have used both the micro and macro normalization techniques, but the numbers don’t seem to be anywhere close to those reported by park management (sometimes up to 6x below manual counts). I realize the boundaries may be unique, and I’d expect some variance, but is there any way to determine exactly how far off these counts are from true visits? I’m afraid to report that parks receive six-fold fewer visits than they are reporting. That is how they are funded, and it wouldn’t go over well. Thanks for any insight you can provide.

cc @Nick_H-K_Seattle_University

That’s concerning! Some thoughts:

  1. Try checking the POI polygons in the geometry files if you have access to them. It’s possible that the POI only refers to a subset of the park, like just a visitor center or something.
  2. Macro normalization usually relies on local population values. This works less well for areas that attract lots of tourists and have few people living there, like parks. So you’d want some way of normalizing to the tourist population, not the locals
  3. Be sure to check that the park POIs you have in your data are actually being tracked for patterns and aren’t just present as POIs. If the visits for a particular park are, like, less than 3 every day, there’s a good chance it’s just not actually in the database
  4. In the end, SafeGraph data is designed first for relative counts (changes over time), not absolute ones, with absolute counts as a secondary bonus that can be iffy in some scenarios like this one (big outdoor areas, sometimes away from cell towers, with lots of tourists - ack!). If the official count differs, it’s probably more accurate than SG. You can, perhaps, just scale the SafeGraph average to the official average, and use SG to track changes over time in visitation.
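
To make point 4 concrete, here is a minimal sketch of the scaling idea, assuming you have SafeGraph and official counts for at least a few overlapping periods at one park. The function name, the dict layout, and the numbers are all made up for illustration; this is one simple way to do it (matching means), not a SafeGraph-recommended method.

```python
from statistics import mean


def scale_to_official(sg_counts, official_counts):
    """Rescale SafeGraph counts so their mean matches the official mean.

    Both arguments are dicts keyed by period (e.g. "2019-07").
    Only periods present in both are used to compute the factor.
    """
    shared = sorted(set(sg_counts) & set(official_counts))
    if not shared:
        raise ValueError("no overlapping periods to calibrate on")
    sg_mean = mean(sg_counts[p] for p in shared)
    if sg_mean == 0:
        raise ValueError("SafeGraph mean is zero; cannot scale")
    factor = mean(official_counts[p] for p in shared) / sg_mean
    # Apply the same factor to every SafeGraph period, including
    # periods that have no official count.
    scaled = {p: v * factor for p, v in sg_counts.items()}
    return factor, scaled


# Hypothetical numbers, chosen only to illustrate a 6x gap.
sg = {"2019-06": 1000, "2019-07": 1500, "2019-08": 500}
official = {"2019-06": 6000, "2019-07": 9000}
factor, scaled = scale_to_official(sg, official)
print(factor)             # 6.0
print(scaled["2019-08"])  # 3000.0
```

The factor only fixes the level. The month-to-month shape still comes entirely from SafeGraph, which is the part it is better at.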

I agree with Nick. I’ve looked into a few national parks (e.g., Rocky Mountain National Park), and the POI geometry for the park is a small building in Estes Park (a nearby town). As a first pass, you could locate all POIs inside the park boundaries and aggregate their visits. In my experience, the trends over time make sense even if the absolute numbers do not.
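
A rough sketch of that first pass, assuming each POI has a centroid (longitude, latitude) and a visit count, and that the park boundary is available as a simple polygon of (lon, lat) vertices. The field names and coordinates are invented for illustration. In practice a spatial join in a GIS library would probably be the better tool; plain ray casting is used here only to keep the example dependency-free.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def visits_inside_park(pois, park_polygon):
    """Sum visits over POIs whose centroid falls inside the park polygon."""
    return sum(
        p["visits"]
        for p in pois
        if point_in_polygon(p["lon"], p["lat"], park_polygon)
    )


# Toy boundary and POIs (made-up coordinates, not a real park).
park = [(0, 0), (4, 0), (4, 3), (0, 3)]
pois = [
    {"lon": 1, "lat": 1, "visits": 120},  # inside
    {"lon": 3, "lat": 2, "visits": 80},   # inside
    {"lon": 5, "lat": 1, "visits": 400},  # outside
]
print(visits_inside_park(pois, park))  # 200
```

Note that this treats longitude and latitude as flat coordinates and ignores holes in the boundary, which is fine for a quick check but not for careful work.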

Thanks for the quick reply. I agree that it is a tall order, and I expected some discrepancies. I removed all the sites whose polygons only surround visitor centers by looking at the geometry data. The ones I’ve included have pretty elaborate boundaries around the entire park. It may be a cell phone coverage issue, but I’m not sure how to account for that. Most parks don’t keep good visitation stats, which is why we are doing this research. I would guess that the larger state parks outside of town would undercount, based on lack of coverage. Urban parks are probably more reliable. Without having a manual count for comparison at each site, I wouldn’t know how to extrapolate any consistent calculation. Of course, if I had manual counts for every site, I wouldn’t need this data…

Yeah, that’s a rough one. Your best bet at that point would be trying to get an absolute count from SafeGraph itself by estimating the size of the tourist population so you can scale to it.

I haven’t tried this approach yet, but one I’ve had in mind would be to:

  1. For POIs in the surrounding area (ideally wider than the park itself, maybe the county if that isn’t too far away), use the visitor_home_cbgs variable. Figure out the best way to deal with the many, many censored values (4 or below).
  2. For the relevant “lives by the park” region, calculate the proportion of visitors to those POIs who are nearby vs. those who are from outside
  3. This will give you an estimate of the ratio of tourists to locals in the area
  4. Multiply that ratio by the Census population in the area to get an estimate of the tourist population
  5. Add the result to the Census population to get an estimate of the total population
  6. Use THAT as your population for macro normalization
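
Here is a minimal, untested-on-real-data sketch of steps 1 through 5, assuming the home-CBG column has already been parsed into one dict per POI (home CBG mapped to visitor count). The function name, the `censored_as` parameter, and all the numbers are hypothetical. Treating every count of 4 or below as a fixed value is just one possible way to handle the censoring, not a recommendation.

```python
def tourist_adjusted_population(home_counts_per_poi, local_cbgs,
                                census_population, censored_as=4):
    """Estimate total (local + tourist) population for macro normalization.

    home_counts_per_poi: list of dicts, one per POI, mapping a visitor
        home CBG to a visitor count.
    local_cbgs: set of CBGs treated as the "lives by the park" region.
    census_population: Census population of that local region.
    censored_as: value substituted for counts of 4 or below (assumption).
    """
    local = 0
    outside = 0
    for home_counts in home_counts_per_poi:
        for cbg, count in home_counts.items():
            if count <= 4:
                count = censored_as
            if cbg in local_cbgs:
                local += count
            else:
                outside += count
    if local == 0:
        raise ValueError("no local visitors observed; ratio is undefined")
    ratio = outside / local                  # tourists per local (step 3)
    tourists = census_population * outside / local   # step 4
    return {
        "ratio": ratio,
        "tourists": tourists,
        "total": census_population + tourists,       # step 5
    }


# Made-up CBG codes and counts for illustration only.
rows = [
    {"LOCAL_A": 40, "FAR_X": 20},
    {"LOCAL_B": 56, "FAR_Y": 3},   # 3 is treated as censored -> 4
]
result = tourist_adjusted_population(rows, {"LOCAL_A", "LOCAL_B"}, 2000)
print(result["ratio"])  # 0.25
print(result["total"])  # 2500.0
```

The `total` value is what would go into macro normalization in step 6. The result is quite sensitive to how the censored values are filled in, so it is probably worth trying a couple of values for `censored_as` and seeing how much the estimate moves.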