Hello, I am studying SafeGraph weekly patterns visit data for parks in Seattle, and am seeing some anomalies. In general, I am seeing low, flat visitation for most of 2018 and then some extreme spikes in 2019. For example, this small park (Dahl Playfield) in a residential neighborhood shows a few hundred visits per week in 2019, then >80k visits (and >50k unique visitors) the week of 2019-09-02, and a few thousand visits per week after that. If I extrapolate those 80k visits using SafeGraph’s sampling rate for King County that week, I get over 1.2 million visits, nearly double the population of Seattle, so that seems impossible. I’m wondering if there are any known data issues that could be causing these anomalies, and I’m also seeking advice on how to handle them.
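
For anyone who wants to reproduce that extrapolation, here is a minimal sketch of the arithmetic. It assumes the sampling rate is computed as devices in the panel divided by the census population of the county. The function name and the device and population figures are illustrative placeholders chosen to match the roughly 6.7% rate implied by the numbers above (80k / 1.2M); they are not published SafeGraph values.

```python
def extrapolate_visits(raw_visits, devices_in_panel, census_population):
    """Scale raw panel visits up to an estimated population-level count.

    Assumes sampling_rate = devices_in_panel / census_population.
    """
    sampling_rate = devices_in_panel / census_population
    return raw_visits / sampling_rate


# Placeholder inputs (assumptions, not real panel counts): they give a
# sampling rate of about 6.7%, which is what 80k -> 1.2M implies.
estimate = extrapolate_visits(
    raw_visits=80_000,
    devices_in_panel=150_000,
    census_population=2_250_000,
)
print(round(estimate))
```

A quick sanity check is then to compare the estimate against the population of the area the POI could plausibly draw from; an estimate above that population is a strong hint that the raw count is an artifact.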


This topic was automatically generated from Slack. You can find the original thread here.

Hey @Jeffrey_Kyllo - thanks for flagging! Could be an issue with the Geometries for this POI. Looping some other team members into this. Will update later when I get more information.

Thanks, I will try plotting the polygon for this park on a map and see if it looks correct.

Hi Alex! The polygon in the latest release looks pretty good to me, and if you’re using data from the July 2021 backfill then the same polygon was used throughout history for the visits (so there’s no chance of the polygon being odd just for those few weeks).

I can confirm that 2019-09 is indeed peculiar here, but still trying to figure out why. All of the other columns look fairly normal, so it might be some odd sink of devices that converged at that park. Will report back!

Thanks Jeff. I think I have a few more examples of park POIs in Seattle with anomalies like this. Would you like me to send a list?

Yes please, if you can!

Here are a couple more anomalies I found, that I can’t explain with local events or holidays:
• Virgil Flaim Park placekey zzz-222@undefined x4-4ny-8n5
• Freeway Park placekey zzz-222@undefined x4-4b5-ffz

This one seems suspicious too, 30k visits in one week in March 2021 to this little park: Maclean Park, placekey zzw-222@undefined x4-49x-pd9

Hi @Jeffrey_Kyllo Jumping in here as the geometry PM with a few thoughts.

Virgil Flaim Park: Nothing jumps out - still investigating

Freeway Park: It looks like it actually straddles a highway overpass, so when there is traffic (and cars stop on the overpass), then these are unfortunately counted as visits.

Maclean: Is it possible there was a concert or an event held at the park if it only spiked for a single week?

The fact that a few of these spiked in September 2019 suggests to me that we had some unaccounted-for spike in devices or something at that time in this area. But I did look at our Supplementary Files and don’t see some 4x jump in number_devices_residing that would explain this…

Yeah, I was wondering if Freeway Park’s numbers could be anomalous due to motorists driving under it, but that doesn’t quite explain the spike in September 2019. I doubt the traffic on I-5 suddenly got several times worse for just that month.

As for Maclean Park, I’ve been Googling for events there in March 2021 and nothing’s turned up. It’s not the kind of big park where you’d expect a concert or a fair, and especially not in March. And if I extrapolate by SafeGraph’s sampling rate for King County, it looks like 800k people visited, more than the population of Seattle.

So it seems highly likely to me that the Maclean Park spike is bad data somehow.

I plotted monthly visits for all POIs near Dahl Playfield, and noticed that only the POI with a somewhat overlapping polygon (Wedgwood Swim Club) has that spike. None of the others have anything close to it. This suggests to me that it’s not due to some anomalous bump in devices overall in that one area, as the “visits” aren’t spread out over multiple POIs.
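
A rough sketch of that neighbor comparison, in case it helps others run the same check: for each nearby POI, compare the largest value in its series to the median, and list the POIs where that ratio is large. The visit numbers, the third POI name, and the `min_ratio` cutoff below are all made up for illustration; they are not the real values for these places.

```python
from statistics import median


def spike_ratio(series):
    """Ratio of the largest value in a series to the series median."""
    base = median(series)
    return max(series) / base if base > 0 else float("inf")


def pois_with_spike(visits_by_poi, min_ratio=10.0):
    """Names of POIs whose peak is at least min_ratio times their median."""
    return sorted(
        name
        for name, series in visits_by_poi.items()
        if spike_ratio(series) >= min_ratio
    )


# Invented monthly visit counts, for illustration only.
nearby = {
    "Dahl Playfield": [900, 1100, 1000, 85_000, 1200],
    "Wedgwood Swim Club": [400, 420, 380, 30_000, 410],
    "Hypothetical Cafe": [600, 640, 610, 650, 620],
}
flagged = pois_with_spike(nearby)
print(flagged)
```

If only POIs with overlapping polygons are flagged, that points toward something local to those polygons rather than an area-wide change in the device panel.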

I wonder if this is some sink in the anonymized lat/long pings in these locations, making it seem as if there were many more devices than normal just at these places. :thinking_face:

Interesting. In the meantime I think I’m going to remove these points from my analysis (imputing a rolling median or something), because it sounds like they’re probably bad data. I would like to know if you find a root cause, though.
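
One way the rolling-median idea could look, as a minimal sketch rather than a recommended method: flag any week whose value is far above the median of its neighbors within a centered window, and replace it with that median. The function name, the `window` and `threshold` defaults, and the sample series are assumptions for illustration and would need tuning on real data.

```python
from statistics import median


def impute_spikes(visits, window=5, threshold=5.0):
    """Replace extreme spikes with the median of the surrounding values.

    A value is treated as a spike if it exceeds threshold times the
    median of its neighbors in a centered window (excluding itself).
    """
    cleaned = list(visits)
    half = window // 2
    for i, value in enumerate(visits):
        lo, hi = max(0, i - half), min(len(visits), i + half + 1)
        neighbors = [v for j, v in enumerate(visits[lo:hi], start=lo) if j != i]
        if not neighbors:
            continue
        local_median = median(neighbors)
        if local_median > 0 and value > threshold * local_median:
            cleaned[i] = local_median
    return cleaned


# Invented weekly counts with one extreme week, for illustration only.
weekly = [310, 290, 305, 80_000, 320, 300, 295]
cleaned = impute_spikes(weekly)
print(cleaned)
```

Neighbors are always read from the original series, so one replaced value does not influence how the next week is judged. A sustained change in level after the spike would not be flagged by this approach and would need separate handling.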

I think that’s prudent Alex. We can update here if we find the underlying cause!

Thanks @Jeffrey_Kyllo! Will certainly keep you updated if we get to the underlying cause. To prevent any further questions from being overlooked, I’ll go ahead and close this thread out. If you have any more questions or follow-ups, we’re always here to help! Just be sure to make a new post, as we aren’t monitoring old threads at this time. Thanks!

For now, I’ve noted this discussion in our Known Issues or Data Artifacts page.