Hello, I am studying SafeGraph Weekly Patterns visit data for parks in Seattle and am seeing some anomalies. In general, I am seeing low, flat visitation for most of 2018 and then some extreme spikes in 2019. For example, this small park (Dahl Playfield) in a residential neighborhood is showing a few hundred visits per week in 2019, then >80k visits (and >50k unique visitors) the week of 2019-09-02, and a few thousand visits per week after that. If I extrapolate those 80k visits using SafeGraph’s sampling rate for King County that week, I get over 1.2 million visits, nearly double the population of Seattle, which seems impossible. Are there any known data issues that could be causing these anomalies? I’d also appreciate advice on how to handle them.
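For anyone following along, the extrapolation above can be sketched roughly as below. This is only an illustration: the function name is hypothetical, and both the county population and the panel device count are placeholder values chosen to reproduce the roughly 15x scale-up described, not figures taken from SafeGraph's files.

```python
def extrapolate_visits(raw_visits: float, devices_in_panel: float, population: float) -> float:
    """Scale panel visits up to a population-level estimate.

    The sampling rate is devices_in_panel / population, so dividing the raw
    count by that rate is the same as multiplying by population / devices.
    """
    if devices_in_panel <= 0:
        raise ValueError("devices_in_panel must be positive")
    return raw_visits * population / devices_in_panel


# ASSUMPTION: placeholder inputs, not real SafeGraph or census values.
county_population = 2_250_000   # illustrative King County population
panel_devices = 150_000         # illustrative count of panel devices residing in the county

estimate = extrapolate_visits(80_000, panel_devices, county_population)
print(f"Extrapolated visits: {estimate:,.0f}")
```

With these placeholder inputs the estimate comes out above any plausible weekly visitor count for a neighborhood park, which is the sanity check described in the post.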
This topic was automatically generated from Slack. You can find the original thread here.
Hi Alex! The polygon in the latest release looks pretty good to me, and if you’re using data from the July 2021 backfill then the same polygon was used throughout history for the visits (so there’s no chance of the polygon being odd just for those few weeks).
I can confirm that 2019-09 is indeed peculiar here, but still trying to figure out why. All of the other columns look fairly normal, so it might be some odd sink of devices that converged at that park. Will report back!
Here are a couple more anomalies I found, that I can’t explain with local events or holidays:
• Virgil Flaim Park placekey zzz-222@undefined x4-4ny-8n5
• Freeway Park placekey zzz-222@undefined x4-4b5-ffz
The fact that a few of these spiked in September 2019 suggests to me that we had some unaccounted-for spike in devices or something at that time in this area. But I did look at our Supplementary Files and don’t see some 4x jump in number_devices_residing that would explain this…
Yeah, I was wondering if Freeway Park’s numbers could be anomalous due to motorists driving under it, but that doesn’t quite explain the spike in September 2019. I doubt the traffic on I-5 suddenly got several times worse for just that month.
As for Maclean Park, I’ve been Googling for events there in March 2021 and nothing’s turned up. It’s not the kind of big park where you’d expect a concert or a fair, and especially not in March. And if I extrapolate by SafeGraph’s sampling rate for King County, it looks like 800k people visited, more than the population of Seattle.
I plotted monthly visits for all POI nearby Dahl Playfield, and noticed that only the POI with a somewhat overlapping polygon (Wedgwood Swim Club) has that spike. All others don’t have anything close to that. This suggests to me that it’s not due to some anomalous bump in devices overall in that one area, as the “visits” aren’t spread out over multiple POI.
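A rough way to run the same comparison numerically, rather than by eye from a plot, is to compute each POI's peak-to-median ratio and see which ones stand out. This is a sketch only: the visit counts and the "Hypothetical Cafe" entry below are made up for illustration, and the threshold of 10 is an arbitrary assumption.

```python
from statistics import median


def spike_ratio(visits):
    """Ratio of the largest value to the series median (inf if the median is 0)."""
    base = median(visits)
    return max(visits) / base if base > 0 else float("inf")


# ASSUMPTION: illustrative monthly visit counts, not real SafeGraph data.
nearby = {
    "Dahl Playfield": [300, 280, 310, 80_000, 3_000],
    "Wedgwood Swim Club": [120, 130, 110, 20_000, 900],
    "Hypothetical Cafe": [500, 520, 480, 510, 495],
}

# Flag POI whose peak is more than 10x their typical level.
flagged = sorted(name for name, v in nearby.items() if spike_ratio(v) > 10)
print(flagged)
```

If only POI with overlapping polygons get flagged, that points to something localized rather than an area-wide jump in devices.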
I wonder if this is some sink in the anonymized lat/long pings in these locations, making it seem as if there were many more devices than normal just at these places.
Interesting. In the meantime, I think I’m going to remove these points from my analysis (imputing a rolling median or something) because it sounds like they’re probably bad data. I would like to know if you find a root cause, though.
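One possible shape for that rolling-median imputation, as a minimal sketch: the function name, window size, and threshold are all assumptions to tune, and the weekly counts are made up to resemble the pattern described in this thread.

```python
from statistics import median


def impute_spikes(visits, window=5, threshold=10.0):
    """Replace extreme spikes with the median of their neighbors.

    A point is flagged when it exceeds `threshold` times the median of the
    other points in a centered window of size `window`. Neighbors are taken
    from the original series, and the point itself is excluded so that a
    spike cannot inflate its own baseline.
    """
    half = window // 2
    cleaned = list(visits)
    flagged = []
    for i, value in enumerate(visits):
        lo, hi = max(0, i - half), min(len(visits), i + half + 1)
        neighbors = [visits[j] for j in range(lo, hi) if j != i]
        if not neighbors:
            continue
        base = median(neighbors)
        if base > 0 and value > threshold * base:
            cleaned[i] = base
            flagged.append(i)
    return cleaned, flagged


# ASSUMPTION: illustrative weekly visit counts, not real SafeGraph data.
weekly = [310, 295, 330, 80_000, 3_100, 2_900, 3_050]
cleaned, flagged = impute_spikes(weekly)
print(flagged, cleaned)
```

Note that this only handles isolated spikes. A level shift like the jump from a few hundred to a few thousand visits per week would not be flagged, so it would need to be looked at separately.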
Thanks @Jeffrey_Kyllo ! Will certainly keep you updated if we get to the underlying cause. To prevent any further questions from being overlooked, I’ll go ahead and close this thread out. If you have any more questions or follow-ups, we’re always here to help! Just be sure to make a new post, as we aren’t monitoring old threads at this time. Thanks!