Is this a data cleaning and aggregation issue, or do you expect that the raw data you collect from these sites might be inaccurate?

Hi! Brand new here. My non-profit organization's goal is to analyze how visits (total visitors, median visit time, etc.) to public parks in Wisconsin changed around the onset of the pandemic. We noticed that while some parks (the vast majority are sub_category "Nature Parks and Other Similar Institutions") have data that seems accurate to us, others are completely off – for example, a popular downtown Madison park shows zero visits in 10 of the 36 months from Jan. 2018 to Dec. 2020.

A few questions:

  1. Is this a data cleaning and aggregation issue, or do you expect that the raw data you collect from these sites might be inaccurate? If the issue is inaccuracy, do you know why that might be, and might it be a systematic issue that we could correct with a reasonable degree of confidence?
  2. Is it possible that some of these locations have accurate data and others do not? If so, could we pull data only from the accurate sites, or is that not possible to determine?
  3. Would there be another way to get at our main goal – figuring out what has been happening with visits to public parks since March 2020 vs. what was happening previously – using SafeGraph data?

Thanks!
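To make the symptom concrete, here is a minimal pandas sketch of how we flag parks with an implausible share of zero-visit months. The placekeys, column names, and counts below are made-up placeholders, not our real SafeGraph pull:

```python
import pandas as pd

# Hypothetical monthly visit counts per POI (placeholder data, not real SafeGraph output).
df = pd.DataFrame({
    "placekey": ["park_a"] * 6 + ["park_b"] * 6,
    "month": pd.date_range("2018-01-01", periods=6, freq="MS").tolist() * 2,
    "raw_visit_counts": [120, 130, 0, 0, 0, 0,      # suspicious: four zero months
                         80, 95, 110, 70, 60, 90],  # plausible
})

# Share of months with zero recorded visits, per POI.
zero_share = (
    df.assign(is_zero=df["raw_visit_counts"].eq(0))
      .groupby("placekey")["is_zero"]
      .mean()
)

# Flag POIs where more than a quarter of months report zero visits.
suspect = zero_share[zero_share > 0.25].index.tolist()
print(suspect)  # ['park_a']
```

The 0.25 threshold is arbitrary; the point is that the questionable parks stand out sharply from the plausible ones.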

This is a great use-case! Thanks so much for sharing.

Have you tried using the Geometry data to check:

  1. Is the polygon shared?
  2. Is the polygon accurate?

Separately, have you tried applying outlier-filtering methods?
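For the outlier-filtering suggestion, here is a minimal sketch using an interquartile-range (IQR) rule on one POI's monthly counts. The series values are invented for illustration, and this is just one of several reasonable filtering approaches:

```python
import pandas as pd

# Hypothetical monthly visit series for a single POI (placeholder values).
visits = pd.Series([300, 320, 0, 310, 290, 0, 305, 315, 295, 0, 330, 300])

# IQR rule: flag months that fall far outside the bulk of the distribution.
q1, q3 = visits.quantile(0.25), visits.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = visits[(visits < lower) | (visits > upper)]
cleaned = visits[(visits >= lower) & (visits <= upper)]
print(outliers.tolist())  # [0, 0, 0] — the zero-visit months
```

Dropping (or interpolating over) the flagged months before computing pre/post-March-2020 comparisons should keep a handful of bad months from dragging down a park's pre-pandemic baseline.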

Hope this helps!
