The backfill is here! Here’s your guide on how to use the July 2021 Backfill Data and what to expect as you dive in!
As a reminder, the backfill is when we take our most recent version of Places (i.e., Core + Geometry) and run our visit attribution algorithm backward in time to generate a new history of “backfilled” Patterns. It happens no more than twice a year, most commonly in July and in December as updates to Places warrant.
This backfill covers Jan 2019 to present. A second delivery of Jan 2018 to Dec 2018 US Backfilled Patterns will be completed by the end of July or early August.
Table of Contents:
- Specific improvements to Places you will see in the July 2021 Backfill
- What to expect in the US Backfill
- What to expect in the Canada Backfill (Canada normalization example)
- Other changes you may notice
- Issues and Artifacts
Specific improvements to Places you will see in the July 2021 Backfill
- New POIs added. We added new industrial POIs in 2021. Previously, visits to these POIs appeared in Patterns data only from the release in which the POIs were added. Of course, visits to these POIs occurred historically as well, so we “backfill” visits to them prior to 2021 to provide proper full histories.
- Improved Geometry info. As of June 2021, we fixed a number of geometry issues, both for ultra-prominent POIs like Disney World and more generally across SafeGraph Places. These fixes will be reflected in more accurate foot traffic data in Patterns.
What to expect in the US Backfill
In general, you should expect to see the same trends as in the previous backfill, but with slightly increased visit numbers overall based on the types of changes above. See below, for example, a time series comparison of the old (Dec 2020) and the new (July 2021) backfill, showing total visits to all POIs:
Daily total POI visits from the July 2021 and Dec 2020 backfills show similar trends overall.
Visits to the vast majority of POIs have remained stable. However, there may be more subtle changes (increasing or decreasing visits) when aggregating visits to POIs of specific brands or specific NAICS codes. For instance, we observed a large increase in visits to Nature Parks and Other Similar Institutions (naics_code = 712190) in this backfill due to improved Core and Geometry information.
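One way to look for such category-level shifts is to total visits per NAICS code in each backfill and compare. The following is a minimal sketch, not SafeGraph's own QA method: it assumes each backfill has been loaded into a pandas DataFrame with naics_code and raw_visit_counts columns, and the two small frames shown are made-up illustration data.

```python
import pandas as pd

def visits_change_by_naics(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Total visits per NAICS code in two backfills, with percent change."""
    old_totals = old.groupby("naics_code")["raw_visit_counts"].sum().rename("old_visits")
    new_totals = new.groupby("naics_code")["raw_visit_counts"].sum().rename("new_visits")
    # Outer-align on NAICS code; a code missing from one backfill counts as 0 visits.
    out = pd.concat([old_totals, new_totals], axis=1).fillna(0)
    # Leave pct_change as NaN where there is no old baseline to compare against.
    baseline = out["old_visits"].where(out["old_visits"] > 0)
    out["pct_change"] = (out["new_visits"] - out["old_visits"]) / baseline * 100
    return out.sort_values("pct_change", ascending=False)

# Made-up rows standing in for the Dec 2020 and July 2021 backfills.
old_backfill = pd.DataFrame(
    {"naics_code": [712190, 712190, 722515], "raw_visit_counts": [100, 50, 200]}
)
new_backfill = pd.DataFrame(
    {"naics_code": [712190, 712190, 722515], "raw_visit_counts": [150, 90, 202]}
)

changes = visits_change_by_naics(old_backfill, new_backfill)
print(changes)
```

Sorting by percent change puts the categories most affected by the Places updates at the top, which is a quick way to decide where a closer look is warranted.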
What to expect in the Canada Backfill
This is the first year we’ve added Canada to the backfill! We added Canada Weekly Patterns data starting in the May 2021 release, and have now backfilled data for Canadian POIs back to January 2019. If you are interested in Canada Weekly Patterns data, reach out to your customer success manager for a sample.
It is important to normalize historical Canada Patterns data when using 2020 data. This is because, during 2020, the size of our historical Canadian panel changed in important ways compared to our historical US panel.
Relative to Jan 2019, our panel underwent large increases around Jan 2020 and for most of 2020, before returning to a more consistent level in 2021:
Daily Total Devices Seen for Canada, indexed on Jan 2019, showing variation in panel size historically.
Per year on average, we saw the following number of devices daily:
| Year | Average Daily Devices Seen |
| --- | --- |
| 2019 | 328k |
| 2020 | 542k |
| 2021 | 362k |
Because of these changes to the Canadian panel, we encourage users to experiment with various normalization techniques, as we do for the US. For inspiration, see our Data Science Resources, particularly our most recent Google Colab notebook on normalization.
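To put the yearly averages above in perspective, the short calculation below expresses each year's average panel size relative to 2019. It uses only the rounded figures quoted above, so the ratios are approximate.

```python
# Average daily devices seen in Canada, from the yearly figures above (rounded).
avg_daily_devices = {2019: 328_000, 2020: 542_000, 2021: 362_000}

baseline = avg_daily_devices[2019]
relative = {year: count / baseline for year, count in avg_daily_devices.items()}

for year, ratio in relative.items():
    print(f"{year}: {ratio:.2f}x the 2019 panel")
# 2019: 1.00x the 2019 panel
# 2020: 1.65x the 2019 panel
# 2021: 1.10x the 2019 panel
```

If raw visit counts scaled roughly in proportion to panel size (a simplifying assumption), a POI with unchanged real-world traffic could show about 65% more raw visits in 2020 than in 2019, which is why unnormalized year-over-year comparisons can mislead.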
Canada normalization example
To drive this point home: summing all visits to Starbucks POIs in Canada produces very different time series depending on whether you divide by total devices seen (from the normalization_stats.csv Supplemental Files) or by total POI visits (computed as total visits minus home visits, also from normalization_stats.csv):
Starbucks visits normalized by total devices seen from normalization_stats.csv. Values show relative change from Jan 2019.
Starbucks visits normalized by total POI visits computed from normalization_stats.csv. Values show relative change from Jan 2019.
Overall, this means that you should incorporate the domain knowledge of your own specific application when using Canada Patterns data to compare 2020 data to 2019 or 2021, particularly when using raw_visit_counts.
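The two normalizations compared above can be sketched in a few lines of pandas. This is an illustration under stated assumptions rather than the code behind the charts: the column names total_devices_seen, total_visits, and total_home_visits are assumed to match the normalization_stats.csv header (check your copy of the file), the function normalize_and_index is a hypothetical helper, and the three rows of numbers are made up.

```python
import pandas as pd

def normalize_and_index(visits: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide visits by a denominator, then return relative change from the Jan 2019 mean."""
    ratio = visits / denominator
    baseline = ratio.loc["2019-01"].mean()  # partial-string selection on a DatetimeIndex
    return ratio / baseline - 1.0

dates = pd.to_datetime(["2019-01-15", "2020-01-15", "2021-01-15"])

# Made-up stand-in for the Canada rows of normalization_stats.csv (assumed column names).
stats = pd.DataFrame(
    {
        "total_devices_seen": [328_000, 542_000, 362_000],
        "total_visits": [1_000_000, 1_500_000, 1_050_000],
        "total_home_visits": [400_000, 700_000, 450_000],
    },
    index=dates,
)

# Made-up daily sum of raw_visit_counts over all Starbucks POIs in Canada.
starbucks_visits = pd.Series([3_000, 4_000, 3_000], index=dates)

# Option 1: normalize by panel size.
by_devices = normalize_and_index(starbucks_visits, stats["total_devices_seen"])

# Option 2: normalize by total POI visits (total visits minus home visits).
total_poi_visits = stats["total_visits"] - stats["total_home_visits"]
by_poi_visits = normalize_and_index(starbucks_visits, total_poi_visits)

print(pd.DataFrame({"by_devices": by_devices, "by_poi_visits": by_poi_visits}))
```

With these toy numbers, the device-normalized series falls noticeably in 2020 while the POI-visit-normalized series stays flat, mirroring the kind of divergence described above. Which denominator is appropriate depends on the question being asked.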
Other changes you may notice
- This backfill incorporates all of the schema changes from the July 2021 release, meaning safegraph_place_id and parent_safegraph_place_id are retired.
- It also means that changes in the way columns are computed, such as to related_same_day_brand, are also reflected.
- We have also released a backfill of Neighborhood Patterns, primarily fixing a bug with certain columns such as weekday_device_home_areas and weekend_device_home_areas which caused those columns to be lower than expected.
- See the July 2021 Release Notes for all Patterns schema changes.
Issues and Artifacts
If you notice any issues with backfilled data, please reach out! While we have done our best to QA the data thoroughly and squash as many bugs as possible, inconsistencies can always creep in. See also Known Issues and Artifacts on our Docs site.