Hi, Where can I find the home locations backfill data that was generated by the hybrid algorithm from Jan 1 2020 - May 2020?
Hi @Jason_Kao_Columbia_University, by home locations data do you mean Core POI or the Patterns data that goes with them?
Hi @Jack_Lindsay_Kraken1, I’m specifically referring to the home panel summary files inside weekly patterns
Each of those “01”, “02”, “03”, etc. breaks down into 4-ish files for each week.
Ah, just saw it on the web! Apologies, I hadn’t seen the path in the S3 buckets via the CLI.
let me know if you need anything else!
Gotcha, thank you so much
@Jack_Lindsay_Kraken1 just to clarify: the website says the hybrid backfill was only generated for Jan 2020 - May 2020. But in the bucket, there seems to be a backfilled home panel summary from 2018 through Nov 2020. Was all of this data still generated by the hybrid algorithm?
@Jason_Kao_Columbia_University, this was part of the most recent release with December.
> The December 2020 “backfill” restates foot traffic activity from January 1st 2018 - present (Nov. 30th) for Weekly Patterns, Monthly Patterns, and Neighborhood Patterns.
You can find the Changelog here
And to confirm: for home locations, was this backfill generated entirely with the hybrid algorithm?
I am going to assume so. To my knowledge, it should be the same backfill as its predecessor. I will see if I can get confirmation on that
Great, thank you so much
Hi there, @Jack_Lindsay_Kraken1, thanks for helping us understand what’s in the SafeGraph catalog. I’m still a little confused about the difference between, say, “patterns” and “patterns_backfill” (in Weekly Places Patterns). My understanding is that “patterns_backfill” will contain data from the new backfill, but the current “patterns_backfill” has data only for the most recent week of December, so it doesn’t show me how the new backfill affected the data between May and November 2020. Unless it is the content of “patterns” that is being replaced by data from the new backfill, i.e., if I download the July data in “patterns” now, is it different from the “patterns” data I downloaded before the backfill? Can you help me understand which is which? Many thanks in advance, Jack!
Hi @Etienne_Lale_University_of_Quebec_at_Montreal, to answer this, I think it is important to understand what the backfill does. The backfill, in an oversimplified explanation, serves to correct the inevitable errors that arise with the release of each new day of data, to smooth out the data, and also to fill in historical data*. The patterns data is the “raw” data that you get each month, blemishes and all. As those blemishes accumulate, a backfill is created every 6-12 months to help alleviate some of them.
I do not fully understand what you mean by “the current “patterns_backfill” has data only for the most recent week of December” – can you elaborate here?
In sum, yes, the two are different, hopefully for good reason.
Here is some more information from the FAQ:
> Please note that the underlying Places data used to create Patterns changes over time due to the history of how we built and updated the product. Below is a chronological breakdown of the Places release used to backfill Patterns for a given time period:
> Historical Patterns activity from October 2016 through and including December 2016 was generated using the April 2019 release of Places. We no longer externally provide this data.
> Patterns provided/delivered between November 2019 and April 2020:
> - Activity from January 2017 through and including October 2019 was generated using the November 2019 release of Places.
> - Activity from November 2019 through and including April 2020 was based on the Places release of the same month as the activity (so December 2019 activity will use the December 2019 Places release).
> Patterns provided/delivered between May 2020 and November 2020:
> - Activity from January 2018 through and including May 2020 was generated using the May 2020 release of Places.
> - Activity from May 2020 through and including November 2020 is based on the Places release of the same month as the activity (so June 2020 activity will use the June 2020 Places release).
> Patterns provided/delivered December 2020 onward:
> - Activity from January 2018 through and including December 2020 was generated using the Dec 2020 release of Places. This is the first historical delivery that considers point-in-time POI openings/closures. For example, if a POI opened in January 2019, we will not attribute visits to the POI from January 2018 - December 2018 and will only attribute visits from January 2019 onward. On the other hand, if a POI closed in January 2019, we will only attribute visits from January 2018 - December 2018 and will not attribute visits from January 2019 - present.
> We are relying on the metadata provided by our tracking_opened_since columns to make these determinations. If we do not have open/close information for a POI, we will treat the POI as “open” for the duration of the backfill. See here for more about how we determine POI openings/closings.
> - Activity from January 2021 onwards will be based on the Places release of the same month as the activity (so January 2021 activity will use the January 2021 Places release).
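To make the December 2020 rules from the FAQ a bit more concrete, here’s a rough Python sketch. It only encodes what the quote above says: for deliveries from December 2020 onward, Jan 2018 - Dec 2020 activity uses the Dec 2020 Places release and later activity uses the same-month release; and visits are only attributed while a POI is open, with missing open/close info treated as “open”. The function names and the `opened_on`/`closed_on` parameters are hypothetical (not SafeGraph’s actual code or column names), and using exact dates rather than months is my own simplifying assumption.

```python
from datetime import date
from typing import Optional


def places_release_for(activity_month: date) -> date:
    """Hypothetical helper: Places release (as first-of-month) used for an
    activity month, for Patterns delivered December 2020 onward."""
    month = activity_month.replace(day=1)
    if date(2018, 1, 1) <= month <= date(2020, 12, 1):
        # Historical backfill: Jan 2018 - Dec 2020 all use the Dec 2020 release.
        return date(2020, 12, 1)
    if month >= date(2021, 1, 1):
        # Going forward: the release from the same month as the activity.
        return month
    raise ValueError("activity before January 2018 is not covered by this backfill")


def attributes_visits(day: date,
                      opened_on: Optional[date],
                      closed_on: Optional[date]) -> bool:
    """Hypothetical point-in-time rule: attribute visits only while the POI
    is open. None means no open/close info, so the POI is treated as open."""
    if opened_on is not None and day < opened_on:
        return False  # before the POI opened
    if closed_on is not None and day >= closed_on:
        return False  # on or after the POI closed
    return True
```

With this sketch, a POI that opened in January 2019 gets no visits attributed for 2018, and one that closed in January 2019 only gets visits attributed for 2018, matching the FAQ examples.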
Remember to check the changelog to see exactly which issues existed and were fixed by each backfill!
Theoretically, yes, it should look (slightly) different, hopefully in a good way.
Ok, thanks, and thanks for the quick reply! And do you know what the difference is between the contents of the two folders I highlighted in the screenshot?
I don’t understand the question, @Etienne_Lale_University_of_Quebec_at_Montreal. One is the regular raw patterns and the other is the backfill patterns we’re discussing. Is that what you mean?
Sorry if I’m having trouble understanding this. If the folder “patterns” now contains data generated using the Dec 2020 release of Places (this is my understanding of why data currently in that folder would look different from data I downloaded before the backfill), then what is in the folder “patterns_backfill”? Both folders contain data generated using the Dec 2020 release of Places, right?