Hello all, I’m confused by how the most recent backfill is handling POI closures. The documentation indicates that this backfill (unlike prior backfills) neither wipes out visits of closed POIs nor fills in prior visits of newly opened POIs. However, I’m finding that the backfill directory never contains visit data for any POI with a populated “closed_on” field. (I’ve been focusing on restaurants, NAICS 722511, but I think the point is general.)
To be specific, here are four POIs for which we have visit data in the previous version of the data, but no visits at all in the most recent backfill. Each of these is listed in Core as having a “closed_on” date of “2020-07”. My understanding is that this wasn’t supposed to happen.
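One way to reproduce this check is a simple set comparison: take the POI IDs that had weekly rows before the backfill, subtract the IDs present in the backfill, and report each missing POI's closed_on value. This is a minimal sketch, assuming you have already loaded both ID lists plus a mapping from ID to closed_on (a "YYYY-MM" string or None) from Core; the function name and sample IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def missing_in_backfill(
    old_ids: Iterable[str],
    backfill_ids: Iterable[str],
    closed_on: Dict[str, Optional[str]],
) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical helper: list POIs that had visit rows in the old data
    but no rows at all in the backfill, paired with their Core closed_on."""
    present = set(backfill_ids)
    # Sort so the report is stable across runs.
    missing = sorted(set(old_ids) - present)
    # closed_on.get returns None for POIs with no closure date recorded.
    return [(pid, closed_on.get(pid)) for pid in missing]


# Hypothetical IDs, for illustration only.
old = ["sg:aaa", "sg:bbb", "sg:ccc"]
backfill = ["sg:aaa"]
core_closed_on = {"sg:bbb": "2020-07", "sg:ccc": None}
print(missing_in_backfill(old, backfill, core_closed_on))
# [('sg:bbb', '2020-07'), ('sg:ccc', None)]
```

If every missing POI comes back with a populated closed_on, that pattern is what makes the closure filter look like the cause.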
We shouldn’t be wiping out visits for places in the months when they’re not closed. Are you using weekly or monthly patterns? And I’m assuming you’re interested in raw_visit_counts, right?
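Going by the documentation and the reply above, the closed_on filter should only drop rows for weeks after a POI closed, so a POI that closed in 2020-07 ought to keep its earlier weekly rows. Below is a minimal sketch of that rule, assuming month-granularity "YYYY-MM" closed_on values and treating any week that starts on or after the first day of the closure month as closed. The function names are hypothetical, and SafeGraph's actual cutoff may differ.

```python
from datetime import date
from typing import List, Optional, Tuple


def keep_week(week_start: date, closed_on: Optional[str]) -> bool:
    """Hypothetical rule: keep a weekly row unless the week starts in or
    after the month the POI closed."""
    if not closed_on:  # No closure recorded: keep every week.
        return True
    year, month = (int(part) for part in closed_on.split("-"))
    return week_start < date(year, month, 1)


def apply_closed_on_filter(
    rows: List[Tuple[date, int]], closed_on: Optional[str]
) -> List[Tuple[date, int]]:
    """Filter (week_start, raw_visit_counts) rows for a single POI."""
    return [row for row in rows if keep_week(row[0], closed_on)]


# Hypothetical weekly rows for one POI that closed in 2020-07.
rows = [(date(2020, 6, 22), 140), (date(2020, 6, 29), 131), (date(2020, 7, 6), 12)]
# Keeps the two weeks starting in June and drops the week of 2020-07-06.
print(apply_closed_on_filter(rows, "2020-07"))
```

Under this rule, a POI with closed_on of 2020-07 should still show its 2019 and early-2020 weeks; if every week is missing, something other than the filter is responsible.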
Also, what months are you looking at? It’s possible that some of these locations were actually receiving very few or no visitors.
Hey Francisco. I’m using weekly visits from the December backfill and comparing them to what I had recorded for these SafeGraph POIs before the backfill. These four POIs had fairly consistent weekly raw visit counts in the old data, but I don’t see any visits for them at all in the 2019 or 2020 backfill data.
Hi Aaron. I looked into it yesterday, and it does look like an issue. I’m escalating it to engineering and will get back to you as soon as possible. Thanks again for flagging this.
Hey @Francisco_Utrera, any update on this issue? Thanks!
Engineering is aware of this and we’re digging deeper into it. I’ll give you an update late next week, since this one isn’t easy to triage.
I just realized that I can give you more context, Aaron.
First, we don’t see a bug in the way the closed_on filter was implemented. I checked the specific sg_pids, and they’re entirely missing from the backfill. Since the filter should only remove visits after closure, POIs dropped by closed_on would still have their pre-closure rows; these have none, so the problem isn’t related to closed_on. However, the real question remains: why are these visits missing? There could be many reasons, and working through them takes time.
Thanks again for letting us know about this, and I’ll get back to you late next week.
Hi Aaron. We won’t be able to dig deeper into this until the next backfill (~Q2). We’re deprioritizing it until then for two reasons: the cost of running a backfill, and the fact that we don’t see this effect in other NAICS codes.
Thanks again for letting us know about this and have a great weekend!
Hmm, OK. I have a temporary patch in place, so I guess I’ll just keep using it. So you can confirm that there are no “lost” POI visits in other NAICS categories? I need to know whether I should develop similar patches when looking at other industries.
Yes, there are few “lost” POI visits in other NAICS categories. After manually applying the filter myself to the pre-backfill data, I see an increase in visits and visitors relative to the backfill.