Hi everyone, we noticed that many unique placekeys within a sector we’re studying are available in Core Places but missing from Monthly Patterns (about 20K fewer). We’re using the Jan. 2021 release of Core and the most recent backfilled Monthly Patterns. Is this dropoff from Places to Patterns due to the inherent difficulty of attributing visitors/visits to POIs compared to identifying the locations of POIs?
Hi @Alex_Zentefis_Yale_University, there are a couple reasons this may happen.
- There are simply no visits attributed to those POIs during the month(s) you are looking at. This is fairly unlikely to be the cause for all 20,000, but it could be contributing.
- Some of the Placekeys from the most recent backfilled Monthly Patterns might have been updated by the release of Core Places you are using (see this thread from yesterday). This could be a bigger contributor. You should be able to test this theory fairly easily by merging Core Places with Patterns on the safegraph_place_id column instead of the placekey column. With Placekey being relatively new (less than a year old), bugs are being worked out and it’s being improved regularly. As time goes on, fewer Placekeys will need to be updated.
Could you test (2) to see if that’s the issue?
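The test suggested above can be sketched as follows. This is a minimal illustration, not SafeGraph's own tooling: the toy rows and the helper name `match_rate` are made up, and only the column names placekey and safegraph_place_id come from the thread. If the two keys give different match counts, updated Placekeys are a likely contributor.

```python
def match_rate(core_rows, patterns_rows, key):
    """Return (matched, total): unique Core values of `key` that also appear in Patterns."""
    core_keys = {row[key] for row in core_rows if row.get(key)}
    patterns_keys = {row[key] for row in patterns_rows if row.get(key)}
    return len(core_keys & patterns_keys), len(core_keys)


# Toy rows standing in for Core Places and Monthly Patterns (made-up values).
core = [
    {"placekey": "pk-1", "safegraph_place_id": "sg:1"},
    {"placekey": "pk-2-new", "safegraph_place_id": "sg:2"},  # Placekey updated in Core
    {"placekey": "pk-3", "safegraph_place_id": "sg:3"},      # no Patterns row at all
]
patterns = [
    {"placekey": "pk-1", "safegraph_place_id": "sg:1"},
    {"placekey": "pk-2-old", "safegraph_place_id": "sg:2"},  # still carries the old Placekey
]

for key in ("placekey", "safegraph_place_id"):
    matched, total = match_rate(core, patterns, key)
    print(f"{key}: {matched} of {total} Core POIs found in Patterns")
```

With real files, the same comparison could be run after loading each table's key columns into lists of dicts (for example with `csv.DictReader`).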
Thank you @Ryan_Kruse_MN_State! That thread you referenced was very enlightening. After reading your post, we tested (2) above, but found the exact same issue, sadly. Of the 71,468 unique placekey values in our Core Places sample, only 51,369 were found in the backfilled Patterns. Likewise, of the 71,468 unique safegraph_place_id values in our Core Places sample, only 51,369 were found in the backfilled Patterns. In the Jan 2021 Core Release, the placekey-safegraph_place_id relation is one-to-one.
Any other thoughts?
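For readers who want to confirm the one-to-one relation on their own extract, a small check along these lines may help. It is a sketch under assumptions: `is_one_to_one` is a hypothetical helper and the rows are toy data, with only the two column names taken from the thread.

```python
from collections import defaultdict


def is_one_to_one(rows, left, right):
    """True if every `left` value pairs with exactly one `right` value and vice versa."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for row in rows:
        left_to_right[row[left]].add(row[right])
        right_to_left[row[right]].add(row[left])
    return (all(len(v) == 1 for v in left_to_right.values())
            and all(len(v) == 1 for v in right_to_left.values()))


# Toy Core Places rows (made-up values).
clean = [
    {"placekey": "pk-1", "safegraph_place_id": "sg:1"},
    {"placekey": "pk-2", "safegraph_place_id": "sg:2"},
]
# Same rows plus one extra: "sg:2" now maps to two different Placekeys.
broken = clean + [{"placekey": "pk-9", "safegraph_place_id": "sg:2"}]

print(is_one_to_one(clean, "placekey", "safegraph_place_id"))
print(is_one_to_one(broken, "placekey", "safegraph_place_id"))
```

When the relation is one-to-one, merging on either key should give the same match count, which is consistent with the identical 51,369 figures reported above.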
@Alex_Zentefis_Yale_University Thank you for checking this. I will take a look at this data tomorrow morning. What category/categories of POIs are you looking at? And which months? And are you using the entire US?
In the meantime, do you notice any geographic patterns in the missing POIs?
Okay thank you! We’re looking at banks, 2018-2020, entire US. We’ll look into the geography of the 20K missing ones to see if we notice any patterns in geography.
@Alex_Zentefis_Yale_University I’ve had a chance to quickly look at the data for Feb 2020. I’m seeing the same thing you are seeing. I’m checking to see if we have the geometries for these locations, then I’ll get back to you again.
Hi @Alex_Zentefis_Yale_University, I checked the geometries for these POIs. Generally geometries come in two classes: SHARED_POLYGON and OWNED_POLYGON. POIs in the SHARED_POLYGON class are missing from Patterns much more often than those in the OWNED_POLYGON class.
In Feb 2020:
• About 73% (46,705 of 63,753) with OWNED_POLYGON class have Patterns.
• Just 40% (10,602 of 26,415) with SHARED_POLYGON class have Patterns.
I limited this to just Feb 2020 so I could more quickly test what’s going on. When you take more months into account, the coverage rates will increase. It seems the SHARED_POLYGON class is driving most of the missing rows of Patterns. If you want, you could verify this by downloading the Places, Patterns, and Geometries for all the months you are interested in, then checking SHARED vs OWNED like I did.
Here’s a link to the documentation on the polygon_class column in case you need to make any assumptions about the POIs that are missing Patterns data.
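The SHARED vs OWNED comparison described above could be reproduced roughly as follows. This is a sketch, not the exact procedure used in the thread: `coverage_by_class` is a hypothetical helper, the geometry rows are toy data, and it assumes each Geometry row carries a placekey and a polygon_class value. The final loop simply recomputes the Feb 2020 rates from the counts quoted in the post.

```python
from collections import Counter


def coverage_by_class(geometry_rows, patterns_keys):
    """Map polygon_class -> (POIs that have Patterns, total POIs in that class)."""
    total, covered = Counter(), Counter()
    for row in geometry_rows:
        cls = row["polygon_class"]
        total[cls] += 1
        covered[cls] += row["placekey"] in patterns_keys  # bool adds 0 or 1
    return {cls: (covered[cls], total[cls]) for cls in total}


# Toy Geometry rows and the set of Placekeys present in Patterns (made-up values).
geometry = [
    {"placekey": "pk-1", "polygon_class": "OWNED_POLYGON"},
    {"placekey": "pk-2", "polygon_class": "OWNED_POLYGON"},
    {"placekey": "pk-3", "polygon_class": "SHARED_POLYGON"},
    {"placekey": "pk-4", "polygon_class": "SHARED_POLYGON"},
    {"placekey": "pk-5", "polygon_class": "SHARED_POLYGON"},
]
patterns_keys = {"pk-1", "pk-2", "pk-3"}

result = coverage_by_class(geometry, patterns_keys)
for cls, (hit, n) in result.items():
    print(f"{cls}: {hit} of {n} ({hit / n:.0%})")

# Counts quoted in the thread for Feb 2020, recomputed as rates.
for cls, hit, n in [("OWNED_POLYGON", 46705, 63753), ("SHARED_POLYGON", 10602, 26415)]:
    print(f"{cls}: {hit / n:.1%}")
```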
Thanks, @Ryan_Kruse_MN_State! This is helpful. I don’t see that we have access to the Geometry data, though.
If the SHARED_POLYGON class is the root of the problem, is there any way to boost the match rate in Patterns? (That is, can we do anything about it?) Or is this reduced match rate because of SHARED_POLYGON just a feature of Patterns?
You’re welcome, @Alex_Zentefis_Yale_University. It looks like the Geometry product is only available by purchase, sorry for the confusion.
I searched through some of the previous Slack posts using the term “shared_polygon” and unfortunately I didn’t see any great workarounds. It seems that those visits often get attributed to the “parent” POI, with no great way of determining which visits were to the specific “child” POI.
Am I right that the visitors would be attributed to the parent or the child, rather than not being attributed to either? Might it then be wise to try merging by parent OR child placekey/safegraph_id? In the case of banks, my prior is that the parent/child distinction doesn’t apply often, unless the bank is in a building with other POIs (like a multi-office building). In this latter case, would the idea of merging by parent/child be prone to error?
Hi @Alex_Zentefis_Yale_University, it depends. You can see some details on it in this section of the Places Manual, and the subsequent couple of sections.
When enclosed = True, visits are only attributed to the parent. Otherwise, visits are attributed to both the child and parent POIs. I think merging by parent/child could make sense when enclosed = True and you’d expect the overall foot traffic to the parent to be reflective of the foot traffic to the child bank… However, that may be messy. I wish I could give you a cleaner answer here.
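One way to express the parent fallback discussed here is sketched below. Everything in it is illustrative: `resolve_patterns_id` is a hypothetical helper, the rows are toy data, and it assumes each POI record exposes an enclosed flag and a parent_safegraph_place_id value. A POI resolved through its parent is tagged so that it can be treated separately, since the parent's foot traffic covers more than the child.

```python
def resolve_patterns_id(poi, patterns_ids):
    """Pick the ID to look up in Patterns and say where it came from.

    Returns (id, source) with source in {"own", "parent", "missing"}.
    """
    own = poi["safegraph_place_id"]
    if own in patterns_ids:
        return own, "own"
    parent = poi.get("parent_safegraph_place_id")
    # Fall back to the parent only when the child is enclosed, because that is
    # the case in which visits are attributed to the parent alone.
    if poi.get("enclosed") and parent in patterns_ids:
        return parent, "parent"
    return None, "missing"


# Toy data (made-up values): a mall with Patterns rows and three banks.
patterns_ids = {"sg:mall", "sg:bank-a"}
banks = [
    {"safegraph_place_id": "sg:bank-a", "parent_safegraph_place_id": None, "enclosed": False},
    {"safegraph_place_id": "sg:bank-b", "parent_safegraph_place_id": "sg:mall", "enclosed": True},
    {"safegraph_place_id": "sg:bank-c", "parent_safegraph_place_id": None, "enclosed": False},
]

resolved = [resolve_patterns_id(bank, patterns_ids) for bank in banks]
for bank, (pid, source) in zip(banks, resolved):
    print(bank["safegraph_place_id"], "->", pid, f"({source})")
```

Keeping the source tag makes it easy to rerun an analysis with and without the parent-matched POIs and see how sensitive the results are to that assumption.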
Okay, thanks a lot @Ryan_Kruse_MN_State. We’ll begin by investigating just how many of our missing POIs have parent_safegraph_place_ID values. Only then could this parent/child issue be the reason behind the drop in merged POIs from Core to Patterns, right? I’ll get back to you if we learn something relevant. Thanks for your help!
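The count described above could look something like this. It is a rough sketch with toy rows; `parent_share` is a hypothetical helper, and the lowercase column name parent_safegraph_place_id is an assumption about how the field appears in the file.

```python
def parent_share(core_rows, patterns_ids):
    """Return (missing POIs that have a parent ID, all missing POIs)."""
    missing = [r for r in core_rows if r["safegraph_place_id"] not in patterns_ids]
    with_parent = [r for r in missing if r.get("parent_safegraph_place_id")]
    return len(with_parent), len(missing)


# Toy Core Places rows and Patterns IDs (made-up values).
core_rows = [
    {"safegraph_place_id": "sg:1", "parent_safegraph_place_id": None},
    {"safegraph_place_id": "sg:2", "parent_safegraph_place_id": "sg:mall"},
    {"safegraph_place_id": "sg:3", "parent_safegraph_place_id": ""},
    {"safegraph_place_id": "sg:4", "parent_safegraph_place_id": "sg:plaza"},
]
patterns_ids = {"sg:1"}

with_parent, missing = parent_share(core_rows, patterns_ids)
print(f"{with_parent} of {missing} missing POIs have a parent ID")
```

A small share would suggest the parent/child attribution explains little of the drop from Core to Patterns.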
@Alex_Zentefis_Yale_University I think so, yes. Thank you and good luck! Worst case scenario, you would probably have to cut your losses with the missing POIs. Then be aware of any assumptions you need to make based on missing POIs that have a shared polygon.