What is the radius around POI that is used to count data?

Hello all! I have another question: what is the radius around POI that is used to count data?


This topic was automatically generated from Slack. You can find the original thread here.

Hey @Angela_Rout_UBC! Thanks for the question. I’ll resurface a really great earlier conversation on this!

Here’s the link to the original post. I believe what you’re asking about is question #7.

For convenience, here’s the answer from that thread:

7. The numbers in that PDF should actually be changed to 80m and 100m, respectively. Those parameters were further tuned after the PDF was published.

With that in mind, pings between 80m and 100m from the first ping do not get included in the current cluster, but they also do not trigger a new cluster. That is, if we receive a ping from someone 70m away from the first ping in the cluster, then one 90m away, and then another one 75m away, the cluster will include the 70m and 75m pings, and the 90m ping will be discarded.
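To make the 80m/100m rule above concrete, here is a minimal Python sketch. It is not SafeGraph's actual implementation: the function name `cluster_pings`, the constant names, and the use of flat (x, y) coordinates in meters are all hypothetical, chosen for illustration. It also assumes that a ping more than 100m from the first ping starts a new cluster, and that a ping at exactly 80m is included; the answer above implies the first but does not state either explicitly.

```python
from math import hypot

INCLUDE_RADIUS_M = 80.0   # pings within this distance of the first ping join the cluster
BREAK_RADIUS_M = 100.0    # assumption: pings beyond this distance start a new cluster


def cluster_pings(pings):
    """Group (x, y) positions, given in meters, into clusters in arrival order.

    Distances are always measured from the first ping of the current cluster.
    """
    clusters = []
    current = []
    for ping in pings:
        if not current:
            # The first ping of a cluster becomes its anchor.
            current = [ping]
            continue
        anchor = current[0]
        dist = hypot(ping[0] - anchor[0], ping[1] - anchor[1])
        if dist <= INCLUDE_RADIUS_M:
            current.append(ping)       # close enough: part of this cluster
        elif dist > BREAK_RADIUS_M:
            clusters.append(current)   # far enough: close this cluster, start a new one
            current = [ping]
        # Otherwise the ping is between 80m and 100m: discarded,
        # and it does not trigger a new cluster.
    if current:
        clusters.append(current)
    return clusters


# The example from the answer: pings at 70m, 90m, and 75m from the first ping.
example = cluster_pings([(0, 0), (70, 0), (90, 0), (75, 0)])
```

With the example from the answer, `example` contains a single cluster holding the first ping plus the 70m and 75m pings; the 90m ping is dropped.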

I’ll also attach a pdf of the technical guide to how SafeGraph handles visit attribution if you want even more details!

The linked conversation is pretty esoteric, about the parameters for our GPS ping clustering algorithm.

It also sounds like you could be asking whether or not there’s a radius around our POI geometries that is used for visit attribution. To be clear, we don’t use a radius in any way; the details are described in the technical guide attached above. We have an ML model that attributes visits to POIs, and this model doesn’t include a radius around the POIs.

@Jeff_Ho_SafeGraph great! Thank you! I will review the paper more closely. But it sounds like you are saying that the POI are determined by analyzing the data itself: finding clusters of points and then associating these with a location. That would be very important for statistical analysis, because then the fact that POI are well visited is not a finding in itself (POI busy-ness is actually the reason they were identified in the first place). We should be careful not to draw conclusions that imply we are using data from a complete or random sample of POI.

No, @Angela_Rout_UBC, that doesn’t sound right. The POI information (Core and Geometry) is created independently through our sourcing process. The POI information is then used as an input to the visit attribution process, i.e., the polygons from the Geometry dataset are combined with GPS pings to infer visits.

That is, the GPS pings themselves are not used to determine POI locations.

Ah, amazing. Thank you for this.