Good morning, SafeGraph Team! Is anyone aware of what percentage of the POIs are accounted for in the Core Places dataset, specifically the Core Places Indianapolis, IN, USA dataset which I ordered/downloaded on June 28, 2021?
Also, what determines if a business/institution/establishment makes it into the Core Places datasets? Maybe the owners of the building must reach out to SafeGraph to have their POI assigned a Placekey and then added to Core Places? Or maybe SafeGraph obtains POI locations from another data source (e.g., Google Maps) and populates Core Places that way? I have a feeling answers to my questions may be found on SafeGraph’s website, as I’m sure others have wondered the same thing, but I was not able to find any post or article explaining this. That said, if a post or article does already exist, please direct me to it. I just need to be able to explain the limitations of the SafeGraph data (i.e., which Indianapolis POIs are and are not included in my dataset) in my thesis.
Thanks a bunch!
This topic was automatically generated from Slack. You can find the original thread here.
Hey - thanks for the questions. Happy to take a stab at these. For more specifics, let me know, and I’ll be sure to loop in our product team.
For your first question, it might be helpful to reference the Places Summary Statistics found here. It’ll give you a better sense of SafeGraph’s coverage in Core Places. One thing I should mention is that it won’t get down to the granularity of Indianapolis, IN. The statistics are aggregated at the country level, for each country we have data on.
For your second question, there are a few resources you might find helpful. At a high level, here is how SafeGraph sources its Core Places data (taken from this blog post here):
Crawling open store locators on the web (e.g., crawling a brand's website that lists where all of its stores are)
Using publicly available APIs and crawling open web domains that provide updated locations for a specific category of POIs (e.g., websites that list where all airports are)
Processing and modeling to infer additional attributes (e.g., inferring what category a POI is)
Licensing third-party data to fill in the gaps
Hey Arianna. Glad to hear those resources were helpful. As a follow-up to your first question, we aim to get as close to 100% POI coverage as possible. For certain categories, like restaurants, we have great coverage. However, there are other categories where our coverage might be weaker. While it may not be the answer you were hoping for, we’re not able to provide a percentage of how many POIs we’re missing in our datasets in Indianapolis, IN. To get a better idea, you could find a separate list of POIs in Indianapolis, cross-check it against the data you pulled from SafeGraph, and use the share of matches as an approximation.
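The cross-check suggested above could be sketched roughly as follows. This is only an illustration, not a SafeGraph tool or method: the helper names (`normalize_name`, `haversine_m`, `estimate_coverage`), the 100 m matching radius, the reference-list fields (`name`, `lat`, `lon`), and the sample rows are all made up for the example. The only assumption about the SafeGraph side is that each Core Places row carries `location_name`, `latitude`, and `longitude`. Exact name matching after normalization is deliberately crude, so the result should be read as a rough lower bound on coverage.

```python
import math
import re


def normalize_name(name):
    """Lowercase and drop punctuation so "Joe's Diner" matches "JOES DINER"."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def estimate_coverage(reference_pois, safegraph_pois, max_distance_m=100.0):
    """Return (share of reference POIs found in SafeGraph, list of unmatched POIs).

    A reference POI counts as found when a SafeGraph row has the same
    normalized name and lies within max_distance_m of it (hypothetical rule).
    """
    matched, missing = [], []
    for ref in reference_pois:
        ref_name = normalize_name(ref["name"])
        found = any(
            normalize_name(sg["location_name"]) == ref_name
            and haversine_m(ref["lat"], ref["lon"],
                            sg["latitude"], sg["longitude"]) <= max_distance_m
            for sg in safegraph_pois
        )
        (matched if found else missing).append(ref)
    share = len(matched) / len(reference_pois) if reference_pois else float("nan")
    return share, missing


# Made-up rows for illustration only; replace with your own independent list
# and the rows from your Core Places download.
reference = [
    {"name": "Joe's Diner", "lat": 39.7684, "lon": -86.1581},
    {"name": "Corner Hardware", "lat": 39.7700, "lon": -86.1600},
]
safegraph = [
    {"location_name": "JOES DINER", "latitude": 39.7685, "longitude": -86.1580},
]

share, missing = estimate_coverage(reference, safegraph)
print(f"Estimated coverage: {share:.0%}; unmatched: {[m['name'] for m in missing]}")
```

The estimate is only as good as the independent list, and it should be computed per category, since coverage is said to vary between categories such as restaurants and others.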
Hey Arianna, to prevent any further questions from being overlooked, I’ll go ahead and close this thread out. If you have any more questions or follow-ups, we’re always here to help! Just be sure to make a new post, as we aren’t monitoring old threads at this time. Thanks!