I noticed a lot of POIs in the monthly data would get 0 visitors. Is that an accurate reflection of how many visitors that POI got in reality? How accurately does SafeGraph data represent the general population?

For the SafeGraph data, I noticed a lot of POIs in the monthly data would get 0 visitors. Is that an accurate reflection of how many visitors that POI got in reality? How accurately does SafeGraph data represent the general population?


This topic was automatically generated from Slack. You can find the original thread here.

Hey - that’s a great question!

Keep in mind that our panel of devices is a subset of the whole US population. Not every visit to the POIs you’re looking at will show up in our data, because not all devices are included in SafeGraph’s panel. To give you a better idea of our coverage, our Patterns data aggregates from approximately 10% of devices in the US. This of course introduces some sampling error in our datasets, so a POI showing 0 visitors in a given month may still have had real visitors whose devices simply weren’t in the panel. Have you checked out this Google Colab Notebook on bias in the SafeGraph data? I think you might find this resource very helpful!
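To get a feel for how often a low-traffic POI could show 0 visitors purely from sampling, here is a rough sketch. It assumes each visitor's device lands in the panel independently with the same probability (10%, taken from the approximate figure above). That uniform rate is a simplification for illustration only, and `prob_zero_observed` is a made-up helper name, not anything from SafeGraph.

```python
def prob_zero_observed(true_visitors: int, sampling_rate: float = 0.10) -> float:
    """Chance that none of a POI's true visitors is in the device panel.

    Assumption (illustrative only): each visitor is sampled independently
    with the same probability `sampling_rate`.
    """
    if true_visitors < 0:
        raise ValueError("true_visitors must be non-negative")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    # Every visitor must be missed, each with probability (1 - sampling_rate).
    return (1.0 - sampling_rate) ** true_visitors


if __name__ == "__main__":
    for n in (1, 5, 10, 30):
        p = prob_zero_observed(n)
        print(f"{n:>3} true visitors -> P(0 observed) = {p:.3f}")
```

Under these assumptions, a POI with only a handful of real visitors in a month has a substantial chance of showing up as 0, so a zero is not strong evidence that nobody visited.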

Have you also checked out some of our Data Science Resources on normalization? This will help you more accurately estimate “truth” counts from the Patterns dataset.
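As a very rough idea of what normalization can look like, here is a minimal sketch that scales a raw panel count by the ratio of population to panel devices in the same region. This is just one simple approach and may differ from what the linked resources recommend. The function name and all three inputs are assumptions for illustration, and the numbers in the example are made up.

```python
def scale_to_population(raw_visitor_count: float,
                        panel_devices: float,
                        population: float) -> float:
    """Scale a raw panel count up to an estimated population-level count.

    Assumption (illustrative only): the panel is representative of the
    region, so each panel device stands in for population / panel_devices
    people.
    """
    if panel_devices <= 0:
        raise ValueError("panel_devices must be positive")
    if population < 0 or raw_visitor_count < 0:
        raise ValueError("counts must be non-negative")
    return raw_visitor_count * population / panel_devices


if __name__ == "__main__":
    # Hypothetical numbers: 12 panel visitors, 50,000 panel devices
    # in a region of 500,000 people.
    estimate = scale_to_population(12, 50_000, 500_000)
    print(f"Estimated visitors: {estimate:.0f}")
```

Using a region-specific ratio rather than a flat multiplier at least accounts for the panel size changing between places and months, though it still cannot correct for coverage differences between individual POIs.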

I know we’ve had some similar conversations in the past from other members. You might check out a few of these previous threads.

Hey @U01NP0J22LB and @U01531MUHK7! Anything you might add to Irwin’s question? Do either of you also have any tips for normalizing our Patterns dataset? I’m sure Irwin would appreciate any advice you might have!

Not necessarily. Remember that SafeGraph captures about 10% of devices, not necessarily 10% of the population. Again, to get a better sense of “truth” visits, I’d recommend working through this Google Colab notebook. If you need additional guidance on stepping through this, I’d recommend connecting with our Community Data Scientists over Zoom to ask any questions. Let me know if that’s something you would find helpful - happy to get you connected!

Another reason it wouldn’t be feasible to just multiply visits by 10 is that the coverage varies by POI. For individual POIs, our coverage of the true number of visits varies based on a number of factors, but it’s hard to know the “true” coverage rate ahead of time.

This refers to fast food restaurants, for example Burger King or McDonald’s. This website is really good for digging into NAICS codes (and their hierarchy), plus examples of businesses. Note that sub category refers to the 4-digit NAICS code 7225 (broader), and category refers to the 6-digit NAICS code 722513 (more specific): NAICS Code: 722513 Limited-Service Restaurants | NAICS Association
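Since NAICS codes are hierarchical by prefix, the broader 4-digit code is just the first four digits of the 6-digit one. A small sketch of splitting a code into its levels is below. The helper names `naics_levels` and `is_within` are made up for illustration, and the level names follow the standard NAICS structure.

```python
def naics_levels(code: str) -> dict:
    """Split a 6-digit NAICS code into its hierarchy levels by prefix."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a 6-digit NAICS code as a string")
    return {
        "sector": code[:2],             # 2 digits
        "subsector": code[:3],          # 3 digits
        "industry_group": code[:4],     # 4 digits
        "naics_industry": code[:5],     # 5 digits
        "national_industry": code[:6],  # 6 digits
    }


def is_within(code: str, broader_code: str) -> bool:
    """True if `code` falls under the broader (shorter) NAICS code."""
    return code.startswith(broader_code)


if __name__ == "__main__":
    print(naics_levels("722513"))
    print(is_within("722513", "7225"))
```

Filtering POIs on the 4-digit prefix pulls in every 6-digit code beneath it, not just limited-service restaurants, so it helps to decide up front which level you actually want.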