I noticed a significant selection bias in the device_count data relative to the real number of individuals living in each census block. I currently correct for it by assuming that the proportions remain the same, so as to recover a balanced stratified sample, but I suspect some bias remains. Is there a better solution to this issue?
it was an error
Hi @Fabio_VANNI, the Data Science Resources have some info on how you might correct some of this bias.
I would also suggest you take a look at this paper/presentation, which finds that SafeGraph’s panel tends to underrepresent minorities and older people. The work was featured in a past webinar, much like the one you gave today.
The approach we think works best currently is micronormalization by home CBG. This doc is listed under the Data Science Resources I linked to above. Essentially, we scale the number of visitors to each POI from each CBG in visitor_home_cbgs by the CBG’s Census population divided by the count of devices in SafeGraph’s panel that we identify as living in that CBG (home_panel_summary.number_devices_residing). Is this helpful?
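For illustration, here is a minimal sketch of that scaling for a single POI, assuming visitor_home_cbgs arrives as a JSON string mapping home CBG to visitor count. The function name `micronormalize`, the two lookup dictionaries, and all the numbers are hypothetical; the Census population would need to come from a separate source that is not shown here.

```python
import json


def micronormalize(visitor_home_cbgs, devices_residing, census_population):
    """Scale each CBG's visitor count by census population / panel devices.

    visitor_home_cbgs: JSON string (or dict) mapping home CBG -> visitors.
    devices_residing:  dict mapping CBG -> number_devices_residing
                       (from home_panel_summary).
    census_population: dict mapping CBG -> Census population
                       (assumed to be loaded separately).
    """
    if isinstance(visitor_home_cbgs, str):
        raw = json.loads(visitor_home_cbgs)
    else:
        raw = dict(visitor_home_cbgs)

    scaled = {}
    for cbg, visitors in raw.items():
        devices = devices_residing.get(cbg)
        population = census_population.get(cbg)
        # Skip CBGs with no panel devices or no population figure,
        # since the scaling factor is undefined for them.
        if not devices or population is None:
            continue
        scaled[cbg] = visitors * population / devices
    return scaled


# Toy, made-up values for one POI.
visitor_home_cbgs = '{"010010201001": 4, "010010201002": 10}'
devices_residing = {"010010201001": 50, "010010201002": 200}
census_population = {"010010201001": 1000, "010010201002": 1600}

estimated = micronormalize(visitor_home_cbgs, devices_residing, census_population)
total = sum(estimated.values())
print(estimated, total)
```

In this toy case the first CBG has 4 observed visitors but is represented by only 50 devices for 1000 residents, so it is scaled up by a factor of 20, while the second is scaled up by a factor of 8. Skipping CBGs with zero panel devices is a design choice of this sketch, not something the doc prescribes.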
yes