Representation of SG Open Census Data

Hey all,

I have some questions about visits by home CBG. When the visitor_home_cbgs column is exploded to show how many visits come from each census block group, we can merge SafeGraph’s Open Census data onto the dataset to bring in information about income, employment, and more.
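For reference, here is a minimal sketch of the explode-and-merge workflow I mean. The toy rows and the census column names (census_block_group, median_income) are made up for illustration, and I am assuming visitor_home_cbgs arrives as a JSON string mapping CBG to a count.

```python
import json
import pandas as pd

# Toy stand-in for one Patterns row (values are invented).
patterns = pd.DataFrame({
    "placekey": ["poi-1"],
    "raw_visit_counts": [20],
    "visitor_home_cbgs": ['{"010010201001": 4, "010010201002": 5}'],
})

# Hypothetical census table; the real Open Census column names differ.
census = pd.DataFrame({
    "census_block_group": ["010010201001", "010010201002"],
    "median_income": [52000, 71000],
})

def explode_home_cbgs(df):
    """Return one row per (POI, home CBG) pair with its count."""
    records = []
    for _, row in df.iterrows():
        # Assumption: the column holds a JSON object of {cbg: count}.
        for cbg, count in json.loads(row["visitor_home_cbgs"]).items():
            records.append({
                "placekey": row["placekey"],
                "raw_visit_counts": row["raw_visit_counts"],
                "census_block_group": cbg,
                "cbg_count": count,
            })
    return pd.DataFrame(records)

exploded = explode_home_cbgs(patterns)

# Left merge keeps CBGs that have no match in the census table.
merged = exploded.merge(census, on="census_block_group", how="left")

# The per-CBG counts next to the raw visits they are compared against.
print(merged[["census_block_group", "cbg_count", "raw_visit_counts", "median_income"]])
print("sum of CBG counts:", merged["cbg_count"].sum())
```

In this toy row the CBG counts sum to 9 while raw visits is 20, which is the kind of gap my questions are about.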

My questions:

  1. If a POI has 4 visits from a single CBG but 20 raw visits, why do the values differ? Shouldn’t the counts in visitor_home_cbgs add up to the raw visits or raw visitors?
  2. If a CBG is 60% high income and 40% low income (a purely hypothetical split) and there are 4 visits from that CBG to a POI, how do we know whether the visitors come from the low-income or the high-income group? Do we assume random sampling from the CBG, or is there a better method for understanding the demographics of who visits a POI?
  3. If there are 4 visits each from 3 CBGs to a POI (12 total) and raw visits is not 12, how do we determine the distribution of CBG visits?
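
To make question 2 concrete, this is what the random-sampling assumption would imply for the hypothetical 60/40 CBG. It is only a sketch of that assumption, not something I know to be the right method; expected_split is a helper name I made up.

```python
def expected_split(cbg_count, shares):
    """Expected count per group if visitors are a random draw from the CBG."""
    # Assumption: every resident of the CBG is equally likely to visit the POI.
    return {group: cbg_count * share for group, share in shares.items()}

# 4 visits from a CBG that is 60% high income and 40% low income.
split = expected_split(4, {"high_income": 0.6, "low_income": 0.4})
print(split)
```

This only gives an expected value per group, so it cannot say which group the 4 actual visits came from.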

Any and all help answering these questions would be greatly appreciated!