I am trying to study the demographics of the people who travel to my city. I have exploded the visitor_home_cbgs column for every POI in my study, normalized those visitor counts at the CBG scale, and then merged that dataframe with the median household income column from the Open Census data.
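For context, here is a minimal sketch of the explode-and-merge step on toy data. It assumes visitor_home_cbgs is stored as a JSON string mapping each home CBG to a visitor count, and that the census table is keyed by a column called census_block_group. The placekey values, CBG IDs, incomes, and the explode_home_cbgs helper are all made up for illustration, and the normalization step is left out because it depends on the method used.

```python
import json

import pandas as pd

# Hypothetical toy data standing in for the real patterns file.
patterns = pd.DataFrame({
    "placekey": ["poi_a", "poi_b"],
    "visitor_home_cbgs": [
        '{"010010001001": 6, "010010002002": 4}',
        '{"010010001001": 5}',
    ],
})

# Hypothetical census table; the real column names may differ.
census = pd.DataFrame({
    "census_block_group": ["010010001001", "010010002002"],
    "median_household_income": [52000, 87000],
})


def explode_home_cbgs(df):
    """Turn each POI's JSON dict into one row per (POI, home CBG) pair."""
    rows = [
        {"placekey": placekey, "census_block_group": cbg, "visitor_count": count}
        for placekey, raw in zip(df["placekey"], df["visitor_home_cbgs"])
        for cbg, count in json.loads(raw).items()
    ]
    return pd.DataFrame(rows)


exploded = explode_home_cbgs(patterns)

# A left join keeps every (POI, CBG) row even if a CBG has no income value.
merged = exploded.merge(census, on="census_block_group", how="left")
print(merged)
```

Note that CBG 010010001001 appears once per POI after the explode, so the same home CBG (and possibly the same person) contributes a row for every POI visited.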
From this I have created various histograms with median_household_income on the x-axis and visitor counts on the y-axis. Note: every count on the histogram corresponds to one visitor from my adjusted visitor count column. These histograms therefore visualize the estimated number of visitors to ALL POIs in my study area over my study period, distributed by the median household income of their home CBG.
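A histogram like this can be sketched as a weighted histogram, where each (POI, CBG) row contributes its adjusted visitor count to the income bin of its home CBG. The column name adjusted_visitor_count, the toy values, and the bin edges below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical merged dataframe: one row per (POI, home CBG) pair.
merged = pd.DataFrame({
    "median_household_income": [52000, 87000, 52000],
    "adjusted_visitor_count": [6.0, 4.0, 5.0],
})

# Illustrative income bins: 0-25k, 25k-50k, 50k-75k, 75k-100k.
bin_edges = np.arange(0, 125000, 25000)

# weights= makes each row add its visitor count rather than 1 to its bin.
counts, edges = np.histogram(
    merged["median_household_income"],
    bins=bin_edges,
    weights=merged["adjusted_visitor_count"],
)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>7}-{right:<7} {n}")
```

The same result can be plotted directly by passing the weights argument to matplotlib's hist.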
My question is: how accurate can these histograms be if one person walks into a POI, gets counted as a unique visitor to that POI, then walks down the street into another POI and gets counted again as a unique visitor there? That one person now appears as 2 visitors, one in each POI's visitor_home_cbgs column.
If I were able to treat my entire city as one POI, an individual from CBG ‘x’ would enter a store, get counted as a unique visitor, then walk into any number of other stores and still count only once in the histogram.
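To make the concern concrete, here is a toy illustration of the gap between summing per-POI unique visitors and counting unique visitors city-wide. It assumes device-level visit records, which I do not actually have in the aggregated data; the device_id, home_cbg, and placekey values are entirely hypothetical.

```python
import pandas as pd

# Hypothetical device-level visits (not available in the aggregated data).
visits = pd.DataFrame({
    "device_id": ["d1", "d1", "d2", "d3", "d3", "d3"],
    "home_cbg":  ["x",  "x",  "x",  "y",  "y",  "y"],
    "placekey":  ["poi_a", "poi_b", "poi_a", "poi_a", "poi_b", "poi_c"],
})

# What summing per-POI unique visitors gives: one count per (device, POI).
summed_over_pois = (
    visits.drop_duplicates(["device_id", "placekey"])
    .groupby("home_cbg")
    .size()
)

# What treating the whole city as a single POI would give.
unique_citywide = visits.groupby("home_cbg")["device_id"].nunique()

comparison = pd.DataFrame({
    "summed_over_pois": summed_over_pois,
    "unique_citywide": unique_citywide,
})
comparison["inflation"] = (
    comparison["summed_over_pois"] / comparison["unique_citywide"]
)
print(comparison)
```

In this toy case CBG ‘x’ is inflated by a factor of 1.5 and CBG ‘y’ by a factor of 3, so the over-count is not uniform across CBGs and could shift the shape of the income histogram, not just its height.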
What limitations or biases should I be aware of, and what further data manipulation can I do to my dataframe to account for these duplicate visitor counts across multiple POIs?