
For our project, we have decided to filter the Patterns data to Houston and the Subway brand only.
Our goal is to predict the number of visits for a brand. We have normalised the data and added new columns.
However, we find the number of visits a bit low. Is this possible?
Is there any reasonable explanation?

Hi @Angel_Langdon, what type of normalization method did you use? Typically when we normalize the data, we are just trying to correct for panel changes across time. Oftentimes, the normalized estimates are not estimates of real values.

Hi, I am talking about data normalisation in the database sense (like SQL normalisation, that is, converting lists and complex data structures into the most basic types of data: integers, strings, etc.).

Right now, we have only exploded “visits_by_day” (a list of integers), that is, replicated the index value and assigned a new row containing a visit count for each entry in the list of visits.

See the image below for a better understanding, and sorry for the misunderstanding.
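For readers following along, the explode step described above can be sketched in pandas. The sample data here is made up for illustration; only the `visits_by_day` column name comes from SafeGraph's schema.

```python
import pandas as pd

# Hypothetical sample mimicking SafeGraph Patterns rows:
# one row per POI, with visits_by_day holding a list of daily visit counts.
df = pd.DataFrame({
    "placekey": ["poi-a", "poi-b"],
    "visits_by_day": [[3, 0, 2], [1, 4]],
})

# explode() turns each list element into its own row, replicating the
# other column values (the "index value" mentioned above) for each day.
exploded = df.explode("visits_by_day").rename(columns={"visits_by_day": "visits"})
exploded["visits"] = exploded["visits"].astype(int)

print(len(exploded))          # one row per day per POI
print(exploded["visits"].sum())
```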

@Angel_Langdon Ahh, I see! No worries about the misunderstanding. You are correct: the number of visits you see there is low. The explanation is that SafeGraph’s data comes from a sample of the entire population. Not every visit to Subway will show up in SafeGraph’s data because not everyone has a device in SafeGraph’s panel.

For that reason, the raw counts should almost always be lower than the “true” counts. I would suggest you check out this blog for tips on normalizing the raw counts to more closely estimate the true counts.

@Ryan_Kruse_MN_State thank you very much!