I noticed an anomaly in the patterns data when looking at cannabis dispensaries in a city in Colorado. Can anyone help?

Hello. I noticed an anomaly in the Patterns data. I am looking at cannabis dispensaries in a city in Colorado. To see whether people visit different dispensaries on the same day, I looked at individual dispensaries and searched for the names of other dispensaries in their same_day_brand data, and that is where I came across the anomaly. For 2018 to the present, only 3 of the 26 dispensaries appear in the same_day_brand column. Of these, the first correlates well: its own raw_visit count is consistent with how it appears in other dispensaries' same_day_brand columns. The second has far higher numbers in other dispensaries' same_day_brand columns than its own raw_visit count. The third has hardly any recorded raw_visit count but still has values in other dispensaries' same_day_brand columns. If someone could help me, I would be grateful. Thank you.


This topic was automatically generated from Slack. You can find the original thread here.

Hey @Abhinav_Dev_University_of_New_Mexico! Thanks for posting this. Let me dig into it further and circle back with you. It might be an issue with our data, in which case it would be good to submit it on product-feedback. However, let me confirm with our team first.

Very curious. When you say that the numbers in the same_day_brand column are higher than its own raw_visit_count, are you already accounting for the fact that related_same_day_brand shows a percentage?

For another thing, it seems reasonable to me that dispensaries wouldn’t always show up in that column, since not every brand appears there.

> - These are the brands that the visitors to this POI also visited, on the same day that they visited the POI. The number mapped to each brand is an indicator of how highly correlated a POI is to a certain brand. The value is a simple percent of POI visitors that visited the other brand on the same day.
>
> - Only the first 20 brands are returned.
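To search for one dispensary inside another's related_same_day_brand, the column has to be parsed first. The sketch below assumes the column is stored as a JSON string mapping brand names to percentages (for example `'{"Dispensary_B": 10}'`); the helper names and brand names are hypothetical. It returns `None` rather than `0` for a missing brand, because absence may only mean the brand fell outside the top 20.

```python
import json
from typing import List, Optional


def same_day_pct(related_same_day_brand: str, brand: str) -> Optional[float]:
    """Return the same-day percentage for `brand`, or None if it is not listed.

    Assumes the column is a JSON string such as '{"Target": 12}'.
    None is not the same as 0: only the top 20 brands are returned,
    so a missing brand may simply have ranked below the cutoff.
    """
    if not related_same_day_brand:
        return None
    brands = json.loads(related_same_day_brand)
    value = brands.get(brand)
    return float(value) if value is not None else None


def brands_found(related_same_day_brand: str, candidates: List[str]) -> List[str]:
    """List which of the candidate brand names appear in the column."""
    if not related_same_day_brand:
        return []
    brands = json.loads(related_same_day_brand)
    return [name for name in candidates if name in brands]


row = '{"Dispensary_B": 10, "Target": 4}'  # hypothetical column value
print(same_day_pct(row, "Dispensary_B"))   # 10.0
print(same_day_pct(row, "Dispensary_C"))   # None (not in the top 20)
print(brands_found(row, ["Dispensary_B", "Dispensary_C"]))  # ['Dispensary_B']
```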

Thank you for your help. It was clearly a mistake on my part: I assumed the same_day_brand column held visitor counts, not the percentage of a place's unique visitors who also went to the other place.

No problem, glad I could help! Feel free to reach out if you have any other questions.

I checked again, and I found a few cases where multiplying the percentage in same_day_brand by raw_visitor_count gives more visitors to a commonly co-visited place than that place's own raw_visitor_count. For example, Dispensary_A has raw_visitor_count=36, and its same_day_brand includes Dispensary_B:10. I assume this means that roughly 10% of Dis_A's raw_visitor_count visited Dis_B on the same day, so about 3 or 4 visitors (0.10 × 36 = 3.6) went from Dis_A to Dis_B over the month. However, the raw_visitor_count for Dis_B in the same month is null. I am not sure if I am doing this correctly.
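The estimate described above can be written out as a small helper. This is a minimal sketch: the function name is hypothetical, and it assumes, per the quoted documentation, that the related_same_day_brand value is a percent of the POI's visitors, so it is divided by 100 before scaling by raw_visitor_count.

```python
def estimated_same_day_visitors(pct: float, raw_visitor_count: int) -> float:
    """Estimate how many of this POI's visitors also visited a brand that day.

    `pct` is a related_same_day_brand value (a percent of the POI's
    visitors), so it is converted to a fraction before scaling.
    """
    return pct / 100 * raw_visitor_count


# Numbers from the example: Dispensary_A has 36 visitors and lists Dispensary_B: 10
estimate = estimated_same_day_visitors(10, 36)
print(round(estimate, 1))  # 3.6, i.e. roughly 3 or 4 visitors
```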

That’s the right approach. One other thing: related_same_day_brand measures related visitors at the brand level. For example, if “Target” showed up there, it wouldn’t refer to visits to one specific Target store but to all Targets.

That could be what’s happening here. Are there other locations of Dispensary Brand B that you could be comparing to?

Yes, Dis_B does have locations in cities other than the one we are looking at. That probably explains the difference in the data. Thank you for your help.
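Because the column is brand-level, a fairer check compares the same-day estimate against visitors summed across all of the brand's locations, not only the one in the study city. The rows, column order, and city names below are hypothetical. Summing raw_visitor_count across locations can count the same person more than once if they visited several locations, so the total may overstate the brand's unique visitors and is only a rough comparison.

```python
from collections import defaultdict

# Hypothetical monthly rows: (brand, city, raw_visitor_count); None = null in the data
rows = [
    ("Dispensary_B", "City_1", None),  # the location in the study city
    ("Dispensary_B", "City_2", 40),
    ("Dispensary_B", "City_3", 25),
    ("Dispensary_A", "City_1", 36),
]


def brand_visitor_totals(rows):
    """Sum raw_visitor_count over every location of each brand, skipping nulls."""
    totals = defaultdict(int)
    for brand, _city, count in rows:
        if count is not None:
            totals[brand] += count
    return dict(totals)


totals = brand_visitor_totals(rows)
print(totals)  # {'Dispensary_B': 65, 'Dispensary_A': 36}
```

With these hypothetical numbers, an estimate of about 3.6 same-day visitors to Dispensary_B is plausible even though its study-city location reports a null count.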

Thanks for the great question @Abhinav_Dev_University_of_New_Mexico! Excited to hear more about your and Dr. Stith’s research. We’ll have to catch up again soon!

To prevent any further questions from being overlooked, I’ll go ahead and close this thread out. If you have any more questions or follow-up questions, we’re always here to help! Just be sure to make a new post to help, as we aren’t monitoring old threads at this time. Thanks!