I understand SafeGraph changed some of its metrics in July, as mentioned in Patterns | SafeGraph Docs.
However, I have been analyzing a previous version, and I need to know the formula that was previously used to calculate “related_same_month_brand”. The earlier link no longer seems to be available. Could you direct me to a resource?

This topic was automatically generated from Slack. You can find the original thread here.

Hi! Thanks for reaching out! We are looking into it and will get back to you once we have an answer.

Hi! We appreciate your patience on this. We are double-checking some things with our Product team and will be in touch soon!

Hi - we haven’t forgotten about this. I just asked for an update.

The old definition and formula are:

Other brands that the visitors to this POI visited on the same day as the visit to this POI, where customer overlap differs by at least 5% from the SafeGraph national average. The mapping has the brand as the key. 

The value shown for each brand is a percentage representing the median of the following calculation for each day in the month: 
`(same-day visitors to both the brand and the POI / total daily visitors to the POI) - (daily visitors to the brand / all visitors in SafeGraph panel)`.
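
For anyone who wants to reproduce the value from raw counts, here is a minimal Python sketch of the calculation as described above. It is not SafeGraph's actual implementation. The function names, the tuple layout of the daily counts, the skipping of days with no POI visitors, and the reading of the 5% rule as "report only when the monthly value is at least 5 percentage points above the panel average" are all assumptions made for illustration.

```python
from statistics import median


def daily_excess_overlap(both, poi_visitors, brand_visitors, panel_visitors):
    """Share of the POI's daily visitors who also visited the brand that day,
    minus the brand's share of all visitors in the panel that day."""
    return both / poi_visitors - brand_visitors / panel_visitors


def monthly_related_brand_value(daily_counts, threshold_pct=5.0):
    """Median of the daily differences over the month, as a percentage.

    daily_counts: iterable of (both, poi_visitors, brand_visitors, panel_visitors)
    tuples, one per day (hypothetical layout).
    Returns None when the value falls below threshold_pct (assumed reporting rule).
    """
    daily = [
        daily_excess_overlap(*day)
        for day in daily_counts
        if day[1] > 0  # assumption: skip days with no visitors to the POI
    ]
    if not daily:
        return None
    value_pct = 100 * median(daily)
    return value_pct if value_pct >= threshold_pct else None


# Made-up counts for three days: (both, POI visitors, brand visitors, panel visitors)
example_days = [
    (30, 100, 1000, 10000),
    (20, 100, 1000, 10000),
    (25, 100, 1000, 10000),
]
example_value = monthly_related_brand_value(example_days)
```

With these made-up counts, the brand's panel-wide share is 10% every day, so the monthly value is the median POI overlap minus 10 percentage points. A brand whose daily overlap is lower than its panel-wide share produces a negative difference, which this sketch filters out rather than reports.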

Thank You

Niki Kazahaya (SafeGraph): Hey! Looks like Jeff was able to get you squared away on this. I’m going to go ahead and close this thread out. If you have any more questions or follow-ups, we’re always here to help! Just be sure to make a new post to safegraphdata, as we aren’t monitoring old threads at this time. Thanks!


Hi Niki
Three questions to follow up:
(1) Can you refer me to any published papers that have used or analyzed the “related_same_month_brand” metric?
(2) Going by Jeff’s definition, can the value be negative, even if it is not reported (i.e., when the value is below 5%)?
(3) Is there a rationale for why this specific formula was chosen? That is, does it draw on previously established metrics or measures?

Hi @Sy_University_of_Michigan_Flint_MIDAS! Thanks for the follow-up questions.

  1. I’m not aware of any off the top of my head. That would require a little bit of digging on my end before I can report back with any other publications that have utilized that variable.

2/3. Let me loop in someone else from our team about these!

We’ll circle back shortly!

  2. Yes, the value can theoretically be negative. That would represent brands that visitors to the POI visit at a rate below the national average. For example, visitors to a healthy food supermarket might visit certain brands, such as fast food, far less than the national average.
    However, because the column is intended to showcase related same-day brands (not anti-related ones), I don’t think we surfaced such negative values when this formula was live.
  3. The rationale was to identify brand affinities that were “above average”, i.e., people who visit these POIs are especially strong visitors of these brands. I could dig into references if you really need them, but I am not sure there was anything beyond this qualitative rationale.

Hi @Jeff_Ho_SafeGraph , @Niki_Kaz
Thank you for the clarifications. In response to (3), i.e. “the rationale”: intuitively, I understand this was meant to account for a brand’s representation bias arising from variations such as single-state versus national brands, the number of locations, or a brand’s locations being more heavily represented or recorded. However, if you could provide some references (to prior practices or measurements), it would really help address reviewer queries and concerns.

Hey @Sy_University_of_Michigan_Flint_MIDAS - we did some digging on our side about references around our previous definition. Unfortunately, we don’t have anything available regarding the previous approach to related_same_day_brand. Apologies for any inconvenience!