Hi everyone: we have a question about the overall visitor counts in the monthly dataset. After normalizing following @Derek_Ouyang_Stanford (using number_devices_residing from home_panel), we see aggregate increases in visitor counts to retail stores for almost all states between 2018 and 2019 (e.g. 22% higher in June 2019 compared to June 2018). Is this a pattern others have noticed? Should we think of this as a reflection of actual increases, or is number_devices_residing potentially overestimated in 2018?
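
For readers following along, here is a minimal sketch of the kind of normalization described above: raw visitor counts divided by the number of panel devices residing in the same state and month. The row layout, the `state` and `month` keys, the function name, and all numbers are hypothetical and only for illustration. Normalizing at the state level is also an assumption, since the post does not say which geography was used.

```python
from collections import defaultdict


def normalized_visitors_by_state(pattern_rows, panel_rows):
    """Return {(state, month): raw visitors / devices residing} (a sketch)."""
    # Sum panel devices over all home_panel rows (e.g. one per CBG) in a state-month.
    devices = defaultdict(int)
    for row in panel_rows:
        devices[(row["state"], row["month"])] += row["number_devices_residing"]

    # Sum raw visitor counts over all POI rows in a state-month.
    visitors = defaultdict(int)
    for row in pattern_rows:
        visitors[(row["state"], row["month"])] += row["raw_visitor_counts"]

    # Skip state-months with no panel devices to avoid dividing by zero.
    return {key: visitors[key] / devices[key] for key in visitors if devices.get(key)}


# Made-up illustrative rows, not SafeGraph data.
panel_rows = [
    {"state": "NY", "month": "2018-06", "number_devices_residing": 1000},
    {"state": "NY", "month": "2018-06", "number_devices_residing": 1000},
    {"state": "NY", "month": "2019-06", "number_devices_residing": 900},
    {"state": "NY", "month": "2019-06", "number_devices_residing": 900},
]
pattern_rows = [
    {"state": "NY", "month": "2018-06", "raw_visitor_counts": 400},
    {"state": "NY", "month": "2019-06", "raw_visitor_counts": 450},
]

normalized = normalized_visitors_by_state(pattern_rows, panel_rows)
# Year-over-year change in the normalized series for this toy example.
yoy_change = normalized[("NY", "2019-06")] / normalized[("NY", "2018-06")] - 1
print(f"year-over-year change: {yoy_change:.1%}")
```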

Hi @Howard_Zhang_Columbia_University, can you provide any visualizations showing the change? Also, are you seeing a similar jump in other types of POIs? It looks like number_devices_residing is about 43.7MM in June 2018 and 41.9MM in June 2019. So while June 2018 is a bit higher, I’m not sure it would be enough to cause a 22% increase. Do you see a similar trend in other months year-to-year?
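
To make that arithmetic explicit, here is a quick back-of-the-envelope check. It assumes the normalization divides by the national number_devices_residing totals quoted above, which may not match the method actually used, and the variable names are just for illustration.

```python
# National panel sizes quoted in the thread (devices residing).
devices_2018 = 43.7e6
devices_2019 = 41.9e6
observed_normalized_growth = 0.22  # the June 2018 -> June 2019 example

# If raw visitor counts were flat, the smaller 2019 panel alone would raise
# the normalized series by this fraction.
panel_effect = devices_2018 / devices_2019 - 1

# Growth in raw visitor counts that would still be needed to reach 22%.
implied_raw_growth = (1 + observed_normalized_growth) * devices_2019 / devices_2018 - 1

print(f"panel-size effect alone: {panel_effect:.1%}")        # 4.3%
print(f"implied raw visitor growth: {implied_raw_growth:.1%}")  # 17.0%
```

Under these assumptions the change in panel size accounts for roughly 4 points of the 22% increase, so most of it would have to come from the raw visitor counts themselves.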

@Howard_Zhang_Columbia_University Also interested to know if you are normalizing based on number_devices_residing for the entire country, just the state the POI is in, or just visitor home CBGs?

@Howard_Zhang_Columbia_University Okay, great. From everything I’ve seen about normalization with this data, I don’t think there’s any reason to believe number_devices_residing was overestimated in 2018, so it seems like a reflection of actual increase to me. Did you include a graph of your normalized version of raw_visitor_counts? If so, I’m not able to see it.
I suppose if you don’t feel comfortable, you might be able to find an external data set to verify these increases? If possible, it could also be valuable to see if this is happening with another top_category other than “Store”.

Hi @Ryan_Kruse_MN_State. I updated my posts above (please see the most recent comment with “All-Normalized” and “All - Raw”). Your suggestions make sense but the increases from 2018 to 2019 in visitors to retail stores still seem quite large!

FYI @Howard_Zhang_Columbia_University I’m following along, your method and results make sense to me, and I’m also curious to understand the possible reasons for the near 100% increase over 2 years if not a reflection of real doubling. I suppose this doesn’t have to imply a ~2x increase in people, since the same # of people could just be visiting more places. But I think this is probably more so a reflection of an increase in the number of POIs that SafeGraph has added to its places collection over the two years. @Ryan_Fox_Squire_SafeGraph

@Derek_Ouyang_Stanford the increase in the number of POI shouldn’t impact this though, right? As POI are added, their visits are being backfilled (according to our understanding), so that wouldn’t be able to explain the trend.

@Guy_Columbia i’m not well-versed on this specific documentation in the schema, so I grant you could be right about that and @Ryan_Fox_Squire_SafeGraph can quickly confirm. but if we’re looking for a specific place where there might either be a method error on Howard’s part or on Safegraph’s part, I’m putting my finger on temporal record of POIs for now.

@Howard_Zhang_Columbia_University, @Guy_Columbia and @Derek_Ouyang_Stanford I can confirm that periodically we do what we call a “backfill”; the last time we did one was May 2020. When we do a backfill, we take the latest version of Core Places, and we use that static version to re-calculate Patterns for all history (in this case going back to Jan 2018). So all of the Patterns data you are working with from 2018, 2019, and the first 5 months of 2020 are made using the identical set of POI / Geometry data.

@Ryan_Fox_Squire_SafeGraph another thing @Howard_Zhang_Columbia_University and I were discussing earlier, which could potentially explain parts of this [though we weren’t convinced it would fully explain it], is that there is also attrition in retail, in the sense that there are POIs that close and, from SG’s perspective, we “lose” these visits once the store closes, since Core Places only covers currently open stores. If people are over time more likely to substitute towards stores that are still around, then this would naturally generate an upward trend for retail POI visits.

it would be pretty surprising if this would generate the magnitude of the growth that we see. we also ran this state by state and found it pretty robustly holding across states [with DC being the lone exception] which makes the attrition story less likely.

@Guy_Columbia that is possible, and something that I have considered but have had a hard time quantifying.

To be clear, the suggestion is that when we generate the backfill, we are using a static version of Core Places, based on what is currently open. Undoubtedly, the world 2+ years ago was different, and some of these points-of-interest (a) didn’t exist, i.e., literally were not built yet, or (b) existed as a different business.
In the (a) cases, we would expect no visits to these places before the structures were built, and in the (b) cases, depending on what the prior business was, maybe more or fewer visits.

one way to investigate this is to look at some individual safegraph_place_id examples to see if you can see this trend happening at the level of individual POI. You could also quantify the number of unique SGPIDs over time in this aggregation.
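
As a rough sketch of the second idea, counting the unique SGPIDs that contribute visitors in each month might look like the following. The row layout, the `month` key, and the function name are hypothetical, and the sample rows are made up.

```python
from collections import defaultdict


def unique_pois_per_month(rows):
    """Count distinct safegraph_place_id values with at least one visitor per month."""
    seen = defaultdict(set)
    for row in rows:
        # Only count POIs that actually contribute visitors to the aggregate.
        if row["raw_visitor_counts"] > 0:
            seen[row["month"]].add(row["safegraph_place_id"])
    return {month: len(ids) for month, ids in sorted(seen.items())}


# Made-up rows: POI "sg:b" has no visitors until 2019 (e.g. not built yet).
rows = [
    {"month": "2018-06", "safegraph_place_id": "sg:a", "raw_visitor_counts": 120},
    {"month": "2018-06", "safegraph_place_id": "sg:b", "raw_visitor_counts": 0},
    {"month": "2019-06", "safegraph_place_id": "sg:a", "raw_visitor_counts": 130},
    {"month": "2019-06", "safegraph_place_id": "sg:b", "raw_visitor_counts": 45},
]

poi_counts = unique_pois_per_month(rows)
print(poi_counts)
```

If the count of contributing POIs rises over time in step with the aggregate, that would point towards POIs that did not exist (or were different businesses) earlier in the backfilled history.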

In general I would suggest examining some examples to see if you can find individual POIs that exhibit the same trend (or not) and whether that tells us anything.

The other thing I would suggest is to look at the underlying panel metrics (home_panel_summary number_devices_residing and normalization_stats, total_devices_seen and total_visits).

Do any changes in these metrics correlate with the trends you are observing in the foot-traffic? If so, that would suggest a methodological contamination that our normalization procedures are not fully controlling for.

(of course it is possible that real-world visits overall increased and SafeGraph panel size overall increased over time, but examining these factors specifically may still be informative).
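
One simple way to run this check is a Pearson correlation between a monthly panel metric and the monthly normalized foot-traffic series. The sketch below uses only the standard library. The function name and both series are hypothetical placeholders, not real panel values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Placeholder monthly series (made up): a panel metric such as total_devices_seen
# and the normalized visitor series for the same months.
panel_metric = [10.0, 11.0, 12.5, 13.0, 15.0]
normalized_visits = [0.20, 0.21, 0.24, 0.25, 0.29]

r = pearson(panel_metric, normalized_visits)
print(f"correlation: {r:.2f}")
```

A strong correlation would not prove contamination on its own, since real-world visits and panel size could both be growing, but it would flag which panel metric deserves a closer look.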

@Ryan_Fox_Squire_SafeGraph thanks, we’ll do your suggested exercise. We have seen this for individual POI which is what actually got us to start looking into this though unfortunately it was not a “clean” event. One validation measure we were thinking of looking into was traffic at major airports (LAX, JFK, etc.) In this context we know the POI’s geometry is pretty stable over time and there are no close geographic substitutes and we can use external data sources (such as data from the bureau of transportation statistics) to get a rough sense for how much traffic must be going through the POI. Let us know if you think that’s a reasonable exercise as well.

@Guy_Columbia yes I think that is a useful exercise, and individual airports also will publish their total visitors / onboards, etc. (this is something SafeGraph has used in the past to help benchmark).

I will be very keen to see your results.

We have also used Six Flags / Sea World (which publish attendance) and baseball stadiums (which publish attendance) for similar reasons.

@Guy_Columbia I have one more suggestion, which is to look at the ratio of visits:visitors overall in the panel (e.g., as presented in visit_panel_summary or normalization_stats).

I am not sure a priori if we have an expectation of whether this ratio should be growing or shrinking in the real world over time, but there are a variety of reasons that this ratio could change for methodological reasons around data collection, and that would be useful to understand as a possible additional source of variance
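
A minimal sketch of tracking that ratio over time, assuming it is computed as total_visits divided by total_devices_seen from normalization_stats (whether total_devices_seen is the right "visitors" denominator is an assumption). The `month` key, the function name, and the numbers are hypothetical.

```python
def visits_per_visitor(stats_rows):
    """Return {month: total_visits / total_devices_seen}, skipping empty months."""
    return {
        row["month"]: row["total_visits"] / row["total_devices_seen"]
        for row in stats_rows
        if row["total_devices_seen"]
    }


# Made-up monthly totals, not real normalization_stats values.
stats_rows = [
    {"month": "2018-06", "total_visits": 3000, "total_devices_seen": 1000},
    {"month": "2019-06", "total_visits": 4000, "total_devices_seen": 1000},
]

ratios = visits_per_visitor(stats_rows)
for month, ratio in sorted(ratios.items()):
    print(month, round(ratio, 2))
```

A ratio that drifts upward while the device count stays flat would suggest that more visits are being recorded per device, which is one of the data-collection effects mentioned above.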

@Ryan_Fox_Squire_SafeGraph good to know you’ve done some benchmarking, and @Guy_Columbia you’re looking into this. Some of my colleagues are working on similar kinds of benchmarking too, @Daniel_Ho_Stanford_University @Amanda_Stanford_RegLab @Neel_Guha @Julia_Wagenfehr_Stanford, will be nice to compare notes. Ryan have you published any of this benchmarking (which is different than the existing CoLab reports as far as I can tell), or could you share now?