The following questions pertain to the SafeGraph Spend data:
- Does every POI in the Spend data have a latitude/longitude and NAICS code in the Places data for the United States?
- How are consumers selected for the Spend data panel? Is it a random sample?
- Do the same consumers in the Spend data appear each month, or are consumers added and dropped each month?
- The articles Quantifying Sampling Bias in SafeGraph Spend and Validating Spend Data for Brands Against Company Reporting provide helpful analyses for 2020-21. Have you conducted similar analyses for 2019 or 2022-23?
- The Quantifying Sampling Bias in SafeGraph Spend article says that the Spend data are fairly representative of state-level populations over time. Can I check this for lower levels of aggregation (e.g., counties)?
- The same article also says, “it may be necessary to adjust Spend data based on the state sampling rate each month, in order to account for the variations in state sampling rate over time for some states.” Do you have any suggestions on how to do that?
- Many locations in the 2019 Ohio Spend data report zero expenditures on some days. This can occur multiple days in a row, and not only on weekends. Do you know why that may be?
- The Spend documentation says, “for some transactions, the date reported is instead the date processed by the financial institution, which is typically the next business day.” Do you know what fraction of transactions this applies to?
Hi @gschlauch, I can try to answer a few of these now and will provide more responses later today.
- If the Spend POI is associated with a brick-and-mortar location, there will be a latitude/longitude coordinate in Places. I’m not sure whether Spend currently includes any online retailers.
- The “How does SafeGraph build Spend data” question on this page provides some high-level insight here.
- We do not have access to any individual-level data, so we can’t identify whether a given consumer is added, dropped, or retained each month. Intuitively, I would strongly suspect the vast majority of the panel remains the same from month to month. That said, given how the Spend data is built, some consumers will certainly be added and dropped each month, so the panel may change over time.
- I’m not familiar with any similar bias analyses other than the ones you’ve already linked to. I believe all the data used to create the 2020-21 analyses are available to recreate corresponding analyses for other years.
- The Spend product has a customer_home_city column which may come in handy for analysis at lower-level aggregations.
- Generally, the recommended approach is to use the Panel Overview Data and Census population data to normalize the raw spend totals for each state each month.
- I would expect this is most common in low volume POIs and is largely due to sampling.
- Unfortunately, I’m not aware of the fraction of transactions this applies to. The final bullet point in the section you linked to provides some suggestions for handling this issue.
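For the state sampling rate adjustment, here is a minimal sketch of one way the normalization could look. This is an assumption on my part rather than an official SafeGraph method: it treats the monthly sampling rate as panel customers divided by Census population, then scales raw spend up by the inverse of that rate. The function name, the dictionary layout, and all of the numbers are hypothetical and only there for illustration; you would need to map them to the actual Panel Overview and Census fields you have.

```python
from typing import Dict, Tuple

# Hypothetical key: (state, month), e.g. ("OH", "2019-01").
Key = Tuple[str, str]


def scale_spend_to_population(
    raw_spend: Dict[Key, float],
    panel_customers: Dict[Key, int],
    population: Dict[str, int],
) -> Dict[Key, float]:
    """Scale raw spend by the inverse of the state's monthly sampling rate.

    Assumed definition: sampling_rate = panel_customers / population,
    so adjusted_spend = raw_spend / sampling_rate
                      = raw_spend * population / panel_customers.
    """
    adjusted: Dict[Key, float] = {}
    for (state, month), spend in raw_spend.items():
        customers = panel_customers.get((state, month), 0)
        pop = population.get(state, 0)
        # Skip state-months where a sampling rate cannot be computed.
        if customers <= 0 or pop <= 0:
            continue
        adjusted[(state, month)] = spend * pop / customers
    return adjusted


# Made-up numbers: the panel halves in February, and so does raw spend.
raw_spend = {("OH", "2019-01"): 5000.0, ("OH", "2019-02"): 2500.0}
panel_customers = {("OH", "2019-01"): 100, ("OH", "2019-02"): 50}
population = {"OH": 1000}

adjusted = scale_spend_to_population(raw_spend, panel_customers, population)
```

In this toy example the raw totals drop by half only because the panel shrank, and the adjusted totals come out equal across the two months. Whether a simple inverse-rate scaling is appropriate for your use case is worth checking against the bias analysis articles you linked.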
Great! Thank you for your detailed responses.