How to quantify sampling bias in SafeGraph Spend data?

SafeGraph recently announced a new product, SafeGraph Spend, the first places-based transaction dataset.

Spend measures how many dollars were spent at commercial points-of-interest based on a panel of 11MM+ monthly active bank and credit cards from ~9MM customers in the United States.

The first thing to understand about the SafeGraph Spend data is how representative it is. The question usually comes up in one of two forms:

  • “What about bias in your dataset?”; or
  • “Does your panel really represent the true American public?”

Check out this notebook I’ve been working on about quantifying sampling bias in SafeGraph Spend data. It is a technical tutorial for data scientists and analysts that sets out to answer one question:

How representative is the SafeGraph Spend dataset?
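
To give a flavor of what "quantifying sampling bias" can mean, here is a minimal sketch of one common approach: compare each region's share of panel customers with its share of the census population. This is only an illustration and not necessarily the method used in the notebook. The region labels, the counts, and the helper names (`shares`, `representation_ratio`, `pearson`) are all hypothetical and made up for this example.

```python
from math import sqrt


def shares(counts):
    """Convert raw counts per region into fractions that sum to 1."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def representation_ratio(panel_counts, census_counts):
    """Panel share divided by census share, per region.

    A ratio above 1 means the region is over-represented in the panel,
    below 1 means it is under-represented.
    """
    panel = shares(panel_counts)
    census = shares(census_counts)
    return {region: panel.get(region, 0.0) / census[region] for region in census}


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up numbers for illustration only (not real SafeGraph or census data).
panel_counts = {"A": 300, "B": 500, "C": 200}      # hypothetical panel customers
census_counts = {"A": 2000, "B": 5000, "C": 3000}  # hypothetical population

ratios = representation_ratio(panel_counts, census_counts)

# Correlate panel shares with census shares across the same regions.
regions = sorted(census_counts)
panel_shares = shares(panel_counts)
census_shares = shares(census_counts)
r = pearson(
    [panel_shares[k] for k in regions],
    [census_shares[k] for k in regions],
)

for region in regions:
    print(f"{region}: representation ratio = {ratios[region]:.2f}")
print(f"correlation between panel and census shares = {r:.2f}")
```

In this sketch, ratios near 1 and a correlation near 1 would suggest the panel tracks the population well along that dimension. The same comparison can be repeated for other dimensions, such as income or age group, wherever matching census figures are available.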

Let me know what you think, or if you have any questions!