SafeGraph recently announced a new product, SafeGraph Spend, the first places-based transaction dataset.
Spend measures how many dollars were spent at commercial points-of-interest based on a panel of 11MM+ monthly active bank and credit cards from ~9MM customers in the United States.
The first thing to understand about the SafeGraph Spend data is how representative it is. People usually phrase the question in one of two ways:
- “What about bias in your dataset?”; or
- “Does your panel really represent the true American public?”
Check out this notebook I’ve been working on about quantifying sampling bias in SafeGraph Spend data. It is a technical tutorial for data scientists and analysts, and it sets out to answer one question:
How representative is the SafeGraph Spend dataset?
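For a rough feel for what "quantifying sampling bias" can look like, here is a minimal sketch. It is not necessarily the approach the notebook takes. It compares each state's share of panel customers against its share of the census population, then summarizes agreement with a Pearson correlation. The state counts, variable names, and helper functions are all made up for illustration and are not real SafeGraph or census figures.

```python
from math import sqrt

# Toy, made-up counts for illustration only (NOT real SafeGraph or census data).
panel_customers = {"CA": 1200, "TX": 900, "NY": 500, "FL": 400}
census_population = {"CA": 4000, "TX": 3000, "NY": 2000, "FL": 1000}


def shares(counts):
    """Convert raw counts into each group's fraction of the total."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def sampling_bias(panel, census):
    """Panel share minus census share per group.

    Positive means the group is over-represented in the panel,
    negative means it is under-represented.
    """
    panel_share, census_share = shares(panel), shares(census)
    return {g: panel_share.get(g, 0.0) - census_share[g] for g in census}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


bias = sampling_bias(panel_customers, census_population)

states = sorted(census_population)
panel_share = shares(panel_customers)
census_share = shares(census_population)
r = pearson([panel_share[s] for s in states], [census_share[s] for s in states])

for s in states:
    print(f"{s}: bias = {bias[s] * 100:+.2f} percentage points")
print(f"Correlation between panel and census shares: {r:.3f}")
```

A per-group bias near zero and a correlation near 1 would suggest the panel tracks the population well along that dimension. The same comparison could be repeated for other breakdowns, such as income or age group, given matching census counts.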
Let me know what you think, or if you have any questions!