@Ryan_Fox_Squire_SafeGraph quick question: I am looking at the change in the devices-residing-at-home stats (weekly data) and I see a decline over the past three weeks. Is this right? I expected to see more devices at home, since people don’t travel anymore.

For that stat, they must remain home all day, right? If they leave to, say, get take-out or pick up groceries, they wouldn’t necessarily show up in that statistic, right?

I think their location is taken at nighttime

@Cornelia_UCB can you confirm exactly which dataset and which column you are looking at here?

Dataset == weekly_patterns/home_summary_file; column == number_devices_residing

@Cornelia_UCB is it the same issue discussed here?

Not really. I am only looking at the weekly summary files right now

And I am comparing the change in the # of devices residing from one week to another
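For concreteness, that week-over-week comparison might look something like this (a minimal pandas sketch with made-up counts; the real home_summary_file keys on census block group, with one `number_devices_residing` value per CBG per week):

```python
import pandas as pd

# Toy stand-ins for two weekly home_summary_file extracts (all counts made up).
week1 = pd.DataFrame({"census_block_group": ["010010201001", "010010201002"],
                      "number_devices_residing": [100, 200]})
week2 = pd.DataFrame({"census_block_group": ["010010201001", "010010201002"],
                      "number_devices_residing": [90, 210]})

# Join the two weeks on CBG and compute the relative change.
merged = week1.merge(week2, on="census_block_group", suffixes=("_w1", "_w2"))
merged["pct_change"] = (merged["number_devices_residing_w2"]
                        / merged["number_devices_residing_w1"] - 1)
print(merged["pct_change"].round(2).tolist())  # [-0.1, 0.05]
```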

@Cornelia_UCB OK. I think these might actually be related issues, and I am trying to confirm. I will get back to you soon.


Summary: Our sample size is truly decreasing week over week, and that is what you are seeing reflected in those numbers. The denominator (the total number of devices for which we receive data) is decreasing.

First, I just posted some more info on how we calculate the number_devices_residing in the weekly and monthly patterns summary files.

That is helpful background.


Second – the specific issue you are seeing has to do with a new development in the data that we are still figuring out how best to handle: the total number of devices in our sample population (panel) is decreasing across the country.

There are a couple of reasons that a device will exit our sample population (panel) on a given day.

  1. The device uninstalls or deletes the app that is collecting location data.
  2. The device opts out of location sharing in the app on their phone.
  3. Data for that device is collected by a particular app on the phone, and that app stops sending data to SafeGraph (from all of its devices).
  4. The device is turned off all day or does not generate any data that day.

It is not #3, and it is unlikely to be #2. We see some devices leave the panel every day, driven by #1. However, these large changes are probably being driven by #4. Specifically, many smartphone location services generate more GPS data the more a phone moves around. This is not true for every device, and it depends on exactly how the location service is implemented by the app on the phone. Our current hypothesis is that many of these implementations require the phone to move for GPS data to be collected, so when a phone sits still all day, no GPS data is generated and that device disappears from our sample.

Of course this is a problematic sampling bias. The challenge is: when a device doesn’t send any data, how do you know whether that is because it isn’t moving or because it has left the panel for one of the other reasons?
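A toy illustration of why the two cases are indistinguishable (made-up device IDs; assume the daily panel is effectively the set of devices that produced at least one GPS ping that day):

```python
# Devices observed pinging on each day (hypothetical).
pings = {
    "mon": {"device_a", "device_b", "device_c"},
    "tue": {"device_a"},  # device_b stayed home (no pings); device_c uninstalled
}

# From the data alone, device_b and device_c look identical on Tuesday:
# both are simply absent from the panel.
panel_mon = len(pings["mon"])
panel_tue = len(pings["tue"])
print(panel_mon, panel_tue)  # 3 1
```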

There isn’t a clear cut answer for this, but this is something we are actively thinking about.

I welcome your feedback and ideas on how to handle this.

@Ryan_Fox_Squire_SafeGraph is your method of collection changing over time? If not, one might expect that significant changes in the number of detected devices are coming from reason 4 above (devices that do not move around and therefore do not send GPS data). It might be safe to use devices in Jan 2020 as a proxy for devices in future months, unless there is a strong reason to believe the sample has changed for some other reason…
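If that assumption holds, the rescaling could be as simple as multiplying each week’s counts by the ratio of the Jan 2020 panel size to that week’s panel size (all numbers below are made up for illustration):

```python
# Hypothetical panel sizes and one CBG's raw count.
jan_panel_size = 20_000_000   # assumed baseline panel size, Jan 2020
week_panel_size = 18_000_000  # assumed shrunken panel for the current week
devices_residing = 450        # raw number_devices_residing for one CBG

# Rescale so declines driven purely by panel shrinkage cancel out.
adjusted = devices_residing * jan_panel_size / week_panel_size
print(round(adjusted))  # 500
```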

@Luis_E_Quintero there is no reason to expect the sample has changed in any major way since Jan 2020.

However, we do know that there is a baseline number of devices leaving the panel (and new devices joining) even in normal times.

Even 1 or 2% churn per week can compound into a lot of turnover over 3 months.
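Back-of-envelope for that compounding, taking 3 months as roughly 13 weeks:

```python
# Weekly churn compounds multiplicatively across weeks.
weekly_churn = 0.02
weeks = 13  # roughly 3 months
retained = (1 - weekly_churn) ** weeks
print(f"~{1 - retained:.0%} of the original panel has churned")  # ~23%
```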

This is not something we have invested heavily in modeling or quantifying in the past.

Your suggestion is not a bad idea and may be worth trying.