I'm comparing the 2019, 2020 and 2021 visit counts in the Weekly Pattern data to account for seasonality

Hello all, I’m comparing the 2019, 2020 and 2021 visit counts in the Weekly Patterns data to account for seasonality. I noticed the number of visits is significantly higher in Jan 2020 compared to Jan 2019. The raw visit count is ~30% higher. After taking into account the number of devices, the 2020 visit count is still ~10% higher than the 2019 visit count.

Do you have any explanation for why the number of visits in Jan 2020 is significantly higher than in Jan 2019? Is it related to some change in the data? Also, why is the device count much higher in Jan 2020 than in Jan 2019? I saw a previous discussion on the methodology change from v2.0 to v2.1, but I guess that only applies to the social distancing data? I guess we can also rule out the effect of “inactive devices”, since COVID and lockdowns hadn’t started in Jan 2020.

Specifically, I’m looking at the raw daily visits in the week starting Jan 7, 2019 compared to the week starting Jan 6, 2020. Daily visits are aggregated weekly. All analysis is at the national level.
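For reference, here is a minimal sketch of the kind of comparison I'm doing. All numbers and field names are illustrative, not the actual SafeGraph schema; in practice the visits come from Weekly Patterns and the device counts from normalization_stats:

```python
# Hypothetical national weekly aggregates (made-up numbers):
# visits from Weekly Patterns, devices from normalization_stats.
weeks = {
    "2019-01-07": {"raw_visits": 100_000_000, "devices_seen": 18_000_000},
    "2020-01-06": {"raw_visits": 130_000_000, "devices_seen": 21_000_000},
}

def visits_per_device(week):
    """Normalize raw visits by panel size to adjust for panel growth."""
    return week["raw_visits"] / week["devices_seen"]

w19, w20 = weeks["2019-01-07"], weeks["2020-01-06"]

raw_change = w20["raw_visits"] / w19["raw_visits"] - 1
norm_change = visits_per_device(w20) / visits_per_device(w19) - 1

print(f"raw YoY change:        {raw_change:+.1%}")   # +30.0%
print(f"normalized YoY change: {norm_change:+.1%}")  # +11.4%
```

Even after dividing by the device count, the year-over-year gap doesn't go away, which is what puzzles me.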

Hi @Sissi_Li_American_Enterprise_Institute, maybe this thread will help


let me know if this helps with what you are seeing or not

@Jack_Lindsay_Kraken1 Thanks for your response. I did come across this post when I searched for the answer, but my question is slightly different from the issue pointed out in that thread: 1) I’m looking at the spike in Jan 2020 in the raw device count from normalization_stats, not the completely-at-home device count - why is there an increase in the devices tracked in Jan 2020 to begin with? 2) I believe the v2.0 to v2.1 methodology change only applies to the social distancing data, so it shouldn’t be the cause of this weird spike, right?

Ah I think I get the question now. Let's start with the first one. The number of devices tracked changes from time to time as the sources SafeGraph gets its data from are updated (i.e. partners are added / removed)

For number 2 I will look into a bit more and get back to you

@Jack_Lindsay_Kraken1 Thanks for the reply! Following up on your first point, is there also a possibility that the number of devices decreases due to inactivity (i.e. people staying home all day)? Do we know how much each factor (inactivity/source updates) accounts for the change in the number of devices? Knowing this will be super helpful for my research.

@Sissi_Li_American_Enterprise_Institute following up here: no, inactive devices - to my knowledge - should not affect the total number of devices; they should be logged nonetheless

It looks like the first 3 months of 2020 are just wonky and there isn’t really a clear answer for that. The thread I ended up digging up is this one:


It appears as though the TL;DR is - “Who’s to say the visits shouldn’t be higher?” (yes, I know there was a pandemic starting in 2020, but who knows - that was also close to the great toilet paper crisis of 2020 haha)

I would say that if you feel confident the visits in the first part of 2020 are wrong, you could impute them (e.g. fill them in with values drawn from a normal error distribution) or just skip those months entirely

The top 3 solutions I would recommend are:

  1. Mean or Median Imputation
  2. Multivariate Imputation by Chained Equations (MICE)
  3. Random Forest
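To make option 1 concrete, here is a minimal sketch of median imputation on a monthly series, with entirely made-up numbers; `None` marks the early-2020 months treated as untrustworthy. (MICE and random-forest imputation would typically lean on libraries like scikit-learn's IterativeImputer rather than hand-rolled code.)

```python
from statistics import median

# Made-up monthly visit counts (in millions); None marks the "wonky"
# early-2020 months we've decided not to trust.
visits = [95, 98, 102, 97, None, None, None, 99, 101, 96]

def impute_median(series):
    """Replace missing values with the median of the observed values."""
    observed = [v for v in series if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in series]

print(impute_median(visits))
# [95, 98, 102, 97, 98, 98, 98, 99, 101, 96]
```

Swapping `median` for `mean` gives mean imputation; the median is just a bit more robust to the occasional extreme month.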

I am really sorry I couldn’t be of more support here and please feel free to ask more questions for clarification and I will do my best to help!

@Jack_Lindsay_Kraken1 Thank you for following up and pointing me to resources on YoY standardization. Glad this hasn’t been forgotten!
I have one (hopefully) last question regarding the effect of inactive devices on sample size. I’ve seen previous discussions attributing the loss in device count to lockdowns (Workspace Deleted | Slack), which is why I feel normalizing by the device count may not always be best practice: it’s really hard to disentangle the effect of people staying at home from the effect of more devices being added to the panel. I actually found a thread that pretty much summarizes my concern:
Workspace Deleted | Slack. I’m just wondering if SafeGraph has done anything on this front to solve the issue?

I am sorry @Sissi_Li_American_Enterprise_Institute, I am not sure I fully understand - are you referring to how the data is being collected (i.e. via GPS) affecting the number of devices? If so, I feel it is minimal enough to ignore. On my phone, at least, if it isn't in low power mode the GPS services are being pinged all the time - even at home. I believe applying a standard error correction is enough to account for anything a change in GPS collection would cause.

Something I have seen that can help you sanity check your numbers is testing your sums against a static, semi-accurate reference count - like attendance at a football game/stadium.
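A minimal sketch of that sanity check, with every number made up for illustration: scale the panel visits at one POI up to the population and compare against a published attendance figure.

```python
# Hypothetical sanity check: compare panel-scaled visits at a POI with a
# known "ground truth" count, e.g. published stadium attendance.
panel_visits = 1_800            # visits the panel recorded at the stadium (made up)
panel_devices = 18_000_000      # devices in the panel that month (made up)
us_population = 330_000_000     # rough US population

scaling_factor = us_population / panel_devices
estimated_attendance = panel_visits * scaling_factor

reported_attendance = 35_000    # the team's published figure (made up)
ratio = estimated_attendance / reported_attendance
print(f"estimated: {estimated_attendance:,.0f}")  # estimated: 33,000
print(f"ratio vs reported: {ratio:.2f}")          # ratio vs reported: 0.94
```

If that ratio is far from 1 across several events with known counts, your scaling is probably off somewhere.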

nothing, to my knowledge, has been done to change the GPS behavior, because it is not in SafeGraph's hands but rather those of the sources of the GPS data - does that make sense?

@Jack_Lindsay_Kraken1 Yes, I was referring to the effect of behavior changes on the number of devices. I have this concern because I saw another thread where the large decrease in devices in March/April 2020 was attributed to people staying at home: Workspace Deleted | Slack.
Can you elaborate more on the sanity check you proposed?

Hi @Sissi_Li_American_Enterprise_Institute, before I start linking you to threads, I got this idea from a post @Nick_H-K_Seattle_University did a while back, but I am not sure if it applies here - he may be able to make sure it makes sense before I link you to something.

TL;DR @Nick_H-K_Seattle_University: Can something with a recorded visitor count (like a football game) be used to sanity check your population scaling?

Yep! It’s a great way to check scaling

@Sissi_Li_American_Enterprise_Institute, stumbled across this and thought it might be able to help you out in your scaling journey