Anomaly in home_panel_summary

Hey I sent in my email for the SG Open Census data. How long does it usually take for the email to come in?


This topic was automatically generated from Slack. You can find the original thread here.

Hey @Alexander_Audet_American_Enterprise_Institute - you should receive the email within a few minutes. I emailed you about this, so keep your eyes peeled for an email from me.

Thanks! It’s been received

I have a separate, unrelated question. Should I post it in this thread, or in help?

I’m basically noticing an anomaly in the home_panel_summary, where there’s a large increase in the number of residing devices between January 2020 and February 2020. I haven’t adjusted the numbers at all, but when I collapse them by LEAID (school district) or by time ([year, month, week]), the surge in devices appears. Did SG change their sampling at all around that time?

Hey @Alexander_Audet_American_Enterprise_Institute - that’s not a known issue in our data. We did see some bumps in the home panel summary around November 2021. Mind sharing some more details around what you’re seeing? Also looping in your CSM @austinlwheat who can help work with you on this issue.

Sure thing, I’m happy to share an output. I’m reproducing a table right now

Essentially, I took each home_panel_summary file and appended all of them to create a list of panel devices over years, months and weeks (time points come from where the file is stored). Then, I take comparable time points from before the pandemic and after (covid vs norm) to compare how pools of devices have changed year over year.

When I compare Jan 2020 to Feb 2020, there’s a consistent bump of ~20% between Jan 2020 Week 4 and Feb 2020 Week 1 across all counties or school districts. When merging pre & post, I’m merging on an id that uses the state, cbg, month and week. I then drop duplicates within each year, month and week.
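To make the steps above concrete, here is a minimal sketch of the dedupe-then-sum logic in plain Python. The field names (state, cbg, year, month, week, devices) and the sample rows are hypothetical placeholders, not the actual columns or values in the files.

```python
from collections import defaultdict

# Hypothetical rows standing in for the appended panel files.
# Field names and values are illustrative assumptions only.
rows = [
    {"state": "xx", "cbg": "000000000001", "year": 2020, "month": 1, "week": 4, "devices": 100},
    {"state": "xx", "cbg": "000000000001", "year": 2020, "month": 1, "week": 4, "devices": 100},  # duplicate row
    {"state": "xx", "cbg": "000000000002", "year": 2020, "month": 1, "week": 4, "devices": 50},
    {"state": "xx", "cbg": "000000000001", "year": 2020, "month": 2, "week": 1, "devices": 120},
    {"state": "xx", "cbg": "000000000002", "year": 2020, "month": 2, "week": 1, "devices": 60},
]


def weekly_totals(rows):
    """Drop duplicate (state, cbg, year, month, week) rows, then sum devices per week."""
    seen = set()
    totals = defaultdict(int)
    for r in rows:
        key = (r["state"], r["cbg"], r["year"], r["month"], r["week"])
        if key in seen:
            continue  # keep only the first row for each id within a week
        seen.add(key)
        totals[(r["year"], r["month"], r["week"])] += r["devices"]
    return dict(totals)


totals = weekly_totals(rows)
```

Note that the id here includes the year as well as the month and week, so the same CBG in Jan 2020 and Jan 2021 is never treated as a duplicate.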

If we can’t discern the cause of the bump in devices, could we schedule a call to review what’s happening? I know that’s a lot to ask, but this is a blocker for a major piece of research that AEI is trying to publish.

Also, there isn’t a comparable bump between Jan 2021 and Feb 2021

Here’s the file. Each LEAID is a school district, and there’s the corresponding device counts for each time point. If you look in the residing_devices_norm column, there’s usually a large bump in devices between Jan 2020 and Feb 2020. Let me know what you think!

Last question, back to the cbg open census data: would that literally be the same data pulled from the census website? Specifically referring to the ACS 5-year population estimates

@Jeff_Ho_SafeGraph

@Alexander_Audet_American_Enterprise_Institute How are you defining “Week 1, 2, 3, 4, etc” here? I see the first week of January was Jan 1 - Jan 7, 2020, but home_panel_summary would always cover the Mon to Sun week, i.e., Dec 30 to Jan 5.
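For reference, a small sketch of how a calendar date maps to the Mon to Sun week that contains it, using only the Python standard library. The function name monday_week is made up for illustration.

```python
from datetime import date, timedelta


def monday_week(d):
    """Return (Monday, Sunday) of the Mon-to-Sun week containing date d."""
    start = d - timedelta(days=d.weekday())  # weekday(): Monday == 0
    return start, start + timedelta(days=6)


start, end = monday_week(date(2020, 1, 1))
print(start, end)  # 2019-12-30 2020-01-05
```

So a "Week 1" defined as Jan 1 - Jan 7 straddles two different Mon to Sun weeks.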

Sometimes our panel increases in size due to better recruitment of apps, etc - that’s my first thought when I read this - such fluctuations are normal if it’s not an artefact caused by the merging process here.

Besides the bump in number of devices in the sample, is there an underlying insight (i.e., trendline, etc) that is hard to explain? Happy to help draft a response to a reviewer if helpful.

For the CBG open census data, it’s the same as from the census website, just packaged in a much easier and more digestible way.

Hey Jeff, thanks for joining in! So the files for SG are usually nested in a 2020 folder, then a 5 folder for May, then a 4 folder for that date (Picking arbitrary dates as an example). To make it easier to merge weeks, I just assign a value to order the weeks (e.g. 1, 2, 3, 4) which isn’t perfect, but works for most months.
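One possible alternative to a 1 to 4 week index, sketched below, is to turn the folder names into an actual date and order the files by that. This assumes the three folders are year, month and day of month; the example paths and the function name week_start_from_path are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def week_start_from_path(path):
    """Read <year>/<month>/<day>/<file> from the end of a path and return a date.

    Assumes the three folders directly above the file are year, month, day.
    """
    parts = PurePosixPath(path).parts
    year, month, day = (int(p) for p in parts[-4:-1])
    return date(year, month, day)


# Hypothetical paths, for illustration only.
paths = [
    "panel/2020/5/4/file.csv",
    "panel/2020/1/27/file.csv",
    "panel/2020/2/3/file.csv",
]
ordered = sorted(paths, key=week_start_from_path)
```

Sorting on a real date keeps weeks in order across month boundaries, including months that contain five file dates.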

We see a trend where there’s about an 18-30% increase in home residing devices when summing residing devices at the weekly level. What do you mean by recruitment of apps?

Good to know about the census data!

I’m working on a ZIP to reproduce what I’m seeing for Chris, and I can forward it to you as well. I’m just adding comments so it’s legible at the moment.

Hey Jeff, sorry for the delay in getting back to you. We see a fluctuation in devices that, in almost all instances, appears stable. However, between Jan 2020 and Feb 2020, we found a dramatic increase in residing devices of 18-30% at most aggregated levels.
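For clarity on how a figure like this can be computed, here is a minimal sketch of a week-over-week percent change. The totals below are made-up numbers for illustration, not values from the shared file.

```python
def pct_change(before, after):
    """Percent change from one weekly device total to the next."""
    if before == 0:
        raise ValueError("baseline total is zero; percent change is undefined")
    return (after - before) * 100.0 / before


# Illustrative totals only (not real data).
jan_week4_total = 1000
feb_week1_total = 1200
bump = pct_change(jan_week4_total, feb_week1_total)
print(f"{bump:.1f}%")  # 20.0%
```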