Potential data error in Advan Weekly Patterns

Recently, our research group has been exploring the Advan Weekly Patterns dataset. We found some errors that are stopping us from continuing our research.
First, we used one attribute: “visitor_home_cbgs”. We extracted all visitor counts for CBGs in Erie County, summed them, and then divided by “number_devices_residing”. This represents how many visitors from each home CBG visit all types of POIs in a week. Below are the visitor count curves. Note that each curve represents one CBG.
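For reference, the normalization described above can be sketched roughly as follows. This is only a minimal sketch, not our exact code. It assumes that “visitor_home_cbgs” is stored as a JSON string mapping home CBG to visitor count, that Erie County means Erie County, New York (FIPS prefix 36029), and that the panel sizes have already been loaded into a dict. The function name and the toy rows are hypothetical.

```python
import json
from collections import defaultdict

ERIE_COUNTY_FIPS = "36029"  # assumption: Erie County, New York


def visitor_rate_by_cbg(poi_rows, devices_residing, county_fips=ERIE_COUNTY_FIPS):
    """Sum visitors per home CBG over all POIs in one week,
    then divide by the panel size (number_devices_residing) of that CBG."""
    visitors = defaultdict(int)
    for row in poi_rows:
        # Assumption: the column holds a JSON object such as {"360290001001": 8}
        home_cbgs = json.loads(row["visitor_home_cbgs"] or "{}")
        for cbg, count in home_cbgs.items():
            if cbg.startswith(county_fips):
                visitors[cbg] += count

    rates = {}
    for cbg, total in visitors.items():
        panel = devices_residing.get(cbg)
        if panel:  # skip CBGs with a missing or zero panel size
            rates[cbg] = total / panel
    return rates


# Toy data for one week (made-up numbers, not real Advan values)
week_rows = [
    {"placekey": "poi-a", "visitor_home_cbgs": '{"360290001001": 8, "360290002001": 4}'},
    {"placekey": "poi-b", "visitor_home_cbgs": '{"360290001001": 2, "421010001001": 5}'},
    {"placekey": "poi-c", "visitor_home_cbgs": ""},
]
panel = {"360290001001": 100, "360290002001": 50}

rates = visitor_rate_by_cbg(week_rows, panel)
print(rates)
```

Running this once per weekly file and plotting the rate for each CBG over time gives one curve per CBG.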


The curves are extremely weird. Somehow they all show a monthly cycle effect. Common sense says these curves should not fluctuate this wildly.
In addition, we also used “raw_visitor_counts” to visualize visitor counts for POIs in CBGs in Erie County. This represents how many visitors visit all types of POIs in a given CBG inside Erie County. Please see the curves below:

As you can see, it’s also extremely weird.

Note that we did not only test CBGs from Erie County; we also tested CBGs from other places.

We think this dataset has a potential data error. We subscribed to Dewey only for this dataset. Right now, we can’t continue our research. Could you also test this dataset and fix these errors?

The panel of devices used by Advan is changing regularly, but I believe the summary files are only updated monthly. I just spot checked a few examples of the home_panel_summary and the values are the same for number_devices_residing for a CBG if you compare weeks within the same month. However, if you compare weeks from two different months, the numbers are different. So if you have weekly visits, but you’re effectively normalizing them by monthly panel data, you’re going to see these monthly spikes.
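If you want to run that spot check more systematically, something like the sketch below might work. It assumes you have already collected (week start date, CBG, number_devices_residing) tuples from the weekly home_panel_summary files. The function name and the toy values are hypothetical, and a week is assigned to the month of its start date.

```python
from collections import defaultdict
from datetime import date


def distinct_panel_values_per_month(weekly_panel):
    """weekly_panel: iterable of (week_start, cbg, number_devices_residing).
    Returns {(cbg, year, month): number of distinct panel values seen}."""
    seen = defaultdict(set)
    for week_start, cbg, devices in weekly_panel:
        seen[(cbg, week_start.year, week_start.month)].add(devices)
    return {key: len(values) for key, values in seen.items()}


# Toy values (made up) for a single CBG
toy_panel = [
    (date(2022, 10, 3), "360290001001", 120),
    (date(2022, 10, 10), "360290001001", 120),
    (date(2022, 10, 17), "360290001001", 120),
    (date(2022, 11, 7), "360290001001", 95),
    (date(2022, 11, 14), "360290001001", 95),
]

counts = distinct_panel_values_per_month(toy_panel)
# True when every CBG has exactly one panel value per month,
# which is what a monthly-updated summary file would look like
looks_monthly = all(n == 1 for n in counts.values())
print(counts, looks_monthly)
```

If `looks_monthly` comes out true on the real files, the denominator is a step function that changes once a month, which would be consistent with the monthly jumps in the normalized curves.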

Hi Evan, Thanks for your reply.
Could you explain more and tell us how to solve this problem?

FYI, the first figure is normalized by “number_devices_residing”, so, as you said, we are going to see these monthly spikes. Does that mean Dewey does not provide weekly home_panel_summary files?

Second, the second figure is not normalized. We simply use “raw_visitor_counts”. It still has these monthly spikes.

We tested this dataset in many ways and checked our code many times. It seems Advan Weekly Patterns does have this type of data error. We hope you can address it and help us solve this problem.

I’m not 100% sure on the weekly panel stats. I just spot checked a few examples, but it’s worth exploring deeper. It would help explain the monthly lumpiness for the type of analysis you’re doing.

There have been some previous resources posted in the community about normalizing patterns data over time. It’s definitely a challenge when using the data this way, with no single solution.

I’m not sure how well this correlates with Advan’s Patterns data because they do have methodology differences: SafeGraph <> Advan Methodology Differences

Thank you, Evan, I will look into it. Thank you so much for providing the materials.

Thanks Evan, I fixed that problem. Thanks for your help!
Please see the method below:
First, we used one attribute: “visitor_home_cbgs”. We extracted all visitor counts for CBGs in Erie County, summed them, and then divided by “number_devices_residing”. This represents how many visitors from each home CBG visit all types of POIs in a week. Below are the visitor rate curves. Note that each curve represents one CBG.
This is the same as the first figure.

However, I also found that there are sudden spikes in 1/2023.
See the figure below:


I also found that other researchers have posted similar questions:

Could you fix this problem for us? Thank you so much!

Hi @Ryan_Zhenqi_Zhou_SUNY_Buffalo glad you figured this out! This will be helpful context for others to see.

Apologies for the delay, I had to wait for confirmation from Advan but we are still waiting on a restatement of their Normalization Stats files starting 1/2023 and beyond. That should help with this issue. They targeted releasing the updated files in the next two weeks.

Thanks, Evan, for your reply. We will wait!

Hi, Evan, for the weekly data, is there something wrong with the month of 12/2022?
We tested Buffalo:


Houston:

Miami:

They all drop suddenly in 12/2022.
I also looked at “visit_panel_summary.csv”; the total visit count also drops suddenly in 12/2022 compared to other months.
I would like to know why. Also, I tested these three cities for 2021 and 2020, and everything seems normal. Please see the example below for Buffalo in 2021.

Btw, I also found missing data in the weeks of 12282020 and 11232020; these two weeks only have 168 CSV files (there are usually 170).
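A quick way to spot weeks like these is to count the CSV files per weekly folder and flag any week that has fewer than the most common count. Below is a minimal sketch. The function name is hypothetical, and the file counts are assumed to have been collected already (for example by listing each weekly folder); apart from the two weeks mentioned above, the dates in the toy input are made up.

```python
from collections import Counter


def weeks_with_missing_files(file_counts):
    """file_counts: {week label: number of CSV files found for that week}.
    Returns {week label: how many files short of the most common count}."""
    expected, _ = Counter(file_counts.values()).most_common(1)[0]
    return {week: expected - n for week, n in file_counts.items() if n < expected}


# Toy input: 168 vs 170 follows the post above, the other weeks are illustrative
file_counts = {
    "2020-11-16": 170,
    "2020-11-23": 168,
    "2020-11-30": 170,
    "2020-12-21": 170,
    "2020-12-28": 168,
}

short_weeks = weeks_with_missing_files(file_counts)
print(short_weeks)
```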

Thank you so much!

Hi, Evan, I further visualized the US nationwide visits curve from the “visit_panel_summary.csv” files in 2022. Please see the figures below:


As you can see, num_visits fell sharply in 12/2022.
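To put a number on the drop instead of relying on the plot, the monthly totals can be compared month over month. The sketch below assumes the num_visits values have already been summed per month; the function name, the 30% threshold, and the toy totals are all hypothetical.

```python
def sharp_drops(monthly_totals, threshold=0.3):
    """monthly_totals: list of (month label, total visits) in chronological order.
    Returns the months whose total fell by more than `threshold`
    (as a fraction) relative to the previous month."""
    flagged = []
    for (_, prev), (month, cur) in zip(monthly_totals, monthly_totals[1:]):
        if prev > 0 and (prev - cur) / prev > threshold:
            flagged.append(month)
    return flagged


# Toy totals (made up, not real num_visits values)
toy_totals = [
    ("2022-09", 1000),
    ("2022-10", 980),
    ("2022-11", 1010),
    ("2022-12", 600),
]

flagged_months = sharp_drops(toy_totals)
print(flagged_months)
```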

Do you have any suggestions for fixing that?

Hi, Evan. I was also thinking that maybe the POIs changed drastically starting in 12/2022, and that this could be why the number of visits fell sharply. So I also tested whether 12/2022 and 11/2022 share the same POIs. After testing, they do share the same POIs.
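The POI comparison is basically a set comparison on placekeys. Here is a minimal sketch, assuming the placekeys of each month have already been read into lists; the function name and the toy placekeys are hypothetical.

```python
def compare_poi_sets(placekeys_a, placekeys_b):
    """Compare the POIs present in two periods by placekey."""
    a, b = set(placekeys_a), set(placekeys_b)
    return {
        "only_in_a": a - b,  # POIs that disappeared in the second period
        "only_in_b": b - a,  # POIs that first appear in the second period
        "shared": a & b,
    }


# Toy placekeys (hypothetical)
november = ["pk-1", "pk-2", "pk-3"]
december = ["pk-3", "pk-2", "pk-1"]

result = compare_poi_sets(november, december)
same_pois = not result["only_in_a"] and not result["only_in_b"]
print(same_pois)
```

Note that this only checks whether a POI row exists in both months. A POI can be present in both and still have empty visit values in one of them, so it does not rule out missing visits.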

For now, I still can’t figure out why the number of visits fell sharply in 12/2022. Could you send this potential issue to Advan?

Thank you very much!

Hi, Evan. I may have found the potential reason. I compared the visit counts on 11/07/2022 and 12/05/2022 as an example.


As you can see in the screenshot above, some POIs have a huge number of visits on 11/07/2022, but the same POIs have a “nan” value on 12/05/2022.
This may be why the visit count fell sharply in 12/2022.
Could you report this issue to Advan?

Many thanks,
Ryan

Hi Evan,

I filtered out 3 examples that you can easily check. Please see the table below:


Could you check, and ask Advan to check, why “Apple Walden Galleria”, “Billy Beez”, and “Airport Taxi Service” have a lot of visits on 11/7 and then no visits on 12/05?

I would appreciate it if you could reply when you see this message.

Many thanks,
Ryan

Thanks. I’ll share this with them

Hi Evan, sorry to interrupt again.
I just want to provide a simpler way to report the missing data issue in 12/2022.
For example, you can simply check this POI: “zzw-222@63q-s8g-92k, Buffalo Niagara International Airport”. We all know an international airport will have many visits (it has a huge visit count in every week of 01/2022 - 11/2022), but it has a “nan” visits value in every week of 12/2022.
This is just one simple example. A lot of other POIs also have “nan” visit counts in every week of 12/2022, but have huge visit counts in every week of 01/2022 - 11/2022.
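The pattern described above (visits in every week before 12/2022, “nan” in every week from 12/2022 on) can be checked for all POIs at once. Below is a minimal sketch, assuming the weekly visit counts have already been collected per placekey with ISO week-start dates as keys. The function names, the toy placekeys, and the toy counts are hypothetical.

```python
import math


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pois_gone_dark(visits_by_week, cutoff):
    """visits_by_week: {placekey: {"YYYY-MM-DD" week start: visit count or NaN}}.
    Returns placekeys that have a value in every week before `cutoff`
    and a missing value in every week on or after `cutoff`."""
    flagged = []
    for placekey, weeks in visits_by_week.items():
        # ISO date strings sort chronologically, so string comparison is enough
        before = [v for w, v in weeks.items() if w < cutoff]
        after = [v for w, v in weeks.items() if w >= cutoff]
        if (before and after
                and not any(is_missing(v) for v in before)
                and all(is_missing(v) for v in after)):
            flagged.append(placekey)
    return flagged


nan = float("nan")
# Toy data (made-up counts)
toy_visits = {
    "poi-airport": {"2022-11-07": 5200, "2022-11-14": 5100, "2022-12-05": nan, "2022-12-12": nan},
    "poi-cafe": {"2022-11-07": 40, "2022-11-14": 38, "2022-12-05": 35, "2022-12-12": 41},
    "poi-always-empty": {"2022-11-07": nan, "2022-11-14": nan, "2022-12-05": nan, "2022-12-12": nan},
}

gone_dark = pois_gone_dark(toy_visits, cutoff="2022-12-01")
print(gone_dark)
```

POIs that were already empty before the cutoff are deliberately not flagged, so the result only contains places whose visits disappear at the cutoff.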

This is absolutely a data error. Please let Advan know so they can fix the dataset. This would benefit all the researchers who subscribe to Dewey data and use Advan’s dataset.

We’ve restated the Monthly Patterns files with new normalization files, but we’re still waiting for an update on Weekly Patterns.

I also heard back from Advan regarding the three Placekeys you shared: The first two placekeys you shared (22g-223@63q-rt9-skf & zzw-22x@63q-rt9-st9) are the same polygon. The reason you are not seeing any visits for these placekeys is because the polygon is “bad” and we do not run it. It was run initially before we started filtering these out. It is filtered because of bad geofencing, i.e. polygons get denied by our internal checks due to an unreasonable size or number of vertices.

The third placekey (zzy-224@63q-rt7-j35) is a result of the same situation. It was corrected later but is not capturing any traffic as it is now a geofence of a tiny shed in someone’s backyard.

It sounds like this means they started filtering out certain “bad” POI in December 2022 but haven’t backfilled the data yet. So it’s probably best to drop from your analysis any POI whose data suddenly goes missing on this date. I’ve asked if they have a list of POI or if they can backfill the data to make this easier.

@Ryan_Zhenqi_Zhou_SUNY_Buffalo Here’s a list of Placekeys that have been dropped from the Advan data because of some issue with the Place. I recommend dropping these from any longitudinal analysis:

Hi, Evan, thank you so much for “excluded_placekeys.csv”. This is indeed very helpful!
I have two follow-up questions that I hope you can answer.

Question 1: I understand that if a polygon is “bad”, Advan has excluded it since 12/2022. Again, take Buffalo Niagara International Airport as an example. The placekey_primary is “zzw-222@63q-s8g-92k”, and the placekey_shared values are “zzy-222@63q-rt7-j35”, “zzy-222@63q-s8g-fpv”, “zzy-224@63q-rt7-j35”, and “222-222@63q-s8g-g6k”. In every week of 12/2022, all of these places have nan visits. Isn’t that odd? At least one of them should have visits, right? Surely not all of them have “bad” polygons? Buffalo Niagara International Airport is just one simple example. Advan excluded lots of places with complex buildings in 12/2022, such as airports and shopping malls. Since these places normally have lots of visits, excluding all of them will affect data integrity. Could you check with Advan again?

Question 2: I don’t understand “We’ve restated the Monthly Patterns files with new normalization files, but we’re still waiting for an update on Weekly Patterns.”
Are the new “normalization files” the “normalization_stats.csv” files? If so, they are already in the File Browser for Weekly Patterns.


And I’m wondering how the normalization files relate to the “data error” I reported.

If you see these questions, I would appreciate a reply!

Many thanks,
Ryan

The intent of the excluded_placekeys file is to exclude those completely from your analysis, not just starting in 12/2022. It’s encouraged to not use those POI in any historical analysis, since they were dropped after determining that visits were potentially not calculating properly for those POI. Eventually, we’ll try and exclude these from the historical data ourselves so users don’t have to do this on their end.
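In practice that could look something like the sketch below. It assumes the excluded file has a header column named “placekey” (please check the actual header in excluded_placekeys.csv) and that each patterns row carries a “placekey” field. The helper names and toy rows are hypothetical.

```python
import csv
import io


def load_excluded(csv_text, column="placekey"):
    """Read the excluded placekeys into a set.
    Assumption: the file has a header row with a `column` field."""
    return {row[column] for row in csv.DictReader(io.StringIO(csv_text))}


def drop_excluded(rows, excluded):
    """Remove every row whose placekey is on the excluded list,
    regardless of which week the row belongs to."""
    return [row for row in rows if row["placekey"] not in excluded]


# Toy stand-in for the contents of excluded_placekeys.csv
excluded_csv = "placekey\npk-bad-1\npk-bad-2\n"

# Toy patterns rows spanning weeks before and after 12/2022 (made-up counts)
rows = [
    {"placekey": "pk-bad-1", "week": "2022-06-06", "visits": 900},
    {"placekey": "pk-ok-1", "week": "2022-06-06", "visits": 40},
    {"placekey": "pk-bad-1", "week": "2022-12-05", "visits": None},
    {"placekey": "pk-ok-1", "week": "2022-12-05", "visits": 37},
]

excluded = load_excluded(excluded_csv)
kept = drop_excluded(rows, excluded)
print(kept)
```

The filter is applied to every week, not only to weeks from 12/2022 on, so the longitudinal series stays consistent.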

At this point, we’re still working on fixes for both Weekly and Monthly Patterns. Because we’re so close to the launch of our new platform, we’re actually only incorporating the fixes into the new platform launching next week. Once you get access to this platform, the latest Monthly and Weekly Patterns datasets will be readily available. Sorry for the inconvenience.

Thanks Evan, thanks a lot for answering my questions these past few days!