Potential data error in Advan Weekly Patterns

Thank you, Evan, I will look into it. Thank you so much for providing the materials.

Thanks Evan, I fixed that problem. Thanks for your help!
Please see the method below:
First, we use one attribute: “visitor_home_cbgs”. We extract all visitor counts for CBGs in Erie County, sum them up, and then divide by “number_devices_residing”. This represents how many visitors from each home CBG visited any type of POI in a week. Below are the visitor rate curves. Note that each curve represents one CBG.
Same as in the first figure.
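For readers who want to reproduce this calculation, here is a minimal sketch in Python. It assumes that “visitor_home_cbgs” is stored as a JSON string mapping home CBG to visitor count, that “number_devices_residing” has already been loaded into a dictionary keyed by CBG, and that Erie County CBGs can be selected by the county FIPS prefix 36029. The function name and the sample values are hypothetical and only illustrate the arithmetic.

```python
import json
from collections import defaultdict

# Assumption: Erie County, NY CBG codes start with this state+county FIPS prefix.
ERIE_COUNTY_FIPS = "36029"


def visitor_rates(poi_rows, devices_residing, county_fips=ERIE_COUNTY_FIPS):
    """Return {home_cbg: summed visitors / number_devices_residing} for one week."""
    visitors = defaultdict(int)
    for row in poi_rows:
        # Treat an empty or missing cell as "no home CBG information".
        raw = row.get("visitor_home_cbgs") or "{}"
        for cbg, count in json.loads(raw).items():
            if cbg.startswith(county_fips):
                visitors[cbg] += count

    rates = {}
    for cbg, total in visitors.items():
        devices = devices_residing.get(cbg)
        if devices:  # skip CBGs with a missing or zero panel size
            rates[cbg] = total / devices
    return rates


# Made-up sample rows for one week (not real Advan data).
poi_rows = [
    {"visitor_home_cbgs": '{"360290001001": 4, "360290002001": 6, "481130001001": 5}'},
    {"visitor_home_cbgs": '{"360290001001": 6}'},
    {"visitor_home_cbgs": ""},
]
devices_residing = {"360290001001": 100, "360290002001": 50}
rates = visitor_rates(poi_rows, devices_residing)
```

Running this once per weekly file and plotting each CBG's rate over time would give one curve per CBG, as described above.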

However, I also found sudden spikes in 1/2023.
See the figure below:


I also found that other researchers have posted similar questions:

Could you fix this problem for us? Thank you so much!

Hi @Ryan_Zhenqi_Zhou_SUNY_Buffalo glad you figured this out! This will be helpful context for others to see.

Apologies for the delay. I had to wait for confirmation from Advan, and we are still waiting on a restatement of their Normalization Stats files for 1/2023 onward. That should help with this issue. They are targeting a release of the updated files in the next two weeks.

Thank you, Evan, for your reply. We will wait!

Hi Evan, for the weekly data, is there something wrong with 12/2022?
We tested Buffalo:


Houston:

Miami:

They all drop suddenly in 12/2022.
I also looked at “visit_panel_summary.csv”: the total visit count also drops suddenly in 12/2022 compared to other months.
I would like to know why. I also tested these three cities for 2021 and 2020, and everything seems normal. Please see the example below for Buffalo in 2021.

Btw, I also found data missing for the weeks of 12/28/2020 and 11/23/2020: these two weeks have only 168 CSV files (there are usually 170).
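A file-count check like this can be scripted. The sketch below assumes the weekly files have been downloaded into one folder per week under a common root; that layout, the folder names, and the function name are assumptions for illustration and not something Dewey or Advan prescribes.

```python
import tempfile
from pathlib import Path


def weeks_with_unexpected_file_count(root, expected=170):
    """Map week-folder name -> CSV file count for folders that deviate from `expected`."""
    flagged = {}
    for week_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # "*.csv*" also matches compressed files such as part-0.csv.gz
        n_files = sum(1 for f in week_dir.glob("*.csv*") if f.is_file())
        if n_files != expected:
            flagged[week_dir.name] = n_files
    return flagged


# Tiny demo on a throwaway directory (expected=3 instead of 170 to keep it small).
with tempfile.TemporaryDirectory() as tmp:
    for week, n in {"2020-11-16": 3, "2020-11-23": 2}.items():
        week_dir = Path(tmp) / week
        week_dir.mkdir()
        for i in range(n):
            (week_dir / f"part-{i}.csv.gz").touch()
    demo_flagged = weeks_with_unexpected_file_count(tmp, expected=3)
```

On a real download, calling the function with the default `expected=170` would list only the weeks that have a different number of files.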

Thank you so much!

Hi Evan, I further visualized the US nationwide visits curve from the “visit_panel_summary.csv” files for 2022. Please see the figures below:


As you can see, the num_visits fell sharply in 12/2022.

Do you have any suggestions for fixing that?

Hi Evan. I was also thinking that the set of POIs may have changed drastically starting in 12/2022, and that this could be why the number of visits fell sharply. So I tested whether 12/2022 and 11/2022 share the same POIs. They do.
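A comparison like this can be done with plain set operations on the Placekeys of the two periods. This is only a sketch: the function name and the placeholder keys are hypothetical, and in practice the two lists would come from the Placekey column of the November and December files.

```python
def compare_poi_sets(placekeys_a, placekeys_b):
    """Summarize which Placekeys appear in only one of two periods."""
    a, b = set(placekeys_a), set(placekeys_b)
    return {
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
        "n_shared": len(a & b),
    }


# Placeholder keys, not real Placekeys.
november = ["pk-1", "pk-2", "pk-3"]
december = ["pk-2", "pk-3", "pk-1"]
diff = compare_poi_sets(november, december)
```

Note that identical sets only show that the same rows are present in both periods. The check says nothing about whether the visit values inside those rows are populated.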

For now, I still can’t figure out why the number of visits fell sharply in 12/2022. Could you send this potential issue to Advan?

Thank you very much!

Hi Evan. I may have found the potential reason. I compared the visit counts on 11/07/2022 and 12/05/2022 as an example.


As you can see in the screenshot above, some POIs have a huge number of visits on 11/07/2022, but the same POIs have a “nan” value on 12/05/2022.
This may be why the visit count fell sharply in 12/2022.
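One way to list such POIs systematically is sketched below. It assumes each week has been reduced to a dictionary from Placekey to visit count, and that a missing value may show up as None, an empty string, the text “nan”, or a float NaN. The threshold, the function names, and the sample values are all hypothetical.

```python
import math


def is_missing(value):
    """True for None, empty strings, the text 'nan', or a float NaN."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    return str(value).strip().lower() in ("", "nan")


def vanished_pois(week_a, week_b, min_visits=100):
    """Placekeys with >= min_visits in week_a whose value is missing in week_b.

    Returns (placekey, week_a_visits) pairs, largest first.
    """
    found = []
    for placekey, count in week_a.items():
        if is_missing(count):
            continue
        if float(count) >= min_visits and is_missing(week_b.get(placekey)):
            found.append((placekey, float(count)))
    return sorted(found, key=lambda item: -item[1])


# Made-up counts for two weeks (not real Advan data).
week_1107 = {"pk-airport": "5200", "pk-mall": 3100.0, "pk-cafe": 40, "pk-gym": 800}
week_1205 = {"pk-airport": float("nan"), "pk-mall": "", "pk-cafe": None, "pk-gym": 760}
vanished = vanished_pois(week_1107, week_1205, min_visits=100)
```

Sorting by the earlier week's count puts the highest-traffic POIs first, which makes it easier to pick a few examples to report.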
Could you report this issue to Advan?

Many thanks,
Ryan

Hi Evan,

I picked out 3 examples that you can easily check. Please see the table below:


Could you check, and ask Advan to check, why “Apple Walden Galleria”, “Billy Beez”, and “Airport Taxi Service” have a lot of visits on 11/7 and then no visits on 12/05?

I would appreciate a reply when you see this message.

Many thanks,
Ryan

Thanks. I’ll share this with them

Hi Evan, sorry to interrupt again.
I just want to provide a simpler way to report the missing data issue in 12/2022.
For example, you can simply check this POI: “zzw-222@63q-s8g-92k, Buffalo Niagara International Airport”. We all know an international airport will have many visits (it has a huge visit count in every week from 01/2022 to 11/2022), but it has a “nan” visit value in every week of 12/2022.
This is just one simple example. Many other POIs also have a “nan” visit count in every week of 12/2022 but a huge number of visits in every week from 01/2022 to 11/2022.

This is absolutely a data error. Please let Advan know so they can fix the dataset. This would benefit all the researchers who subscribe to Dewey data and use Advan’s dataset.

We’ve restated the Monthly Patterns files with new normalization files, but we’re still waiting for an update on Weekly Patterns.

I also heard back from Advan regarding the three Placekeys you shared. The first two Placekeys you shared (22g-223@63q-rt9-skf & zzw-22x@63q-rt9-st9) are the same polygon. The reason you are not seeing any visits for these Placekeys is that the polygon is “bad” and we do not run it. It was run initially, before we started filtering these out. It is filtered because of bad geofencing, i.e. polygons get denied by our internal checks due to an unreasonable size or number of vertices.

The third Placekey (zzy-224@63q-rt7-j35) is a result of the same situation. It was corrected later but is not capturing any traffic, as it is now a geofence of a tiny shed in someone’s backyard.

It sounds like this means they started filtering out certain “bad” POI in December 2022 but haven’t backfilled the data yet. So it’s probably best to drop any POI whose data suddenly goes missing from your analysis. I’ve asked if they have a list of these POI or if they can backfill the data to make this easier.

@Ryan_Zhenqi_Zhou_SUNY_Buffalo Here’s a list of Placekeys that have been dropped from the Advan data because of some issue with the Place. I recommend dropping these from any longitudinal analysis:

Hi Evan, thank you so much for the “excluded_placekeys.csv”. It is indeed very helpful!
I have two follow-up questions that I hope you can answer.

Question 1: I understand that if a polygon is “bad”, Advan has excluded it since 12/2022. Again, take Buffalo Niagara International Airport as an example. The placekey_primary is “zzw-222@63q-s8g-92k”, and the placekey_shared values are “zzy-222@63q-rt7-j35”, “zzy-222@63q-s8g-fpv”, “zzy-224@63q-rt7-j35”, and “222-222@63q-s8g-g6k”. In every week of 12/2022, all of these places have nan visits. Isn’t that odd? At least one of them should have visits, right? Surely not all of them have “bad” polygons? Buffalo Niagara International Airport is just one simple example. Advan excluded many places with complex buildings in 12/2022, such as airports and shopping malls. Since these places normally have lots of visits, excluding all of them affects the integrity of the data. Could you check with Advan again?

Question 2: I don’t understand “We’ve restated the Monthly Patterns files with new normalization files, but we’re still waiting for an update on Weekly Patterns.”
Are the new “normalization files” the “normalization_stats.csv” files? If so, they are already in the File Browser for Weekly Patterns.


I’m also wondering how the normalization files relate to the “data error” I reported.

If you see these questions, I would appreciate a reply!

Many thanks,
Ryan

The intent of the excluded_placekeys file is to exclude those completely from your analysis, not just starting in 12/2022. It’s encouraged to not use those POI in any historical analysis, since they were dropped after determining that visits were potentially not calculating properly for those POI. Eventually, we’ll try and exclude these from the historical data ourselves so users don’t have to do this on their end.
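For anyone applying this on their own end, a minimal sketch of the filtering step follows. It assumes the excluded_placekeys file has a header column named `placekey` and that the patterns rows are dictionaries with the same key; both names are assumptions, so check the actual header of the file before using this. The sample keys are placeholders, not real Placekeys.

```python
import csv
import io


def load_excluded_placekeys(csv_file, column="placekey"):
    """Read the set of Placekeys to exclude from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return {row[column].strip() for row in reader if row.get(column)}


def drop_excluded(rows, excluded, key="placekey"):
    """Keep only rows whose Placekey is not in the excluded set (all weeks)."""
    return [row for row in rows if row.get(key) not in excluded]


# Stand-in for open("excluded_placekeys.csv", newline="").
excluded_csv = io.StringIO("placekey\npk-bad-1\npk-bad-2\n")
excluded = load_excluded_placekeys(excluded_csv)

# Made-up patterns rows spanning dates before and after 12/2022.
weekly_rows = [
    {"placekey": "pk-good", "week": "2022-11-07", "visits": 120},
    {"placekey": "pk-bad-1", "week": "2022-11-07", "visits": 5200},
    {"placekey": "pk-bad-1", "week": "2022-12-05", "visits": None},
]
kept = drop_excluded(weekly_rows, excluded)
```

Because the filter ignores the date, an excluded POI is removed from every week, which matches the intent of dropping it from the whole history rather than only from 12/2022 onward.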

At this point, we’re still working on fixes for both Weekly and Monthly Patterns. Because we’re so close to the launch of our new platform, we’re actually only incorporating the fixes into the new platform launching next week. Once you get access to this platform, the latest Monthly and Weekly Patterns datasets will be readily available. Sorry for the inconvenience.

Thanks, Evan. Thanks a lot for answering my questions over these past days!

Hi, Evan,

Following your suggestion, I fixed the error in 2022/12, but 2023/01 and 2023/02 still have errors. Please see the figures below:



Starting in 2023, number_devices_residing and the number of visitors are highly abnormal (I also found that other researchers have posted about this error).
Could you let Advan know so they can fix it? Also, when will the new platform launch, and will it fix this error?
I would appreciate a reply!

Many thanks,
Ryan

Hi, Evan,

Do you have any solution to the above question?

Hi @Ryan_Zhenqi_Zhou_SUNY_Buffalo, I believe Evan will be largely unavailable until late next week. I will make sure he sees this message when he is back!

We’re still working with Advan to get an update to the 2023 Patterns data.