Hi! Zhecheng Wang and I are wondering why the closure data for the Bay Area has such a strange distribution. Looking at the “closed_on” variable, there are 8 thousand observations for January 2020 and then just a few in December and February.
Hi @Herman_Donner_Stanford_University, can you provide any more information? Perhaps a sample of the data you're seeing or a visualization?
Asking Zhecheng as he’s the one working with the data. But basically, it is as above - why the distribution of the “closed_on” in the Bay Area is so very skewed.
Hi Ryan, I've attached the bar plot here.
It shows the number of businesses by “closed_on” month.
@Zhecheng_Wang_Stanford_University are you looking at a particular subset of businesses?
Can you please describe any and all filters you used to subset to the dataset you visualized here? We are trying to fully reproduce this and need more details.
Did you compare to other things besides Bay Area, or did you only look at Bay Area?
@Ryan_Fox_Squire_SafeGraph I am using the core_poi.csv in the “ClosedPOI-SanFranciscoOaklandFremontCAMSA-CORE_POI-2020_04-2020-05-15” data folder.
I just filter out all businesses with NaN “closed_on” values and keep the rest of the businesses. It is for the whole Bay Area.
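For anyone trying to reproduce the plot, the filter described above can be sketched roughly like this in pandas. This is a minimal sketch, not the exact script used: it assumes `closed_on` is stored as a `YYYY-MM` string, and the helper name `closures_per_month`, the `place` column, and the toy rows are made up for illustration.

```python
import pandas as pd


def closures_per_month(poi: pd.DataFrame) -> pd.Series:
    """Count POI per closed_on month, ignoring rows with no closure date."""
    # Drop businesses whose closed_on is NaN/None, keep the rest.
    closed = poi.dropna(subset=["closed_on"])
    # Assumption: closed_on is a "YYYY-MM" string, so sorting the index
    # lexicographically also sorts it chronologically.
    return closed["closed_on"].value_counts().sort_index()


# Toy rows for illustration only; these are not real SafeGraph records.
toy = pd.DataFrame(
    {
        "place": ["a", "b", "c", "d", "e"],  # hypothetical identifier column
        "closed_on": ["2019-12", "2020-01", "2020-01", None, "2020-02"],
    }
)
counts = closures_per_month(toy)
print(counts)

# With the real file, something along these lines should work
# (the relative path is an assumption):
# poi = pd.read_csv("core_poi.csv", dtype={"closed_on": str})
# closures_per_month(poi).plot(kind="bar")
```

Running the same function on another region's core_poi.csv would show whether the January 2020 spike is specific to the Bay Area.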
@Zhecheng_Wang_Stanford_University I confirmed that this is a known issue for the entire dataset. Right now most of our closed POI list closed_on as 2020-01, and apparently the team doesn’t have a great answer, other than that we updated a lot of sources around this time, and this refresh may have culled a lot of already-closed POI.
Sorry for the inconvenience. This is a new feature and is not ironed out yet.
We are looking to add this to the documentation.
@Ryan_Fox_Squire_SafeGraph Thank you for letting me know! It would be great if you could notify us when your team fixes this issue or updates the data. We are very interested in this business closure info. @Herman_Donner_Stanford_University