I get repeated observations for the POIs in core-patterns. Do you know why this happens, and how to pick one?

I am a researcher new to SafeGraph. I tried your MonthlyPatterns_Download_Filter_PrepForNorm.ipynb notebook, but I get repeated observations for the POIs in core-patterns. Do you know why this happens, and how to pick one? In addition, the notebook does not seem to work for 2/2019. I also tried changing the prefix to prefix = 'monthly-patterns/patterns_backfill', but then I get too few observations.

Hi @Marina_Azzimonti_Stony_Brook_University, welcome to the community! SafeGraph updated the paths to their data, so I believe that notebook is now outdated, which is likely leading to these problems. I will look into updating/testing it. It may take a few days – I’m hoping I can have it ready Tuesday. Is that reasonable?

Great, thanks! I think the issue is that there are two backfills, one from May and one from December. The program seems to be appending both: when the same SafeGraph ID appears more than once, one observation has “visitor_work_cbgs” whereas the other one does not. The ones that do not have it seem consistent with the info in the December backfill.
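If the duplicates do come from two backfills being appended, one way to pick a single row per POI and month is sketched below. This is only an illustration with pandas, not the notebook's actual logic. The column names `safegraph_place_id` and `date_range_start`, the helper name `keep_one_per_poi`, and the rule "prefer the row without visitor_work_cbgs" are assumptions based on the observation above, so check them against your own files first.

```python
import pandas as pd


def keep_one_per_poi(
    df,
    id_col="safegraph_place_id",      # assumed name of the POI id column
    month_col="date_range_start",     # assumed name of the month column
    marker_col="visitor_work_cbgs",   # column present in only one backfill
):
    """Keep one row per (POI, month), preferring rows where marker_col is missing.

    Assumption: rows lacking marker_col come from the December backfill.
    """
    out = df.copy()
    # False sorts before True, so rows without the marker come first.
    out["_has_marker"] = out[marker_col].notna()
    out = out.sort_values([id_col, month_col, "_has_marker"], kind="mergesort")
    out = out.drop_duplicates(subset=[id_col, month_col], keep="first")
    return out.drop(columns="_has_marker").reset_index(drop=True)


# Toy data (made up): POI "sg:a" appears twice for the same month.
toy = pd.DataFrame(
    {
        "safegraph_place_id": ["sg:a", "sg:a", "sg:b"],
        "date_range_start": ["2019-02-01", "2019-02-01", "2019-02-01"],
        "visitor_work_cbgs": ['{"360590001001": 4}', None, None],
        "raw_visit_counts": [10, 12, 7],
    }
)
deduped = keep_one_per_poi(toy)
print(deduped)
```

Before dropping anything, it may be worth counting rows per POI and month with `df.groupby([id_col, month_col]).size()` to confirm that duplicates always come in pairs.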

It took me a really long time to understand that there were two when I downloaded the data in the cmd! I could not find an easy way to merge the patterns with the other files, which is why I was trying your script.

@Marina_Azzimonti_Stony_Brook_University You are welcome. Thanks for this extra info, it is helpful. When you downloaded the data in the cmd, what command did you use? Were you syncing the entire dataset?

I downloaded everything from monthly-patterns on 12/21.

@Marina_Azzimonti_Stony_Brook_University Very sorry for the confusion with this. Please see this announcement regarding consistent methodology and the paths to use. For the most updated methodology, you’ll want to use the following two paths:
• Download monthly data from sg-c19-response/monthly-patterns-2020-12/patterns for ongoing updates starting from the month in the Dec 04 2020 release (covers Nov 2020)
• Download monthly data from sg-c19-response/monthly-patterns-2020-12/patterns_backfill for monthly data from Jan 2018 to Oct 2020
Does that make sense?
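To make the split between the two paths concrete, here is a small Python sketch that returns which path covers a given month. The function name `path_for_month` is hypothetical, and the sketch only encodes the date ranges stated above. It does not assume anything about the folder layout underneath each path.

```python
from datetime import date

# Paths exactly as given above.
BACKFILL_PATH = "sg-c19-response/monthly-patterns-2020-12/patterns_backfill"
ONGOING_PATH = "sg-c19-response/monthly-patterns-2020-12/patterns"

# Coverage as described above: backfill is Jan 2018 to Oct 2020,
# ongoing updates start with Nov 2020.
FIRST_BACKFILL_MONTH = date(2018, 1, 1)
LAST_BACKFILL_MONTH = date(2020, 10, 1)


def path_for_month(year, month):
    """Return the path that holds monthly patterns for the given month."""
    wanted = date(year, month, 1)
    if wanted < FIRST_BACKFILL_MONTH:
        raise ValueError("No monthly patterns before Jan 2018 in these paths")
    if wanted <= LAST_BACKFILL_MONTH:
        return BACKFILL_PATH
    return ONGOING_PATH


# Example: the month that failed in the notebook (2/2019) falls in the backfill.
print(path_for_month(2019, 2))
```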