Hi! I’m trying to analyze the Monthly Places Patterns (aka “Patterns”) Jan 2018 - Apr 2020 data. When I try to download it shows up as 28 separate files. Am I doing something wrong? How can I access the data as just one file?

Hi Erin. Each file contains one monthly pattern. Therefore, there are 28 (12+12+4) files.

Do you have any recommendations on how to combine the 28 files into one?

I think it is hard to combine all 28 datasets into one because each dataset is very big. For my research, I first extracted my targeted POIs from each file, then combined the 28 filtered datasets (containing only my targeted POIs) into one.
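A minimal sketch of the filter-then-combine approach described above, assuming each monthly file is a CSV that already carries a `top_category` column. The file names, the helper `filter_and_combine`, and the demo rows are hypothetical stand-ins, not the actual SafeGraph file layout.

```python
import glob
import os
import tempfile

import pandas as pd


def filter_and_combine(paths, categories, chunksize=100_000):
    """Read each file in chunks, keep only matching rows, then combine them."""
    kept = []
    for path in sorted(paths):
        # Reading in chunks avoids loading a whole monthly file at once
        for chunk in pd.read_csv(path, chunksize=chunksize):
            kept.append(chunk[chunk["top_category"].isin(categories)])
    return pd.concat(kept, ignore_index=True)


# Tiny stand-in files so the sketch runs anywhere (hypothetical names and data)
demo_dir = tempfile.mkdtemp()
for month, visits in [("2018-01", 10), ("2018-02", 12)]:
    pd.DataFrame({
        "location_name": ["Cafe A", "Shop B"],
        "top_category": ["Restaurants and Other Eating Places", "Clothing Stores"],
        "raw_visit_counts": [visits, visits + 1],
    }).to_csv(os.path.join(demo_dir, f"patterns-{month}.csv"), index=False)

combined = filter_and_combine(
    glob.glob(os.path.join(demo_dir, "patterns-*.csv")),
    ["Restaurants and Other Eating Places"],
)
print(combined)
```

Because only the filtered rows from each chunk are kept in memory, the combined result stays much smaller than the raw files.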

Hi @Erin_Brown_Purdue_University, I agree with @Yun_Liang_Penn_State_University here: filtering before merging is typically the best approach. There are some functions in SafeGraph_py that will merge all of those together for you, but you will likely get a memory error after 5-10 files if they aren’t filtered.

Alternatively, @Ryan_Kruse_MN_State has come up with a nifty notebook that will walk you through an all-in-one process to pull, filter, and save just the data you want! You can check it out HERE

So I tried following that notebook, but in Step 4 I got the error 'TypeError: only list-like objects are allowed to be passed to isin(), you passed a [str]'.

I am trying to filter the data to get just restaurants in the US, but I think I set the filters wrong. This is what I have: df = chunk[(chunk.top_category.isin('Restaurants and Other Eating Place')) & (chunk.country.isin('US'))]. How can I filter this correctly? @Ryan_Kruse_MN_State @Jack_Lindsay_Kraken1

Hi @Erin_Brown_Purdue_University, I believe you can switch it to df = chunk[(chunk.top_category.isin(['Restaurants and Other Eating Place'])) & (chunk.country.isin(['US']))] and it should work for you. I think when you use .isin(), the input has to be a list, so I just made the following changes:

  1. 'Restaurants and Other Eating Place' to ['Restaurants and Other Eating Place']
  2. 'US' to ['US']

Let me know if that makes sense/works!
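A small sketch of the `.isin()` behaviour discussed above, using a made-up two-row DataFrame in place of a real `chunk`. The exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# Made-up stand-in for one chunk of the data
chunk = pd.DataFrame({
    "top_category": ["Restaurants and Other Eating Places", "Clothing Stores"],
})

# Passing a bare string raises TypeError, because isin() expects a list-like
try:
    chunk.top_category.isin("Clothing Stores")
    raised = False
except TypeError as err:
    raised = True
    print(err)

# Wrapping the value in a list works and returns a boolean mask
mask = chunk.top_category.isin(["Clothing Stores"])
filtered = chunk[mask]
print(filtered)
```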

That fixed my initial problem, and I was able to complete the entire process. But when I opened the folder in my Google Drive, combined_core_poi-patterns.csv contained no data, just the column names. Do you know why this happened and how to fix it? @Ryan_Kruse_MN_State

Hi @Erin_Brown_Purdue_University, I think I’ve identified the problem. Looking at the column names in one of the combined_core_poi-patterns.csv files, I don’t think you will find a country column, which is why the data is filtering to nothing. There may be an iso_country_code column or something similar, but for Monthly Patterns, all the POIs are in the US anyway, so you really don’t need to filter by country at all.

I believe if you just filter by top_category, you will get data back. Something like
> df = chunk[chunk.top_category.isin(['Restaurants and Other Eating Place'])]
I would suggest trying with one month to make sure the filter is working as expected. Please let me know how it goes and if any issues arise!
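One way to check a filter on a single month, sketched under the assumption that the merged data loads into a DataFrame with a `top_category` column. The in-memory CSV and its rows are made up for illustration; in practice `sample_csv` would be the path to one month's file.

```python
import io

import pandas as pd

# Made-up stand-in for one month of merged data
sample_csv = io.StringIO(
    "location_name,top_category,raw_visit_counts\n"
    "Cafe A,Restaurants and Other Eating Places,10\n"
    "Shop B,Clothing Stores,7\n"
)
df = pd.read_csv(sample_csv)

# 1. Which columns are actually available to filter on?
columns = list(df.columns)
has_country = "country" in df.columns
print(columns, has_country)

# 2. Which category values exist, with their exact spelling?
categories = sorted(df["top_category"].dropna().unique())
print(categories)

# 3. Does the filter return any rows?
wanted = ["Clothing Stores"]
matches = df[df["top_category"].isin(wanted)]
print(len(matches), "matching rows")
```

Since `.isin()` compares exact strings, printing the distinct category values first makes it easier to copy the right spelling into the filter.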

Note 1: You may have to delete all the folders created in Drive from when you ran it last time. The tool is somewhat barebones, so it’s not super robust.

Note 2: There is a Canada Weekly Patterns product. However, the Canada and US Patterns data are in separate datasets, so there’s no need to filter by country.

I made the change to the filter and deleted the folders created in Drive. Now the brand data is fully downloaded, but it is not filtered, and there is a core_poi.csv file that only contains headers. I am also getting this error:


```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-16-c62671c712f8> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', '\nprint("getCoreFile")\ngetCoreFile()\nprint("getPatternsFiles")\ngetPatternsFiles()\nprint("mergeCorePOIandPatterns")\nmergeCorePOIandPatterns()\nprint("disseminateBrandInfo")\ndisseminateBrandInfo()')

3 frames
<decorator-gen-53> in time(self, line, cell, local_ns)

<timed exec> in <module>()

<ipython-input-13-4ab9656eb13a> in getPatternsFiles(destination, months)
     32     if x_dir not in os.listdir(destination):
     33       os.mkdir(destination + "/" + x_dir)
---> 34     for f in date_dict[x]:
     35       if f.split('/')[-1] not in os.listdir(destination + '/' + x_dir): #do not download the file if it is already there
     36         print(bucket, f, '/'.join([destination, x_dir, f.split('/')[-1]]))

KeyError: '2'
```
@Ryan_Kruse_MN_State Do you have any theories as to why this might be happening?

@Erin_Brown_Purdue_University Sorry for the delay getting back to you. I have some answers for you!

  1. You’ll need to change df = chunk[chunk.top_category.isin(['Restaurants and Other Eating Place'])] to df = chunk[chunk.top_category.isin(['Restaurants and Other Eating Places'])]. Previously, the string did not match any of the categories, which is why nothing was returned.
  2. The months variable has to be a list. So change months = sorted(date_dict.keys())[0] to months = sorted(date_dict.keys())[0:1]. This will make months = ['2018-01'] instead of months = '2018-01'.
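To illustrate the second point, here is a sketch of why a string value for months leads to KeyError: '2'. It assumes the notebook loops over months with something like `for x in months`, as the traceback suggests. The `date_dict` contents below are hypothetical placeholders.

```python
# Hypothetical stand-in for the notebook's date_dict (month -> list of file keys)
date_dict = {
    "2018-01": ["example/2018-01/patterns.csv.gz"],
    "2018-02": ["example/2018-02/patterns.csv.gz"],
}

months_as_str = sorted(date_dict.keys())[0]     # a single string: '2018-01'
months_as_list = sorted(date_dict.keys())[0:1]  # a one-item list: ['2018-01']

# Iterating over a string yields its characters, so the first "month" is '2'
first_item = next(iter(months_as_str))
try:
    date_dict[first_item]
    key_error = None
except KeyError as err:
    key_error = str(err)
print(first_item, key_error)

# Iterating over the list yields the whole key, so the lookup succeeds
for x in months_as_list:
    files = date_dict[x]
print(files)
```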

I ran this, and the code ended with this error:

```
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-12-c62671c712f8> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', '\nprint("getCoreFile")\ngetCoreFile()\nprint("getPatternsFiles")\ngetPatternsFiles()\nprint("mergeCorePOIandPatterns")\nmergeCorePOIandPatterns()\nprint("disseminateBrandInfo")\ndisseminateBrandInfo()')

8 frames
<decorator-gen-53> in time(self, line, cell, local_ns)

<timed exec> in <module>()

/usr/local/lib/python3.7/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   2008         kwds["usecols"] = self.usecols
   2009 
-> 2010         self._reader = parsers.TextReader(src, **kwds)
   2011         self.unnamed_cols = self._reader.unnamed_cols
   2012 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] No such file or directory: '/content/gdrive/My Drive/SafeGraph/Monthly Patterns Test/2-patterns/patterns.csv'
```

Files were saved to my Google Drive, but one is a 1.48 GB CSV file for the patterns data. I’m trying to open it to check that the data is correct and is what I’m looking for, but given the size and the fact that it is taking over 20 minutes to open a file that should only contain one month of data, I think I must be doing something else wrong. I’m not sure if this has anything to do with it, since the rest of the notebook runs fine, but I noticed this error after running the first !pip install boto3 command:

```
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.21.0,>=1.20.19->boto3) (1.15.0)
ERROR: requests 2.23.0 has requirement urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1, but you'll have urllib3 1.26.3 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
Installing collected packages: urllib3, jmespath, botocore, s3transfer, boto3
Found existing installation: urllib3 1.24.3
Uninstalling urllib3-1.24.3:
Successfully uninstalled urllib3-1.24.3
Successfully installed boto3-1.17.19 botocore-1.20.19 jmespath-0.10.0 s3transfer-0.3.4 urllib3-1.26.3
```

@Erin_Brown_Purdue_University I think the file size is within expectations—the Patterns files are quite big. However, let me know if the data is not filtered properly for some reason. How are you opening the data? Some tools aren’t well-equipped for working with big files like that, so they may take longer.

I believe the FileNotFoundError may be a result of a previous errored run of the program. When you ran it with months as a string instead of a list, it created the folder “2-patterns”. Then when you ran the tool again, it saw the folder “2-patterns” and expected data to be in it. One of the shortcomings of this lightweight tool is that the destination folder needs to be empty; otherwise you’ll run into this type of error.
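A quick way to confirm the destination folder is empty before re-running, sketched with a temporary directory standing in for the Drive folder. The helper `is_empty_dir` is hypothetical and not part of the notebook.

```python
import os
import tempfile


def is_empty_dir(path):
    """Return True if path is an existing directory with nothing in it."""
    return os.path.isdir(path) and not os.listdir(path)


# A temporary directory stands in for the Drive destination folder
destination = tempfile.mkdtemp()
fresh = is_empty_dir(destination)
print(fresh)

# Simulate a folder left behind by an earlier failed run
os.mkdir(os.path.join(destination, "2-patterns"))
leftovers = os.listdir(destination)
print(is_empty_dir(destination), leftovers)
```

If the check reports leftovers, deleting them (or pointing the tool at a new empty folder) before the next run should avoid the stale-folder problem.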

Something you can do to decrease the file size is filter to just the columns you want to work with. This can be done by adjusting the code in the Colab notebook in the step where it saves the final dataframe to a CSV in Drive.
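A minimal sketch of saving only selected columns, assuming the final data is a pandas DataFrame. The column names and rows are illustrative, and an in-memory buffer stands in for the CSV path in Drive.

```python
import io

import pandas as pd

# Illustrative stand-in for the final merged dataframe
df = pd.DataFrame({
    "location_name": ["Cafe A", "Diner B"],
    "top_category": ["Restaurants and Other Eating Places"] * 2,
    "raw_visit_counts": [10, 25],
    "visits_by_day": ["[1,2,3]", "[4,5,6]"],  # a wide column we choose to drop
})

# Keep only the columns needed for the analysis
keep_cols = ["location_name", "top_category", "raw_visit_counts"]

buffer = io.StringIO()  # replace with a file path to write to disk
df[keep_cols].to_csv(buffer, index=False)

header = buffer.getvalue().splitlines()[0]
print(header)
```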

My Mac automatically tried to open the file with Numbers. I just wanted to preview the file, so I didn’t think the program I opened it with would have a big impact, but considering it still isn’t open, I should probably use different software. Do you have any suggestions? Once I have all the data I want to use it in Tableau, but right now I’m just checking to see if the filtering worked.
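One option for a quick preview, sketched here as a suggestion rather than anything from the notebook: pandas can read just the first few rows of a large CSV with the `nrows` parameter. The in-memory CSV below is a made-up stand-in for the real file path.

```python
import io

import pandas as pd

# Made-up stand-in for a large CSV; in practice pass the file path instead
big_csv = io.StringIO(
    "location_name,raw_visit_counts\n"
    + "\n".join(f"Place {i},{i * 2}" for i in range(1000))
)

# nrows limits how many data rows are parsed, so the preview returns quickly
preview = pd.read_csv(big_csv, nrows=5)
print(preview)
print(preview.shape)
```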

I opened the file with Excel and it worked! Thank you so much for all your help. I would have been so lost without it.

Oh, great! You’re welcome, I’m glad you got it working. Please let me know if there’s anything else I can help with. And if you start a new thread, feel free to ping me!