I’ve been trying to combine the monthly patterns data into one dataset to analyze with Tableau but am running into a whole lot of issues and just can’t seem to make it work. Does anyone have this data already combined?

Erin_Brown_Purdue_University · March 2, 2021, 12:00am

I’ve been trying to combine the monthly patterns data into one dataset to analyze with Tableau but am running into a whole lot of issues and just can’t seem to make it work. Does anyone have this data already combined and would be willing to share the file with me? Or any easy ways to filter (I’m only interested in restaurant data) and combine datasets into a Tableau friendly form?

Jack_Lindsay_Kraken1 · March 6, 2021, 12:30am

Hi @Erin_Brown_Purdue_University, I believe you have been working with @Ryan_Kruse_MN_State on this, just checking in to see if you have your data in working condition yet?

Erin_Brown_Purdue_University · March 7, 2021, 6:28pm

Yeah I’ve gotten a lot of help from Ryan on this. I do have one question you might be able to help me with though. When using the Notebook to filter the data, how do I specify which columns I want to download? Right now my only filter is df = chunk[chunk.top_category.isin([‘Restaurants and Other Eating Places’])] but I would love to be able to specify the specific columns I want downloaded and I can’t seem to figure out how to do that.

Jack_Lindsay_Kraken1 · March 8, 2021, 4:21am

Are you familiar with the Pandas library in python?

Jack_Lindsay_Kraken1 · March 8, 2021, 4:21am

https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html

Erin_Brown_Purdue_University · March 8, 2021, 3:28pm

Thank you so much! That article was super helpful! I’m still having an issue with one of my filters, where the raw_visits_count needs to be more than 100. Right now I’m using int(“raw_visit_count”) > 100, but it gives the error “ValueError: invalid literal for int() with base 10: ‘raw_visit_count’”. I’m using int() because when I just had “raw_visit_count” > 100 it said that > can’t be used between an integer and a string. When I used the exact format the article used patterns[“Age”] > 100 it said that patterns was undefined. What would be the proper syntax for this filter?

Jack_Lindsay_Kraken1 · March 8, 2021, 7:14pm

I believe you are looking at this section:

df = chunk[chunk.top_category.isin(['Restaurants and Other Eating Places']),"placekey", "parent_placekey", "parent_safegraph_place_id","location_name", "sub_category", "category_tags", "latitude", "longitude", "city", "region", "postal_code", "open_hours", "raw_visit_counts", "raw_visitor_counts", "visits_by_day", "bucketed_dwell_times", "median_dwell", "popularity_by_hour", "popularity_by_day", int["raw_visit_count"] &gt; 100]```
@Ryan_Kruse_MN_State has whipped something up that might help -- if not I can help troubleshoot your way through it

```#TODO: HERE IS WHERE YOU SET THE SPECIFIC FILTERS YOU WANT FOR YOUR DATA
columns_to_keep = ["placekey", "parent_placekey", "parent_safegraph_place_id","location_name", "top_category", "sub_category", "category_tags", "latitude", "longitude", "city", "region", "postal_code", "open_hours", "raw_visit_counts", "raw_visitor_counts", "visits_by_day", "bucketed_dwell_times", "median_dwell", "popularity_by_hour", "popularity_by_day"]
df = chunk[(chunk.top_category.isin(['Restaurants and Other Eating Places']) &amp; (chunk.raw_visit_count &gt; 100)]
df = df[columns_to_keep]```
Try this out and let me know how it works

Erin_Brown_Purdue_University · March 8, 2021, 7:47pm

I’m now getting the error “AttributeError: ‘DataFrame’ object has no attribute ‘raw_visit_counts’” any idea why that might be considering that ‘raw_visit_counts’ one of the columns of the dataset?

Erin_Brown_Purdue_University · March 10, 2021, 3:05pm

I also tried just removing that filter but then I got: KeyError: “[‘raw_visit_counts’, ‘popularity_by_hour’, ‘popularity_by_day’, ‘raw_visitor_counts’, ‘parent_placekey’, ‘bucketed_dwell_times’, ‘median_dwell’, ‘visits_by_day’] not in index” I double checked the patterns schema and those are all in there and spelled correctly, so I don’t understand what is going wrong. Do you have any ideas @Jack_Lindsay_Kraken1 or @Ryan_Kruse_MN_State?

Ryan_Kruse_MN_State · March 10, 2021, 5:43pm

Hi @Erin_Brown_Purdue_University, I see the issue. The filterCombineDelete() function is being used to filter, combine, and delete both the Core Places data and the Patterns data. The error is occurring in the getCoreFile() step. You set the filter to keep only certain Patterns columns. But those columns are not in the Core Places data, which is causing the error.

Try this so that it only filters by columns on the Patterns data (and not on the Core Places data):

      df = chunk[(chunk.top_category.isin(['Restaurants and Other Eating Places'])&amp;(chunk.raw_visit_counts &gt; 100))]
      if keyword=='patterns':
        columns_to_keep = ["placekey", "parent_placekey", "parent_safegraph_place_id","location_name", "top_category", "sub_category", "category_tags", "latitude", "longitude", "city", "region", "postal_code", "open_hours", "raw_visit_counts", "raw_visitor_counts", "visits_by_day", "bucketed_dwell_times", "median_dwell", "popularity_by_hour", "popularity_by_day"]
        df = df[columns_to_keep]```

Erin_Brown_Purdue_University · March 12, 2021, 3:11am

@Ryan_Kruse_MN_State I’m still getting the “AttributeError: ‘DataFrame’ object has no attribute ‘raw_visit_counts’”. I think it is because in the first line is filtering the Core Places data which doesn’t have the ‘raw_visit_counts’ attribute. So I changed the code to be

      df = chunk[(chunk.top_category.isin(['Restaurants and Other Eating Places']))]
      if keyword=='patterns':
        df = chunk[(chunk.top_category.isin(['Restaurants and Other Eating Places'])&amp;(chunk.raw_visit_counts &gt; 100))]
        df = chunk[(chunk.top_category.isin(['Restaurants and Other Eating Places'])&amp;(chunk.raw_visit_counts &gt; 100))]
        columns_to_keep = ["placekey", "parent_placekey", "parent_safegraph_place_id","location_name", "top_category", "sub_category", "category_tags", "latitude", "longitude", "city", "region", "postal_code", "open_hours", "raw_visit_counts", "raw_visitor_counts", "visits_by_day", "bucketed_dwell_times", "median_dwell", "popularity_by_hour", "popularity_by_day"]
        df = df[columns_to_keep]```
Which executes well until the very end where I get a KeyError 2 and only the brand_info.csv, core_poi.csv, and README.txt files are uploaded to my google drive. The same KeyError 2 happens when I just remove `(chunk.raw_visit_counts &gt; 100)` from your previous solution. Do you have a clue to what could be causing the KeyError 2?

Ryan_Kruse_MN_State · March 12, 2021, 4:33pm

@Erin_Brown_Purdue_University Check your months variable. I think it is a string, and it needs to be a list .

months = sorted(date_dict.keys())[12]
print("\nYou chose these months:", months)```
&gt; *Output:*
&gt; 
&gt; Months: ['2018-01', '2018-02', '2018-03', '2018-04', '2018-05', '2018-06', '2018-07', '2018-08', '2018-09', '2018-10', '2018-11', '2018-12', '2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12', '2020-01', '2020-02', '2020-03', '2020-04', '2020-05', '2020-06', '2020-07', '2020-08', '2020-09', '2020-10', '2020-11', '2020-12', '2021-01', '2021-02']
&gt; 
&gt; You chose these months: 2019-01
Needs to be switched to something like:
```print("Months:",sorted(date_dict.keys()))
months = sorted(date_dict.keys())[12:13] #MUST BE A LIST
print("\nYou chose these months:", months)```
&gt; *Output:*
&gt; 
&gt; Months: ['2018-01', '2018-02', '2018-03', '2018-04', '2018-05', '2018-06', '2018-07', '2018-08', '2018-09', '2018-10', '2018-11', '2018-12', '2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12', '2020-01', '2020-02', '2020-03', '2020-04', '2020-05', '2020-06', '2020-07', '2020-08', '2020-09', '2020-10', '2020-11', '2020-12', '2021-01', '2021-02']
&gt; 
&gt; You chose these months: ['2019-01']
When `months` isn't a `list`, the code does a for loop through `months` as a string.
• When `months = '2019-01'`  (a string), then `months[0] = 2`
• When `months = ['2019-01']` (a list), then `months[0] = '2019-01'`, which is what the code is expecting

Erin_Brown_Purdue_University · March 12, 2021, 9:16pm

That worked!! The data downloaded and everything looks right! It did raise a KeyError: ‘safegraph_place_id’ but since the data I need is all there I’m just going to ignore that error message. Thank you so much for all your help @Ryan_Kruse_MN_State