You can combine all the files with some Python code.
For example, I use os.listdir() to find all of the paths in a folder, and then a loop or list comprehension to read them all. Something like this:
```
import os
import pandas as pd
dir_to_read = "your_local_path_for_the_data"
# Keep only the entries whose names contain 'patterns'
all_pat_files = [os.path.join(dir_to_read, filepath) for filepath in os.listdir(dir_to_read) if 'patterns' in filepath.lower()]
# Read patterns.csv.gz from each folder, keeping the code columns as strings
places_df_raw = pd.concat((pd.read_csv(os.path.join(f, "patterns.csv.gz"), dtype={'naics_code': str, 'postal_code': str}) for f in all_pat_files), sort=False)
```
@Ryan_Fox_Squire_SafeGraph Thank you very much. I will try this code now.
@Ryan_Fox_Squire_SafeGraph I think this code is not meant for combining the social distancing data? The files I downloaded came from "aws s3 sync s3://sg-c19-response/social-distancing/v1/". The error message is:
```
ValueError Traceback (most recent call last)
<ipython-input-6-7e6c9499718a> in <module>
1 dir_to_read = "F:/Covid_19/SafeGraph/weekly-patterns"
2 all_pat_files = [os.path.join(dir_to_read,filepath) for filepath in os.listdir(dir_to_read) if 'patterns' in filepath.lower()]
----> 3 places_df_raw = pd.concat((pd.read_csv(os.path.join(f,"patterns.csv.gz"), dtype={'naics_code':str, 'postal_code':str}) for f in all_pat_files), sort=False)
C:\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
279 verify_integrity=verify_integrity,
280 copy=copy,
--> 281 sort=sort,
282 )
283
C:\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
327
328 if len(objs) == 0:
--> 329 raise ValueError("No objects to concatenate")
330
331 if keys is None:
ValueError: No objects to concatenate```
@QIULI_SU_Louisiana_State_Univerisity can you please post the code you tried in addition to the error message?
@Ryan_Fox_Squire_SafeGraph Yes. My first step was to download the data using `aws s3 sync s3://sg-c19-response/social-distancing/v1/ ./social-distancing/v1/ --profile safegraph`. Here is my data path:
@Ryan_Fox_Squire_SafeGraph Then I used this code:
dir_to_read = "F:/Covid_19/SafeGraph/social-distancing"
all_pat_files = [os.path.join(dir_to_read,filepath) for filepath in os.listdir(dir_to_read) if 'patterns' in filepath.lower()]
places_df_raw = pd.concat((pd.read_csv(os.path.join(f,"patterns.csv.gz"), dtype={'naics_code':str, 'postal_code':str}) for f in all_pat_files), sort=False)
The error message is:
```
ValueError Traceback (most recent call last)
<ipython-input-10-7d51ae23e469> in <module>
1 dir_to_read = "F:/Covid_19/SafeGraph/social-distancing"
2 all_pat_files = [os.path.join(dir_to_read,filepath) for filepath in os.listdir(dir_to_read) if 'patterns' in filepath.lower()]
----> 3 places_df_raw = pd.concat((pd.read_csv(os.path.join(f,"patterns.csv.gz"), dtype={'naics_code':str, 'postal_code':str}) for f in all_pat_files), sort=False)
C:\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
279 verify_integrity=verify_integrity,
280 copy=copy,
--> 281 sort=sort,
282 )
283
C:\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
327
328 if len(objs) == 0:
--> 329 raise ValueError("No objects to concatenate")
330
331 if keys is None:
ValueError: No objects to concatenate```
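For context, `pd.concat` raises "No objects to concatenate" when it is handed an empty collection, which is what happens here if the `'patterns'` filter matches nothing in the folder. The sketch below reproduces only the filtering step; the folder names in `listing` are assumptions for illustration, not the actual contents of the download.

```python
def matching_names(names, keyword):
    """Return the names that contain keyword, ignoring case."""
    return [name for name in names if keyword in name.lower()]

# Hypothetical listing of a social-distancing download folder (assumed names).
listing = ["v1", "v2"]

# No name contains 'patterns', so the filtered list is empty and
# pd.concat would have nothing to concatenate.
print(matching_names(listing, "patterns"))
print(matching_names(listing, "v1"))
```

Printing the filtered list before calling `pd.concat` is a quick way to confirm whether the filter or the path is the problem.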
What do you see if you run `os.listdir(dir_to_read)`?
Here are the files in my path. My target data is the social-distancing/v1 data.
(Side note: you should now be using v2, not v1, of social distancing.)
`dir_to_read` should probably be something like `F:/Covid_19/SafeGraph/social-distancing/v1/2020/` (you should change this to v2 at your earliest convenience).
You will probably need to set up some nested loops like:
```
all_files = []
# Walk the month folders, then the day folders inside each month
for month in os.listdir(dir_to_read):
    month_path = os.path.join(dir_to_read, month)
    for day in os.listdir(month_path):
        all_files.append(os.path.join(month_path, day))
```
If you run that, what does `all_files` look like?
You mean I should download v2 instead of v1?
We can finish troubleshooting on v1, but yes, after that you should download v2. The file structure should be the same.
And what does `all_files` look like?
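As an alternative to writing one loop per folder level, a minimal sketch using `os.walk` could collect every data file under `dir_to_read`, however deeply it is nested. The helper name `find_files` and the `.csv.gz` suffix are assumptions for illustration; adjust the suffix to whatever the downloaded files are actually called.

```python
import os

def find_files(root, suffix=".csv.gz"):
    """Walk root recursively and return the sorted paths of files ending in suffix."""
    found = []
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                found.append(os.path.join(folder, filename))
    return sorted(found)

if __name__ == "__main__":
    # Path taken from the thread; replace it with your own download location.
    dir_to_read = "F:/Covid_19/SafeGraph/social-distancing/v1/2020/"
    all_files = find_files(dir_to_read)
    print(len(all_files), "files found")
    print(all_files[:3])  # inspect the first few paths
```

Once the list is confirmed to be non-empty, it could be passed to something like `pd.concat((pd.read_csv(f) for f in all_files), sort=False)`.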