Your files on Dewey Data just became MUCH more accessible!
As of August 23, 2023, the notebook linked in this post is outdated. Please use the updated notebook instead.
This notebook (outdated as of 8/23/23, see above message) provides a tool to access Dewey Data in bulk via API. Simply follow the instructions to have your data saved into Google Drive, where you can continue analysis or download to your local machine.
Some useful features:
Download multiple data files in one go
Easily filter to the data products you plan to work with to avoid eating up storage in your analysis environment
Preprocess your data before downloading into your analysis environment, allowing further storage optimization and saving compute power
While the File Browser in the Dewey Marketplace provides a handy interface to download your files, programmatic access may be preferable for many users. The structure of the requests shown in this notebook may be applied in other environments, such as in R on your local machine.
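As a rough illustration of the filtering and preprocessing features listed above, here is a minimal Python sketch. It is not taken from the notebook: the `listing` records, the `product` field, and the helper names `select_files` and `keep_columns` are all hypothetical, and the real file metadata returned by the API may look different.

```python
import csv
import io
from typing import Dict, Iterable, List


def select_files(listing: Iterable[Dict[str, str]], product: str) -> List[Dict[str, str]]:
    """Keep only the files that belong to the requested data product.

    Assumes each record has a "product" key (hypothetical field name).
    """
    return [f for f in listing if f["product"] == product]


def keep_columns(csv_text: str, columns: List[str]) -> str:
    """Drop every column except `columns` before the file is saved."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # columns not listed in `columns` are ignored
    return out.getvalue()
```

Trimming columns before writing to Google Drive is one way to get the storage savings mentioned above; the same pattern would carry over to R or any other environment.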
Run the code yourself and be sure to reply with any questions, feature suggestions, or other feedback!
Hi @Yingjie_Li_MSU, the foot traffic provider has recently switched from SafeGraph to Advan going forward. We will be rolling out some Advan content this week (with Google Colab) so users can get a comparison of Advan vs SafeGraph.
In the meantime, I would suggest checking out this post, which discusses the methodology differences between Advan and SafeGraph. The post specifically touches on normalization for longitudinal analysis, the topic of the notebook you shared above. (Note that the notebook was made before pre-packaged normalized columns were included in the dataset, so you may not need to compute the normalized visit counts yourself anymore if one of the included columns works for you.)
I have tried to use it to download the files. It works well at first, but after every 200 files are downloaded, an HTTP error occurs and I have to restart my code and rerun it. Is there any suggestion for fixing this bug?
Thanks Evan, I have fixed it myself. It seems that each access token can only be used for a limited number of downloads. I revised the read_file function as follows, and it works fine now.
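The revised read_file code is not shown in this copy of the thread. As a rough sketch of the idea described (request a fresh access token when a download fails with an HTTP error, then retry the same file), something like the following could work. This is not the original fix: `fetch_file` and `get_token` are hypothetical callables you would supply yourself, and `max_retries` is an assumed parameter.

```python
from typing import Callable, Iterable, List
from urllib.error import HTTPError


def download_all(
    urls: Iterable[str],
    fetch_file: Callable[[str, str], bytes],  # hypothetical: (url, token) -> file contents
    get_token: Callable[[], str],             # hypothetical: returns a fresh access token
    max_retries: int = 3,
) -> List[bytes]:
    """Download each URL, requesting a new token whenever an HTTP error occurs."""
    token = get_token()
    results = []
    for url in urls:
        for attempt in range(max_retries + 1):
            try:
                results.append(fetch_file(url, token))
                break
            except HTTPError:
                if attempt == max_retries:
                    raise  # give up after repeated failures on the same file
                # Assumption: the old token is exhausted or expired
                token = get_token()
    return results
```

Passing the token and fetch logic in as functions keeps the retry loop independent of how authentication is actually done, so it can be adapted to whatever the notebook uses.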