Tool: Automated Dewey Data API Access

Your files on Dewey Data just became MUCH more accessible!

:warning: :warning: :warning:

As of August 23, 2023, the notebook linked in this post is outdated. Please use the updated notebook instead.

:warning: :warning: :warning:

This notebook (outdated as of 8/23/23, see above message) provides a tool to access Dewey Data in bulk via API. Simply follow the instructions to have your data saved into Google Drive, where you can continue analysis or download to your local machine.

Some useful features:

  • Download multiple data files in one go
  • Easily filter to the data products you plan to work with to avoid eating up storage in your analysis environment
  • Preprocess your data before downloading into your analysis environment, allowing further storage optimization and saving compute power
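
As a rough illustration of the filtering and preprocessing ideas above (this is not the notebook's code), here is a minimal sketch. It assumes a hypothetical list of file records with `name` and `url` keys and gzip-compressed CSV content. The function names, file names, and column names are all made up for the example.

```python
import csv
import gzip
import io

def select_files(file_records, keyword):
    """Keep only records whose (hypothetical) 'name' field contains the keyword."""
    return [r for r in file_records if keyword.lower() in r["name"].lower()]

def keep_columns(gz_bytes, columns):
    """Decompress a gzipped CSV and return its rows restricted to the given columns."""
    text = gzip.decompress(gz_bytes).decode("utf-8")
    reader = csv.DictReader(io.StringIO(text))
    return [{c: row[c] for c in columns} for row in reader]

# Tiny in-memory stand-ins for a real file list and a real download
records = [
    {"name": "product_a_2023-01.csv.gz", "url": "/hypothetical/path/1"},
    {"name": "product_b_2023-01.csv.gz", "url": "/hypothetical/path/2"},
]
sample = gzip.compress(b"id,visits,extra\nabc,10,x\ndef,7,y\n")

product_a_only = select_files(records, "product_a")
trimmed = keep_columns(sample, ["id", "visits"])
```

Dropping unneeded files and columns before anything is written to storage is what keeps the analysis environment small; the same two steps could be applied to each file in a loop.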

While the File Browser in the Dewey Marketplace provides a handy interface to download your files, programmatic access may be preferable for many users. The structure of the requests shown in this notebook may be applied in other environments, such as in R on your local machine.

Run the code yourself and be sure to reply with any questions, feature suggestions, or other feedback!


Every time I click "Browse my files", it shows "API Error" and "Error retrieving file list". Do you know what is wrong?

Hi @yangsong , there may have been a temporary tech issue. Are you still seeing the issue?

Is there a list of updated Colab tutorials like this Google Colab notebook?

Hi @Yingjie_Li_MSU, the foot traffic provider has recently switched from SafeGraph to Advan. We will be rolling out some Advan content this week (with Google Colab) so users can compare Advan and SafeGraph.

In the meantime, I would suggest checking out this post, which discusses the methodology differences between Advan and SafeGraph. The post specifically touches on normalization for longitudinal analysis, the topic of the notebook you shared above. (Note that the notebook was made before pre-packaged normalized columns were included in the dataset, so you may not need to compute the normalized visit counts yourself anymore if one of the included columns works for you.)

Thank you for sharing this amazing API tool.

I have tried to use it to download the files. It works well at first, but after every 200 files are downloaded, an HTTP error occurs and I have to restart my code and rerun it. Is there any suggestion for fixing this bug?

Thank you!!

Thanks for letting us know. Can you share the email address you’re using for the account so we can check the logs to investigate?

Thanks Evan, I have addressed it myself. It seems that each access token can only be used for a limited number of downloads. I revised the read_file function as follows, and it works fine now.

import pandas as pd

# get_access_token, un, pw and access_token are defined earlier in the notebook
def read_file(data_url, token=None):
    global access_token  # keep the refreshed token for later calls
    base_url = "https://marketplace.deweydata.io"
    full_url = f"{base_url}{data_url}"
    headers = {'accept': 'application/json', 'Authorization': f'Bearer {token or access_token}'}
    try:
        df = pd.read_csv(full_url, storage_options=headers, compression='gzip')
    except Exception:
        # The token seems to stop working after a number of downloads:
        # request a new one and retry the same file once
        access_token = get_access_token(un, pw)
        headers['Authorization'] = f'Bearer {access_token}'
        df = pd.read_csv(full_url, storage_options=headers, compression='gzip')
    print(df.shape)
    return df
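
For anyone who wants the same refresh-and-retry idea outside the notebook, here is a minimal generic sketch. It is only an illustration: `with_token_refresh`, `fetch`, and `get_token` are hypothetical names, the fake functions simulate an expired token instead of calling the real API, and in real use the `except` clause would be better narrowed to the specific HTTP error.

```python
def with_token_refresh(fetch, get_token, max_refreshes=1):
    """Call fetch(token); if it fails, obtain a new token and retry.

    `fetch` and `get_token` are caller-supplied callables (hypothetical names).
    """
    token = get_token()
    for attempt in range(max_refreshes + 1):
        try:
            return fetch(token)
        except Exception:
            if attempt == max_refreshes:
                raise  # out of retries, surface the last error
            token = get_token()  # refresh and try again

# Simulated usage: the first token is rejected, the refreshed one works
issued = iter(["stale", "fresh"])

def fake_get_token():
    return next(issued)

def fake_fetch(token):
    if token != "fresh":
        raise PermissionError("token rejected")
    return f"data fetched with {token}"

result = with_token_refresh(fake_fetch, fake_get_token)
```

Limiting the number of refreshes means a genuinely broken request (wrong URL, bad credentials) still fails instead of looping forever.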