Bulk data files download with R

Hello Everyone.

I needed to download a bunch of files from Dewey to my hard drive in R. I found an example in Python (Tutorial: How to Access Files via the API - Help - Dewey Community (deweydata.io)) but not one in R, so I made one. I hope this helps R users.

# Load R libraries
library(httr);
library(jsonlite);   # provides fromJSON(); rjson is not needed and exports a conflicting fromJSON()
library(rstudioapi);

# Define global variables
DEWEY_TOKEN_URL = "https://marketplace.deweydata.io/api/auth/tks/get_token";
DEWEY_MP_ROOT   = "https://marketplace.deweydata.io";
DEWEY_DATA_ROOT = "https://marketplace.deweydata.io/api/data/v2/list";

# Get access token
get_access_token = function(username, passw) {
  response = POST(DEWEY_TOKEN_URL, authenticate(username, passw));
  response_content = content(response);

  return(response_content$access_token);
}

# Return file paths in the sub_path folder
get_file_paths = function(token, sub_path = NULL) {
  # Pass add_headers() unnamed: httr does not send a named `headers =` argument.
  response = GET(paste0(DEWEY_DATA_ROOT, sub_path),
                 add_headers(Authorization = paste0("Bearer ", token)));
  
  json_text = content(response, as = "text", encoding = "UTF-8");
  
  response_df = as.data.frame(fromJSON(json_text));
  
  return(response_df);
}

# Download a single file from Dewey (src_url) to a local destination file (dest_file).
download_file = function(token, src_url, dest_file) {
  options(timeout=200); # timeout in seconds; increase it if you have a large file to download
  download.file(src_url, dest_file, mode = "wb",
                headers = c(Authorization = paste0("Bearer ", token)));
}

# Example ----------------------------------------------------------

# Avoid including your credentials in the code.
# (You can hard-code your credentials instead, though.)
user_name = askForPassword("User name (email address)");
pass_word = askForPassword("Password");

# Get access token
tkn = get_access_token(user_name, pass_word);
tkn;

# Get file paths in the "/2018/01/01/SAFEGRAPH/MPSP" sub folder.
file_paths = get_file_paths(token = tkn,
                            sub_path = "/2018/01/01/SAFEGRAPH/MPSP");
head(file_paths);

# Download the first file to C:/temp/, as an example.
# In the file_paths data.frame,
# url[1] looks like:
# "/api/data/v2/data/2018/01/01/SAFEGRAPH/MPSP/20180101-safegraph_mpsp_cpgp_part9_0"
# and name[1] looks like: "core_poi-geometry-patterns-part9.csv.gz".
src_url = paste0(DEWEY_MP_ROOT, file_paths$url[1]);
dest_file = paste0("C:/temp/", file_paths$name[1]);

download_file(tkn, src_url, dest_file);

# Done!
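
The example above fetches only the first file. For a bulk download, one option is to loop over every row of file_paths. The following is a minimal sketch, not part of the original script: download_all_files and its fetch argument are hypothetical names, and it assumes the listing has url and name columns as described above. The fetch argument exists only so the loop can be tried without touching the network.

```r
# Hypothetical helper (not from the original post): download every file
# listed in a data.frame that has `url` and `name` columns.
download_all_files = function(token, file_paths, dest_dir,
                              root = "https://marketplace.deweydata.io",
                              fetch = NULL) {
  # The default fetcher mirrors download_file() from the post.
  if (is.null(fetch)) {
    fetch = function(src_url, dest_file) {
      download.file(src_url, dest_file, mode = "wb",
                    headers = c(Authorization = paste0("Bearer ", token)));
    }
  }
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE);

  failed = character(0);
  for (i in seq_len(nrow(file_paths))) {
    src_url = paste0(root, as.character(file_paths$url[i]));
    dest_file = file.path(dest_dir, as.character(file_paths$name[i]));
    # Keep going if one file fails, and remember which URL it was.
    ok = tryCatch({
      fetch(src_url, dest_file);
      TRUE
    }, error = function(e) {
      message("Failed: ", src_url, " (", conditionMessage(e), ")");
      FALSE
    });
    if (!ok) failed = c(failed, src_url);
  }
  # Return the URLs that could not be downloaded (empty if all succeeded).
  return(invisible(failed));
}
```

With the objects from the example, a call would look like failed = download_all_files(tkn, file_paths, "C:/temp"). Set options(timeout = ...) beforehand if the files are large, as in download_file above.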

Awesome! Thanks for creating this. cc @ryank


This is great, thanks!
One suggestion/request - can you place code inside “preformatted output” tags (the </> in editing bar):

# url[1] looks like:
# "/api/data/v2/data/2018/01/01/SAFEGRAPH/MPSP/20180101-safegraph_mpsp_cpgp_part9_0"
# and name[1] looks like: "core_poi-geometry-patterns-part9.csv.gz".

This solves the problem of long lines getting cut off, as a scroll bar gets added.


@donn.kim, thank you so much for doing this! With almost no knowledge of R, I was able to get the csv files. I was wondering if you could show me how I can process the data before downloading it. I need to group by state. Thanks again!
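
The functions in the original post only fetch whole files, so one possible approach is to group the data after a file has been downloaded. The following is a minimal sketch under assumptions: count_rows_by_state is a hypothetical helper, and the state column name "region" is a guess that should be checked against the header of the actual file.

```r
# Hypothetical helper: count rows per state in one downloaded .csv.gz file.
# `state_col` is an assumption; check the column names of your own file.
count_rows_by_state = function(gz_file, state_col = "region") {
  # gzfile() lets read.csv() read the compressed file directly.
  df = read.csv(gzfile(gz_file), stringsAsFactors = FALSE);
  if (!(state_col %in% names(df))) {
    stop("Column not found: ", state_col);
  }
  # table() counts the rows for each distinct value of the state column.
  counts = as.data.frame(table(df[[state_col]]), stringsAsFactors = FALSE);
  names(counts) = c(state_col, "n_rows");
  return(counts);
}
```

For the file from the example, the call would be count_rows_by_state(dest_file). Other per-state summaries can be built the same way with aggregate() once the file is loaded.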
