Bulk data download and Python in R

This will be the last post on downloading data from Dewey. It shows how to download a specific dataset for a specific period (e.g., Advan weekly patterns from March 2020 to December 2022) or for a single week. The download code is written in Python, and I will not translate it to R. The good news is that you can run any Python code from R without converting it. So, while this article focuses on downloading files, it should also help R users use Python code in general without translating it.

First, the Python download code is here. Please read it. Let’s assume that you saved the full code as “dewey_mp.py” in the “./Py” subfolder of your R project.

Second, you need to set up RStudio to use Python in R. The R package reticulate does all the magic. Here is the setup code in R, with comments to guide you. You can follow the flow.

# Install the reticulate package (first-time use only)
install.packages("reticulate");

# Load library
library(reticulate);

# Install miniconda if first-time use-----------------
# Miniconda is a minimal installer for conda, a package and
# environment manager for Python and other languages.

# Install miniconda. This will take a while
install_miniconda(path = "C:/Temp/miniconda", update = TRUE);

# Optional: display the current list of virtual environments.
# Virtual environments are explained and set up below.
conda_list(conda = "C:/Temp/miniconda/_conda.exe");
# ----------------------------------------------------

# Create new virtual environment (venv)
# A venv keeps its own copy of Python and its
# installed packages, separate from the system Python.
# Change env_path only. ------------------------------
env_path = "C:/Temp/miniconda/envs/venv_sample";
# ----------------------------------------------------

# conda.exe path
conda_path = "C:/Temp/miniconda/_conda.exe";

# Extract venv_name (venv_sample) from env_path
env_name = basename(env_path);
env_name;

# Create the venv (first-time use only)
# Create virtual environment "venv_sample"
conda_create(envname = env_path, conda = conda_path);

# This will show the newly created "venv_sample"
conda_list(conda = conda_path);
# To delete a venv you no longer need:
# conda_remove(envname = env_name, conda = conda_path)

# ------------------------------------------------------------------------------
# !! You have to run this section whenever you start a new R session !!---------
# Set the Python venv to "venv_sample"
# to make sure Python uses "venv_sample" as the venv.
# Otherwise, R will use the system default Python,
# which may cause errors (especially when some required Python packages are
# not installed).
use_condaenv(condaenv = env_name, conda = conda_path);

# Set up Python system path.
# Assuming Python sources are in the ./Py folder,
# add ".\\Py" to the system path so that Python can find the source files.

# Import Python's "sys" module
py_sys = import("sys")

# Append ".\\Py" to the system path.
# R users!!:
# Python is case sensitive for folder names when importing source code.
# Be careful with upper and lower case in folder names.
# "from py.sub_module import *" will fail because "py" is in lower case.
py_sys$path = c(py_sys$path, ".\\Py")
# ------------------------------------------------------------------------------

# Install required packages
# List installed Python packages
py_list_packages();

# Everything will be installed to "venv_sample" folder
# May take time. Be patient...
py_install(packages = c("pandas")); # provides Python version of data.frame
py_install(packages = c("requests")); # allows you to send HTTP/1.1 requests
# py_install(packages = c("scikit-learn")); # regression
# py_install(packages = c("matplotlib")); # plot
# py_install(packages = c("seaborn")); # plot
# py_install(packages = c("geopy")); # geo

# Test block: change FALSE to TRUE to run it
if (FALSE) {
  print("Python setup test: pandas array ---");
  # Test
  pd = import("pandas");
  print(pd$array(c(1, 2, 3)));
  
  print("--------------------------------------------");
}

One important point: you only need to do the setup once, but you have to run the following every time you start a new R session:

library(reticulate);

# venv path
env_path = "C:/Temp/miniconda/envs/venv_sample";
# conda.exe path
conda_path = "C:/Temp/miniconda/_conda.exe";

# Extract venv_name (venv_sample) from env_path
env_name = basename(env_path);
env_name;

use_condaenv(condaenv = env_name, conda = conda_path);

# If you saved your Python code in the "./Py" folder
py_sys = import("sys")
py_sys$path = c(py_sys$path, ".\\Py")
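
What the two py_sys lines do on the Python side can be sketched in plain Python. This is only an illustration and not part of dewey_mp.py: the temporary folder and the module name hello_mod are hypothetical, created here just to show that a module becomes importable once its folder is on sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the "./Py" folder: a temporary directory
# that contains one tiny module named hello_mod.py.
py_dir = Path(tempfile.mkdtemp()) / "Py"
py_dir.mkdir()
(py_dir / "hello_mod.py").write_text(
    "def greet(name):\n    return 'Hello, ' + name\n"
)

# Same idea as py_sys$path = c(py_sys$path, ".\\Py") in R:
# append the folder so that Python searches it when importing.
if str(py_dir) not in sys.path:
    sys.path.append(str(py_dir))

# Refresh the import caches, then import the module by name.
importlib.invalidate_caches()
hello_mod = importlib.import_module("hello_mod")

greeting = hello_mod.greet("R user")
print(greeting)
```

Without the sys.path.append line, the import would fail with ModuleNotFoundError, which is the kind of error you would see in R if the folder were not added.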

Now you are all set to use Python in R.

Finally, save the full Python code as “dewey_mp.py” in the “./Py” folder of your R project. Run the following to import its Python objects (functions, variables, etc.) as R objects. You can then use any of the Python functions and data in R.

# This will import Python objects to R objects (functions, variables, etc.)
# source_python() takes a file path, so include the "Py" folder.
source_python("./Py/dewey_mp.py");

# Replace with your own Dewey credentials
user_name = "user_name";
pass_word = "pass_word";

# Now you can use Python functions in R with the same usage as in Python
tkn = get_access_token(user_name, pass_word);
tkn;

# You need to specify as.integer to tell Python that
# 201901 and 202305 are integers, not doubles.
download_files(user_name, pass_word, "D:/temp",
               as.integer(201901), as.integer(202305), "ADVAN", "WP");

# You need to specify as.integer to tell Python that
# 20210215 is an integer, not a double.
download_weekly_files(user_name, pass_word, "D:/temp",
                      as.integer(20210215), "ADVAN", "WP");

You probably noticed the use of as.integer(201901). This is because R stores 201901 as a numeric type (double), while Python is expecting an integer. Writing the literal with an L suffix, as in 201901L, has the same effect.
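
To see why the type matters, here is a small Python sketch of what the two kinds of R values look like once they reach Python. The helper split_yyyymm is hypothetical and written only for this illustration; the actual handling inside dewey_mp.py may differ.

```python
def split_yyyymm(yyyymm):
    """Hypothetical helper: split an integer such as 201901 into (year, month)."""
    if not isinstance(yyyymm, int):
        raise TypeError(f"expected int, got {type(yyyymm).__name__}")
    year, month = divmod(yyyymm, 100)
    return year, month


# An R integer (as.integer(201901) or 201901L) arrives as a Python int.
print(split_yyyymm(201901))

# A plain R numeric arrives as a Python float. It looks different as text
# and fails strict integer checks.
value_from_r_double = 201901.0
print(str(value_from_r_double))

try:
    split_yyyymm(value_from_r_double)
except TypeError as err:
    print("TypeError:", err)
```

The float prints with a trailing ".0", so any Python code that builds a file name or a URL from the value, or checks that it is an int, can behave differently from what you expect.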

All set!

Enjoy Python in R!

Donn

Donn (Dongshin) Kim
