Read_patterns not working properly in SafeGraphR package

JingWu · April 28, 2023, 4:14pm

Hi all,
I’m trying to use the ‘read_patterns’ function from SafeGraphR to expand the ‘visitor_home_cbgs’ variable, which is stored in JSON format. However, the expansion fails with this error message:

Error in bmerge(i, x, leftcols, rightcols, roll, rollends, nomatch, mult, :
Incompatible join types: x.visitor_home_cbgs (double) and i.visitor_home_cbgs (character)

Here is my code:
week_data_bycbg <- read_patterns("2023/04/17/patterns_weekly.csv", expand_cat = "visitor_home_cbgs",
    expand_name = "home_cbgs",
    select = c("raw_visit_counts", "raw_visitor_counts", "normalized_visits_by_total_visits", "visitor_home_cbgs"),
    gen_fips = TRUE)

I also tried other ways to expand, like this:
week_data <- week_data %>%
    mutate(visitor_home_cbgs = map(visitor_home_cbgs, ~ fromJSON(.) %>%
        as.data.frame())) %>%
    unnest(visitor_home_cbgs)
and this also gives an error message. Why does this happen? (I suspect it has something to do with the strings in the home CBGs, since some CBG FIPS codes in CA contain the characters ‘CA’, but I am not sure how to fix it.)

Any help would be appreciated. Thanks!

evan-barry-dewey · May 1, 2023, 4:47pm

@Christian_Gunning_University_of_Georgia - any idea what’s happening here?

Christian_Gunning_University_of_Georgia · May 1, 2023, 7:33pm

Of note, SafeGraphR might not be the best route for new analysis. It predates the move of SG datasets to Dewey (and I believe that pattern file format changed with the move). So, first off, is your CSV file sourced from the SG store or Dewey?

I would suggest starting by reading the CSV without any processing via data.table::fread. Using something like nrows=1e4 and select=... lets you start with a subset of the data, which you can then inspect manually. E.g.,
summary(factor(week_data$visitor_home_cbgs)).
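
As a minimal sketch of that first step: the snippet below writes a tiny, made-up two-row file to a temporary path (the rows and the choice of columns are assumptions for illustration, not real SafeGraph data) and reads it back with fread using nrows and select, so the raw strings can be inspected before any expansion.

```r
library(data.table)

# Hypothetical stand-in for a patterns CSV; real files have many more columns
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  'placekey,raw_visit_counts,visitor_home_cbgs',
  'aaa-bbb,10,"{""060530002002"":4,""060530002003"":6}"',
  'ccc-ddd,5,"{""060530002002"":5}"'
), tmp)

# Read only a subset of rows and columns, with no further processing
week_data <- fread(
  tmp,
  nrows  = 1e4,
  select = c("placekey", "raw_visit_counts", "visitor_home_cbgs")
)

# Inspect column types and the raw JSON strings
str(week_data)
print(week_data$visitor_home_cbgs)
summary(factor(week_data$visitor_home_cbgs))
```

Looking at the printed strings shows directly whether the column arrived as character and what the quoting looks like.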

You may also want to inspect exactly what the read_patterns function is doing by looking at the code (simply type read_patterns in console and hit enter). It looks like your offending line is:

if (!is.null(expand_cat) | !is.null(expand_int)) {
    patternsb <- merge(patternsb, expanded_var, all = TRUE,
        by = by)
}

In your second attempt, I suggest trying a “pure” data.table approach + jsonlite and using intermediate variables rather than pipes for starters. Switching between data.table, dplyr, and data.frame is adding extra complexity here.
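
A rough sketch of what that could look like, with one intermediate variable per step. The input table is hypothetical, and the sketch assumes each visitor_home_cbgs string is already valid JSON with at least one entry; the column names home_cbg and visitors are made up for illustration.

```r
library(data.table)
library(jsonlite)

# Hypothetical rows, standing in for data already read with fread
week_data <- data.table(
  placekey = c("aaa-bbb", "ccc-ddd"),
  visitor_home_cbgs = c(
    '{"060530002002":4,"060530002003":6}',
    '{"060530002002":5}'
  )
)

# Step 1: parse each JSON string into a named list (CBG id -> count)
parsed <- lapply(week_data$visitor_home_cbgs, fromJSON)

# Step 2: build one long table per source row, keeping CBG ids as character
# so that leading zeros and any non-numeric ids are preserved
pieces <- lapply(seq_along(parsed), function(i) {
  p <- parsed[[i]]
  data.table(
    placekey = week_data$placekey[i],
    home_cbg = names(p),
    visitors = as.numeric(unlist(p))
  )
})

# Step 3: stack the pieces into a single long table
week_long <- rbindlist(pieces)
print(week_long)
```

Keeping each step in its own variable makes it easy to check parsed and pieces by hand if one of the rows fails.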

Edit: you might want to start with SafeGraphR::expand_cat_json. I’m able to parse SafeGraph’s json columns of type array directly with jsonlite (e.g. visits_by_hour). However, the dict columns contain extra " characters that get stripped out by SafeGraphR.
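
To illustrate the difference, here is a small sketch with made-up cell values (both strings are assumptions about how a cell might look after reading the CSV). It only uses jsonlite and base R, and is not a description of what SafeGraphR does internally.

```r
library(jsonlite)

# Hypothetical cell values
array_cell <- "[0,2,5,1]"             # array-type column, e.g. visits_by_hour
dict_cell  <- '{""060530002002"":4}'  # dict-type column with doubled quotes

# The array parses directly
hours <- fromJSON(array_cell)

# The dict cell is not valid JSON while the quotes are doubled
raw_ok <- !inherits(try(fromJSON(dict_cell), silent = TRUE), "try-error")

# Collapsing each "" to a single " yields valid JSON
# (assumes the dict has no empty-string keys or values)
cbgs <- fromJSON(gsub('""', '"', dict_cell, fixed = TRUE))

print(hours)
print(raw_ok)
print(cbgs)
```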

@evan_barry_dewey, it’s odd to me that the shipped CSV file contents don’t match the SG docs (Patterns). Is there an obvious reason for the extra \" in SG patterns CSVs, e.g., "{""060530002002"":4}"?