The SafeGraphR package is now fully functional and has entered beta

Nick_H-K_Seattle_University · July 10, 2020, 12:00am

Exciting news for a Friday afternoon: the SafeGraphR package is now fully functional and has entered beta! The package is designed to make it easy to download, read in, and process SafeGraph patterns and stay-at-home data.

Features:

Read and compile files (from AWS or from the shop) into R, including the ability to expand JSON columns and aggregate to different levels while minimizing memory footprint (plus, do multiple aggregations/expansions in a single file-read to save time)
Produce a POI-NAICS link file
Perform normalization: to sample size, to adjust for sampling rate differences, hierarchical Bayes shrinkage, seven-day moving averages, and scaling relative to a date or relative to previous year
Helper functions and data sets make it easy to pull state and county FIPS codes from census block group codes, to link those FIPS codes to the actual names of the places, and also to put names on the NAICS codes.
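
As a rough illustration of what pulling state and county FIPS codes out of a census block group code involves, here is a minimal Python sketch. It is not SafeGraphR's own code, and the function name split_cbg is hypothetical. It relies only on the standard 12-digit block group layout: 2 state digits, 3 county digits, 6 tract digits, and 1 block group digit.

```python
def split_cbg(cbg):
    """Split a 12-digit census block group code into its FIPS components.

    Hypothetical helper for illustration only; not part of SafeGraphR.
    """
    # Restore a leading zero that is lost when the code was read as a number.
    code = str(cbg).zfill(12)
    if len(code) != 12 or not code.isdigit():
        raise ValueError(f"expected a 12-digit block group code, got {cbg!r}")
    return {
        "state_fips": code[:2],     # 2-digit state FIPS
        "county_fips": code[2:5],   # 3-digit county FIPS within the state
        "tract": code[5:11],        # 6-digit census tract
        "block_group": code[11],    # 1-digit block group
    }
```

Keeping the pieces as strings matters, since states such as Alabama (01) lose their leading zero if the code is stored as a number.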
You can find installation information as well as two informative vignettes that walk you through working with patterns and stay-at-home data, respectively, at the package’s website here: Package for Processing and Analyzing SafeGraph Data • SafeGraphR

Let me know any issues you have.

Jude_Bayham_Colorado_State_U · July 10, 2020, 10:39pm

Really nice work! Are the JSON expansion utilities able to handle the variable-size fields now? Thanks

Nick_H-K_Seattle_University · July 10, 2020, 10:44pm

That is unfortunately not there yet. But it’s on my list for an update

Jude_Bayham_Colorado_State_U · July 10, 2020, 10:45pm

10-4. What is your strategy? I’ve been lapplying over rows… inefficient, but feasible for subsets. I’d like to contribute if you find it useful.

Nick_H-K_Seattle_University · July 10, 2020, 10:45pm

I might try to make the expansion produce something in mergeable format and then let merge do the work.

Jude_Bayham_Colorado_State_U · July 10, 2020, 10:48pm

K. I’ll let you know if I write anything useful. Thanks again. Also, your affiliation is changing soon, right? Congrats

Nick_H-K_Seattle_University · July 10, 2020, 10:51pm

That’s right, I’ll be at Seattle University. Thanks!

Nick_H-K_Seattle_University · July 22, 2020, 9:04am

@Jude_Bayham_Colorado_State_U Check the most recent SafeGraphR update: it can now handle unequal categories across rows. I haven’t checked how slow it gets on the big data (unfortunately it does have to expand the data all the way before collapsing, rather than piece-by-piece as before), but it routes everything through data.table, so it should at least be faster than lapplying.
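
For readers who want to see the shape of the expand-then-collapse idea, here is a minimal Python sketch. It is not SafeGraphR's implementation, which is written in R and goes through data.table. The function names, column names, and sample rows are all made up for illustration. Each JSON object is expanded into long format, one record per key, so rows with different category sets need no special handling, and the aggregation happens only after everything has been expanded.

```python
import json
from collections import defaultdict


def expand_json(rows, json_col, id_cols):
    """Expand a JSON-object column into long records, one per key.

    Rows may carry different sets of keys; missing keys simply
    produce no record. Hypothetical helper, not SafeGraphR's API.
    """
    long_rows = []
    for row in rows:
        parsed = json.loads(row[json_col]) if row[json_col] else {}
        for key, value in parsed.items():
            record = {col: row[col] for col in id_cols}
            record["category"] = key
            record["value"] = value
            long_rows.append(record)
    return long_rows


def collapse(long_rows, by):
    """Sum the 'value' field within each combination of the 'by' columns."""
    totals = defaultdict(int)
    for record in long_rows:
        totals[tuple(record[col] for col in by)] += record["value"]
    return dict(totals)


# Made-up sample data: three places with unequal categories across rows.
rows = [
    {"poi": "a", "state": "WA", "dwell": '{"<5": 2, "5-20": 7}'},
    {"poi": "b", "state": "WA", "dwell": '{"5-20": 1, ">240": 4}'},
    {"poi": "c", "state": "CO", "dwell": "{}"},
]

long_rows = expand_json(rows, "dwell", ["poi", "state"])  # full expansion first
by_state = collapse(long_rows, ["state", "category"])     # then aggregate
```

The cost noted above shows up here too: long_rows holds every key of every row in memory before any aggregation takes place.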

Jude_Bayham_Colorado_State_U · July 22, 2020, 9:23pm

Great! @Nick_H-K_Seattle_University, I’ll let you know if I have any issues/comments. I’m sure it will benchmark better than my solution. Thanks!
