Hi all, does anyone have any recommendations for dealing with the json object that stores information on visitor home cbgs? I’m trying to construct a measure of what percent of visitors in a given county came from another county. I came up with a really hacked solution, but I imagine there’s a faster way to parse through these n=1MM weekly datasets, lol.
Hi @Stan_Oklobdzija_CA_YIMBY are you looking for a quicker way to explode the jsons or do the actual calculation?
More the latter, but I assume to do the calculation, one must explode the jsons?
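For anyone following along, "exploding" here just means turning each key/value pair in the JSON column into its own row. A minimal standard-library sketch of the idea, not the safegraph_py implementation; the function name and the sample rows are made up for illustration:

```python
import json

def explode_home_cbgs(rows):
    """Yield one (poi_id, home_cbg, visitors) tuple per key in the JSON string.

    `rows` is any iterable of (poi_id, json_string) pairs; both names are
    hypothetical and stand in for whatever your dataset calls them.
    """
    for poi_id, home_json in rows:
        # Treat a missing or empty string as an empty JSON object.
        for home_cbg, visitors in json.loads(home_json or "{}").items():
            yield poi_id, home_cbg, visitors

# Made-up sample data: one POI with two home CBGs, one POI with none.
rows = [
    ("poi_a", '{"060371234561": 10, "060590001001": 5}'),
    ("poi_b", "{}"),
]
exploded = list(explode_home_cbgs(rows))
```

Each POI contributes as many output rows as it has home CBGs, so the exploded table is usually much longer than the input.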
Yes, I was going to recommend the safegraph_py library, which has both single-core and multicore explode functions that are pretty quick (https://github.com/SafeGraphInc/safegraph_py).
If you are looking for a faster solution to the post-explode part, I would love to see your code and see if I can help.
I managed to do so with some pretty hacky R code. But it’s quite slow and I imagine there must be a better way to do it. I’ll take a look at those libraries, thanks!
Oops just assumed you were a python user. Sorry about that. Here is a link to the safegraph R package
It’s ok! I can do both (though I’m better at R). No need to apologize!
Looks like the Python library handles exploding the JSONs much better.
I don’t have a ton of experience with R, but I feel like Python goes faster sometimes.
Yeah, it looks like there are prepackaged functions to turn the JSONs into a pandas DF, but nothing equivalent for R.
Oh wait! Looks like expand_cat_json is what I was looking for!
Just so you know, there are quite a few R users in the community - they would likely be able to help you with any of your R needs
#r-troubleshooting
Oh, I didn’t know about that channel. Thanks!
no problem!
I’ve previously used R code that I modified from @Derek_Ouyang_Stanford to expand the visitor origins json column. This is their website that has lots of code examples: covid19
Inside the “safegraph_normalization_function.R” file is a function called “expandOrigins” that may help you.
Thank you!
Hello @jack_lindsay_kraken1 @Stan_Oklobdzija_CA_YIMBY, I’m interested in your question and wondering if you could help with a Python code snippet to measure the percent of visitors to POIs in a given county who came from another county. I exploded the JSON string column (using the vertically_explode_json() function) but found that the dataframe was much larger afterwards (e.g., ~5,000 rows became ~650,000 rows).
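On the row count: that growth is expected, since a vertical explode produces one row per (POI, home CBG) pair. Here is a rough sketch of one way to get the out-of-county share without materializing the exploded table at all. It assumes each row carries the POI's own CBG and the visitor home CBG JSON string (I'm calling them poi_cbg and visitor_home_cbgs here; adjust to your column names), and that the first five digits of a 12-digit CBG code are the state + county FIPS. The function names and sample values are made up for illustration.

```python
import json
from collections import defaultdict

def county_of(cbg):
    """Return the 5-digit state + county FIPS prefix of a CBG code.

    zfill restores a leading zero that is lost if the code was read as an int.
    """
    return str(cbg).zfill(12)[:5]

def outside_county_share(rows):
    """Percent of visitors to each county's POIs whose home county differs.

    `rows` is an iterable of (poi_cbg, visitor_home_cbgs_json) pairs.
    Returns {county_fips: percent}.
    """
    total = defaultdict(int)
    outside = defaultdict(int)
    for poi_cbg, home_json in rows:
        if not home_json:
            continue  # skip rows with no home CBG information
        poi_county = county_of(poi_cbg)
        for home_cbg, visitors in json.loads(home_json).items():
            total[poi_county] += visitors
            if county_of(home_cbg) != poi_county:
                outside[poi_county] += visitors
    return {c: 100.0 * outside[c] / total[c] for c in total if total[c] > 0}

# Made-up sample: two POIs in county 06037, one in county 06059.
rows = [
    ("060371234561", '{"060371234561": 10, "060590001001": 5}'),
    ("060379999991", '{"060371111111": 5}'),
    ("060590001001", '{"060590001001": 4, "060371234561": 4}'),
]
shares = outside_county_share(rows)
print(shares)
```

With a pandas dataframe you could pass something like zip(df["poi_cbg"], df["visitor_home_cbgs"]) as rows. If you have already exploded the data, the same idea works as a groupby: derive the two county columns, flag rows where they differ, and divide the flagged visitor sum by the total visitor sum per county.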