Analysis of people traveling to vaccination sites, deriving demographic characteristics from their home locations

Maybe this has already happened, but I think it’d be cool if someone could do an analysis of people traveling to vaccination sites, and derive demographic characteristics using their home locations. This would be useful in many states (like Texas, Mississippi, California) where there’s a public comprehensive list of vaccination sites but data on race/ethnicity of those being vaccinated is sparse.
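For anyone who wants to try this, here is a minimal sketch of the estimate, assuming you have per-site visitor counts by home census block group (SafeGraph Patterns ships these as a JSON string in the visitor_home_cbgs column) plus a table of demographic shares per block group. The function name, the group labels, and all the numbers are made up for illustration. It also assumes visitors look like the average resident of their home block group, which is a strong assumption.

```python
import json
from collections import defaultdict


def estimate_site_demographics(visitor_home_cbgs_json, cbg_shares):
    """Visitor-weighted average of home-CBG demographic shares for one site.

    visitor_home_cbgs_json: JSON string mapping CBG id -> visitor count.
    cbg_shares: dict mapping CBG id -> {group: share of residents}.
    """
    visitors = json.loads(visitor_home_cbgs_json)
    totals = defaultdict(float)
    matched_visitors = 0
    for cbg, count in visitors.items():
        shares = cbg_shares.get(cbg)
        if shares is None:
            continue  # no census data for this CBG, so leave it out
        matched_visitors += count
        for group, share in shares.items():
            totals[group] += count * share
    if matched_visitors == 0:
        return {}
    return {group: total / matched_visitors for group, total in totals.items()}


# Hypothetical inputs, for illustration only.
cbg_shares = {
    "480010001001": {"hispanic": 0.50, "non_hispanic": 0.50},
    "480010001002": {"hispanic": 0.10, "non_hispanic": 0.90},
}
visitors = json.dumps({"480010001001": 30, "480010001002": 10})
estimate = estimate_site_demographics(visitors, cbg_shares)
print(estimate)
```

The result is an estimate of the visitor mix at a site, not a count of who was actually vaccinated there, so it is best read as a rough proxy where reported race/ethnicity is missing.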

Hi @Jason_Kao_Columbia_University, very cool idea! @Ryan_Kruse_MN_State might have done something similar! I know he has worked a lot with both voter and vaccination sites.

Could you link to some of the vaccination site data you have come across? It would be really neat to work with nonetheless!

Sure! Annoyingly, each state/territory has its own way of reporting this information. But as an example, Texas releases this data as a CSV (visualized here). Louisiana’s provider list is here.

I’ve recently started scraping some of this data daily with GitHub Actions, as it may be useful esp for comparing SafeGraph travel surges + appointment availability

Awesome! Yeah, I understand the struggle - we tried putting these together in the past when the vaccine sites first popped up, but the coverage is so spotty and different.

You can probably create a Beautiful Soup app to scrape this website, run every ZIP code through it, and get a clean-ish dataset

I am interested in how these locations would work being placekey’d as well!

Keep us posted on this @Jason_Kao_Columbia_University! I would love to help if I can - very cool project!

Thank you for sharing that source!!

I will! First however I need to renew my academic data sharing contract heh, will send that request in now

GISCorps has posted a shapefile for vaccination sites: https://covid-19-giscorps.hub.arcgis.com/datasets/c50a1a352e944a66aed98e61952051ef_0

I’ve been working on finding the corresponding POIs but it’s a bit of a pain in my computer

@Martin_Andersen_UNC_Greensboro have you tried either a) cross-referencing with SafeGraph shapefiles or b) checking whether the lat/lng of the POI data falls within those polygons?

Or is processing power what you mean by a pain in the computer?

I run out of memory, even after I restrict to POIs in the same county

Aha. Perhaps try batch processing or something like Dask?
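To make the batching idea concrete, here is a rough sketch in plain Python (not Dask) that streams POIs through a point-in-polygon check one batch at a time, so the full POI list never has to sit in memory. The function names, the (poi_id, lng, lat) tuple layout, and the polygon format are all assumptions for illustration, and the ray-casting test ignores edge cases like points exactly on a boundary.

```python
from itertools import islice


def point_in_polygon(lng, lat, polygon):
    """Ray-casting test; polygon is a list of (lng, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def batches(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def match_pois(pois, site_polygons, batch_size=1000):
    """Yield (poi_id, site_id) pairs, holding one batch of POIs at a time.

    pois: iterable of (poi_id, lng, lat), e.g. rows streamed from a CSV reader.
    site_polygons: dict of site_id -> list of (lng, lat) vertices.
    """
    for chunk in batches(pois, batch_size):
        for poi_id, lng, lat in chunk:
            for site_id, polygon in site_polygons.items():
                if point_in_polygon(lng, lat, polygon):
                    yield poi_id, site_id


# Hypothetical data: one unit-square "site" and two POIs.
site_polygons = {"site_1": [(0, 0), (1, 0), (1, 1), (0, 1)]}
pois = iter([("poi_a", 0.5, 0.5), ("poi_b", 2.0, 2.0)])
matches = list(match_pois(pois, site_polygons, batch_size=1))
print(matches)
```

Because the matches are yielded as they are found, they can be written to disk batch by batch instead of being collected into one big merged table.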

Not familiar with Dask. I was using the map function from purrr to run the merge within counties. I’ll look at a more restrictive batch procedure

Keep me posted! Sounds like a cool project

Sorry, I’m a bit late to the party here. @Martin_Andersen_UNC_Greensboro, I glanced at that giscorps dataset—are they all vaccination sites? Or is there a way to filter to distinguish vaccination sites from testing sites? It isn’t clear to me. Also I’m not seeing any lat/lng columns, just addresses.

Either way, if you’d like to find the corresponding POIs for each site, this would be an excellent use case for Placekey. Using the Placekey API, you can add Placekeys to each of the testing/vaccination sites based on the provided address. Then you can merge with SafeGraph’s Places (on the Placekey column, which comes built-in with SafeGraph data). This way you don’t have to use any lat/lng, and it shouldn’t be very memory intensive at all. Would that be helpful for what you’re trying to do?
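A minimal sketch of the merge step, assuming the sites have already been run through the Placekey API and each record carries a placekey field. The record layout, field names other than placekey, and the Placekey strings below are made up for illustration. It is a plain dictionary lookup, which is why memory use stays small.

```python
def join_on_placekey(sites, places):
    """Join site records to place records on the 'placekey' field.

    Returns (matched, unmatched): matched rows combine both records,
    unmatched rows are sites with no Placekey or no corresponding place.
    """
    places_by_key = {place["placekey"]: place for place in places}
    matched, unmatched = [], []
    for site in sites:
        place = places_by_key.get(site.get("placekey"))
        if place is None:
            unmatched.append(site)
        else:
            # Site fields win if both records share a field name.
            matched.append({**place, **site})
    return matched, unmatched


# Hypothetical records; these Placekey strings are placeholders, not real ones.
sites = [
    {"site_name": "County Health Dept", "placekey": "aaa-222@xxx-yyy-zzz"},
    {"site_name": "Pop-up Clinic", "placekey": None},
]
places = [
    {"placekey": "aaa-222@xxx-yyy-zzz", "location_name": "County Health Department"},
    {"placekey": "bbb-222@xxx-yyy-zzz", "location_name": "Some Pharmacy"},
]
matched, unmatched = join_on_placekey(sites, places)
match_rate = len(matched) / len(sites)
print(match_rate)
```

Keeping the unmatched sites around makes it easy to see which addresses failed to get a Placekey and might need cleaning.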

@Ryan_Kruse_MN_State I’ve mainly been toying with these data, but @Jason_Kao_Columbia_University’s comment made me interested in thinking about where people are coming from. Plus the Krispy Kreme effect :slightly_smiling_face:.

I have gotten the linkage to work, though. I had miscoded something and now it works.

@Martin_Andersen_UNC_Greensboro That’s great news! How many of the vaccination sites were successfully covered by/matched with placekey id’s?

@Martin_Andersen_UNC_Greensboro @Ryan_Kruse_MN_State hmm it also seems like there are some missing CVS pharmacies in the GISCorps data. https://vaccinefinder.org/ is more comprehensive (and uses an unauthenticated API) but the API returns a maximum of 50 sites per lat/lng query. So to scrape the data we’d need to send thousands of individual lat/lng queries. One approach might be to do the whole analysis on a city-by-city basis. One good test city could be Richmond, where nearly 40% of vaccination records have unreported race/ethnicity
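Here is a rough sketch of how the city-by-city querying could be organized, assuming a 50-result cap per query. It does not call the real API: fetch is a hypothetical function you would write yourself, the "id" field on each site is an assumption, and the bounding box is arbitrary. The idea is to lay a grid of query points over the city, deduplicate sites returned by overlapping queries, and flag any point that hits the cap so it can be re-queried on a finer grid.

```python
import math


def grid_points(south, west, north, east, spacing_km):
    """Yield (lat, lng) query centers covering a bounding box."""
    lat_step = spacing_km / 111.0  # roughly 111 km per degree of latitude
    lat = south
    while lat <= north:
        # A degree of longitude shrinks with the cosine of the latitude.
        lng_step = spacing_km / (111.0 * math.cos(math.radians(lat)))
        lng = west
        while lng <= east:
            yield (round(lat, 5), round(lng, 5))
            lng += lng_step
        lat += lat_step


def collect_sites(points, fetch, cap=50):
    """Query every point, deduplicate by site id, and flag capped queries.

    fetch: hypothetical callable (lat, lng) -> list of dicts with an 'id' key.
    Returns (sites_by_id, saturated_points).
    """
    sites, saturated = {}, []
    for lat, lng in points:
        results = fetch(lat, lng)
        if len(results) >= cap:
            # Probably truncated: this area needs a finer grid.
            saturated.append((lat, lng))
        for site in results:
            sites[site["id"]] = site
    return sites, saturated


def fake_fetch(lat, lng):
    """Stand-in for a real request, so the sketch runs offline."""
    return [{"id": "site-1"}, {"id": f"site-{lat}-{lng}"}]


# Arbitrary illustrative bounding box with a 10 km grid.
points = list(grid_points(37.4, -77.6, 37.6, -77.3, spacing_km=10))
sites, saturated = collect_sites(points, fake_fetch)
print(len(points), len(sites), len(saturated))
```

The grid spacing should be smaller than whatever search radius the API uses, otherwise sites between query points will be missed.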

@Jason_Kao_Columbia_University @Martin_Andersen_UNC_Greensboro It could be worth contacting VaccineFinder and seeing if they’re willing to give some sort of data dump instead of having to ping the API thousands of times