Suggestion: If you could provide direct SQL query access to the database, it would be great for developers (especially students who can’t afford a powerful server with lots of space). Right now the process of filtering data is a bit tedious, and we spend a lot of time creating Python scripts to organize and filter the data. If we could just query in SQL and get the data we want, it would save a lot of time.
Hi @Angel_Langdon, we have considered something like this in the past. I think the hosting/request costs are prohibitive. Additionally, many users benefit from the aws s3 sync functionality of S3 buckets. Fortunately, there’s a tool to make this process easier:
There is a demo for using Python to download and filter the data. I use it pretty regularly, and a number of other community members do too. It’s not super robust because it is meant to just give users a head start in using the data. It does everything in Google Drive so that your machine doesn’t get bogged down by the huge files. Then you can download the filtered files if you want, or work with them in something like Google Colab.
Please let me know if you find it valuable (especially if you think this might be ideal for students). If there is demand, I can put some time into making it more robust and compatible with the other Patterns products (Weekly/Neighborhood)!
Yes, Ryan Kruse, this notebook was very useful for us. We have implemented it (copy-pasted) in our project.
If you are planning to make it more robust, you could also use some of the functions we created (Project2021/download_safegraph_data.py at main · angel-langdon/Project2021 · GitHub). In particular, the SafeGraphSession class could help you with that. If you need me to comment the code to make it more understandable, just ask!
I didn’t know that the costs would be so much higher (even prohibitive) compared with such a big S3 bucket. Thank you for your response!
If you are planning to make it more robust, I would suggest parsing the file paths with regular expressions (possibly not even necessary; a simple split may be enough) combined with datetime objects.
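To make the split-plus-datetime idea concrete, here is a minimal sketch. The path layout used in it (prefix/YYYY/MM/DD/HH/file) is an assumption for illustration, and parse_release_date and paths_in_range are hypothetical helper names, not functions from the notebook or from Project2021.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def parse_release_date(path: str) -> Optional[datetime]:
    """Return the first YYYY/MM/DD date found in a slash-separated path.

    Assumes a layout such as 'prefix/2020/12/04/04/file.csv.gz'; this
    layout is an illustration, not a documented bucket structure.
    """
    parts = path.strip("/").split("/")
    # Slide a window over three consecutive path components and try to
    # read them as year/month/day.
    for i in range(len(parts) - 2):
        candidate = "/".join(parts[i:i + 3])
        try:
            return datetime.strptime(candidate, "%Y/%m/%d")
        except ValueError:
            continue
    return None


def paths_in_range(paths: Iterable[str], start: datetime, end: datetime) -> List[str]:
    """Keep only the paths whose parsed date falls within [start, end]."""
    selected = []
    for path in paths:
        parsed = parse_release_date(path)
        if parsed is not None and start <= parsed <= end:
            selected.append(path)
    return selected
```

Because the dates end up as datetime objects, selecting a range of releases becomes a plain comparison instead of string matching on the path.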
Thank you for the suggestions! I’m glad you found the notebook useful. I will let you know if I have any questions about the code.
@Angel_Langdon, we have a very early MVP of an API and will be adding different query filters. Which filters would you find most helpful in getting just the data you need?
@Lauren_Spiegel_SafeGraph I would find it very useful to query patterns data by city, state, brands, and date range (date_range_start). With these filters we would be much more productive.
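Until filters like these exist in an API, the same selection can be sketched locally in Python. This is only an illustration under stated assumptions: the column names city, region, and brands are assumed (only date_range_start is named above), the timestamp is assumed to be ISO 8601, the sample rows are made up, and filter_patterns is a hypothetical helper.

```python
import csv
import io
from datetime import date, datetime

# Made-up sample rows for illustration only; not real patterns data.
SAMPLE_CSV = """city,region,brands,date_range_start
Houston,TX,Brand A,2020-12-07T00:00:00-06:00
Houston,TX,Brand B,2020-12-07T00:00:00-06:00
Austin,TX,Brand A,2020-12-07T00:00:00-06:00
Houston,TX,Brand A,2021-01-04T00:00:00-06:00
"""


def filter_patterns(rows, city=None, region=None, brand=None, start=None, end=None):
    """Yield rows matching every filter that was supplied (None means 'any')."""
    for row in rows:
        if city and row["city"].lower() != city.lower():
            continue
        if region and row["region"].upper() != region.upper():
            continue
        if brand and row["brands"].lower() != brand.lower():
            continue
        # Compare calendar dates only, which sidesteps time zone offsets.
        day = datetime.fromisoformat(row["date_range_start"]).date()
        if start and day < start:
            continue
        if end and day > end:
            continue
        yield row


# Usage: Brand A rows in Houston, TX that start in December 2020.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
december_rows = list(
    filter_patterns(reader, city="Houston", region="TX", brand="Brand A",
                    start=date(2020, 12, 1), end=date(2020, 12, 31))
)
```

For real files, the StringIO object would be replaced by an opened (and, if needed, decompressed) CSV file; rows are streamed one at a time, so the whole file never has to fit in memory.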
Thank you for the feedback!
@Karissa_Paddie_SafeGraph – interesting idea