Downloaded Weekly Patterns from S3, but the dataset I need is about 130GB - very hard to open

Hi everyone,
I downloaded the Weekly Patterns data from S3, but the datasets I need are about 130GB in total, and it is very hard to open 130GB CSV files. My current plan is to upload the data to my own S3 bucket, but that might take a few days. Is there any way to read the data directly from s3://sg-c19-response/weekly-patterns-delivery/weekly/? I tried reading it directly, but my access is denied.
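For anyone hitting the same wall: one way to at least inspect files this large without loading them whole is chunked reading. A minimal sketch, assuming pandas; the file name is a placeholder for one of the downloaded files, not an actual key:

```python
# Minimal sketch: peek at a huge Weekly Patterns CSV without loading it all.
# "patterns-part1.csv.gz" is a placeholder file name.
import pandas as pd

reader = pd.read_csv("patterns-part1.csv.gz", chunksize=100_000)
first = next(reader)           # only the first 100k rows are in memory
print(first.columns.tolist())  # inspect the schema
print(first.head())
```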

Hi @Tuo_Wu_Georgetown_University, we try to discourage reading directly from the SafeGraph S3 bucket.

If you find yourself running out of local storage, I can think of two options that would be cheaper than buying a bigger hard drive:

  1. You can purchase a 200GB Google Drive plan ($3.99 a month) and then run everything through Colab
  2. You can set up an S3 bucket (like you are already doing) and read the data directly from there (see the sketch after this list). Just remember you only get 20k free pulls a month (I hit that cap in 2 days)
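As a rough illustration of option 2, here is a sketch of streaming one Weekly Patterns file from a bucket you own. The bucket name, key, and filter column are hypothetical; it assumes s3fs is installed so pandas can read s3:// paths, and that your AWS credentials are configured locally:

```python
# Rough sketch: stream one Weekly Patterns file from your own bucket in chunks.
# Bucket, key, and the "region" column are assumptions; requires s3fs installed.
import pandas as pd

reader = pd.read_csv(
    "s3://my-weekly-patterns-copy/weekly/patterns-part1.csv.gz",
    chunksize=500_000,   # never hold more than 500k rows in memory at once
)

pieces = []
for chunk in reader:
    pieces.append(chunk[chunk["region"] == "DC"])  # keep only the rows you need

filtered = pd.concat(pieces, ignore_index=True)
```

Keep in mind that each file read this way issues one or more S3 GET requests, which is what counts toward the free-tier request limit mentioned above.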

Thanks for your answer, Jack! My current upload speed is about 700 KB/s via the CLI. Do you know how to speed up this process?

Unfortunately, I do not. I moved about 50GB and it took forever.

I notice your institution is Georgetown University. If you are physically there, us-east-1 is likely your best bet, but if not, you can test region upload speeds to see which is fastest for you and then create a bucket in that region. Here is a link to test transfer speed by region:

There is also a multithreaded upload option, but I am not sure how much it costs.

LINK
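For reference, a rough sketch of a multithreaded upload using boto3's transfer manager, which splits large files into parts and uploads them in parallel. The bucket and file names are hypothetical, and AWS credentials are assumed to be configured:

```python
# Sketch of a multithreaded (multipart) upload with boto3's transfer manager.
# Bucket and file names are hypothetical placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # split files larger than 64MB
    multipart_chunksize=64 * 1024 * 1024,  # upload in 64MB parts
    max_concurrency=10,                    # send 10 parts in parallel
)

s3 = boto3.client("s3")
s3.upload_file(
    "patterns-part1.csv.gz",
    "my-weekly-patterns-copy",
    "weekly/patterns-part1.csv.gz",
    Config=config,
)
```

On cost: each part is billed as its own PUT request, so smaller chunk sizes mean more requests; concurrency itself just changes how many go out at once. The AWS CLI exposes similar settings (e.g. max_concurrent_requests) via aws configure set if you would rather stay on the command line.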

Thank you, Jack. This suggestion is so valuable!

Keep me posted! I am interested in hearing how the S3 solution works out!

No problem!

@Tuo_Wu_Georgetown_University I also have a slow internet connection. Something that helped me was to fire up my own EC2 instance and remote-desktop into it (using the year-long free tier). I do most of my processing on the instance, and then I only download the filtered data I need! Saves me a TON of time.
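A rough sketch of that filter-then-download workflow on the instance, assuming boto3 and pandas; the bucket name, prefix, and filter column below are hypothetical placeholders:

```python
# Sketch of the filter-on-EC2 workflow: pull each file from your own bucket,
# keep only the rows you need, and write one small file to download locally.
# Bucket, prefix, and the "region" column are assumptions.
import boto3
import pandas as pd

s3 = boto3.client("s3")
bucket = "my-weekly-patterns-copy"

pieces = []
# note: list_objects_v2 returns at most 1,000 keys per call
resp = s3.list_objects_v2(Bucket=bucket, Prefix="weekly/")
for obj in resp.get("Contents", []):
    key = obj["Key"]
    if not key.endswith(".csv.gz"):
        continue
    s3.download_file(bucket, key, "current.csv.gz")
    for chunk in pd.read_csv("current.csv.gz", chunksize=500_000):
        pieces.append(chunk[chunk["region"] == "DC"])  # filter column assumed

pd.concat(pieces, ignore_index=True).to_csv("filtered.csv.gz", index=False)
```

The small filtered.csv.gz is then the only thing that has to travel over the slow connection.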