I'm trying to read files directly from s3 with pandas read_csv and am getting a Permission Access Denied error

I’m trying to read files directly from S3 with pandas read_csv and am getting a Permission Access Denied error. I installed s3fs, but I thought I didn’t have to specify credentials because they are automatically put into an accessible directory. I am on a Windows machine. Could anybody help? Thanks.

The S3 bucket is not public, so you do have to provide credentials somewhere in the request.

I also don’t think you can use pd.read_csv() directly. When I read from S3, I need to use boto3 and io.

Sort of like this:

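```python
import io
import boto3
import pandas as pd

def read_s3_csv(bucket, file, dtype=None):
    s3 = boto3.client('s3')  # credentials come from boto3's default lookup chain
    obj = s3.get_object(Bucket=bucket, Key=file)
    return pd.read_csv(io.BytesIO(obj['Body'].read()), dtype=dtype)
```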
I have never actually done this with credentials before, but here is a useful Stack Overflow question; you can set the credentials as environment variables:

https://stackoverflow.com/questions/37703634/how-to-import-a-text-file-on-aws-s3-into-pandas-without-writing-to-disk

@Rohan_Bansal let me know if that helps ^^

Yeah, I see that you can directly set the environment variables, but is there a way to not explicitly include them in the code? Setting the file variables like AWS_CONFIG_FILE and such didn’t seem to work.

You actually should be able to use read_csv directly. Just make sure to set up your credentials with awscli first (and I don’t think it can handle named profiles? It has to be the default access keys).

(Although I'm not sure if this applies to Windows!)
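For reference, here is a minimal sketch of what reading directly with read_csv could look like. The helper names `s3_url` and `read_csv_from_s3` are made up for illustration, and the `storage_options` argument assumes pandas 1.2 or newer with a recent s3fs installed (older versions such as pandas 0.24 do not have it). With `profile=None`, s3fs should fall back to the default credentials set up by awscli.

```python
import pandas as pd


def s3_url(bucket, key):
    """Build the s3:// URL that pandas hands off to s3fs."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def read_csv_from_s3(bucket, key, profile=None, **read_csv_kwargs):
    """Read a CSV from S3, optionally using a named AWS profile (hypothetical helper)."""
    # storage_options is forwarded to s3fs; this assumes pandas >= 1.2.
    storage_options = {"profile": profile} if profile else None
    return pd.read_csv(
        s3_url(bucket, key),
        storage_options=storage_options,
        **read_csv_kwargs,
    )
```

Usage would be something like `df = read_csv_from_s3("my-bucket", "path/to/file.csv")`, where the bucket and key are placeholders.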

Yeah, it doesn't seem to work. Do I need to explicitly provide the path to the config file somewhere? It works if I define them explicitly in my code with os.environ.
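One way to keep the keys out of the code while still going through os.environ is to load them from the shared credentials file at runtime. This is only a sketch: `export_aws_credentials` is a hypothetical helper, and it assumes the file written by `aws configure` lives at `~/.aws/credentials` and uses the standard `aws_access_key_id` / `aws_secret_access_key` entries. Also note that AWS_CONFIG_FILE points at the config file, while the credentials file location is controlled by AWS_SHARED_CREDENTIALS_FILE, which may be why setting AWS_CONFIG_FILE alone had no effect.

```python
import configparser
import os
from pathlib import Path


def export_aws_credentials(profile="default", credentials_path=None):
    """Copy one profile from the shared AWS credentials file into os.environ."""
    # Assumed default location; on Windows this resolves to
    # %USERPROFILE%\.aws\credentials.
    path = Path(credentials_path or Path.home() / ".aws" / "credentials")

    # Disable interpolation so a '%' in a secret is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"No credentials file found at {path}")
    if not parser.has_section(profile):
        raise KeyError(f"Profile {profile!r} not found in {path}")

    section = parser[profile]
    os.environ["AWS_ACCESS_KEY_ID"] = section["aws_access_key_id"]
    os.environ["AWS_SECRET_ACCESS_KEY"] = section["aws_secret_access_key"]
    # Temporary credentials also carry a session token.
    if "aws_session_token" in section:
        os.environ["AWS_SESSION_TOKEN"] = section["aws_session_token"]
    return path
```

Calling `export_aws_credentials()` once before the first S3 read should give the same effect as setting the variables by hand, without the keys appearing in the script.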

To read files directly from S3 with pandas read_csv using an AWS profile, you can do something like:

```python
import pandas as pd
import s3fs

# profile_name selects a named profile from your AWS credentials file
fs = s3fs.S3FileSystem(profile_name="safegraph")
with fs.open("sg-c19-response/social-distancing/v1/2020/04/05/2020-04-05-social-distancing.csv.gz", "rb") as f:
    df = pd.read_csv(f, escapechar='\\', compression='gzip')
```
It works with Python 3.7.5, pandas 0.24.2, and s3fs 0.2.2.