Error with Arrow package to read in multiple SafeGraph csv.gz files

scott.stetkiewicz · April 4, 2023, 8:48pm

I hit the same issue. It seems that long fields (e.g. geometries) need more block space than the default settings allow. You can adjust the block_size parameter inline:

dat <- open_dataset(data_path, format='csv', partitioning = c('month'), block_size=1e9)

1e9 did the trick for me, though it's worth experimenting to find the smallest value that works for your data rather than just copying mine.
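
If you'd rather keep the reader settings in one place, here is a rough sketch of the same call with the block size set through an explicit read-options object. I'm assuming your arrow version accepts read_options in open_dataset() and that data_path points at the month-partitioned folder; the path and the 2^28 value are placeholders, not something I've tuned.

```r
library(arrow)

# Placeholder: folder holding the month-partitioned csv.gz files.
data_path <- "path/to/safegraph"

# block_size is given in bytes; arrow's default is roughly 1 MB (2^20).
# 2^28 (about 268 MB) is an arbitrary starting point, not a recommendation.
read_opts <- CsvReadOptions$create(block_size = as.integer(2^28))

dat <- open_dataset(
  data_path,
  format = "csv",
  partitioning = c("month"),
  read_options = read_opts
)
```

If the read still fails with that value, step it up gradually. A bigger block means more memory used per chunk read, so there's no point going higher than you need.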