Reading In a Portion of Advan Data


When I try to download a sample dataset from Advan (weekly patterns, to be exact), the dataset is so big that I can’t open it. Is there a way to read only the first 100 rows or so, so that I can see how the data looks and practice with it?

Any step-by-step instructions on how to download the data properly, plus sample code for viewing a portion of it, would help!

We have this tutorial for taking a quick look at the data without downloading it.

We’re also launching a large upgrade to our product in a few weeks that will let you see sample rows and all of the column attributes for every dataset on the platform without downloading the data.

For now, you can find the Advan attributes here: Documentation (Public) - Google Drive

This will help you use Python code directly in R without needing to convert it to R code:


Okay, thanks. So all I need to do is copy this code exactly into Python, changing only the URL?

Yes, that should work.

Keep in mind, though, that the current function reads one entire CSV file (.gz), which can take up ~280MB of your computer's memory. I think that should be okay, but if you really want only the first 100 or so rows, add the nrows = 100 parameter to the pd.read_csv function as below. The file is still downloaded in full; nrows only limits how many rows pandas parses.

# Directly read csv data from gz file
# DEWEY_MP_ROOT must already be defined, as in the tutorial code.
import gzip
from io import BytesIO
import pandas as pd
import requests

def read_data(token, file_url, timeout = 300):
    # token = tkn
    # file_url = "/api/data/v2/data/2022/12/26/ADVAN/WP/20221226-advan_wp_pat_part99_0"
    # timeout = 300
    src_url = DEWEY_MP_ROOT + file_url
    response = requests.get(src_url, headers={"Authorization": "Bearer " + token}, timeout=timeout)
    try:
        # nrows = 100 keeps only the first 100 rows
        csv_df = pd.read_csv(BytesIO(response.content), nrows = 100, compression="gzip")
    except gzip.BadGzipFile: # not gzip file. try normal csv
        try:
            csv_df = pd.read_csv(BytesIO(response.content), nrows = 100)
        except Exception: # neither gzip csv nor plain csv
            csv_df = None
            print("Could not read the data. Can only open gzip csv file or csv file.")
    return csv_df
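
If you have already downloaded a .csv.gz file and it is too large to open, here is a minimal sketch of another way to peek at it using only the Python standard library. It is not part of the tutorial code; the function name peek_gzip_csv and the file name in the usage comment are made up for illustration, and it assumes the file is UTF-8 encoded. It decompresses lazily, so only roughly as much of the file as the requested rows need is read.

```python
import csv
import gzip
import io
from itertools import islice


def peek_gzip_csv(binary_stream, n_rows=100):
    """Return (header, rows) with at most n_rows data rows from a
    gzip-compressed CSV opened as a binary file-like object."""
    # GzipFile decompresses on demand in buffered chunks, so the whole
    # file is not loaded into memory. Closing the wrapper closes the
    # GzipFile but leaves binary_stream itself open.
    gz = gzip.GzipFile(fileobj=binary_stream)
    with io.TextIOWrapper(gz, encoding="utf-8", newline="") as text:
        reader = csv.reader(text)
        header = next(reader, [])            # first line: column names
        rows = list(islice(reader, n_rows))  # stop after n_rows rows
    return header, rows


# Example usage (hypothetical file name):
# with open("advan_weekly_patterns_sample.csv.gz", "rb") as f:
#     header, rows = peek_gzip_csv(f, n_rows=100)
# print(header)
# print(rows[:5])
```

The same function should in principle also accept a streamed download, for example response.raw from requests.get(..., stream=True), but I have not tested that against this API, so treat it as an assumption.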