Tutorial: How to Access Files via the API

To test out the Data API yourself, check out this Google Colab notebook tutorial.

Definition of Terms

What is the Data API?

The Data API is a REST API that lets authorized users programmatically and securely retrieve the data they are entitled to in the Dewey platform. This document explains how to do the following:

  • Get a list of files that you are entitled to.
  • Download the files that you are entitled to.

Prerequisites

The user following this guide must:

  • Have valid credentials to use the Data API. A user in the “End User” group has the necessary roles.
  • Use a REST API client, such as Postman or curl, to access the features described in this document.
  • Have an active account with an active order, in order to access purchased product files.

Main Flow

Authorization

You must have valid credentials to use the Data API. You may use a REST API client, such as Postman or curl, to access the features described in this document.

  1. Use this base URL for the API calls: https://marketplace.deweydata.io/api/data/v2/

  2. The APIs are secured with Basic auth or an OAuth/Bearer token.

For Basic auth

  1. Generate a Base64 encoding of the username and password to be used for Basic authorization.

  2. Use the format “username:password” for the webstore credentials, then encode it in Base64. There are multiple ways to generate this, including online tools.

Example

  • User: john.smith@deweydata.io
  • Password: th1s1sapassw0rd1
  • Combine them in the format “john.smith@deweydata.io:th1s1sapassw0rd1”.
    Then encode the result in Base64 to get “am9obi5zbWl0aEBkZXdleWRhdGEuaW86dGgxczFzYXBhc3N3MHJkMQ==” using the following command:

echo -n "john.smith@deweydata.io:th1s1sapassw0rd1" | openssl base64

The encoded value is sent in requests in the form “Basic base64-authorization”.

Note
These steps aren’t required if using a client that can encode the Basic authorization, such as Postman.
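
For readers scripting this in Python instead of the shell, the same encoding can be sketched with the standard library. The helper name `basic_auth_header` is only illustrative and is not part of the Data API.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value for an 'Authorization' header using Basic auth."""
    # Join as "username:password", then Base64-encode the UTF-8 bytes.
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Example credentials from this tutorial, not real ones.
    print(basic_auth_header("john.smith@deweydata.io", "th1s1sapassw0rd1"))
```

With the example credentials above, the encoded part should match the string produced by the openssl command.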

For Bearer authorization:

Use this syntax to get an access token using curl:

curl -X 'POST' 'https://marketplace.deweydata.io/api/auth/tks/get_token' -H 'accept: application/json' -H 'Authorization: Basic ${base64-auth}'

A Bearer (access) token is the more secure option because the password is not sent with each request. The access token is then used in the requests that access files.

Example request:

curl -X 'POST' 'https://marketplace.deweydata.io/api/auth/tks/get_token' -H 'accept: application/json' -H 'Authorization: Basic am9obi5zbWl0aEBkZXdleWRhdGEuaW86dGgxczFzYXBhc3N3MHJkMQ=='

Example response:

{
  "access_token": "9YtXWpr46int9ceWeOVL-M_YRb4",
  "expires_in": 26533,
  "token_type": "bearer",
  "refresh_token": "V-O42WO0nhi0btZndVZHqYoTng8"
}
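
The token exchange above could be scripted roughly as follows. This is a sketch using only the Python standard library; the function names are hypothetical, the endpoint is the one shown in the curl syntax above, and `fetch_access_token` needs network access and valid credentials to do anything useful.

```python
import json
import urllib.request

# Token endpoint as shown in the curl syntax in this tutorial.
TOKEN_URL = "https://marketplace.deweydata.io/api/auth/tks/get_token"


def build_token_request(basic_header: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for an access token."""
    return urllib.request.Request(
        TOKEN_URL,
        method="POST",
        headers={"accept": "application/json", "Authorization": basic_header},
    )


def parse_access_token(body: str) -> str:
    """Extract the access_token field from the JSON response body."""
    return json.loads(body)["access_token"]


def fetch_access_token(basic_header: str, timeout: float = 30.0) -> str:
    """Send the request and return the token (requires network access)."""
    request = build_token_request(basic_header)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_access_token(response.read().decode("utf-8"))
```

Keeping the request building and the response parsing separate from the network call makes both easy to check without contacting the server.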

Note
All curl requests in this document use Bearer tokens.
To use Basic auth, replace every “Authorization: Bearer <access_token>” with “Authorization: Basic <base64_auth>”.

List of files

Using the Data API, you can query a list of files that you are entitled to using the following request:

curl -H 'Accept: application/json' -H "Authorization: Bearer ${access_token}" -X GET 'https://marketplace.deweydata.io/api/data/v2/list?${username}'

Note that the username refers to the Dewey platform username.

Example request to get files accessible by john.smith@deweydata.io

curl -X GET "https://marketplace.deweydata.io/api/data/v2/list" -H "accept: application/json" -H "Authorization: Bearer 9YtXWpr46int9ceWeOVL-M_YRb4"

Example response body

[
  {
    "name": "2022",
    "parent": "/api/data/v2/list",
    "url": "/api/data/v2/list/2022",
    "size": 0,
    "createdAt": "2022-01-01T00:00:00",
    "updatedAt": "2022-09-08T00:00:00",
    "directory": true,
    "writable": false
  }
]

Using the Data API, you have the ability to filter down to specific folders in a directory.

curl -H 'Accept: application/json' -H "Authorization: Bearer ${access_token}" -X GET 'https://marketplace.deweydata.io/api/data/v2/list/${path}'
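
As a rough Python counterpart to the two list requests, the helpers below build the listing URL for a folder path and the Bearer headers. The names `list_url` and `bearer_headers` are made up for this sketch; only the base URL comes from this tutorial.

```python
from urllib.parse import quote

# Base URL given in the Authorization section of this tutorial.
BASE_URL = "https://marketplace.deweydata.io/api/data/v2/"


def list_url(path: str = "") -> str:
    """Return the listing URL for a folder path such as '2022/08'."""
    cleaned = path.strip("/")
    if not cleaned:
        # No path: list the top-level folders.
        return BASE_URL + "list"
    # quote() keeps "/" intact and escapes characters unsafe in a URL path.
    return BASE_URL + "list/" + quote(cleaned)


def bearer_headers(access_token: str) -> dict:
    """Headers matching the curl examples that use a Bearer token."""
    return {
        "Accept": "application/json",
        "Authorization": f"Bearer {access_token}",
    }
```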

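To illustrate, consider a directory structure in which the folders 2022, 08, 12, FINDATA, and ONE are nested in that order, with the file Ohio_20220510.txt inside ONE.

Below are example endpoints and the results that are returned by the API calls.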

GET https://marketplace.deweydata.io/api/data/v2/list

[
  {
    "name": "2022",
    "parent": "/api/data/v2/list",
    "url": "/api/data/v2/list/2022",
    "size": 0,
    "createdAt": "2022-01-01T00:00:00",
    "updatedAt": "2022-08-17T00:00:00",
    "directory": true,
    "writable": false
  }
]

GET https://marketplace.deweydata.io/api/data/v2/list/2022/

[
  {
    "name": "08",
    "parent": "/api/data/v2/list/2022",
    "url": "/api/data/v2/list/2022/08",
    "size": 0,
    "createdAt": "2022-08-01T00:00:00",
    "updatedAt": "2022-08-17T00:00:00",
    "directory": true,
    "writable": false
  }
]

GET https://marketplace.deweydata.io/api/data/v2/list/2022/08/

[
  {
    "name": "12",
    "parent": "/api/data/v2/list/2022/08",
    "url": "/api/data/v2/list/2022/08/12",
    "size": 0,
    "createdAt": "2022-08-12T18:50:21",
    "updatedAt": "2022-08-12T18:50:21",
    "directory": true,
    "writable": false
  }
]

GET https://marketplace.deweydata.io/api/data/v2/list/2022/08/12

[
  {
    "name": "FINDATA",
    "parent": "/api/data/v2/list/2022/08/12",
    "url": "/api/data/v2/list/2022/08/12/FINDATA",
    "size": 0,
    "createdAt": "2022-08-12T18:50:21",
    "updatedAt": "2022-08-12T18:50:21",
    "directory": true,
    "writable": false
  }
]

GET https://marketplace.deweydata.io/api/data/v2/list/2022/08/12/FINDATA

[
  {
    "name": "ONE",
    "parent": "/api/data/v2/list/2022/08/12/FINDATA",
    "url": "/api/data/v2/list/2022/08/12/FINDATA/ONE",
    "size": 0,
    "createdAt": "2022-08-12T18:50:21",
    "updatedAt": "2022-08-12T18:50:21",
    "directory": true,
    "writable": false
  }
]

GET https://marketplace.deweydata.io/api/data/v2/list/2022/08/12/FINDATA/ONE

[
  {
    "name": "Ohio_20220510.txt",
    "fid": "20220812-FINDATA_ONE_TWO_THR_0",
    "parent": "/api/data/v2/list/2022/08/12/FINDATA/ONE",
    "url": "/api/data/v2/data/2022/08/12/FINDATA/ONE/20220812-FINDATA_ONE_TWO_THR_0",
    "size": 12,
    "md5sum": "04e44a91d39dd21b53582d82a09e44de",
    "createdAt": "2022-08-12T18:50:21",
    "updatedAt": "2022-08-12T18:50:21",
    "directory": false,
    "writable": false
  }
]
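
The folder-by-folder calls above suggest a simple recursive walk: follow the `url` of every entry whose `directory` flag is true and collect the rest. The sketch below assumes a caller-supplied `fetch` function (hypothetical) that takes a list path such as "/api/data/v2/list/2022" and returns the parsed JSON array for it; how `fetch` performs the HTTP request is left open.

```python
from typing import Callable, Iterator, List


def iter_files(
    fetch: Callable[[str], List[dict]],
    list_path: str = "/api/data/v2/list",
) -> Iterator[dict]:
    """Yield every file entry reachable from list_path."""
    for entry in fetch(list_path):
        if entry["directory"]:
            # A directory's "url" is itself a list path, so recurse into it.
            yield from iter_files(fetch, entry["url"])
        else:
            # A file entry carries fid, md5sum, size and a download "url".
            yield entry
```

Because `fetch` is passed in, the walk can be tried against an in-memory dictionary of sample responses before pointing it at the real API.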

The table below describes the fields returned in the response.

Syntax

|Name|Description|Type|
|---|---|---|
|name|Name of the file or directory result|String|
|parent|Full path up to the parent directory of the record|String|
|url|URL path which can be used to download the file or access the directory via the API|String|
|size|Size in bytes|int|
|md5sum|32-character checksum which can be used to verify that a file download completed correctly|String|
|createdAt|Date and time that the file or directory was created. Example: “2018-09-26T00:00:00.000Z”|String (ISO 8601 format, UTC time zone)|
|updatedAt|Date and time that the file or directory was last updated. Example: “2018-09-26T00:00:00.000Z”|String (ISO 8601 format, UTC time zone)|
|writable|Indicates whether the user has access to write to the given file or directory|Boolean|
|directory|Indicates whether the record represents a directory|Boolean|
|fid|File identifier that uniquely identifies a file|String|
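
One way the md5sum field might be used after a download is sketched below with Python's hashlib. The function names are illustrative, and the sketch assumes md5sum is the hex MD5 digest of the complete file contents.

```python
import hashlib
from typing import Iterable


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of the concatenated chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_matches_md5(file_path: str, expected_md5: str) -> bool:
    """Compare a local file against the md5sum from a list response."""
    with open(file_path, "rb") as handle:
        # Read in 1 MiB blocks so large files are not loaded into memory at once.
        blocks = iter(lambda: handle.read(1024 * 1024), b"")
        return md5_hex(blocks) == expected_md5.lower()
```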

Download a file

Using the Data API, you can download the data that you have access to.

The syntax for downloading a file is similar to listing files, except for the following differences:

  • “list” is replaced by “data”
  • The path ends with the fid

curl -H 'Accept: application/json' -H "Authorization: Bearer ${access_token}" -X GET 'https://marketplace.deweydata.io/api/data/v2/data/${path}'

Following the example in the list section, an example request:

curl -H "Accept: application/json" -H "Authorization: Bearer ${access_token}" -X GET "https://marketplace.deweydata.io/api/data/v2/data/2022/08/12/FINDATA/ONE/20220812-FINDATA_ONE_TWO_THR_0/"

To retrieve only a byte range of the file, specify start and end:

curl -X GET 'https://marketplace.deweydata.io/api/data/v2/data/?start=${start}&end=${end}&path=${path}' -H 'Accept: application/json' -H "Authorization: Bearer ${access_token}"

Following the example in the list section, an example request with start and end:

curl -X GET "https://marketplace.deweydata.io/api/data/v2/data/?start=0&end=100&path=2022/08/12/FINDATA/ONE/20220812-FINDATA_ONE_TWO_THR_0" -H "accept: application/json" -H "authorization: Basic am9obi5zbWl0aEBkZXdleWRhdGEuaW86dGgxczFzYXBhc3N3MHJkMQ=="

To save the download under a different file name, use the “-o” option. The example below saves the file as test.csv.

curl -H "Accept: application/json" -H "Authorization: Bearer ${access_token}" -X GET "https://marketplace.deweydata.io/api/data/v2/data/2022/08/12/FINDATA/ONE/20220812-FINDATA_ONE_TWO_THR_0/" -o test.csv
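
A Python sketch of the same download, for those not using curl, might look like the following. It assumes the `url` field of a file entry can be appended to the host as-is, which matches the examples in this tutorial; `download_url` and `download_file` are hypothetical names, and `dest_path` plays the role of curl's “-o” argument.

```python
import shutil
import urllib.request

HOST = "https://marketplace.deweydata.io"


def download_url(entry: dict) -> str:
    """Turn a file entry from a list response into an absolute download URL."""
    if entry["directory"]:
        raise ValueError("entry is a directory, not a file")
    return HOST + entry["url"]


def download_file(entry: dict, access_token: str, dest_path: str,
                  timeout: float = 60.0) -> None:
    """Stream one file to dest_path (requires network access)."""
    request = urllib.request.Request(
        download_url(entry),
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )
    # The timeout makes a stalled connection raise an error instead of hanging.
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```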

Syntax

|Parameters|Description|Type|
|---|---|---|
|path|URL path which can be used to download the file or access the directory via the API. It is the value returned as url in the last list call.|String - Mandatory|
|start|Start of the range, in bytes. Allows the user to define the specific byte range to download.|Integer - Optional|
|end|End of the range, in bytes. Allows the user to define the specific byte range to download.|Integer - Optional|
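
To show how the three parameters combine into a query string, here is a small sketch that builds the URL from the ranged example request. The helper name `range_url` is an assumption of this sketch; the parameter names come from the table above.

```python
from typing import Optional
from urllib.parse import urlencode

DATA_URL = "https://marketplace.deweydata.io/api/data/v2/data/"


def range_url(path: str, start: Optional[int] = None,
              end: Optional[int] = None) -> str:
    """Build a download URL, adding start/end only when they are given."""
    params = {}
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    params["path"] = path  # path is the only mandatory parameter
    # safe="/" keeps the slashes in path readable, as in the curl example.
    return DATA_URL + "?" + urlencode(params, safe="/")
```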

It looks like the following:

Should read:

(e.g. debian linux)

I can use this API tool to download files. However, the code became extremely slow after I had downloaded 277 files. It has been 2 hours and the code is still running, but no more files have been downloaded.

Do you know why? Any suggestions on best practices for this?

Best
Yang

Hi @yangsong, I’m curious if this error could have been due to a network disconnection or something. A few questions:

  1. Did the downloads stop completely? Or were there still files slowly being downloaded?
  2. Did you confirm you had more than 277 files to download? The code should have stopped running after all files are done anyway.
  3. Have you tried re-running the code? If you are familiar with Python, you might be able to skip the files that are already successfully downloaded.
  4. Can you confirm the files that were downloaded look as expected? Also, specifically, does the 277th file (the final one that downloaded) look correct?
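
As a sketch of point 3, a re-run could skip files that are already on disk. The example assumes each file was saved under the `name` from its list entry and that `size` is the expected size in bytes; `pending_entries` is a made-up helper, not part of any provided script.

```python
import os
import tempfile
from typing import Iterable, List


def pending_entries(entries: Iterable[dict], dest_dir: str) -> List[dict]:
    """Return the file entries that still need to be downloaded."""
    todo = []
    for entry in entries:
        target = os.path.join(dest_dir, entry["name"])
        # Count a file as done only if it exists with the size the API reported.
        if os.path.isfile(target) and os.path.getsize(target) == entry["size"]:
            continue
        todo.append(entry)
    return todo


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "done.txt"), "wb") as handle:
            handle.write(b"hello world!")  # 12 bytes
        listing = [
            {"name": "done.txt", "size": 12},
            {"name": "missing.txt", "size": 5},
        ]
        print([e["name"] for e in pending_entries(listing, tmp)])
```

A size check also catches partially written files from an interrupted run; comparing md5sum would be stricter but slower.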

Thanks for the answer.

The downloads didn’t stop, but the code kept running with no new files created for about 3 hours. I stopped the Python script and reran it, and it works again. I guess I just need to rerun the code to fix this.


@ryank I have another question. Will the content behind my API request URL change over time?

For example, I have a URL that points to a file: /api/data/v2/data/2022/12/26/SAFEGRAPH/WP/20221226-safegraph_wp_cpgp_part29_0

If I download this URL using the python API file you provided yesterday vs today, will the content in the file change?

Best