Hi all, I made a notebook to demonstrate/test the new Python client for the Placekey API. The notebook uses a dataset consisting of ~24,000 lat/lon coordinates of cellular towers. First, the requests are sent in bulk to the Python client using PlacekeyAPI.lookup_placekeys(). The rate limit is hit, an error message is given, and the completed queries are returned. The error message is helpful, but not all the entries are assigned a placekey because of the rate limit. Second, I wrote a simple wrapper function getPlacekeys() to send the requests in batches of 50 with a 1 second sleep between batches. All entries are assigned a placekey, but this approach is quite a bit slower.
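For reference, the wrapper is just a thin batching loop around lookup_placekeys(). Here is a simplified sketch (the actual function in the notebook may differ; it assumes each place is a dict with query_id, latitude, and longitude, and that lookup_placekeys() returns a list of dicts):

```python
import time
from placekey.api import PlacekeyAPI

pk_api = PlacekeyAPI("YOUR_API_KEY")  # placeholder API key

def getPlacekeys(places, batch_size=50, pause=1):
    """Send requests in batches, sleeping between batches to stay under the rate limit."""
    results = []
    for start in range(0, len(places), batch_size):
        batch = places[start:start + batch_size]
        results.extend(pk_api.lookup_placekeys(batch))
        time.sleep(pause)  # simple fixed back-off between batches
    return results

# Example usage with a couple of lat/lon records like the cell tower data
places = [
    {"query_id": "0", "latitude": 45.0, "longitude": -93.0},
    {"query_id": "1", "latitude": 44.9, "longitude": -93.2},
]
placekeys = getPlacekeys(places)
```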
I’m sure the wrapper function could be further optimized, but this demonstrates the tradeoff. Perhaps in the future the Python client could automatically slow down to avoid rate limit issues? Either way, the Python client makes things much more straightforward!
The client is supposed to rate-limit itself, and it will eventually stop retrying. That behavior is desirable in case the API goes down, but it shouldn’t come up in general usage, so maybe the parameters need to be adjusted or there is some other bug.
So what you are seeing is not expected. Our engineering team is looking into it and will provide updates. Your notebook will be a good test to see if we fix it.
@Ryan_Kruse_MN_State We’ve published an updated version of placekey-py that fixes the rate limiting issue. Let us know if you give it a spin or run into any other issues.
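If it helps anyone else following along, upgrading should just be a reinstall (assuming you installed from PyPI, where the package name is placekey):

```
pip install --upgrade placekey
```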
@Russ_Thompson_SafeGraph Awesome. I just tried again, and I’m getting some strange behavior… The rate limit issue is handled and all placekeys are returned, which is great. However, it seems the query_id values returned are not the same as the ones that were sent with the request. For example, after query 100, the next query ID returned is 201 (when it should have been 101). The largest query_id I sent was 23400, but the largest returned by the client/API was 46898.
This makes it more difficult to merge the placekeys back with the original data.
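For context, here is roughly the merge I’m doing (a sketch with made-up values, assuming pandas and that lookup_placekeys() returns a list of dicts with query_id and placekey fields):

```python
import pandas as pd

# Original data, keyed by the query_id values sent in the request
towers = pd.DataFrame({
    "query_id": ["100", "101"],
    "latitude": [45.0, 44.9],
    "longitude": [-93.0, -93.2],
})

# Illustrative response: note the shifted ID, matching the behavior described above
results = [
    {"query_id": "100", "placekey": "@5vg-7gq-tvz"},
    {"query_id": "201", "placekey": "@5vg-82n-kzz"},
]

merged = towers.merge(pd.DataFrame(results), on="query_id", how="left")
# The row with query_id "101" gets no placekey because "101" never comes back
```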
That’s a fun bug. The code interpolates query_id when it’s not provided in the query, but the check for that is currently whether or not the response query_id is numeric. Thanks for catching it.
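Roughly, the issue is something like the following (illustrative sketch only, not the actual library source):

```python
def fill_query_ids(batch_start, responses):
    """Illustrative: how numeric user-supplied IDs could get clobbered."""
    for i, resp in enumerate(responses):
        # Buggy heuristic: a numeric query_id was assumed to be one the client
        # generated itself, so user-supplied numeric IDs were also overwritten.
        if str(resp.get("query_id", "")).isnumeric():
            resp["query_id"] = str(batch_start + i)
    return responses

# A safer approach is to track whether the *request* omitted query_id, rather
# than inferring it from the value that comes back in the response.
```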
@Ryan_Kruse_MN_State We think this should be fixed in v0.0.8, which is now published. When you have a chance, could you try re-running the notebook and confirming that things work correctly? Then we can finally wrap up this project and update the public-facing notebook with the new functionality.
Again, the goal is to show that using the placekey Python package to hit the Placekey API performs as well as or better than the custom function we originally wrote.
@Ryan_Fox_Squire_SafeGraph What dataset would you like in the demo? The notebook currently uses the cell towers, which have only latitude and longitude, so it only returns the “Where” portion of the Placekey, which is probably not ideal for a demo.
I was hoping all I’ll need to do is update the function getPlacekeys(), and maybe change a few other things in Output 55, while keeping everything else the same and just re-running it.
Does that make sense?
I haven’t looked at your notebook in detail, but I’m hoping I can make some surgical changes to the current one and keep everything else the same.
I think this should be ready to go in terms of replacing the other notebook. Feel free to run through it and let me know of any changes you’d like me to make.
Is this a parameter that a user would/should use? I’m wondering if there is any additional advice I should call out about this parameter in the notebook.
Since the API auto-handles rate limiting etc., is there a practical impact to a user tweaking this parameter? Is there a use case where we would recommend a user change batch_size?
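For concreteness, I’m asking about calls like this (assuming batch_size is accepted as a keyword argument by lookup_placekeys()):

```python
from placekey.api import PlacekeyAPI

pk_api = PlacekeyAPI("YOUR_API_KEY")  # placeholder API key

places = [
    {"query_id": "0", "latitude": 45.0, "longitude": -93.0},
    {"query_id": "1", "latitude": 44.9, "longitude": -93.2},
]

# Hypothetical: explicitly shrinking the batch size instead of relying on the default
placekeys = pk_api.lookup_placekeys(places, batch_size=50)
```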