Allow users to pass a client_request_token for wr.athena.read_sql_query #2473

mmalahe · 2023-09-26T11:59:33Z

Is your idea related to a problem? Please describe.
I'm running a batch computing job in an account that runs thousands of such jobs a day. Each job reads in data via an Athena query, does some processing, and then writes out JSON outputs. For an arbitrary subset of these jobs, an identical query may already have been run, but we're performing different operations on the data it returns and producing different results.

In order to save on costs and time, I'd like to ensure that queries don't get rerun for those duplicate cases. I've looked into the athena_cache_settings option for read_sql_query, but the mechanism of that caching is missing some desirable properties:

I'd like there to be no limit on the lookback distance. The queries we're reusing may be tens or hundreds of thousands of executions back.
I'd prefer not to make many calls to the Athena API to do the linear search for a matching query.

Describe the solution you'd like
I think the easiest way to meet these requirements is to use the ClientRequestToken parameter available in StartQueryExecution (https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html). Callers of read_sql_query would then have the option to pass their own token through. If the token matches that of an earlier request, they simply get served the existing results instead of triggering a new query.

Example usage:

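import hashlib
import awswrangler as wr

def get_query_hash(query):
    # Hex digest of the query text, so identical queries map to the same hash
    return hashlib.sha1(query.encode("utf-8")).hexdigest()

query = "select * from table limit 10"
# A readable prefix plus the hash keeps the token deterministic per query
client_request_token = "select_limit_10" + get_query_hash(query)
df = wr.athena.read_sql_query(
    sql=query,
    client_request_token=client_request_token,
    # ... remaining read_sql_query arguments
)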
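
For context, here is a minimal sketch of what the equivalent request could look like at the StartQueryExecution level, assuming a boto3-style Athena client. The helper names `build_start_query_request` and `start_query`, the `token_prefix` argument, and the choice of SHA-256 are illustrative assumptions, not part of awswrangler. The 32 to 128 character bound reflects the length constraint listed for ClientRequestToken in the API reference linked above.

```python
import hashlib


def build_start_query_request(sql, database, output_location, token_prefix=""):
    """Build StartQueryExecution keyword arguments with a deterministic token.

    Hypothetical helper: the same SQL text and prefix always yield the same
    ClientRequestToken, so a repeated request can be treated as idempotent.
    """
    digest = hashlib.sha256(sql.encode("utf-8")).hexdigest()  # 64 hex characters
    token = f"{token_prefix}{digest}"
    # Assumed constraint from the API reference: 32 to 128 characters.
    if not 32 <= len(token) <= 128:
        raise ValueError(f"token length {len(token)} is outside 32..128")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "ClientRequestToken": token,
    }


def start_query(client, request):
    """Submit the request through any object exposing start_query_execution,
    such as a boto3 Athena client, and return the query execution ID."""
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


request = build_start_query_request(
    sql="select * from table limit 10",
    database="default",                   # placeholder database name
    output_location="s3://my-bucket/results/",  # placeholder S3 location
    token_prefix="select_limit_10_",
)
```

With a real client this would be called as `start_query(boto3.client("athena"), request)`. Deriving the token from the query text matters because, per the API reference, reusing a token while changing a parameter such as QueryString returns an error rather than the earlier results.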


mmalahe · 2023-09-28T07:06:42Z

Thanks for the quick turnaround on this!

mmalahe added the enhancement (New feature or request) label Sep 26, 2023

kukushking self-assigned this Sep 26, 2023

kukushking mentioned this issue Sep 26, 2023

feat: Athena - add client_request_token #2474

Merged

jaidisido closed this as completed Sep 27, 2023
