Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update file_handlers.py get_requester_pays #153

Merged
merged 1 commit into from
Sep 27, 2024
Merged

Conversation

cchu70
Copy link
Contributor

@cchu70 cchu70 commented Sep 27, 2024

Problem

get_requester_pays() is incorrectly returning False for buckets that are requester-pays. I copied this code block from canine into my notebook for testing:

def gcloud_storage_client():
global STORAGE_CLIENT
with storage_client_creation_lock:
if STORAGE_CLIENT is None:
# this is the expensive operation
STORAGE_CLIENT = google.cloud.storage.Client()
return STORAGE_CLIENT
class GSFileNotExists(Exception):
pass
class HandleGSURL(FileType):
localization_mode = "url"
def get_requester_pays(self) -> bool:
"""
Returns True if the requested gs:// object or bucket resides in a
requester pays bucket
"""
bucket = re.match(r"gs://(.*?)/.*", self.path)[1]
gcs_cl = gcloud_storage_client()
bucket_obj = google.cloud.storage.Bucket(gcs_cl, bucket, user_project = self.extra_args.get("project"))
return bucket_obj.requester_pays

Here we see bucket._properties is empty, which is why this returns false.
image

Solutions

Google's API suggests a different approach: https://cloud.google.com/storage/docs/using-requester-pays#storage-get-requester-pays-status-python

bucket.reload() is used in google's own API above: https://github.com/googleapis/python-storage/blob/e3cfc4786209c77e3c879c9ff2978f4884a0d677/google/cloud/storage/client.py#L870-L877

When I add bucket.reload(), we get the expected behavior (with the _properties attribute filled):
image

Alternatively, we use Google's recommended approach directly:
image

get_requester_pays() is incorrectly returning False for buckets that are requester-pays. 

Google's API suggests slightly different approach: https://cloud.google.com/storage/docs/using-requester-pays#storage-get-requester-pays-status-python

bucket.reload() used in google's own API above: https://github.com/googleapis/python-storage/blob/e3cfc4786209c77e3c879c9ff2978f4884a0d677/google/cloud/storage/client.py#L870-L877
@cchu70
Copy link
Contributor Author

cchu70 commented Sep 27, 2024

This appears to be the case in the latest version of canine (https://github.com/getzlab/canine/tree/v0.15.7), which is used in the latest version of wolf
https://github.com/getzlab/wolF/blob/1e39dc1a8a3a14ae2cfb2c7f9b9302413751f132/setup.py#L23

@julianhess julianhess merged commit 005ba86 into master Sep 27, 2024
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants