GCP missing creds slows down significantly #740

Open
2 tasks
dmpetrov opened this issue Dec 24, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@dmpetrov
Member

dmpetrov commented Dec 24, 2024

Description

See quick start.

This command takes 20 seconds (sometimes 30), without any progress bar, if I don't have any GCP creds set up. If I pass anon=True, it works instantly, as it is supposed to.

meta = DataChain.from_json("gs://datachain-demo/dogs-and-cats/*json", object_name="meta")

This affects all users, since we use GCP in most of our examples and, I'm sure, the majority of users do not have GCP creds set up.

What's needed: we need to avoid the delay when GCP creds are not set up, ideally without specifying anon=True, which adds more code to the examples and mental overhead for users.

ToDo:

  • Fix the issue
  • Remove anon=True from Readme and Get Started
@dmpetrov dmpetrov added the bug Something isn't working label Dec 24, 2024
dmpetrov added a commit that referenced this issue Dec 26, 2024
dmpetrov added a commit that referenced this issue Dec 26, 2024
@shcheklein
Member

This is happening in SOMA co-working for me as well. The cause is the same issue as in https://github.com/iterative/studio/issues/10235 . The Google lib uses exponential backoff when trying to reach the metadata service. So when we don't explicitly set credentials to anon, it tries everything it can and thus hits that exponential backoff. We need to wait for the PR that @0x2b3bfa0 made to land. And even after that, users will have to specify an env variable.
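To make the cost of that backoff concrete, here is a small arithmetic sketch. The retry count, base delay, and growth factor below are hypothetical placeholders chosen for illustration, not the actual settings of the Google library; the point is only that a handful of doubling waits adds up to tens of seconds.

```python
def total_backoff_wait(retries: int, base_delay: float = 1.0, factor: float = 2.0) -> float:
    """Total time spent sleeping across `retries` failed attempts.

    The i-th wait is base_delay * factor**i, so the waits grow geometrically.
    All parameter values here are illustrative assumptions.
    """
    return sum(base_delay * factor**i for i in range(retries))


# Five failed attempts with waits of 1, 2, 4, 8 and 16 seconds.
print(total_backoff_wait(5))  # 31.0
```

With these assumed numbers the total lands in the same range as the 20 to 30 seconds reported above, which is consistent with a metadata lookup that keeps retrying on a machine where no metadata service exists.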

I wonder if we can fix fsspec to disable metadata service if we are not on google (GC fsspec has a way to determine that AFAIU) - wdyt @0x2b3bfa0 ?
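The idea of skipping the metadata service when not running on Google could look roughly like the sketch below. It is not how fsspec or gcsfs work today: `should_use_anon`, `default_adc_path`, and `quick_gce_probe` are hypothetical helpers, and the check is deliberately simplified (it only looks at the `GOOGLE_APPLICATION_CREDENTIALS` variable, the usual gcloud application-default credentials file, and one short-timeout request to the metadata address).

```python
import http.client
import os
from pathlib import Path
from typing import Callable, Mapping, Optional


def default_adc_path(env: Mapping[str, str]) -> Path:
    """Usual location of the gcloud application-default credentials file."""
    name = "application_default_credentials.json"
    if env.get("CLOUDSDK_CONFIG"):
        return Path(env["CLOUDSDK_CONFIG"]) / name
    if os.name == "nt":
        return Path(env.get("APPDATA", "")) / "gcloud" / name
    return Path.home() / ".config" / "gcloud" / name


def quick_gce_probe(timeout: float = 0.5) -> bool:
    """Single short-timeout request to the metadata address, no retries."""
    conn = http.client.HTTPConnection("169.254.169.254", timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Metadata-Flavor": "Google"})
        # Other clouds answer on this address too, so check the response header.
        return conn.getresponse().getheader("Metadata-Flavor") == "Google"
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def should_use_anon(
    env: Optional[Mapping[str, str]] = None,
    adc_path: Optional[Path] = None,
    on_gce: Callable[[], bool] = quick_gce_probe,
) -> bool:
    """Return True when no credential source is found (hypothetical helper)."""
    env = os.environ if env is None else env
    if env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        return False  # explicit service-account file configured
    path = default_adc_path(env) if adc_path is None else adc_path
    if path.is_file():
        return False  # gcloud application-default credentials exist
    return not on_gce()  # fall back to anon only when off Google infrastructure
```

Under these assumptions, a caller could pass `anon=should_use_anon()` to the `from_json` call from the description, paying at most one short probe instead of the full backoff. Other credential sources (for example a cached token) are ignored here, so a real fix would need to cover them as well.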

@dmpetrov
Member Author

This is happening in SOMA co-working for me as well

I'm experiencing the same on my home wifi.

@skshetry
Member

Related: fsspec/filesystem_spec#1768

@skshetry skshetry changed the title GCP missing creds slows down siginificantly GCP missing creds slows down significantly Jan 14, 2025
@shcheklein
Member

@skshetry thanks, I put a comment with the details there. It's a good / relevant discussion indeed.

@0x2b3bfa0
Member

0x2b3bfa0 commented Jan 14, 2025

I wonder if we can fix fsspec to disable metadata service if we are not on google (GC fsspec has a way to determine that AFAIU) - wdyt @0x2b3bfa0 ?

I think exactly https://github.com/iterative/studio/issues/10235#issuecomment-2420957866; I've posted a comment at fsspec/filesystem_spec#1768 (comment).

4 participants