
Plugins that help to pass credentials for S3 and GCS to remote cluster workers #438

Open
dbalabka opened this issue Oct 4, 2024 · 4 comments
Labels
provider/aws/ec2 Cluster provider for AWS EC2 Instances provider/gcp/vm Cluster provider for GCP Instances question Further information is requested

Comments

@dbalabka
Contributor

dbalabka commented Oct 4, 2024

I didn't find a simple way to pass credentials for remote object stores such as S3 and GCS to remote workers, even though both are widely used to store data frames.
In the scope of this ticket, I propose creating plugins that distribute the required keys to remote workers.

GCP credentials
The path to the GCP credentials file is stored in the GOOGLE_APPLICATION_CREDENTIALS environment variable. The plugin has to copy that file to each worker and set the environment variable to the proper path on the worker.
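A minimal sketch of what such a plugin could look like (the class name and file name are hypothetical, not taken from the PR; Dask worker plugins only need to expose a `setup` method, which runs on every worker):

```python
import os


class GCPCredentialsPlugin:
    """Hypothetical worker plugin: read the local ADC key file on the
    client and recreate it on every worker."""

    def __init__(self, path=None):
        # Read the key file on the client machine; the bytes travel to
        # the workers inside the pickled plugin.
        path = path or os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
        with open(path, "rb") as f:
            self.key_bytes = f.read()

    def setup(self, worker):
        # Write the key into the worker's scratch directory and point
        # GOOGLE_APPLICATION_CREDENTIALS at the copy.
        dest = os.path.join(worker.local_directory, "gcp-key.json")
        with open(dest, "wb") as f:
            f.write(self.key_bytes)
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = dest
```

The plugin would then be registered on the client, e.g. with `client.register_plugin(GCPCredentialsPlugin())` on recent `distributed` versions.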

S3 credentials
As with GCP, the plugin must copy the credential files to each worker.
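A similar sketch for AWS (again with hypothetical names, not taken from the PR): forward the client's access keys and write an INI-style shared credentials file on each worker, then point the SDKs at it via the standard `AWS_SHARED_CREDENTIALS_FILE` environment variable:

```python
import configparser
import os


class S3CredentialsPlugin:
    """Hypothetical worker plugin: forward the client's AWS access keys
    and materialize a shared credentials file on every worker."""

    def __init__(self, access_key=None, secret_key=None):
        self.access_key = access_key or os.environ["AWS_ACCESS_KEY_ID"]
        self.secret_key = secret_key or os.environ["AWS_SECRET_ACCESS_KEY"]

    def setup(self, worker):
        # Write an INI-style credentials file into the worker's scratch
        # directory and tell the AWS SDKs where to find it.
        config = configparser.ConfigParser()
        config["default"] = {
            "aws_access_key_id": self.access_key,
            "aws_secret_access_key": self.secret_key,
        }
        dest = os.path.join(worker.local_directory, "aws-credentials")
        with open(dest, "w") as f:
            config.write(f)
        os.environ["AWS_SHARED_CREDENTIALS_FILE"] = dest
```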

PR: #439

@jacobtomlinson
Member

jacobtomlinson commented Oct 7, 2024

Usually you would create an IAM instance role and profile that can access S3, then configure workers to have this role via the iam_instance_profile keyword argument.

The GCP equivalent is to create a service account that can access GCS and configure that with the service_account kwarg.

This way you don't have to pass credentials around. Is there a reason why you aren't doing it this way?
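For reference, both keyword arguments mentioned above are sketched here (the role, profile, and service account names are placeholders; running this requires real cloud credentials and resources):

```python
from dask_cloudprovider.aws import EC2Cluster
from dask_cloudprovider.gcp import GCPCluster

# AWS: attach an instance profile whose IAM role can access the S3 buckets.
aws_cluster = EC2Cluster(
    iam_instance_profile={"Name": "dask-worker-s3-profile"},  # placeholder name
)

# GCP: run the worker VMs as a service account that can access the GCS buckets.
gcp_cluster = GCPCluster(
    service_account="dask-workers@my-project.iam.gserviceaccount.com",  # placeholder
)
```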

@jacobtomlinson jacobtomlinson added question Further information is requested provider/gcp/vm Cluster provider for GCP Instances provider/aws/ec2 Cluster provider for AWS EC2 Instances labels Oct 7, 2024
@dbalabka
Contributor Author

@jacobtomlinson, sorry for not being active the last few months because of workload and vacation. Thanks for the question.

You are correct that using a proper service account or IAM role/profile is more secure and preferable for production workloads. However, I have a few scenarios where dynamically uploading the key is more convenient.

For local development, the recommended way on GCP is to use Application Default Credentials acquired with the gcloud auth application-default login command. Previously, I provided changes and a detailed description in #429. ADC is associated with the developer's user account, which is preferable to reuse on workers. Otherwise, we have to create a separate service account key for each developer or automate the creation of such keys.

If Dask is deployed on an on-prem Kubernetes cluster during local development, uploading the key is the most convenient way to provide it to workers. Otherwise, we have to keep the keys in Secrets and mount them separately, an approach that is better suited to production workloads.

@jacobtomlinson
Member

I see. So we could create a plugin for the client which grabs those credentials and propagates them to the workers. Do you have any interest in implementing such a plugin?

@dbalabka
Contributor Author

dbalabka commented Mar 1, 2025

@jacobtomlinson, right. I've submitted PR #439. The PR contains the source of two separate plugins, for AWS and GCP, that we are already using. Both provide a very convenient way to push the required credentials to workers: a developer simply adds the plugins, and no further configuration is needed.
