-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update eager: make sure client secret can be specified as env var #2720
Conversation
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Signed-off-by: Niels Bantilan <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2720 +/- ##
===========================================
+ Coverage 45.47% 71.82% +26.34%
===========================================
Files 193 193
Lines 19516 19560 +44
Branches 2818 3845 +1027
===========================================
+ Hits 8875 14049 +5174
+ Misses 10208 4832 -5376
- Partials 433 679 +246 ☔ View full report in Codecov by Sentry. |
flytekit/bin/entrypoint.py
Outdated
@@ -177,6 +177,13 @@ def _dispatch_execute( | |||
for k, v in output_file_dict.items(): | |||
utils.write_proto_to_file(v.to_flyte_idl(), os.path.join(ctx.execution_state.engine_dir, k)) | |||
|
|||
# make sure an event loop exists for data persistence step |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this issue comes up not just with eager. Frameworks like Langchain that have async functionality will cause the event loop to end before the flyte task can run the async data persistence operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use a separate event loop just for flytekit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds right, I think someone more well-versed in async can help with that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for this!
can we have
- register the task to remote cluster (sandbox or dogfood gcp) to prove it works?
- add workflow in unit tests.
Signed-off-by: Niels Bantilan <[email protected]>
Tests for this exist here and here. The changes made to the PR (adding the new The code snippet in the PR description should work on flyte sandbox |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you
remote_cls = type(remote) | ||
return remote_cls() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes that the other FlyteRemote
object can be created without a Config
and uses an environment variable. Is this okay for OSS?
Also, should this carry over the default_domain
and default_project
from the original remote obj? Or should the other remote object figure out the domain and project from the execution environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooo yeah it should probably carry over those args.
admittedly this code path is basically: "tell me you only run on serverless without saying you run on serverless" 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically client_secret_env_var
only really applies to serverless (this will fail with Flyte), unless at some point down the road Flyte provides an single API key UX.
Signed-off-by: Niels Bantilan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some questions, but this event loop thing... can you explain why it's needed in entrypoint.py but there doesn't seem to be another one for local execution? Won't both need it?
secret_requests.append(Secret(group=client_secret_group, key=client_secret_key)) | ||
except ValueError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if there's nothing in the client_secret_env_var environment variable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client_secret_env_var
doesn't figure into this try except block, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no yeah i know, i mean, what happens if all three (client_secret_env_var
, client_secret_group
and client_secret_key
) are None?
@@ -396,6 +398,8 @@ def eager( | |||
:param local_entrypoint: If True, the eager workflow will can be executed locally but use the provided | |||
:py:func:`~flytekit.remote.FlyteRemote` object to create task/workflow executions. This is useful for local | |||
testing against a remote Flyte cluster. | |||
:param client_secret_env_var: if specified, binds the client secret to the specified environment variable for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay with this for now, but I think we should see if we can't move this to some higher level in the future. the secret here we're talking about is the secret for hitting admin itself right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be implemented in Secret
but we haven't moved on that for a while: #1726
|
||
if client_secret_env_var is not None: | ||
# this creates a remote client where the env var client secret is sufficient for authentication | ||
os.environ[client_secret_env_var] = client_secret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confused what's happening here. if we're removing the asserts, then that means the group/key might be None. If they're none, won't the secrets_manager.get fail? So then what is the client_secret? Or is there some default in the secrets manager that somehow gets returned?
I don't get why we're setting the environment variable here to the value of client secret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is to make it easy to authenticate with UnionRemote
on serverless (via UNION_SERVERLESS_API_KEY
env var). Secrets on serverless only needs a secret key (no group).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wondering, in the following scenario:
client_secret_key
isNone
client_secret_group
isNone
(and this scenario is now valid because the asserts are now removed)
after line 626client_secret = secrets_manager.get(client_secret_group, client_secret_key)
what is the value ofclient_secret
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going through the secretsmanager, it looks like it raises an IndexError if you call get(group=None, key=None)
, so I don't think it ever gets past line 626. Could we move the assertion that at least one of those has be be non-None into the get
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a few more comments and went over this with @eapolinario. Could we
- Add an assert to secretsmanager.get that at least one of group/key should be non-None
- Could we move the async loop stuff to another PR? I think it'd be helpful to discuss it separately. Also do you remember how you triggered the issue? I wanna repro locally to understand more what langchain is doing. Ideally we handle in such a way that it's completely independent of any other library.
Signed-off-by: Niels Bantilan <[email protected]>
Updated PR with client secret key/group assertion.
Deleted this. For some reason I'm unable to repro the async issue I was seeing earlier on eager or with langchain async document loaders https://github.com/unionai-oss/union-rag/blob/main/union_rag/langchain.py#L138-L141 |
Why are the changes needed?
Currently, the
@eager
workflows are too strict with its handling of the client secret values. Both secret group and key need to be specified for the secret request to be passed down to the underlying task. There is also an issue with the async event loop not being available for the downstream data persistence step in the task execution.What changes were proposed in this pull request?
This PR loosens this requirement to be compatible with Flyte backends that do not require secret groups. This PR also:
FlyteRemote
object can authenticate with just the client secretHow was this patch tested?
The following workflow was tested on flyte sandbox