-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-39320: [C++][FS][Azure] Add managed identity auth configuration #39321
GH-39320: [C++][FS][Azure] Add managed identity auth configuration #39321
Conversation
Looks like merging workload identity caused merge conflicts. I'll fix it tomorrow. |
e71b386
to
bdb0a00
Compare
cpp/src/arrow/filesystem/azurefs.cc
Outdated
@@ -119,6 +119,14 @@ Status AzureOptions::ConfigureDefaultCredential(const std::string& account_name) | |||
return Status::OK(); | |||
} | |||
|
|||
Status AzureOptions::ConfigureManagedIdentityCredential(const std::string& account_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that account_name
isn't used. Is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Sorry, I have not been paying enough attention 🤦♂️ . I think @felipecrv refactored this slightly. I need to check how the account name is configured now. I'll do it tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
It seems that both of ConfigureWorkloadIdentityCredential()
and ConfigureDefaultCredential()
are the same situation too. Could you check them too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed now. I have added a check so that filesystem initialisation fails if account_name_.empty()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I messed up and forgot to set account_name_
on the Configure...
functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should undo something I did and move account_name_
back to a public account_name
field (the first in AzureOptions
, but remove the account_name
parameter from all the Configure...
functions. That allows us to avoid the repetition of the account_name
parameter and is more in line to how every credential constructor doesn't really need an account_name
.
@Tom-Newton what you think and can you do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need to make account_name_
public though? Personally I would just make account_name
a required argument of the AzureOptions constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a good idea in normal classes, but these *Options
classes tend to be designed like simple C structs that can be constructed from any language and easily serialized. The only exception being the secret stuff that we don't want to leak and lazily initialized fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think its as you recommended now.
bdb0a00
to
d4af840
Compare
cpp/src/arrow/filesystem/azurefs.cc
Outdated
@@ -119,6 +119,14 @@ Status AzureOptions::ConfigureDefaultCredential(const std::string& account_name) | |||
return Status::OK(); | |||
} | |||
|
|||
Status AzureOptions::ConfigureManagedIdentityCredential(const std::string& account_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. I messed up and forgot to set account_name_
on the Configure...
functions.
cpp/src/arrow/filesystem/azurefs.cc
Outdated
@@ -119,6 +119,14 @@ Status AzureOptions::ConfigureDefaultCredential(const std::string& account_name) | |||
return Status::OK(); | |||
} | |||
|
|||
Status AzureOptions::ConfigureManagedIdentityCredential(const std::string& account_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should undo something I did and move account_name_
back to a public account_name
field (the first in AzureOptions
, but remove the account_name
parameter from all the Configure...
functions. That allows us to avoid the repetition of the account_name
parameter and is more in line to how every credential constructor doesn't really need an account_name
.
@Tom-Newton what you think and can you do this?
credential_kind_ = CredentialKind::kTokenCredential; | ||
token_credential_ = std::make_shared<Azure::Identity::WorkloadIdentityCredential>(); | ||
return Status::OK(); | ||
} | ||
|
||
Result<std::unique_ptr<Blobs::BlobServiceClient>> AzureOptions::MakeBlobServiceClient() | ||
const { | ||
if (account_name_.empty()) { | ||
return Status::Invalid("AzureOptions doesn't contain a valid account name"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another simplification we can make:
- Rename
kAnonymous
tokDefaultCredential
- On the
kDefaultCredential
case here, initializetoken_credential_
tostd::make_shared<Azure::Identity::DefaultAzureCredential>()
if it's not initialized already.
In the kTokenCredential
case you can add a DCHECK(token_credential_)
as documentation that when credential_kind_
is set to kTokenCredential
we expect it to be set already (something that is guaranteed by credential_kind_
being private and set only by functions that also initialize token_credential_
.
What you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its useful to support both anonymous and default credential as distinct things. Admittedly, its a niche use-case but storage accounts can be public, which is when anonymous is useful https://learn.microsoft.com/en-us/azure/storage/blobs/anonymous-read-access-configure?tabs=portal.
Its a good question though which should be the default. I know adlfs
deafults to anonomous rather than default credential. In some respects it might be good to be consistent but also this choice of default has caused issues for me in the past e.g. flyteorg/flyte#3962
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Makes sense to keep kAnonymous
then. But what would you return in AzureOptions::MakeBlobServiceClient()
when credential_kind_
is anonymous? Lazy-init can you create the client without any credentials?token_credential_
with DefaultAzureCredential
and create the client with it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked and there is a constructor that doesn't take any credentials. I guess that's what we should call :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Do you mind if we do that in a separate PR to fix up anonymous and change the default to default credential. This was supposed to be a super simple PR until kou realised all the auths were broken 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem.
the MINGW failure is just |
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit d519544. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
…ion (apache#39321) ### Rationale for this change Workload identity is a useful Azure authentication method. Also I failed to set the account_name correctly for a bunch of auths (I think this got lost in a rebase then I copy pasted the broken code). ### What changes are included in this PR? - Make filesystem initialisation fail if `account_name_.empty()`. This prevents the account name configuration bug we had. Also added a test asserting that filesystem initialization fails in this case. - Remove account name configuration on all auth configs, in favour of setting in separately from the auth configuration. - Implement `AzureOptions::ConfigureManagedIdentityCredential` ### Are these changes tested? Added a simple test initialising a filesystem using `ConfigureManagedIdentityCredential`. This is not the most comprehensive test but its the same as what we agreed on for apache#39263. ### Are there any user-facing changes? Managed identity authentication is now supported. * Closes: apache#39320 Authored-by: Thomas Newton <[email protected]> Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
…ion (apache#39321) ### Rationale for this change Workload identity is a useful Azure authentication method. Also I failed to set the account_name correctly for a bunch of auths (I think this got lost in a rebase then I copy pasted the broken code). ### What changes are included in this PR? - Make filesystem initialisation fail if `account_name_.empty()`. This prevents the account name configuration bug we had. Also added a test asserting that filesystem initialization fails in this case. - Remove account name configuration on all auth configs, in favour of setting in separately from the auth configuration. - Implement `AzureOptions::ConfigureManagedIdentityCredential` ### Are these changes tested? Added a simple test initialising a filesystem using `ConfigureManagedIdentityCredential`. This is not the most comprehensive test but its the same as what we agreed on for apache#39263. ### Are there any user-facing changes? Managed identity authentication is now supported. * Closes: apache#39320 Authored-by: Thomas Newton <[email protected]> Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
Rationale for this change
Workload identity is a useful Azure authentication method. Also I failed to set the account_name correctly for a bunch of auths (I think this got lost in a rebase then I copy pasted the broken code).
What changes are included in this PR?
account_name_.empty()
. This prevents the account name configuration bug we had. Also added a test asserting that filesystem initialization fails in this case.AzureOptions::ConfigureManagedIdentityCredential
Are these changes tested?
Added a simple test initialising a filesystem using
ConfigureManagedIdentityCredential
. This is not the most comprehensive test but its the same as what we agreed on for #39263.Are there any user-facing changes?
Managed identity authentication is now supported.