Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-39320: [C++][FS][Azure] Add managed identity auth configuration #39321

Merged
Merged
18 changes: 18 additions & 0 deletions cpp/src/arrow/filesystem/azurefs.cc
Original file line number Diff line number Diff line change
Expand Up @@ -117,27 +117,42 @@ Status AzureOptions::ConfigureClientSecretCredential(const std::string& account_
const std::string& tenant_id,
const std::string& client_id,
const std::string& client_secret) {
account_name_ = account_name;
credential_kind_ = CredentialKind::kTokenCredential;
token_credential_ = std::make_shared<Azure::Identity::ClientSecretCredential>(
tenant_id, client_id, client_secret);
return Status::OK();
}

Status AzureOptions::ConfigureDefaultCredential(const std::string& account_name) {
account_name_ = account_name;
credential_kind_ = CredentialKind::kTokenCredential;
token_credential_ = std::make_shared<Azure::Identity::DefaultAzureCredential>();
return Status::OK();
}

Status AzureOptions::ConfigureManagedIdentityCredential(const std::string& account_name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that account_name isn't used. Is it needed?

Copy link
Contributor Author

@Tom-Newton Tom-Newton Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Sorry, I have not been paying enough attention 🤦‍♂️ . I think @felipecrv refactored this slightly. I need to check how the account name is configured now. I'll do it tomorrow.

Copy link
Member

@kou kou Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.
It seems that both of ConfigureWorkloadIdentityCredential() and ConfigureDefaultCredential() are the same situation too. Could you check them too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed now. I have added a check so that filesystem initialisation fails if account_name_.empty().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I messed up and forgot to set account_name_ on the Configure... functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should undo something I did and move account_name_ back to a public account_name field (the first in AzureOptions, but remove the account_name parameter from all the Configure... functions. That allows us to avoid the repetition of the account_name parameter and is more in line to how every credential constructor doesn't really need an account_name.

@Tom-Newton what you think and can you do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good to me 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need to make account_name_ public though? Personally I would just make account_name a required argument of the AzureOptions constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a good idea in normal classes, but these *Options classes tend to be designed like simple C structs that can be constructed from any language and easily serialized. The only exception being the secret stuff that we don't want to leak and lazily initialized fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I think its as you recommended now.

const std::string& client_id) {
account_name_ = account_name;
credential_kind_ = CredentialKind::kTokenCredential;
token_credential_ =
std::make_shared<Azure::Identity::ManagedIdentityCredential>(client_id);
return Status::OK();
}

Status AzureOptions::ConfigureWorkloadIdentityCredential(
const std::string& account_name) {
account_name_ = account_name;
credential_kind_ = CredentialKind::kTokenCredential;
token_credential_ = std::make_shared<Azure::Identity::WorkloadIdentityCredential>();
return Status::OK();
}

Result<std::unique_ptr<Blobs::BlobServiceClient>> AzureOptions::MakeBlobServiceClient()
const {
if (account_name_.empty()) {
return Status::Invalid("AzureOptions doesn't contain a valid account name");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another simplification we can make:

  • Rename kAnonymous to kDefaultCredential
  • On the kDefaultCredential case here, initialize token_credential_ to std::make_shared<Azure::Identity::DefaultAzureCredential>() if it's not initialized already.

In the kTokenCredential case you can add a DCHECK(token_credential_) as documentation that when credential_kind_ is set to kTokenCredential we expect it to be set already (something that is guaranteed by credential_kind_ being private and set only by functions that also initialize token_credential_.

What you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its useful to support both anonymous and default credential as distinct things. Admittedly, its a niche use-case but storage accounts can be public, which is when anonymous is useful https://learn.microsoft.com/en-us/azure/storage/blobs/anonymous-read-access-configure?tabs=portal.

Its a good question though which should be the default. I know adlfs deafults to anonomous rather than default credential. In some respects it might be good to be consistent but also this choice of default has caused issues for me in the past e.g. flyteorg/flyte#3962

Copy link
Contributor

@felipecrv felipecrv Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Makes sense to keep kAnonymous then. But what would you return in AzureOptions::MakeBlobServiceClient() when credential_kind_ is anonymous? Lazy-init token_credential_ with DefaultAzureCredential and create the client with it? can you create the client without any credentials?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked and there is a constructor that doesn't take any credentials. I guess that's what we should call :-)

Copy link
Contributor Author

@Tom-Newton Tom-Newton Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Do you mind if we do that in a separate PR to fix up anonymous and change the default to default credential. This was supposed to be a super simple PR until kou realised all the auths were broken 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem.

switch (credential_kind_) {
case CredentialKind::kAnonymous:
break;
Expand All @@ -153,6 +168,9 @@ Result<std::unique_ptr<Blobs::BlobServiceClient>> AzureOptions::MakeBlobServiceC

Result<std::unique_ptr<DataLake::DataLakeServiceClient>>
AzureOptions::MakeDataLakeServiceClient() const {
if (account_name_.empty()) {
return Status::Invalid("AzureOptions doesn't contain a valid account name");
}
switch (credential_kind_) {
case CredentialKind::kAnonymous:
break;
Expand Down
3 changes: 3 additions & 0 deletions cpp/src/arrow/filesystem/azurefs.h
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,9 @@ struct ARROW_EXPORT AzureOptions {

Status ConfigureDefaultCredential(const std::string& account_name);

Status ConfigureManagedIdentityCredential(const std::string& account_name,
std::string const& clientId = std::string());
Tom-Newton marked this conversation as resolved.
Show resolved Hide resolved

Status ConfigureWorkloadIdentityCredential(const std::string& account_name);

Status ConfigureAccountKeyCredential(const std::string& account_name,
Expand Down
10 changes: 10 additions & 0 deletions cpp/src/arrow/filesystem/azurefs_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -284,6 +284,16 @@ TEST(AzureFileSystem, InitializeFilesystemWithDefaultCredential) {
EXPECT_OK_AND_ASSIGN(auto fs, AzureFileSystem::Make(options));
}

TEST(AzureFileSystem, InitializeFilesystemWithManagedIdentityCredential) {
AzureOptions options;
ARROW_EXPECT_OK(options.ConfigureManagedIdentityCredential("dummy-account-name"));
EXPECT_OK_AND_ASSIGN(auto fs, AzureFileSystem::Make(options));

ARROW_EXPECT_OK(options.ConfigureManagedIdentityCredential("dummy-account-name",
"specific-client-id"));
EXPECT_OK_AND_ASSIGN(fs, AzureFileSystem::Make(options));
}

TEST(AzureFileSystem, InitializeFilesystemWithWorkloadIdentityCredential) {
AzureOptions options;
ARROW_EXPECT_OK(options.ConfigureWorkloadIdentityCredential("dummy-account-name"));
Expand Down
Loading