Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure client upgrade to allow identity options #15287

Merged
merged 51 commits into from
Jan 3, 2024

Conversation

georgew5656
Copy link
Contributor

@georgew5656 georgew5656 commented Oct 31, 2023

Description

Druid does not support various forms of auth that Azure Storage Accounts has added support for, like Service Client auth, Workload Identity Auth (e.g. for AKS) or managed identity Auth. Adding the AzureDefaultCredential chain to the azure clients druid uses allows support for configuring these things. See: https://learn.microsoft.com/en-us/java/api/overview/azure/identity-readme?view=azure-java-stable

In order to support this new identity library, I had to upgrade the druid azure client libraries for blob storage (they were very old and deprecated anyways).

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...

  • Many of the changes from the client upgrade were pretty straightforward (I used this .NET docs a lot: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/storage/Azure.Storage.Blobs/AzureStorageNetMigrationV12.md#class-conversion-reference)
  • Most of these changes are in AzureStorage
    Less straightforward changes
  • The client supports retrying requests by default, so I removed instances of RetryUtils in favor of using the client directly. There are several places where druid uses a different retry configuration for different azure operations, so I had to add AzureClientFactory to support creating additional clients on the fly (because the library doesn't let you override the retry policy on existing clients). I also updated AzureStorage to reuse these per-container overrided clients, the new client is stateless so it is technically fine to just recreate a client each time but i figured it would be better to reuse them as a performance optimization. A couple of the tests were removed because they were testing this retry functionality
  • I had to change the AzureCloudBlobIterator because the new client supports pagination by default (it only fetches additional pages if you request additional items)

Release note

  • Upgrade druid azure client libraries
  • Support AzureDefaultCredential for druid auth to azure services
  • I removed some of the mocks from the tests and added a test implementation (TestPagedResponse). This let me remove the ListBlobItemHolder class that was mostly there for testing.
Key changed/added classes in this PR
  • AzureClientFactory
  • AzureAccountConfig
  • AzureCloudBlobIterator
  • AzureStorage
  • AzureStorageDruidModule

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

I have tested the following scenarios for auth
SAS token auth
Key auth
Workload identity auth (using a k8s serviceAccount and managed identity to mount credentials for druid running on a pod in AKS)
Environment variable auth (using a App registration client id/secret)
Managed identity auth (I technically have not tested this directly on a VM, but I am pretty sure this will work on a VM with a managed identity by setting the managedIdentityClientId property https://github.com/Azure/azure-sdk-for-java/wiki/Azure-Identity-Examples#authenticating-in-azure-with-managed-identity)
Technically the AzureDefaultCredential supports more types of auth but I did not test any of them.

I also tested the following migration scenarios
Cluster running old azure extension using a sas token -> new azure extension using a sas token
Sas token (new client) -> workload credential auth (new client)

For actual functionality I tested all the basic druid functionality (ingesting data from azure, creating and loading segments, managing task logs, killing segments), but I didn't test any of the retry logic that the new clients promise.


public AzureStorage(
Supplier<CloudBlobClient> cloudBlobClient
Supplier<BlobServiceClient> blobServiceClient,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think i can definitely just remove this supplier and lazily call factory.create instead, but i would prefer to make that as a followup change.

@@ -37,19 +36,14 @@ public class AzureTaskLogsConfig
@NotNull
private String prefix = null;

@JsonProperty
@Min(1)
private int maxTries = 3;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't really see this config documented anywhere and its not supported with S3TaskLogs or GoogleTaskLogs so i just removed it

Comment on lines 48 to 62
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-blob</artifactId>
<version>12.24.0</version>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-blob-batch</artifactId>
<version>12.20.0</version>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-common</artifactId>
<version>12.23.0</version>
</dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these dependencies be the same version?

Copy link
Contributor Author

@georgew5656 georgew5656 Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately azure does not sync these versions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://learn.microsoft.com/en-us/azure/developer/java/sdk/get-started-maven#use-the-azure-sdk-for-java-build-tool - have you looked into adding this to see if the azure libraries are added correctly?

Comment on lines 48 to 62
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-blob</artifactId>
<version>12.24.0</version>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-blob-batch</artifactId>
<version>12.20.0</version>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-common</artifactId>
<version>12.23.0</version>
</dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://learn.microsoft.com/en-us/azure/developer/java/sdk/get-started-maven#use-the-azure-sdk-for-java-build-tool - have you looked into adding this to see if the azure libraries are added correctly?

@@ -46,6 +46,12 @@ public class AzureAccountConfig
@JsonProperty
private String sharedAccessStorageToken;

@JsonProperty
private String managedIdentityClientId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add validation to error out if the user sets this without setting useAzureCredentialsChain = true as called out in the docs

* @param retryCount number of retries
* @return BlobServiceClient with a custom retryCount
*/
public BlobServiceClient getRetriableBlobServiceClient(@Nonnull Integer retryCount)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for this factory to return both a retriable and non-retriable client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explained this in a below comment

@Nullable
private BlobRequestOptions getRequestOptionsWithRetry(Integer maxAttempts)
@VisibleForTesting
BlobServiceClient getRetriableBlobServiceClient(@Nonnull Integer maxAttempts)
Copy link
Contributor

@suneet-s suneet-s Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BlobServiceClient getRetriableBlobServiceClient(@Nonnull Integer maxAttempts)
BlobServiceClient getRetriableBlobServiceClient(int maxAttempts)

{
if (maxAttempts == null) {
return null;
// To avoid keeping a ton of clients in memory, if maxAttempts is specified use a client with at least that many retries configured.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little confusing - why would we want max attempts to mean "at least retry attempts".

Also, if this is the desired behavior, we can push this logic into the factory instead of having it in the storage class

Copy link
Contributor Author

@georgew5656 georgew5656 Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is mainly to balance backwards compatibility with performance. the retry logic in the extension is such that some requests don't specify any retries (the default behavior in the previous client was to not retry at all), and some requests specify retries (these can come from different configs). the new azure client doesn't let you override the retry configuration for a client, so in the interest of not either 1. creating a client every time we make a request, 2. keeping a bunch of different clients cached in memory, i added the "at least retry attempts" logic, this way we can keep 2 clients around only

Copy link
Contributor

@suneet-s suneet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@georgew5656 georgew5656 merged commit 8e95cea into apache:master Jan 3, 2024
83 checks passed
@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
pagrawal10 pushed a commit to confluentinc/druid that referenced this pull request Jun 1, 2024
* Include new dependencies

* Mostly implemented

* More azure fixes

* Tests passing

* Unit tests running

* Test running after removing storage exception

* Happy with coverage now

* Add more tests

* fix client factory

* cleanup from testing

* Remove old client

* update docs

* Exclude from spellcheck

* Add licenses

* Fix identity version

* Save work

* Add azure clients

* add licenses

* typos

* Add dependencies

* Exception is not thrown

* Fix intellij check

* Don't need to override

* specify length

* urldecode

* encode path

* Fix checks

* Revert urlencode changes

* Urlencode with azure library

* Update docs/development/extensions-core/azure.md

Co-authored-by: Abhishek Agarwal <[email protected]>

* PR changes

* Update docs/development/extensions-core/azure.md

Co-authored-by: 317brian <[email protected]>

* Deprecate AzureTaskLogsConfig.maxRetries

* Clean up azure retry block

* logic update to reuse clients

* fix comments

* Create container conditionally

* Fix key auth

* Remove container client logic

* Add some more testing

* Update comments

* Add a comment explaining client reuse

* Move logic to factory class

* use bom for dependency management

* fix license versions

---------

Co-authored-by: Abhishek Agarwal <[email protected]>
Co-authored-by: 317brian <[email protected]>
pagrawal10 pushed a commit to confluentinc/druid that referenced this pull request Jun 1, 2024
* Include new dependencies

* Mostly implemented

* More azure fixes

* Tests passing

* Unit tests running

* Test running after removing storage exception

* Happy with coverage now

* Add more tests

* fix client factory

* cleanup from testing

* Remove old client

* update docs

* Exclude from spellcheck

* Add licenses

* Fix identity version

* Save work

* Add azure clients

* add licenses

* typos

* Add dependencies

* Exception is not thrown

* Fix intellij check

* Don't need to override

* specify length

* urldecode

* encode path

* Fix checks

* Revert urlencode changes

* Urlencode with azure library

* Update docs/development/extensions-core/azure.md

Co-authored-by: Abhishek Agarwal <[email protected]>

* PR changes

* Update docs/development/extensions-core/azure.md

Co-authored-by: 317brian <[email protected]>

* Deprecate AzureTaskLogsConfig.maxRetries

* Clean up azure retry block

* logic update to reuse clients

* fix comments

* Create container conditionally

* Fix key auth

* Remove container client logic

* Add some more testing

* Update comments

* Add a comment explaining client reuse

* Move logic to factory class

* use bom for dependency management

* fix license versions

---------

Co-authored-by: Abhishek Agarwal <[email protected]>
Co-authored-by: 317brian <[email protected]>
pagrawal10 pushed a commit to confluentinc/druid that referenced this pull request Jun 3, 2024
* Include new dependencies

* Mostly implemented

* More azure fixes

* Tests passing

* Unit tests running

* Test running after removing storage exception

* Happy with coverage now

* Add more tests

* fix client factory

* cleanup from testing

* Remove old client

* update docs

* Exclude from spellcheck

* Add licenses

* Fix identity version

* Save work

* Add azure clients

* add licenses

* typos

* Add dependencies

* Exception is not thrown

* Fix intellij check

* Don't need to override

* specify length

* urldecode

* encode path

* Fix checks

* Revert urlencode changes

* Urlencode with azure library

* Update docs/development/extensions-core/azure.md

Co-authored-by: Abhishek Agarwal <[email protected]>

* PR changes

* Update docs/development/extensions-core/azure.md

Co-authored-by: 317brian <[email protected]>

* Deprecate AzureTaskLogsConfig.maxRetries

* Clean up azure retry block

* logic update to reuse clients

* fix comments

* Create container conditionally

* Fix key auth

* Remove container client logic

* Add some more testing

* Update comments

* Add a comment explaining client reuse

* Move logic to factory class

* use bom for dependency management

* fix license versions

---------

Co-authored-by: Abhishek Agarwal <[email protected]>
Co-authored-by: 317brian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants