Skip to content
This repository has been archived by the owner on Jul 9, 2023. It is now read-only.

Support AAD RBAC for CosmosDB access #52

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

lilinvictorms
Copy link
Contributor

When using managed identity to access CosmosDB with dataplane SDK, it's currently not supported to do database/container provisioning operations like create/delete.

This PR adds another CosmosClient, called ProvisionClient, to the cosmosdb options. Users can provide both clients to separate their usages:

  • ProvisionClient: this client is created with account key credentials, so it's allowed to do database/container provisioning operations;
  • Client: this is for data operations only, so it can be created with managed identity to avoid credentials refresh requirements (e.g. account key rotation).

The ProvisionClient is typically only used at service bootstrap time, so no need to think about account key rotation issue.

@galvesribeiro
Copy link
Member

galvesribeiro commented Jun 25, 2021

Hello @lilinvictorms thanks for the PR.

If I understood correctly, you added a second Cosmos client just to the db/container provisioning and left another one for the regular operation. I was ok to do that with the current version of the Cosmos client. However, with recent versions it isn't necessary anymore.

What we need to do is:

  1. Upgrade the Cosmos package to <PackageReference Include="Microsoft.Azure.Cosmos" Version="3.19.0" />. It contains support to the TokenCredential type, which allow AAD authentication (i.e. Service Principal and Managed Identity).
  2. Keep a single client, but change the Options and the registration of the CosmosClient to use the TokenCredential if provided, otherwise use the AuthKey.

With that, it should work just fine and we would avoid 2 clients.

To give you an example this is how I initialize the CosmosDb database and container in another storage provider that use it:

internal class EventStoreInitializer : ILifecycleParticipant<ISiloLifecycle>
    {
        private readonly string _name;
        private readonly ILogger _logger;
        private readonly EventStoreOptions _options;
        private readonly CosmosClient _cosmos;

        public EventStoreInitializer(string name, ILoggerFactory loggerFactory, EventStoreOptions options)
        {
            this._name = name;
            this._logger = loggerFactory.CreateLogger<EventStoreInitializer>();
            this._options = options;

            if (this._options.TokenCredential != null)
            {
                this._cosmos = new CosmosClient(this._options.AccountEndpoint, this._options.TokenCredential, new CosmosClientOptions
                {
                    ApplicationName = this._options.ApplicationName ?? EventStoreOptions.APP_NAME,
                    ConnectionMode = ConnectionMode.Direct,
                    SerializerOptions = EventStoreOptions.CosmosSerializerOptions
                });
            }
            else
            {
                this._cosmos = new CosmosClient(this._options.AccountEndpoint, this._options.AuthKey, new CosmosClientOptions
                {
                    ApplicationName = this._options.ApplicationName ?? EventStoreOptions.APP_NAME,
                    ConnectionMode = ConnectionMode.Direct,
                    SerializerOptions = EventStoreOptions.CosmosSerializerOptions
                });
            }
        }

        public void Participate(ISiloLifecycle lifecycle)
        {
            lifecycle.Subscribe(OptionFormattingUtilities.Name<EventStoreOptions>(this._name), this._options.InitStage, Init);
        }

        private async Task Init(CancellationToken ct)
        {
            var stopWatch = Stopwatch.StartNew();

            try
            {
                this._logger.LogInformation((int)EventStorageProviderErrorCode.EventStoreProvider_InitProvider, $"EventStoreGrainStorage initializing: {this._options.ToString()}");
                this._logger.LogInformation((int)EventStorageProviderErrorCode.EventStoreProvider_ParamConnectionString, "EventStoreGrainStorage is using Account: {0}", ConfigUtilities.RedactConnectionStringInfo(this._options.AccountEndpoint));

                await this._cosmos.CreateDatabaseIfNotExistsAsync(this._options.DatabaseName);
                await this._cosmos.GetDatabase(this._options.DatabaseName).DefineContainer(this._options.ContainerName, $"/SId")
                    .WithIndexingPolicy()
                        .WithAutomaticIndexing(true)
                        .WithExcludedPaths().Path("/*").Attach()
                        .Attach()
                .CreateIfNotExistsAsync();

                stopWatch.Stop();
                this._logger.LogInformation((int)EventStorageProviderErrorCode.EventStoreProvider_InitProvider, $"Initializing provider {this._name} of type {this.GetType().Name} in stage {this._options.InitStage} took {stopWatch.ElapsedMilliseconds} Milliseconds.");
            }
            catch (Exception ex)
            {
                stopWatch.Stop();
                this._logger.LogError((int)ErrorCode.Provider_ErrorFromInit, $"Initialization failed for provider {this._name} of type {this.GetType().Name} in stage {this._options.InitStage} in {stopWatch.ElapsedMilliseconds} Milliseconds.", ex);
                throw;
            }
        }
    }

And this is the Options class which supports both the AuthKey and TokenCredentials:

public class EventStoreOptions
    {
        internal const string APP_NAME = "EventStore";
        internal static readonly CosmosSerializationOptions CosmosSerializerOptions = new CosmosSerializationOptions
        {
            IgnoreNullValues = true,
            Indented = false,
            PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase
        };

        /// <summary>
        /// Azure CosmosDb Account endpoint
        /// </summary>
        public string? AccountEndpoint { get; set; }

        /// <summary>
        /// Azure CosmosDb Authentication Key
        /// </summary>
        [RedactConnectionString]
        public string? AuthKey { get; set; }

        /// <summary>
        /// Use AAD to access the storage account
        /// </summary>
        public TokenCredential? TokenCredential { get; set; }

        /// <summary>
        /// Azure CosmosDb database name
        /// </summary>
        public string DatabaseName { get; set; } = DEFAULT_DATABASE_NAME;
        public const string DEFAULT_DATABASE_NAME = "EventStore";

        /// <summary>
        /// Container name where the events are stored
        /// </summary>
        public string ContainerName { get; set; } = DEFAULT_CONTAINER_NAME;
        public const string DEFAULT_CONTAINER_NAME = "Events";

        /// <summary>
        /// Optional Application Name
        /// </summary>
        public string? ApplicationName { get; set; }

        /// <summary>
        /// Stage of silo lifecycle where storage should be initialized. Storage must be initialized prior to use.
        /// </summary>
        public int InitStage { get; set; } = DEFAULT_INIT_STAGE;
        public const int DEFAULT_INIT_STAGE = ServiceLifecycleStage.ApplicationServices;

        /// <summary>
        /// Configure the dictionary which map the Event Type string to a .Net Type.
        /// </summary>
        public Action<Dictionary<string, Type>> ConfigureSerializerTypeMap { get; set; } = default!;
    }

    /// <summary>
    /// Configuration validator for EventStoreOptions
    /// </summary>
    public class EventStoreOptionsValidator : IConfigurationValidator
    {
        private readonly EventStoreOptions _options;
        private readonly string _name;

        /// <summary>
        /// Constructor
        /// </summary>
        /// <param name="options">The option to be validated.</param>
        /// <param name="name">The option name to be validated.</param>
        public EventStoreOptionsValidator(EventStoreOptions options, string name)
        {
            this._options = options;
            this._name = name;
        }

        public void ValidateConfiguration()
        {
            if (string.IsNullOrWhiteSpace(this._options.AccountEndpoint))
            {
                throw new OrleansConfigurationException($"Configuration for EventStoreOptions {this._name} is invalid. {nameof(this._options.AccountEndpoint)} is required.");
            }

            if (string.IsNullOrWhiteSpace(this._options.DatabaseName))
            {
                throw new OrleansConfigurationException($"Configuration for EventStoreOptions {this._name} is invalid. {nameof(this._options.DatabaseName)} is required.");
            }

            if (string.IsNullOrWhiteSpace(this._options.ContainerName))
            {
                throw new OrleansConfigurationException($"Configuration for EventStoreOptions {this._name} is invalid. {nameof(this._options.ContainerName)} is required.");
            }

            if (string.IsNullOrWhiteSpace(this._options.AuthKey) && this._options.TokenCredential == null)
            {
                throw new OrleansConfigurationException($"Configuration for EventStoreOptions {_name} is invalid. Either {nameof(this._options.AuthKey)} or {nameof(this._options.TokenCredential)} is required.");
            }

            if (this._options.ConfigureSerializerTypeMap == null)
            {
                throw new OrleansConfigurationException($"Configuration for EventStoreOptions {_name} is invalid. {nameof(this._options.ConfigureSerializerTypeMap)} is missing.");
            }
        }
    }

@lilinvictorms
Copy link
Contributor Author

Hi @galvesribeiro,

Actually I did use the latest Azure Cosmos SDK version 3.19.0 with the new usage of TokenCredential with AAD identity. However when I use this dataplane SDK with AAD identity, the management operations are blocked. It's also documented by current CosmosDB document:

https://docs.microsoft.com/en-us/azure/cosmos-db/how-to-setup-rbac#permission-model

It explicitly says:

This permission model only covers database operations that let you read and write data. It does not cover any kind of management operations, like creating containers or changing their throughput. This means that you cannot use any Azure Cosmos DB data plane SDK to authenticate management operations with an AAD identity.

I'm curious if you tried management operations in your code, and if it did work for you, any special configuration you had done?

@galvesribeiro
Copy link
Member

Ahhhh, now I see what you meant...

I've commented on the the issue that was tracking the TokenCredential support. For me, it seems that this feature is half-baked. It should have the proper support for it. All other SDKs allow proper management ops.

To be honest, I've never tried in production to create the database while starting the silo since it was always provisioned by some management operation.

Still, I don't think having two CosmosClient with one using the TokenCredential and other using the AuthToken would be good.

My suggestion would be to:

  1. If AuthKey is used, the initialization code would attempt to create collection and database if they don't exist;
  2. If using TokenCredential, the initialization code would check if the Database/Collection exist. If not, it would throw an exception explaining that Azure doesn't support create Database and/or Collection when using TokenCredential, and would ask the user to make sure it was created already before starting the silo.

That way we don't enforce the user to have the AuthKey together with the TokenCredential which beats the purpose nor we have to have two clients.

What do think?

@lilinvictorms
Copy link
Contributor Author

Agree the CosmosDB RBAC solution is not ideal, however we have to live with it until it can be changed in unknown future.

I'd prefer to have this new ProvisionClient as a optional setting for end users: if they set Client as a CosmosClient created with TokenCredential, and it fails/throws exception for DB provisioning, they can be suggested to provide ProvisionClient as well. Otherwise, your suggestion is asking them to create the database/container before hand, which can be hard because the container schema is put inside Orleans.CosmosDB code.

My current usage pattern is like: configure managed identity (MI) with CosmosDB Account reader role, as well as data RBAC role. So I can provide both clients in CosmosDB options:

  • ProvisionClient: use Azure CosmosDB management SDK to get account key and then create CosmosClient
  • Client: use TokenCredential to create CosmosClient

Since ProvisionClient is optional, for users without need of credentials refresh, or they already have DB/container created like your case, they can stick with current usage (aka only set Client in the options).

I'm also wondering if there could be better solution, so any suggestion is welcome!

@lilinvictorms
Copy link
Contributor Author

Another option is to add TokenCredential into options, and change Orleans.CosmosDB code to use Cosmos management SDK for database/container provisioning. But as you commented in the issue, the management SDK is still in preview, so it's probably not ready to add it as dependency for Orleans.CosmosDB pacakge.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants