
[KEDA][AzureEventHub] App not scaling down #972

Open
1 of 3 tasks
ttvrdon opened this issue Nov 2, 2023 · 11 comments
Labels
Needs: triage 🔍 Pending a first pass to read, tag, and assign

Comments


ttvrdon commented Nov 2, 2023

This issue is a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Issue description

An Azure Container App is processing data from an Event Hub and is configured as follows:

  • MinReplicas = 0
  • MaxReplicas = 10

and is using a KEDA scale rule of type azure-eventhub with the following settings:

- type: azure-eventhub
  metadata:
    eventHubName: ...
    consumerGroup: ...
    blobContainer: ...
    checkpointStrategy: blobMetadata
    unprocessedEventThreshold: 64
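
For context, in an Azure Container App this custom rule sits inside the app template's scale section roughly as below. This is a sketch, not the poster's actual spec: the rule name, placeholder values, and the auth wiring (secretRef / triggerParameter) are illustrative assumptions.

```yaml
# Sketch of the relevant part of a Container App template (names are placeholders)
properties:
  template:
    scale:
      minReplicas: 0
      maxReplicas: 10
      rules:
        - name: eventhub-scale-rule
          custom:
            type: azure-eventhub
            metadata:
              eventHubName: my-hub
              consumerGroup: my-consumer-group
              blobContainer: checkpoints
              checkpointStrategy: blobMetadata
              unprocessedEventThreshold: "64"
            auth:
              - secretRef: eventhub-connection
                triggerParameter: connection
```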

When there was a high count of unprocessed messages, the app scaled to its defined maximum of 10 replicas.
However, even though the unprocessed count has now been low for a long time, the app does not scale down and stays at 10 replicas.

I wrote a test program that measures the unprocessed count in the Event Hub and ran it roughly every 30 seconds (a similar interval to the scale rule evaluation), with the following results:

10:16:15 - Unprocessed: 2
10:17:00 - Unprocessed: 1
10:17:47 - Unprocessed: 0
10:18:35 - Unprocessed: 0
10:19:21 - Unprocessed: 5
10:20:08 - Unprocessed: 1
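
For reference, KEDA's external metric roughly maps the unprocessed count to a desired replica count as ceil(unprocessed / unprocessedEventThreshold), clamped to the configured bounds. The sketch below is a simplified model of that HPA math, not KEDA's actual code, but it shows that counts like the ones above should keep the app at a single replica (or zero, below the activation threshold):

```python
import math

def desired_replicas(unprocessed, threshold=64, activation=0,
                     min_replicas=0, max_replicas=10):
    """Simplified model of the HPA math KEDA drives:
    ceil(lag / threshold), clamped to the configured replica bounds."""
    if unprocessed <= activation:
        return min_replicas  # at/below the activation threshold: scale to zero
    return max(1, min(max_replicas, math.ceil(unprocessed / threshold)))

for lag in (2, 1, 0, 5, 1):
    print(lag, desired_replicas(lag))
```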

In the KEDA source code I found that the actual metric values are logged: keda/pkg/scalers/azure_eventhub_scaler.go, lines 389 and 396. These logs are not available in Azure Log Analytics. How can verbose logging be enabled for KEDA?

Steps to reproduce

  1. Set up the KEDA azure-eventhub scale rule as described above
  2. Let the app scale to maximum replicas by sending a high rate of Event Hub messages
  3. Stop the excessive ingress, keep it low, and monitor the app - it will not scale back down

Expected behavior
When the unprocessed count drops below the defined threshold, the app should scale back down (gradually, down to the minimum replica count).

Actual behavior
The replica count stays at the maximum.

@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs: triage 🔍 Pending a first pass to read, tag, and assign label Nov 2, 2023
@serpentfabric

can you share the testing code?

ttvrdon (Author) commented Nov 6, 2023

TestingApp.zip

using Azure.Identity;
using Azure.Storage.Blobs.Models;
using Azure.Storage.Blobs;
using Azure.Messaging.EventHubs.Consumer;

var ehNamespace = "<EventHub NamespaceName>";
var ehSharedKeyName = "<KeyName>";
var ehSharedKey = "<KeyValue>";
var ehName = "<EventHubName>";

var consumerGroup = "<ConsumerGroupName>";

var storageAccountUrl = "<StorageAccountUrl>";
var storageContainerName = "<ContainerName>";

// EH Client
var ehConsumerClient = new EventHubConsumerClient(consumerGroup, $"Endpoint=sb://{ehNamespace}.servicebus.windows.net/;SharedAccessKeyName={ehSharedKeyName};SharedAccessKey={ehSharedKey};EntityPath={ehName}");
var partitionIds = await ehConsumerClient.GetPartitionIdsAsync();

// Checkpoint blobs from SA
var checkpointBlobs = await GetCheckpointBlobClients(new Uri(storageAccountUrl), storageContainerName, ehNamespace, ehName, consumerGroup);

// Get Unprocessed events count - each 30sec
while (true)
{
    var checkpoints = await GetCheckpoints(checkpointBlobs);

    long unprocessed = 0;
    foreach (var partitionId in partitionIds)
    {
        var props = await ehConsumerClient.GetPartitionPropertiesAsync(partitionId);
        unprocessed += props.LastEnqueuedSequenceNumber - checkpoints[partitionId].sequencenumber;
    }

    Console.WriteLine($"{DateTime.UtcNow} - Unprocessed: {unprocessed}");

    await Task.Delay(TimeSpan.FromSeconds(30));
}

static async Task<Dictionary<string, (long offset, long sequencenumber)>> GetCheckpoints(IList<(string partitionId, BlobClient blobClient)> checkpointBlobClients)
{
    var checkpoints = new Dictionary<string, (long offset, long sequencenumber)>();

    foreach (var checkpoint in checkpointBlobClients)
    {
        var props = await checkpoint.blobClient.GetPropertiesAsync();

        var offset = long.Parse(props.Value.Metadata["offset"]);
        var sequenceNumber = long.Parse(props.Value.Metadata["sequencenumber"]);

        checkpoints[checkpoint.partitionId] = (offset, sequenceNumber);
    }

    return checkpoints;
}

static async Task<IList<(string partitionId, BlobClient blobClient)>> GetCheckpointBlobClients(Uri storageAccountUrl, string containerName, string ehNamespace, string ehName, string consumerGroup)
{
    var blobServiceClient = new BlobServiceClient(storageAccountUrl, new DefaultAzureCredential());
    var containerClient = blobServiceClient.GetBlobContainerClient(containerName);

    var checkpointBlobs = new List<(string partitionId, BlobClient blobClient)>();

    await foreach (BlobItem blobItem in containerClient.GetBlobsAsync(prefix: $"{ehNamespace}.servicebus.windows.net/{ehName}/{consumerGroup}/checkpoint"))
    {
        var partitionId = blobItem.Name.Substring(blobItem.Name.LastIndexOf('/') + 1);
        var blobClient = containerClient.GetBlobClient(blobItem.Name);

        checkpointBlobs.Add((partitionId, blobClient));
    }

    return checkpointBlobs;
}


joeklin commented Dec 19, 2023

We are seeing the same issue with Redis Streams. The app successfully scaled to 10 replicas but didn't scale down after all messages had been acked.

@serpentfabric

in our case we had screwed up one of the secrets, but without verbose logging we had no idea KEDA was rejecting its inputs and scaling out because of that. so i suspect that'll be your issue too; it's hard to tell what's wrong and why without visibility, unless you carefully inspect every value/secret given to KEDA via ACA.
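
On the visibility point: ACA surfaces the platform side (including KEDA scaler events) in the system log stream, so something like the following az CLI call may show scaler errors. The app and resource group names are placeholders, and whether a given scaler failure appears there can vary.

```shell
# Stream the Container Apps system (platform) log, where KEDA scaler
# errors surface; --name and --resource-group are placeholders.
az containerapp logs show \
  --name my-app \
  --resource-group my-rg \
  --type system \
  --follow
```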


shibayan commented Apr 16, 2024

I am encountering the same issue. I created the same azure-eventhub scale rule and used Dapr to process all Event Hubs messages, and it did not scale down. I suspect the checkpoints are not being shared correctly, since the scale-down only took place once the messages' TTL had passed. (The checkpoint settings should be correct for KEDA / Dapr.)

@patelriki13

Any updates on this?

I am also facing the same issue.


goncalo-oliveira commented Jul 8, 2024

I'm facing a similar issue... running 4 container apps with the KEDA azure-eventhub scale rule, and two of them were always maxed out, even though the number of incoming messages doesn't justify the scaling; even when nothing is coming in, the apps stay scaled at max.

I reviewed the configuration and found that the two apps that seemed fine were actually not properly configured. Once their configuration was corrected, they started suffering from the same issue.

activationUnprocessedEventThreshold: 10
blobContainer: <container_name>
connectionFromEnv: <connection_env>
consumerGroup: <consumer_group>
eventHubNameFromEnv: <hub_name_env>
storageConnectionFromEnv: <storage_connection_env>
unprocessedEventThreshold: 64

Are there any updates on this?

@goncalo-oliveira

Alright... I sorted out my own issue. Looking at the latest version of the scaler (2.14), I found this new parameter (or at least I don't remember seeing it before).

checkpointStrategy - configure the checkpoint behaviour of different Event Hub SDKs. (Values: azureFunction, blobMetadata, goSdk, default: "", Optional)

And a bit further down it says:

When no checkpoint strategy is specified, the Event Hub scaler will use backwards compatibility and able to scale older implementations of C#, Python or Java Event Hub SDKs. (see “Legacy checkpointing”). If this behaviour should be used, blobContainer is also required.

It came as a surprise that the default is legacy checkpointing; honestly, I expected it to be the other way around. Nonetheless, after setting this to blobMetadata to suit my case, the auto-scaler started working.
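
So for anyone landing here, the fixed rule metadata ends up looking roughly like this (same placeholders as in the config earlier in this thread, with checkpointStrategy added):

```yaml
- type: azure-eventhub
  metadata:
    checkpointStrategy: blobMetadata   # the key addition
    activationUnprocessedEventThreshold: 10
    blobContainer: <container_name>
    connectionFromEnv: <connection_env>
    consumerGroup: <consumer_group>
    eventHubNameFromEnv: <hub_name_env>
    storageConnectionFromEnv: <storage_connection_env>
    unprocessedEventThreshold: 64
```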

Hopefully this will help someone in a similar situation.


Nhattd97 commented Nov 7, 2024

This is related to KEDA issue kedacore/keda#6084. The KEDA team fixed it and released the fix in v2.16 (kedacore/keda#6260). Can ACA be upgraded to use this version of KEDA? @tomkerkhove, could you please take a look? Thanks

@tomkerkhove
Member

I don't work on Azure Container Apps so can't help - Sorry.

This usually takes some time though; KEDA releases need to mature before an SLA-based service can be built on top of them.


Nhattd97 commented Nov 8, 2024

Thanks @tomkerkhove for your information.
