fix: Selenium Grid in case multiple scaler triggers are activate #6437

VietND96 · 2024-12-23T01:26:16Z

Implement strictly the condition to trigger the scaler in complex cases that Selenium Grid includes autoscaling Nodes (with multiple scaledObject with different triggers metadata), non-autoscaling Nodes, relay Nodes, and so on.

In general, update the scaler logic should be aligned with Selenium Grid core in DefaultSlotMatcher - which can be inspected https://github.com/SeleniumHQ/selenium/blob/trunk/java/src/org/openqa/selenium/grid/data/DefaultSlotMatcher.java

It is also recommended to set the browserName, browserVersion, and platformName to align across request capabilities, Node stereotype and scaler trigger params to have correct scaling behavior.

In terms of SlotMatcher in Grid, there is no value latest for the capability browserVersion. So, remove that value latest and keep the default value EMPTY.

For example, client sends the request (in Python binding)

options = ChromeOptions()
options.set_capability('platformName', 'Linux')
driver = webdriver.Remote(options=options, command_executor=SELENIUM_GRID_URL)

It is equivalent to scaler trigger params.

  triggers:
    - type: selenium-grid
      metadata:
        url: 'http://selenium-hub:4444/graphql'
        browserName: 'chrome'
        platformName: 'linux'

For example, client sends the request

options = ChromeOptions()
options.set_capability('platformName', 'Linux')
options.set_capability('browserVersion', '131.0')
driver = webdriver.Remote(options=options, command_executor=SELENIUM_GRID_URL)

The scaler trigger param should be aligned as

  triggers:
    - type: selenium-grid
      metadata:
        url: 'http://selenium-hub:4444/graphql'
        browserName: 'chrome'
        platformName: 'linux'
        browserVersion: '131.0'

Most people are using the popular browser node image from the project docker-selenium. In default Node stereotype, the short browser version is set for browserVersion, it might cause breaking backward compatiblity, or request without browserVersion could not trigger scaler anymore. Via SeleniumHQ/docker-selenium#2520, you can set SE_NODE_BROWSER_VERSION as empty to unset the default version (take newer docker image tag from that project, which contains the change onwards).

This also avoids multiple scalers, for example same browserName combines multiple browserVersion could have issue with overlapping in counting number of ongoing and pending sessions belonging to each scaler trigger.

In terms of Selenium Grid v4, it has a kind of Relay Node, which has capabilities to forward commands to the Appium instance. With this Node type, the browserName is optional. So, updating the trigger param browserName to optional.

Checklist

Tests have been added
Changelog has been updated and is aligned with our changelog requirements
A PR is opened to update the documentation on (repo) (if applicable)
Commits are signed with Developer Certificate of Origin (DCO - learn more)

Fixes #

Relates to #

…sion triggers are active Signed-off-by: Viet Nguyen Duc <[email protected]>

Signed-off-by: Viet Nguyen Duc <[email protected]>

JorTurFer · 2024-12-23T19:24:49Z

/run-e2e selenium
Update: You can check the progress here

JorTurFer · 2024-12-23T20:10:52Z

/run-e2e selenium
Update: You can check the progress here

VietND96 · 2024-12-23T20:11:20Z

Can you try e2e run one more time, I guess pull image could take time or reach pull limit rate...Here is result in my local

EXECUTION SUMMARY
##############################################
##############################################
Passed tests:
        Execution of tests/scalers/selenium/selenium_test.go, has passed after "one" attempts

JorTurFer · 2024-12-23T22:22:09Z

It looks like issue reaching the hub ->

scale_handler	error getting scale decision	***"scaledObject.Namespace": "selenium-test-ns", "scaledObject.Name": "chrome-selenium-test-so", "scaler": "seleniumGridScaler", "error": "error requesting selenium grid endpoint: Post \"[http://selenium-hub.selenium-test-ns:4444/graphql\](http://selenium-hub.selenium-test-ns:4444/graphql/)": dial tcp ***0.0.***5***.***4***:4444: connect: connection refused"***
github.com/***/keda/v***/pkg/scaling.(*scaleHandler).getScalerState
	/workspace/pkg/scaling/scale_handler.go:780
github.com/***/keda/v***/pkg/scaling.(*scaleHandler).getScaledObjectState.func***

I'm checking another issue in my local, once I finish I'll try to check this too

VietND96 · 2024-12-24T11:56:16Z

@JorTurFer, I also done other tests in downstream repos. Let's rerun the e2e job here to see any luck :)

JorTurFer · 2024-12-24T12:10:29Z

/run-e2e selenium
Update: You can check the progress here

JorTurFer · 2024-12-24T12:46:24Z

Now I don't see connection issues trying to reach the hub, but it doesn't pull the values from the hub. I see 6 items in the queue:

But KEDA doesn't scale out the workload and no error is shown either

VietND96 · 2024-12-24T12:59:21Z

Thanks for your info, I am checking it in local again now

VietND96 · 2024-12-24T13:08:31Z

The cluster can be accessed, right? how about when kubectl get deployment, can it list 3 deploy resources, since I updated it to use image selenium/node-chrome:nightly (similar for firefox, edge). Not sure the deployment can be created?

JorTurFer · 2024-12-24T13:10:27Z

The cluster can be accessed, right? how about when kubectl get deployment, can it list 3 deploy resources, since I updated it to use image selenium/node-chrome:nightly (similar for firefox, edge). Not sure the deployment can be created?

let me check

JorTurFer · 2024-12-24T13:19:40Z

Currently, I can see 6 items in the queue:

And I have tried to scale out manually the workloads to 1, and KEDA has scaled in to 0, so the current value calculated is 0. The deployments are correct

JorTurFer · 2024-12-24T13:24:30Z

I've added a print, and this is the content of the Hub response's body (the missing 2 are most probably because when I started the pod manually, they were processed)

{
  "data": {
    "grid": {
      "sessionCount": 0,
      "maxSession": 0,
      "totalSlots": 0
    },
    "nodesInfo": {
      "nodes": [
      ]
    },
    "sessionsInfo": {
      "sessionQueueRequests": [
        "{\n  \"acceptInsecureCerts\": true,\n  \"browserName\": \"firefox\",\n  \"moz:firefoxOptions\": {\n    \"args\": [\n      \"-remote-debugging-port\",\n      \"40627\"\n    ]\n  }\n}",
        "{\n  \"acceptInsecureCerts\": true,\n  \"browserName\": \"chrome\"\n}",
        "{\n  \"acceptInsecureCerts\": true,\n  \"browserName\": \"firefox\",\n  \"moz:firefoxOptions\": {\n    \"args\": [\n      \"-remote-debugging-port\",\n      \"40947\"\n    ]\n  }\n}",
        "{\n  \"acceptInsecureCerts\": true,\n  \"browserName\": \"chrome\"\n}"
      ]
    }
  }
}

JorTurFer · 2024-12-24T13:29:18Z

Now I'm debugging this function:

// This function checks if the request capabilities match the scaler metadata
func checkRequestCapabilitiesMatch(request Capability, browserName string, browserVersion string, _ string, platformName string) bool {
	// Check if browserName matches
	browserNameMatch := request.BrowserName == "" && browserName == "" ||
		strings.EqualFold(browserName, request.BrowserName)

	// Check if browserVersion matches
	browserVersionMatch := (request.BrowserVersion == "" && browserVersion == "") ||
		(request.BrowserVersion == "stable" && browserVersion == "") ||
		(strings.HasPrefix(browserVersion, request.BrowserVersion) && request.BrowserVersion != "" && browserVersion != "")

	// Check if platformName matches
	platformNameMatch := request.PlatformName == "" && platformName == "" ||
		strings.EqualFold(platformName, request.PlatformName)

	return browserNameMatch && browserVersionMatch && platformNameMatch
}

And the platformName doesn't match

	// Check if platformName matches
	platformNameMatch := request.PlatformName == "" && platformName == "" ||
		strings.EqualFold(platformName, request.PlatformName)

So the result of the function is false. The current value for request.PlatformName is "". As I understand, it means that the isn't any platform requirement, so the logic should be something like (and most probably, this is the same for the other 2 checks)

	// Check if platformName matches
	platformNameMatch := request.PlatformName == "" ||
		strings.EqualFold(platformName, request.PlatformName)

WDYT? I've tested that change locally and it works

VietND96 · 2024-12-24T13:33:32Z

Yes, that is reason I also updated in deployment container env

        env:
        - name: SE_NODE_BROWSER_VERSION
          value: ''
        - name: SE_NODE_PLATFORM_NAME
          value: ''

To override the Node stereotypes be aligned with scaler metadata

Signed-off-by: Viet Nguyen Duc <[email protected]>

JorTurFer · 2024-12-24T13:36:21Z

Does it make sense to update the logic to not check the platform/browser/version if the request doesn't specify it? I mean, from my naive pov, if the request doesn't define those values it's because any node can handle that request.

VietND96 · 2024-12-24T13:36:52Z

I changed imagePullPolicy: Always, I guess it might used the cache images without changes yet

JorTurFer · 2024-12-24T13:37:10Z

/run-e2e selenium
Update: You can check the progress here

JorTurFer · 2024-12-24T13:37:22Z

I changed imagePullPolicy: Always, I guess it might used the cache images without changes yet

I'm checking it locally right now

JorTurFer · 2024-12-24T13:39:42Z

The problem that I see is that the nodes aren't starting, so any change applied in nodes isn't taken into account, at least until the node starts. As all the nodes are deployed scaled to 0, any configuration in the node won't be reflected in the hub

JorTurFer · 2024-12-24T13:40:42Z

yeah, locally it still not working, 6 items and no nodes

VietND96 · 2024-12-24T13:46:36Z

Without set platfromName as EMPTY in Node config, it looks like it assumes as Windows by default in Grid...then the scaler scale it down to 0, due to mismatch with metadata

VietND96 · 2024-12-24T13:47:51Z

Give me some time, I will fix the test need to enforce platformName is Linux same as Node stereotype and scaler metadata

JorTurFer · 2024-12-24T13:51:01Z

I've written you by Slack, maybe it'll be faster :)

Signed-off-by: Viet Nguyen Duc <[email protected]>

JorTurFer · 2024-12-24T16:18:47Z

/run-e2e selenium
Update: You can check the progress here

…acore#6437) * fix: Selenium Grid scaler avoids overlapping when multiple browserVersion triggers are active Signed-off-by: Viet Nguyen Duc <[email protected]> * Update CHANGELOG Signed-off-by: Viet Nguyen Duc <[email protected]> * Fix e2e template test Signed-off-by: Viet Nguyen Duc <[email protected]> * Change imagePullPolicy to Always to take latest change Signed-off-by: Viet Nguyen Duc <[email protected]> * Update platformName default value as empty Signed-off-by: Viet Nguyen Duc <[email protected]> --------- Signed-off-by: Viet Nguyen Duc <[email protected]> Signed-off-by: Jorge Turrado <[email protected]>

VietND96 requested a review from a team as a code owner December 23, 2024 01:26

VietND96 force-pushed the main branch 3 times, most recently from 0153063 to 8615239 Compare December 23, 2024 06:07

fix: Selenium Grid scaler avoids overlapping when multiple browserVer…

8898130

…sion triggers are active Signed-off-by: Viet Nguyen Duc <[email protected]>

VietND96 force-pushed the main branch from 8615239 to 8898130 Compare December 23, 2024 10:07

VietND96 mentioned this pull request Dec 23, 2024

Update docs for Selenium Grid scaler kedacore/keda-docs#1515

Open

1 task

VietND96 changed the title ~~fix: Selenium Grid in case multiple browserVersion scalers are active~~ fix: Selenium Grid in case multiple scaler triggers are activate Dec 23, 2024

Update CHANGELOG

9a3535e

Signed-off-by: Viet Nguyen Duc <[email protected]>

VietND96 mentioned this pull request Dec 23, 2024

Release: 2.16.1 #6435

Open

24 tasks

Fix e2e template test

1a775ca

Signed-off-by: Viet Nguyen Duc <[email protected]>

JorTurFer approved these changes Dec 23, 2024

View reviewed changes

VietND96 mentioned this pull request Dec 24, 2024

K8s: Selenium Grid in case multiple scaler triggers are activate SeleniumHQ/docker-selenium#2515

Merged

8 tasks

Merge branch 'main' into main

388d2fe

Change imagePullPolicy to Always to take latest change

81f5ed0

Signed-off-by: Viet Nguyen Duc <[email protected]>

Update platformName default value as empty

b545179

Signed-off-by: Viet Nguyen Duc <[email protected]>

JorTurFer merged commit 5db5a3a into kedacore:main Dec 24, 2024
20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Selenium Grid in case multiple scaler triggers are activate #6437

fix: Selenium Grid in case multiple scaler triggers are activate #6437

VietND96 commented Dec 23, 2024 •

edited

Loading

JorTurFer commented Dec 23, 2024 •

edited by github-actions bot

Loading

JorTurFer commented Dec 23, 2024 •

edited by github-actions bot

Loading

VietND96 commented Dec 23, 2024

JorTurFer commented Dec 23, 2024

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024 •

edited by github-actions bot

Loading

JorTurFer commented Dec 24, 2024

VietND96 commented Dec 24, 2024

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024 •

edited

Loading

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024 •

edited by github-actions bot

Loading

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

VietND96 commented Dec 24, 2024

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024 •

edited by github-actions bot

Loading

fix: Selenium Grid in case multiple scaler triggers are activate #6437

fix: Selenium Grid in case multiple scaler triggers are activate #6437

Conversation

VietND96 commented Dec 23, 2024 • edited Loading

Checklist

JorTurFer commented Dec 23, 2024 • edited by github-actions bot Loading

JorTurFer commented Dec 23, 2024 • edited by github-actions bot Loading

VietND96 commented Dec 23, 2024

JorTurFer commented Dec 23, 2024

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024 • edited by github-actions bot Loading

JorTurFer commented Dec 24, 2024

VietND96 commented Dec 24, 2024

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024 • edited Loading

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024 • edited by github-actions bot Loading

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

VietND96 commented Dec 24, 2024

VietND96 commented Dec 24, 2024

JorTurFer commented Dec 24, 2024

JorTurFer commented Dec 24, 2024 • edited by github-actions bot Loading

VietND96 commented Dec 23, 2024 •

edited

Loading

JorTurFer commented Dec 23, 2024 •

edited by github-actions bot

Loading

JorTurFer commented Dec 23, 2024 •

edited by github-actions bot

Loading

JorTurFer commented Dec 24, 2024 •

edited by github-actions bot

Loading

JorTurFer commented Dec 24, 2024 •

edited

Loading

JorTurFer commented Dec 24, 2024 •

edited by github-actions bot

Loading

JorTurFer commented Dec 24, 2024 •

edited by github-actions bot

Loading