Selenium Trigger schedules two Jobs #4833

maxnitze · 2023-07-31T09:39:53Z

Report

KEDA starts a second job for Selenium tests once the Pod of the first job reports ready. One of the jobs then takes over the test and completes afterwards. The other keeps running indefinitely, doing nothing.

Expected Behavior

When I start a Selenium test, I expect only one job to be started.

Actual Behavior

A second job is started once the first reports ready.

Steps to Reproduce the Problem

Define a ScaledJob with a selenium-grid trigger.
Start a Selenium test using the Selenium Grid.

Logs from KEDA operator

2023-07-30T12:56:19Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of running Jobs": 0}
2023-07-30T12:56:19Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of pending Jobs ": 0}
2023-07-30T12:56:19Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of running Jobs": 0}
2023-07-30T12:56:19Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of pending Jobs ": 0}
2023-07-30T12:56:19Z    INFO    scaleexecutor   Creating jobs   {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Effective number of max jobs": 1}
2023-07-30T12:56:19Z    INFO    scaleexecutor   Creating jobs   {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of jobs": 1}
2023-07-30T12:56:19Z    INFO    scaleexecutor   Created jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of jobs": 1}
2023-07-30T12:56:29Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of running Jobs": 0}
2023-07-30T12:56:29Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of pending Jobs ": 0}
2023-07-30T12:56:29Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of running Jobs": 1}
2023-07-30T12:56:29Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of pending Jobs ": 0}
2023-07-30T12:56:29Z    INFO    scaleexecutor   Creating jobs   {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Effective number of max jobs": 1}
2023-07-30T12:56:29Z    INFO    scaleexecutor   Creating jobs   {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of jobs": 1}
2023-07-30T12:56:29Z    INFO    scaleexecutor   Created jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of jobs": 1}
2023-07-30T12:56:39Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of running Jobs": 2}
2023-07-30T12:56:39Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of pending Jobs ": 0}
2023-07-30T12:56:39Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of running Jobs": 0}
2023-07-30T12:56:39Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of pending Jobs ": 0}
2023-07-30T12:56:49Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of running Jobs": 0}
2023-07-30T12:56:49Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-firefox-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of pending Jobs ": 0}
2023-07-30T12:56:49Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of running Jobs": 2}
2023-07-30T12:56:49Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of pending Jobs ": 0}

Not sure if this is enough, but this is the log I see when starting a job.
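To make the sequence in the log above easier to follow, here is a minimal Python sketch that groups the operator lines by timestamp and ScaledJob. The regular expression and the `summarize` helper are illustrative assumptions based only on the line layout shown in this report, not part of KEDA. The sample input is three lines copied from the second polling cycle.

```python
import json
import re
from collections import defaultdict

# Three lines copied from the operator log in this report (12:56:29 cycle).
SAMPLE_LOG = """\
2023-07-30T12:56:29Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of running Jobs": 1}
2023-07-30T12:56:29Z    INFO    scaleexecutor   Scaling Jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of pending Jobs ": 0}
2023-07-30T12:56:29Z    INFO    scaleexecutor   Created jobs    {"scaledJob.Name": "selenium-chrome-node", "scaledJob.Namespace": "selenium-grid-keda", "Number of jobs": 1}
"""

# Assumed layout: timestamp, level, logger, message, then a JSON object.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+\S+\s+\S+\s+(?P<msg>.*?)\s+(?P<fields>\{.*\})\s*$"
)


def summarize(log_text):
    """Collect running/pending/created counts per (timestamp, ScaledJob)."""
    summary = defaultdict(dict)
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like an operator line
        # Some keys in the log carry a trailing space, so strip them.
        fields = {k.strip(): v for k, v in json.loads(match["fields"]).items()}
        key = (match["ts"], fields["scaledJob.Name"])
        if "Number of running Jobs" in fields:
            summary[key]["running"] = fields["Number of running Jobs"]
        elif "Number of pending Jobs" in fields:
            summary[key]["pending"] = fields["Number of pending Jobs"]
        elif match["msg"] == "Created jobs":
            summary[key]["created"] = fields["Number of jobs"]
    return dict(summary)


for key, counts in summarize(SAMPLE_LOG).items():
    print(key, counts)
```

Run against the full log, this shows one job created at 12:56:19 with nothing running, and another created at 12:56:29 while one job was already running and none were pending.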

KEDA Version

2.11.2

Kubernetes Version

1.23

Platform

Other

Scaler Details

Selenium

Anything else?

It seems to be inconsistent which of the jobs gets the task assigned. Most of the time the second Pod got it. But today I had a case where the image was not yet available on the node of the second job, so it started pulling (which took some time). In the meantime, the Selenium test was executed in the Pod of the first job.


JorTurFer · 2023-07-31T16:03:51Z

Hi
Could you share your ScaledJob?

maxnitze · 2023-07-31T16:11:08Z

It is generated from this template in the docker selenium chart with these values:

selenium-grid:
  ingress:
    enabled: true
    [ ... ]

  hub:
    [ ... ]

  autoscaling:
    enableWithExistingKEDA: true
    scalingType: job

  chromeNode:
    enabled: true
    maxReplicaCount: 16
    extraEnvironmentVariables:
      - name: TZ
        value: Europe/Berlin
      - name: SCREEN_WIDTH
        value: "1920"
      - name: SCREEN_HEIGHT
        value: "1080"

Here's the deployed ScaledJob:

---
# Source: selenium-grid/charts/selenium-grid/templates/chrome-node-scaledjobs.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: selenium-chrome-node
  namespace: selenium-grid
  annotations:
    helm.sh/hook: post-install,post-upgrade
  labels:
    app: selenium-chrome-node
    app.kubernetes.io/name: selenium-chrome-node
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/instance: selenium-grid
    app.kubernetes.io/version: 4.10.0-20230607
    app.kubernetes.io/component: selenium-grid-4.10.0-20230607
    helm.sh/chart: selenium-grid-0.19.0
spec:
  maxReplicaCount: 16
  pollingInterval: 10
  scalingStrategy:
    strategy: accurate
  triggers:
    - type: selenium-grid
      metadata:
        browserName: chrome
        unsafeSsl: "true"
        url: 'http://selenium-hub.selenium-grid:4444/graphql'
  jobTargetRef:
    parallelism: 1
    completions: 1
    backoffLimit: 0
    template:
      metadata:
        labels:
          app: selenium-chrome-node
          app.kubernetes.io/name: selenium-chrome-node
          app.kubernetes.io/managed-by: helm
          app.kubernetes.io/instance: selenium-grid
          app.kubernetes.io/version: 4.10.0-20230607
          app.kubernetes.io/component: selenium-grid-4.10.0-20230607
          helm.sh/chart: selenium-grid-0.19.0
        annotations:
          checksum/event-bus-configmap: 067216946d8fd5d28d5536ce6c29523a20ad868f23c81cacef3edade6508cf01
      spec:
        restartPolicy: Never
        containers:
          - name: selenium-chrome-node
            image: selenium/node-chrome:4.10.0-20230607
            imagePullPolicy: IfNotPresent
            env:
              - name: TZ
                value: Europe/Berlin
              - name: SCREEN_WIDTH
                value: "1920"
              - name: SCREEN_HEIGHT
                value: "1080"
            envFrom:
              - configMapRef:
                  name: selenium-event-bus-config
              - configMapRef:
                  name: selenium-node-config
            ports:
              - containerPort: 5555
                protocol: TCP
            volumeMounts:
              - name: dshm
                mountPath: /dev/shm
            resources:
              limits:
                cpu: "1"
                memory: 1Gi
              requests:
                cpu: "1"
                memory: 1Gi
            
            
        terminationGracePeriodSeconds: 30
        volumes:
          - name: dshm
            emptyDir:
              medium: Memory
              sizeLimit: 1Gi

JorTurFer · 2023-07-31T16:24:54Z

Could you try deploying the chart with the scaling strategy set to default?

accurate does some calculations that could generate 1.xxx, deploying 2 jobs in some cases.

If that doesn't solve the issue, please enable debug logs in the KEDA operator pod.

maxnitze · 2023-07-31T16:35:57Z

Can confirm, this seems to work.

Could you explain what those cases are? Or can I somewhere read up on them? What is the downside of setting the strategy to default instead of accurate?

maxnitze · 2023-07-31T16:41:16Z

I read through the strategy part in here, but I did not quite get what this means, tbh.

Are there any downsides for my use case (scheduling Chrome and Firefox pods for Selenium tests)? If not, would it make sense to create a PR in the Selenium repo to fix the default there?

JorTurFer · 2023-07-31T17:55:24Z

I don't think that you will have any trouble with the change. TBH, IDK why they set accurate. We suggest using accurate only when you know that the job is completed just at the end and not in the meantime. The docs explain how they work (a bit below), but the main difference is how both strategies take the current jobs into account.

JorTurFer · 2023-07-31T17:56:06Z

Accurate uses the pending job count and Default uses the running job count to calculate how to scale them, but in general, I always use default. I don't know if opening a PR to change it in the Selenium repo is worth it, but I'd definitely open an issue asking about this topic. Maybe they have a good reason that I don't see (I don't know more about Selenium than the minimum required for the scaler).

JorTurFer · 2023-07-31T18:11:39Z

As this isn't a KEDA issue, I'm closing it.
Feel free to reopen it if you think that it's something in KEDA.

maxnitze · 2023-08-08T12:02:47Z

Unfortunately, this is not the fix. The Selenium Grid does not include the already running sessions in its queue anymore, so the default strategy does not work for us (see #4865, where I started a discussion about the calculation in the default strategy).

I tried to come up with a custom strategy to get this working, but I don't think it is possible with the config values given.

Could you explain the calculation for a scale in the accurate strategy to me?

maxValue[sic] = min(scaledJob.MaxReplicaCount(), divideWithCeil(queueLength, targetAverageValue))

(I assume the maxValue in the docs should be maxScale)

if (maxScale + runningJobCount) > maxReplicaCount {
	return maxReplicaCount - runningJobCount
}
return maxScale - pendingJobCount

see https://keda.sh/docs/2.11/concepts/scaling-jobs/

Could you elaborate where exactly the issue with the additional job comes from? Why does the scale calculation only include the pendingJobCount in the case where there are enough "free slots" for all sessions? Is that maybe the reason?
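As a rough illustration of the question, here is a minimal Python sketch of the accurate formula quoted above, next to the default formula (maxScale - runningJobCount, as I understand the linked docs). It is not KEDA's actual code. The function names are made up for this example, and the queue length of 1 and target value of 1 are assumptions, since the operator log in this issue shows only the running and pending counts.

```python
import math


def max_scale(queue_length, target_average_value, max_replica_count):
    """min(maxReplicaCount, divideWithCeil(queueLength, targetAverageValue))."""
    return min(max_replica_count, math.ceil(queue_length / target_average_value))


def default_scale(scale, running_job_count):
    """Default strategy: subtract every job that is already running."""
    return scale - running_job_count


def accurate_scale(scale, running_job_count, pending_job_count, max_replica_count):
    """Accurate strategy, following the snippet quoted from the docs."""
    if scale + running_job_count > max_replica_count:
        return max_replica_count - running_job_count
    # Below the cap, only pending jobs are subtracted, not running ones.
    return scale - pending_job_count


# Second polling cycle from the log (12:56:29): 1 running job, 0 pending jobs,
# maxReplicaCount 16. ASSUMPTION: the grid still reports one queued session.
scale = max_scale(queue_length=1, target_average_value=1, max_replica_count=16)

jobs_accurate = accurate_scale(scale, running_job_count=1,
                               pending_job_count=0, max_replica_count=16)
jobs_default = default_scale(scale, running_job_count=1)

print("accurate:", jobs_accurate)
print("default:", jobs_default)
```

Under these assumptions, accurate asks for one more job because the first job already counts as running rather than pending, while default asks for none. That would be consistent with the second "Created jobs" entry in the log, but it depends on the queue still reporting the session at that moment.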

maxnitze · 2023-08-08T13:11:00Z

Hey @JorTurFer ,

unfortunately I cannot re-open this issue. And I'm still not sure how to set the strategy in my case.

JorTurFer · 2023-08-08T13:41:50Z

I have reopened the issue, but I'm on vacation until the 15th. Maybe someone else can help; if not, I'll check it after coming back.

stale · 2023-10-07T15:12:32Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

JorTurFer · 2023-10-07T15:45:55Z

f**k me, I didn't answer it. I couldn't reproduce it. If you enable debug logs, you will see the queue value and the desired jobs. Could you share that info? And sorry, I thought that I had answered :(

maxnitze · 2023-11-04T13:22:33Z

After we tested this in August, we decided to go live with it as-is for now, with the accurate strategy. We regularly checked whether there were additional jobs waiting, but we do not seem to have the issue anymore. Maybe it was just something that happens when there is only a very, very limited number of jobs? I don't know.

I still don't understand the calculation of the scale as much as I would like. But since this seems to be a non-issue when working at scale (at least for us), we did not follow up on this anymore.

Thanks anyways :)

maxnitze added the bug Something isn't working label Jul 31, 2023

keda-automation added this to Roadmap - KEDA Core Jul 31, 2023

github-project-automation bot moved this to To Triage in Roadmap - KEDA Core Jul 31, 2023

JorTurFer closed this as completed Jul 31, 2023

github-project-automation bot moved this from To Triage to Ready To Ship in Roadmap - KEDA Core Jul 31, 2023

JorTurFer moved this from Ready To Ship to Abanoned in Roadmap - KEDA Core Jul 31, 2023

maxnitze mentioned this issue Jul 31, 2023

[🐛 Bug]: Each test-execution starts multiple jobs SeleniumHQ/docker-selenium#1904

Closed

JorTurFer reopened this Aug 8, 2023

github-project-automation bot moved this from Abanoned to Proposed in Roadmap - KEDA Core Aug 8, 2023

stale bot added the stale All issues that are marked as stale due to inactivity label Oct 7, 2023

stale bot removed the stale All issues that are marked as stale due to inactivity label Oct 7, 2023

maxnitze closed this as completed Nov 4, 2023

github-project-automation bot moved this from Proposed to Ready To Ship in Roadmap - KEDA Core Nov 4, 2023
