Selenium Trigger schedules two Jobs #4833
Comments
Hi |
It is generated from this template in the docker selenium chart with these values:

selenium-grid:
  ingress:
    enabled: true
    [ ... ]
  hub:
    [ ... ]
  autoscaling:
    enableWithExistingKEDA: true
    scalingType: job
  chromeNode:
    enabled: true
    maxReplicaCount: 16
    extraEnvironmentVariables:
      - name: TZ
        value: Europe/Berlin
      - name: SCREEN_WIDTH
        value: "1920"
      - name: SCREEN_HEIGHT
        value: "1080"

Here's the deployed ScaledJob:

---
# Source: selenium-grid/charts/selenium-grid/templates/chrome-node-scaledjobs.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: selenium-chrome-node
  namespace: selenium-grid
  annotations:
    helm.sh/hook: post-install,post-upgrade
  labels:
    app: selenium-chrome-node
    app.kubernetes.io/name: selenium-chrome-node
    app.kubernetes.io/managed-by: helm
    app.kubernetes.io/instance: selenium-grid
    app.kubernetes.io/version: 4.10.0-20230607
    app.kubernetes.io/component: selenium-grid-4.10.0-20230607
    helm.sh/chart: selenium-grid-0.19.0
spec:
  maxReplicaCount: 16
  pollingInterval: 10
  scalingStrategy:
    strategy: accurate
  triggers:
    - type: selenium-grid
      metadata:
        browserName: chrome
        unsafeSsl: "true"
        url: 'http://selenium-hub.selenium-grid:4444/graphql'
  jobTargetRef:
    parallelism: 1
    completions: 1
    backoffLimit: 0
    template:
      metadata:
        labels:
          app: selenium-chrome-node
          app.kubernetes.io/name: selenium-chrome-node
          app.kubernetes.io/managed-by: helm
          app.kubernetes.io/instance: selenium-grid
          app.kubernetes.io/version: 4.10.0-20230607
          app.kubernetes.io/component: selenium-grid-4.10.0-20230607
          helm.sh/chart: selenium-grid-0.19.0
        annotations:
          checksum/event-bus-configmap: 067216946d8fd5d28d5536ce6c29523a20ad868f23c81cacef3edade6508cf01
      spec:
        restartPolicy: Never
        containers:
          - name: selenium-chrome-node
            image: selenium/node-chrome:4.10.0-20230607
            imagePullPolicy: IfNotPresent
            env:
              - name: TZ
                value: Europe/Berlin
              - name: SCREEN_WIDTH
                value: "1920"
              - name: SCREEN_HEIGHT
                value: "1080"
            envFrom:
              - configMapRef:
                  name: selenium-event-bus-config
              - configMapRef:
                  name: selenium-node-config
            ports:
              - containerPort: 5555
                protocol: TCP
            volumeMounts:
              - name: dshm
                mountPath: /dev/shm
            resources:
              limits:
                cpu: "1"
                memory: 1Gi
              requests:
                cpu: "1"
                memory: 1Gi
        terminationGracePeriodSeconds: 30
        volumes:
          - name: dshm
            emptyDir:
              medium: Memory
              sizeLimit: 1Gi
|
Could you try deploying the chart with this value set to
If that doesn't solve the issue, please enable debug logs in the KEDA operator pod |
Can confirm, this seems to work. Could you explain what those cases are? Or is there somewhere I can read up on them? What is the downside of setting the strategy to |
I read through the Are there any downsides for my use-case (scheduling Chrome and Firefox pods for Selenium tests)? If not would it make sense to create a PR in the Selenium repo to fix the default there? |
I don't think that you will have any trouble with the change. TBH, IDK why they set |
Accurate uses the pending job count and Default uses the running job count to calculate how to scale them, but in general, I always use |
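To make the difference concrete, here is a minimal Python sketch of the two strategies. The `accurate` formula is the one quoted later in this thread from the KEDA 2.11 scaling-jobs docs; the `default` formula (`maxScale - runningJobCount`) is my reading of the same docs page. The snapshot values are an assumption about what the second poll might see (first job's pod ready, session still reported as queued), not something taken from the operator logs.

```python
def accurate_strategy(max_scale, running, pending, max_replicas):
    """'accurate' formula as quoted in this thread from the KEDA 2.11 docs."""
    if max_scale + running > max_replicas:
        return max_replicas - running
    return max_scale - pending


def default_strategy(max_scale, running):
    """'default' formula as I read the same docs page: subtract running jobs."""
    return max_scale - running


# Assumed snapshot at the second poll: the first job's pod is ready
# (running=1, pending=0) while the grid still reports one queued session.
snapshot = dict(max_scale=1, running=1, pending=0, max_replicas=16)

extra_accurate = accurate_strategy(**snapshot)
extra_default = default_strategy(snapshot["max_scale"], snapshot["running"])
print(extra_accurate, extra_default)  # 1 0
```

Under that assumed snapshot, `accurate` asks for one more job because nothing is pending any longer, while `default` asks for none because the running job is subtracted. That would match the second job appearing once the first pod reports ready, but it is only one possible explanation.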
As this isn't a KEDA issue, I'm closing it. |
Unfortunately this is not the fix. The Selenium Grid does not include the already running sessions in its queue anymore. So the

I tried to come up with a custom strategy to get this working, but I don't think it is possible with the config values given.

Could you explain the calculation for a scale in the

    maxValue[sic] = min(scaledJob.MaxReplicaCount(), divideWithCeil(queueLength, targetAverageValue))

(I assume the

    if (maxScale + runningJobCount) > maxReplicaCount {
        return maxReplicaCount - runningJobCount
    }
    return maxScale - pendingJobCount

see https://keda.sh/docs/2.11/concepts/scaling-jobs/

Could you elaborate where exactly the issue with the additional job comes from? Why does the |
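As a rough illustration of the concern raised here, the sketch below chains the `maxScale` step into both strategies for a queue that only contains new session requests. It is a sketch under assumptions, not KEDA's implementation: `target` stands in for `targetAverageValue` and is assumed to be 1, the `default` formula (`maxScale - runningJobCount`) is my reading of the linked docs page, the clamp at zero is my own addition, and the function names are hypothetical.

```python
import math


def max_scale(queue_length, target_average_value, max_replicas):
    """min(maxReplicaCount, ceil(queueLength / targetAverageValue))."""
    return min(max_replicas, math.ceil(queue_length / target_average_value))


def new_jobs(strategy, queue_length, running, pending, max_replicas=16, target=1):
    """Jobs to create in one polling cycle (target=1 is an assumption)."""
    scale = max_scale(queue_length, target, max_replicas)
    if strategy == "default":
        wanted = scale - running
    elif strategy == "accurate":
        if scale + running > max_replicas:
            wanted = max_replicas - running
        else:
            wanted = scale - pending
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(0, wanted)  # clamping at zero is my assumption


# Three jobs are busy with sessions and two new sessions are queued.
# The queue is assumed NOT to contain the three running sessions.
print(new_jobs("default", queue_length=2, running=3, pending=0))   # 0
print(new_jobs("accurate", queue_length=2, running=3, pending=0))  # 2
```

With a queue that excludes running sessions, `default` subtracts busy jobs from a number that never counted them, so new requests can be left waiting, while `accurate` only subtracts jobs that have not started yet.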
Hey @JorTurFer , unfortunately I cannot re-open this issue. And I'm still not sure how to set the strategy in my case. |
I have reopened the issue, but I'm on vacation until the 15th. Maybe someone else can help; if not, I'll check it when I'm back |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
f**k me, I didn't answer it. I couldn't reproduce it. If you enable debug logs, you will see the queue value and the desired jobs. Could you share that info? And sorry, I thought that I had answered :( |
After we tested this in August we decided to go live with it as-is or now with the

I still don't understand the calculation of the

Thanks anyways :) |
Report
KEDA starts a second job for Selenium tests once the Pod of the first job reports ready. One of the jobs then takes over the test and completes afterwards. The other keeps running indefinitely, doing nothing.
Expected Behavior
When I start a Selenium test, I expect only one job to be started.
Actual Behavior
A second job is started once the first reports ready.
Steps to Reproduce the Problem
ScaledJob with selenium-trigger
Logs from KEDA operator
Not sure if this is enough, but this is the log I see when starting a job.
KEDA Version
2.11.2
Kubernetes Version
1.23
Platform
Other
Scaler Details
Selenium
Anything else?
It seems to be inconsistent which of the jobs gets the task assigned. Most of the time the second pod got it. But today I had a case where the image was not yet available on the node of the second job, so it started pulling (which took some time). In the meantime the Selenium test was executed in the Pod of the first job.