Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

infra.ci/release.ci jobs are failing with pod not ready #4

Closed
halkeye opened this issue Jan 6, 2022 · 7 comments
Closed

infra.ci/release.ci jobs are failing with pod not ready #4

halkeye opened this issue Jan 6, 2022 · 7 comments
Assignees
Labels
bug Something isn't working kubernetes

Comments

@halkeye
Copy link
Member

halkeye commented Jan 6, 2022

Service

  • infra.ci.jenkins.io
  • release.ci.jenkins.io

Summary

Since the 1.31.x line of the kubernetes-plugin, our controllers using AKS for pod agents have their build failing 100% with the following error:

io.fabric8.kubernetes.client.KubernetesClientException: not ready after 5000 MILLISECONDS
	at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:176)
	at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.exec(PodOperationsImpl.java:322)
	at io.fabric8.kubernetes.client.dsl.internal.core.v1.PodOperationsImpl.exec(PodOperationsImpl.java:84)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.doLaunch(ContainerExecDecorator.java:427)
	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecDecorator$1.launch(ContainerExecDecorator.java:344)
	at hudson.Launcher$ProcStarter.start(Launcher.java:507)
	at org.jenkinsci.plugins.durabletask.PowershellScript.doLaunch(PowershellScript.java:176)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launchWithCookie(FileMonitoringTask.java:139)
	at org.jenkinsci.plugins.durabletask.FileMonitoringTask.launch(FileMonitoringTask.java:132)
	at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution.start(DurableTaskStep.java:324)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:319)
	at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:193)
	at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:122)
	at jdk.internal.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1213)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
	at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:42)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:163)
	at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
	at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:158)
	at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:161)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:165)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
	at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:135)
	at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
	at WorkflowScript.run(WorkflowScript:273)
	at ___cps.transform___(Native Method)
	at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:86)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:113)
	at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixArg(FunctionCallBlock.java:83)
	at jdk.internal.reflect.GeneratedMethodAccessor183.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
	at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
	at com.cloudbees.groovy.cps.Next.step(Next.java:83)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:174)
	at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:163)
	at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:129)
	at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:268)
	at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:163)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
	at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
	at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:185)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:402)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:96)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:314)
	at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:278)
	at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
	at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
	at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

There is NO error if the pod definition has only a single container named jnlp because there is no kubectl exec call (the pipeline steps are executed in the container named jnlp through the agent.jar process, and not by any call to the Kubernetes API.

This issue is already reported by some JIRA issues:

Reproduction steps

@halkeye halkeye added the triage Incoming issues that need review label Jan 6, 2022
@dduportal dduportal self-assigned this Jan 6, 2022
halkeye added a commit to halkeye/jenkins.io that referenced this issue Jan 6, 2022
halkeye added a commit to jenkins-infra/plugin-site that referenced this issue Jan 6, 2022
@dduportal dduportal added bug Something isn't working and removed triage Incoming issues that need review labels Jan 6, 2022
@dduportal dduportal changed the title infra.ci jobs are failing with pod not ready infra.ci/release.ci jobs are failing with pod not ready Jan 11, 2022
@dduportal
Copy link
Contributor

https://support.cloudbees.com/hc/en-us/articles/360054642231-Considerations-for-Kubernetes-Clients-Connections-when-using-Kubernetes-Plugin describes the 4 different leverages that we can use for now:

  • The 2 leverages related to increasing timeout + allowing more parallel connections to Kubernetes API have already been exhausted: 10000 allowed connections + 5 min timeout (which is HUGE) are not solving the issue and not changing anything
  • "merging" sh/container/other" pipelien steps to leverage the amount connections to Kubernetes API are not working: even with a freshly restarted controller with
  • Short term for infra and release: we take the "all in one container" road: each of our usual containers are going to switch their base image to the jenkinsci/inbound-agent:jdk11 one (or jenkinsci/inbound-agent:alpine-jdk11).

@dduportal
Copy link
Contributor

@dduportal
Copy link
Contributor

@dduportal
Copy link
Contributor

  • Note: Windows package image for the release to do

@dduportal
Copy link
Contributor

dduportal added a commit to dduportal/release that referenced this issue Jan 12, 2022
…jnlp container pod for windows package

Signed-off-by: Damien Duportal <[email protected]>
@dduportal
Copy link
Contributor

Closing at it seems that we covered all images.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working kubernetes
Projects
None yet
Development

No branches or pull requests

2 participants