<!DOCTYPE html>
<html>
<head>
<meta charset='UTF-8'>
<meta name='viewport' content='width=device-width, initial-scale=1.0, user-scalable=no'>
<meta name='apple-touch-fullscreen' content='yes'>
<meta name='apple-mobile-web-app-capable' content='yes'>
<meta name='apple-mobile-web-app-status-bar-style' content='rgba(228,228,228,1.0)'>
<title>File: Kubernetes — Puma master</title>
<link rel='stylesheet' type='text/css' href='../css/y_fonts.css' />
<link rel='stylesheet' type='text/css' href='../css/highlight.github.css' />
<link rel='stylesheet' type='text/css' href='../css/y_style.css' />
<link rel='stylesheet' type='text/css' href='../css/y_list.css' />
<link rel='stylesheet' type='text/css' href='../css/y_color.css' />
<script type='text/javascript'>
var pathId = "kubernetes",
relpath = '';
var t2Info = {
CSEP: '.',
ISEP: '#',
NSEP: '::'
};
</script>
<script type='text/javascript' charset='utf-8' src='../js/highlight.pack.js'></script>
<script type='text/javascript' charset='utf-8' src='../js/y_app.js'></script>
</head>
<body>
<svg id='y_wait' class viewBox='0 0 90 90'></svg>
<div id='settings' class='hidden'></div>
<div id='y_list' class='d h'>
<header id='list_header'></header>
<nav id= 'list_nav' class='y_nav l_nav'>
<ul id='list_items'></ul>
</nav>
</div>
<div id='y_toc' class='f h'>
<header id='toc_header'></header>
<nav id= 'toc_nav' class='y_nav t_nav'>
<ol id='toc_items'></ol>
</nav>
</div>
<div id='y_main' tabindex='-1'>
<header id='y_header'>
<div id='y_menu'>
<a id='home_no_xhr' href='/'>Home</a> »
<a href='.'>Puma master</a> »
<a href='_index.html'>Index</a> »
<span class='title'><a id='t2_doc_top' href='#'>File: Kubernetes ▲</a></span>
</div>
<a id='list_href' href="class_list.html"></a>
<div id='y_measure_em' class='y_measure'></div>
<div id='y_measure_vh' class='y_measure'></div>
<span id='y_measure_50pre' class='y_measure'><code>123456789_123456789_123456789_123456789_123456789_</code></span>
</header>
<div id='content' class='file'>
<h1>Kubernetes</h1>
<h2>Running Puma in Kubernetes</h2>
<p>In general, running <a href="Puma.html" title="Puma (module)"><code>Puma</code></a> in Kubernetes works as-is; no special configuration is needed beyond what you would write anyway to get a new Kubernetes Deployment going. There is one known interaction between the way Kubernetes handles pod termination and the way <a href="Puma.html" title="Puma (module)"><code>Puma</code></a> handles <code>SIGINT</code>, where some requests might be sent to <a href="Puma.html" title="Puma (module)"><code>Puma</code></a> after it has already entered graceful shutdown mode and is no longer accepting requests. This can lead to dropped requests during rolling deploys. A workaround for this is listed at the end of this article.</p>
<h2>Basic setup</h2>
<p>Assuming you already have a running cluster and a Docker image repository, you can run a simple <a href="Puma.html" title="Puma (module)"><code>Puma</code></a> app with the following example Dockerfile and Deployment specification. These are meant as examples only and are deliberately minimal, to the point of skipping many options that are recommended for running in production, such as health checks and environment variable configuration with ConfigMaps. In general, you should check the <a href="https://kubernetes.io/docs/home/">Kubernetes documentation</a> and <a href="https://docs.docker.com/">Docker documentation</a> for a more comprehensive overview of the available options.</p>
<p>A basic Dockerfile example: </p>
<pre class="code ruby"><code class="ruby"><span class='const'>FROM</span> <span class='label'>ruby:</span><span class='float'>2.5</span><span class='op'>-</span><span class='id identifier rubyid_alpine'>alpine</span> <span class='comment'># can be updated to newer ruby versions
</span><span class='const'>RUN</span> <span class='id identifier rubyid_apk'>apk</span> <span class='id identifier rubyid_update'>update</span> <span class='op'>&&</span> <span class='id identifier rubyid_apk'>apk</span> <span class='id identifier rubyid_add'>add</span> <span class='id identifier rubyid_build'>build</span><span class='op'>-</span><span class='id identifier rubyid_base'>base</span> <span class='comment'># and any other packages you need
</span>
<span class='comment'># Only rebuild gem bundle if Gemfile changes
</span><span class='const'>COPY</span> <span class='const'>Gemfile</span> <span class='const'>Gemfile</span>.<span class='id identifier rubyid_lock'>lock</span> .<span class='op'>/</span>
<span class='const'>RUN</span> <span class='id identifier rubyid_bundle'>bundle</span> <span class='id identifier rubyid_install'>install</span>
<span class='comment'># Copy over the rest of the files
</span><span class='const'>COPY</span> . .
<span class='comment'># Open up port and start the service
</span><span class='const'>EXPOSE</span> <span class='int'>9292</span>
<span class='const'>CMD</span> <span class='id identifier rubyid_bundle'>bundle</span> <span class='id identifier rubyid_exec'>exec</span> <span class='id identifier rubyid_rackup'>rackup</span> <span class='op'>-</span><span class='id identifier rubyid_o'>o</span> <span class='float'>0.0</span></code></pre>
<p>A sample <code>deployment.yaml</code>:</p>
<pre class="code ruby"><code class="ruby"><span class='op'>-</span><span class='op'>-</span><span class='op'>-</span>
<span class='id identifier rubyid_apiVersion'>apiVersion</span><span class='op'>:</span> <span class='id identifier rubyid_apps'>apps</span><span class='op'>/</span><span class='id identifier rubyid_v1'>v1</span>
<span class='id identifier rubyid_kind'>kind</span><span class='op'>:</span> <span class='const'>Deployment</span>
<span class='id identifier rubyid_metadata'>metadata</span><span class='op'>:</span>
<span class='id identifier rubyid_name'>name</span><span class='op'>:</span> <span class='id identifier rubyid_my'>my</span><span class='op'>-</span><span class='id identifier rubyid_awesome'>awesome</span><span class='op'>-</span><span class='id identifier rubyid_puma'>puma</span><span class='op'>-</span><span class='id identifier rubyid_app'>app</span>
<span class='id identifier rubyid_spec'>spec</span><span class='op'>:</span>
<span class='id identifier rubyid_selector'>selector</span><span class='op'>:</span>
<span class='id identifier rubyid_matchLabels'>matchLabels</span><span class='op'>:</span>
<span class='id identifier rubyid_app'>app</span><span class='op'>:</span> <span class='id identifier rubyid_my'>my</span><span class='op'>-</span><span class='id identifier rubyid_awesome'>awesome</span><span class='op'>-</span><span class='id identifier rubyid_puma'>puma</span><span class='op'>-</span><span class='id identifier rubyid_app'>app</span>
<span class='id identifier rubyid_template'>template</span><span class='op'>:</span>
<span class='id identifier rubyid_metadata'>metadata</span><span class='op'>:</span>
<span class='id identifier rubyid_labels'>labels</span><span class='op'>:</span>
<span class='id identifier rubyid_app'>app</span><span class='op'>:</span> <span class='id identifier rubyid_my'>my</span><span class='op'>-</span><span class='id identifier rubyid_awesome'>awesome</span><span class='op'>-</span><span class='id identifier rubyid_puma'>puma</span><span class='op'>-</span><span class='id identifier rubyid_app'>app</span>
<span class='id identifier rubyid_service'>service</span><span class='op'>:</span> <span class='id identifier rubyid_my'>my</span><span class='op'>-</span><span class='id identifier rubyid_awesome'>awesome</span><span class='op'>-</span><span class='id identifier rubyid_puma'>puma</span><span class='op'>-</span><span class='id identifier rubyid_app'>app</span>
<span class='id identifier rubyid_spec'>spec</span><span class='op'>:</span>
<span class='id identifier rubyid_containers'>containers</span><span class='op'>:</span>
<span class='op'>-</span> <span class='id identifier rubyid_name'>name</span><span class='op'>:</span> <span class='id identifier rubyid_my'>my</span><span class='op'>-</span><span class='id identifier rubyid_awesome'>awesome</span><span class='op'>-</span><span class='id identifier rubyid_puma'>puma</span><span class='op'>-</span><span class='id identifier rubyid_app'>app</span>
<span class='id identifier rubyid_image'>image</span><span class='op'>:</span> <span class='op'><</span><span class='id identifier rubyid_your'>your</span> <span class='id identifier rubyid_image'>image</span> <span class='id identifier rubyid_here'>here</span><span class='op'>></span>
<span class='id identifier rubyid_ports'>ports</span><span class='op'>:</span>
<span class='op'>-</span> <span class='id identifier rubyid_containerPort'>containerPort</span><span class='op'>:</span> <span class='int'>9292</span></code></pre>
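<p>The Deployment above only creates the pods. To route traffic to them you would normally also define a Service; the original example stops at the Deployment, so the following is only a minimal sketch, with the name, selector, and ports assumed to match the manifest above:</p>
<pre class="code"><code>---
apiVersion: v1
kind: Service
metadata:
  name: my-awesome-puma-app
spec:
  selector:
    app: my-awesome-puma-app
  ports:
  - port: 80          # port exposed inside the cluster
    targetPort: 9292  # the containerPort Puma listens on</code></pre>
<p>Both manifests can then be applied with <code>kubectl apply -f</code>.</p>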
<h2>Graceful shutdown and pod termination</h2>
<p>For some high-throughput systems, it is possible that some HTTP requests will return responses with response codes in the 5XX range during a rolling deploy to a new version. This is caused by <a href="https://cloud.google.com/blog/products/gcp/kubernetes-best-practices-terminating-with-grace">the way that Kubernetes terminates a pod during rolling deploys</a>:</p>
<ol>
<li>The replication controller determines a pod should be shut down.</li>
<li>The Pod is set to the “Terminating” State and removed from the endpoints list of all Services, so that it receives no more requests.</li>
<li>The pod's pre-stop hook gets called. The default for this is to send <code>SIGTERM</code> to the process inside the pod.</li>
<li>The pod has up to <code>terminationGracePeriodSeconds</code> (default: 30 seconds) to gracefully shut down. Puma will do this (after it receives SIGTERM) by closing down the socket that accepts new requests and finishing any requests already running before exiting the Puma process.</li>
<li>If the pod is still running after <code>terminationGracePeriodSeconds</code> has elapsed, the pod receives <code>SIGKILL</code> to make sure the process inside it stops. After that, the container exits and all other Kubernetes objects associated with it are cleaned up.</li>
</ol>
<p>There is a subtle race condition between steps 2 and 3: the replication controller does not synchronously remove the pod from the Services AND THEN call the pre-stop hook of the pod; rather, it asynchronously sends "remove this pod from your endpoints" requests to the Services and then immediately proceeds to invoke the pod's pre-stop hook. If a Service controller (typically something like nginx or haproxy) handles this request "too" late (due to internal lag or network latency between the replication and Service controllers), it is possible that the Service controller will send one or more requests to a Puma process which has already shut down its listening socket. These requests will then fail with 5XX error codes.</p>
<p>The reason Kubernetes works this way, rather than handling step 2 synchronously, is the CAP theorem: in a distributed system there is no way to guarantee that any message will arrive promptly. In particular, waiting for all Service controllers to report back might block for an indefinite time if one of them has already been terminated or if there has been a network partition. A way to work around this is to add a sleep to the pre-stop hook that lasts as long as <code>terminationGracePeriodSeconds</code>. This allows the Puma process to keep serving new requests during the entire grace period, although it will no longer receive new requests once all Service controllers have propagated the removal of the pod from their endpoint lists. Then, after <code>terminationGracePeriodSeconds</code>, the pod receives <code>SIGKILL</code> and closes down. If your process can't simply be killed with <code>SIGKILL</code>, for example because it needs to release locks in different services, you can also sleep for a shorter period (and/or increase <code>terminationGracePeriodSeconds</code>), as long as the time slept is longer than the time your Service controllers take to propagate the pod removal. The downside of this workaround is that every pod will take at least the amount of time slept to shut down, which increases the time required for your rolling deploy.</p>
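<p>As a concrete sketch of this workaround, the pod template from the Deployment above could be extended with a pre-stop sleep. The values below are assumptions for illustration only; the sleep should match (or stay below) your own <code>terminationGracePeriodSeconds</code> and cover the time your Service controllers need to propagate the endpoint removal:</p>
<pre class="code"><code>    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: my-awesome-puma-app
        image: &lt;your image here&gt;
        ports:
        - containerPort: 9292
        lifecycle:
          preStop:
            exec:
              # Keep the pod alive (and Puma accepting requests) while the
              # endpoint removal propagates; SIGTERM is sent after this exits.
              command: ["sleep", "30"]</code></pre>
<p>Note that <code>sleep</code> must be available inside the container image (it is in the alpine image used above).</p>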
<p>More discussions and links to relevant articles can be found in <a href="https://github.com/puma/puma/issues/2343">https://github.com/puma/puma/issues/2343</a>.</p>
<h2>Workers Per Pod, and Other Config Issues</h2>
<p>With containerization, you will have to make a decision about how "big" to make each pod. Should you run 2 pods with 50 workers each? 25 pods, each with 4 workers? 100 pods, with each Puma running in single mode? Each scenario represents the same total amount of capacity (100 Puma processes that can respond to requests), but there are tradeoffs to make.</p>
<ul>
<li>Worker counts should be somewhere between 4 and 32 in most cases. You want more than 4 in order to minimize time spent in request queueing for a free Puma worker, but probably fewer than ~32, because beyond that autoscaling works in increments that are too large, and pods that big may not fit well onto your nodes. In any queueing system, queue time is proportional to 1/n, where n is the number of things pulling from the queue. Each pod will have its own request queue (i.e., the socket backlog). If you have 4 pods with 1 worker each (4 request queues), wait times are, proportionally, about 4 times higher than if you had 1 pod with 4 workers (1 request queue).</li>
<li>Unless you have a very I/O-heavy application (50%+ of time spent waiting on I/O), use the default thread count (5 for MRI). Using a higher number of threads with low I/O wait (&lt;50%) will lead to additional request queueing time (latency!) and additional memory usage.</li>
<li>More processes per pod reduces memory usage per process, because of copy-on-write memory and because the cost of the single master process is "amortized" over more child processes.</li>
<li>Don't run fewer than 4 processes per pod if you can. Low numbers of processes per pod will lead to high request queueing, which means you will have to run more pods.</li>
<li>If multithreaded, allocate 1 CPU per worker. If single-threaded, allocate 0.75 CPUs per worker. Most web applications spend about 25% of their time in I/O - but when you're running multi-threaded, your Puma process will have higher CPU usage and should be able to fully saturate a CPU core.</li>
<li>Most Puma processes will use about ~512MB-1GB per worker, and about 1GB for the master process. However, you probably shouldn't bother with setting memory limits lower than around 2GB per process, because most places you are deploying will have 2GB of RAM per CPU. A sensible memory limit for a Puma configuration of 4 child workers might be something like 8 GB (1 GB for the master, 7 GB for the 4 children). A container spec sketch reflecting these numbers follows this list.</li>
</ul>
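<p>As a sketch of how these numbers might translate into the container spec from the Deployment above: the environment variable names below assume your Puma configuration reads <code>WEB_CONCURRENCY</code> and <code>RAILS_MAX_THREADS</code> (as the default Rails <code>config/puma.rb</code> does); if not, set workers and threads directly in your Puma config file instead. The specific values are illustrations of the guidance above, not recommendations from the original article:</p>
<pre class="code"><code>      containers:
      - name: my-awesome-puma-app
        image: &lt;your image here&gt;
        env:
        # Assumed to be read by your Puma config; adjust to your setup.
        - name: WEB_CONCURRENCY
          value: "4"    # 4 workers per pod
        - name: RAILS_MAX_THREADS
          value: "5"    # default MRI thread count
        resources:
          requests:
            cpu: "4"         # roughly 1 CPU per multithreaded worker
            memory: "8Gi"    # ~1 GB master plus headroom for 4 workers
          limits:
            memory: "8Gi"</code></pre>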
<div id='footer'></div>
</div> <!-- content -->
</div> <!-- y_main -->
</body>
</html>