puma/file.deployment.html

<!DOCTYPE html>
<html>
<head>

<meta charset='UTF-8'>
<meta name='viewport' content='width=device-width, initial-scale=1.0, user-scalable=no'>
<meta name='apple-touch-fullscreen'       content='yes'>
<meta name='apple-mobile-web-app-capable' content='yes'>
<meta name='apple-mobile-web-app-status-bar-style' content='rgba(228,228,228,1.0)'>

<title>File: Deployment engineering for Puma &mdash; Puma master</title>

<link rel='stylesheet'  type='text/css' href='../css/y_fonts.css' />
<link rel='stylesheet'  type='text/css' href='../css/highlight.github.css' />
<link rel='stylesheet'  type='text/css' href='../css/y_style.css' />
<link rel='stylesheet'  type='text/css' href='../css/y_list.css' />
<link rel='stylesheet'  type='text/css' href='../css/y_color.css' />

<script type='text/javascript'>
  var pathId = "deployment",
    relpath = '';

  var t2Info = {
    CSEP: '.',
    ISEP: '#',
    NSEP: '::'
  };
</script>

<script type='text/javascript' charset='utf-8' src='../js/highlight.pack.js'></script>
<script type='text/javascript' charset='utf-8' src='../js/y_app.js'></script>

</head>
<body>
<svg id='y_wait' class viewBox='0 0 90 90'></svg>
<div id='settings' class='hidden'></div>
<div id='y_list' class='d h'>
  <header id='list_header'></header>
  <nav id= 'list_nav' class='y_nav l_nav'>
    <ul id='list_items'></ul>
  </nav>
</div>
<div id='y_toc' class='f h'>
  <header id='toc_header'></header>
  <nav id= 'toc_nav' class='y_nav t_nav'>
  <ol id='toc_items'></ol>
  </nav>
</div>
<div id='y_main' tabindex='-1'>
  <header id='y_header'>
    <div id='y_menu'>
      <a id='home_no_xhr' href='/'>Home</a> &raquo; 
      <a href='.'>Puma master</a> &raquo; 
      <a href='_index.html'>Index</a> &raquo; 
      <span class='title'><a id='t2_doc_top' href='#'>File: Deployment engineering for Puma&nbsp;&#x25B2;</a></span>
          </div>

    <a id='list_href' href="class_list.html"></a>
    <div id='y_measure_em' class='y_measure'></div>
    <div id='y_measure_vh' class='y_measure'></div>
    <span id='y_measure_50pre' class='y_measure'><code>123456789_123456789_123456789_123456789_123456789_</code></span>
  </header>
<div id='content' class='file'>
<h1>Deployment engineering for <a href="Puma.html" title="Puma (module)"><code>Puma</code></a></h1>

<p><a href="Puma.html" title="Puma (module)"><code>Puma</code></a> expects to be run in a deployed environment eventually. You can use it as
your development server, but most people use it in their production deployments.</p>

<p>To that end, this document serves as a foundation of wisdom regarding deploying
<a href="Puma.html" title="Puma (module)"><code>Puma</code></a> to production while increasing happiness and decreasing downtime.</p>

<h2>Specifying Puma</h2>

<p>Most people will specify <a href="Puma.html" title="Puma (module)"><code>Puma</code></a> by including <code>gem &quot;puma&quot;</code> in a Gemfile, so we&#39;ll
assume this is how you&#39;re using <a href="Puma.html" title="Puma (module)"><code>Puma</code></a>.</p>

<h2>Single vs. Cluster mode</h2>

<p>Initially, <a href="Puma.html" title="Puma (module)"><code>Puma</code></a> was conceived as a thread-only web server, but support for
processes was added in version 2.</p>

<p>To run <code>puma</code> in single mode (i.e., as a development environment), set the
number of workers to 0; anything higher will run in cluster mode.</p>

<p>Here are some tips for cluster mode:</p>

<h3>MRI</h3>

<ul>
<li>Use cluster mode and set the number of workers to 1.5x the number of CPU cores
in the machine, starting from a minimum of 2.</li>
<li>Set the number of threads to desired concurrent requests/number of workers.
Puma defaults to 5, and that&#39;s a decent number.</li>
</ul>

<h4>Migrating from Unicorn</h4>

<ul>
<li>If you&#39;re migrating from unicorn though, here are some settings to start with:

<ul>
<li>Set workers to half the number of unicorn workers you&#39;re using</li>
<li>Set threads to 2</li>
<li>Enjoy 50% memory savings</li>
</ul></li>
<li>As you grow more confident in the thread-safety of your app, you can tune the
workers down and the threads up.</li>
</ul>

<h4>Ubuntu / Systemd (Systemctl) Installation</h4>

<p>See <a href="file.systemd.html">systemd.md</a></p>

<h4>Worker utilization</h4>

<p><strong>How do you know if you&#39;ve got enough (or too many workers)?</strong></p>

<p>A good question. Due to MRI&#39;s GIL, only one thread can be executing Ruby code at
a time. But since so many apps are waiting on <a href="IO.html" title="IO (class)"><code>IO</code></a> from DBs, etc., they can
utilize threads to use the process more efficiently.</p>

<p>Generally, you never want processes that are pegged all the time. That can mean
there is more work to do than the process can get through. On the other hand, if
you have processes that sit around doing nothing, then they&#39;re just eating up
resources.</p>

<p>Watch your CPU utilization over time and aim for about 70% on average. 70%
utilization means you&#39;ve got capacity still but aren&#39;t starving threads.</p>

<p><strong>Measuring utilization</strong></p>

<p>Using a timestamp header from an upstream proxy server (e.g., <code>nginx</code> or
<code>haproxy</code>) makes it possible to indicate how long requests have been waiting for
a <a href="Puma.html" title="Puma (module)"><code>Puma</code></a> thread to become available.</p>

<ul>
<li>Have your upstream proxy set a header with the time it received the request:

<ul>
<li>nginx: <code>proxy_set_header X-Request-Start &quot;${msec}&quot;;</code></li>
<li>haproxy &gt;= 1.9: <code>http-request set-header X-Request-Start
t=%[date()]%[date_us()]</code></li>
<li>haproxy &lt; 1.9: <code>http-request set-header X-Request-Start t=%[date()]</code></li>
</ul></li>
<li>In your <a href="Rack.html" title="Rack (module)"><code>Rack</code></a> middleware, determine the amount of time elapsed since
<code>X-Request-Start</code>.</li>
<li>To improve accuracy, you will want to subtract time spent waiting for slow
clients:

<ul>
<li><code>env[&#39;puma.request_body_wait&#39;]</code> contains the number of milliseconds Puma
spent waiting for the client to send the request body.</li>
<li>haproxy: <code>%Th</code> (TLS handshake time) and <code>%Ti</code> (idle time before request)
can can also be added as headers.</li>
</ul></li>
</ul>

<h2>Should I daemonize?</h2>

<p>The Puma 5.0 release removed daemonization. For older versions and alternatives,
continue reading.</p>

<p>I prefer not to daemonize my servers and use something like <code>runit</code> or <code>systemd</code>
to monitor them as child processes. This gives them fast response to crashes and
makes it easy to figure out what is going on. Additionally, unlike <code>unicorn</code>,
<a href="Puma.html" title="Puma (module)"><code>Puma</code></a> does not require daemonization to do zero-downtime restarts.</p>

<p>I see people using daemonization because they start puma directly via Capistrano
task and thus want it to live on past the <code>cap deploy</code>. To these people, I say:
You need to be using a process monitor. Nothing is making sure <a href="Puma.html" title="Puma (module)"><code>Puma</code></a> stays up in
this scenario! You&#39;re just waiting for something weird to happen, <a href="Puma.html" title="Puma (module)"><code>Puma</code></a> to die,
and to get paged at 3 AM. Do yourself a favor, at least the process monitoring
your OS comes with, be it <code>sysvinit</code> or <code>systemd</code>. Or branch out and use <code>runit</code>
or hell, even <code>monit</code>.</p>

<h2>Restarting</h2>

<p>You probably will want to deploy some new code at some point, and you&#39;d like
<a href="Puma.html" title="Puma (module)"><code>Puma</code></a> to start running that new code. There are a few options for restarting
<a href="Puma.html" title="Puma (module)"><code>Puma</code></a>, described separately in our <a href="file.restart.html">restart documentation</a>.</p>

<div id='footer'></div>
</div> <!-- content  -->
</div> <!-- y_main   -->
</body>
</html>