Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLOUD-3578 Improve Kubernetes Probes docs #7264

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
130 changes: 130 additions & 0 deletions docs/content/3.middleware/2.guides/9.kubernetes-probes.md
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this document is related to applications AND middleware, it should probably be moved to the Alokai Cloud docs section instead of staying in the middleware section.

Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
# Kubernetes Probes
sethidden marked this conversation as resolved.
Show resolved Hide resolved

Alokai Cloud customers' middleware and frontend apps are deployed in Kubernetes. This document explains the Kubernetes mechanisms used by Alokai Cloud to ensure that customers' applications are starting, running and exiting correctly. Implementing your application so that it works in accordance with those mechanisms ensures better detectability of issues and lower error rates of your deployment.

In order for some of those mechanisms to work, you - as the application developer - need to ensure certain REST endpoints exist in your application and that they respond with the correct status code. The sections below explain how to implement those endpoints.

## Liveness probes

### The purpose of liveness probes

Without any kind of probes set up for your application, the only failure scenario where your app will be considered as not working correctly is if the process exits on startup or any time during the application runtime. For example, if you run `npm start` but the application secrets are missing from the environment, most applications' processes will immediately exit. The operating system could also kill the process due to lack of memory.

Liveness probes can help you recover the application from a broken state when only restart can solve the problem. The application gets probed every few seconds to determine if the server can respond. If a timeout or error response is received instead of a success status code, the app is considered to be in a dead state (e.g. caught in an infinite loop due to an edge-case and developer error) and gets restarted.

This helps your application keep handling traffic despite an issue that causes it to lock, which gives you time to fix the underlying issue without impacting company operations (as much).

### Implementation of liveness probes in your application

In order for your application to be covered by liveness probes, you need to ensure that it responds with a `HTTP 200 OK` status code to a GET request on a `[your app URL]/healthz` endpoint.

In the case of the Alokai middleware - as of version 3.0.0 of the `@vue-storefront/middleware` package, a liveness probe is enabled by default. You can launch the middleware locally and send a GET request to `http://localhost:4000/healthz` endpoint. The response will be a HTTP 200 OK containing the body "`ok`".

In the case of the frontend apps, `/healthz` endpoints are also present in Nuxt and Next templates generated from the Alokai CLI. If you will be deploying your own custom fronted app to Alokai Cloud, and your app is not generated from Alokai CLI, you will need to add the `/healthz` endpoint manually.

<!-- https://github.com/search?q=repo%3Avuestorefront%2Funified-storefronts%20healthz&type=code -->

::danger
The below instructions help you create a simple `/healthz` endpoint that responds with a HTTP 200 OK status code and the text `ok` in the response body. You may be tempted to instead make `/healthz` a real application route in your app, which if queried will respond with the HTML of your actual application. At first glance it may seem more robust, as in theory it tests a larger part of your application stack.
<br>
In reality, requesting a full app route as part of a liveness check - especially during periods of heavy traffic - can lead to new connection being opened that will never be resolved. A full page app can take hundreds of miliseconds to respond and contain a few kilobytes of payload. The simple route described below will respond in a few miliseconds with a two byte body size.
<br>
Do not make `/healthz` or `/readyz` a full app route. Instead keep it as simple as possible, by following the instructions below. Do not consider liveness probes as a fully fledged application smoke test, but as a simple check if the app can serve a basic HTTP request.
::

#### Nuxt

In Nuxt, to create a `/healthz` endpoint, use [server endpoints](https://nuxt.com/docs/getting-started/server#server-endpoints-middleware). Create a <nobr>`[Nuxt fronted app folder]/server/routes/healthz.ts`</nobr> file with the following contents:
```ts[server/routes/healthz.ts]
export default defineEventHandler(() => 'ok');
```


#### Next: App router
If using Next.js with app router, use [route handlers](https://nextjs.org/docs/app/api-reference/file-conventions/route):
::steps
#step-1
Create a `[Next app directory]/app/healthz/route.ts` file

#step-2
Paste the following content inside
```ts[app/healthz/route.ts]
import { NextResponse } from 'next/server';

export function GET() {
return NextResponse.json({ status: 'ok' }, { status: 200 });
}
```
::

#### Next: Pages router
If using Next.js with pages router, use [rewrites](https://nextjs.org/docs/pages/api-reference/next-config-js/rewrites) together with [API routes](https://nextjs.org/docs/pages/building-your-application/routing/api-routes):

::steps
#step-1
Add the following code to your `next.config.js`:
```ts{3-10}[next.config.js]
module.exports = {
// ...
async rewrites() {
return [
{
source: '/healthz',
destination: '/api/healthz',
},
];
},
}
```
This is necessary because by default Next's API routes have an `/api` subpath, but we need it to be <nobr>`/healthz`</nobr> and not <nobr>`/api/healthz`</nobr>.

#step-2
Create a `[Next app directory]/pages/api/healthz.ts` file

#step-3
Paste the following content inside:
```ts[pages/api/healthz.ts]
import { NextApiRequest, NextApiResponse } from 'next';

export default function handler(_req: NextApiRequest, res: NextApiResponse) {
res.status(200).send('ok');
}
```
::

## Readiness probes

### The purpose of readiness probes

Applications in Kubernetes are often hosted in such a way that multiple duplicate instances of an application exist side-by-side simultaneously. Such a duplicate instance is called a replica. Readiness probes allow an application replica to temporarily mark itself as unable to serve requests in a Kubernetes cluster. A *liveness* probe can pass while a *readiness* probe fails - meaning that in general, the application is up, but is still waiting for something to happen so that it can serve requests (e.g. waiting for some secondary, dependent service to become online, like Redis cache).

The `/readyz` endpoints of your application is queried automatically by Alokai Cloud every few seconds to check whether requests should be routed to the queried application replica. One such case - where traffic will should stop directed to a replica - is if an application instance is being killed (if it receives a `SIGTERM` signal). If such signal is received, the replica should stop accepting *new* traffic, but wait a few seconds before shutting down to handle any *old* requests that are still being handled by that particular replica.

You can read more about Kubernetes readiness probes in the [official documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes).

### Built-in middleware readiness probes

As of version 5.0.0 of the `@vue-storefront/middleware` package, you can launch the middleware locally and send a GET request to the `http://localhost:4000/readyz` endpoint. The response will contain either a success message or a list of errors describing why the readiness probe failed.

To add custom readiness probes to the built-in `@vue-storefront/middleware` readiness probe feature, pass them to the `readinessProbes` property when calling `createServer`.

```ts
const customReadinessProbe = async () => {
const dependentServiceRunning = await axios.get('http://someservice:3000/healthz');
if(dependentServiceRunning.status !== 200) {
throw new Error('Service that the middleware depends on is offline. The middleware is temporarily not ready to accept connections.')
}
}
const app = await createServer(config, { readinessProbes: [customReadinessProbe]});
```

In order for custom readiness probes to be implemented correctly, they need to do two things:
1. they must all be async or return a promise (the return value is not checked, it's expected to be void/undefined)
2. they must all throw an exception when you want a readiness probe to fail

### Implementation of readiness probes in your own application

Readiness probes are more difficult to implement than liveness probes. In liveness probes, a simple stateless REST endpoint handler was sufficient. Readiness probes, on the other hand, need to monitor the signals that the application process received. In addition to that, you can write your own readiness conditions, such as checking if an external service that your application depends on is online.

If your application uses Node's `http` module to serve HTTP requests, you can use the [`@godaddy/terminus`](https://www.npmjs.com/package/@godaddy/terminus) NPM package to implement readiness checks.

29 changes: 0 additions & 29 deletions docs/content/3.middleware/2.guides/9.readiness-probes.md

This file was deleted.

Loading