-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLOUD-3578 Improve Kubernetes Probes docs #7264
Open
sethidden
wants to merge
2
commits into
main
Choose a base branch
from
CLOUD-3578-improve-kubernetes-probes-docs
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+130
−29
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
130 changes: 130 additions & 0 deletions
130
docs/content/3.middleware/2.guides/9.kubernetes-probes.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
# Kubernetes Probes | ||
sethidden marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Alokai Cloud customers' middleware and frontend apps are deployed in Kubernetes. This document explains the Kubernetes mechanisms used by Alokai Cloud to ensure that customers' applications are starting, running and exiting correctly. Implementing your application so that it works in accordance with those mechanisms ensures better detectability of issues and lower error rates of your deployment. | ||
|
||
In order for some of those mechanisms to work, you - as the application developer - need to ensure certain REST endpoints exist in your application and that they respond with the correct status code. The sections below explain how to implement those endpoints. | ||
|
||
## Liveness probes | ||
|
||
### The purpose of liveness probes | ||
|
||
Without any kind of probes set up for your application, the only failure scenario where your app will be considered as not working correctly is if the process exits on startup or any time during the application runtime. For example, if you run `npm start` but the application secrets are missing from the environment, most applications' processes will immediately exit. The operating system could also kill the process due to lack of memory. | ||
|
||
Liveness probes can help you recover the application from a broken state when only restart can solve the problem. The application gets probed every few seconds to determine if the server can respond. If a timeout or error response is received instead of a success status code, the app is considered to be in a dead state (e.g. caught in an infinite loop due to an edge-case and developer error) and gets restarted. | ||
|
||
This helps your application keep handling traffic despite an issue that causes it to lock, which gives you time to fix the underlying issue without impacting company operations (as much). | ||
|
||
### Implementation of liveness probes in your application | ||
|
||
In order for your application to be covered by liveness probes, you need to ensure that it responds with a `HTTP 200 OK` status code to a GET request on a `[your app URL]/healthz` endpoint. | ||
|
||
In the case of the Alokai middleware - as of version 3.0.0 of the `@vue-storefront/middleware` package, a liveness probe is enabled by default. You can launch the middleware locally and send a GET request to `http://localhost:4000/healthz` endpoint. The response will be a HTTP 200 OK containing the body "`ok`". | ||
|
||
In the case of the frontend apps, `/healthz` endpoints are also present in Nuxt and Next templates generated from the Alokai CLI. If you will be deploying your own custom fronted app to Alokai Cloud, and your app is not generated from Alokai CLI, you will need to add the `/healthz` endpoint manually. | ||
|
||
<!-- https://github.com/search?q=repo%3Avuestorefront%2Funified-storefronts%20healthz&type=code --> | ||
|
||
::danger | ||
The below instructions help you create a simple `/healthz` endpoint that responds with a HTTP 200 OK status code and the text `ok` in the response body. You may be tempted to instead make `/healthz` a real application route in your app, which if queried will respond with the HTML of your actual application. At first glance it may seem more robust, as in theory it tests a larger part of your application stack. | ||
<br> | ||
In reality, requesting a full app route as part of a liveness check - especially during periods of heavy traffic - can lead to new connection being opened that will never be resolved. A full page app can take hundreds of miliseconds to respond and contain a few kilobytes of payload. The simple route described below will respond in a few miliseconds with a two byte body size. | ||
<br> | ||
Do not make `/healthz` or `/readyz` a full app route. Instead keep it as simple as possible, by following the instructions below. Do not consider liveness probes as a fully fledged application smoke test, but as a simple check if the app can serve a basic HTTP request. | ||
:: | ||
|
||
#### Nuxt | ||
|
||
In Nuxt, to create a `/healthz` endpoint, use [server endpoints](https://nuxt.com/docs/getting-started/server#server-endpoints-middleware). Create a <nobr>`[Nuxt fronted app folder]/server/routes/healthz.ts`</nobr> file with the following contents: | ||
```ts[server/routes/healthz.ts] | ||
export default defineEventHandler(() => 'ok'); | ||
``` | ||
|
||
|
||
#### Next: App router | ||
If using Next.js with app router, use [route handlers](https://nextjs.org/docs/app/api-reference/file-conventions/route): | ||
::steps | ||
#step-1 | ||
Create a `[Next app directory]/app/healthz/route.ts` file | ||
|
||
#step-2 | ||
Paste the following content inside | ||
```ts[app/healthz/route.ts] | ||
import { NextResponse } from 'next/server'; | ||
|
||
export function GET() { | ||
return NextResponse.json({ status: 'ok' }, { status: 200 }); | ||
} | ||
``` | ||
:: | ||
|
||
#### Next: Pages router | ||
If using Next.js with pages router, use [rewrites](https://nextjs.org/docs/pages/api-reference/next-config-js/rewrites) together with [API routes](https://nextjs.org/docs/pages/building-your-application/routing/api-routes): | ||
|
||
::steps | ||
#step-1 | ||
Add the following code to your `next.config.js`: | ||
```ts{3-10}[next.config.js] | ||
module.exports = { | ||
// ... | ||
async rewrites() { | ||
return [ | ||
{ | ||
source: '/healthz', | ||
destination: '/api/healthz', | ||
}, | ||
]; | ||
}, | ||
} | ||
``` | ||
This is necessary because by default Next's API routes have an `/api` subpath, but we need it to be <nobr>`/healthz`</nobr> and not <nobr>`/api/healthz`</nobr>. | ||
|
||
#step-2 | ||
Create a `[Next app directory]/pages/api/healthz.ts` file | ||
|
||
#step-3 | ||
Paste the following content inside: | ||
```ts[pages/api/healthz.ts] | ||
import { NextApiRequest, NextApiResponse } from 'next'; | ||
|
||
export default function handler(_req: NextApiRequest, res: NextApiResponse) { | ||
res.status(200).send('ok'); | ||
} | ||
``` | ||
:: | ||
|
||
## Readiness probes | ||
|
||
### The purpose of readiness probes | ||
|
||
Applications in Kubernetes are often hosted in such a way that multiple duplicate instances of an application exist side-by-side simultaneously. Such a duplicate instance is called a replica. Readiness probes allow an application replica to temporarily mark itself as unable to serve requests in a Kubernetes cluster. A *liveness* probe can pass while a *readiness* probe fails - meaning that in general, the application is up, but is still waiting for something to happen so that it can serve requests (e.g. waiting for some secondary, dependent service to become online, like Redis cache). | ||
|
||
The `/readyz` endpoints of your application is queried automatically by Alokai Cloud every few seconds to check whether requests should be routed to the queried application replica. One such case - where traffic will should stop directed to a replica - is if an application instance is being killed (if it receives a `SIGTERM` signal). If such signal is received, the replica should stop accepting *new* traffic, but wait a few seconds before shutting down to handle any *old* requests that are still being handled by that particular replica. | ||
|
||
You can read more about Kubernetes readiness probes in the [official documentation](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes). | ||
|
||
### Built-in middleware readiness probes | ||
|
||
As of version 5.0.0 of the `@vue-storefront/middleware` package, you can launch the middleware locally and send a GET request to the `http://localhost:4000/readyz` endpoint. The response will contain either a success message or a list of errors describing why the readiness probe failed. | ||
|
||
To add custom readiness probes to the built-in `@vue-storefront/middleware` readiness probe feature, pass them to the `readinessProbes` property when calling `createServer`. | ||
|
||
```ts | ||
const customReadinessProbe = async () => { | ||
const dependentServiceRunning = await axios.get('http://someservice:3000/healthz'); | ||
if(dependentServiceRunning.status !== 200) { | ||
throw new Error('Service that the middleware depends on is offline. The middleware is temporarily not ready to accept connections.') | ||
} | ||
} | ||
const app = await createServer(config, { readinessProbes: [customReadinessProbe]}); | ||
``` | ||
|
||
In order for custom readiness probes to be implemented correctly, they need to do two things: | ||
1. they must all be async or return a promise (the return value is not checked, it's expected to be void/undefined) | ||
2. they must all throw an exception when you want a readiness probe to fail | ||
|
||
### Implementation of readiness probes in your own application | ||
|
||
Readiness probes are more difficult to implement than liveness probes. In liveness probes, a simple stateless REST endpoint handler was sufficient. Readiness probes, on the other hand, need to monitor the signals that the application process received. In addition to that, you can write your own readiness conditions, such as checking if an external service that your application depends on is online. | ||
|
||
If your application uses Node's `http` module to serve HTTP requests, you can use the [`@godaddy/terminus`](https://www.npmjs.com/package/@godaddy/terminus) NPM package to implement readiness checks. | ||
|
This file was deleted.
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this document is related to applications AND middleware, it should probably be moved to the Alokai Cloud docs section instead of staying in the middleware section.