(docs) Extend alerting page with flow failure notifications (#409)
* (docs) Extend alerting page with flow failure notifications

* fix image

* Update 02.storage.md

* Update index.md
anna-geller authored Aug 4, 2023
1 parent 1ac605f commit afba2e2
Showing 21 changed files with 209 additions and 85 deletions.
2 changes: 1 addition & 1 deletion content/blogs/2022-02-01-kestra-opensource.md
@@ -14,7 +14,7 @@ Today, our team is proud to announce a first public release of Kestra, an open-s


## What is Kestra?
-Kestra is :
+Kestra is:
- **an orchestrator**: Build a complex pipeline in couple of minutes.
- **a scheduler**: Launch your flows whatever your need!
- **a rich ui**: Create, run, and monitor all your flows with a real-time user interface.
2 changes: 1 addition & 1 deletion content/blogs/2022-04-05-debezium-without-kafka-connect.md
@@ -36,7 +36,7 @@ Moreover, even in the most simplified Debezium deployment, there are typically a

Although Debezium shines as an efficient and performance-oriented solution in real-time Change Data Capture (CDC) use cases, this perpetual operation can become a resource drain. The always-on nature of Debezium's connectors may lead to an overuse of resources, especially in scenarios where changes are not frequent or the data volume is low. Hence, while Debezium's continuous monitoring feature is an asset in certain scenarios, it may be an overkill and resource drain in others.

-For example, from [Amazon MSK connect](https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect-connectors.html) documentation :
+For example, from [Amazon MSK connect](https://docs.aws.amazon.com/msk/latest/developerguide/msk-connect-connectors.html) documentation:
> Each MCU represents 1 vCPU of compute and 4 GiB of memory.
<p align="center">
@@ -156,7 +156,7 @@ At first, we designed Kestra to have only one **huge** stream for all the proces

Here is the last version of our main and only Kafka Stream with many topics 🙉:
![Kestra Topology](/blogs/2023-02-23-techniques-kafka-streams-developer/topology.jpg)
-Yes, this is a huge Kafka Stream. It was working well despite its complexity. But the major drawbacks were :
+Yes, this is a huge Kafka Stream. It was working well despite its complexity. But the major drawbacks were:
- **Monitoring**: All the metrics are under the same consumer group.
- **Debugging**: Each topic is consumed independently during a crash. When a message fails, the whole process crashes.
- **Lag**: This is the most important one. Since Kafka Streams optimize the consumption of messages by themselves, a topic with large outputs could lead to lag on unrelated topics. In that case, it is impossible to understand the lag on our consumers.
@@ -13,7 +13,7 @@ Data Warehouse solutions can sometimes become costly, especially with an uptick

## Track BigQuery usage ##

-BigQuery is very fast and easy to use. But it can be very expensive, especially when running scheduled queries : you’re not here to look at the bytes billed every time you’re query run.
+BigQuery is very fast and easy to use. But it can be very expensive, especially when running scheduled queries: you’re not here to look at the bytes billed every time you’re query run.

In the example below, we create a Kestra task to run a query on a Big Query table. To schedule our query, we just add a Kestra Trigger with a cron property allowing us to run the query every hour.
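To make this concrete, here is a hedged sketch of such a flow. The blog post's exact code isn't shown in this diff, so the ids, namespace, and `sql` below are illustrative assumptions; the task and trigger types are the standard Kestra GCP BigQuery `Query` task and `Schedule` trigger:

```yaml
id: bigquery_hourly_query
namespace: blog.example  # illustrative namespace

tasks:
  - id: run_query
    type: io.kestra.plugin.gcp.bigquery.Query
    sql: |
      SELECT COUNT(*) AS row_count
      FROM `my-project.my_dataset.my_table`  -- placeholder table

triggers:
  - id: every_hour
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "0 * * * *"  # run at the top of every hour
```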

@@ -65,7 +65,7 @@ For instance, a spike in bytes processed during certain hours may suggest the ne

The Metric Dashboard feature goes beyond just tracking cloud resource usage. Its versatility allows you to monitor a broad array of metrics, from ones like BigQuery to custom metrics that cater to your specific data processing tasks. You can gain insights into your ETL processes, track the number of processed lines, monitor processing time per batch, and so much more.

-In future releases we will add [proper SLA capabilities](https://github.com/kestra-io/kestra/issues/1246) : the idea would be to let users use the Metrics presented in this article as a SLA to stop a Flow to run or trigger an alert for example. You will be also able to have a complete dashboard with all the metrics gathered by Kestra with an overview on what happen in your data pipeline.
+In future releases we will add [proper SLA capabilities](https://github.com/kestra-io/kestra/issues/1246): the idea would be to let users use the Metrics presented in this article as a SLA to stop a Flow to run or trigger an alert for example. You will be also able to have a complete dashboard with all the metrics gathered by Kestra with an overview on what happen in your data pipeline.

For a deeper exploration of the potential applications of the Metric Dashboard you can learn more with our [documentation](https://kestra.io/docs/plugin-developer-guide/outputs#use-cases-for-metrics).

2 changes: 1 addition & 1 deletion content/docs/05.developer-guide/03.scripts.md
@@ -311,7 +311,7 @@ tasks:
  - id: custom-dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: PROCESS
-    script : |
+    script: |
      import pandas as pd
      import requests
```
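As a hedged extension of the snippet above: with the `PROCESS` runner the imported packages must already exist on the host, and Kestra's script tasks expose a `beforeCommands` property that could install them just before the script runs. The exact packages and commands here are assumptions for illustration:

```yaml
- id: custom-dependencies
  type: io.kestra.plugin.scripts.python.Script
  runner: PROCESS
  beforeCommands:
    - pip install pandas requests  # make the imports below available on the host
  script: |
    import pandas as pd
    import requests
```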
@@ -13,7 +13,7 @@ The `jq` filter apply a [JQ expression](https://stedolan.github.io/jq/) to a var
```


-Another example, if the current context is :
+Another example, if the current context is:
```json
{
"outputs": {
@@ -11,7 +11,7 @@ title: Date functions


#### Arguments:
-- `format`: Format parameters is one of :
+- `format`: Format parameters is one of:
- `full`: Sunday, September 8, 2013 at 4:19:12 PM Central European Summer Time
- `long`: September 8, 2013 at 4:19:12 PM CEST
- `medium`: Sep 8, 2013, 4:19:12 PM
@@ -76,7 +76,7 @@ title: Date functions
{{ dateAdd yourDate -1 "DAYS" }}
```
- `quantity`: an integer value positive or negative
-- `format`: Format parameters is one of :
+- `format`: Format parameters is one of:
- `NANOS`
- `MICROS`
- `MILLIS`
@@ -10,7 +10,7 @@ Convert an object to is JSON representation
{{json output['task-id']}}
```

-Example, if the current context is :
+Example, if the current context is:
```json
{
"outputs": {
@@ -43,7 +43,7 @@ Internally, [Jackson JQ](https://github.com/eiiches/jackson-jq) is used and supp
::


-Example, if the current context is :
+Example, if the current context is:
```json
{
"outputs": {
@@ -75,7 +75,7 @@ One special case for input variables is the `FILE` type, where the file is prepe
One of Kestra's most important abilities is to use all outputs from previous tasks in the next one.

### Without dynamic tasks (Each)
-This is the simplest and most common way to use outputs in the next task. In order to fetch a variable, just use `{{ outputs.ID.NAME }}` where :
+This is the simplest and most common way to use outputs in the next task. In order to fetch a variable, just use `{{ outputs.ID.NAME }}` where:
* `ID` is the task id
* `NAME` is the name of the output. Each task type can have any outputs that are documented on the part outputs of their docs. For example, [Bash task](../../../../plugins/core/tasks/scripts/io.kestra.core.tasks.scripts.Bash.md#outputs) can have `{{ outputs.ID.exitCode }}`, `{{ outputs.ID.outputFiles }}`, `{{ outputs.ID.stdErrLineCount }}`, etc...
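As a minimal editorial sketch (the ids and values below are made up for the example, not taken from the page), fetching a previous `Return` task's `value` output looks like this:

```yaml
id: outputs_example
namespace: docs.example  # illustrative namespace

tasks:
  - id: produce
    type: io.kestra.core.tasks.debugs.Return
    format: "hello"
  - id: consume
    type: io.kestra.core.tasks.debugs.Return
    # {{ outputs.produce.value }} resolves to the `value` output of the `produce` task
    format: "previous task said: {{ outputs.produce.value }}"
```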

2 changes: 1 addition & 1 deletion content/docs/05.developer-guide/05.outputs.md
@@ -114,7 +114,7 @@ tasks:
    value: ["s1", "s2", "s3"]
  - id: use
    type: io.kestra.core.tasks.debugs.Return
-    format: "Previous task produced output : {{ outputs.sub.s1.value }}"
+    format: "Previous task produced output: {{ outputs.sub.s1.value }}"
```

The `outputs.sub.s1.value` variable reaches the `value` of the `sub` task of the `s1` iteration.
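For context, the diff above only shows the tail of the flow. A full version might look like the following sketch — the `EachSequential` wrapper and the `{{ taskrun.value }}` expression are assumptions reconstructed from the visible fragment:

```yaml
tasks:
  - id: each
    type: io.kestra.core.tasks.flows.EachSequential
    tasks:
      - id: sub
        type: io.kestra.core.tasks.debugs.Return
        format: "{{ taskrun.value }}"  # the current iteration value, e.g. "s1"
    value: ["s1", "s2", "s3"]
  - id: use
    type: io.kestra.core.tasks.debugs.Return
    format: "Previous task produced output: {{ outputs.sub.s1.value }}"
```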
2 changes: 1 addition & 1 deletion content/docs/05.developer-guide/07.errors-handling.md
@@ -95,7 +95,7 @@ The following example defines a retry for the `retry-sample` task with a maximum
### Duration

Some options above have to be filled with a duration notation.
-Durations are expressed in [ISO 8601 Durations](https://en.wikipedia.org/wiki/ISO_8601#Durations), here are some examples :
+Durations are expressed in [ISO 8601 Durations](https://en.wikipedia.org/wiki/ISO_8601#Durations), here are some examples:

| name | description |
| ---------- | ----------- |
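To show where such durations plug in, here is a hedged sketch of a constant retry policy — the property names follow Kestra's documented retry options, while the ids and values are placeholders:

```yaml
- id: retry-sample
  type: io.kestra.core.tasks.debugs.Return
  format: "attempt"
  retry:
    type: constant     # retry at a fixed interval
    interval: PT25S    # ISO 8601 duration: 25 seconds
    maxAttempt: 5      # matches the "maximum of 5 attempts" mentioned above
    maxDuration: PT1M  # stop retrying after one minute overall
```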
2 changes: 1 addition & 1 deletion content/docs/05.developer-guide/12.best-practice.md
@@ -15,7 +15,7 @@ The execution of a flow is an object that will contain:
- Theirs outputs
- Their state histories

-Here is an example of a TaskRun :
+Here is an example of a TaskRun:
```json
{
"id": "5cBZ1JF8kim8fbFg13bumX",
81 changes: 47 additions & 34 deletions content/docs/09.administrator-guide/01.configuration/02.storage.md
@@ -2,81 +2,70 @@
title: Storage configuration
---

-Kestra needs an [internal storage](../../08.architecture.md#the-internal-storage) to store data processed by flow tasks (files from flow inputs and data stored as task outputs).
+Kestra needs an [internal storage](../../08.architecture.md#the-internal-storage) to store data processed by tasks. This includes files from flow inputs and data stored as task outputs.

The default internal storage implementation is the local storage which is **not suitable for production** as it will store data inside a local folder on the host filesystem.

-This local storage can be configures with:
+This local storage can be configured as follows:

```yaml
kestra:
  storage:
    type: local
    local:
-      base-path: /tmp/kestra/storage/
+      base-path: /tmp/kestra/storage/ # your custom path
```
-Other internal storage types are:
-- [Storage GCS](#gcs) for [Google Cloud Storage](https://cloud.google.com/storage)
-- [Storage Minio](#minio) compatible with [AWS S3](https://aws.amazon.com/s3/) and all others *S3 like* storage
+Other internal storage types include:
+- [Storage Minio](#minio) compatible with [AWS S3](https://aws.amazon.com/s3/) and all others *S3 like* storage services
- [Storage Azure](#azure) for [Azure Blob Storage](https://azure.microsoft.com/en-us/services/storage/blobs/)
+- [Storage GCS](#gcs) for [Google Cloud Storage](https://cloud.google.com/storage)
-## GCS
-First, you need to be sure to have the GCS storage plugin installed. You can install it with the following Kestra command:
-`./kestra plugins install io.kestra.storage:storage-gcs:LATEST`, it will download the plugin jar in the Kestra plugins directory.
-Then, you need to enable the storage with this configuration:
+## S3
-```yaml
+First, make sure that the Minio storage plugin is installed in your environment. You can install it with the following Kestra command:
+`./kestra plugins install io.kestra.storage:storage-minio:LATEST`. This command will download the plugin's jar file into the plugins directory.
+
+Then, enable the storage using the following configuration:
+
+```yaml
kestra:
  storage:
-    type: gcs
-    gcs:
-      bucket: "<your-bucket-name>"
-      service-account: "<service-account key as JSON or use default credentials>"
-      project-id: "<project-id or use default projectId>"
+    type: minio
+    minio:
+      accessKey: "<your-aws-access-key-id>"
+      secretKey: "<your-aws-secret-access-key>"
+      region: "<your-aws-region>"
+      bucket: "<your-s3-bucket-name>"
```

-If you didn't configure the `kestra.storage.gcs.service-account` option, Kestra will use the default service account, meaning that it will:
-- use the service account defined on the cluster for GKE.
-- use the service account defined on the compute instance for GCE.
-
-You can also provide the environment variable `GOOGLE_APPLICATION_CREDENTIALS` with a path to a JSON GCP service account key.
-
-More details can be found [here](https://cloud.google.com/docs/authentication/production).

## Minio

-First, you need to be sure to have the Minio storage plugin installed. You can install it with the following Kestra command:
-`./kestra plugins install io.kestra.storage:storage-minio:LATEST`, it will download the plugin jar in the Kestra plugins directory.
-
-Then, you need to enable the storage with this configuration:
+If you use Minio or similar S3-compatible storage options, you can follow the same process as shown above to install the Minio storage plugin. Then, make sure to include the Minio's `endpoint` and `port` in the storage configuration:

```yaml
kestra:
  storage:
    type: minio
    minio:
      endpoint: "<your-endpoint>"
      port: "<your-port>"
-      secure: "<your-secure>"
      accessKey: "<your-accessKey>"
      secretKey: "<your-secretKey>"
      region: "<your-region>"
+      secure: "<your-secure>"
      bucket: "<your-bucket>"
```

## Azure

-First, you need to be sure to have the Azure storage plugin installed. You can install it with the following Kestra command:
-`./kestra plugins install io.kestra.storage:storage-azure:LATEST`, it will download the plugin jar in the Kestra plugins directory.
+First, install the Azure storage plugin. To do that, you can leverage the following Kestra command:
+`./kestra plugins install io.kestra.storage:storage-azure:LATEST`. This command will download the plugin's jar file into the plugins directory.

-Then, you need to enable the storage with this configuration (adapt depending on authentication method):
+Adjust the storage configuration shown below depending on your chosen authentication method:

```yaml
kestra:
  storage:
    type: azure
@@ -88,3 +77,27 @@ kestra:
      shared-key-account-access-key: "<shared-key-account-access-key>"
      sas-token: "<sas-token>"
```

+## GCS
+You can install the GCS storage plugin using the following Kestra command:
+`./kestra plugins install io.kestra.storage:storage-gcs:LATEST`. This command will download the plugin's jar file into the plugins directory.
+
+Then, you can enable the storage using the following configuration:
+
+```yaml
+kestra:
+  storage:
+    type: gcs
+    gcs:
+      bucket: "<your-bucket-name>"
+      service-account: "<service-account key as JSON or use default credentials>"
+      project-id: "<project-id or use default projectId>"
+```
+
+If you haven't configured the `kestra.storage.gcs.service-account` option, Kestra will use the default service account, which is:
+- the service account defined on the cluster (for GKE deployments)
+- the service account defined on the compute instance (for GCE deployments).
+
+You can also provide the environment variable `GOOGLE_APPLICATION_CREDENTIALS` with a path to a JSON file containing GCP service account key.
+
+You can find more details in the [GCP documentation](https://cloud.google.com/docs/authentication/production).
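To make the `GOOGLE_APPLICATION_CREDENTIALS` note above concrete, here is a hedged docker-compose sketch — the image tag, file names, and mount paths are illustrative assumptions, not part of this diff:

```yaml
services:
  kestra:
    image: kestra/kestra:latest  # assumed image tag
    environment:
      # point Kestra at a mounted service-account key file
      GOOGLE_APPLICATION_CREDENTIALS: /etc/kestra/gcp-sa.json
    volumes:
      - ./gcp-sa.json:/etc/kestra/gcp-sa.json:ro
```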
@@ -6,7 +6,7 @@ title: Worker Isolation configuration

By default, Kestra uses a [shared worker](../../../08.architecture.md#worker) to handle workloads. This is fine for most use cases, but when you are using a shared Kestra instance between multiple teams, since the worker shares the same file system, this can allow people to access temporary files created by Kestra with powerful tasks like [Groovy](../../../../plugins/plugin-script-groovy/tasks/io.kestra.plugin.scripts.groovy.Eval.md), [Jython](../../../../plugins/plugin-script-jython/tasks/io.kestra.plugin.scripts.jython.Eval.md), etc...

-You can use the following to opt-in to real isolation of file systems using advanced Kestra EE Java security :
+You can use the following to opt-in to real isolation of file systems using advanced Kestra EE Java security:

```yaml
kestra:
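  # Editorial sketch of what the truncated configuration likely contains —
  # these property names are assumptions based on the Kestra EE docs, not
  # part of this diff; verify them against your Kestra version.
  ee:
    java-security:
      enabled: true
      forbidden-paths:
        - /etc/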
4 changes: 2 additions & 2 deletions content/docs/09.administrator-guide/02.deployment/index.md
@@ -2,10 +2,10 @@
title: Installation guide
---

-We provide different ways to deploy Kestra, the recommended ways are :
+We provide different ways to deploy Kestra, the recommended ways are:
- [Docker](./01.docker.md) for a local installation.
- [Kubernetes](./02.kubernetes.md) for a production installation.

-More information :
+More information:

<ChildTableOfContents />

1 comment on commit afba2e2

@vercel bot commented on afba2e2 Aug 4, 2023

Successfully deployed to the following URLs:

kestra-io – ./

kestra-io-kestra.vercel.app
kestra-io.vercel.app
kestra-io-git-main-kestra.vercel.app
