Apache Airflow™ is an open-source platform for orchestrating batch workflows. In Airflow, pipelines are defined as directed acyclic graphs (DAGs) in Python code, so you can generate workflows dynamically and connect them with virtually any technology, either through ready-made packages from third-party providers or through your own extensions. Airflow’s modular architecture and built-in message queue ensure high scalability. The Airflow user interface provides both overviews and in-depth views of pipelines and tasks.
Scalable open-source tool for orchestrating batch workflows in Python.
- Developing, maintaining, and scheduling workflows as code.
- Connecting workflows with other technologies.
- Monitoring workflows and tasks.
{% note warning %}
If you are going to use this product in production, we recommend configuring it according to the Airflow recommendations.
{% endnote %}
Before installing this product:

- Generate a secret key for the Airflow webserver:

  ```bash
  python3 -c 'import secrets; print(secrets.token_hex(16))'
  ```

  Generate a new webserver secret key for each Managed Service for Kubernetes cluster that you install Airflow on. For more details, see the Airflow documentation.

- Create a repository with your DAGs on GitHub or another platform; a minimal example is sketched below.
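  As a rough illustration, the commands below create such a repository with a single example DAG. The repository layout, DAG id, and task are placeholder assumptions for this sketch, not requirements of the product, and Airflow 2.x import paths are assumed:

  ```bash
  # A minimal sketch of a DAG repository; names and layout are illustrative only.
  mkdir -p airflow-dags/dags && cd airflow-dags

  # Write a single example DAG (Airflow 2.x import paths assumed).
  cat > dags/example_dag.py <<'EOF'
  from datetime import datetime

  from airflow import DAG
  from airflow.operators.bash import BashOperator

  with DAG(
      dag_id="example_dag",
      start_date=datetime(2024, 1, 1),
      schedule=None,   # run only when triggered manually
      catchup=False,
  ) as dag:
      BashOperator(task_id="hello", bash_command="echo 'Hello from Airflow'")
  EOF

  # Put the DAG under version control so it can be pushed to GitHub or another platform.
  git init
  git add .
  git commit -m "Add example DAG"
  ```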
To install the product:

- Configure the application.
- Click Install.
- Wait for the application to change its status to Deployed.
- Install kubectl and configure it to work with the created cluster.
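  For a Managed Service for Kubernetes cluster, the configuration typically looks like the sketch below; the cluster name and namespace are placeholders, and the yc command-line interface is assumed to be installed and authenticated:

  ```bash
  # Fetch the cluster credentials into the local kubectl configuration
  # (the yc CLI and an authenticated profile are assumed; names are placeholders).
  yc managed-kubernetes cluster get-credentials <cluster_name> --external

  # Verify that kubectl can reach the cluster and that the application pods are running.
  kubectl cluster-info
  kubectl -n <namespace> get pods
  ```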
- To check that Airflow is working, access its UI:

  - Set up port forwarding:

    ```bash
    kubectl -n <namespace> port-forward \
      services/<application_name>-webserver 8080:8080
    ```
  - Go to http://localhost:8080 in your browser and log into the UI as admin using the password you created earlier.
  - After logging in, reset the admin's password: under the user picture, click Your profile → Reset my password.
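  As an optional command-line check (a sketch that assumes the port forwarding from the step above is still active), you can also query the webserver health endpoint:

  ```bash
  # With the port forwarding still running, the health endpoint should report
  # a healthy metadatabase and scheduler.
  curl -s http://localhost:8080/health
  ```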
By using the application, you agree to the terms and conditions of the helm-chart and Apache Airflow™.