This sample demonstrates how to send Apache Airflow logs to an Azure Log Analytics workspace using the Azure Monitor Logs Ingestion API.
Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. By default, Airflow logs to the local file system, which is suitable for development environments and quick debugging. For production scenarios, persistent log storage is recommended. This sample therefore provides an end-to-end setup for sending Airflow logs to Azure Monitor, where they can be easily analyzed.
This sample consists of two parts:
- Airflow configuration to send logs to Azure Blob Storage (based on this documentation),
- Ingestion of logs from Azure Blob Storage into Azure Monitor using a new ETL-like data collection pipeline consisting of a Data Collection Rule, a Data Collection Endpoint, and the Logs Ingestion API.
Apache Airflow runs as a Docker Compose project, and the required Azure infrastructure is created using Terraform.
Whenever an Airflow workflow is triggered, a new log file is created and uploaded to Azure Blob Storage. This triggers the `LogIngestion` Azure Function.
The function uses the Azure Monitor Ingestion client library for .NET. It creates a Logs Ingestion client pointing at the previously created data collection endpoint and calls the Logs Ingestion API, passing the parsed data from the Blob Storage files (the source data) and a data collection rule ID. The data collection rule understands the structure of the source data, transforms it into the format expected by the target table, and specifies the Log Analytics workspace and table to send the transformed data to (a custom Log Analytics table called `AirflowLogs_CL`).
For more details about the Logs Ingestion API, check this documentation.
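The snippet below is a minimal sketch of that call using the Azure.Monitor.Ingestion package. The endpoint and rule values, the environment variable names, the stream name, and the log record shape are illustrative assumptions, not the exact code of the sample's function.

```csharp
using System;
using Azure;
using Azure.Core;
using Azure.Identity;
using Azure.Monitor.Ingestion;

// Data collection endpoint and data collection rule created by Terraform.
// The environment variable names are placeholders for whatever configuration the function uses.
var endpoint = new Uri(Environment.GetEnvironmentVariable("DATA_COLLECTION_ENDPOINT")!);
var ruleId = Environment.GetEnvironmentVariable("DATA_COLLECTION_RULE_ID")!;
var streamName = "Custom-AirflowLogs_CL"; // hypothetical stream name declared in the DCR

// Authenticate with Microsoft Entra ID and create the Logs Ingestion client.
var client = new LogsIngestionClient(endpoint, new DefaultAzureCredential());

// Source data parsed from the Airflow log file in Blob Storage
// (the record shape below is illustrative).
var records = new[]
{
    new { TimeGenerated = DateTimeOffset.UtcNow, Dag = "example_dag", Message = "Task finished" }
};

// The data collection rule transforms this payload into the schema of the
// AirflowLogs_CL table and routes it to the Log Analytics workspace.
Response response = await client.UploadAsync(
    ruleId,
    streamName,
    RequestContent.Create(BinaryData.FromObjectAsJson(records)));
```

In the real function the records come from parsing the uploaded Airflow log file, and the stream name must match a stream declared in the data collection rule.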
- Docker
- Azure subscription
- Azure CLI (version 2.47 or later)
- Terraform (version 1.5.4 or later)
- .NET (version 6.0 or later)
To prepare the required Azure infrastructure and start Apache Airflow components, run:
cd src
./run_all_in_one.sh
Follow the steps printed in the output of the previous command and log in to Airflow.
You will see three available DAG workflows. Open one of them by clicking its name:
Trigger it with the `Trigger DAG` button.
There should be at least one executed task; click on it to make the Logs button appear.
Click the Logs button and you should see a `Found remote logs` message, which indicates that the log file was sent to the configured Azure Blob Storage.
Go to the Azure portal; you should see a new resource group whose name starts with `rg-logs-ingestion`. Check that the `LogIngestion` Azure Function was triggered successfully (there is usually a delay of about 5 minutes until past executions show up):
Finally, open the Log Analytics workspace in the same resource group and run a query against the `AirflowLogs_CL` table, where you will find the Airflow logs.
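For example, a simple KQL query like the one below lists the most recent entries; apart from the standard TimeGenerated column, the available columns depend on the table schema defined by this sample:

```kusto
AirflowLogs_CL
| sort by TimeGenerated desc
| take 100
```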
- Open a command line and navigate to the directory containing the infra code:
cd src/infra
- Log in to your Azure subscription:
az login
- Initialize Terraform:
terraform init
- Provision the infrastructure:
terraform apply
- Use the returned values in your `.env` file:
terraform output -json
To start the application:
cd src
docker compose up --build -d
Follow the steps from the Generate and view logs section.
To delete the Azure infrastructure and to stop and remove the Airflow containers, cached Docker images, and volumes, execute:
./remove_all.sh