Airflow Munchkin is a simple code generator that speeds up the first stages of developing Airflow operators for Google services, including Google Cloud Platform and Google Marketing Platform. And we like cats, hence the name.
- Free software: MIT license
- Documentation: https://airflow-munchkin.readthedocs.io.
Munchkin is a code generator that helps developers scaffold operators and hooks. By using Munchkin you get:
- a hook class (including method descriptions, argument types, etc.)
- operator classes (including descriptions, argument types and the execute method)
- base unit tests for both the hook and the operators
- an example DAG with links to the how-to guide
- a skeleton of howto.rst describing how end users can use the operators
- a short entry to be added to airflow.docs.integration.rst
- a skeleton of a system test for the operators
In other words, you get everything that can be considered "boring work".
Munchkin does not do the interesting part of implementing an operator, which includes:
- making operators idempotent
- handling exceptions
- converting an operator to a sensor (if required)
- adding nice how-to information
Using Munchkin is simple. Here is a step-by-step guide:
- Select a Google service.
- Determine whether the service has a Python client (you can check it here).
- If a client exists and has the method you want to use, use the Munchkin generator for Python clients.
- If there is no Python client, the operators will be based on the Discovery API. In this case, determine the API endpoint using the API explorer. If you can't find the service there, search for "myService API" to determine the path used in REST requests. Finally, use the Munchkin generator for the Discovery API.
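The decision in the guide above can be summarized as a tiny function. This is a hypothetical helper (`choose_generator` is not part of Munchkin); it only returns the name of the generator module to run. The guide does not spell out the case where a client exists but lacks the method you need; the sketch assumes the Discovery API generator is used there as well.

```python
def choose_generator(has_python_client: bool, client_has_method: bool) -> str:
    """Hypothetical helper: pick the Munchkin generator module for a service.

    Prefer the Python client generator when a client exists and exposes the
    method you need; otherwise fall back to the Discovery API generator
    (an assumption for the "client exists but lacks the method" case).
    """
    if has_python_client and client_has_method:
        return "airflow_munchkin.main_client"
    return "airflow_munchkin.main_discovery"


# A service with a suitable Python client vs. one without any client.
print(choose_generator(True, True))    # airflow_munchkin.main_client
print(choose_generator(False, False))  # airflow_munchkin.main_discovery
```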
The generator for Python clients is located in airflow_munchkin.main_client. To use it, modify the Integration information in its main function:
integration_info: Integration = Integration(
    service_name="Cloud Memorystore",
    class_prefix="CloudMemorystore",
    file_prefix="gcp_cloud_memorystore",
    client_path="google.cloud.redis_v1.CloudRedisClient",
)
The most important field of the integration is client_path, the dotted import path of the client class. Optionally, you can also set the class prefix, the file prefix and the service name.
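A client_path such as google.cloud.redis_v1.CloudRedisClient is a module path followed by a class name. The sketch below shows one way such a path can be split and resolved with the standard library's importlib. It is a minimal illustration, not Munchkin's own code; `split_client_path` and `load_client_class` are hypothetical names, and loading a real Google client requires that client library to be installed.

```python
import importlib
from typing import Tuple


def split_client_path(client_path: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'package.module.ClassName' into (module, class)."""
    module_path, _, class_name = client_path.rpartition(".")
    if not module_path or not class_name:
        raise ValueError(f"Expected 'package.module.ClassName', got {client_path!r}")
    return module_path, class_name


def load_client_class(client_path: str) -> type:
    """Hypothetical helper: import the module and return the named class."""
    module_path, class_name = split_client_path(client_path)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


print(split_client_path("google.cloud.redis_v1.CloudRedisClient"))
# ('google.cloud.redis_v1', 'CloudRedisClient')
```

Splitting on the last dot keeps nested package paths intact, so the same helper works for any depth of module path.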
The generator for the Discovery API is located in airflow_munchkin.main_discovery. To use it, modify the Integration information in its main function:
service_name = "DisplayVideo"
integration = DiscoveryIntegration(
    api_path="doubleclickbidmanager.queries",
    version="v1",
    methods=None,
    service_name=service_name,
    object_name="Report",
    class_prefix="Google",
    package_name=resolve_package_name(service_name),
)
The most important field of the integration is api_path, which identifies the API endpoint. It can be the full path to a resource (e.g. dfareporting.campaigns) or to a single method (e.g. dfareporting.campaigns.insert).
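Whether an api_path names a resource or a single method can be decided by walking it through the API's Discovery document, where resources are nested under a "resources" key and their operations under a "methods" key. The sketch below is a minimal illustration under that assumption; `resolve_api_path` is a hypothetical helper, Munchkin's own parsing is not shown, and the dictionary is a hand-trimmed stand-in rather than the real dfareporting Discovery document.

```python
from typing import Any, Dict, List, Optional, Tuple


def resolve_api_path(
    api_path: str, discovery_doc: Dict[str, Any]
) -> Tuple[str, List[str], Optional[str]]:
    """Hypothetical helper: split api_path into (API name, resources, method).

    The first segment is the API name. Remaining segments are walked through
    nested "resources"; a final segment found under "methods" is returned as a
    single method, otherwise the method slot is None (a resource path).
    """
    api_name, *parts = api_path.split(".")
    node: Dict[str, Any] = discovery_doc
    resources: List[str] = []
    for i, part in enumerate(parts):
        if part in node.get("resources", {}):
            resources.append(part)
            node = node["resources"][part]
        elif i == len(parts) - 1 and part in node.get("methods", {}):
            return api_name, resources, part
        else:
            raise KeyError(f"{part!r} not found in {api_name} Discovery document")
    return api_name, resources, None


# Hand-trimmed stand-in for a Discovery document (not the real one).
doc = {"resources": {"campaigns": {"methods": {"get": {}, "insert": {}, "list": {}}}}}
print(resolve_api_path("dfareporting.campaigns", doc))
# ('dfareporting', ['campaigns'], None)
print(resolve_api_path("dfareporting.campaigns.insert", doc))
# ('dfareporting', ['campaigns'], 'insert')
```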
You also have to provide a valid API version (version="v1" in the example above).
If api_path points to a resource, you can specify the methods for which operators should be generated (e.g. methods=['get', 'list']). Otherwise (methods=None), operators are generated for all methods. Additionally, you can specify service_name, class_prefix and object_name. The object name is used to produce better operator class names and is added after the method name (e.g. ServiceNameMethodOBJECTOperator, DisplayVideoGetReportOperator).
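The naming pattern and the methods filter described above can be sketched as follows. This is a hypothetical helper (`operator_class_names` is not Munchkin's API) that mirrors the ServiceName + Method + ObjectName + "Operator" pattern; it leaves out class_prefix because the example name DisplayVideoGetReportOperator does not show it, and the list of available methods is illustrative only.

```python
from typing import Iterable, List, Optional


def operator_class_names(
    service_name: str,
    object_name: str,
    available_methods: Iterable[str],
    methods: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: build names as ServiceName + Method + Object + 'Operator'.

    If methods is None, every available method is used; otherwise only the
    requested methods, which must all exist on the resource.
    """
    available = list(available_methods)
    if methods is None:
        selected = available
    else:
        unknown = [m for m in methods if m not in available]
        if unknown:
            raise ValueError(f"Unknown methods: {unknown}")
        selected = list(methods)
    # Upper-case only the first letter so camelCase methods keep their casing.
    return [f"{service_name}{m[:1].upper()}{m[1:]}{object_name}Operator" for m in selected]


print(operator_class_names("DisplayVideo", "Report", ["get", "list", "delete"], methods=["get"]))
# ['DisplayVideoGetReportOperator']
```

Capitalizing only the first character (rather than using str.capitalize) keeps names such as listKeys readable as ListKeys.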
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.