-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AIP-82 Save references between assets and triggers #43826
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,6 +45,7 @@ | |
) | ||
from airflow.models.dag import DAG, DagModel, DagOwnerAttributes, DagTag | ||
from airflow.models.dagrun import DagRun | ||
from airflow.models.trigger import Trigger | ||
from airflow.utils.sqlalchemy import with_row_locks | ||
from airflow.utils.timezone import utcnow | ||
from airflow.utils.types import DagRunType | ||
|
@@ -55,6 +56,7 @@ | |
from sqlalchemy.orm import Session | ||
from sqlalchemy.sql import Select | ||
|
||
from airflow.triggers.base import BaseTrigger | ||
from airflow.typing_compat import Self | ||
|
||
log = logging.getLogger(__name__) | ||
|
@@ -425,3 +427,75 @@ def add_task_asset_references( | |
for task_id, asset_id in referenced_outlets | ||
if (task_id, asset_id) not in orm_refs | ||
) | ||
|
||
def add_asset_trigger_references( | ||
self, dags: dict[str, DagModel], assets: dict[tuple[str, str], AssetModel], *, session: Session | ||
) -> None: | ||
# Update references from assets being used | ||
refs_to_add: dict[tuple[str, str], set[str]] = {} | ||
refs_to_remove: dict[tuple[str, str], set[str]] = {} | ||
triggers: dict[str, BaseTrigger] = {} | ||
for name_uri, asset in self.assets.items(): | ||
asset_model = assets[name_uri] | ||
trigger_class_path_to_trigger_dict: dict[str, BaseTrigger] = { | ||
trigger.serialize()[0]: trigger for trigger in asset.watchers | ||
} | ||
triggers.update(trigger_class_path_to_trigger_dict) | ||
|
||
trigger_class_paths_from_asset: set[str] = set(trigger_class_path_to_trigger_dict.keys()) | ||
trigger_class_paths_from_asset_model: set[str] = { | ||
trigger.classpath for trigger in asset_model.triggers | ||
} | ||
|
||
# Optimization: no diff between the DB and DAG definitions, no update needed | ||
if trigger_class_paths_from_asset == trigger_class_paths_from_asset_model: | ||
continue | ||
|
||
diff_to_add = trigger_class_paths_from_asset - trigger_class_paths_from_asset_model | ||
diff_to_remove = trigger_class_paths_from_asset_model - trigger_class_paths_from_asset | ||
if diff_to_add: | ||
refs_to_add[name_uri] = diff_to_add | ||
if diff_to_remove: | ||
refs_to_remove[name_uri] = diff_to_remove | ||
|
||
if refs_to_add: | ||
all_classpaths = {classpath for classpaths in refs_to_add.values() for classpath in classpaths} | ||
orm_triggers: dict[str, Trigger] = { | ||
trigger.classpath: trigger | ||
for trigger in session.scalars(select(Trigger).where(Trigger.classpath.in_(all_classpaths))) | ||
} | ||
|
||
# Create new triggers | ||
new_trigger_models = [ | ||
trigger | ||
for trigger in [ | ||
Trigger.from_object(triggers[classpath]) | ||
for classpath in all_classpaths | ||
if classpath not in orm_triggers | ||
] | ||
] | ||
session.add_all(new_trigger_models) | ||
orm_triggers.update((trigger.classpath, trigger) for trigger in new_trigger_models) | ||
|
||
# Add new references | ||
for name_uri, classpaths in refs_to_add.items(): | ||
asset_model = assets[name_uri] | ||
asset_model.triggers.extend( | ||
[orm_triggers.get(trigger_class_path) for trigger_class_path in classpaths] | ||
) | ||
|
||
if refs_to_remove: | ||
# Remove old references | ||
for name_uri, classpaths in refs_to_remove.items(): | ||
asset_model = assets[name_uri] | ||
asset_model.triggers = [ | ||
trigger for trigger in asset_model.triggers if trigger.classpath not in classpaths | ||
] | ||
|
||
# Remove references from assets no longer used | ||
orphan_assets = session.scalars( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @uranusjr do we need to check AssetActive here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. An asset without an AssetActive entry is not referenced anywhere, and trigerring an event to such an asset will therefore simply do nothing. So not checking AssetActive here is not useful in practice, but maybe theoratically a possibility? It depends on what we want the user to be able to do, I guess. @vincbeck Do you think a user should be able to trigger an event on an asset that does not actually exist in any DAGs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting, I did not know that notion of
Absolutely not, that's what I am doing (or trying to do) here but even further. |
||
select(AssetModel).filter(~AssetModel.consuming_dags.any()).filter(AssetModel.triggers.any()) | ||
) | ||
for asset_model in orphan_assets: | ||
if (asset_model.name, asset_model.uri) not in self.assets: | ||
asset_model.triggers = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if watcher is a good name for this. What do we expect this to do? If I understand AIP-82 correctly, an external event would fire the trigger, and the trigger would create events for assets associated to it.
Assuming my understanding is correct, the triggers here are not watchers of the asset; rather, the asset watches the triggers. The relationship is the other way around. So it is probably better to call this
watch
instead? Or maybe this attribute should live on the trigger instead, something likeTell me what you think on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming ... so hard haha. I see your point.
The reason why I called it
watchers
is because the triggers willwatch
some external resource and send event on updates. In that sense, to me, the triggers are watchers. I am not strongly againwatch
if you think it makes more sense. To be very honest, betweenwatchers
andwatch
I dont mind, I think the both of them makes sense.However, I definitely want the attribute on the asset class, I think it makes more sense and a more deliberate choice for the user to say, I have this asset and I want this asset to be updated when these triggers fire.