- The `DummyOperator` implementation of `start_operator` was replaced with a `PostgresOperator`. The latter runs `create_tables.sql`, which was moved to the same folder as `udac_example_dag.py` and is therefore executed on every run of the DAG.
- The `CREATE` statements in `create_tables.sql` were extended with `IF NOT EXISTS` so that they can be run repeatedly without conflicts.
- The `LoadDimensionOperator` was implemented with an `append=False` flag and a `primary_key=""` parameter:
  - if `append=False`, the original table is deleted and all of its data is replaced with the new data
  - if `append=True`, only the rows in the original table whose primary keys also occur in the new data are deleted before the new rows are inserted. This roughly corresponds to an `ON CONFLICT DO UPDATE` call (which is not available in the Postgres version that Redshift is based on).
- The data quality operator checks whether the `artists` table contains any rows with a null `artistid`. This is only an example check; many other checks could be performed here as well.
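
The `append` logic and the null check described above can be sketched roughly as follows. This is a minimal illustration, not the operator code from this repository: the helper names `build_load_dimension_sql` and `count_nulls` and the `staging` table are hypothetical, `DELETE FROM` is assumed as the way the table is emptied, and an in-memory SQLite database stands in for Redshift so the sketch runs without Airflow.

```python
import sqlite3


def build_load_dimension_sql(table, select_sql, append=False, primary_key=""):
    """Return the SQL statements for one dimension load, in execution order.

    Hypothetical helper; the real operator may build its SQL differently.
    """
    if not append:
        # Full refresh: empty the target table, then reload everything.
        return [
            f"DELETE FROM {table}",
            f"INSERT INTO {table} {select_sql}",
        ]
    if not primary_key:
        raise ValueError("append=True requires a primary_key")
    # Append: delete only rows whose key reappears in the new data,
    # then insert the new rows (a delete-then-insert "upsert").
    return [
        f"DELETE FROM {table} WHERE {primary_key} IN "
        f"(SELECT {primary_key} FROM ({select_sql}) AS incoming)",
        f"INSERT INTO {table} {select_sql}",
    ]


def count_nulls(conn, table, column):
    """Return how many rows of `table` have a NULL in `column`."""
    cur = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL")
    return cur.fetchone()[0]


# Demo on SQLite (an assumption made purely so the example is runnable).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS staging (artistid TEXT, name TEXT)")
conn.execute("CREATE TABLE IF NOT EXISTS artists (artistid TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?)", [("A1", "Old Name"), ("A2", "Other")]
)
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)", [("A1", "New Name"), ("A3", "Third")]
)

statements = build_load_dimension_sql(
    "artists",
    "SELECT artistid, name FROM staging",
    append=True,
    primary_key="artistid",
)
for statement in statements:
    conn.execute(statement)

# Data quality check: fail loudly if any artist lacks an id.
if count_nulls(conn, "artists", "artistid") > 0:
    raise ValueError("Data quality check failed: artists.artistid has NULLs")
```

In this sketch the row `A1` is overwritten by the staged version while `A2` is left untouched, which is the behaviour the `append=True` branch aims for. Table and column names are interpolated directly into the SQL, so they should only ever come from trusted DAG configuration.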
The following solution was used as guidance: