feat(ingestion/redshift): collapse lineage to permanent table #9704

Merged
Changes from 53 commits
61 commits
1f3cf18
table lineage
sid-acryl Jan 16, 2024
f40abf3
Merge branch 'master' into master+ing-474-redshift-temp-tables-lineage
sid-acryl Jan 16, 2024
fb7f206
lint fix
sid-acryl Jan 16, 2024
c074975
remove unused function
sid-acryl Jan 16, 2024
fd5c28e
fix a issue
sid-acryl Jan 16, 2024
82f2099
fix a bug
sid-acryl Jan 16, 2024
c020982
concat sql
sid-acryl Jan 17, 2024
d149a00
Merge branch 'master' into master+ing-473-redshift-temp-tables-lineage
sid-acryl Jan 17, 2024
19bfa2e
address review comments
sid-acryl Jan 17, 2024
8a8e021
add support for alter tables
hsheth2 Jan 18, 2024
14cf5e7
handle large sql text
sid-acryl Jan 18, 2024
fa92f78
Merge branch 'master+ing-473-redshift-temp-tables-lineage' of github.…
sid-acryl Jan 18, 2024
ac54630
fixes
hsheth2 Jan 18, 2024
c9707f9
undo tweak
hsheth2 Jan 18, 2024
e5024a5
address review comments
sid-acryl Jan 18, 2024
bfbcd86
Merge branch 'master+ing-473-redshift-temp-tables-lineage' of github.…
sid-acryl Jan 18, 2024
9f4eb1c
Adding debug log to track temp table processing
treff7es Jan 18, 2024
fa5e3c2
Filtering out duplicate temp table creation queries
treff7es Jan 18, 2024
af80bdc
report better errors on parse failure
hsheth2 Jan 18, 2024
1fbc6d3
address review comments
sid-acryl Jan 19, 2024
b1110c2
Merge branch 'master+ing-473-redshift-temp-tables-lineage' of github.…
sid-acryl Jan 19, 2024
6e97ef8
Merge branch 'master' into master+ing-473-redshift-temp-tables-lineage
sid-acryl Jan 19, 2024
c730a55
Capturing temp table ddls without truncation and keeping line breaks.
treff7es Jan 19, 2024
fbdca9f
Restructure query to not hit agglist limit
treff7es Jan 19, 2024
f8296c2
refactor alter tables
hsheth2 Jan 19, 2024
4093ff2
Fixing query
treff7es Jan 19, 2024
536e0a3
use raw query string + filter on seq < 320
hsheth2 Jan 19, 2024
e4d3297
Fixing queries
treff7es Jan 19, 2024
a48e77b
fix newlines
hsheth2 Jan 19, 2024
ecfc561
remove extra condition
hsheth2 Jan 20, 2024
e08c83e
test case for collapse lineage
sid-acryl Jan 22, 2024
98a65d5
collapse cll
sid-acryl Jan 24, 2024
3d566ab
Merge branch 'master' into master+ing-473-redshift-collapse-cll
sid-acryl Jan 24, 2024
280d7b2
review comments
sid-acryl Jan 24, 2024
38c606d
Merge branch 'master+ing-473-redshift-collapse-cll' of github.com:sid…
sid-acryl Jan 24, 2024
a053682
lint fix
sid-acryl Jan 24, 2024
92af918
lint fix
sid-acryl Jan 24, 2024
af9a3f1
cll test case
sid-acryl Jan 24, 2024
13a4fee
add non check
sid-acryl Jan 25, 2024
bf53dcc
remove unwanted comment
sid-acryl Jan 25, 2024
1403284
Merge branch 'master' into master+ing-473-redshift-collapse-cll
sid-acryl Jan 25, 2024
ff0e359
Adding recursive cll resolution on temp tables
treff7es Jan 26, 2024
c235c4e
Merge branch 'master' into master+ing-473-redshift-collapse-cll
treff7es Jan 26, 2024
21e89b7
Fix merge issues
treff7es Jan 26, 2024
b498135
fix linter issues
treff7es Jan 26, 2024
3f31365
Remove unused import
treff7es Jan 26, 2024
e736e85
isort
treff7es Jan 26, 2024
f30baa0
remove unused import
treff7es Jan 26, 2024
0cfa95f
Adding some extra debug message
treff7es Jan 29, 2024
c150f29
Adding more debug line
treff7es Jan 29, 2024
6f2cf70
Merge branch 'master' into master+ing-473-redshift-collapse-cll
treff7es Jan 29, 2024
d6d17ec
Black formatting
treff7es Jan 29, 2024
249e0ed
Getting every create ddl and treating as temp table
treff7es Jan 29, 2024
c098c78
fixing query
treff7es Jan 29, 2024
a012ede
Fixing prefix
treff7es Jan 29, 2024
9a742ff
Modifying create table regexp to work with non temp tables
treff7es Jan 30, 2024
60747d6
Adding filter to our own query
treff7es Jan 30, 2024
9ee7b48
Merge branch 'master' into master+ing-473-redshift-collapse-cll
sid-acryl Jan 31, 2024
a3e9b0d
Mock the redshift connection object for test-case test_collapse_temp_…
sid-acryl Jan 31, 2024
886a513
Merge branch 'master' into master+ing-473-redshift-collapse-cll
sid-acryl Jan 31, 2024
5cacfd8
fix test
hsheth2 Jan 31, 2024
17 changes: 10 additions & 7 deletions metadata-ingestion/src/datahub/ingestion/source/redshift/config.py
@@ -94,10 +94,10 @@ class RedshiftConfig(
         description="The default schema to use if the sql parser fails to parse the schema with `sql_based` lineage collector",
     )

-    include_table_lineage: Optional[bool] = Field(
+    include_table_lineage: bool = Field(
         default=True, description="Whether table lineage should be ingested."
     )
-    include_copy_lineage: Optional[bool] = Field(
+    include_copy_lineage: bool = Field(
         default=True,
         description="Whether lineage should be collected from copy commands",
     )
@@ -107,17 +107,15 @@ class RedshiftConfig(
         description="Generate usage statistic. email_domain config parameter needs to be set if enabled",
     )

-    include_unload_lineage: Optional[bool] = Field(
+    include_unload_lineage: bool = Field(
         default=True,
         description="Whether lineage should be collected from unload commands",
     )

-    capture_lineage_query_parser_failures: Optional[bool] = Field(
-        hide_from_schema=True,
+    include_table_rename_lineage: bool = Field(
         default=False,
-        description="Whether to capture lineage query parser errors with dataset properties for debugging",
+        description="Whether we should follow `alter table ... rename to` statements when computing lineage. ",
     )
-
     table_lineage_mode: Optional[LineageMode] = Field(
         default=LineageMode.STL_SCAN_BASED,
         description="Which table lineage collector mode to use. Available modes are: [stl_scan_based, sql_based, mixed]",
@@ -139,6 +137,11 @@ class RedshiftConfig(
         description="When enabled, emits lineage as incremental to existing lineage already in DataHub. When disabled, re-states lineage on each run. This config works with rest-sink only.",
     )

+    resolve_temp_table_in_lineage: bool = Field(
+        default=False,
+        description="Whether to resolve temp table appear in lineage to upstream permanent tables.",
+    )
+
     @root_validator(pre=True)
     def check_email_is_set_on_usage(cls, values):
         if values.get("include_usage_statistics"):
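
Note on `resolve_temp_table_in_lineage`: the description above says temp tables that appear in lineage are resolved to their upstream permanent tables. The following is a minimal sketch of that idea only, not the code from this PR. The function name `collapse_to_permanent`, the `upstreams` mapping, and the `temp_tables` set are all hypothetical and chosen for illustration; the real implementation may identify temp tables and store lineage quite differently.

```python
from typing import Dict, List, Set


def collapse_to_permanent(
    upstreams: Dict[str, Set[str]],
    temp_tables: Set[str],
    table: str,
) -> Set[str]:
    """Return the permanent upstream tables of `table`.

    Hypothetical helper: any upstream listed in `temp_tables` is replaced
    by its own upstreams, repeatedly, until only permanent tables remain.
    """
    resolved: Set[str] = set()
    seen: Set[str] = set()  # guards against cycles between temp tables
    stack: List[str] = list(upstreams.get(table, set()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in temp_tables:
            # Look through the temp table to whatever fed it.
            stack.extend(upstreams.get(current, set()))
        else:
            resolved.add(current)
    return resolved


# Illustrative lineage: a permanent table loaded via two chained temp tables.
example_upstreams = {
    "sales_summary": {"#staging"},
    "#staging": {"raw_orders", "#dedup"},
    "#dedup": {"raw_customers"},
}
example_temp_tables = {"#staging", "#dedup"}
print(sorted(collapse_to_permanent(example_upstreams, example_temp_tables, "sales_summary")))
```

With the flag left at its default of `False`, the equivalent of this step would simply be skipped and the temp tables would stay in the lineage as they are.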