Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In the case of temp table, the redshift Lineage cannot be obtained accurately. #4895

Closed
yingyingqiqi opened this issue May 11, 2022 · 3 comments
Assignees
Labels
accepted An Issue that is confirmed as a bug by the DataHub Maintainers. bug Bug report ingestion PR or Issue related to the ingestion of metadata

Comments

@yingyingqiqi
Copy link

Describe the bug
In the case of temp table, the redshift Lineage cannot be obtained accurately.

To Reproduce

DROP TABLE IF EXISTS tmp_table;
create temp table tmp_table as
select
a,
b
from tableAA

insert into tableBB select * from tmp_table

tableBB Lineage,not exist tmp_table、tableAA

Expected behavior
tableBB Lineage,exist tmp_table、tableAA

@yingyingqiqi yingyingqiqi added the bug Bug report label May 11, 2022
@yingyingqiqi
Copy link
Author

stl_scan_based_lineage_query: str = """
select
distinct cluster,
target_schema,
target_table,
username,
source_schema,
source_table
from
(
select
distinct tbl as target_table_id,
sti.schema as target_schema,
sti.table as target_table,
sti.database as cluster,
query,
starttime
from
stl_insert
join SVV_TABLE_INFO sti on
sti.table_id = tbl
where starttime >= '{start_time}'
and starttime < '{end_time}'
and cluster = '{db_name}'
) as target_tables
join ( (
select
pu.usename::varchar(40) as username,
ss.tbl as source_table_id,
sti.schema as source_schema,
sti.table as source_table,
scan_type,
sq.query as query
from
(
select
distinct userid,
query,
tbl,
type as scan_type
from
stl_scan
) ss
join SVV_TABLE_INFO sti on
sti.table_id = ss.tbl
left join pg_user pu on
pu.usesysid = ss.userid
left join stl_query sq on
ss.query = sq.query
where
pu.usename <> 'rdsdb')
) as source_tables
using (query)
where
scan_type in (1, 2, 3)
order by cluster, target_schema, target_table, starttime asc

The possible reason is that SVV_TABLE_INFO Do not permanently save temporary table information,Only including temporary tables created by a user for the current session. I can't find all the places where the temporary tables are redshift stored, and I may need to parse the consanguinity through ddl.

@maggiehays maggiehays added the ingestion PR or Issue related to the ingestion of metadata label Jul 8, 2022
@treff7es
Copy link
Contributor

Currently, we drop tables from a lineage that does not exist anymore.
Possible solution can be to resolve those connections where A -> TempB -> C to A -> C if TempB does not exists anymore.

@treff7es treff7es added the accepted An Issue that is confirmed as a bug by the DataHub Maintainers. label Sep 16, 2022
@hsheth2
Copy link
Collaborator

hsheth2 commented Feb 12, 2024

This was fixed by #9704

@hsheth2 hsheth2 closed this as completed Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted An Issue that is confirmed as a bug by the DataHub Maintainers. bug Bug report ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
Development

No branches or pull requests

4 participants