Table does not show under dataset --> tables if a table with the same name was previously dropped (via Glue) and then recreated #992

Closed
enr0c opened this issue Jan 22, 2024 · 4 comments

Comments

@enr0c

enr0c commented Jan 22, 2024

Describe the bug

A table does not show under dataset --> tables if a table with the same name was previously dropped (via Glue) and then recreated.

How to Reproduce

  1. Create a table by crawling
  2. Synchronize, so the table shows up
  3. Offboard it by going to Glue and dropping the table
  4. Re-create the table by crawling
  5. The Glue Catalog shows the table, but data.all does not show it after synchronization
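
To check step 5 outside the UI, one option is to list the table names Glue reports for the database and compare them with what data.all shows. This is only a minimal sketch: it assumes a boto3 Glue client is passed in, and `list_glue_table_names` and `missing_in_dataall` are hypothetical helper names for illustration, not part of data.all.

```python
def list_glue_table_names(glue_client, database_name):
    """Return the names of all tables (and views) Glue reports for a database."""
    # get_tables is paginated, so walk every page of results.
    paginator = glue_client.get_paginator("get_tables")
    names = []
    for page in paginator.paginate(DatabaseName=database_name):
        names.extend(table["Name"] for table in page["TableList"])
    return sorted(names)


def missing_in_dataall(glue_names, dataall_names):
    """Names present in the Glue catalog but not shown in data.all."""
    return sorted(set(glue_names) - set(dataall_names))


# Usage (needs AWS credentials; database and table names are placeholders):
#   import boto3
#   glue = boto3.client("glue")
#   glue_names = list_glue_table_names(glue, "my_database")
#   print(missing_in_dataall(glue_names, ["table_shown_in_dataall"]))
```

The list of tables shown in data.all has to be collected separately (for example by hand from the UI); the sketch does not query data.all itself.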

Expected behavior

The newly created table, with the same name as the previously deleted/offboarded table, should show up.
Or, more generally:
all tables/views that are shown under a database should be shown in data.all.

Your project

No response

Screenshots

No response

OS

Linux

Python version

3.9

AWS data.all version

good question ;)

Additional context

No response

@noah-paige
Contributor

Hi @enr0c - thank you for opening this issue. I believe a recent PR that was merged last week should resolve the bug you are facing.

As I understand it, the cause of the issue is:

  • Crawling and initial sync of tables for a Glue DB works (as expected)
  • Once the Glue table is deleted, the table no longer shows (as expected)
    • In the backend, the LastGlueTableStatus of the table is set to Deleted in data.all's DB
  • When the table is recreated, the Glue tables are fetched again but not displayed in the UI (error)
    • Backend logic did not appropriately update the Glue table status again (this is fixed in the PR linked above)
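
The sequence above can be sketched roughly as follows. This is not data.all's actual backend code: `TableRecord`, `sync_table_status`, `visible_tables` and the `"InSync"` status value are assumptions made for illustration; only the `LastGlueTableStatus` field and its `Deleted` value come from the explanation above.

```python
from dataclasses import dataclass


@dataclass
class TableRecord:
    """Hypothetical stand-in for a table row in data.all's DB."""
    name: str
    LastGlueTableStatus: str = "InSync"  # assumed name for the "present" status


def sync_table_status(records, glue_table_names):
    """Update each record's status from the current Glue listing.

    records: dict mapping table name -> TableRecord (mutated in place).
    glue_table_names: names currently present in the Glue database.
    """
    present = set(glue_table_names)
    # Tables known to data.all but gone from Glue are marked Deleted.
    for name, record in records.items():
        if name not in present:
            record.LastGlueTableStatus = "Deleted"
    for name in present:
        if name in records:
            # The step the bug skipped: a recreated table must be revived,
            # otherwise it stays Deleted and never shows up again.
            records[name].LastGlueTableStatus = "InSync"
        else:
            records[name] = TableRecord(name=name)
    return records


def visible_tables(records):
    """Tables the UI would list: everything not marked Deleted."""
    return sorted(
        name for name, rec in records.items()
        if rec.LastGlueTableStatus != "Deleted"
    )


records = {}
sync_table_status(records, ["orders"])  # initial crawl + sync
sync_table_status(records, [])          # table dropped in Glue
sync_table_status(records, ["orders"])  # table recreated by crawling
print(visible_tables(records))
```

Without the branch that resets the status of an existing record, the third sync would leave `orders` marked as Deleted, which matches the behaviour reported in this issue.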

I think merging the linked PR above would fix this issue, but please do report back if that does not work or if you are still facing similar issues. Thanks

@enr0c
Author

enr0c commented Jan 25, 2024

Hi @noah-paige,
many thanks for your explanation. I can confirm that the new version of data.all fixes this bug.
One question for my understanding: you say the status of the table is set to Deleted in data.all's DB. What is the schema and table name in data.all's Aurora database where this is stored?

And another question:
Is it expected that views created by users are not displayed in the dataset view in data.all after synchronizing? Are only tables expected to show up? Is this a bug or a feature :)?

Feel free to close the issue!
Thanks and BR

@dlpzx
Contributor

dlpzx commented Jan 29, 2024

Hi @enr0c, thanks for confirming, let's answer your questions :)

  • What is the schema and table name in data.all's Aurora database where this is stored? --> The table name and columns are stored in RDS. The table information is stored in an RDS table called dataset_table, and the columns are stored in dataset_table_column.
  • Is it expected to not display views that users create in the dataset view in data.all after synchronizing? --> When the tables in the Glue catalog are synchronized to the data.all catalog, there is a filter to ensure that the table's S3 location belongs to the dataset S3 bucket. This filter ensures that the S3 location where the data is stored is the one that data.all has registered in Lake Formation. But it has a drawback: tables whose data is stored in a different S3 bucket, as well as views, are not synced (see screenshot below), because we cannot ensure that their data is stored in the dataset S3 bucket. We can work together on overcoming this issue, but yes, they are not shown by design.
image
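
A rough sketch of the kind of filter described above, to show why views drop out. This is not the actual data.all implementation: `belongs_to_dataset_bucket` and `filter_syncable` are hypothetical names, the bucket and table names are made up, and the sample assumes a view carries no S3 location in its `StorageDescriptor`.

```python
def belongs_to_dataset_bucket(glue_table, bucket_name):
    """True if the table's S3 location is inside the dataset bucket."""
    location = glue_table.get("StorageDescriptor", {}).get("Location") or ""
    prefix = f"s3://{bucket_name}"
    # Match the bucket itself or any key under it, but not e.g. "bucket-other".
    return location == prefix or location.startswith(prefix + "/")


def filter_syncable(glue_tables, bucket_name):
    """Names of the Glue tables that would pass the bucket filter."""
    return [
        t["Name"] for t in glue_tables
        if belongs_to_dataset_bucket(t, bucket_name)
    ]


# Made-up sample entries shaped like Glue get_tables results.
glue_tables = [
    {"Name": "orders", "TableType": "EXTERNAL_TABLE",
     "StorageDescriptor": {"Location": "s3://dataset-bucket/orders/"}},
    {"Name": "external_orders", "TableType": "EXTERNAL_TABLE",
     "StorageDescriptor": {"Location": "s3://some-other-bucket/orders/"}},
    # Assumption: a view has no S3 location, so it cannot pass the filter.
    {"Name": "orders_view", "TableType": "VIRTUAL_VIEW",
     "StorageDescriptor": {"Location": ""}},
]

print(filter_syncable(glue_tables, "dataset-bucket"))
```

Only `orders` passes here; the table stored in another bucket and the view are both filtered out, which is the by-design behaviour described above.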

If you wish to continue the discussion the best would be to open a dedicated issue :)

@dlpzx
Contributor

dlpzx commented Feb 14, 2024

Hi @enr0c, the discussion on views not appearing in the catalog is addressed in #1028. We will close this issue and continue the discussion there.

@dlpzx dlpzx closed this as completed Feb 14, 2024