
data.all in local environment starts in a broken db state #788

Closed
zsaltys opened this issue Oct 12, 2023 · 7 comments · Fixed by #1033
Assignees
Labels
effort: medium priority: medium status: not-picked-yet At the moment we have not picked this item. Anyone can pick it up type: bug Something isn't working
Milestone

Comments

zsaltys (Contributor) commented Oct 12, 2023

Describe the bug

With the current 2.0 branch of data.all, starting data.all in a local dev environment leaves HALF of the tables missing. This is the first issue, and it requires running alembic migrations to fix.

Upon running the alembic migrations, we find that they fail with the error:

sqlalchemy.exc.ProgrammingError: (pg.ProgrammingError) ERROR: column "privateSubnetIds" of relation "vpc" already exists

coming from:

INFO [alembic.runtime.migration] Running upgrade bd271a2780b2 -> 5d5102986ce5, Add subnet ids columns

We can then attempt to drop all the tables with:

python backend/migrations/drop_tables.py

and run the migrations again, but then data.all won't start because the permission table will be missing permissions such as MANAGE_GLOSSARY and MANAGE_DATASETS.
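The "column already exists" failure above happens because the column was created by one mechanism (e.g. `create_all`) before the migration that adds it ever ran. A minimal sketch of one common mitigation, using stdlib sqlite3 rather than data.all's actual alembic scripts (the guard-before-add pattern is the assumption here, not the project's real migration code):

```python
import sqlite3

# Hypothetical illustration: make a column-adding schema change idempotent,
# so re-running it does not fail with "column ... already exists".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vpc (vpcUri TEXT PRIMARY KEY)")

def add_column_if_missing(conn, table, column, ddl_type):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column in existing:
        return False  # column already present: skip instead of raising
    conn.execute(f'ALTER TABLE {table} ADD COLUMN "{column}" {ddl_type}')
    return True

print(add_column_if_missing(conn, "vpc", "privateSubnetIds", "TEXT"))  # True: added
print(add_column_if_missing(conn, "vpc", "privateSubnetIds", "TEXT"))  # False: skipped
```

In an alembic migration the equivalent check would inspect the bind's columns before calling `op.add_column`; the trade-off is that guarded migrations can silently mask a drifted schema.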

How to Reproduce

Start data.all in a dev environment from scratch and attempt to upgrade the local database with the latest migrations.

Expected behavior

  1. 2.0 branch local dev environment should start with all the tables installed
  2. If I drop all the tables and attempt to rerun all the migrations, this should work and data.all should start without issues
  3. https://awslabs.github.io/aws-dataall/deploy-locally/ should be updated to explain the alembic migrations: how to run them to bring the database to the latest state, and how to add new ones.
  4. Ideally there should be tests that test that data.all can start with migrations rebuilt from scratch

Your project

No response

Screenshots

No response

OS

NA

Python version

NA

AWS data.all version

2.0

Additional context

No response

@anmolsgandhi anmolsgandhi added type: bug Something isn't working status: not-picked-yet At the moment we have not picked this item. Anyone can pick it up labels Oct 12, 2023
dlpzx (Contributor) commented Oct 16, 2023

Hi @zsaltys, thanks for opening the issue. I am trying to reproduce the problem.

@dlpzx dlpzx self-assigned this Oct 16, 2023
dlpzx (Contributor) commented Oct 16, 2023

Test 1: build docker images from scratch

After cleaning up all local Docker images, I recreated the db image.

As a result I obtain 39 tables in the schema dkrcompose (screenshot omitted).

If I now go to an actual deployment of data.all for the current main branch and check the number of tables, I see that the Aurora database contains 49 tables (screenshots omitted).

The missing tables in the local deployment are:

  1. alembic_version
  2. dataset_quality_rule
  3. dataset_table_profiling_job
  4. group_member
  5. item_tags
  6. redshiftcluster
  7. redshiftcluster_dataset
  8. redshiftcluster_datasettable
  9. tag
  10. user

I am not sure if you are referring to these tables. Most of them are deprecated; we could clean them up with an alembic migration script, but they should not be affecting your local deployment.

dlpzx (Contributor) commented Oct 16, 2023

Test 2: drop tables and re-create them using alembic locally

Using the commands provided in the Makefile, I dropped the local tables and re-created them. I can successfully re-create 49 tables (screenshot omitted).

Issue found! 🐛
Comparing the permissions created in the new schema with those in a real deployment, we see that some permissions are missing (screenshot omitted).

In the AWS deployment (screenshot omitted):

@anmolsgandhi anmolsgandhi added this to the v2.2.0 milestone Oct 16, 2023
dlpzx added a commit that referenced this issue Nov 7, 2023
… issues (#860)

### Feature or Bugfix
- Feature
- Bugfix

### Detail
- Add all applicable GitHub workflows to PRs pointing at `v2m*` branches
- Fix the semgrep finding from the GitHub workflows on the migration script for notification types --> added `nosemgrep`, as no user input is passed to the SQL query and only code administrators have access to it.
- Fix migration validation: this one is tricky, as it succeeds when running locally and on a real pipeline. It turns out the issue was not in the migration script itself but in the way we dropped and updated tables in the validation migration stage. For dropping tables, we were using a different schema than the one used to upgrade the database. This PR removes the schema_name variable and uses the envname as the schema in all cases. One final note: this issue might be related to #788.

Here are some screenshots of the resulting local schema for the notification table after running `make drop-tables` and `make upgrade-db` (screenshots omitted).

### Relates
- V2.1 release

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
@anmolsgandhi anmolsgandhi modified the milestones: v2.2.0, v2.3.0 Dec 12, 2023
dlpzx (Contributor) commented Jan 17, 2024

I am picking up this issue again with the current version of the code (2.2).

dlpzx (Contributor) commented Jan 17, 2024

1. Start local dockerized application [Needs-more-info]

  • When we run `docker-compose up`, the postgres container is created with no tables or schemas
  • In the GraphQL container, `local_graphql_server.py` uses sqlalchemy's declarative_base to create all tables with this call: `Base.metadata.create_all(engine.engine)`

This means that

  • alembic scripts are not executed
  • the number of tables depends on the modules that are enabled
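The module-dependence above can be sketched in miniature. This is a hypothetical stdlib-only model (the `MODULE_TABLES` registry and `create_all` function are illustrative, not data.all's real code) of why `Base.metadata.create_all()` yields fewer tables than the alembic history: only models imported by enabled modules are registered on the metadata before `create_all` runs.

```python
import sqlite3

# Hypothetical registry: each enabled module contributes its own table DDL,
# mimicking how importing a module registers its models on Base.metadata.
MODULE_TABLES = {
    "datasets": ["CREATE TABLE dataset (datasetUri TEXT PRIMARY KEY)"],
    "worksheets": ["CREATE TABLE worksheet (worksheetUri TEXT PRIMARY KEY)"],
}

def create_all(conn, enabled_modules):
    # Like Base.metadata.create_all(engine.engine): it only knows about
    # tables contributed by modules that were actually loaded.
    for module in enabled_modules:
        for ddl in MODULE_TABLES[module]:
            conn.execute(ddl)

conn = sqlite3.connect(":memory:")
create_all(conn, ["datasets"])  # "worksheets" module disabled
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"))
print(tables)  # only the dataset table exists
```

Alembic, in contrast, replays the full migration history regardless of which modules are enabled, which is one source of the table-count mismatch discussed in this thread.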

@zsaltys I do not need to run alembic when starting the docker containers. There might be mismatches compared with the alembic migration tables, but I cannot see that any data.all functionality is affected.
Can you show me the tables that are missing?

2. Drop and rerun all migrations [Investigation in progress]

In the local environment it is not necessary to work with alembic. I would only run alembic locally to test migration scripts that need to be added directly with alembic commands, similar to the scripts executed in the CodeBuild test-migrations and in the actual db-migration stage.

Having said that, I think we need to do a clean-up exercise and make sure that the local development environment starts off with all the tables and permissions needed.

| Table | In RDS | In dkrcompose |
|---|:---:|:---:|
| activity | X | X |
| alembic_version | X | |
| consumptionrole | X | X |
| dashboard | X | X |
| dashboardshare | X | X |
| datapipeline | X | X |
| datapipelineenvironments | X | X |
| dataset | X | X |
| dataset_bucket | X | X |
| dataset_profiling_run | X | X |
| dataset_quality_rule | X | |
| dataset_storage_location | X | X |
| dataset_table | X | X |
| dataset_table_column | X | X |
| dataset_table_profiling_job | X | |
| environment | X | X |
| environment_group_permission | X | X |
| environment_parameters | X | X |
| feed_message | X | X |
| glossary_node | X | X |
| group | X | X |
| group_member | X | |
| item_tags | X | |
| keyvaluetag | X | X |
| notification | X | X |
| organization | X | X |
| organization_group | X | X |
| permission | X | X |
| redshiftcluster | X | |
| redshiftcluster_dataset | X | |
| redshiftcluster_datasettable | X | |
| resource_policy | X | X |
| resource_policy_permission | X | X |
| sagemaker_notebook | X | X |
| sagemaker_studio_domain | X | X |
| sagemaker_studio_user_profile | X | X |
| share_object | X | X |
| share_object_item | X | X |
| stack | X | X |
| tag | X | |
| task | X | X |
| tenant | X | X |
| tenant_policy | X | X |
| tenant_policy_permission | X | X |
| term_link | X | X |
| user | X | |
| vote | X | X |
| vpc | X | X |
| worksheet | X | X |
| worksheet_query_result | X | X |
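The gap in the table above can be expressed as a set difference. A small sketch (the shared-table list is abbreviated for brevity; in the full comparison there are 39 shared tables):

```python
# Tables marked "In RDS" but not "In dkrcompose" in the comparison above.
rds_only = {
    "alembic_version", "dataset_quality_rule", "dataset_table_profiling_job",
    "group_member", "item_tags", "redshiftcluster", "redshiftcluster_dataset",
    "redshiftcluster_datasettable", "tag", "user",
}
shared = {"activity", "permission", "vpc"}  # abbreviated; 39 shared tables in full

rds_tables = rds_only | shared
local_tables = shared
missing_locally = sorted(rds_tables - local_tables)
print(len(missing_locally))  # 10 tables exist in RDS but not locally
```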

3. More documentation [In separate PR]

https://awslabs.github.io/aws-dataall/deploy-locally/ should be updated to explain the alembic migrations: how to run them to bring the database to the latest state, and how to add new ones. --> Fully agree. We can tackle this in a separate PR to the GitHub pages.

4. Tests [Done]

This is already implemented in the db-migration tests in the quality gate: `deploy/stacks/pipeline.py:464`.

dlpzx (Contributor) commented Jan 29, 2024

Updates from offline discussion

Action points for this issue:

  • Design the best way to develop migration scripts locally
  • Update documentation (README or GitHub pages) and add a piece on how to develop migration scripts locally
  • Clean up RDS tables that are no longer needed

@dlpzx dlpzx removed their assignment Jan 29, 2024
dlpzx pushed a commit that referenced this issue Feb 23, 2024
### Feature or Bugfix
- Bugfix
- Refactoring

### Detail
1. Alembic migrations are now synchronised with the current code-state
2. `make generate-migrations` command is added. It can be applied right
after the docker container with psql is started
3. Readme about migrations

### Relates
[data.all in local environment starts in a broken db state
#788](#788)

### Security
N/A


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

---------

Co-authored-by: Sofia Sazonova <[email protected]>
@noah-paige noah-paige linked a pull request Feb 23, 2024 that will close this issue
noah-paige (Contributor) commented:

Handled the above fix in #1033. Closing this issue; please re-open if you have any additional comments or concerns.
