data.all in local environment starts in a broken db state #788
Comments
Hi @zsaltys, thanks for opening the issue. I am trying to reproduce the problem.
… issues (#860)

### Feature or Bugfix
- Feature
- Bugfix

### Detail
- Add all applicable GitHub workflows to PRs pointing at `v2m*` branches
- Fix the semgrep finding raised by the GitHub workflows on the migration script for notification types --> added `nosemgrep`, as no user input is passed to the SQL query and only code administrators will have access to the query.
- Fix migration validation: this one is tricky, as it succeeds when running locally and on a real pipeline. It turns out that the issue was not in the migration script itself but in the way we dropped and updated tables in the validation migration stage. When dropping tables, we were using a different schema than the one used in the database upgrade. This PR removes the `schema_name` variable and uses `envname` as the schema in all cases (a minimal sketch of this idea follows after this description).

One final note: this issue might be related to #788. Here are screenshots of the resulting local schema for the notification table after running `make drop-tables` and `make upgrade-db`:

<img width="962" alt="image" src="https://github.com/awslabs/aws-dataall/assets/71252798/0d020d7b-915c-436f-a767-8290d0ac3480">

### Relates
- V2.1 release

### Security
Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)?
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization?
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
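To make the schema point above concrete, here is a minimal sketch, assuming SQLAlchemy and an environment-provided configuration, of dropping tables against the same schema that the upgrade step uses. The connection URL and variable names are illustrative assumptions, not data.all's actual code.

```python
import os

from sqlalchemy import MetaData, create_engine

# Hedged sketch: drop tables against the SAME schema ("envname") that the
# upgrade step uses, instead of a separate schema_name. The URL and env var
# names below are assumptions for illustration only.
envname = os.environ.get("envname", "local")
engine = create_engine(
    os.environ.get("DATABASE_URL", "postgresql://postgres:docker@localhost:5432/dataall")
)

metadata = MetaData(schema=envname)
metadata.reflect(bind=engine)   # load the existing tables from that schema
metadata.drop_all(bind=engine)  # drop them in dependency order
```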
I am picking up this issue again with the current version of the code (2.2).
1. Start local dockerized application [Needs-more-info]
   This means that, @zsaltys, I do not need to run alembic when starting the docker containers. There might be mismatches compared with the alembic migration tables, but I cannot see that any data.all functionalities are affected.
2. Drop and rerun all migrations [Investigation in progress]
   In the local environment it is not necessary to work with alembic. I would only run alembic locally to test migration scripts that need to be added directly with alembic commands, similar to the scripts executed in the CodeBuild test migrations and in the actual db-migration stage (a minimal local alembic invocation is sketched after this list). Having said that, I think we need to do a clean-up exercise and make sure that local development starts off with all the tables and permissions needed.
3. More documentation [In separate PR]
   https://awslabs.github.io/aws-dataall/deploy-locally/ should be updated to explain the alembic migrations, how to run them to reach the latest state, and how to add new ones. --> Fully agree. We can tackle this in a separate PR to the GitHub pages.
4. Tests [Done]
   It is already implemented in the db-migration tests in the quality gate.
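As referenced in point 2, a minimal sketch of running the alembic migrations locally through alembic's Python API; the `alembic.ini` path and the connection URL are assumptions for illustration, not the project's confirmed configuration.

```python
# Minimal sketch, assuming the repo ships an alembic.ini and a local dockerized
# Postgres; the URL below is illustrative, not data.all's actual config.
from alembic import command
from alembic.config import Config

cfg = Config("alembic.ini")
cfg.set_main_option("sqlalchemy.url", "postgresql://postgres:docker@localhost:5432/dataall")

command.upgrade(cfg, "head")  # apply all migrations up to the latest revision
```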
Updates from offline discussion
Action points for this issue:
### Feature or Bugfix
- Bugfix
- Refactoring

### Detail
1. Alembic migrations are now synchronised with the current code state
2. `make generate-migrations` command is added. It can be applied right after the docker container with psql is started (a sketch of the underlying alembic workflow follows below)
3. Readme about migrations

### Relates
[data.all in local environment starts in a broken db state #788](#788)

### Security
N/A

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

---------

Co-authored-by: Sofia Sazonova <[email protected]>
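For context on point 2 above, the sketch below shows roughly what generating a new migration with alembic's autogenerate looks like through the Python API. This is a hedged illustration of the general alembic workflow, not the contents of the actual `make generate-migrations` target.

```python
# Hedged illustration of alembic autogeneration, assuming an alembic.ini at the
# repo root and a running local database; not the actual make target.
from alembic import command
from alembic.config import Config

cfg = Config("alembic.ini")
command.revision(cfg, message="describe your schema change", autogenerate=True)
command.upgrade(cfg, "head")  # apply the newly generated revision
```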
Handled the above fix in #1033 - closing this issue, please re-open if there are any additional comments / concerns.
Describe the bug
With the current 2.0 branch of data.all, starting data.all in a local dev environment leaves half of the tables missing. This is the first issue, and it requires running the alembic migrations to fix.
Upon running the alembic migrations, we find that they fail with the error:
coming from:
We can then attempt to drop all the tables with:
```
python backend/migrations/drop_tables.py
```
and run the migrations again, but then data.all won't start because the permission table will be missing permissions such as MANAGE_GLOSSARY and MANAGE_DATASETS.
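A quick way to observe this symptom is to inspect the permission table after re-running the migrations. The table and column names in this sketch are assumptions for illustration, not necessarily data.all's exact schema.

```python
# Hypothetical check for the missing-permissions symptom; the table/column names
# and the connection URL are assumptions, not data.all's confirmed schema.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres:docker@localhost:5432/dataall")
expected = {"MANAGE_GLOSSARY", "MANAGE_DATASETS"}

with engine.connect() as conn:
    rows = conn.execute(text("SELECT name FROM permission")).fetchall()

present = {row[0] for row in rows}
print("missing permissions:", expected - present or "none")
```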
How to Reproduce
Start data.all in a dev environment from scratch and attempt to upgrade the local database with the latest migrations.
Expected behavior
Your project
No response
Screenshots
No response
OS
NA
Python version
NA
AWS data.all version
2.0
Additional context
No response