
Add staging environment #3666

Merged
merged 3 commits into from
Jun 12, 2024
Conversation

jdangerx
Member

Overview

Relates to catalyst-cooperative/pudl-usage-metrics#128

I wanted to set up nginx to get actual IPs from people.

But that seemed risky to do on the live production app, so I wanted a staging environment where we can actually test changes to the application setup before they go live on production.

And also, deploying the whole damn database is quite slow, so I wanted a nice way to deploy only a small SQLite file while I'm futzing with the infrastructure. I had been commenting things out but decided it was time to make a proper CLI option.

So basically it's a yak-shave. But not a super long one.

Testing

Deployed just ferc1/ferc2 xbrls to staging:

$ python publish.py --staging --only ferc1_xbrl.sqlite --only ferc2_xbrl.sqlite 

Also ran with the full database list, but passed --build-only since the staging environment isn't sized to handle all the DBs:

$ python publish.py --staging --build-only

And saw that the Dockerfile included the right databases:

ENV DATABASES '/data/pudl.sqlite /data/ferc1_dbf.sqlite /data/ferc1_xbrl.sqlite /data/ferc2_dbf.sqlite /data/ferc2_xbrl.sqlite /data/ferc60_dbf.sqlite /data/ferc60_xbrl.sqlite /data/ferc6_dbf.sqlite /data/ferc6_xbrl.sqlite /data/ferc714_xbrl.sqlite /data/censusdp1tract.sqlite'

So that seems fine!
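For context, a minimal sketch of how a repeatable `--only` option could narrow the database list before deploy (the function and variable names here are illustrative, not the PR's actual code):

```python
def select_databases(all_databases: list[str], only: tuple[str, ...]) -> list[str]:
    """Return the full list unless --only was passed at least once.

    `only` mirrors a repeatable click option (multiple=True); an empty
    tuple means "deploy everything".
    """
    if not only:
        return list(all_databases)
    # Preserve the original ordering while dropping unselected DBs.
    return [db for db in all_databases if db in only]
```

So `select_databases(dbs, ("ferc1_xbrl.sqlite", "ferc2_xbrl.sqlite"))` would keep just the two XBRL databases deployed above.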

To-do list


Nobody wants to accidentally deploy to prod.
@@ -4,7 +4,8 @@ set -eux
 shopt -s nullglob

 find /data/ -name '*.sqlite' -delete
+ls
Member Author

This is helpful for debugging and I figured I'd leave it in.

mv all_dbs.tar.zst /data
zstd -f -d /data/all_dbs.tar.zst -o /data/all_dbs.tar
tar -xf /data/all_dbs.tar --directory /data
-datasette serve --host 0.0.0.0 /data/pudl.sqlite /data/ferc*.sqlite /data/censusdp1tract.sqlite --cors --inspect-file inspect-data.json --metadata metadata.yml --setting sql_time_limit_ms 5000 --port $PORT
+datasette serve --host 0.0.0.0 ${DATABASES} --cors --inspect-file inspect-data.json --metadata metadata.yml --setting sql_time_limit_ms 5000 --port $PORT
Member Author

This lets us configure the databases & their ordering via env var in Python. Sort of janky, but 🤷
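A sketch of how that ENV line could be rendered from Python at build time (the function name and the `/data` prefix are assumptions, inferred from the Dockerfile output shown earlier):

```python
def databases_env_line(databases: list[str], data_dir: str = "/data") -> str:
    """Render a Dockerfile ENV line that run scripts expand as ${DATABASES}.

    The list order is preserved, which is what controls the order in
    which Datasette lists the databases.
    """
    paths = " ".join(f"{data_dir}/{db}" for db in databases)
    return f"ENV DATABASES '{paths}'"
```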

@@ -27,7 +27,8 @@

import click

-from pudl.helpers import check_tables_have_metadata, create_datasette_metadata_yaml
+from pudl.helpers import check_tables_have_metadata
 from pudl.metadata.classes import DatasetteMetadata
Member Author

@e-belfer Turns out the create_datasette_metadata_yaml function that was using this import was:

  • only used in one place
  • 3 lines long

So I decided to inline it in the one place it's used. One fewer circular import to deal with!

Member

There's a reference to it in test/integration/datasette_metadata_test.py (about how we're not using it there) which should get updated/removed.

Member Author

Ah, you mean in the comment? I can remove that :)

@@ -85,11 +90,17 @@ def inspect_data(datasets: list[str], pudl_output: Path) -> str:

 @click.command(context_settings={"help_option_names": ["-h", "--help"]})
 @click.option(
-    "--fly",
     "-f",
+    "--production",
Member Author

Changed --fly to --production now that we have multiple fly environments.

     "deploy",
     flag_value="fly",
     help="Deploy Datasette to fly.io.",
-    default=True,
Member Author

Got rid of the default here to make it harder to accidentally deploy to production.


processes = ["app"]

[[vm]]
memory = "1gb"
Member

I guess 1GB is enough since it's not going to load the whole SQLite DBs into memory, but each of the DBF- and XBRL-derived Form 1 DBs is close to 1GB (which is kind of wild, since one of them covers 27 years and the other covers 2 years).

Member Author

It's easy enough to change if necessary, but I figured most of the staging stuff is gonna be with small subsets of the data anyways.

Member Author

Slash, it totally worked for my weird little nginx journey today.

@jdangerx jdangerx enabled auto-merge June 12, 2024 20:49
@jdangerx jdangerx added this pull request to the merge queue Jun 12, 2024
Merged via the queue into main with commit 5d04952 Jun 12, 2024
12 checks passed
@jdangerx jdangerx deleted the 3664-fly-staging-environment branch June 12, 2024 22:14