dgen docker and packer support #54
base: master
Commits: 5cd981c, a9bb8fe, 75c39bb, e17b1bc, 4219b3a, db74d98, c0f7b73, 85e9631, 3772474, 1622180, b6df9cd, 20f8eff, 05b1c6d, 3018c09, d2386ef, f3399f1, ac26e8f, bc6e07a

@@ -0,0 +1,144 @@
# dGen Docker Usage Guide

By default, the dgen container uses the [Delaware residential dataset](https://oedi-data-lake.s3.amazonaws.com/dgen/de_final_db/dgen_db.sql).

You can customize the dataset; see the [Customizing the dataset](#customizing-the-dataset) section below.

Review comment: Overall clarity here. But wondering your thoughts on changing the section name to "Customizing the Model" instead of "Dataset".

### Mac and Linux quick start

This quick start uses docker-compose to run dgen. By default, dgen data files and Excel configurations are stored in `~/dgen_data/`. This directory is shared with your running containers. You can change the path, but you will then need to edit `docker-compose.yml` so that its volume mappings point to the data directory of your choice.

The prerequisites assume you are using a Mac and have already installed [Docker Desktop](https://docs.docker.com/desktop/setup/install/mac-install/).

##### Create the data directory

```bash
$ mkdir -p ~/dgen_data/
$ chmod 755 ~/dgen_data/
$ ls -l ~/dgen_data/ # It's expected to be empty; after starting dgen, you will see data files in this location.
```

##### Start the dgen containers

```bash
$ cd dgen/docker/
$ docker-compose up --build -d
[+] Running 2/2
 ✔ Container dgen_1 Started 0.1s
 ✔ Container postgis_1 Started 0.0s
```

##### Connect to the running containers

```bash
$ docker attach dgen_1 # Attach to the dgen container
$ docker attach $(docker ps --filter "name=dgen" --format "{{.ID}}") # If dgen_1 is not found
(dg3n) dgen@cc6e2e5f70b5:/opt/dgen_os/python$ python dgen_model.py # Run scenario
(dg3n) dgen@cc6e2e5f70b5:/opt/dgen_os/python$ exit # Exit the shell (this also stops the container)
$ docker-compose up -d # After exiting, bring the container back up before re-attaching
```

Review comment: Would it be better to remove the word "environment" from the comment to avoid any confusion with the Conda environment? Fine with whichever you think is better!

### Troubleshooting common issues

#### psycopg2.OperationalError: connection to server

On first start, the postgis container downloads the dataset and loads it into PostgreSQL before the database accepts connections. Wait 5-10 minutes for the database to finish starting, then try again.

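If you would rather not guess at the wait time, a small retry loop can poll the database for you. The sketch below defines a hypothetical `wait_until` helper (not part of dgen) that re-runs a command until it succeeds or a retry limit is reached. Pairing it with `pg_isready`, which ships with the PostgreSQL server image, is an assumption about how you want to check readiness: passing `-h 127.0.0.1` forces a TCP check, and our understanding is that the official postgres entrypoint only listens on TCP after its `/docker-entrypoint-initdb.d` scripts (including the dataset load) have finished.

```bash
# wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits 0, sleeping
# DELAY_SECONDS between attempts. Returns 1 after MAX_TRIES failed attempts.
wait_until() {
  local max_tries=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_tries" ]; then
      echo "Gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, from the host, `wait_until 60 10 docker exec postgis_1 pg_isready -h 127.0.0.1 -U postgres` polls every 10 seconds for up to 10 minutes. You can also follow `docker logs -f postgis_1` and look for the `Database initialization complete!` message that the postgis init script prints after loading the data.
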
#### General errors and issues

Try clearing your `~/dgen_data` directory and starting over. Allow enough time for the datasets to fully download on the second attempt. Note that `docker system prune -a` and `docker volume prune -f` affect all Docker resources on your machine, not just dgen.

```bash
$ docker-compose down
$ rm -f ~/dgen_data/*
$ docker system prune -a
$ docker volume prune -f
```

### Disabling auto-start for the dgen virtual environment

By default, starting the `dgen` container automatically activates the dgen conda environment (`dg3n`). For ease of use, it's recommended to leave this default in place.

Review comment: Additionally, would switching "logging into" to "starting" make more sense here?

To disable this behavior, edit the `docker-compose.yml` file in this directory and set the following environment variable:

```yaml
services:
  dgen:
    environment:
      DGEN_DISABLE_AUTO_START: 1
```

### Customizing the dataset

By default, the dgen container uses the [Delaware residential dataset](https://oedi-data-lake.s3.amazonaws.com/dgen/de_final_db/dgen_db.sql).

You can find more datasets using the links below:

- [dGen Dataset Submissions on OpenEI](https://data.openei.org/submissions/1931)
- [dGen Dataset S3 Viewer](https://data.openei.org/s3_viewer?bucket=oedi-data-lake&prefix=dgen%2F)

You can customize the dataset by overriding the `DGEN_DATAFILE_URL` and `DGEN_AGENTFILE_URL` variables in `docker-compose.yml` and then editing `~/dgen_data/input_sheet_final.xlsm` in Excel.

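Instead of editing `docker-compose.yml` directly, you could keep dataset settings in a separate file. Docker Compose reads a `docker-compose.override.yml` placed next to `docker-compose.yml` by default and merges its `environment` entries over the base file. The `write_dataset_override` function below is a hypothetical convenience helper, not part of dgen; it takes the SQL and agent-file URLs as arguments rather than guessing them, because the agent file names differ from dataset to dataset.

```bash
# write_dataset_override DATAFILE_URL AGENTFILE_URL [FORCE_DELETE] [OUTPUT]
# Hypothetical helper: writes a Compose override file that points the
# postgis service at another dataset. FORCE_DELETE defaults to 0 and
# OUTPUT defaults to docker-compose.override.yml in the current directory.
write_dataset_override() {
  local datafile_url=$1 agentfile_url=$2
  local force_delete=${3:-0}
  local output=${4:-docker-compose.override.yml}
  # The heredoc body is written verbatim, so its indentation is the YAML indentation.
  cat > "$output" <<EOF
services:
  postgis:
    environment:
      DGEN_DATAFILE_URL: ${datafile_url}
      DGEN_AGENTFILE_URL: ${agentfile_url}
      DGEN_FORCE_DELETE_DATABASE: ${force_delete}
EOF
  echo "Wrote ${output}"
}
```

For the Colorado walkthrough that follows, you would run `write_dataset_override https://oedi-data-lake.s3.amazonaws.com/dgen/co_final_db/dgen_db.sql https://oedi-data-lake.s3.amazonaws.com/dgen/co_final_db/agent_df_base_res_co_revised.pkl 1` from `dgen/docker/` and check the merged result with `docker-compose config`. Deleting the override file returns you to the defaults in `docker-compose.yml`.
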
The steps below walk through switching to the [Colorado residential dataset](https://oedi-data-lake.s3.amazonaws.com/dgen/co_final_db/dgen_db.sql).

Review comment: Switched the hyperlink to just the landing page of the CO files so the added instructions below will make more sense to the users

Update `docker-compose.yml` to use the `co_final_db` SQL download, and set the variable that forces the database to be deleted. This deletes the data from previous runs; if that is a concern, back up your data before proceeding with the steps below.

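A minimal backup sketch, assuming the default `~/dgen_data` location: the hypothetical `backup_dgen_data` function archives the shared data directory (the downloaded files and your `input_sheet_final.xlsm`) into a timestamped tarball. The `~/dgen_backups` default destination is an arbitrary choice for this example. Data stored in PostgreSQL lives in the postgis container rather than in `~/dgen_data`, so it needs a separate dump.

```bash
# backup_dgen_data [DATA_DIR] [BACKUP_DIR]
# Hypothetical helper: archives DATA_DIR (default ~/dgen_data) into a
# timestamped .tar.gz inside BACKUP_DIR (default ~/dgen_backups) and
# prints the path of the archive it created.
backup_dgen_data() {
  local data_dir=${1:-$HOME/dgen_data}
  local backup_dir=${2:-$HOME/dgen_backups}
  local stamp archive
  stamp=$(date +%Y%m%d-%H%M%S)
  archive="${backup_dir}/dgen_data-${stamp}.tar.gz"
  mkdir -p "$backup_dir"
  # -C switches to the parent directory so the archive stores a relative path.
  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
  echo "$archive"
}
```

With the containers running, `docker exec postgis_1 pg_dump -U postgres -Fc dgen_db > ~/dgen_backups/dgen_db.dump` writes a custom-format dump of the `dgen_db` database that `pg_restore` can load later.
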
```yaml
services:
  postgis:
    environment:
      DGEN_DATAFILE_URL: https://oedi-data-lake.s3.amazonaws.com/dgen/co_final_db/dgen_db.sql
      DGEN_AGENTFILE_URL: https://oedi-data-lake.s3.amazonaws.com/dgen/co_final_db/agent_df_base_res_co_revised.pkl
      DGEN_FORCE_DELETE_DATABASE: 1 # Clear all the data in the database to reload the Colorado dataset. Warning: this removes your existing data.
```

Open `~/dgen_data/input_sheet_final.xlsm` in Excel with macros enabled, set Region to Analyze to `Colorado` and Markets to `Only Residential`, then click Save Scenario.

Review comment: Spelling and clarity

Restart your containers with the options above. This removes all your existing data and downloads the new Colorado dataset.

```bash
$ docker-compose down
[+] Running 3/3
 ✔ Container dgen_1 Removed 9.2s
 ✔ Container postgis_1 Removed 0.1s

$ docker-compose up -d
[+] Running 2/2
 ✔ Container dgen_1 Started 0.1s
 ✔ Container postgis_1 Started 0.2s
```

After the Colorado dataset has loaded, set `DGEN_FORCE_DELETE_DATABASE` back to `0` to prevent accidental data loss in the future.

```yaml
services:
  postgis:
    environment:
      DGEN_FORCE_DELETE_DATABASE: 0
```

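To catch a forgotten `DGEN_FORCE_DELETE_DATABASE: 1` before running `docker-compose up`, you could add a quick check like the hypothetical `check_force_delete` function below. It is a plain text match on the `KEY: value` mapping style used in this guide, not a YAML parser, so it will not notice the flag if it is set another way (for example in an override file or as a list-style `KEY=value` entry).

```bash
# check_force_delete [COMPOSE_FILE]
# Hypothetical helper: returns 1 with a warning if COMPOSE_FILE still sets
# DGEN_FORCE_DELETE_DATABASE to 1, returns 2 if the file is missing, and
# returns 0 otherwise.
check_force_delete() {
  local compose_file=${1:-docker-compose.yml}
  if [ ! -f "$compose_file" ]; then
    echo "Not found: ${compose_file}" >&2
    return 2
  fi
  # Matches "DGEN_FORCE_DELETE_DATABASE: 1", optionally quoted or followed by a comment.
  if grep -Eq '^[[:space:]]*DGEN_FORCE_DELETE_DATABASE:[[:space:]]*"?1"?([[:space:]]|#|$)' "$compose_file"; then
    echo "WARNING: ${compose_file} still sets DGEN_FORCE_DELETE_DATABASE to 1" >&2
    return 1
  fi
  return 0
}
```

Running `check_force_delete docker-compose.yml && docker-compose up -d` only starts the containers when the flag is not set to 1.
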
You can now attach to the dgen container and monitor the data download. This may take 5-10 minutes depending on your internet speed; if the file size is still increasing, the download is still in progress.

```bash
$ docker attach dgen_1
(dg3n) dgen@cc6e2e5f70b5:/opt/dgen_os/python$ ls -lh /data/dgen_db.sql
-rw-r--r-- 1 dgen dgen 705M Jan 29 2025 /data/dgen_db.sql
(dg3n) dgen@cc6e2e5f70b5:/opt/dgen_os/python$ python dgen_model.py # Run scenario
(dg3n) dgen@cc6e2e5f70b5:/opt/dgen_os/python$ exit # Exit the shell
```

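Rather than re-running `ls -lh` by hand, you could poll the file size from the host, where the same file appears as `~/dgen_data/dgen_db.sql` through the shared volume. The helpers below are a hypothetical sketch: `wait_for_stable_size` prints the size at each interval and returns once two consecutive readings match. A stable size only suggests that the download has finished (a stalled connection looks the same), and the postgis container still has to load the SQL file into the database afterwards.

```bash
# file_size FILE: prints the size of FILE in bytes, or 0 if it does not exist yet.
file_size() {
  if [ -f "$1" ]; then
    # wc -c works on macOS and Linux; $(( )) strips the padding spaces macOS adds.
    echo "$(( $(wc -c < "$1") ))"
  else
    echo 0
  fi
}

# wait_for_stable_size FILE [INTERVAL_SECONDS]
# Hypothetical helper: prints the size of FILE every INTERVAL_SECONDS
# (default 30) and returns once two consecutive readings are equal and non-zero.
wait_for_stable_size() {
  local file=$1 interval=${2:-30} previous=-1 current
  while :; do
    current=$(file_size "$file")
    echo "$(date +%H:%M:%S) ${file}: ${current} bytes"
    if [ "$current" -gt 0 ] && [ "$current" -eq "$previous" ]; then
      return 0
    fi
    previous=$current
    sleep "$interval"
  done
}
```

For example, `wait_for_stable_size ~/dgen_data/dgen_db.sql 30` checks every 30 seconds; the interval is an arbitrary choice.
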
### Stop running containers

```bash
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
259c30e6b518 docker-postgis "docker-entrypoint.s…" 12 minutes ago Up 12 minutes 0.0.0.0:5432->5432/tcp postgis_1
a775696276eb docker-dgen "bash --login" 12 minutes ago Up 4 seconds dgen_1

$ docker-compose down
[+] Running 3/3
 ✔ Container dgen_1 Removed 10.1s
 ✔ Container postgis_1 Removed 0.1s
```

Review comment: For this block, would it be better/possible to separate the commands from the returned statements?

### Warning: This will remove stopped containers, unused images, and unused data volumes

This may be required if you need to free up disk space.

```bash
$ docker system prune -a
$ docker volume prune -f
```

@@ -0,0 +1,39 @@
FROM continuumio/miniconda3

# Setup dgen user
RUN groupadd --gid 999 dgen && useradd --uid 999 --gid dgen --create-home dgen

# Setup Data directory
RUN mkdir -p /data && chmod 755 /data

# Copy dgen files and setup permissions
COPY ./dgen_os/ /opt/dgen_os/
RUN chown -R dgen: /opt/dgen_os /data

# Install dgen
RUN conda env create -f /opt/dgen_os/python/dg3n.yml

# Setup Init script
COPY docker/dgen/init.sh /docker-entrypoint-initdb.d/init-dgen.sh
RUN chmod +x /docker-entrypoint-initdb.d/init-dgen.sh

# Activate the dg3n conda environment and run the init script on login,
# unless DGEN_DISABLE_AUTO_START is set to 1. The quoted 'EOF' delimiter keeps
# ${DGEN_DISABLE_AUTO_START} from being expanded at build time (when it is unset).
RUN cat <<'EOF' >> ~dgen/.bashrc
if [[ "${DGEN_DISABLE_AUTO_START:-0}" -eq 0 ]]; then
    conda activate dg3n
    cd /opt/dgen_os/python/
    /docker-entrypoint-initdb.d/init-dgen.sh
fi
EOF

# Change ownership of the bashrc file
RUN chown dgen: ~dgen/.bashrc

# Setup default input_sheet_final.xlsm (Delaware residential)
COPY docker/dgen/input_sheet_final.xlsm /opt/dgen_os/excel/input_sheet_final.xlsm
RUN chmod 755 /opt/dgen_os/excel/input_sheet_final.xlsm && chown dgen: /opt/dgen_os/excel/input_sheet_final.xlsm

# Switch to non-root user
USER dgen

CMD ["bash", "--login"]

@@ -0,0 +1,23 @@
#!/bin/bash
set -e

DB_AGENT_FILE="${DGEN_AGENTFILE:-/data/agent_df_base_revised.pkl}"
FORCE_DELETE_DATABASE=${DGEN_FORCE_DELETE_DATABASE:-0}

# Point the database connection parameters at DATABASE_HOSTNAME if it is set
if [ -n "${DATABASE_HOSTNAME}" ]; then
    sed -i "s/127\.0\.0\.1/${DATABASE_HOSTNAME}/g" /opt/dgen_os/python/pg_params_connect.json
fi

# Setup Default Input Scenarios
if [[ ! -f /data/input_sheet_final.xlsm ]]; then
    cp /opt/dgen_os/excel/input_sheet_final.xlsm /data/input_sheet_final.xlsm
fi

# Setup Input Scenarios
rm -f /opt/dgen_os/input_scenarios/*
ln -s /data/input_sheet_final.xlsm /opt/dgen_os/input_scenarios/input_sheet_final.xlsm

# Setup Input Agent
rm -f /opt/dgen_os/input_agents/*
ln -s "${DB_AGENT_FILE}" "/opt/dgen_os/input_agents/$(basename "${DB_AGENT_FILE}")"

@@ -0,0 +1,31 @@
services:
  dgen:
    build:
      context: ../
      dockerfile: docker/dgen/Dockerfile
    stdin_open: true
    tty: true
    container_name: dgen_1
    volumes:
      - ~/dgen_data/:/data
    environment:
      DATABASE_HOSTNAME: postgis
      DGEN_DB_USER: postgres
      DGEN_DB_NAME: dgen_db
      DGEN_DISABLE_AUTO_START: 0 # Set to 1 to disable dropping into a dgen shell
  postgis:
    build:
      context: ../
      dockerfile: docker/postgis/Dockerfile
    container_name: postgis_1
    ports:
      - "127.0.0.1:5432:5432"
    volumes:
      - ~/dgen_data/:/data
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      DGEN_DATAFILE_URL: https://oedi-data-lake.s3.amazonaws.com/dgen/de_final_db/dgen_db.sql
      DGEN_AGENTFILE_URL: https://oedi-data-lake.s3.amazonaws.com/dgen/de_final_db/agent_df_base_res_de_revised.pkl
      DGEN_FORCE_DELETE_DATABASE: 0 # Set to 1 to drop the database and clear all the data for a fresh load
    restart: unless-stopped

@@ -0,0 +1,20 @@
FROM postgis/postgis:11-3.3

# Setup Data
RUN mkdir -p /data && chmod 755 /data && chown -R postgres: /data/

# Install curl for downloading the data file, then clean up the apt lists
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Setup Init script
COPY docker/postgis/init.sh /docker-entrypoint-initdb.d/init-dgen-pg.sh
RUN chmod +x /docker-entrypoint-initdb.d/init-dgen-pg.sh && chown postgres: /docker-entrypoint-initdb.d/init-dgen-pg.sh

# Switch to Postgres user
USER postgres

# Expose PostgreSQL port
EXPOSE 5432

# Keep the base image's entrypoint and start the PostgreSQL server
CMD ["postgres"]

@@ -0,0 +1,48 @@
#!/bin/bash
set -e

DB_USER="${DGEN_DB_USER:-postgres}"
DB_NAME="${DGEN_DB_NAME:-dgen_db}"
DB_SQL_FILE="${DGEN_DATAFILE:-/data/dgen_db.sql}"
DB_SQL_FILE_URL="${DGEN_DATAFILE_URL:-https://oedi-data-lake.s3.amazonaws.com/dgen/de_final_db/dgen_db.sql}"
DB_AGENT_FILE="${DGEN_AGENTFILE:-/data/agent_df_base_revised.pkl}"
DB_AGENT_FILE_URL="${DGEN_AGENTFILE_URL:-https://oedi-data-lake.s3.amazonaws.com/dgen/de_final_db/agent_df_base_res_de_revised.pkl}"
FORCE_DELETE_DATABASE=${DGEN_FORCE_DELETE_DATABASE:-0}

# Clear database if FORCE_DELETE_DATABASE is enabled
if [[ ${FORCE_DELETE_DATABASE} -eq 1 ]]; then
    echo "DGEN_FORCE_DELETE_DATABASE is set to 1. Dropping database '${DB_NAME}' if it exists..."
    psql -U "${DB_USER}" -c "DROP DATABASE IF EXISTS ${DB_NAME};"
    echo "Database '${DB_NAME}' dropped."
    rm -f "${DB_SQL_FILE}"
    echo "Datafile '${DB_SQL_FILE}' removed."
    rm -f "${DB_AGENT_FILE}"
    echo "Datafile '${DB_AGENT_FILE}' removed."
fi

# Download the agent file if it does not already exist.
# -f makes curl fail on HTTP errors and -L follows redirects; a failed
# download is deleted so a partial file is not reused on the next start.
if [[ ! -f "${DB_AGENT_FILE}" ]]; then
    echo "Downloading agent file..."
    curl -fL -o "${DB_AGENT_FILE}" "${DB_AGENT_FILE_URL}" || { rm -f "${DB_AGENT_FILE}"; exit 1; }
fi

# Download the SQL data file if it does not already exist
if [[ ! -f "${DB_SQL_FILE}" ]]; then
    echo "Downloading data file..."
    curl -fL -o "${DB_SQL_FILE}" "${DB_SQL_FILE_URL}" || { rm -f "${DB_SQL_FILE}"; exit 1; }
fi

# Check if the database already exists
if psql -U "${DB_USER}" -tc "SELECT 1 FROM pg_database WHERE datname = '${DB_NAME}';" | grep -q 1; then
    echo "Database '${DB_NAME}' already exists, skipping initialization..."
else
    # Create the database
    echo "Creating database ${DB_NAME}..."
    psql -U "${DB_USER}" -c "CREATE DATABASE ${DB_NAME};"

    # Load the dataset into the database
    echo "Loading data into ${DB_NAME}..."
    psql -U "${DB_USER}" -d "${DB_NAME}" -f "${DB_SQL_FILE}"
    echo "Database initialization complete!"
fi

@@ -0,0 +1,71 @@
# dGen Packer AMI Usage Guide

This guide explains how to use the dGen AWS AMI and how to use Packer to build your own AWS AMI.

## dGen AMI Usage

#### Getting Started

Launch an EC2 instance in AWS using the AMI built by Packer, then SSH to the instance. By default, you will be dropped into a dgen shell.

```bash
$ ssh -i <your_ssh_key> ubuntu@<your_server_ip>
ubuntu@ip-1-2-3-4:~/dgen/docker$ source ~ubuntu/dgen_start.sh
(dg3n) dgen@0b702cabc2ce:/opt/dgen_os/python$ python dgen_model.py
```

The first time you run `~ubuntu/dgen_start.sh`, dgen builds the Docker images and downloads the default dataset. This may take 10-15 minutes depending on your network connection.

#### Using a new dataset

Edit the Docker Compose file `/home/ubuntu/dgen/docker/docker-compose.yml`. See the "Customizing the dataset" section of the [dgen Docker Usage Guide](../docker/README.md#customizing-the-dataset).

One challenge when using an EC2 instance is editing `/data/input_sheet_final.xlsm`: you must copy the file to a system with Excel, edit it there, and then copy it back to the instance.

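A minimal sketch of that copy round trip using `scp`, assuming the containers on the instance run as the `ubuntu` user with the default volume mapping from `docker-compose.yml`, so that the container's `/data/input_sheet_final.xlsm` is `/home/ubuntu/dgen_data/input_sheet_final.xlsm` on the instance. That path and the `sync_input_sheet` function are assumptions for illustration, not part of the AMI; adjust them if your setup differs. Setting `DRY_RUN=1` prints the `scp` command instead of running it.

```bash
# sync_input_sheet pull|push SSH_KEY HOST [LOCAL_FILE]
# Hypothetical helper: copies the scenario workbook between an EC2 instance
# and this machine with scp. Assumes the workbook lives at
# /home/ubuntu/dgen_data/input_sheet_final.xlsm on the instance (the host
# side of the container's /data volume). Set DRY_RUN=1 to only print the command.
sync_input_sheet() {
  local direction=$1 key=$2 host=$3
  local local_file=${4:-./input_sheet_final.xlsm}
  local remote="ubuntu@${host}:/home/ubuntu/dgen_data/input_sheet_final.xlsm"
  case "$direction" in
    pull) set -- scp -i "$key" "$remote" "$local_file" ;;
    push) set -- scp -i "$key" "$local_file" "$remote" ;;
    *)
      echo "usage: sync_input_sheet pull|push SSH_KEY HOST [LOCAL_FILE]" >&2
      return 2
      ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `sync_input_sheet pull <your_ssh_key> <your_server_ip>` downloads the workbook to the current directory; after editing and saving it in Excel, `sync_input_sheet push <your_ssh_key> <your_server_ip>` uploads it again.
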
#### Warning: This will remove old running containers and data volumes. This may be required if you need space.

You can completely remove all dgen data for a fresh start with the script below. This deletes your dgen data.

```bash
$ ~/dgen_prune_all_data.sh
```

## Building an AWS AMI with Packer

#### Prerequisites

- [Packer](https://www.packer.io/downloads) installed
- AWS account with appropriate permissions to create AMIs
- AWS credentials configured (e.g., using `aws configure`)

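Before running Packer, you could confirm that the required tools are on your `PATH` with a small check like the hypothetical `check_prereqs` function below (not part of the repository). It only verifies that the commands exist; `aws sts get-caller-identity` is a quick way to confirm that your AWS credentials are actually configured.

```bash
# check_prereqs COMMAND [COMMAND...]
# Hypothetical helper: reports whether each COMMAND is on PATH and
# returns 1 if any of them is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   ${cmd}"
    else
      echo "missing: ${cmd}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_prereqs packer aws && aws sts get-caller-identity`.
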
#### Packer Init

```bash
$ cd dgen/packer
$ packer init .
```

#### Customize variables and build the AWS AMI

Use Packer to build the AMI. This creates an instance, provisions it, and creates an AMI from it.

Copy `example-vars.pkrvars.hcl`, then edit the copy to override the variables that are specific to your environment:

```bash
$ cp example-vars.pkrvars.hcl /tmp/dgdo-vars.pkrvars.hcl
$ packer validate -var-file=/tmp/dgdo-vars.pkrvars.hcl dgdo-ami.pkr.hcl
$ packer build -var-file=/tmp/dgdo-vars.pkrvars.hcl dgdo-ami.pkr.hcl
```

## Troubleshooting

If you encounter any issues, refer to the [Packer documentation](https://www.packer.io/docs) or check the error messages for guidance.

## Tests

You can run automated tests on the Packer configuration using the test script below. Run it from the `packer` directory:

```bash
$ cd packer
$ ./tests/test_packer.sh
```

Review comment: Spelling and just consistency for naming/references of the files