Update GCP docs #59

Merged
merged 7 commits into from
Jan 25, 2024
27 changes: 27 additions & 0 deletions buildstockbatch/gcp/README.md
@@ -0,0 +1,27 @@
# Buildstock Batch on GCP

![Architecture diagram](/buildstockbatch/gcp/arch.svg)

Buildstock Batch runs on GCP in a few phases:

* Locally
- Build a Docker image that includes OpenStudio and BuildStock Batch.
- Push the Docker image to GCP Artifact Registry.
- Run sampling and split the generated buildings + upgrades into batches.
- Collect all the required input files (including downloading weather files)
and upload them to a Cloud Storage bucket.
- Create and start the Batch and Cloud Run jobs (described below),
and wait for them to finish.

* In GCP Batch
- Run a batch job where each task runs a small group of simulations.
GCP Batch uses the Docker image to run OpenStudio on Compute Engine VMs.
- Raw output files are written to the bucket in Cloud Storage.

* In Cloud Run
- Run a job for post-processing steps. Also uses the Docker image.
- Aggregated output files are written to the bucket in Cloud Storage.


`gcp.py` also supports validating a project file, cleaning up old projects,
and viewing the state of existing jobs.
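
A rough sketch of typical invocations (the project file name is a placeholder; the flags are the
ones documented in `docs/run_sims.rst`):

```sh
# Run the full workflow: sampling, simulations on GCP Batch, then post-processing on Cloud Run
buildstock_gcp my_project.yml

# Check on the jobs started by a previous run
buildstock_gcp my_project.yml --show_jobs

# Run only the post-processing steps for a batch whose simulations already finished
buildstock_gcp my_project.yml --postprocessonly
```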
1 change: 1 addition & 0 deletions buildstockbatch/gcp/arch.svg
15 changes: 3 additions & 12 deletions buildstockbatch/gcp/gcp.py
@@ -5,16 +5,7 @@
~~~~~~~~~~~~~~~
This class contains the object & methods that allow for usage of the library with GCP Batch.

Architecture overview (these steps are split between GcpBatch and DockerBatchBase):
- Build a Docker image that includes OpenStudio and BuildStock Batch.
- Push the Docker image to GCP Artifact Registry.
- Run sampling, and split the generated buildings into batches.
- Collect all the required input files (including downloading weather files)
and upload them to Cloud Storage.
- Run a job on GCP Batch where each task runs one batch of simulations.
Uses the Docker image to run OpenStudio on Compute Engine VMs.
- Run a Cloud Run job for post-processing steps. Also uses the Docker image.
- Output files are written to a bucket in Cloud Storage.
See the README for an overview of the architecture.

:author: Robert LaThanh, Natalie Weires
:copyright: (c) 2023 by The Alliance for Sustainable Energy
@@ -468,7 +459,7 @@ def show_jobs(self):
"""
# GCP Batch job that runs the simulations
if job := self.get_existing_batch_job():
logger.info("Batch job")
logger.info("--------------- Batch job ---------------")
logger.info(f" Name: {job.name}")
logger.info(f" UID: {job.uid}")
logger.info(f" Status: {job.status.state.name}")
@@ -490,7 +481,7 @@
status = "Running"
if last_execution.completion_time:
status = "Completed"
logger.info("Post-processing Cloud Run job")
logger.info("----- Post-processing Cloud Run job -----")
logger.info(f" Name: {job.name}")
logger.info(f" Status of latest run ({last_execution.name}): {status}")
logger.debug(f"Full job info:\n{job}")
4 changes: 2 additions & 2 deletions buildstockbatch/gcp/main.tf
@@ -4,10 +4,10 @@
# terraform init
#
# To see what changes will be applied:
# terraform plan
# terraform plan -var="gcp_project=myproject"
#
# To apply those changes:
# terraform apply
# terraform apply -var="gcp_project=myproject"
#
# Optionally set variables:
# terraform apply -var="gcp_project=myproject" -var="bucket_name=mybucket" -var="region=us-east1-b"
70 changes: 53 additions & 17 deletions docs/installation.rst
@@ -246,39 +246,75 @@ Google Cloud Platform

Shared, one-time GCP setup
..........................
One-time GCP setup shared by all users.
One-time GCP setup that can be shared by multiple users.

1. If needed, create a GCP Project. The following steps will occur in that project.
2. `Create a repository`_ in Artifact Registry (to store Docker images).
3. `Create a Google Cloud Storage Bucket`_ (that will store simulation and postprocessing output).
Alternatively, each user can create and use their own bucket.
4. Create a Service Account. Alternatively, each user can create their own service account, or each
user can install the `gcloud CLI`_. The following documentation will assume use of a Service
2. Set up the following resources in your GCP project. You can do this either manually or with
Terraform.

* **Option 1**: Manual setup

* `Create a Google Cloud Storage Bucket`_ (that will store simulation and postprocessing output).
Alternatively, each user can create and use their own bucket.
* `Create a repository`_ in Artifact Registry (to store Docker images).
This is expected to be in the same region as the storage bucket.

* **Option 2**: Terraform

* Install `Terraform`_
* From the buildstockbatch/gcp/ directory, run the following with your chosen GCP project and region.
You can optionally specify the names of the storage bucket and Artifact Registry repository. See
``main.tf`` for more details.

::

terraform init
terraform apply -var="gcp_project=PROJECT" -var="region=REGION"

3. Optionally, create a shared Service Account (one way to do this from the command line is
sketched below). Alternatively, each user can create their own service account, or each
user can install the `gcloud CLI`_. The following documentation will assume use of a Service
Account.
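
A minimal sketch of that last step, assuming the `gcloud CLI`_ is installed and authenticated (the
account name is an illustrative placeholder, and granting the account the roles it needs is not
shown here)::

    gcloud iam service-accounts create buildstockbatch-sa \
        --project=PROJECT --display-name="BuildStockBatch service account"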

.. _Create a repository:
https://cloud.google.com/artifact-registry/docs/repositories/create-repos
.. _Create a Google Cloud Storage Bucket:
https://cloud.google.com/storage/docs/creating-buckets
.. _gcloud CLI: https://cloud.google.com/sdk/docs/install
.. _Terraform: https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli


Per-developer setup
...................
One-time setup that each developer needs to do on the workstation from which they'll launch and
Per-user setup
..............
One-time setup that each user needs to do on the workstation from which they'll launch and
manage BuildStockBatch runs.

1. `Install Docker`_. This is needed by the script to manage Docker images (pull, push, etc).
1. Install `Docker`_. This is needed by the script to manage Docker images (pull, push, etc.).
2. Get BuildStockBatch and set up a Python environment for it using the :ref:`python` instructions
above (i.e., create a Python virtual environment, activate the venv, and install buildstockbatch
to it).
3. Download/Clone ResStock or ComStock.
4. Create and download a `Service Account Key`_ for GCP authentication.
4. Set up GCP authentication

* **Option 1**: Create and download a `Service Account Key`_ (a combined sketch of this option
appears after this list).

* Add the location of the key file as an environment variable; e.g.,
``export GOOGLE_APPLICATION_CREDENTIALS="~/path/to/service-account-key.json"``. This can be
done at the command line (in which case it will need to be done for every shell session that
will run BuildStockBatch, and it will only be in effect for that session), or added to a
shell startup script (in which case it will be available to all shell sessions).

* **Option 2**: Install the `Google Cloud CLI`_ and run the following:

::

gcloud config set project PROJECT
gcloud auth application-default login

gcloud auth login
gcloud auth configure-docker REGION-docker.pkg.dev


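
A combined sketch of Option 1, assuming a service account already exists (the account, project, and
file names are illustrative placeholders)::

    # Create and download a key for the service account
    gcloud iam service-accounts keys create ~/gcp-keys/buildstockbatch-key.json \
        --iam-account=buildstockbatch-sa@PROJECT.iam.gserviceaccount.com

    # Tell the Google client libraries (and therefore BuildStockBatch) where the key is;
    # add this line to your shell startup script to make it persistent.
    export GOOGLE_APPLICATION_CREDENTIALS="$HOME/gcp-keys/buildstockbatch-key.json"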

.. _Install Docker: https://www.docker.com/get-started/
.. _Docker: https://www.docker.com/get-started/
.. _Service Account Key: https://cloud.google.com/iam/docs/keys-create-delete
.. _Google Cloud CLI: https://cloud.google.com/sdk/docs/install-sdk
29 changes: 15 additions & 14 deletions docs/project_defn.rst
@@ -270,11 +270,10 @@ using `GCP Batch <https://cloud.google.com/batch>`_ and `Cloud Run <https://clou
buildstock run locally, on Eagle, or on AWS cannot save to GCP.

* ``job_identifier``: A unique string that starts with an alphabetical character,
is up to 48 characters long, and only has letters, numbers or hyphens.
is up to 48 characters long, and only has lowercase letters, numbers and/or hyphens.
This is used to name the GCP Batch and Cloud Run jobs to be created and
differentiate them from other jobs.
* ``project``: The GCP Project ID in which the batch will be run and of the Artifact Registry
(where Docker images are stored).
* ``project``: The GCP Project ID in which the job will run.
* ``service_account``: Optional. The service account email address to use when running jobs on GCP.
Default: the Compute Engine default service account of the GCP project.
* ``gcs``: Configuration for project data storage on GCP Cloud Storage.
@@ -287,7 +286,10 @@ using `GCP Batch <https://cloud.google.com/batch>`_ and `Cloud Run <https://clou
may help. Default: 40 MiB

* ``region``: The GCP region in which the job will be run and the region of the Artifact Registry.
* ``batch_array_size``: Number of tasks to divide the simulations into. Max: 10000.
(e.g. ``us-central1``)
* ``batch_array_size``: Number of tasks to divide the simulations into. Tasks with fewer than 100
simulations each are recommended when using spot instances, to minimize lost/repeated work when
instances are preempted. Max: 10,000.
* ``parallelism``: Optional. Maximum number of tasks that can run in parallel. If not specified,
uses `GCP's default behavior`_ (the lesser of ``batch_array_size`` and `job limits`_).
Parallelism is also limited by Compute Engine quotas and limits (including vCPU quota).
@@ -298,29 +300,28 @@ using `GCP Batch <https://cloud.google.com/batch>`_ and `Cloud Run <https://clou
repository.
* ``job_environment``: Optional. Specifies the computing requirements for each simulation
(an illustrative YAML sketch of how these nested options fit together appears after this list).

* ``vcpus``: Number of CPUs to allocate for running each simulation. Default: 1.
* ``memory_mib``: Amount of RAM memory needed for each simulation in MiB. Default: 1024.
For large multifamily buildings this works better if set to 2048.
* ``vcpus``: Optional. Number of CPUs to allocate for running each simulation. Default: 1.
* ``memory_mib``: Optional. Amount of RAM memory to allocate for each simulation in MiB.
Default: 1024
* ``boot_disk_mib``: Optional. Extra boot disk size in MiB for each task. This affects how large
the boot disk of the machine(s) running simulations will be (see the `Batch OS environment docs`_
for details); that boot disk is the disk used by the simulations. This will likely need to be set
to at least 2,048 if more than 8 simulations will be run in parallel on the same machine (i.e.,
when vCPUs per machine_type ÷ vCPUs per sim > 8). Default: None (which should result in a 30 GB
boot disk according to the docs linked above).
* ``machine_type``: GCP Compute Engine machine type to use. If omitted, GCP Batch will
* ``machine_type``: Optional. GCP Compute Engine machine type to use. If omitted, GCP Batch will
choose a machine type based on the requested vCPUs and memory. If set, the machine type
should have at least as many resources as requested for each simulation above. If it is
large enough, multiple simulations will be run in parallel on the same machine. Usually safe
to leave unset.
* ``use_spot``: true or false. This tells the project whether to use
`Spot VMs <https://cloud.google.com/spot-vms>`_ for data simulations, which can reduce
costs by up to 91%. Default: false
* ``use_spot``: Optional. Whether to use `Spot VMs <https://cloud.google.com/spot-vms>`_
for data simulations, which can reduce costs by up to 91%. Default: false

[Review comment] maybe we should change the default to true here? thoughts?

[Collaborator Author] I think I prefer a default of False, because 1) that's the default when creating GCP jobs directly and 2) I think it's best for the default to be the most reliable option, at least as long as using spot instances at scale continues to require manual retries sometimes.
* ``postprocessing_environment``: Optional. Specifies the Cloud Run computing environment for
postprocessing.

* ``cpus``: `Number of CPUs`_ to use. Default: 2.
* ``memory_mib``: `Amount of RAM`_ needed in MiB. 2048 MiB per CPU is recommended. Default:
4096.
* ``cpus``: Optional. `Number of CPUs`_ to use. Default: 2.
* ``memory_mib``: Optional. `Amount of RAM`_ needed in MiB. At least 2048 MiB per CPU is recommended.
Default: 4096.
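
A partial, illustrative sketch of how the optional keys above nest under the ``gcp`` section (all
values are placeholders, and required keys such as ``job_identifier``, ``project``, ``region``, and
``gcs`` are omitted):

.. code-block:: yaml

    gcp:
      # 20,000 simulations split into 400 tasks gives 50 simulations per task
      # (fewer than 100 per task, per the spot-instance guidance above)
      batch_array_size: 400
      parallelism: 100
      job_environment:
        vcpus: 1
        memory_mib: 2048
      postprocessing_environment:
        cpus: 2
        memory_mib: 4096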

.. _GCP's default behavior: https://cloud.google.com/python/docs/reference/batch/latest/google.cloud.batch_v1.types.TaskGroup
.. _job limits: https://cloud.google.com/batch/quotas
29 changes: 22 additions & 7 deletions docs/run_sims.rst
@@ -117,31 +117,31 @@ on S3 and queryable in Athena.
Google Cloud Platform
~~~~~~~~~~~~~~~~~~~~~

Running a batch on GCP is done by calling the ``buildstock_gcp`` command line
tool.
Run a project on GCP by calling the ``buildstock_gcp`` command line tool.

.. command-output:: buildstock_gcp --help
:ellipsis: 0,8

The first time you run ``buildstock_gcp`` it may take several minutes, especially over a slower
internet connection, as it downloads and builds a Docker image.

GCP Specific Project configuration
GCP specific project configuration
..................................

For the project to run on GCP, you will need to add a ``gcp`` section to your config
For the project to run on GCP, you will need to add a ``gcp`` section to your project
file, something like this:

.. code-block:: yaml

gcp:
job_identifier: national01
# The project, Artifact Registry repo, and GCS bucket must already exist.
project: myorg-project
region: us-central1
artifact_registry:
repository: buildstockbatch
repository: buildstockbatch-docker
gcs:
bucket: mybucket
bucket: buildstockbatch
prefix: national01_run01
use_spot: true
batch_array_size: 10000
@@ -154,18 +154,33 @@ You can optionally override the ``job_identifier`` from the command line
quickly assign a new ID with each run without updating the config file.
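
For example (the project file and identifier names here are purely illustrative), using the
``buildstock_gcp your_project_file.yml [job_identifier]`` form shown in the sections below::

    buildstock_gcp national_baseline.yml national01-test02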


List existing jobs
Show existing jobs
..................

Run ``buildstock_gcp your_project_file.yml [job_identifier] --show_jobs`` to see the existing
jobs matching the project specified. This can show you whether a previously-started job
has completed, is still running, or has already been cleaned up.


Post-processing only
.....................

If ``buildstock_gcp`` is interrupted after the simulations are kicked off (i.e. the Batch job is
running), the simulations will finish, but post-processing will not be started. You can run only
the post-processing steps later with the ``--postprocessonly`` flag.


Cleaning up after yourself
..........................

When the simulations and postprocessing are complete, run ``buildstock_gcp
your_project_file.yml [job_identifier] --clean``. This will clean up all the GCP resources that
were created to run the specified project, other than files in Cloud Storage. If the project is
still running, it will be cancelled. Your output files will still be available in GCS.

You can clean up files in Cloud Storage from the `GCP Console`_.

If you make code changes between runs, you may want to occasionally clean up the Docker
images created for each run with ``docker image prune``.
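
A short sketch of a typical cleanup pass using the commands described above (the project file and
identifier names are illustrative)::

    # Tear down the GCP resources created for this project; output files in GCS are kept
    buildstock_gcp national_baseline.yml national01 --clean

    # Occasionally reclaim local disk space from per-run Docker images
    docker image prune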

.. _GCP Console: https://console.cloud.google.com/storage/browser