Merge pull request #502 from NREL/gb/revs3
grantbuster authored Jan 10, 2025
2 parents 7efea50 + f94659c commit 68431d3
Showing 11 changed files with 111 additions and 21 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/pull_request_tests.yml
Original file line number Diff line number Diff line change
@@ -19,12 +19,12 @@ jobs:
python-version: 3.8

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.ref }}
fetch-depth: 1
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
34 changes: 34 additions & 0 deletions .github/workflows/s3_tests.yml
@@ -0,0 +1,34 @@
name: s3 fsspec tests

on: pull_request

jobs:
build:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ["3.10"]

steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.ref }}
fetch-depth: 1

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install dependencies
shell: bash
run: |
python -m pip install --upgrade pip
python -m pip install pytest
python -m pip install .[s3]
- name: Run pytest and Generate coverage report
shell: bash
run: |
python -m pytest -v tests/s3_tests.py --disable-warnings
5 changes: 4 additions & 1 deletion README.rst
@@ -126,8 +126,11 @@ Option 1: Install from PIP (recommended for analysts):
3. Install reV:
1) ``pip install NREL-reV`` or

- NOTE: If you install using conda and want to run directly from files on S3, as in the `running reV locally example <https://nrel.github.io/reV/misc/examples.running_locally.html>`_,
you will also need to install the S3 filesystem dependencies: ``pip install NREL-reV[s3]``

- NOTE: If you install using conda and want to use `HSDS <https://github.com/NREL/hsds-examples>`_
you will also need to install h5pyd manually: ``pip install h5pyd``
you will also need to install HSDS dependencies: ``pip install NREL-reV[hsds]``

Option 2: Clone repo (recommended for developers)
-------------------------------------------------
22 changes: 16 additions & 6 deletions examples/aws_pcluster/README.rst
@@ -1,9 +1,9 @@
Running reV on AWS Parallel Cluster HPC Infrastructure
======================================================

reV was originally designed to run on the NREL high performance computer (HPC), but you can now run reV on AWS using the NREL renewable energy resource data (the NSRDB and WTK) that lives on S3. This example will guide you through how to set up reV on an AWS HPC environment with dynamically scaled EC2 compute resources and input resource data sourced from S3 via HSDS.
reV was originally designed to run on the NREL high performance computer (HPC), but you can now run reV on AWS using the NREL renewable energy resource data (the NSRDB and WTK) that lives on S3. This example will guide you through how to set up reV on an AWS HPC environment with dynamically scaled EC2 compute resources and input resource data sourced from S3 (optionally via HSDS).

If you plan on only running reV for a handful of sites (less than 100), first check out our `running with HSDS example <https://github.com/NREL/reV/tree/main/examples/running_with_hsds>`_, which will be a lot easier to get started with. Larger reV jobs require you stand up your own AWS parallel cluster and HSDS server. Very small jobs can be run locally using the NREL HSDS developer API.
If you plan on only running reV for a handful of sites (fewer than 100), first check out our `running reV locally example <https://nrel.github.io/reV/misc/examples.running_locally.html>`_ or the `running with HSDS example <https://github.com/NREL/reV/tree/main/examples/running_with_hsds>`_, both of which are a lot easier to get started with. Larger reV jobs require that you stand up your own AWS parallel cluster and HSDS server. Very small jobs can be run locally using the NREL HSDS developer API.

Note that everything should be done in AWS region us-west-2 (Oregon) since this is where the NSRDB and WTK data live on S3.

@@ -29,7 +29,7 @@ Setting up an AWS Parallel Cluster
#. ``sh Miniconda3-latest-Linux-x86_64.sh``
#. ``source ~/.bashrc``

#. Set up an HSDS service. At this time, it is recommended that you use HSDS Local Servers on your compute cluster. See `the HSDS instructions below <https://github.com/NREL/reV/tree/main/examples/aws_pcluster#setting-up-hsds-local-servers-on-your-compute-cluster>`_ for details.
#. Decide whether you want to access `resource files on S3 <https://github.com/NREL/reV/tree/main/examples/aws_pcluster#using-rev-directly-with-s3-files>`_ for easy setup or `HSDS <https://github.com/NREL/reV/tree/main/examples/aws_pcluster#setting-up-hsds-local-servers-on-your-compute-cluster>`_ for better performance. At this time, it is recommended that you try S3 files first and switch to HSDS Local Servers if you need better performance.
#. Install reV

#. You need to clone the reV repo to get the ``aws_pcluster`` `example files <https://github.com/NREL/reV/tree/main/examples/aws_pcluster>`_. reV example files do not ship with the pypi package.
@@ -38,7 +38,7 @@ Setting up an AWS Parallel Cluster
#. ``cd /shared/``
#. ``git clone [email protected]:NREL/reV.git``
#. ``cd /shared/reV/``
#. ``pip install -e .``
#. ``pip install -e .[s3]`` if you're using S3 filepaths, or ``pip install -e .[hsds]`` if you're setting up an HSDS local server

#. Try running the reV ``aws_pcluster`` example:

@@ -50,19 +50,27 @@
Notes on Running reV in the AWS Parallel Cluster
------------------------------------------------

#. If you don't configure a custom HSDS Service you will almost certainly see 503 errors from too many requests being processed. See the instructions below to configure an HSDS Service.
#. If you use the NREL developer API key for HSDS and don't configure a custom HSDS Service you will almost certainly see 503 errors from too many requests being processed. See the instructions below to configure an HSDS Service.
#. AWS EC2 instances usually have twice as many vCPUs as physical CPUs because they default to two threads per physical CPU, at least for the c5 instances (see ``disable_hyperthreading = false``). The pcluster framework treats each thread as a "node" that can accept one reV job. For this reason, it is recommended that you scale the ``"nodes"`` entry in the reV generation config file but keep ``"max_workers": 1``. For example, if you use two ``c5.2xlarge`` instances in your compute fleet, that is a total of 16 vCPUs, each of which can be thought of as an HPC "node" that can run one process at a time.
#. If you set up an HSDS local server but the parallel cluster ends up sending too many requests (some nodes, but not all, will see 503 errors), you can try upping the ``max_task_count`` in the ``~/hsds/admin/config/override.yml`` file.
#. If your HSDS local server nodes run out of memory (monitor with ``docker stats``), you can try upping the ``dn_ram`` or ``sn_ram`` options in the ``~/hsds/admin/config/override.yml`` file.
#. The best way to stop your pcluster is using ``pcluster stop pcluster_name`` from the cloud9 IDE (not ssh'd into the pcluster) and then stop the login node in the AWS Console EC2 interface (find the "master" node and stop the instance). This will keep the EBS data intact and not charge you for EC2 costs. When you're done with the pcluster you can call ``pcluster delete pcluster_name`` but this will also delete all of the EBS data.
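The vCPU arithmetic in the notes above can be sketched as follows (a minimal illustration of the example's numbers, not an official reV utility):

```python
# Each EC2 vCPU (hardware thread) is treated by pcluster as one "node"
# that can accept a single reV job with "max_workers": 1.
def rev_nodes(n_instances, vcpus_per_instance):
    """Total reV "nodes" available in a pcluster compute fleet."""
    return n_instances * vcpus_per_instance

# Two c5.2xlarge instances with 8 vCPUs each -> 16 reV "nodes",
# matching the example in the note above.
nodes = rev_nodes(n_instances=2, vcpus_per_instance=8)
print(nodes)  # 16
```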


Using reV Directly with S3 Files
--------------------------------

You can now point reV directly to a list of files on S3. This is recommended before starting with HSDS services because it is much simpler and doesn't require any HSDS setup. It will be slower, but it is a good starting point. See the `running reV locally example <https://nrel.github.io/reV/misc/examples.running_locally.html>`_ for an example of this.

If you want to use S3 files, find the file paths using the AWS CLI or a similar utility, then replace the ``resource_file`` entry in ``config_gen.json`` with an appropriate file list (an example list of S3 filepaths is there by default). If you need better performance than the basic S3 file setup, read on below for how to set up an HSDS local server.
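As a minimal sketch (using the example S3 filepaths from ``config_gen.json`` in this diff; the other keys are illustrative), the ``resource_file`` entry can be swapped for an S3 file list with a few lines of Python:

```python
import json

# Point the generation config directly at resource files on S3 instead of
# HSDS domain paths (filepaths taken from the example config_gen.json).
config = {
    "execution_control": {"option": "slurm", "nodes": 16, "max_workers": 1},
    "resource_file": [
        "s3://nrel-pds-wtk/conus/v1.0.0/wtk_conus_2007.h5",
        "s3://nrel-pds-wtk/conus/v1.0.0/wtk_conus_2008.h5",
    ],
}

config_json = json.dumps(config, indent=4)
print(config_json)
```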


Setting up HSDS Local Servers on your Compute Cluster
-----------------------------------------------------

The current recommended approach for setting up an HSDS service for reV is to start local HSDS servers on your AWS parallel cluster compute nodes. These instructions set up a shell script that each reV compute job will run on its respective compute node. The shell script checks that an HSDS local server is running, and will start one if not. These instructions are generally copied from the `HSDS AWS README <https://github.com/HDFGroup/hsds/blob/master/docs/docker_install_aws.md>`_ with a few modifications.

Note that these instructions were originally developed and tested in February 2022 and have not been maintained. The latest instructions for setting up HSDS local servers can be found in the rex docs page: `HSDS local server instructions <https://nrel.github.io/rex/misc/examples.hsds.html#setting-up-a-local-hsds-server>`_. The best way to run reV on an AWS PCluster with HSDS local servers may be a combination of the instructions below and the latest instructions from the rex docs page.
Note that these instructions were originally developed and tested in February 2022 and have not been maintained. The latest instructions for setting up HSDS local servers can be found in the rex docs page: `HSDS local server instructions <https://nrel.github.io/rex/misc/examples.hsds.html#setting-up-a-local-hsds-server>`_. The best way to run reV on an AWS PCluster with HSDS local servers may be a combination of the instructions below and the latest instructions from the rex docs page. You may have to modify the ``start_hsds.sh`` script with the latest guidance on running HSDS local servers.

#. Make sure you have installed Miniconda but have not yet installed reV/rex.
#. Clone the `HSDS Repository <https://github.com/HDFGroup/hsds>`_. into your home directory in the pcluster login node: ``git clone [email protected]:HDFGroup/hsds.git`` (you may have to set up your ssh keys first).
@@ -90,6 +98,8 @@ Note that these instructions were originally developed and tested in February 20

#. Make sure this key-value pair is set in the ``execution_control`` block of the ``config_gen.json`` file: ``"sh_script": "sh ~/start_hsds.sh"``
#. Optional, copy the config override file: ``cp ~/hsds/admin/config/config.yml ~/hsds/admin/config/override.yml``, update any config lines in the ``override.yml`` file that you wish to change, and remove all other lines (see notes on ``max_task_count`` and ``dn_ram``).
#. Add the following to ``config_gen.json``: ``config_gen["execution_control"]["sh_script"] = "sh ~/start_hsds.sh"``. This will start the HSDS server on each compute node before running reV.
#. Set the resource file paths in ``config_gen.json`` to the appropriate file paths on HSDS: ``config_gen["resource_file"] = "/nrel/wtk/conus/wtk_conus_{}.h5"`` (the curly bracket will be filled in automatically by reV). To find the appropriate HSDS filepaths, see the instruction set `here <https://nrel.github.io/rex/misc/examples.nrel_data.html#data-location-external-users>`_.
#. You should be good to go! The line in the generation config file makes reV run the ``start_hsds.sh`` script before running the reV job. The script will install docker and make sure one HSDS server is running per EC2 instance.
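The year-template behavior mentioned above can be sketched as follows (illustrative only; reV performs this substitution internally, and the years here are hypothetical):

```python
# reV fills the {} in the resource_file path with each analysis year.
resource_file = "/nrel/wtk/conus/wtk_conus_{}.h5"
analysis_years = [2007, 2008]  # hypothetical years for illustration

paths = [resource_file.format(year) for year in analysis_years]
print(paths)
# ['/nrel/wtk/conus/wtk_conus_2007.h5', '/nrel/wtk/conus/wtk_conus_2008.h5']
```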


8 changes: 5 additions & 3 deletions examples/aws_pcluster/config_gen.json
@@ -8,8 +8,7 @@
"nodes": 16,
"option": "slurm",
"sites_per_worker": 20,
"max_workers": 1,
"sh_script": "sh ~/start_hsds.sh"
"max_workers": 1
},
"log_level": "INFO",
"output_request": [
@@ -18,7 +17,10 @@
"lcoe_fcr"
],
"project_points": "./wtk_points_front_range.csv",
"resource_file": "/nrel/wtk/conus/wtk_conus_{}.h5",
"resource_file": [
"s3://nrel-pds-wtk/conus/v1.0.0/wtk_conus_2007.h5",
"s3://nrel-pds-wtk/conus/v1.0.0/wtk_conus_2008.h5"
],
"sam_files": {
"def": "./windpower.json"
},
15 changes: 9 additions & 6 deletions examples/running_locally/README.rst
@@ -5,7 +5,7 @@ Run reV locally
and `reV Econ <https://nrel.github.io/reV/_autosummary/reV.econ.econ.Econ.html#reV.econ.econ.Econ>`_
can be run locally using resource .h5 files stored locally.

For users outside of NREL: you can now point reV directly to filepaths on S3! This will stream small amounts of data from S3 directly to your computer without having to setup an IO server like HSDS. See the example for reading data directly from S3 `here <https://nrel.github.io/rex/misc/examples.fsspec.html>`_ and try the example below with resource file paths from S3.
For users outside of NREL: you can now point reV directly to filepaths on S3! This will stream small amounts of data from S3 directly to your computer without having to set up an IO server like HSDS. See the example for reading data directly from S3 `here <https://nrel.github.io/rex/misc/examples.fsspec.html>`_ and try the example below with resource file paths from S3. You will need one extra install: ``pip install NREL-reV[s3]``.

reV Gen
-------
Expand All @@ -23,17 +23,19 @@ coordinates:
.. code-block:: python
import os
import numpy as np
from reV import TESTDATADIR
from reV.config.project_points import ProjectPoints
from reV.generation.generation import Gen
lat_lons = np.array([[ 41.25, -71.66],
[ 41.05, -71.74],
[ 41.97, -71.78],
[ 41.65, -71.74],
[ 41.25, -71.7 ],
[ 41.05, -71.78]])
[ 41.05, -71.74],
[ 41.97, -71.78],
[ 41.65, -71.74],
[ 41.25, -71.7 ],
[ 41.05, -71.78]])
# res_file could also be 's3://nrel-pds-wtk/conus/v1.0.0/wtk_conus_2007.h5'
res_file = os.path.join(TESTDATADIR, 'wtk/ri_100_wtk_2012.h5')
sam_file = os.path.join(TESTDATADIR,
'SAM/wind_gen_standard_losses_0.json')
@@ -68,6 +70,7 @@ Compute pv capacity factors for all resource gids in Rhode Island:
regions = {'Rhode Island': 'state'}
# res_file could also be 's3://nrel-pds-nsrdb/current/nsrdb_2018.h5'
res_file = os.path.join(TESTDATADIR, 'nsrdb/', 'ri_100_nsrdb_2012.h5')
sam_file = os.path.join(TESTDATADIR, 'SAM/naris_pv_1axis_inv13.json')
6 changes: 5 additions & 1 deletion examples/running_with_hsds/README.rst
Expand Up @@ -11,8 +11,12 @@ You can use the NREL developer API as the HSDS endpoint for small workloads
or stand up your own HSDS local server (instructions further below) for an
enhanced parallelized data experience.

For general information on where to get started accessing NREL data from outside of NREL, see the `rex docs <https://nrel.github.io/rex/misc/examples.nrel_data.html#data-location-external-users>`_.

You might also be interested in these examples of how to set up your own `local HSDS server <https://nrel.github.io/rex/misc/examples.hsds.html#setting-up-a-local-hsds-server>`_ and how to run reV on an `AWS parallel cluster <https://nrel.github.io/reV/misc/examples.aws_pcluster.html>`_.

Note that running directly from S3 files is an easier solution, although not as performant. For more details on running directly from S3 files, see `running reV locally <https://nrel.github.io/reV/misc/examples.running_locally.html>`_ and the `rex s3 example <https://nrel.github.io/rex/misc/examples.fsspec.html>`_.

Setting up HSDS
---------------

@@ -149,4 +153,4 @@ Command Line Interface (CLI)

`reV-gen <https://nrel.github.io/reV/_cli/reV-gen.html#rev-gen>`_
can also be run from the command line and will output the results to an .h5
file that can be read with `rex.resource.Resource <https://nrel.github.io/rex/rex/rex.resource.html#rex.resource.Resource>`_.
file that can be read with `rex.resource.Resource <https://nrel.github.io/rex/rex/rex.resource.html#rex.resource.Resource>`_.
2 changes: 1 addition & 1 deletion reV/version.py
@@ -2,4 +2,4 @@
reV Version number
"""

__version__ = "0.9.6"
__version__ = "0.9.7"
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,7 +1,7 @@
NREL-gaps>=0.6.11
NREL-NRWAL>=0.0.7
NREL-PySAM~=4.1.0
NREL-rex>=0.2.89
NREL-rex>=0.2.97
numpy~=1.24.4
packaging>=20.3
plotly>=4.7.1
2 changes: 2 additions & 0 deletions setup.py
@@ -106,6 +106,8 @@ def run(self):
extras_require={
"test": test_requires,
"dev": test_requires + ["flake8", "pre-commit", "pylint"],
"s3": ['fsspec', 's3fs'],
"hsds": ["hsds>=0.8.4"],
},
cmdclass={"develop": PostDevelopCommand},
)
32 changes: 32 additions & 0 deletions tests/s3_tests.py
@@ -0,0 +1,32 @@
# -*- coding: utf-8 -*-
"""
PyTest file for wind generation run directly from an S3 file.
Note that this directly tests the example here:
https://nrel.github.io/reV/misc/examples.running_locally.html
Note that this file cannot be named "test_*.py" because it is run with a
separate github action (s3_tests.yml) that installs the S3 filesystem
dependencies before running the test.
"""

import os
import numpy as np
from reV import TESTDATADIR
from reV.config.project_points import ProjectPoints
from reV.generation.generation import Gen


def test_windpower_s3():
lat_lons = np.array([[41.25, -71.66]])

res_file = 's3://nrel-pds-wtk/conus/v1.0.0/wtk_conus_2007.h5'
sam_file = os.path.join(TESTDATADIR, 'SAM/wind_gen_standard_losses_0.json')

pp = ProjectPoints.lat_lon_coords(lat_lons, res_file, sam_file)
gen = Gen('windpower', pp, sam_file, res_file,
output_request=('cf_mean', 'cf_profile'))
gen.run(max_workers=1)

assert isinstance(gen.out['cf_profile'], np.ndarray)
assert gen.out['cf_profile'].sum() > 0
