updates for CWL CUDA, Resources, NetworkAccess requirements #506

Merged 17 commits on Dec 5, 2022
Changes from 16 commits
12 changes: 10 additions & 2 deletions CHANGES.rst
@@ -12,15 +12,23 @@ Changes

Changes:
--------
- No change.
- Add `Job` log message size checks to better control what gets logged during the `Application Package` execution to
avoid large documents causing problems when attempting to save them to the storage database.
- Update documentation with examples for ``cwltool:CUDARequirement``, ``ResourceRequirement`` and ``NetworkAccess``.
- Improve schema definition of ``ResourceRequirement``.
- Deprecate ``DockerGpuRequirement``, with attempts to auto-convert it into corresponding ``DockerRequirement``
combined with ``cwltool:CUDARequirement`` definitions. If this conversion does not work transparently for the user,
explicit `CWL` updates with those definitions should be made.
- Ensure that the validation check finds exactly one provided `CWL` requirement or hint to represent the application type.
If the requirement is missing, the `Process` deployment will fail with a reported error that contains a documentation
link to guide users in adjusting their `Application Package` accordingly.

Fixes:
------
- Fix ``distutils.version.LooseVersion`` marked for deprecation for upcoming versions.
Use ``packaging.version.Version`` substitute whenever possible, but preserve backward
compatibility with ``distutils`` in case of older Python not supporting it.
- Fix ``cli._update_files`` so there are no attempts to upload remote references to the `Vault`.
- No change.

.. _changes_4.27.0:

11 changes: 7 additions & 4 deletions README.rst
@@ -5,6 +5,9 @@ Weaver
**Implementations**

* |ogc-proc-long|
* |wps|
* |esgf| processes
* |cwl| for |ogc-apppkg|_
* |ems| for Workflows
* |ades|

@@ -108,7 +111,7 @@ the application definition provided by |cwl| configuration. It can then directly
a registered process |ogc-apppkg|_ with received inputs from a WPS request to expose output results for a
following `ADES` in an `EMS` workflow execution chain.

`Weaver` **extends** |ogc-proc-api|_ by providing additional functionalities such as more detailed job logs
`Weaver` **extends** |ogc-api-proc|_ by providing additional functionalities such as more detailed job logs
endpoints, adding more process management and search request options than required by the standard, and supporting
*remote providers* registration for dynamic process definitions, to name a few.
Because of this, not all features offered in `Weaver` are guaranteed to be applicable on other similarly
@@ -283,9 +286,9 @@ It is part of `PAVICS`_ and `Birdhouse`_ ecosystems and is available within the
.. |wps| replace:: `Web Processing Services` (WPS)
.. |ogc| replace:: Open Geospatial Consortium
.. _ogc: https://www.ogc.org/
.. |ogc-proc-api| replace:: `OGC API - Processes`
.. _ogc-proc-api: https://github.com/opengeospatial/ogcapi-processes
.. |ogc-proc-long| replace:: |ogc-proc-api|_ (WPS-REST bindings)
.. |ogc-api-proc| replace:: `OGC API - Processes`
.. _ogc-api-proc: https://github.com/opengeospatial/ogcapi-processes
.. |ogc-proc-long| replace:: |ogc-api-proc|_ (WPS-REST bindings)
.. |ogc-tb14| replace:: OGC Testbed-14
.. _ogc-tb14: https://www.ogc.org/projects/initiatives/testbed14
.. |ogc-tb14-platform-er| replace:: ADES & EMS Results and Best Practices Engineering Report
1 change: 1 addition & 0 deletions docs/examples/docker-python-script-report.cwl
@@ -1,3 +1,4 @@
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand:
4 changes: 3 additions & 1 deletion docs/examples/docker-shell-script-cat.cwl
@@ -1,3 +1,4 @@
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: cat
@@ -13,4 +14,5 @@ outputs:
- id: output
type: File
outputBinding:
glob: stdout.log
glob: output.txt
stdout: output.txt
19 changes: 19 additions & 0 deletions docs/examples/requirement-cuda.cwl
@@ -0,0 +1,19 @@
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
baseCommand: nvidia-smi
requirements:
  cwltool:CUDARequirement:
    cudaVersionMin: "11.2"
    cudaComputeCapability: "7.5"
    cudaDeviceCountMin: 1
    cudaDeviceCountMax: 4
$namespaces:
  cwltool: "http://commonwl.org/cwltool#"
inputs: {}
outputs:
  output:
    type: File
    outputBinding:
      glob: output.txt
stdout: output.txt
16 changes: 16 additions & 0 deletions docs/examples/requirement-network.cwl
@@ -0,0 +1,16 @@
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
baseCommand: curl
requirements:
  NetworkAccess:
    networkAccess: true
inputs:
  url:
    type: string
outputs:
  output:
    type: File
    outputBinding:
      glob: "output.txt"
stdout: "output.txt"
21 changes: 21 additions & 0 deletions docs/examples/requirement-resources.cwl
@@ -0,0 +1,21 @@
#!/usr/bin/env cwl-runner
cwlVersion: v1.2
class: CommandLineTool
baseCommand: "<high-compute-algorithm>"
requirements:
  ResourceRequirement:
    coresMin: 8
    coresMax: 16
    ramMin: 1024
    ramMax: 2048
    tmpdirMin: 128
    tmpdirMax: 1024
    outdirMin: 1024
    outdirMax: 2048
inputs: {}
outputs:
  output:
    type: File
    outputBinding:
      glob: output.txt
stdout: output.txt
9 changes: 7 additions & 2 deletions docs/source/appendix.rst
@@ -180,7 +180,8 @@ Glossary
input/output chaining between operations.

.. seealso::
Refer to :ref:`proc_workflow`, :ref:`proc_workflow_ops` and :ref:`CWL Workflow` sections for more details.
Refer to :ref:`proc_workflow`, :ref:`proc_workflow_ops` and :ref:`app_pkg_workflow`
sections for more details.

WPS
| Web Processing Service.
@@ -193,6 +194,10 @@ Glossary
WPS-REST
Alias employed to refer to :term:`OGC API - Processes` endpoints for corresponding :term:`WPS` definitions.

WPS-T
Alias employed to refer to older revisions of :term:`OGC API - Processes` standard.
The name referred to :term:`WPS` *Transactional* operations introduced by the RESTful API.

XML
| Extensible Markup Language
| Alternative representation of some data object provided by the application. Requires appropriate ``Accept``
@@ -218,5 +223,5 @@ Useful Links
- |iana-link|_
- |oas|_
- |ogc|_ (:term:`OGC`)
- |ogc-proc-api|_
- |ogc-api-proc|_
- |weaver-issues|_
2 changes: 1 addition & 1 deletion docs/source/configuration.rst
@@ -214,7 +214,7 @@ they are optional and which default value or operation is applied in each situat
| (default: *path* ``/``)
|
| Endpoint that will be employed as prefix to refer to WPS-REST requests
| (including but not limited to |ogc-proc-api|_ schemas).
| (including but not limited to |ogc-api-proc|_ schemas).
|
| It can either be the explicit *full URL* to use or the *path* relative to ``weaver.url``.
| Setting ``weaver.wps_restapi_path`` is ignored if its URL equivalent is defined.
2 changes: 1 addition & 1 deletion docs/source/faq.rst
@@ -105,7 +105,7 @@ Problem connecting workflow steps together

.. seealso::

- :ref:`CWL Workflow`
- :ref:`app_pkg_workflow`
- :ref:`Output File Format`


137 changes: 131 additions & 6 deletions docs/source/package.rst
@@ -30,12 +30,15 @@ definition available with |pkg-req|_ request.
.. note::

The package request is a `Weaver`-specific implementation, and therefore, is not necessarily available on other
:term:`ADES`/:term:`EMS` implementation as this feature is not part of |ogc-proc-api|_ specification.
:term:`ADES`/:term:`EMS` implementation as this feature is not part of |ogc-api-proc|_ specification.

.. _app_pkg_types:

Typical CWL Package Definition
===========================================

.. _app_pkg_cmd:

CWL CommandLineTool
------------------------

@@ -46,7 +49,7 @@ Following :term:`CWL` package definition represents the :py:mod:`weaver.processe
:linenos:

The first main component is the ``class: CommandLineTool`` that tells `Weaver` it will be an *atomic* process
(contrarily to `CWL Workflow`_ presented later).
(contrarily to :ref:`app_pkg_workflow` presented later).

The other important sections are ``inputs`` and ``outputs``. These define which parameters will be expected and
produced by the described application. `Weaver` supports most formats and types as specified by |cwl-spec|_.
@@ -159,6 +162,128 @@ whenever required for launching new :term:`Job` executions.
.. versionadded:: 4.5
Specification and handling of the ``X-Auth-Docker`` header for providing an authentication token.

.. _app_pkg_resources:

GPU and Resource-Dependent Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When an :term:`Application Package` requires a GPU or any other minimum set of hardware capabilities, such as for
machine learning or high-performance computing tasks, the submitted :term:`CWL` must explicitly indicate those
requirements so that they can be met when it is executed. Similarly, an :term:`Application Package` that needs
access to remote content must not assume that a connection will be available, and must therefore request network
access explicitly. The examples below demonstrate how to define such requirements.

.. literalinclude:: ../examples/requirement-cuda.cwl
    :language: yaml
    :caption: Sample CWL definition with CUDA capabilities
    :name: example_app_pkg_cuda

.. literalinclude:: ../examples/requirement-resources.cwl
    :language: yaml
    :caption: Sample CWL definition with computing resources
    :name: example_app_pkg_resources

.. literalinclude:: ../examples/requirement-network.cwl
    :language: yaml
    :caption: Sample CWL definition with network access
    :name: example_app_pkg_network

The above requirements can be combined as needed, including with any other requirements
employed to define the core components of the application.
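
As a minimal sketch of such a combination, the following definition requests CUDA capabilities, computing resources
and network access for a Docker-based application. The Docker image, the command, the ``dataset_url`` input and all
resource values are hypothetical and only serve as illustration. They should be adjusted to the actual application.

.. code-block:: yaml
    :caption: Illustrative CWL definition combining multiple requirements

    cwlVersion: v1.2
    class: CommandLineTool
    # hypothetical command executed inside the container
    baseCommand: ["python", "train.py"]
    $namespaces:
      cwltool: "http://commonwl.org/cwltool#"
    requirements:
      # application type (hypothetical image reference)
      DockerRequirement:
        dockerPull: "example/ml-training:latest"
      # GPU capabilities
      cwltool:CUDARequirement:
        cudaVersionMin: "11.2"
        cudaComputeCapability: "7.5"
        cudaDeviceCountMin: 1
      # CPU and memory (RAM expressed in mebibytes)
      ResourceRequirement:
        coresMin: 4
        ramMin: 8192
      # outbound connection needed to fetch the remote dataset
      NetworkAccess:
        networkAccess: true
    inputs:
      dataset_url:
        type: string
        inputBinding:
          position: 1
    outputs:
      output:
        type: File
        outputBinding:
          glob: output.txt
    stdout: output.txt

Each requirement is declared independently under ``requirements``, so any of them can be removed without affecting
the others.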

Whenever possible, requirements should be provided with values that best match the minimum and maximum amount of
resources that the :term:`Application Package` operation requires. In particular, over-requesting resources should
be avoided, as it could cause the :term:`Job` execution to fail if the server or worker node processing it deems that
it cannot fulfill the requirements, whether because they are too broad to allow proper resource allocation, because
it has insufficient computing resources, or simply for rate-limiting or fair-share reasons.

Although definitions such as |cwl-resource-req|_ and |cwl-cuda-req|_ are usually applied to atomic operations,
they are also relevant in the context of :ref:`app_pkg_workflow` execution. Providing the hardware capabilities
required by each atomic application allows the :term:`Workflow` engine to better schedule :term:`Job` steps.
For example, consider two computationally heavy steps that the :term:`Workflow` definition alone would allow to
run in parallel, but that would cause an ``OutOfMemory`` error due to insufficient resources if they ran
simultaneously on the same machine. Their requirements let the engine know in advance that it should wait until
*reserved* resources become available. Execution of the second task can then be delayed until the first task
completes, thereby avoiding the error.
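
The situation described above can be sketched with the following plain :term:`CWL` definition, where two independent
steps each declare their own memory needs. The step names, input names and ``ramMin`` values are hypothetical, and
the tools are embedded inline only to keep the example self-contained. Whether the steps actually get serialized
depends on the engine and on the resources available to it.

.. code-block:: yaml
    :caption: Illustrative CWL Workflow with resource requirements on each step

    cwlVersion: v1.2
    class: Workflow
    inputs:
      data_a: File
      data_b: File
    outputs:
      result_a:
        type: File
        outputSource: heavy_a/output
      result_b:
        type: File
        outputSource: heavy_b/output
    steps:
      # no data dependency exists between the two steps,
      # so the workflow definition alone allows running them in parallel
      heavy_a:
        in:
          data: data_a
        out: [output]
        run:
          class: CommandLineTool
          baseCommand: "<high-compute-algorithm>"
          requirements:
            ResourceRequirement:
              ramMin: 16384  # hypothetical value, in mebibytes
          inputs:
            data:
              type: File
              inputBinding:
                position: 1
          outputs:
            output:
              type: File
              outputBinding:
                glob: output.txt
          stdout: output.txt
      heavy_b:
        in:
          data: data_b
        out: [output]
        run:
          class: CommandLineTool
          baseCommand: "<high-compute-algorithm>"
          requirements:
            ResourceRequirement:
              ramMin: 16384  # hypothetical value, in mebibytes
          inputs:
            data:
              type: File
              inputBinding:
                position: 1
          outputs:
            output:
              type: File
              outputBinding:
                glob: output.txt
          stdout: output.txt

With these values, an engine that has less than 32 GiB of memory to assign has the information it needs to hold one
step back until the other has released its reserved memory.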

.. versionadded:: 4.17
    Support of |cwl-resource-req|_.

.. versionadded:: 4.27
    Support of |cwl-network-req|_ and |cwl-cuda-req|_.

.. versionchanged::
    Deprecated ``DockerGpuRequirement``.

.. warning::
    Any :term:`Application Package` that was making use of ``DockerGpuRequirement`` should be updated to employ
    the official |cwl-docker-req|_ in combination with |cwl-cuda-req|_. For backward compatibility, any detected
    ``DockerGpuRequirement`` definition will be updated automatically with a minimal |cwl-cuda-req|_ definition
    using a very lax set of CUDA capabilities. It is recommended to provide specific configurations for your needs.
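
As a minimal sketch of the migration described in the warning above, a deprecated definition and its explicit
replacement could look as follows. The image reference and the CUDA values are hypothetical, and the ``dockerPull``
field of ``DockerGpuRequirement`` is assumed here to mirror the one of |cwl-docker-req|_.

.. code-block:: yaml
    :caption: Deprecated definition (illustrative)

    cwlVersion: v1.2
    class: CommandLineTool
    baseCommand: nvidia-smi
    hints:
      DockerGpuRequirement:
        dockerPull: "example/gpu-app:latest"  # hypothetical image
    inputs: {}
    outputs: {}

.. code-block:: yaml
    :caption: Explicit replacement (illustrative)

    cwlVersion: v1.2
    class: CommandLineTool
    baseCommand: nvidia-smi
    $namespaces:
      cwltool: "http://commonwl.org/cwltool#"
    requirements:
      DockerRequirement:
        dockerPull: "example/gpu-app:latest"  # hypothetical image
      cwltool:CUDARequirement:
        # hypothetical values, to be set according to the application needs
        cudaVersionMin: "11.2"
        cudaComputeCapability: "7.5"
        cudaDeviceCountMin: 1
    inputs: {}
    outputs: {}

Providing the explicit form avoids relying on the automatic conversion and its lax default capabilities.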

.. _app_pkg_remote:
.. _app_pkg_wps1:
.. _app_pkg_ogc_api:
.. _app_pkg_esgf_cwt:

Remote Applications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To define an application that refers to a :ref:`proc_remote_provider`, an :ref:`proc_wps_12`, an :ref:`proc_ogc_api`
or an :ref:`proc_esgf_cwt` endpoint, the corresponding `Weaver`-specific :term:`CWL`-like requirements must be employed
to indicate the URL where that remote resource is accessible. Once deployed, the contained :term:`CWL`
package and the resulting :term:`Process` will be exposed as a :ref:`proc_ogc_api` resource.

Upon reception of a :ref:`Process Execution <proc_op_execute>` request, `Weaver` will take care of resolving
the indicated process URL from the :term:`CWL` requirement and will dispatch the execution to the resource
after applying any relevant I/O, parameter and Media-Type conversion to align with the target server standard
for submitting the :term:`Job` requests.

Below are examples of the corresponding :term:`CWL` requirements employed for each type of remote application.

.. code-block:: yaml
    :caption: WPS-1/2 Package Definition

    cwlVersion: "v1.0"
    class: CommandLineTool
    hints:
      WPS1Requirement:
        provider: "https://example.com/ows/wps/catalog"
        process: "getpoint"

.. code-block:: yaml
    :caption: OGC API Package Definition

    cwlVersion: "v1.0"
    class: CommandLineTool
    hints:
      OGCAPIRequirement:
        process: "https://example.com/ogcapi/processes/getpoint"

.. code-block:: json
    :caption: ESGF-CWT Package Definition

    {
      "cwlVersion": "v1.0",
      "class": "CommandLineTool",
      "hints": {
        "ESGF-CWTRequirement": {
          "provider": "https://edas.nccs.nasa.gov/wps/cwt",
          "process": "xarray.subset"
        }
      }
    }


.. seealso::
    - :ref:`proc_remote_provider`
    - :ref:`proc_wps_12`
    - :ref:`proc_ogc_api`
    - :ref:`proc_esgf_cwt`

.. _app_pkg_workflow:

CWL Workflow
------------------------

@@ -671,10 +796,10 @@ validate the full process integrity before it can be executed, this means that o
permitted in its context (providing many will raise a validation error when parsing the :term:`CWL` definition).

To ensure compatibility with multiple *supported formats* outputs of :term:`WPS`, any output that has more than one
format will have its ``format`` field dropped in the corresponding :term:`CWL` definition. Without any ``format`` on
the :term:`CWL` side, the validation process will ignore this specification and will effectively accept any type of
file. This will not break any execution operation with :term:`CWL`, but it will remove the additional validation layer
of the format (which especially deteriorates process resolution when chaining processes inside a :ref:`CWL Workflow`).
format will have its ``format`` field dropped in the corresponding :term:`CWL` definition. Without any ``format`` on the
:term:`CWL` side, the validation process will ignore this specification and will effectively accept any type of file.
This will not break any execution operation with :term:`CWL`, but it will remove the additional validation layer of the
format (which especially deteriorates process resolution when chaining processes inside a :ref:`app_pkg_workflow`).

If the :term:`WPS` output only specifies a single MIME-type, then the equivalent format (after being resolved to a valid
ontology) will be preserved on the :term:`CWL` side since the result is ensured to be the unique one provided. For this