Merge pull request #48 from lsst/tickets/DM-47673-main
DM-47673: v28 release notes (cherry-picked)
timj authored Nov 21, 2024
2 parents 9503fe4 + 51609d2 commit a76c719
Showing 21 changed files with 78 additions and 54 deletions.
8 changes: 7 additions & 1 deletion .github/workflows/build.yaml
@@ -51,12 +51,18 @@ jobs:
       - name: Run tests
         run: |
-          pytest -r a -v --cov=python --cov=tests --cov-report=xml --cov-report=term --cov-branch
+          pytest -r a -v --cov=python --cov=tests --cov-report=xml --cov-report=term --cov-branch \
+            --junitxml=junit.xml -o junit_family=legacy
       - name: Upload coverage to codecov
         uses: codecov/codecov-action@v4
         with:
           files: ./coverage.xml
           token: ${{ secrets.CODECOV_TOKEN }}
+      - name: Upload test results to Codecov
+        if: ${{ !cancelled() }}
+        uses: codecov/test-results-action@v1
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}

   pypi:

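The workflow change above asks pytest to write a JUnit-style report to junit.xml, which the new step then uploads. As a rough illustration of what such a report holds, the sketch below tallies outcomes from a small hand-written report using only the standard library. The sample XML and the helper name `summarize_junit` are assumptions made for illustration; they are not output from this repository or part of any Codecov tooling.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of a JUnit XML report (illustrative only).
SAMPLE_REPORT = """
<testsuites>
  <testsuite name="pytest" tests="3" failures="1" errors="0" skipped="1">
    <testcase classname="tests.test_a" name="test_ok" time="0.01"/>
    <testcase classname="tests.test_a" name="test_bad" time="0.02">
      <failure message="assert 1 == 2">traceback text</failure>
    </testcase>
    <testcase classname="tests.test_a" name="test_skip" time="0.00">
      <skipped message="not today"/>
    </testcase>
  </testsuite>
</testsuites>
"""


def summarize_junit(xml_text):
    """Count test outcomes by inspecting each <testcase> element."""
    root = ET.fromstring(xml_text.strip())
    counts = {"passed": 0, "failed": 0, "errors": 0, "skipped": 0}
    for case in root.iter("testcase"):
        if case.find("failure") is not None:
            counts["failed"] += 1
        elif case.find("error") is not None:
            counts["errors"] += 1
        elif case.find("skipped") is not None:
            counts["skipped"] += 1
        else:
            # No child marker means the test case passed.
            counts["passed"] += 1
    return counts


if __name__ == "__main__":
    print(summarize_junit(SAMPLE_REPORT))
```

Counting child elements rather than trusting the `tests`/`failures` attributes keeps the sketch independent of how a given report writer fills in the suite-level totals.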
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.6.0
+    rev: v5.0.0
     hooks:
       - id: check-yaml
         args:
@@ -9,7 +9,7 @@ repos:
       - id: trailing-whitespace
       - id: check-toml
   - repo: https://github.com/psf/black
-    rev: 24.4.2
+    rev: 24.10.0
     hooks:
       - id: black
         # It is recommended to specify the latest version of Python
@@ -24,10 +24,10 @@ repos:
         name: isort (python)
   - repo: https://github.com/astral-sh/ruff-pre-commit
     # Ruff version.
-    rev: v0.4.7
+    rev: v0.7.4
     hooks:
       - id: ruff
   - repo: https://github.com/numpy/numpydoc
-    rev: "v1.7.0"
+    rev: "v1.8.0"
     hooks:
       - id: numpydoc-validation
1 change: 0 additions & 1 deletion doc/changes/DM-35145.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-38538.doc.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-42579.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-43932.misc.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-44107.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-44110.misc.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-44457.misc.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-44668.bugfix.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-44668.feature.rst

This file was deleted.

1 change: 0 additions & 1 deletion doc/changes/DM-45654.misc.rst

This file was deleted.

2 changes: 0 additions & 2 deletions doc/changes/DM-46046.misc.rst

This file was deleted.

30 changes: 30 additions & 0 deletions doc/lsst.ctrl.bps.htcondor/CHANGES.rst
@@ -1,3 +1,33 @@
lsst-ctrl-bps-htcondor v28.0.0 (2024-11-21)
===========================================

New Features
------------

- Implemented basic ping method for HTCondor plugin that checks Schedd and Collector are running and user can authenticate to them.
It does not check that there are compute resources that can run the user's jobs. (`DM-35145 <https://rubinobs.atlassian.net/browse/DM-35145>`_)
- Added ability for the plugin to call ``allocateNodes.py`` during workflow execution in order to manage required computational resources automatically. (`DM-42579 <https://rubinobs.atlassian.net/browse/DM-42579>`_)
- Updated plugin to use ``retryUnlessExit`` values so WMS won't rerun some failures that will just fail every time. (`DM-44668 <https://rubinobs.atlassian.net/browse/DM-44668>`_)
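
The ``retryUnlessExit`` entry can be read as a simple rule: retry a failed job up to a limit, except when its exit code is one that is known to fail every time. A minimal sketch of that decision follows. The helper name `should_retry`, its parameters, and the example exit codes are hypothetical and chosen for illustration; this is not the plugin's implementation.

```python
def should_retry(exit_code, retries_used, max_retries, retry_unless_exit=()):
    """Decide whether a finished job attempt should be retried.

    All names here are hypothetical; the rule is "retry unless the
    exit code is in the set of codes that will never succeed".
    """
    if exit_code == 0:
        # The job succeeded, so there is nothing to retry.
        return False
    if exit_code in retry_unless_exit:
        # A listed exit code marks a failure that would just repeat.
        return False
    # Otherwise retry while the retry budget is not exhausted.
    return retries_used < max_retries


if __name__ == "__main__":
    print(should_retry(1, retries_used=0, max_retries=3, retry_unless_exit=(2,)))
    print(should_retry(2, retries_used=0, max_retries=3, retry_unless_exit=(2,)))
```

The first call reports that a retry is worthwhile; the second does not, because exit code 2 is listed as non-retryable in this example.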


Bug Fixes
---------

- Fixed status when job held and released. (`DM-44107 <https://rubinobs.atlassian.net/browse/DM-44107>`_)
- Fixed report listing auto-memory retry as failed when actually successful. (`DM-44668 <https://rubinobs.atlassian.net/browse/DM-44668>`_)


Other Changes and Additions
---------------------------

- Reported better error message when failed submission from ``/tmp``. (`DM-43932 <https://rubinobs.atlassian.net/browse/DM-43932>`_)
- Provided a default value for the ``memoryLimit`` parameter so it will be set automatically for the users if this plugin is used. (`DM-44110 <https://rubinobs.atlassian.net/browse/DM-44110>`_)
- Fixed held and deleted ``state_counts`` for reporting. (`DM-44457 <https://rubinobs.atlassian.net/browse/DM-44457>`_)
- Updated plugin to allow spaces in job submit file path. (`DM-45654 <https://rubinobs.atlassian.net/browse/DM-45654>`_)
- Updated ``bps restart`` to work with relative path as id.
Updated ``bps report --id <relpath>`` to display absolute path. (`DM-46046 <https://rubinobs.atlassian.net/browse/DM-46046>`_)
- Added a section describing how to release held jobs to the package documentation. (`DM-38538 <https://rubinobs.atlassian.net/browse/DM-38538>`_)
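
The DM-46046 entry describes accepting a relative submit path as a run id and reporting it as an absolute path. One way to picture that normalization, using only the standard library, is sketched below. The function name `normalize_run_id` and the rule used to decide whether an id is a path (it names an existing directory) are assumptions for illustration, not the plugin's code.

```python
from pathlib import Path


def normalize_run_id(run_id, cwd=None):
    """Return an absolute path if run_id names an existing directory.

    Hypothetical helper: anything that is not an existing directory
    (for example a scheduler job id such as "1163.0") is returned unchanged.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    candidate = Path(run_id)
    if not candidate.is_absolute():
        # Interpret relative ids against the chosen working directory.
        candidate = base / candidate
    if candidate.is_dir():
        return str(candidate.resolve())
    return run_id
```

Resolving the path once, at the point where the id is accepted, means later steps never depend on the directory the command was started from.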

lsst-ctrl-bps-htcondor v27.0.0 (2024-06-04)
===========================================

2 changes: 1 addition & 1 deletion tests/data/bad_submit.dag.dagman.out
@@ -43,7 +43,7 @@
 07/25/24 20:05:11 Node Name: one
 07/25/24 20:05:11 Noop: false
 07/25/24 20:05:11 NodeID: 0
-07/25/24 20:05:11 Node Status: STATUS_ERROR
+07/25/24 20:05:11 Node Status: STATUS_ERROR
 07/25/24 20:05:11 Node return val: -1
 07/25/24 20:05:11 Error: Job submit failed
 07/25/24 20:05:11 Job Submit File: bad_submit2.sub
16 changes: 8 additions & 8 deletions tests/data/test_no_messages.dag.dagman.out
@@ -1,7 +1,7 @@
07/23/24 17:24:38 Result of reading /etc/issue: \S

07/23/24 17:24:38 Result of reading /etc/redhat-release: AlmaLinux release 9.4 (Seafoam Ocelot)

07/23/24 17:24:38 Using IDs: 20 processors, 10 CPUs, 10 HTs
07/23/24 17:24:38 Enumerating interfaces: lo 127.0.0.1 up
07/23/24 17:24:38 Enumerating interfaces: enp11s0 10.0.0.33 up
@@ -22,7 +22,7 @@
 07/23/24 17:24:38 ** Log last touched time unavailable (No such file or directory)
 07/23/24 17:24:38 ******************************************************
 07/23/24 17:24:38 Using config source: /work/lsst_stack/w_2024_29/conda/envs/lsst-scipipe-8.0.0/etc/condor/condor_config
-07/23/24 17:24:38 Using local config sources:
+07/23/24 17:24:38 Using local config sources:
 07/23/24 17:24:38 /etc/condor/condor_config
 07/23/24 17:24:38 /etc/condor/config.d/00-htcondor-9.0.config
 07/23/24 17:24:38 /etc/condor/config.d/00-minicondor
@@ -77,7 +77,7 @@
 07/23/24 17:24:38 DAGMAN_MAX_JOB_HOLDS setting: 100
 07/23/24 17:24:38 DAGMAN_HOLD_CLAIM_TIME setting: 20
 07/23/24 17:24:38 ALL_DEBUG setting: D_FULLDEBUG
-07/23/24 17:24:38 DAGMAN_DEBUG setting:
+07/23/24 17:24:38 DAGMAN_DEBUG setting:
 07/23/24 17:24:38 DAGMAN_SUPPRESS_JOB_LOGS setting: False
 07/23/24 17:24:38 DAGMAN_REMOVE_NODE_JOBS setting: True
 07/23/24 17:24:38 DAGMAN will adjust edges after parsing
@@ -289,7 +289,7 @@
 07/23/24 17:25:45 Node Name: pipetaskInit
 07/23/24 17:25:45 Noop: false
 07/23/24 17:25:45 NodeID: 0
-07/23/24 17:25:45 Node Status: STATUS_ERROR
+07/23/24 17:25:45 Node Status: STATUS_ERROR
 07/23/24 17:25:45 Node return val: -1002
 07/23/24 17:25:45 Error: HTCondor reported ULOG_JOB_ABORTED event for job proc (1101.0.0)
 07/23/24 17:25:45 Job Submit File: jobs/pipetaskInit/pipetaskInit.sub
@@ -399,7 +399,7 @@
 07/23/24 17:25:58 Node Name: pipetaskInit
 07/23/24 17:25:58 Noop: false
 07/23/24 17:25:58 NodeID: 0
-07/23/24 17:25:58 Node Status: STATUS_ERROR
+07/23/24 17:25:58 Node Status: STATUS_ERROR
 07/23/24 17:25:58 Node return val: -1002
 07/23/24 17:25:58 Error: HTCondor reported ULOG_JOB_ABORTED event for job proc (1101.0.0)
 07/23/24 17:25:58 Job Submit File: jobs/pipetaskInit/pipetaskInit.sub
@@ -409,13 +409,13 @@
 07/23/24 17:25:58 Node Name: finalJob
 07/23/24 17:25:58 Noop: false
 07/23/24 17:25:58 NodeID: 4
-07/23/24 17:25:58 Node Status: STATUS_ERROR
+07/23/24 17:25:58 Node Status: STATUS_ERROR
 07/23/24 17:25:58 Node return val: 2
 07/23/24 17:25:58 Error: Job failed due to DAGMAN error 0 and POST Script failed with status 2
 07/23/24 17:25:58 Job Submit File: jobs/finalJob/finalJob.sub
 07/23/24 17:25:58 POST Script: /work/mgower/gen3work/summary-report-held-44457/ctrl_bps_htcondor/python/lsst/ctrl/bps/htcondor/final_post.sh finalJob $DAG_STATUS $RETURN
 07/23/24 17:25:58 HTCondor Job ID: (1102.0.0)
-07/23/24 17:25:58 PARENTS: WAITING: 0 CHILDREN:
+07/23/24 17:25:58 PARENTS: WAITING: 0 CHILDREN:
 07/23/24 17:25:58 --------------------------------------- <END>
 07/23/24 17:25:58 Aborting DAG...
 07/23/24 17:25:58 Writing Rescue DAG to /work/mgower/gen3work/summary-report-held-44457/submit/u/mgower/pipelines_check/20240723T222426Z/u_mgower_pipelines_check_20240723T222426Z.dag.rescue001...
@@ -1,7 +1,7 @@
07/26/24 19:35:18 Result of reading /etc/issue: \S

07/26/24 19:35:18 Result of reading /etc/redhat-release: AlmaLinux release 9.4 (Seafoam Ocelot)

07/26/24 19:35:18 Using IDs: 20 processors, 10 CPUs, 10 HTs
07/26/24 19:35:18 Enumerating interfaces: lo 127.0.0.1 up
07/26/24 19:35:18 Enumerating interfaces: enp11s0 10.0.0.33 up
@@ -22,7 +22,7 @@
 07/26/24 19:35:18 ** Log last touched time unavailable (No such file or directory)
 07/26/24 19:35:18 ******************************************************
 07/26/24 19:35:18 Using config source: /work/lsst_stack/w_2024_30/conda/envs/lsst-scipipe-8.0.0/etc/condor/condor_config
-07/26/24 19:35:18 Using local config sources:
+07/26/24 19:35:18 Using local config sources:
 07/26/24 19:35:18 /etc/condor/condor_config
 07/26/24 19:35:18 /etc/condor/config.d/00-htcondor-9.0.config
 07/26/24 19:35:18 /etc/condor/config.d/00-minicondor
@@ -77,7 +77,7 @@
 07/26/24 19:35:18 DAGMAN_MAX_JOB_HOLDS setting: 100
 07/26/24 19:35:18 DAGMAN_HOLD_CLAIM_TIME setting: 20
 07/26/24 19:35:18 ALL_DEBUG setting: D_FULLDEBUG
-07/26/24 19:35:18 DAGMAN_DEBUG setting:
+07/26/24 19:35:18 DAGMAN_DEBUG setting:
 07/26/24 19:35:18 DAGMAN_SUPPRESS_JOB_LOGS setting: False
 07/26/24 19:35:18 DAGMAN_REMOVE_NODE_JOBS setting: True
 07/26/24 19:35:18 DAGMAN will adjust edges after parsing
40 changes: 20 additions & 20 deletions tests/data/test_pipelines_check_20240727T003507Z.dag.nodes.log
@@ -18,10 +18,10 @@
 0 - Run Bytes Received By Job
 0 - Total Bytes Sent By Job
 0 - Total Bytes Received By Job
-Partitionable Resources : Usage Request Allocated
-Cpus : 1 1
-Disk (KB) : 76 2 471265
-Memory (MB) : 3 2048 2048
+Partitionable Resources : Usage Request Allocated
+Cpus : 1 1
+Disk (KB) : 76 2 471265
+Memory (MB) : 3 2048 2048

 Job terminated of its own accord at 2024-07-27T00:35:37Z with exit-code 0.
 ...
@@ -45,10 +45,10 @@
 0 - Run Bytes Received By Job
 11282 - Total Bytes Sent By Job
 0 - Total Bytes Received By Job
-Partitionable Resources : Usage Request Allocated
-Cpus : 1 1
-Disk (KB) : 88 2 471265
-Memory (MB) : 1088 2048 2048
+Partitionable Resources : Usage Request Allocated
+Cpus : 1 1
+Disk (KB) : 88 2 471265
+Memory (MB) : 1088 2048 2048

 Job terminated of its own accord at 2024-07-27T00:36:09Z with exit-code 0.
 ...
@@ -72,10 +72,10 @@
 0 - Run Bytes Received By Job
 20819 - Total Bytes Sent By Job
 0 - Total Bytes Received By Job
-Partitionable Resources : Usage Request Allocated
-Cpus : 1.00 1 1
-Disk (KB) : 103 2 471265
-Memory (MB) : 356 2048 2048
+Partitionable Resources : Usage Request Allocated
+Cpus : 1.00 1 1
+Disk (KB) : 103 2 471265
+Memory (MB) : 356 2048 2048

 Job terminated of its own accord at 2024-07-27T00:37:51Z with exit-code 0.
 ...
@@ -99,10 +99,10 @@
 0 - Run Bytes Received By Job
 11375 - Total Bytes Sent By Job
 0 - Total Bytes Received By Job
-Partitionable Resources : Usage Request Allocated
-Cpus : 1 1
-Disk (KB) : 88 2 471265
-Memory (MB) : 382 2048 2048
+Partitionable Resources : Usage Request Allocated
+Cpus : 1 1
+Disk (KB) : 88 2 471265
+Memory (MB) : 382 2048 2048

 Job terminated of its own accord at 2024-07-27T00:38:27Z with exit-code 0.
 ...
@@ -126,10 +126,10 @@
 214 - Run Bytes Received By Job
 6733 - Total Bytes Sent By Job
 214 - Total Bytes Received By Job
-Partitionable Resources : Usage Request Allocated
-Cpus : 1 1
-Disk (KB) : 83 1 471265
-Memory (MB) : 0 2048 2048
+Partitionable Resources : Usage Request Allocated
+Cpus : 1 1
+Disk (KB) : 83 1 471265
+Memory (MB) : 0 2048 2048

 Job terminated of its own accord at 2024-07-27T00:38:36Z with exit-code 0.
 ...
2 changes: 1 addition & 1 deletion tests/data/test_pipelines_check_20240727T003507Z.info.json
@@ -1 +1 @@
-{"acws02": {"1163.0": {"ClusterId": 1163, "GlobalJobId": "acws02#1163.0#1722040518"}}}
+{"acws02": {"1163.0": {"ClusterId": 1163, "GlobalJobId": "acws02#1163.0#1722040518"}}}
4 changes: 2 additions & 2 deletions tests/test_htcondor_service.py
@@ -420,7 +420,7 @@ def testCounts(self):


 class GetInfoFromPathTestCase(unittest.TestCase):
-    """Test _get_info_from_path function"""
+    """Test _get_info_from_path function."""

     def test_tmpdir_abort(self):
         with temporaryDirectory() as tmp_dir:
@@ -472,7 +472,7 @@ def test_relative_path(self):


 class WmsIdToDirTestCase(unittest.TestCase):
-    """Test _wms_id_to_dir function"""
+    """Test _wms_id_to_dir function."""

     @unittest.mock.patch("lsst.ctrl.bps.htcondor.htcondor_service._wms_id_type")
     def testInvalidIdType(self, _wms_id_type_mock):
2 changes: 1 addition & 1 deletion tests/test_lssthtc.py
@@ -24,7 +24,7 @@
 #
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <https://www.gnu.org/licenses/>.
-"""Unit tests for classes and functions in lssthtc.py"""
+"""Unit tests for classes and functions in lssthtc.py."""

 import logging
 import os
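
The docstring edits in the two test modules only add a trailing period to the summary line, a convention that docstring linters commonly enforce. The sketch below shows one way such a check could be written with the standard `ast` module. The helper name `summaries_missing_period` is hypothetical, and this is not the rule implementation used by any of the hooks in `.pre-commit-config.yaml`.

```python
import ast


def summaries_missing_period(source):
    """Return (name, summary) pairs whose docstring summary lacks a final period."""
    tree = ast.parse(source)
    documented = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    offenders = []
    for node in ast.walk(tree):
        if not isinstance(node, documented):
            continue
        doc = ast.get_docstring(node)
        if not doc:
            continue
        # The summary is the first line of the cleaned docstring.
        summary = doc.strip().splitlines()[0]
        if not summary.endswith("."):
            offenders.append((getattr(node, "name", "<module>"), summary))
    return offenders


if __name__ == "__main__":
    example = (
        "class A:\n"
        '    """Test _wms_id_to_dir function"""\n'
        "\n"
        "class B:\n"
        '    """Test _get_info_from_path function."""\n'
    )
    print(summaries_missing_period(example))
```

In the example, only class `A` is reported, since its summary line has no closing period.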
