v3 sarra_get_cis_rcm crashes after watch_ice path or post_baseDir change #1216

Open
robjarawan opened this issue Sep 10, 2024 · 4 comments
Labels: bug (Something isn't working), crasher (Crashes entire app.), likely-fixed (likely fix is in the repository, success not confirmed yet.)

Comments


robjarawan commented Sep 10, 2024

I changed the watch config to use the user sarra and point to its home directory in order to test for issues. During that time, it seems whatever I did was causing the sarra to crash, 7+ times; see acdc 22877.

These are the watch settings I used (I think I changed post_baseDir a couple of times to see what it would do to the messages). I was playing around with the slashes because I noticed the sarra was showing chdir local/home... without a leading slash (relative?). I did not know it would eventually crash, so I left it running until the next day, which caused some pager ruckus.

post_baseUrl sftp://sarra@${HOSTNAME}
post_baseDir /local/home/sarra/ice/
path /local/home/sarra/ice/rcm/

[ERROR] sarracenia.flow download chdir local/home/sarra/ice/rcm: [Errno 2] No such file

Log dump (/local/home/sarra/.cache/sr3/log/sarra_get_cis_rcm_01.log):
raise TimeoutException("signal alarm timed out")
sarracenia.transfer.TimeoutException: signal alarm timed out
2024-09-10 01:55:07,468 [INFO] sarracenia.flow metricsFlowReset looking for old metrics for /local/home/sarra/.cache/sr3/metrics/sarra_get_cis_rcm_01.json
2024-09-10 01:55:07,485 [INFO] sarracenia.moth.amqp putSetup exchange declared: xpublic (as: amqp://feeder@localhost/)
2024-09-10 01:55:07,508 [INFO] sarracenia.moth.amqp _queueDeclare queue declared q_feeder.sarra.get_cis_rcm.ddsr-shared (as: amqp://[email protected]/), (messages waiting: 0)
2024-09-10 01:55:07,508 [INFO] sarracenia.moth.amqp getSetup binding q_feeder.sarra.get_cis_rcm.ddsr-shared with v02.post.# to xs_MSC-ICE (as: amqp://[email protected]/)
2024-09-10 02:00:10,496 [INFO] sarracenia.flow _runHousekeeping on_housekeeping pid: 38385 sarra/get_cis_rcm instance: 1
2024-09-10 02:00:10,496 [INFO] sarracenia.flowcb.gather.message on_housekeeping messages: good: 0 bad: 0 bytes: 0 Bytes average: 0 Bytes
2024-09-10 02:00:10,497 [INFO] sarracenia.diskqueue on_housekeeping work_retry_01 Number of messages in retry list 1
2024-09-10 02:00:10,498 [INFO] sarracenia.flowcb.housekeeping.resources on_housekeeping Current cpu_times: user=0.64 system=0.04
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.housekeeping.resources on_housekeeping Current mem usage: 136.1 MiB, accumulating count (0 or 0/100 so far) before self-setting threshold
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.log stats version: 3.00.54p1, started: 5 minutes ago, last_housekeeping: 303.0 seconds ago 
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.log stats messages received: 0, accepted: 0, rejected: 0 rate accepted: 0.0% or 0.0 m/s
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.log stats files transferred: 0 bytes: 0 Bytes rate: 0 Bytes/sec
2024-09-10 02:00:10,499 [INFO] sarracenia.flow metricsFlowReset looking for old metrics for /local/home/sarra/.cache/sr3/metrics/sarra_get_cis_rcm_01.json
2024-09-10 02:00:10,499 [INFO] sarracenia.flowcb.log after_accept accepted: (lag: 2360.08 ) sftp://[email protected] /local/home/sarra/ice/rcm/RCM_test.zip
2024-09-10 02:00:11,951 [ERROR] sarracenia.flow download chdir local/home/sarra/ice/rcm: [Errno 2] No such file
2024-09-10 02:00:11,951 [INFO] sarracenia.flow do_download attempt 1 failed to download sftp://[email protected]/local/home/sarra/ice/rcm/RCM_test.zip to /apps/sarra/public_data/20240910/MSC-ICE/MSC-PRODUCTS/RCM/01/RCM_test.zip
2024-09-10 02:00:11,951 [WARNING] sarracenia.flow do_download downloading again, attempt 2
2024-09-10 02:00:11,952 [ERROR] sarracenia.flow download chdir local/home/sarra/ice/rcm: [Errno 2] No such file
2024-09-10 02:00:11,952 [INFO] sarracenia.flow do_download attempt 2 failed to download sftp://[email protected]/local/home/sarra/ice/rcm/RCM_test.zip to /apps/sarra/public_data/20240910/MSC-ICE/MSC-PRODUCTS/RCM/01/RCM_test.zip
2024-09-10 02:00:11,952 [WARNING] sarracenia.flow do_download downloading again, attempt 3
2024-09-10 02:00:11,954 [ERROR] sarracenia.flow download chdir local/home/sarra/ice/rcm: [Errno 2] No such file
2024-09-10 02:00:11,954 [INFO] sarracenia.flow do_download attempt 3 failed to download sftp://[email protected]/local/home/sarra/ice/rcm/RCM_test.zip to /apps/sarra/public_data/20240910/MSC-ICE/MSC-PRODUCTS/RCM/01/RCM_test.zip
2024-09-10 02:00:11,954 [ERROR] sarracenia.flow do_download gave up downloading for now, appending to retry queue
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/sarracenia/instance.py", line 249, in <module>
    i.start()
  File "/usr/lib/python3/dist-packages/sarracenia/instance.py", line 240, in start
    self.running_instance.run()
  File "/usr/lib/python3/dist-packages/sarracenia/flow/__init__.py", line 672, in run
    time.sleep(increment)
  File "/usr/lib/python3/dist-packages/sarracenia/transfer/__init__.py", line 62, in alarm_raise
    raise TimeoutException("signal alarm timed out")
sarracenia.transfer.TimeoutException: signal alarm timed out
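
The traceback shows the alarm handler firing inside `time.sleep()` in the main loop rather than inside a transfer. One plausible reading, offered as an assumption and not confirmed anywhere in this thread, is that a timeout alarm armed for the failed download was never cancelled and went off later while the flow was idle. The sketch below reproduces that mechanism in isolation; `failing_transfer` and `idle_loop_survives` are hypothetical names, not sarracenia functions, and it is Unix-only because it uses `SIGALRM`.

```python
import signal
import time


class TimeoutException(Exception):
    """Stand-in for sarracenia.transfer.TimeoutException (illustration only)."""


def alarm_raise(signum, frame):
    # Same shape as the handler named in the traceback.
    raise TimeoutException("signal alarm timed out")


def failing_transfer(timeout_s, cancel_alarm):
    """Hypothetical transfer step: arm a timeout alarm, then fail early."""
    signal.signal(signal.SIGALRM, alarm_raise)
    signal.alarm(timeout_s)
    try:
        # Simulates the chdir failure seen in the log.
        raise FileNotFoundError("chdir local/home/sarra/ice/rcm: No such file")
    finally:
        if cancel_alarm:
            signal.alarm(0)  # disarm the pending alarm on every exit path


def idle_loop_survives(cancel_alarm):
    """Return True if the sleep after a failed transfer completes normally."""
    try:
        failing_transfer(1, cancel_alarm)
    except FileNotFoundError:
        pass  # the download gave up; the flow carries on
    try:
        time.sleep(1.5)  # stands in for time.sleep(increment) in the main loop
    except TimeoutException:
        return False  # a stale alarm fired outside any transfer
    finally:
        signal.alarm(0)
    return True
```

Calling `idle_loop_survives(False)` leaves the alarm armed and the sleep is interrupted, while `idle_loop_survives(True)` disarms it in the `finally` block and the sleep completes. Whether the actual fix takes this form is not stated here.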
@petersilva
Contributor

That's this: #1208
It should be already fixed on dev.

@petersilva
Contributor

You need a / at the end of post_baseUrl; then the chdir will be /local/home... and should succeed.
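
To see why the trailing slash matters, here is a minimal sketch using only the Python standard library. It assumes the remote directory is derived by joining the path part of the base URL with a relative path that has no leading slash; `remote_chdir_target` is a hypothetical helper and `example.host` a placeholder, and this is not sarracenia's actual implementation.

```python
import posixpath
from urllib.parse import urlparse


def remote_chdir_target(base_url, rel_path):
    """Hypothetical helper: directory a client would chdir into.

    Assumes the target is (path part of base_url) + rel_path, where
    rel_path carries no leading slash.
    """
    url_path = urlparse(base_url).path  # "" without trailing slash, "/" with it
    return posixpath.dirname(url_path + rel_path)


rel = "local/home/sarra/ice/rcm/RCM_test.zip"

# Without the trailing slash the result is relative to the login directory.
print(remote_chdir_target("sftp://sarra@example.host", rel))

# With the trailing slash the result is absolute.
print(remote_chdir_target("sftp://sarra@example.host/", rel))
```

Under that assumption, the first call yields `local/home/sarra/ice/rcm`, matching the failing chdir in the log, and the second yields `/local/home/sarra/ice/rcm`.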

@petersilva petersilva added bug Something isn't working crasher Crashes entire app. labels Sep 10, 2024
@petersilva
Contributor

Fixed in release 3.0.55 (and all release candidates).

@petersilva
Contributor

petersilva commented Sep 20, 2024

@robjarawan can you try v3.00.55 and see if it fixes it?

@petersilva petersilva added the likely-fixed likely fix is in the repository, success not confirmed yet. label Sep 27, 2024