Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle gateway timeout #80

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

jo-mueller
Copy link
Collaborator

@jo-mueller jo-mueller commented Jan 10, 2025

Fixes #71

This pull request introduces a connection watchdog to the QGateWay class to monitor the connection status and handle disconnections more gracefully. Basically, the socket module is used to check whether a connection to gateway._host and gateway._port can be established. If it can't, the disconnect signal is emitted, triggering the built-in teardown of the treeview and displaying an error message.

Connection monitoring:

  • src/napari_omero/widgets/gateway.py: Introduced a _connection_watchdog function to periodically check the connection status and emit a disconnect signal if the connection times out. Added a _check_connection helper function to attempt socket connections.
  • src/napari_omero/widgets/gateway.py: Added a _connection_watchdog_timout attribute to set the watchdog timeout interval.

Session handling:

Disconnection handling:

Tested

I tested the approach by connecting to the OMERO instance I have access to and then cutting the VPN tunnel. It takes a few seconds for the plugin to fall back to the login view, but it does it :) That behavior can probably be tuned by setting better timeout values.

@jo-mueller jo-mueller added bug Something isn't working enhancement New feature or request labels Jan 10, 2025
Copy link

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 17.85714% with 23 lines in your changes missing coverage. Please review.

Project coverage is 32.64%. Comparing base (1d53966) to head (1df6613).

Files with missing lines Patch % Lines
src/napari_omero/widgets/gateway.py 16.66% 20 Missing ⚠️
src/napari_omero/widgets/main.py 25.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #80      +/-   ##
==========================================
- Coverage   33.13%   32.64%   -0.50%     
==========================================
  Files          13       13              
  Lines         845      873      +28     
==========================================
+ Hits          280      285       +5     
- Misses        565      588      +23     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@psobolewskiPhD psobolewskiPhD self-requested a review January 10, 2025 22:31
@psobolewskiPhD
Copy link
Collaborator

I pulled this down to test.
Surprisingly, with--or without--this PR, if I flip my wifi or VPN off, there is a very long blocker until the
ConnectionLostException: Ice.ConnectionLostException:

Copy link
Collaborator

@psobolewskiPhD psobolewskiPhD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not looked at the code closely: the networking/watchdog stuff is new to me, but the event handling seems fine.
However, what I see when using this PR is that if I connect to OMERO and then close napari, napari closes are normal, but the process stays alive and I need to Control-C (or kill from task manager).
I suspect this is due to the thread worker and the thread/connection not being closed properly.

Edit:
If I disconnect in the plugin, the hung state still occurs after quitting napari.

@psobolewskiPhD
Copy link
Collaborator

@will-moore maybe you have some feedback on the Gateway side of things in this PR?

@jo-mueller
Copy link
Collaborator Author

Thanks for testing! I think it should be relatively easy to kill the thread upon clicking the disconnect button. I would have to look a bit on how to connect the napari-is-destroyed-event something that would terminate the thread.

@tlambert03
Copy link
Owner

You could also try using a daemon thread (set daemon=True when you create it)

@psobolewskiPhD
Copy link
Collaborator

Reading up further, as I'm rather threading naive, it looks like FunctionWorkers don't have a way to stop something until it finishes, and you have a while True, so closing napari wouldn't kill the thread?
See: https://pyapp-kit.github.io/superqt/utilities/threading/

Using a yield in the loop would fix it?
https://napari.org/stable/guides/threading.html#flow-control-and-escape-hatches
and
https://napari.org/stable/guides/threading.html#graceful-exit
But I could be totally off base here.

@tlambert03
Copy link
Owner

FunctionWorkers don't have a way to stop something until it finishes

that's correct. there's no real way in python to "interact" with a running function. so it's best to use a generator or anything that can yield periodically to ask whether it should stop

@jo-mueller
Copy link
Collaborator Author

Ok, I made some changes to address this:

  • kill connection watchdog thread on button disconnect
  • Make sure that only one watchdog thread is ever spawned
  • Use Generator thread to allow graceful exit upon napari destruction. About this, though: I pretty much used the syntax from the napari guide, but it only really works well if napari is started from CLI - if I open it from, say, a Jupyter notebook, closing napari does not kill the thread. I'm not sure whether that's a problem in my code 🤔
  • As for the long wait until the connection is lost: The watchdog is mostly there to catch a timed out connection, which should occur primarily if the plugin is left idle - in that case a few seconds to react shouldn't hurt ^^

@psobolewskiPhD
Copy link
Collaborator

@jo-mueller
I tested this again, the issue when closing napari (launched from CLI) is resolved!
I still observe the >1 min beachball (hung process, viewer totally frozen) when i turn off my WiFi before the disconnect fires -- with or without this PR.

When I launch from jupyter I think I see the extra process and indeed it's not shutdown when napari closes.
I think that it's expect, due to the event loop belonging to jupyter?
https://napari.org/stable/guides/event_loop.html#starting-the-event-loop
If i restart the kernel or close jupyter it does stop though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Connection Lost Exception: no clear way to reconnect
3 participants