
Add EndpointAPI.wait_until_connected_to #86

Conversation

pipermerriam (Member)

What was wrong?

Needed a non-polling mechanism to wait for a connection.

How was it fixed?

Additive changes to our connections are now guarded with an asyncio.Condition, which allows wait_until_connected_to to be implemented without polling. After the initial connection check, the API waits for an update to the connections before checking again.
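A minimal sketch of that pattern, with illustrative names (the actual attribute names in the diff may differ):

```python
import asyncio


class Endpoint:
    def __init__(self) -> None:
        self._connections_changed = asyncio.Condition()
        self._connected_names: set = set()

    async def _add_connection(self, name: str) -> None:
        # Additive changes to the connection set happen under the Condition.
        async with self._connections_changed:
            self._connected_names.add(name)
            self._connections_changed.notify_all()

    async def wait_until_connected_to(self, name: str) -> None:
        async with self._connections_changed:
            # Check once, then sleep until the connection set changes;
            # no timed polling loop is needed.
            while name not in self._connected_names:
                await self._connections_changed.wait()
```

With this shape, wait_until_connected_to wakes only when the connection set actually changes instead of polling on a timer.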

Cute Animal Picture


pipermerriam requested a review from lithp on May 24, 2019 11:52
pipermerriam (Member, Author)

I've got some PR stacking going on here. This contains the diff from three PRs; only the latest commit is relevant.

lithp (Contributor) commented May 24, 2019

I don't know whether you intentionally didn't use this, but if you push the first two commits to this repo under a branch, you can then set the base branch of this PR so that the diff only shows the last commit. You just have to remember to set the base back to master once the preceding PRs have been merged!

pipermerriam (Member, Author)

@lithp since this stretches across forks, that approach results in the PR being open only in my fork, which lacks visibility. Maybe I should just open these as drafts so that there's less confusion and they can be reviewed sequentially as they get merged and rebased.

pipermerriam force-pushed the piper/asyncio-endpoint-establish-reverse-connection branch 2 times, most recently from 45159e5 to 840ff41 on May 24, 2019 21:03
```python
@pytest.mark.asyncio
async def test_connect_to_endpoint(ipc_base_path):
    config = ConnectionConfig.from_name("server", base_path=ipc_base_path)
    async with AsyncioEndpoint.serve(config) as server:
```
lithp (Contributor):
Love this API 😍

```diff
         finally:
-            self._half_connections.remove(remote)
+            if remote in self._half_connections:
+                self._half_connections.remove(remote)
```
lithp (Contributor):

It might be a little cleaner to replace this finally block with another task.add_done_callback(self._half_connections.remove(remote)) inside _accept_conn.
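A minimal sketch of that callback-based cleanup; note that add_done_callback passes the finished task to the callback, so a wrapper is needed, and the names here are illustrative (discard is used so a missing remote is not an error):

```python
import asyncio


class Server:
    def __init__(self) -> None:
        self._half_connections: set = set()

    def _accept_conn(self, remote) -> None:
        self._half_connections.add(remote)
        task = asyncio.ensure_future(self._handle_client(remote))
        # Cleanup runs whenever the task finishes, replacing the finally
        # block; the callback receives the finished task, hence the lambda.
        task.add_done_callback(
            lambda _task: self._half_connections.discard(remote)
        )

    async def _handle_client(self, remote) -> None:
        ...
```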

```python
try:
    await remote.wait_stopped()
except asyncio.CancelledError:
    return
```
lithp (Contributor):

Why does this suppress CancelledError? I think that in this situation it's okay, but letting it propagate up is safer.

From the docs:

cancelled() can be used to check if the Task was cancelled. The method returns True if the wrapped coroutine did not suppress the CancelledError exception and was actually cancelled.

Suppressing the error prevents callers of _handle_client which didn't use ensure_future from knowing they've been cancelled, and it also prevents other callers of _handle_client from determining whether the coro stopped because of cancellation. Neither kind of caller exists right now, but this still seems like unexpected behavior.
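For illustration, the safer shape would do any cleanup and then re-raise, so the cancellation stays observable to callers (a sketch, not the committed code):

```python
import asyncio


async def _handle_client(remote) -> None:
    try:
        await remote.wait_stopped()
    except asyncio.CancelledError:
        # ... perform any cleanup here ...
        raise  # re-raise so Task.cancelled() reports True to callers
```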

lithp (Contributor) left a comment:

Left some comments, especially one about suppressing CancelledError, but no major complaints here (only looking at the last commit). One potential source of errors is that the code which adds to _half_connections is fairly removed from the code which removes from it (and the same with _full_connections). It'd be nice if there were some way of bringing them closer together.

pipermerriam force-pushed the piper/asyncio-endpoint-establish-reverse-connection branch 3 times, most recently from 4133f0f to 4469d20 on May 26, 2019 13:23
pipermerriam (Member, Author)

@lithp I had to make some structural changes to fix a race condition where subscription updates could end up not being transmitted if they occurred at a specific point in time during the connection. I'd like one more pass, if you will.

pipermerriam requested a review from lithp on May 26, 2019 13:25
```python
        return f"RemoteEndpoint[{self.name if self.name is not None else id(self)}]"

    def __repr__(self) -> str:
        return f"<{self}>"
```
pipermerriam (Member, Author):

Can/should be extracted to a standalone function.

```
self._received_subscription.notify_all()
if message.response_expected:
    await self.send_message(SubscriptionsAck())
if message.response_expected:
```
pipermerriam (Member, Author):

@lithp can you confirm this is how it should be structured, or do you think it's OK to send the Ack after we've released the lock?

lithp (Contributor):

I think it's fine to send the Ack from outside the lock's block, and maybe even preferable! The Condition is there to coordinate the local process; since the remote process doesn't even know about this Condition, I don't think that releasing the lock first could create any race conditions.
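A sketch of that shape, assuming a hypothetical _handle_subscriptions method and a stubbed-in SubscriptionsAck standing in for the library's real ack message type:

```python
import asyncio


class SubscriptionsAck:
    ...  # stub standing in for the library's real ack message type


class RemoteEndpoint:
    def __init__(self, send_message) -> None:
        self.subscribed_messages: set = set()
        self.send_message = send_message
        self._received_subscription = asyncio.Condition()

    async def _handle_subscriptions(self, message) -> None:
        # Update local state and wake any waiters while holding the lock ...
        async with self._received_subscription:
            self.subscribed_messages = message.subscriptions
            self._received_subscription.notify_all()
        # ... but send the Ack after releasing it: the Condition only
        # coordinates coroutines in this process, so the remote side cannot
        # observe the difference.
        if message.response_expected:
            await self.send_message(SubscriptionsAck())
```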

```diff
-    while event not in self.subscribed_messages:
-        await self.wait_until_subscription_received()
+    async with self._received_subscription:
+        while True:
```
pipermerriam (Member, Author):

@lithp can you confirm this is OK according to your understanding of conditions: rather than releasing and then re-acquiring at the entrance to each loop iteration, we eliminate that and rely only on the release/acquire that happens as part of the wait() call.

lithp (Contributor):

This looks good to me!
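For reference, Condition.wait() releases the underlying lock while sleeping and re-acquires it before returning, which is what makes the single acquire safe. A minimal sketch of the resulting pattern; this is essentially what the stdlib's Condition.wait_for() already does:

```python
import asyncio
from typing import Callable


async def wait_until(condition: asyncio.Condition, predicate: Callable[[], bool]) -> None:
    # Acquire once; wait() atomically releases the lock while sleeping and
    # re-acquires it before returning, so the predicate is always checked
    # while holding the lock.
    async with condition:
        while not predicate():
            await condition.wait()
```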

```python
        )
    )

    async def _connect_receiving_queue(self) -> None:
```
pipermerriam (Member, Author):

Didn't intend to relocate this, but I think I like it here better. Need to add some comments to delineate the body of this class more clearly, but I'll do that as a cleanup PR since re-organizing makes code hard to audit.

TODO: revert this move.

```python
    def _notify_subscriptions_changed_nowait(self) -> None:
        self._subscription_updates_queue.put_nowait(None)

    async def _do_notify_subscriptions_changed(self) -> None:
```
pipermerriam (Member, Author):

TODO: remove this method and inline it into _process_subscription_updates, since we never want anyone to call this directly.


```python
    async def _process_subscription_updates(self) -> None:
        self._subscription_updates_running.set()
        async with self._subscription_updates_condition:
```
pipermerriam (Member, Author):

@lithp Needed this to ensure we don't end up with concurrent subscription updates being sent out; it's possible that they arrive out of order in that case, and then the other side would have potentially inaccurate subscription records.

TODO: add a code comment explaining this.

lithp (Contributor):

How would they arrive out of order? Domain sockets always deliver messages in order, and _notify_lock ensures there's only one update per socket happening at any time.

pipermerriam (Member, Author):

I started undoing this but then stopped as I'm inclined to leave it this way. Here's my reasoning.

  1. We end up needing to do subscription updates from synchronous methods with the addition of subscribe_nowait, which seems to be a necessary API (see the sketch after this list).
  2. I'm much more comfortable with that method doing a Queue.put_nowait than I am adding an additional call to asyncio.ensure_future(self._notify_subscription_updates()).
  3. I think that the asyncio.Condition-based approach is more appropriate, and under that approach we still need a background process to listen for changes. The various branches I have in progress have made good use of that Condition API.

I'm fine unrolling this if you're still not convinced by the time the trio-based endpoint is being added to the library.
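A minimal sketch of the shape described in points 1 and 2 above; subscribe_nowait, _subscription_updates_queue, and _do_notify_subscriptions_changed appear in this PR, while the other names are illustrative:

```python
import asyncio


class Endpoint:
    def __init__(self) -> None:
        self._subscriptions: set = set()
        self._subscription_updates_queue: asyncio.Queue = asyncio.Queue()

    def subscribe_nowait(self, event_type) -> None:
        # Synchronous context: enqueue a wake-up for the background task
        # rather than calling asyncio.ensure_future(...) from here.
        self._subscriptions.add(event_type)
        self._subscription_updates_queue.put_nowait(None)

    async def _process_subscription_updates(self) -> None:
        # A single background task drains the queue, so outbound
        # subscription updates are serialized and cannot interleave.
        while True:
            await self._subscription_updates_queue.get()
            await self._do_notify_subscriptions_changed()

    async def _do_notify_subscriptions_changed(self) -> None:
        ...  # send the current subscription set to each connected remote
```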

```python
        if self._process.pid is not None:
            os.kill(self._process.pid, signal.SIGINT)
        else:
            self._process.terminate()
```
pipermerriam (Member, Author):

TODO: extract to a standalone function.

pipermerriam force-pushed the piper/asyncio-endpoint-establish-reverse-connection branch from 4469d20 to 3f30e62 on May 27, 2019 09:49
```diff
@@ -216,9 +222,13 @@ def __init__(
         endpoint has acknowledged the new subscription set. If ``block`` is ``False`` then this
         function will return immediately after the send finishes.
         """
```
lithp (Contributor):

The comment just above this line isn't quite right anymore: "Alert the Endpoint which has connected to us that our subscription set has changed." Since RemoteEndpoint now handles both inbound and outbound, this should be something like "Alert the remote that this endpoint's subscription set has changed."

```python
        self._receiving_loop_running = asyncio.Event()
        self._receiving_queue = asyncio.Queue()

        self._subscription_updates_running = asyncio.Event()
        self._subscription_updates_condition = asyncio.Condition()
        self._subscription_updates_queue = asyncio.Queue()
```
lithp (Contributor):

These could be initialized in __init__, which would save you from needing to declare them at the beginning of this class and would also make it easier to reason about the potential for None dereferences.

The await *.running() pattern sounds like it could be pulled out into a helper, which would remove the potentially-None instance attribute entirely. Something like await self.start_coro(self._process_subscription_updates), which doesn't return until the coro has started.
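A sketch of such a helper, under the assumption that readiness can be signalled by a wrapper rather than a dedicated Event attribute per coro (start_coro is the hypothetical name from the comment above):

```python
import asyncio
from typing import Awaitable, Callable


async def start_coro(coro_fn: Callable[[], Awaitable[None]]) -> "asyncio.Task":
    # Does not return until the wrapped coroutine has actually started,
    # replacing the per-coro asyncio.Event instance attributes.
    started = asyncio.Event()

    async def runner() -> None:
        started.set()
        await coro_fn()

    task = asyncio.ensure_future(runner())
    await started.wait()
    return task
```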

pipermerriam (Member, Author):

I'm going to leave this for now and open an issue to track it. I've seen this issue arising as well, but I don't want to conflate this PR with an extra fix that I'm not sure I know the right solution for.


```python
            while True:
                if event in self.subscribed_messages:
                    return
                await self._received_subscription.wait()


@asynccontextmanager  # type: ignore
```
lithp (Contributor):

As of this change, _handle_client and _handle_server are the same method, and together they constitute the only callers of run_remote_endpoint. Maybe they could all be merged into one method (untested):

```python
async def run_remote_endpoint(remote: RemoteEndpoint) -> None:
    await remote.start()
    try:
        await remote.wait_stopped()
    finally:
        await remote.stop()
```

pipermerriam (Member, Author):

Good catch.

lithp (Contributor) left a comment:

I can't figure out from these comments what the problem with missed updates was, so it's possible that this is the best solution and I'm willing to be talked into approving, but this solution is very heavyweight!

This approach, where _subscription_updates_condition and a single coro do all the notifications, has the effect of serializing all updates across all remote endpoints. It's not only a lot of code, it's also much slower than it needs to be! I think that in most cases the endpoint doesn't need to block at all; it's okay if we take a while to inform the remotes of the messages we're expecting, because endpoints already drop all messages which occurred before the endpoint started listening for them.

Blocking before returning from connect_to_endpoint and stream was added to make it easier to write tests. Once those methods return, the caller is able to call broadcast() on other endpoints and be sure the message will be sent.

I'm not sure what was causing the dropped updates, but I'm hoping there's a different way to fix it. Adding sleep() calls to the tests sounds preferable to the complication of putting this condition everywhere the endpoint might update a subscription, plus adding a queue and a coro to allow asynchronously updating subscriptions. I think this change tips this class over into the territory where it's too scary to touch, which will slow down the API improvements we plan on adding in the future.

```python
        # potentially being redundant.
        async with self._subscription_updates_condition:
            await remote.notify_subscriptions_updated(self.subscribed_events)
            await self._add_half_connection(remote)
```
lithp (Contributor):

Ahh, okay, I see the problem now: _add_half_connection blocks when it tries to acquire connections_changed, so there's a period in time where the remote isn't receiving subscription updates. Double-sending the same set of subscriptions has a very small cost, though! And I think that trying not to send it introduced greater inefficiencies elsewhere; serializing all updates isn't cheap!

pipermerriam (Member, Author):

(I thought I responded to this somewhere but I don't see it)

I'm inclined to leave it this way because it correctly manages the appropriate locks on the resource, whereas the other mechanism solves the problem through indirection. First, I think the overall cost here is low because our usage patterns for the library don't involve adding new connections at a high frequency. Second, doing it the alternate way of inverting these statements seems like a solution that could easily be undone incidentally in a refactor, since it's an implicit solution.

pipermerriam (Member, Author) commented May 29, 2019 via email

pipermerriam (Member, Author)

@lithp let's do a call later today when we're both online and see if we can find something that works for both of us.

  • We need something that can work from synchronous methods, as both Subscription.remove and EndpointAPI.subscribe_nowait need to be able to trigger subscription updates.
  • I don't like using asyncio.ensure_future(...) as the mechanism for handling the synchronous-context problem.
  • I am pretty sure that a Condition-based approach is justified; I've spent some serious time over the last week debugging race conditions around subscriptions not being properly updated, and while I know we can achieve working code without the locks, I'm not excited about such solutions.

pipermerriam (Member, Author)

Replaced by #110 and #108.

pipermerriam deleted the piper/asyncio-endpoint-establish-reverse-connection branch on May 30, 2019 14:24